# llmstxt-architect

AI-powered llms.txt generation with LLMs

## Overview

llmstxt-architect is a Python tool that uses language models (Claude, GPT, Ollama) to automatically generate high-quality descriptions for an llms.txt file.
Repository: github.com/rlancemartin/llmstxt_architect
## Advantages

Unlike simple meta-description extraction, llmstxt-architect:
- **Analyzes content** — reads the actual page contents, not just metadata
- **Generates descriptions** — uses an LLM to create meaningful descriptions
- **Explains relevance** — describes when and why an agent should read the page
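For context, an llms.txt file follows the format shown below; the project name, links, and descriptions here are illustrative, not real tool output:

```markdown
# Example Docs

> Documentation for the Example project.

## API Reference

- [Authentication](https://docs.example.com/auth): Read this when implementing
  login flows; covers API keys, OAuth scopes, and token refresh.
- [Webhooks](https://docs.example.com/webhooks): Explains event payloads and
  retry semantics for consuming webhook deliveries.
```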
## Installation

```bash
# Quick run via uvx
curl -LsSf https://astral.sh/uv/install.sh | sh
uvx --from llmstxt-architect llmstxt-architect --help

# Or install via pip
pip install llmstxt-architect
```

## Basic Usage

### With Anthropic Claude
```bash
export ANTHROPIC_API_KEY=sk-ant-...

llmstxt-architect \
  --urls https://docs.example.com \
  --max-depth 2 \
  --llm-name claude-3-5-sonnet-latest \
  --llm-provider anthropic \
  --project-dir output
```

### With OpenAI
```bash
export OPENAI_API_KEY=sk-...

llmstxt-architect \
  --urls https://docs.example.com \
  --max-depth 2 \
  --llm-name gpt-4o \
  --llm-provider openai \
  --project-dir output
```

### With Local Model (Ollama)
```bash
# Start Ollama
ollama serve

llmstxt-architect \
  --urls https://docs.example.com \
  --max-depth 1 \
  --llm-name llama3.2:latest \
  --llm-provider ollama \
  --project-dir output
```

## Parameters
| Parameter | Description | Default |
|---|---|---|
| `--urls` | URLs to process | Required |
| `--max-depth` | Crawl depth (1-5) | 5 |
| `--llm-name` | Model name | claude-3-sonnet |
| `--llm-provider` | Provider (anthropic, openai, ollama) | anthropic |
| `--project-dir` | Results directory | llms_txt |
| `--output-file` | Output file name | llms.txt |
| `--blacklist-file` | File with URLs to exclude | - |
| `--extractor` | Extraction method (default, bs4) | default |
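When scripting the CLI (for example from a CI job), an invocation can be assembled with the standard library. This is a sketch using only the flags and defaults from the table above; verify the exact argument semantics with `--help`:

```python
import subprocess

def build_command(url: str, project_dir: str = "llms_txt",
                  max_depth: int = 5,
                  llm_name: str = "claude-3-sonnet",
                  llm_provider: str = "anthropic") -> list[str]:
    """Assemble an llmstxt-architect invocation from the documented flags."""
    return [
        "llmstxt-architect",
        "--urls", url,
        "--max-depth", str(max_depth),
        "--llm-name", llm_name,
        "--llm-provider", llm_provider,
        "--project-dir", project_dir,
    ]

cmd = build_command("https://docs.example.com", project_dir="output", max_depth=2)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to launch the actual crawl
```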
## Updating Existing llms.txt

### Update Descriptions Only

```bash
llmstxt-architect \
  --existing-llms-file https://example.com/llms.txt \
  --update-descriptions-only \
  --llm-name claude-3-5-sonnet-latest \
  --llm-provider anthropic \
  --project-dir updated
```

This preserves the structure and sections, updating only the link descriptions.

### From Local File
```bash
llmstxt-architect \
  --existing-llms-file ./llms.txt \
  --update-descriptions-only \
  --llm-provider ollama \
  --llm-name llama3.2 \
  --project-dir updated
```

## Excluding URLs
Create a blacklist.txt file:

```text
# Deprecated pages
https://example.com/old-api/
https://example.com/v1/

# Not relevant
https://example.com/jobs/
https://example.com/legal/
```

Then run:

```bash
llmstxt-architect \
  --urls https://example.com \
  --blacklist-file blacklist.txt \
  --project-dir output
```

## Python API
```python
import asyncio

from llmstxt_architect.main import generate_llms_txt


async def main():
    await generate_llms_txt(
        urls=["https://docs.example.com"],
        max_depth=2,
        llm_name="claude-3-5-sonnet-latest",
        llm_provider="anthropic",
        project_dir="output",
    )


asyncio.run(main())
```

## Custom Extractor
```python
from bs4 import BeautifulSoup


def custom_extractor(html: str) -> str:
    soup = BeautifulSoup(html, 'html.parser')

    # Remove navigation and footer
    for tag in soup.find_all(['nav', 'footer', 'aside']):
        tag.decompose()

    # Extract only main content
    main = soup.find('main') or soup.find('article')
    if main:
        return main.get_text(separator='\n', strip=True)

    return soup.get_text(separator='\n', strip=True)
```
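To see what this cleanup produces, the same steps can be run standalone on an inline snippet; the HTML below is made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative page: <main> content surrounded by boilerplate
html = (
    "<html><body>"
    "<nav>Home | Docs</nav>"
    "<main><h1>API Guide</h1><p>Connect with the client.</p></main>"
    "<footer>Example Inc.</footer>"
    "</body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# Same cleanup steps as custom_extractor
for tag in soup.find_all(["nav", "footer", "aside"]):
    tag.decompose()

main = soup.find("main") or soup.find("article")
text = main.get_text(separator="\n", strip=True)
print(text)
# API Guide
# Connect with the client.
```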
```python
await generate_llms_txt(
    urls=["https://docs.example.com"],
    extractor=custom_extractor,
    # ...
)
```

## Output File Structure
```text
project-dir/
├── llms.txt                      # Final file
├── summaries/
│   ├── summarized_urls.json      # Progress checkpoint
│   ├── docs_example_com_page1.txt
│   ├── docs_example_com_page2.txt
│   └── ...                       # Individual summaries
```

## Generation Prompt
By default, the LLM receives this prompt:

```text
Create a concise summary for this documentation page.
The summary should explain:
1. When should an LLM read this page?
2. What key topics are covered?

Keep the summary to 1-2 sentences maximum.
```

## Custom Prompt
```bash
llmstxt-architect \
  --urls https://example.com \
  --summary-prompt "Summarize this API documentation. Focus on endpoints and use cases." \
  --project-dir output
```

## Checkpoints and Resumption
The tool saves progress every 5 documents:

```bash
# If the process was interrupted, just run again;
# already-processed URLs will be skipped
llmstxt-architect --urls https://example.com --project-dir output
```
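The checkpoint lives in `summaries/summarized_urls.json` under the project directory. Its exact schema isn't documented here; assuming it holds the already-summarized URLs as a JSON list (or a dict keyed by URL), progress can be inspected with a short stdlib-only sketch:

```python
import json
from pathlib import Path

def processed_urls(project_dir: str) -> list[str]:
    """Read the resumption checkpoint; returns [] if no run has started.

    NOTE: the JSON schema is assumed (a list, or a dict keyed by URL);
    check the actual file produced by your run.
    """
    checkpoint = Path(project_dir) / "summaries" / "summarized_urls.json"
    if not checkpoint.exists():
        return []
    data = json.loads(checkpoint.read_text())
    return list(data)  # yields list items, or a dict's keys

print(processed_urls("output"))
```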
## MCP Integration
Use the generated llms.txt with an MCP server:
{ "mcpServers": { "docs": { "command": "uvx", "args": [ "--from", "mcpdoc", "mcpdoc", "--urls", "Docs:./output/llms.txt", "--transport", "stdio" ] } }}Example Results
Projects using llmstxt-architect:
## Comparison with llmstxt CLI

| Aspect | llmstxt CLI | llmstxt-architect |
|---|---|---|
| Description source | Meta tags | AI generation |
| Description quality | Depends on site | High |
| Speed | Fast | Slower (API calls) |
| Cost | Free | Paid API usage |
| Analysis depth | Shallow | Content analysis |