# llmstxt-architect

AI-powered llms.txt generation with LLMs

## Overview

llmstxt-architect is a Python tool that uses language models (Claude, GPT, Ollama) to automatically generate high-quality descriptions for an llms.txt file.
Repository: github.com/rlancemartin/llmstxt_architect
## Advantages

Unlike simple meta-description extraction, llmstxt-architect:
- **Analyzes content** — reads the actual page contents, not just metadata
- **Generates descriptions** — uses an LLM to create meaningful descriptions
- **Explains relevance** — describes when and why an agent should read the page
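For context, an llms.txt file follows the format shown below; the project name, links, and descriptions here are illustrative, not real tool output:

```markdown
# Example Docs

> Documentation for the Example project.

## API Reference

- [Authentication](https://docs.example.com/auth): Read this when implementing
  login flows; covers API keys, OAuth scopes, and token refresh.
- [Webhooks](https://docs.example.com/webhooks): Explains event payloads and
  retry semantics for consuming webhook deliveries.
```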
## Installation

```bash
# Quick run via uvx
curl -LsSf https://astral.sh/uv/install.sh | sh
uvx --from llmstxt-architect llmstxt-architect --help

# Or install via pip
pip install llmstxt-architect
```

## Basic Usage

### With Anthropic Claude
```bash
export ANTHROPIC_API_KEY=sk-ant-...

llmstxt-architect \
  --urls https://docs.example.com \
  --max-depth 2 \
  --llm-name claude-3-5-sonnet-latest \
  --llm-provider anthropic \
  --project-dir output
```

### With OpenAI
```bash
export OPENAI_API_KEY=sk-...

llmstxt-architect \
  --urls https://docs.example.com \
  --max-depth 2 \
  --llm-name gpt-4o \
  --llm-provider openai \
  --project-dir output
```

### With Local Model (Ollama)
```bash
# Start Ollama
ollama serve

llmstxt-architect \
  --urls https://docs.example.com \
  --max-depth 1 \
  --llm-name llama3.2:latest \
  --llm-provider ollama \
  --project-dir output
```

## Parameters
| Parameter | Description | Default |
|---|---|---|
| `--urls` | URLs to process | Required |
| `--max-depth` | Crawl depth (1-5) | 5 |
| `--llm-name` | Model name | claude-3-sonnet |
| `--llm-provider` | Provider (anthropic, openai, ollama) | anthropic |
| `--project-dir` | Results directory | llms_txt |
| `--output-file` | Output file name | llms.txt |
| `--blacklist-file` | File with URLs to exclude | - |
| `--extractor` | Extraction method (default, bs4) | default |
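When scripting the CLI (for example from a CI job), an invocation can be assembled with the standard library. This is a sketch using only the flags and defaults from the table above; verify the exact argument semantics with `--help`:

```python
import subprocess

def build_command(url: str, project_dir: str = "llms_txt",
                  max_depth: int = 5,
                  llm_name: str = "claude-3-sonnet",
                  llm_provider: str = "anthropic") -> list[str]:
    """Assemble an llmstxt-architect invocation from the documented flags."""
    return [
        "llmstxt-architect",
        "--urls", url,
        "--max-depth", str(max_depth),
        "--llm-name", llm_name,
        "--llm-provider", llm_provider,
        "--project-dir", project_dir,
    ]

cmd = build_command("https://docs.example.com", project_dir="output", max_depth=2)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to launch the actual crawl
```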
## Updating Existing llms.txt

### Update Descriptions Only

```bash
llmstxt-architect \
  --existing-llms-file https://example.com/llms.txt \
  --update-descriptions-only \
  --llm-name claude-3-5-sonnet-latest \
  --llm-provider anthropic \
  --project-dir updated
```

This preserves the structure and sections, updating only the link descriptions.

### From Local File
```bash
llmstxt-architect \
  --existing-llms-file ./llms.txt \
  --update-descriptions-only \
  --llm-provider ollama \
  --llm-name llama3.2 \
  --project-dir updated
```

## Excluding URLs
Create a blacklist.txt file:

```text
# Deprecated pages
https://example.com/old-api/
https://example.com/v1/

# Not relevant
https://example.com/jobs/
https://example.com/legal/
```

Then run:

```bash
llmstxt-architect \
  --urls https://example.com \
  --blacklist-file blacklist.txt \
  --project-dir output
```

## Python API
```python
import asyncio

from llmstxt_architect.main import generate_llms_txt


async def main():
    await generate_llms_txt(
        urls=["https://docs.example.com"],
        max_depth=2,
        llm_name="claude-3-5-sonnet-latest",
        llm_provider="anthropic",
        project_dir="output",
    )


asyncio.run(main())
```

## Custom Extractor
```python
from bs4 import BeautifulSoup


def custom_extractor(html: str) -> str:
    soup = BeautifulSoup(html, 'html.parser')

    # Remove navigation and footer
    for tag in soup.find_all(['nav', 'footer', 'aside']):
        tag.decompose()

    # Extract only main content
    main = soup.find('main') or soup.find('article')
    if main:
        return main.get_text(separator='\n', strip=True)

    return soup.get_text(separator='\n', strip=True)
```
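To see what this cleanup produces, the same steps can be run standalone on an inline snippet; the HTML below is made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative page: <main> content surrounded by boilerplate
html = (
    "<html><body>"
    "<nav>Home | Docs</nav>"
    "<main><h1>API Guide</h1><p>Connect with the client.</p></main>"
    "<footer>Example Inc.</footer>"
    "</body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# Same cleanup steps as custom_extractor
for tag in soup.find_all(["nav", "footer", "aside"]):
    tag.decompose()

main = soup.find("main") or soup.find("article")
text = main.get_text(separator="\n", strip=True)
print(text)
# API Guide
# Connect with the client.
```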
```python
await generate_llms_txt(
    urls=["https://docs.example.com"],
    extractor=custom_extractor,
    # ...
)
```

## Output File Structure
```text
project-dir/
├── llms.txt                      # Final file
├── summaries/
│   ├── summarized_urls.json      # Progress checkpoint
│   ├── docs_example_com_page1.txt
│   ├── docs_example_com_page2.txt
│   └── ...                       # Individual summaries
```

## Generation Prompt
By default, the LLM receives this prompt:

```text
Create a concise summary for this documentation page.
The summary should explain:
1. When should an LLM read this page?
2. What key topics are covered?

Keep the summary to 1-2 sentences maximum.
```

## Custom Prompt
```bash
llmstxt-architect \
  --urls https://example.com \
  --summary-prompt "Summarize this API documentation. Focus on endpoints and use cases." \
  --project-dir output
```

## Checkpoints and Resumption
The tool saves progress every 5 documents:

```bash
# If the process was interrupted, just run again;
# already-processed URLs will be skipped
llmstxt-architect --urls https://example.com --project-dir output
```
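The checkpoint lives in `summaries/summarized_urls.json` under the project directory. Its exact schema isn't documented here; assuming it holds the already-summarized URLs as a JSON list (or a dict keyed by URL), progress can be inspected with a short stdlib-only sketch:

```python
import json
from pathlib import Path

def processed_urls(project_dir: str) -> list[str]:
    """Read the resumption checkpoint; returns [] if no run has started.

    NOTE: the JSON schema is assumed (a list, or a dict keyed by URL);
    check the actual file produced by your run.
    """
    checkpoint = Path(project_dir) / "summaries" / "summarized_urls.json"
    if not checkpoint.exists():
        return []
    data = json.loads(checkpoint.read_text())
    return list(data)  # yields list items, or a dict's keys

print(processed_urls("output"))
```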
## MCP Integration
Use the generated llms.txt with an MCP server:
{ "mcpServers": { "docs": { "command": "uvx", "args": [ "--from", "mcpdoc", "mcpdoc", "--urls", "Docs:./output/llms.txt", "--transport", "stdio" ] } }}Example Results
Projects using llmstxt-architect:
## Comparison with llmstxt CLI

| Aspect | llmstxt CLI | llmstxt-architect |
|---|---|---|
| Description source | Meta tags | AI generation |
| Description quality | Depends on site | High |
| Speed | Fast | Slower (API calls) |
| Cost | Free | Paid API usage |
| Analysis depth | Shallow | Content analysis |