llmstxt CLI
Generate llms.txt from sitemap.xml
Overview
llmstxt is a CLI tool that automatically generates llms.txt files from your site’s sitemap.xml.
Repository: github.com/dotenvx/llmstxt
Installation
    # Global installation
    npm install -g llmstxt

    # Or run via npx (no installation)
    npx llmstxt gen https://example.com/sitemap.xml

Commands
gen: Basic Generation
Creates llms.txt with a list of pages and descriptions:

    npx llmstxt gen https://example.com/sitemap.xml > llms.txt

Result:
    # Example Site
    > Description from meta tags
    ## Section
    - [Page Title](https://example.com/page): Meta description

gen-full: Full Generation
Creates a file with the full content of all pages:

    npx llmstxt gen-full https://example.com/sitemap.xml > llms-full.txt

Includes:
- Table of Contents
- Full text of each page in Markdown
- Last updated dates
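To sanity-check a file produced by gen before publishing it, one option is to count its link entries and compare that number with the pages you expect. The following is a minimal sketch, assuming the output follows the Result sample shown earlier (one `- [Title](URL): description` line per page). `count_entries` is a hypothetical helper name and is not part of llmstxt.

```shell
#!/bin/sh
# count_entries: hypothetical helper, not part of llmstxt.
# Reads an llms.txt file on stdin and prints how many lines look like
# "- [Title](URL)" link entries (format assumed from the Result sample).
count_entries() {
  # grep -c exits with status 1 when nothing matches but still prints 0.
  grep -c '^- \[.*](.*)' || true
}

# Usage, after generating the file:
#   count_entries < llms.txt
```

A count of 0 usually points at the same causes listed under Troubleshooting for an empty result.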
Options
Path Filtering

    # Exclude paths
    npx llmstxt gen https://example.com/sitemap.xml \
      --exclude-path "**/blog/**" \
      --exclude-path "**/privacy**" \
      --exclude-path "**/terms**"

    # Include only specific paths
    npx llmstxt gen https://example.com/sitemap.xml \
      --include-path "**/docs/**" \
      --include-path "**/api/**"

Title Cleanup
Remove repeated text from titles:

    # Remove "| Example" from all titles
    npx llmstxt gen https://example.com/sitemap.xml \
      --replace-title 's/\| Example//'

Custom Metadata
    npx llmstxt gen https://example.com/sitemap.xml \
      --title "My Documentation" \
      --description "Official docs for My Project"

Concurrency Control

    # Limit the number of concurrent requests
    npx llmstxt gen https://example.com/sitemap.xml \
      --concurrency 3

Full Example

    npx llmstxt@latest gen https://docs.example.com/sitemap.xml \
      -ep "**/blog/**" \
      -ep "**/changelog/**" \
      -ep "**/privacy**" \
      -ep "**/terms**" \
      -rt 's/\| Docs//' \
      -t 'Example Docs' \
      -d 'Official documentation for Example' \
      -c 5 \
      > llms.txt

Options (Short Form)
| Long Form | Short | Description |
|---|---|---|
| --exclude-path | -ep | Exclude paths (glob) |
| --include-path | -ip | Include paths (glob) |
| --replace-title | -rt | Regex replacement in titles |
| --title | -t | Custom title |
| --description | -d | Custom description |
| --concurrency | -c | Max concurrent requests |
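For reference, the Full Example can also be spelled with the long-form flags from the table. The sketch below only assembles and prints the command (a dry run) so the mapping is easy to inspect. The flag names come from the table, while the dry-run approach itself is just an illustration.

```shell
#!/bin/sh
# Long-form spelling of the Full Example, using only flags listed in the table.
# The arguments are stored as positional parameters and printed, not executed.
set -- gen https://docs.example.com/sitemap.xml \
  --exclude-path '**/blog/**' \
  --exclude-path '**/changelog/**' \
  --exclude-path '**/privacy**' \
  --exclude-path '**/terms**' \
  --replace-title 's/\| Docs//' \
  --title 'Example Docs' \
  --description 'Official documentation for Example' \
  --concurrency 5

# Dry run: show the command that would be executed.
printf '%s ' npx llmstxt@latest "$@"
printf '\n'

# To actually run it, redirecting the result to llms.txt:
#   npx llmstxt@latest "$@" > llms.txt
```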
How It Works
- Parse sitemap.xml: extract all URLs
- Load pages: fetch pages in parallel, with concurrency control
- Extract metadata:
  - <title>: page title
  - <meta name="description">: description
  - <meta property="og:description">: fallback description
  - <meta name="twitter:description">: fallback description
- Grouping: pages are grouped by URL path
- Markdown generation: the result is formatted as llms.txt
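The parsing and grouping steps above can be approximated with standard tools to preview how a sitemap might be split into sections. This is a rough sketch and not llmstxt's actual code: the helper names are hypothetical, and it assumes grouping keys on the first path segment of each URL, which the tool may not do.

```shell
#!/bin/sh
# Illustrative only: these helpers are hypothetical, not llmstxt internals.

# Parse step: print every URL found in a <loc> element on stdin.
extract_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Grouping step, ASSUMED here to key on the first path segment:
# https://example.com/docs/intro -> docs; the site root -> (root).
first_segment() {
  awk -F/ '{ print ($4 == "" ? "(root)" : $4) }'
}

# Print "section: page count" for a sitemap read from stdin.
group_counts() {
  extract_urls | first_segment | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage:
#   curl -fsSL https://example.com/sitemap.xml | group_counts
```

Running this before generation gives a quick view of which path prefixes dominate the sitemap, which can help when choosing --exclude-path or --include-path globs.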
CI/CD Integration
GitHub Actions

    name: Generate llms.txt

    on:
      push:
        branches: [main]
      schedule:
        - cron: '0 0 * * 0' # Every Sunday at 00:00 UTC

    # Added on the assumption that the default GITHUB_TOKEN is read-only;
    # the final "git push" step needs write access to repository contents.
    permissions:
      contents: write

    jobs:
      generate:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Generate llms.txt
            run: |
              npx llmstxt gen https://docs.example.com/sitemap.xml \
                -ep "**/blog/**" > public/llms.txt

          - name: Commit changes
            run: |
              git config --local user.email "action@github.com"
              git config --local user.name "GitHub Action"
              git add public/llms.txt
              git commit -m "Update llms.txt" || exit 0
              git push

Troubleshooting
Empty Result
Check that:
- sitemap.xml is accessible and contains URLs
- Pages have <title> tags
- Filters don’t exclude all pages
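The first two checks can be scripted. The helpers below are a minimal sketch with hypothetical names (they are not part of llmstxt); they read from stdin so they can be combined with curl or a saved file.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; not part of llmstxt.

# Print the number of <loc> entries in a sitemap read from stdin.
count_sitemap_urls() {
  grep -o '<loc>' | wc -l | tr -d ' '
}

# Succeed if the HTML on stdin has a non-empty <title> on a single line.
# A title whose text starts on the next line is not detected.
has_title() {
  grep -qi '<title[^>]*>[^<]'
}

# Usage:
#   curl -fsSL https://example.com/sitemap.xml | count_sitemap_urls
#   curl -fsSL https://example.com/page | has_title || echo "missing <title>"
```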
Slow Generation
If the server is rate limiting requests, lower --concurrency:

    npx llmstxt gen https://example.com/sitemap.xml -c 2

Incorrect Titles
Use --replace-title to strip unwanted text, such as a site-name suffix:

    npx llmstxt gen https://example.com/sitemap.xml -rt 's/ - My Site$//'
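Before running a full generation, a substitution can be previewed against a sample title. The sketch below uses `sed -E` as a stand-in; this is an assumption, since llmstxt's regex dialect may differ from sed's extended regular expressions. `preview_title` is a hypothetical helper name.

```shell
#!/bin/sh
# preview_title: hypothetical helper, not part of llmstxt.
# Applies an s/pattern/replacement/ expression to a sample title with sed -E.
# Assumption: the tool's regex dialect may not match sed's exactly.
preview_title() {
  printf '%s\n' "$1" | sed -E "$2"
}

preview_title 'Getting Started - My Site' 's/ - My Site$//'
# prints "Getting Started"
```

Note that a pattern such as 's/\| Example//' does not remove the space before the pipe, so the preview shows a trailing space; including the leading space in the pattern avoids that in sed.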