
llmstxt CLI

Generate llms.txt from sitemap.xml

llmstxt is a CLI tool for automatically generating llms.txt files from your site’s sitemap.xml.

Repository: github.com/dotenvx/llmstxt

# Global installation
npm install -g llmstxt
# Or run via npx (no installation)
npx llmstxt gen https://example.com/sitemap.xml

The gen command creates llms.txt with a list of pages and their descriptions:

npx llmstxt gen https://example.com/sitemap.xml > llms.txt

Result:

# Example Site
> Description from meta tags
## Section
- [Page Title](https://example.com/page): Meta description
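
The layout shown above can be sketched as a small renderer. This is only an illustration of the output format, not llmstxt's own code: the `Page` class and `render_llms_txt` function are hypothetical names, and omitting the colon when a page has no description is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str = ""


def render_llms_txt(site_title, site_description, sections):
    """Render the llms.txt layout: H1 title, blockquote summary, H2 sections, link lists.

    `sections` maps a section heading to a list of Page entries.
    """
    lines = [f"# {site_title}", f"> {site_description}"]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        for page in pages:
            entry = f"- [{page.title}]({page.url})"
            # Assumption: a page without a description is listed as a bare link.
            if page.description:
                entry += f": {page.description}"
            lines.append(entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sections = {
        "Section": [Page("Page Title", "https://example.com/page", "Meta description")]
    }
    print(render_llms_txt("Example Site", "Description from meta tags", sections), end="")
```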

The gen-full command creates a file with the full content of all pages:

npx llmstxt gen-full https://example.com/sitemap.xml > llms-full.txt

Includes:

  • Table of Contents
  • Full text of each page in Markdown
  • Last updated dates

Filter which pages are included by path:

# Exclude paths
npx llmstxt gen https://example.com/sitemap.xml \
--exclude-path "**/blog/**" \
--exclude-path "**/privacy**" \
--exclude-path "**/terms**"
# Include only specific paths
npx llmstxt gen https://example.com/sitemap.xml \
--include-path "**/docs/**" \
--include-path "**/api/**"
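
To make the filter semantics concrete, here is a rough sketch of include/exclude matching against full URLs. It is not llmstxt's implementation: `filter_urls` is a hypothetical helper, Python's `fnmatch` stands in for whatever glob library the tool uses (so edge cases may differ), and applying include patterns before exclude patterns is an assumption.

```python
from fnmatch import fnmatchcase


def filter_urls(urls, include=(), exclude=()):
    """Keep URLs that match any include pattern (if given) and no exclude pattern."""
    kept = []
    for url in urls:
        # Assumption: with no include patterns, every URL is a candidate.
        if include and not any(fnmatchcase(url, pattern) for pattern in include):
            continue
        if any(fnmatchcase(url, pattern) for pattern in exclude):
            continue
        kept.append(url)
    return kept


if __name__ == "__main__":
    urls = [
        "https://example.com/docs/intro",
        "https://example.com/blog/post-1",
        "https://example.com/privacy",
        "https://example.com/api/users",
    ]
    print(filter_urls(urls, exclude=["**/blog/**", "**/privacy**"]))
    print(filter_urls(urls, include=["**/docs/**", "**/api/**"]))
```

In this sketch a pattern such as `**/blog/**` requires a slash after `blog`, so a URL ending in `/blog` with no trailing slash would not match.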

Remove repeated text from titles:

# Remove "| Example" from all titles
npx llmstxt gen https://example.com/sitemap.xml \
--replace-title 's/\| Example//'
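
The sed-style expression passed to --replace-title can be read as `s/pattern/replacement/flags`. The sketch below shows one way such an expression could be parsed and applied. The function names are hypothetical, Python's `re` module stands in for the tool's regex engine, and support for the `g` and `i` flags is an assumption.

```python
import re


def parse_substitution(expr):
    """Split a sed-style 's/pattern/replacement/flags' expression into its parts."""
    if not expr.startswith("s/"):
        raise ValueError(f"expected s/pattern/replacement/, got {expr!r}")
    # Split on '/' characters that are not escaped with a backslash.
    parts = re.split(r"(?<!\\)/", expr[2:])
    if len(parts) != 3:
        raise ValueError(f"malformed substitution: {expr!r}")
    pattern, replacement, flags = parts
    return pattern, replacement, flags


def replace_title(title, expr):
    """Apply one substitution rule to a page title."""
    pattern, replacement, flags = parse_substitution(expr)
    # Like sed, replace only the first match unless the 'g' flag is present.
    count = 0 if "g" in flags else 1
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, replacement, title, count=count, flags=re_flags)


if __name__ == "__main__":
    print(repr(replace_title("Getting Started | Example", r"s/\| Example//")))
    print(repr(replace_title("Install - My Site", "s/ - My Site$//")))
```

In this sketch the first rule leaves a trailing space because the pattern starts at the `|` character. Including the leading space in the pattern (`s/ \| Example//`) avoids that here; whether llmstxt trims titles itself is not something the sketch assumes.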

Set a custom title and description:

npx llmstxt gen https://example.com/sitemap.xml \
--title "My Documentation" \
--description "Official docs for My Project"

Control how many pages are fetched at once:

# Limit number of concurrent requests
npx llmstxt gen https://example.com/sitemap.xml \
--concurrency 3
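
As an illustration of what a concurrency limit means, the sketch below caps the number of in-flight page loads with a semaphore. It is a generic pattern rather than llmstxt's code: `fetch_all` is a hypothetical helper, and the `fake_fetch` coroutine stands in for a real HTTP request so the example runs without network access.

```python
import asyncio


async def fetch_all(urls, fetch, concurrency=3):
    """Run `fetch(url)` for every URL with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


if __name__ == "__main__":
    async def fake_fetch(url):
        # Stand-in for an HTTP request.
        await asyncio.sleep(0.01)
        return f"<html>{url}</html>"

    urls = [f"https://example.com/page-{i}" for i in range(10)]
    pages = asyncio.run(fetch_all(urls, fake_fetch, concurrency=3))
    print(len(pages))
```

A lower limit makes the run slower but sends fewer simultaneous requests to the server, which is why reducing it helps with rate limiting.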

A complete example combining these options:

npx llmstxt@latest gen https://docs.example.com/sitemap.xml \
-ep "**/blog/**" \
-ep "**/changelog/**" \
-ep "**/privacy**" \
-ep "**/terms**" \
-rt 's/\| Docs//' \
-t 'Example Docs' \
-d 'Official documentation for Example' \
-c 5 \
> llms.txt

Long form        Short   Description
--exclude-path   -ep     Exclude paths (glob)
--include-path   -ip     Include paths (glob)
--replace-title  -rt     Regex replacement in titles
--title          -t      Custom title
--description    -d      Custom description
--concurrency    -c      Max concurrent requests

  1. Parse sitemap.xml — extract all URLs
  2. Load pages — parallel loading with concurrency control
  3. Extract metadata:
    • <title> — page title
    • <meta name="description"> — description
    • <meta property="og:description"> — fallback
    • <meta name="twitter:description"> — fallback
  4. Grouping — pages are grouped by URL paths
  5. Markdown generation — formatting into llms.txt
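
Steps 1, 3 and 4 above can be sketched with the Python standard library. This is an illustration of the described pipeline, not the tool's source: all function and class names are hypothetical, and grouping by the first path segment is an assumed rule, since the text only says pages are grouped by URL paths.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Step 1: return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


class MetaParser(HTMLParser):
    """Step 3: collect the <title> text and name/property meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content"):
                # Keep the first occurrence of each meta key.
                self.meta.setdefault(key.lower(), attrs["content"].strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return (title, description), trying the fallbacks in the listed order."""
    parser = MetaParser()
    parser.feed(html)
    description = ""
    for key in ("description", "og:description", "twitter:description"):
        if parser.meta.get(key):
            description = parser.meta[key]
            break
    return parser.title.strip(), description


def group_by_path(urls):
    """Step 4 (assumed rule): group URLs by their first path segment."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "/"].append(url)
    return dict(groups)
```

For example, a page that has no `<meta name="description">` but does have an `og:description` tag would get the Open Graph text as its description.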

name: Generate llms.txt
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 0 * * 0' # Every Sunday
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate llms.txt
        run: |
          npx llmstxt gen https://docs.example.com/sitemap.xml \
            -ep "**/blog/**" > public/llms.txt
      - name: Commit changes
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add public/llms.txt
          git commit -m "Update llms.txt" || exit 0
          git push

If the generated llms.txt is empty or missing pages, check that:

  1. sitemap.xml is accessible and contains URLs
  2. Pages have <title> tags
  3. Filters don’t exclude all pages

Reduce --concurrency to avoid rate limiting:

npx llmstxt gen https://example.com/sitemap.xml -c 2

Use --replace-title to strip a repeated suffix from titles:

npx llmstxt gen https://example.com/sitemap.xml -rt 's/ - My Site$//'