Best Practices

Best practices for creating llms.txt — effective link descriptions, optimal file size, common mistakes, security guidelines, validation checklist and CI/CD

The link description is context for the LLM. It determines whether the model will load the page.

❌ Bad:

- [Docs](https://example.com/docs): Documentation
- [API](https://example.com/api): API
- [Guide](https://example.com/guide): Guide

✅ Good:

- [Quick Start](https://example.com/docs/start): Installation, configuration, and first run in 5 minutes
- [REST API](https://example.com/api/rest): Authentication, endpoints, request and response examples
- [Migration Guide](https://example.com/docs/migrate): Upgrading from v2 to v3, breaking changes

Rules for descriptions:

  • Specificity — what exactly will the LLM find on the page
  • Keywords — terms the LLM will search for
  • Uniqueness — each description differs from others
  • Length — 10-20 words, enough to understand context
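
The specificity, uniqueness, and length rules can be partly spot-checked by a script. The following is a minimal sketch, assuming every link line has the form `- [Title](URL): description` as in the examples above. The function name `check_descriptions` and its output labels are hypothetical, and the minimum word count is a parameter that defaults to the lower bound of 10 from the list.

```shell
#!/bin/sh
# Hypothetical helper: flag weak link descriptions in an llms.txt file.
# $1: path to the file; $2: minimum word count (default 10, per the rule above).
check_descriptions() {
  awk -v min="${2:-10}" '
    /^- \[/ {
      # Title is the text between "- [" and the first "]".
      title = $0
      sub(/^- \[/, "", title)
      sub(/\].*$/, "", title)

      # Description is everything after the first "): " separator.
      i = index($0, "): ")
      if (i == 0) { print "MISSING: " title; next }
      desc = substr($0, i + 3)

      n = split(desc, words)   # default split on runs of whitespace
      if (tolower(desc) == tolower(title))
        print "REPEATS TITLE: " title
      else if (n < min)
        print "SHORT (" n " words): " title
      if (seen[tolower(desc)]++)
        print "DUPLICATE DESCRIPTION: " title
    }
  ' "$1"
}

# Usage (skipped when no llms.txt is present in the current directory):
if [ -f llms.txt ]; then check_descriptions llms.txt; fi
```

A check like this only catches mechanical problems. Whether a description names what the LLM will actually find on the page still needs a human read.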

| File | Recommended size | Maximum |
|---|---|---|
| llms.txt | 1-4K tokens | ~8K tokens |
| llms-full.txt | 10-50K tokens | ~100K tokens |

Effectiveness drops once the file holds a large number of links (as a rough guideline, more than 50). The LLM must scan the entire file, so every additional link dilutes the attention each resource receives. The exact threshold depends on the model and its context window.

Optimal structure:

  • 3-6 sections
  • 5-15 links per section
  • 20-50 links total

For comparison: an average documentation site is 500K+ tokens in HTML. llms.txt provides the same structural information in 1-4K tokens. Context savings: 99%+.
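
To see where a file stands against these numbers, it can help to count sections and links and estimate the token count. A rough sketch follows. It assumes sections are H2 headings (`## `), link lines start with `- [`, and one token is roughly 4 bytes of English text, which is a common rule of thumb rather than an exact tokenizer count. The name `llms_stats` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize an llms.txt file against the size guidelines.
# $1: path to the file.
llms_stats() {
  sections=$(grep -c '^## ' "$1")
  links=$(grep -c '^- \[' "$1")
  bytes=$(wc -c < "$1" | tr -d ' ')
  tokens=$((bytes / 4))   # ASSUMPTION: ~4 bytes per token
  echo "sections=$sections links=$links approx_tokens=$tokens"

  # Thresholds taken from the guidelines above.
  if [ "$sections" -gt 6 ]; then echo "WARN: more than 6 sections"; fi
  if [ "$links" -gt 50 ]; then echo "WARN: more than 50 links"; fi
  if [ "$tokens" -gt 8000 ]; then echo "WARN: over ~8K tokens"; fi
  return 0
}

# Usage (skipped when no llms.txt is present in the current directory):
if [ -f llms.txt ]; then llms_stats llms.txt; fi
```

For an accurate count, replace the byte heuristic with the tokenizer of the model you are targeting.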

❌ 12 sections with 2-3 links each: visual noise with no benefit.

✅ 4-6 sections with logical grouping.

❌ - [Auth](https://example.com/auth)
✅ - [Auth](https://example.com/auth): OAuth 2.0, JWT tokens, session management

Without a description, the LLM doesn’t know whether to load the page.

Links to non-existent pages are worse than having no llms.txt. The LLM “wastes” a request on a 404.

Don’t include content that shouldn’t be public in llms.txt:

  • /admin/, /internal/
  • Staging URLs
  • Pages behind authentication

A stale llms.txt points to deleted or moved pages. Regenerate the file whenever your site structure changes.

llms.txt is not a copy of sitemap.xml. sitemap.xml is a complete list of all URLs for search crawlers, while llms.txt is a curated set of key pages with descriptions. They are different tools for different purposes.

Do not include in llms.txt:

  • Internal/admin URLs
  • Staging/dev environments
  • API keys and credentials
  • Private documentation
  • URLs with authentication tokens

llms.txt is a public file. Everything you add to it is visible to the entire internet.
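
Because the file is public, a quick pattern scan before publishing can catch obvious leaks. The sketch below is one possible check, not an exhaustive one. The function name `find_private_urls` and the regular expression are assumptions, and should be adapted to the paths, hostnames, and query parameters your own site uses.

```shell
#!/bin/sh
# Hypothetical helper: print lines (with line numbers) that look private.
# Returns 0 when something suspicious is found, non-zero otherwise.
find_private_urls() {
  # ASSUMPTION: illustrative patterns only.
  #   /admin/ or /internal/ path segments
  #   staging or dev hostnames, and localhost
  #   token-like query parameters
  grep -nEi '/(admin|internal)/|//(staging|dev)[.-]|//localhost|[?&](token|access_token|api_key|apikey)=' "$1"
}

# Usage (skipped when no llms.txt is present in the current directory):
if [ -f llms.txt ] && find_private_urls llms.txt; then
  echo "Remove the entries above before publishing" >&2
fi
```

A clean result does not prove the file is safe. Pages behind authentication and private documentation usually have no telltale URL pattern and still need a manual review.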

Before publishing, verify:

  • H1 title — project name
  • Blockquote — brief description (1-2 sentences)
  • All links work (no 404s)
  • Every link has a description
  • No duplicate links
  • No internal/admin URLs
  • Optional section — only secondary resources
  • File size < 8K tokens

#!/bin/bash
# validate-llms-txt.sh: llms.txt validation
FILE="${1:-llms.txt}"

if [ ! -f "$FILE" ]; then
  echo "✗ File not found: $FILE"
  exit 1
fi

echo "Validating: $FILE"
echo "---"

# H1 check
if grep -q '^# ' "$FILE"; then
  echo "✓ H1 title found"
else
  echo "✗ Missing H1 title"
fi

# Blockquote check
if grep -q '^> ' "$FILE"; then
  echo "✓ Blockquote found"
else
  echo "⚠ No blockquote (recommended)"
fi

# Link count
LINKS=$(grep -c '^- \[' "$FILE")
echo "  Links: $LINKS"

# Links without descriptions. Every link line already contains ':' in its
# URL scheme, so test for the '): ' separator that precedes a description.
NO_DESC=$(grep '^- \[' "$FILE" | grep -cv '): ')
if [ "$NO_DESC" -gt 0 ]; then
  echo "⚠ $NO_DESC links without descriptions"
else
  echo "✓ All links have descriptions"
fi

# Check for broken URLs (anything other than 200 is reported, redirects included)
echo "Checking URLs..."
grep -oE 'https?://[^) ]+' "$FILE" | while read -r url; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 "$url")
  if [ "$STATUS" != "200" ]; then
    echo "  ✗ $STATUS $url"
  fi
done

echo "---"
echo "Done."
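
The checklist item on duplicate links is not covered by the script, but it reduces to finding URLs that occur more than once. A minimal sketch, assuming URLs appear inside Markdown link parentheses; `find_duplicate_urls` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print every URL that appears more than once.
# $1: path to the file.
find_duplicate_urls() {
  grep -oE 'https?://[^) ]+' "$1" | sort | uniq -d
}

# Usage (skipped when no llms.txt is present in the current directory):
if [ -f llms.txt ]; then
  dupes=$(find_duplicate_urls llms.txt)
  if [ -n "$dupes" ]; then
    echo "Duplicate links:"
    echo "$dupes"
  fi
fi
```

The comparison is exact, so `https://example.com/docs` and `https://example.com/docs/` count as different URLs. Normalize trailing slashes first if your site treats them as the same page.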

Update llms.txt when:

  • A documentation page is added or removed
  • The site structure changes
  • URLs move (redirects)
  • A major product release ships

For frameworks with llms.txt plugins (Astro, MkDocs, Nuxt), the file is regenerated automatically on every build.

For other setups, add a CI/CD step:

.github/workflows/update-llms-txt.yml
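name: Update llms.txt
on:
  push:
    paths: ['docs/**']
permissions:
  contents: write  # lets the job push the regenerated file
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx llmstxt gen https://example.com/sitemap.xml > public/llms.txt
      - run: |
          # A commit needs an author identity on the runner
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add public/llms.txt
          git diff --staged --quiet || git commit -m "Update llms.txt"
          git push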

Periodically check links in llms.txt. Use the script from the checklist above or tools like linkchecker.