Best Practices
Best practices for creating llms.txt — effective link descriptions, optimal file size, common mistakes, security guidelines, validation checklist and CI/CD
Effective Descriptions
The link description is context for the LLM. It determines whether the model will load the page.
❌ Bad:
- [Docs](https://example.com/docs): Documentation
- [API](https://example.com/api): API
- [Guide](https://example.com/guide): Guide

✅ Good:
- [Quick Start](https://example.com/docs/start): Installation, configuration, and first run in 5 minutes
- [REST API](https://example.com/api/rest): Authentication, endpoints, request and response examples
- [Migration Guide](https://example.com/docs/migrate): Upgrading from v2 to v3, breaking changes

Rules for descriptions:
- Specificity — what exactly will the LLM find on the page
- Keywords — terms the LLM will search for
- Uniqueness — each description differs from others
- Length — 10-20 words, enough to understand context
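
Some of these rules can be checked mechanically. The following is a minimal sketch, assuming every link uses the `- [Title](URL): description` form shown in the examples above. The function name `lint_descriptions` and the default minimum of 4 words are illustrative assumptions, not part of any llms.txt tooling.

```shell
#!/bin/bash
# lint_descriptions FILE [MIN_WORDS]
# Hypothetical helper: reports link descriptions that are shorter than
# MIN_WORDS words or that repeat an earlier description verbatim.
lint_descriptions() {
  local file="$1" min_words="${2:-4}"
  awk -v min="$min_words" '
    /^- \[/ {
      # The description is everything after the first "): " separator.
      idx = index($0, "): ")
      if (idx == 0) next          # links with no description are a separate check
      desc = substr($0, idx + 3)
      n = split(desc, words)      # default split: runs of whitespace
      if (n < min) printf "short (%d words): %s\n", n, $0
      if (seen[desc]++) printf "duplicate description: %s\n", $0
    }
  ' "$file"
}
```

Running `lint_descriptions llms.txt 10` would apply the lower bound of the 10-20 word guideline. Specificity and keyword quality still need a human review.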
Optimal Size
| File | Recommended Size | Maximum |
|---|---|---|
| llms.txt | 1-4K tokens | ~8K tokens |
| llms-full.txt | 10-50K tokens | ~100K tokens |
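
Exact token counts depend on the tokenizer of the target model. As a rough sketch, assuming about 4 characters per token (a common heuristic for English text, not an exact rule), the size can be estimated from the byte count. The helper names `estimate_tokens` and `check_size` are hypothetical.

```shell
#!/bin/bash
# estimate_tokens FILE
# Rough estimate: bytes / 4, rounded up. The 4-chars-per-token ratio is an
# assumption; use the target model's tokenizer for an exact count.
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 3) / 4 ))
}

# check_size FILE [LIMIT]
# Returns 1 when the estimate exceeds LIMIT (default 8000, the llms.txt maximum).
check_size() {
  local tokens
  tokens=$(estimate_tokens "$1")
  if [ "$tokens" -gt "${2:-8000}" ]; then
    echo "over limit: ~$tokens tokens"
    return 1
  fi
  echo "ok: ~$tokens tokens"
}
```

For llms-full.txt, the same check could be called as `check_size llms-full.txt 100000`.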
Diminishing Returns
With a large number of links (as a rough guideline, more than 50), effectiveness drops. The LLM must scan the entire file, so every additional link dilutes the attention each resource gets. The exact threshold depends on the model and context window.
Optimal structure:
- 3-6 sections
- 5-15 links per section
- 20-50 links total
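
To compare a file against these ranges, sections and links can be counted with awk. This sketch assumes sections are written as `## ` headings and links as `- [` list items. The function name `count_structure` is illustrative.

```shell
#!/bin/bash
# count_structure FILE
# Prints "<section title>: <n> links" for each "## " section, then totals.
count_structure() {
  awk '
    /^## / {
      if (title != "") printf "%s: %d links\n", title, n
      title = substr($0, 4); n = 0; sections++
      next
    }
    /^- \[/ { n++; total++ }
    END {
      if (title != "") printf "%s: %d links\n", title, n
      printf "sections=%d links=%d\n", sections, total
    }
  ' "$1"
}
```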
llms.txt vs Full HTML
For comparison: an average documentation site is 500K+ tokens in HTML. llms.txt provides the same structural information in 1-4K tokens. Context savings: 99%+.
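
The 99%+ figure follows from the two sizes quoted above. A small sketch of the arithmetic, with the hypothetical helper `context_savings` taking the summary size and the full size in tokens:

```shell
#!/bin/bash
# context_savings SUMMARY_TOKENS FULL_TOKENS
# Prints the share of context saved, as a percentage with one decimal.
context_savings() {
  awk -v s="$1" -v f="$2" 'BEGIN { printf "%.1f%%\n", (1 - s / f) * 100 }'
}

context_savings 4000 500000   # upper end of the recommended llms.txt size
context_savings 1000 500000   # lower end
```

Even at the upper end of the recommended size, the file takes under 1% of the tokens the HTML would.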
Common Mistakes
1. Too Many Sections
❌ 12 sections with 2-3 links each: visual noise with no benefit.
✅ 4-6 sections with logical grouping.
2. Missing Descriptions
❌ - [Auth](https://example.com/auth)
✅ - [Auth](https://example.com/auth): OAuth 2.0, JWT tokens, session management

Without a description, the LLM doesn’t know whether to load the page.
3. Broken URLs
Links to non-existent pages are worse than having no llms.txt. The LLM “wastes” a request on a 404.
4. Internal/Admin Pages
Don’t include content that shouldn’t be public in llms.txt:
- /admin/, /internal/
- Staging URLs
- Pages behind authentication
5. Outdated Content
The file points to deleted or moved pages. Regenerate llms.txt whenever your site structure changes.
6. Confusion with sitemap.xml
sitemap.xml is a complete list of all URLs for search crawlers. llms.txt is a curated set of key pages with descriptions. These are different tools for different purposes.
Security
Do not include in llms.txt:
- Internal/admin URLs
- Staging/dev environments
- API keys and credentials
- Private documentation
- URLs with authentication tokens
llms.txt is a public file. Everything you add to it is visible to the entire internet.
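
A simple pattern scan can catch the most obvious leaks before publishing. This is a sketch only: the pattern list (path segments `admin` and `internal`, hosts starting with `staging.` or `dev.`, and a few common credential query parameters) is an assumption about typical URL layouts, and the function name `find_sensitive_urls` is hypothetical. Adapt the patterns to your own site.

```shell
#!/bin/bash
# find_sensitive_urls FILE
# Prints numbered lines whose URLs look internal or carry credentials.
# Exit status follows grep: 0 if something suspicious was found, 1 if not.
find_sensitive_urls() {
  grep -nEi '/(admin|internal)(/|\)|$)|https?://(staging|dev)\.|[?&](token|access_token|api_key|apikey|key)=' "$1"
}
```

A pre-publish step might use it as `if find_sensitive_urls llms.txt; then echo "Remove these entries"; exit 1; fi`. A clean result does not prove the file is safe; private documentation on an ordinary-looking URL will not match any pattern.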
Validation Checklist
Before publishing, verify:
- H1 title — project name
- Blockquote — brief description (1-2 sentences)
- All links work (no 404s)
- Every link has a description
- No duplicate links
- No internal/admin URLs
- Optional section — only secondary resources
- File size < 8K tokens
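
The "No duplicate links" item can be checked with standard tools. A minimal sketch, assuming links are written as `[Title](URL)`. The function name `find_duplicate_urls` is illustrative.

```shell
#!/bin/bash
# find_duplicate_urls FILE
# Prints each URL that appears in more than one Markdown link.
find_duplicate_urls() {
  grep -oE '\]\(https?://[^)]+\)' "$1" \
    | sed -E 's/^\]\(//; s/\)$//' \
    | sort \
    | uniq -d
}
```

Empty output means no URL is linked twice.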
Auto-Validation Script
#!/bin/bash
# validate-llms-txt.sh: llms.txt validation

FILE="${1:-llms.txt}"

echo "Validating: $FILE"
echo "---"

# H1 check
if grep -q '^# ' "$FILE"; then
  echo "✓ H1 title found"
else
  echo "✗ Missing H1 title"
fi

# Blockquote check
if grep -q '^> ' "$FILE"; then
  echo "✓ Blockquote found"
else
  echo "⚠ No blockquote (recommended)"
fi

# Link count
LINKS=$(grep -c '^- \[' "$FILE")
echo "  Links: $LINKS"

# Links without descriptions: a described link has text after "): ".
# Matching a bare ":" would always succeed because of "https://".
NO_DESC=$(grep '^- \[' "$FILE" | grep -cv '): .')
if [ "$NO_DESC" -gt 0 ]; then
  echo "⚠ $NO_DESC links without descriptions"
else
  echo "✓ All links have descriptions"
fi

# Check for broken URLs (-E instead of -P keeps this portable beyond GNU grep).
# Redirects are reported too, since moved URLs should be updated.
echo "Checking URLs..."
grep -oE 'https?://[^) ]+' "$FILE" | while read -r url; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 "$url")
  if [ "$STATUS" != "200" ]; then
    echo "  ✗ $STATUS $url"
  fi
done

echo "---"
echo "Done."

Update Strategy
When to Regenerate
- Documentation page added/removed
- Site structure changed
- URLs moved (redirects)
- Major product release
CI/CD Auto-Generation
For frameworks with plugins (Astro, MkDocs, Nuxt), llms.txt is generated automatically on every build.
For others, add a CI/CD step:
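name: Update llms.txt
on:
  push:
    paths: ['docs/**']
# The job pushes a commit, so the token needs write access to the repository.
permissions:
  contents: write
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx llmstxt gen https://example.com/sitemap.xml > public/llms.txt
      - run: |
          # git refuses to commit without an identity on a fresh runner.
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add public/llms.txt
          git diff --staged --quiet || git commit -m "Update llms.txt"
          git push

Monitoring Broken Links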
Periodically check links in llms.txt. Use the script from the checklist above or tools like linkchecker.
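
For scheduled monitoring, it helps if the check exits with a non-zero status so the job fails visibly. The following is a sketch under that assumption. The function names `http_status` and `check_links` are hypothetical, and treating only HTTP 200 as healthy is a choice that also flags redirects.

```shell
#!/bin/bash
# http_status URL
# Prints the HTTP status code for URL ("000" when curl gets no response).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# check_links FILE
# Prints "<status> <url>" for every distinct URL that does not return 200,
# and returns 1 if there was at least one.
check_links() {
  local failures
  failures=$(grep -oE 'https?://[^) ]+' "$1" | sort -u | while read -r url; do
    status=$(http_status "$url")
    [ "$status" = "200" ] || echo "$status $url"
  done)
  if [ -n "$failures" ]; then
    echo "$failures"
    return 1
  fi
}
```

A cron entry or scheduled CI job could then run `check_links public/llms.txt` and alert on failure.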