File Format
Complete llms.txt format specification — file structure, required and recommended elements, link format, Optional section, Python parsing example
Overview
The llms.txt file uses Markdown format with a defined structure. It is placed at the site root at /llms.txt. The format was proposed by Jeremy Howard (Answer.AI) in September 2024.
Key advantage: an llms.txt file typically contains 1-4K tokens and still gives a complete description of the site structure. For comparison, the HTML version of the same site can run to 500K+ tokens, so the context savings exceed 99%.
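The savings figure follows from simple arithmetic. A minimal sketch, using the upper bound quoted above (4K tokens) against the 500K-token HTML figure; both numbers are the estimates from this page rather than measurements, and `context_savings` is a hypothetical helper name:

```python
def context_savings(llms_txt_tokens: int, html_tokens: int) -> float:
    """Return the fraction of context saved by reading llms.txt instead of HTML."""
    return 1 - llms_txt_tokens / html_tokens

# Estimates quoted on this page: up to 4K tokens for llms.txt, 500K for HTML.
savings = context_savings(4_000, 500_000)
print(f"{savings:.1%}")  # 99.2%
```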
File Structure
    # H1 Title (required)

    > Blockquote with brief description (recommended)

    Additional description text (optional)

    ## H2 Section (optional)

    - [Link Name](URL): Resource description
    - [Another Link](URL): What this resource contains

    ## Optional

    - [Additional Resource](URL): Can be skipped when context is limited

Required Elements
H1 Title
The only required element is a first-level heading with the project name:

    # FastHTML

or

    # API Documentation v2

Recommended Elements
Blockquote
Brief project description in 1-2 sentences:

    > FastHTML is a Python library for building fast web applications
    > using HTMX and modern web standards.

Descriptive Text
Additional information after the blockquote:

    Key features:
    - No JavaScript frameworks required
    - Server-Side Rendering support
    - FastAPI integration

Link Sections
H2 Section Format
Each section starts with an H2 heading:

    ## Documentation

    - [Quick Start](https://example.com/quickstart): Start here
    - [Installation](https://example.com/install): System requirements and installation

Link Format
Each link consists of:

    - [Name](URL): Description

| Element | Required | Example |
|---|---|---|
| Name | Required | [API Reference] |
| URL | Required | (https://docs.example.com/api) |
| Description | Recommended | : Complete endpoint documentation |
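The three elements in the table can be assembled into a link line programmatically. The following is a minimal sketch, not part of the format itself; `format_link` is a hypothetical helper name, and it assumes the description (being only recommended) is simply left off when absent:

```python
from typing import Optional

def format_link(name: str, url: str, description: Optional[str] = None) -> str:
    """Build one llms.txt link line; the description part is omitted when empty."""
    line = f"- [{name}]({url})"
    if description:
        line += f": {description}"
    return line

print(format_link("API Reference", "https://docs.example.com/api",
                  "Complete endpoint documentation"))
print(format_link("Changelog", "https://example.com/changelog"))
```

The first call produces a full link line with a description; the second shows the shorter form with only the two required elements.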
Link Examples
    - [Getting Started](https://docs.example.com/start): Quick start in 5 minutes
    - [API Reference](https://docs.example.com/api): REST API documentation
    - [Examples](https://github.com/example/repo/examples): Code examples on GitHub

Special Optional Section
The ## Optional section has special meaning: the links listed under it are secondary resources that can be skipped when context is limited.

    ## Optional

    - [Changelog](https://example.com/changelog): Version history
    - [Contributing](https://example.com/contributing): Contributor guide
    - [Advanced Topics](https://example.com/advanced): In-depth topics

File Location
Primary Location

    https://example.com/llms.txt

Alternative Paths
For site subsections, you can create separate files:

    https://example.com/docs/llms.txt
    https://example.com/api/llms.txt

Related Files
Markdown Versions of Pages
It’s recommended to provide .md versions of HTML pages at the same URL with .md appended; URLs without a file name get index.html.md instead. Markdown is clean content without navigation, ads, and scripts, which is significantly more efficient for LLMs:

    https://example.com/page.html → https://example.com/page.html.md
    https://example.com/docs/ → https://example.com/docs/index.html.md

Extended Formats
Some projects generate additional files:

| File | Purpose |
|---|---|
| llms.txt | Base file with links |
| llms-full.txt | Full content of all pages |
| llms-small.txt | Minimal version |
| llms-ctx.txt | Expanded context that leaves out the Optional section's links (for prompt insertion) |
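The URL convention for Markdown versions of pages, described earlier in this section, can be sketched in a few lines. This is an illustration under stated assumptions: `markdown_url` is a hypothetical helper name, and only URLs ending in a slash (or with an empty path) are treated as having no file name.

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_url(page_url: str) -> str:
    """Map a page URL to the URL of its Markdown version."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: a trailing slash means there is no file name,
        # so the Markdown version lives at index.html.md.
        path += "index.html.md"
    else:
        # Otherwise .md is appended to the existing file name.
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(markdown_url("https://example.com/page.html"))
print(markdown_url("https://example.com/docs/"))
```

The two calls reproduce the mappings shown under Markdown Versions of Pages.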
Parsing
The file can be parsed with simple regular expressions (under 30 lines of code):

    import re

    def parse_llms_txt(content: str) -> dict:
        """Parse llms.txt content into a title, a description, and link sections."""
        result = {'title': '', 'description': '', 'sections': {}}

        # Extract H1: the first line that starts with "# "
        h1_match = re.search(r'^# (.+)$', content, re.MULTILINE)
        if h1_match:
            result['title'] = h1_match.group(1).strip()

        # Extract blockquote: only its first line is captured
        bq_match = re.search(r'^> (.+)$', content, re.MULTILINE)
        if bq_match:
            result['description'] = bq_match.group(1).strip()

        # Extract sections and links; links before any H2 go under 'default'
        current_section = 'default'
        for line in content.splitlines():
            if line.startswith('## '):
                current_section = line[3:].strip()
                # setdefault keeps earlier links if a section name repeats
                result['sections'].setdefault(current_section, [])
            elif line.startswith('- ['):
                match = re.match(r'- \[(.+?)\]\((.+?)\)(?:: (.+))?', line)
                if match:
                    result['sections'].setdefault(current_section, []).append({
                        'title': match.group(1),
                        'url': match.group(2),
                        'description': match.group(3) or ''
                    })

        return result

Validation
Check your llms.txt:
- Has H1 title
- All links work
- Descriptions are informative for an LLM
- No duplicate links
- Optional section contains only secondary resources
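Some of the checks above can be automated offline. The sketch below is one possible approach, not the validation script referenced on this page: `validate_llms_txt` is a hypothetical name, duplicates are compared by URL, and a missing description is used as a rough proxy for an uninformative one. Checking that links work needs network access, and judging whether the Optional section holds only secondary resources needs a human, so both are left out.

```python
import re
from typing import List

# Link shape from the Link Format section: - [Name](URL): Description
LINK_RE = re.compile(r"^- \[(.+?)\]\((.+?)\)(?:: (.+))?$")

def validate_llms_txt(content: str) -> List[str]:
    """Return a list of problems; an empty list means the offline checks passed."""
    problems = []
    lines = content.splitlines()

    # Checklist item: has H1 title.
    if not any(line.startswith("# ") for line in lines):
        problems.append("missing H1 title")

    # Checklist item: no duplicate links (compared by URL).
    seen_urls = set()
    for number, line in enumerate(lines, start=1):
        match = LINK_RE.match(line)
        if match is None:
            continue
        url = match.group(2)
        if url in seen_urls:
            problems.append(f"line {number}: duplicate link {url}")
        seen_urls.add(url)
        # A missing description cannot be informative, so flag it.
        if not match.group(3):
            problems.append(f"line {number}: link has no description")

    return problems
```

Calling `validate_llms_txt(open("llms.txt", encoding="utf-8").read())` on a local copy returns the list of problems, which can be printed or used to fail a build step.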
For a detailed checklist and an auto-validation script, see Best Practices.