Methodology
How TxtFeed scores public llms.txt files. Six dimensions, transparent weights, refreshed every 24 hours. The short rubric reference lives at /standard; this page is the full explanation.
What we measure, and why
An llms.txt file is the public manifest a website publishes to tell AI agents (ChatGPT, Claude, Perplexity, Gemini, and the long tail of crawlers) which pages are canonical, which datasets are licensed for ingestion, and which sections should be prioritised when the site is cited in an LLM answer. The spec is young, the format is contested, and adoption varies wildly: some sites ship five-line manifests, some ship 30 KB ones, some ignore the freshness signal entirely. TxtFeed scores every public file we discover so creators, publishers, and developers can see who is doing this well — and copy what works.
The score is intentionally a single 0–100 number on one scale for every site, not a verdict or a grade. A high score means the file is close to the canonical form that maximises LLM-citation readiness; a low score means the file exists but leaves citation gravity on the table. Below 50, AI agents may treat the site's intent as ambiguous; above 75, the file is among the canonical examples worth studying.
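As a concrete illustration of that scale: assuming the published weights combine as a plain weighted sum of six per-dimension scores, the headline number could be assembled like this. The weights are the ones listed in the next section; the simple-sum aggregation is our reading, not TxtFeed's published code.

```ts
// Sketch: headline score as a weighted sum of six 0-100 dimension
// scores. Weights mirror the rubric below; the simple-sum aggregation
// is an assumption on our part.
type Dimension =
  | "specCompliance" | "crawlerCoverage" | "contentQuality"
  | "freshness" | "structure" | "citationReadiness";

const WEIGHTS: Record<Dimension, number> = {
  specCompliance: 0.20,
  crawlerCoverage: 0.20,
  contentQuality: 0.20,
  freshness: 0.15,
  structure: 0.15,
  citationReadiness: 0.10,
};

function overallScore(scores: Record<Dimension, number>): number {
  const total = (Object.keys(WEIGHTS) as Dimension[])
    .reduce((sum, d) => sum + WEIGHTS[d] * scores[d], 0);
  return Math.round(total); // e.g. all dimensions at 80 => 80
}
```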
The six dimensions
Spec Compliance
Weight: 20%. Does the file follow the published llms.txt spec? Are required sections present? Is the markdown syntax valid?
Measures:
- Presence of the required #-prefixed site name on line 1
- Presence of the > description block
- Valid section headers (## Documentation, ## API, etc.)
- Link list syntax compliance (- [title](url): description)
- UTF-8 encoded, no invisible characters or BOM
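Taken together, a minimal file that passes all five checks might look like this (titles and URLs are placeholders):

```markdown
# Example Docs

> Developer documentation for Example, a hosted widget API.

## Documentation

- [Quickstart](https://example.com/docs/quickstart): Install, authenticate, make a first request
- [Reference](https://example.com/docs/reference): Every endpoint, with request and response schemas

## API

- [OpenAPI spec](https://example.com/openapi.json): Machine-readable description of the API
```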
Crawler Coverage
Weight: 20%. Does the file address the major AI crawlers explicitly, and does the site's robots.txt remain consistent with the llms.txt intent?
Measures:
- Explicit mention of GPTBot / ClaudeBot / PerplexityBot / Googlebot-Extended
- Consistency with /robots.txt allow/deny rules
- Presence of /llms-full.txt or section-specific manifests
- ai.robots.txt directory listing (cross-reference)
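The consistency check captures the idea that /robots.txt should not contradict the manifest: if llms.txt invites ingestion of a section, the crawler rules should allow it. One consistent shape, using the crawler tokens named above and placeholder paths:

```txt
# Grant the major AI crawlers the same sections llms.txt points at
User-agent: GPTBot
Allow: /docs/
Disallow: /private/

User-agent: ClaudeBot
Allow: /docs/
Disallow: /private/

User-agent: PerplexityBot
Allow: /docs/
Disallow: /private/

User-agent: Googlebot-Extended
Allow: /docs/
Disallow: /private/
```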
Content Quality
Weight: 20%. Are the linked sections substantive and unique? Or is the manifest pointing at thin, scraped, or duplicate content?
Measures:
- Word count on linked /docs/, /guide/, /reference/ targets
- Schema.org markup presence on referenced pages
- Originality score (rough N-gram overlap with public sources)
- Author / Person schema presence (E-E-A-T signal)
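The originality measure is described only as rough N-gram overlap; a minimal sketch of that idea, with word trigrams and Jaccard similarity as our illustrative choices:

```ts
// Rough originality signal: Jaccard overlap of word trigrams between a
// candidate page and a reference text. High overlap suggests scraped or
// duplicated content. n = 3 and Jaccard are illustrative choices, not
// necessarily TxtFeed's.
function trigrams(text: string): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= words.length; i++) {
    grams.add(words.slice(i, i + 3).join(" "));
  }
  return grams;
}

function ngramOverlap(page: string, reference: string): number {
  const a = trigrams(page);
  const b = trigrams(reference);
  let shared = 0;
  for (const g of a) if (b.has(g)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union; // 0 = fully original, 1 = identical
}
```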
Freshness
Weight: 15%. When was the file last updated? When were the linked sections last modified? Does the file go stale, or is it kept current?
Measures:
- HTTP Last-Modified header on /llms.txt
- Linked-page dateModified consistency
- Distance from the operator's stated publication cadence
- Detection of stale entries (deleted target URLs)
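A sketch of the first and last checks: read the Last-Modified header on /llms.txt, then probe each linked target for deletions. Error handling and timeouts are omitted, and the status-code treatment is our simplification.

```ts
// Freshness probes: Last-Modified on the manifest itself, plus HEAD
// requests to linked targets. A 404 or 410 marks a stale entry.
async function freshnessSignals(origin: string, linkedUrls: string[]) {
  const res = await fetch(`${origin}/llms.txt`);
  const lastModified = res.headers.get("last-modified"); // null if absent

  const staleEntries: string[] = [];
  for (const url of linkedUrls) {
    const probe = await fetch(url, { method: "HEAD" });
    if (probe.status === 404 || probe.status === 410) staleEntries.push(url);
  }
  return { lastModified, staleEntries };
}
```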
Structure
Weight: 15%. Is the manifest navigable? Are sections logically grouped? Are licence and contact directives present where needed?
Measures:
- Section header count and depth
- Presence of ## License directive
- Presence of contact-point directive (mailto: or URL)
- Internal link consistency (no orphans)
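The licence and contact checks look for blocks like the fragment below. The ## License header is the directive named above; the ## Contact header and the wording are placeholders of our own, since the check only requires a mailto: or URL contact point somewhere in the file.

```markdown
## License

- [Content licence](https://example.com/license): Documentation is CC BY 4.0; code samples are MIT

## Contact

- [Maintainers](mailto:docs@example.com): Corrections, licensing questions, takedown requests
```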
Citation Readiness
Weight: 10%. If an LLM cites this site tomorrow, will the cited content be (a) accessible, (b) attributable to a real Organization or Person, and (c) durable enough that the citation will still resolve in six months?
Measures:
- Organization schema + sameAs entries
- Stable URLs (no /[id]-based slugs likely to rotate)
- Wikipedia / Wikidata cross-references where applicable
- Documented update cadence
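The Organization and cross-reference checks look for schema.org markup on the referenced pages, along these lines (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example",
  "url": "https://example.com",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example",
    "https://www.wikidata.org/wiki/Q123456"
  ]
}
```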
Refresh cadence
Every domain in the directory is re-fetched every 24 hours. The score recomputes automatically when any of the underlying signals change: a new /llms.txt upload, a robots.txt edit, an updated Last-Modified header on a linked page, or a structural change to the file itself. Score history is preserved alongside the per-domain page's fetchedAt timestamp, so creators can see when an improvement landed.
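Mechanically, the recompute trigger can be as simple as hashing each fetch and comparing it with the previous one. A sketch under that assumption; a fuller version would feed robots.txt and linked-page headers into the same comparison, per the paragraph above.

```ts
import { createHash } from "node:crypto";

// Re-fetch a domain's manifest and decide whether to rescore. A content
// hash catches any edit to /llms.txt; fetchedAt is the timestamp the
// per-domain page surfaces.
async function refresh(domain: string, previousHash: string | null) {
  const body = await (await fetch(`https://${domain}/llms.txt`)).text();
  const hash = createHash("sha256").update(body).digest("hex");
  return {
    fetchedAt: new Date().toISOString(),
    changed: hash !== previousHash, // true => recompute all six dimensions
    hash,
  };
}
```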
What is not measured
- Domain authority. A 92 on a small indie site means the same as a 92 on a major SaaS: both are doing the same thing well, and the size of the audience is orthogonal.
- Aesthetics of the site rendering the file. Sites with minimal visual design can ship an excellent llms.txt.
- Quality of the site's actual product or service. We score the manifest, not the company behind it.
- AI-bot traffic volume to the site. We measure the readiness of the citation surface, not the conversion of that readiness into citations.
Source & transparency
The rubric is published in full and refined publicly. Score regressions are flagged with a per-dimension delta on each domain's page. If you believe a score is wrong, email us at [email protected] with the URL and the dimension you're contesting; we walk through the per-page evidence and either correct the score or explain the reasoning. The short reference card lives at /standard; the editorial perspective on individual dimensions lives in the blog.