Methodology
How TxtFeed scores public llms.txt files. Six dimensions, transparent weights, refreshed every 24 hours. The short rubric reference lives at /standard; this page is the full explanation.
What we measure, and why
An llms.txt file is the public manifest a website publishes to tell AI agents (ChatGPT, Claude, Perplexity, Gemini, and the long tail of crawlers) which pages are canonical, which datasets are licensed for ingestion, and which sections should be prioritised when the site is cited in an LLM answer. The spec is young, the format is contested, and adoption varies wildly: some sites ship five-line manifests, some ship 30 KB ones, some ignore the freshness signal entirely. TxtFeed scores every public file we discover so creators, publishers, and developers can see who is doing this well — and copy what works.
The score is intentionally a single 0–100 number on one scale for every site, not a verdict or a grade. A high score means the file is close to the canonical form that maximises LLM-citation readiness; a low score means the file exists but leaves citation gravity on the table. Below 50, AI agents may treat the site's intent as ambiguous; above 75, the file is among the canonical examples worth studying.
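As a concrete illustration of that scale: assuming the published weights combine as a plain weighted sum of six per-dimension scores, the headline number could be assembled like this. The weights are the ones listed in the next section; the simple-sum aggregation is our reading, not TxtFeed's published code.

```ts
// Sketch: headline score as a weighted sum of six 0-100 dimension
// scores. Weights mirror the rubric below; the simple-sum aggregation
// is an assumption on our part.
type Dimension =
  | "specCompliance" | "crawlerCoverage" | "contentQuality"
  | "freshness" | "structure" | "citationReadiness";

const WEIGHTS: Record<Dimension, number> = {
  specCompliance: 0.20,
  crawlerCoverage: 0.20,
  contentQuality: 0.20,
  freshness: 0.15,
  structure: 0.15,
  citationReadiness: 0.10,
};

function overallScore(scores: Record<Dimension, number>): number {
  const total = (Object.keys(WEIGHTS) as Dimension[])
    .reduce((sum, d) => sum + WEIGHTS[d] * scores[d], 0);
  return Math.round(total); // e.g. all dimensions at 80 => 80
}
```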
The six dimensions
Spec Compliance
Weight: 20%. Does the file follow the published llms.txt spec? Are required sections present? Is the markdown syntax valid?
Measures:
- Presence of the required #-prefixed site name on line 1
- Presence of the > description block
- Valid section headers (## Documentation, ## API, etc.)
- Link list syntax compliance (- [title](url): description)
- UTF-8 encoded, no invisible characters or BOM
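Taken together, a minimal file that passes all five checks might look like this (titles and URLs are placeholders):

```markdown
# Example Docs

> Developer documentation for Example, a hosted widget API.

## Documentation

- [Quickstart](https://example.com/docs/quickstart): Install, authenticate, make a first request
- [Reference](https://example.com/docs/reference): Every endpoint, with request and response schemas

## API

- [OpenAPI spec](https://example.com/openapi.json): Machine-readable description of the API
```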
Crawler Coverage
Weight: 20%. Does the file address the major AI crawlers explicitly, and does the site's robots.txt remain consistent with the llms.txt intent?
Measures:
- Explicit mention of GPTBot / ClaudeBot / PerplexityBot / Googlebot-Extended
- Consistency with /robots.txt allow/deny rules
- Presence of /llms-full.txt or section-specific manifests
- ai.robots.txt directory listing (cross-reference)
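The consistency check captures the idea that /robots.txt should not contradict the manifest: if llms.txt invites ingestion of a section, the crawler rules should allow it. One consistent shape, using the crawler tokens named above and placeholder paths:

```txt
# Grant the major AI crawlers the same sections llms.txt points at
User-agent: GPTBot
Allow: /docs/
Disallow: /private/

User-agent: ClaudeBot
Allow: /docs/
Disallow: /private/

User-agent: PerplexityBot
Allow: /docs/
Disallow: /private/

User-agent: Googlebot-Extended
Allow: /docs/
Disallow: /private/
```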
Content Quality
Weight: 20%. Are the linked sections substantive and unique? Or is the manifest pointing at thin, scraped, or duplicate content?
Measures:
- Word count on linked /docs/, /guide/, /reference/ targets
- Schema.org markup presence on referenced pages
- Originality score (rough N-gram overlap with public sources)
- Author / Person schema presence (E-E-A-T signal)
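The originality measure is described only as rough N-gram overlap; a minimal sketch of that idea, with word trigrams and Jaccard similarity as our illustrative choices:

```ts
// Rough originality signal: Jaccard overlap of word trigrams between a
// candidate page and a reference text. High overlap suggests scraped or
// duplicated content. n = 3 and Jaccard are illustrative choices, not
// necessarily TxtFeed's.
function trigrams(text: string): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= words.length; i++) {
    grams.add(words.slice(i, i + 3).join(" "));
  }
  return grams;
}

function ngramOverlap(page: string, reference: string): number {
  const a = trigrams(page);
  const b = trigrams(reference);
  let shared = 0;
  for (const g of a) if (b.has(g)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union; // 0 = fully original, 1 = identical
}
```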
Freshness
Weight: 15%. When was the file last updated? When were the linked sections last modified? Does the file go stale, or is it kept current?
Measures:
- HTTP Last-Modified header on /llms.txt
- Linked-page dateModified consistency
- Distance from the operator's stated publication cadence
- Detection of stale entries (deleted target URLs)
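A sketch of the first and last checks: read the Last-Modified header on /llms.txt, then probe each linked target for deletions. Error handling and timeouts are omitted, and the status-code treatment is our simplification.

```ts
// Freshness probes: Last-Modified on the manifest itself, plus HEAD
// requests to linked targets. A 404 or 410 marks a stale entry.
async function freshnessSignals(origin: string, linkedUrls: string[]) {
  const res = await fetch(`${origin}/llms.txt`);
  const lastModified = res.headers.get("last-modified"); // null if absent

  const staleEntries: string[] = [];
  for (const url of linkedUrls) {
    const probe = await fetch(url, { method: "HEAD" });
    if (probe.status === 404 || probe.status === 410) staleEntries.push(url);
  }
  return { lastModified, staleEntries };
}
```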
Structure
Weight: 15%. Is the manifest navigable? Are sections logically grouped? Are licence and contact directives present where needed?
Measures:
- Section header count and depth
- Presence of ## License directive
- Presence of contact-point directive (mailto: or URL)
- Internal link consistency (no orphans)
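The licence and contact checks look for blocks like the fragment below. The ## License header is the directive named above; the ## Contact header and the wording are placeholders of our own, since the check only requires a mailto: or URL contact point somewhere in the file.

```markdown
## License

- [Content licence](https://example.com/license): Documentation is CC BY 4.0; code samples are MIT

## Contact

- [Maintainers](mailto:docs@example.com): Corrections, licensing questions, takedown requests
```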
Citation Readiness
Weight: 10%. If an LLM cites this site tomorrow, will the cited content be (a) accessible, (b) attributable to a real Organization or Person, and (c) durable enough that the citation will still resolve in six months?
Measures:
- Organization schema + sameAs entries
- Stable URLs (no /[id]-based slugs likely to rotate)
- Wikipedia / Wikidata cross-references where applicable
- Documented update cadence
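The Organization and cross-reference checks look for schema.org markup on the referenced pages, along these lines (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example",
  "url": "https://example.com",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example",
    "https://www.wikidata.org/wiki/Q123456"
  ]
}
```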
Refresh cadence
Every domain in the directory is re-fetched every 24 hours. The score recomputes automatically when any of the underlying signals change: a new /llms.txt upload, a robots.txt edit, an updated Last-Modified header on a linked page, or a structural change to the file itself. Score history is preserved alongside the per-domain page's fetchedAt timestamp, so creators can see when an improvement landed.
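Mechanically, the recompute trigger can be as simple as hashing each fetch and comparing it with the previous one. A sketch under that assumption; a fuller version would feed robots.txt and linked-page headers into the same comparison, per the paragraph above.

```ts
import { createHash } from "node:crypto";

// Re-fetch a domain's manifest and decide whether to rescore. A content
// hash catches any edit to /llms.txt; fetchedAt is the timestamp the
// per-domain page surfaces.
async function refresh(domain: string, previousHash: string | null) {
  const body = await (await fetch(`https://${domain}/llms.txt`)).text();
  const hash = createHash("sha256").update(body).digest("hex");
  return {
    fetchedAt: new Date().toISOString(),
    changed: hash !== previousHash, // true => recompute all six dimensions
    hash,
  };
}
```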
What is not measured
- Domain authority. A 92 on a small indie site means the same as a 92 on a major SaaS: both are doing the same thing well, and the size of the audience is orthogonal.
- Aesthetics of the site rendering the file. Sites with minimal visual design can ship an excellent llms.txt.
- Quality of the site's actual product or service. We score the manifest, not the company behind it.
- AI-bot traffic volume to the site. We measure the readiness of the citation surface, not the conversion of that readiness into citations.
Source & transparency
The rubric is published in full and refined publicly. Score regressions are flagged with a per-dimension delta on each domain's page. If you believe a score is wrong, email us at [email protected] with the URL and the dimension you're contesting; we walk through the per-page evidence and either correct the score or explain the reasoning. The short reference card lives at /standard; the editorial perspective on individual dimensions lives in the blog.