llms.txt FAQ

Everything you need to know about llms.txt: what it is, how to write one, which AI systems respect it, how it differs from robots.txt, and how to use it for brand visibility.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: April 19, 2026

llms.txt is the emerging convention for telling AI systems how to interact with your content. Adoption is rising rapidly, but the spec, the tooling, and the expectations are all still stabilizing. The questions below cover what brands should know before adopting, how to write a good one, and how it relates to the rest of the AI access stack.

Basics

Q: What is llms.txt?

llms.txt is a plain text file served at your domain root (yoursite.com/llms.txt) that provides AI systems with a curated map of your most important content and how you want it to be used. Think of it as robots.txt for AI, with richer semantics and an emphasis on directing AI toward your best content rather than only blocking access.

Q: Who created the llms.txt spec?

The original proposal came from Jeremy Howard in 2024. It has since been adopted and extended by multiple community contributors and AI platforms. It is not a formal IETF standard yet, but it is the strongest community convention in circulation.

Q: Is llms.txt the same as robots.txt?

No. robots.txt gates access at the URL level for any crawler. llms.txt is AI-specific, adds prose context about your content, and can recommend preferred pages in priority order. The two coexist: use robots.txt for access control and llms.txt for AI-specific guidance.

Q: Which AI systems respect llms.txt?

Anthropic and Perplexity have publicly indicated that they read llms.txt. OpenAI has not formally confirmed support, though it has hinted at awareness. Other platforms are inconsistent. Expect respect to broaden through 2026 as the convention matures.

Format and Structure

Q: What does an llms.txt file look like?

It is a Markdown-style plain text file with an H1 for the site name, an optional blockquote summary, and H2 sections listing preferred pages with short descriptions. Advanced variants can include Allow/Disallow directives and metadata fields.
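Here is a minimal illustrative example of that structure. The site name, URLs, and descriptions are placeholders, not part of any spec:

```text
# Example Corp

> Example Corp builds widget-automation software for mid-market
> manufacturers. Founded in 2019, headquartered in Austin, TX.

## Core Pages

- [Product Overview](https://example.com/product): What the platform does and who it serves
- [Pricing](https://example.com/pricing): Current plans and tiers

## Documentation

- [Getting Started](https://example.com/docs/start): Installation and first steps
```

The H1, blockquote summary, and H2 link sections are the core of the format; Allow/Disallow directives and metadata fields layer on top of this skeleton.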

Q: Do I need llms.txt if I already have a sitemap.xml?

They serve different purposes. Sitemap.xml lists every indexable URL for crawlers. llms.txt is curated, often small (10 to 30 entries), and designed to point AI at your best content with context. Use both.

Q: How long should my llms.txt be?

Typically 1 to 5 kilobytes. Most useful llms.txt files contain between 10 and 40 curated links with short descriptions. Files larger than 20 kilobytes lose their curation value and look like a sitemap.

Q: Should I include every page in llms.txt?

No. llms.txt should be curated. Include your highest-value, most-citable pages: canonical product pages, pricing, core documentation, definitive glossary entries, top case studies. Skip marketing fluff, blog archive lists, and anything that would not be a useful citation.

Adoption and Respect

Q: How do I know if AI systems are actually reading my llms.txt?

Server logs show which user-agents requested /llms.txt. Expect requests from ClaudeBot, PerplexityBot, GPTBot, and a growing list. For a non-log-based check, Presenc AI audits whether each major AI platform respects your llms.txt directives in its actual responses.
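As a sketch of the log-based check, the snippet below scans access-log lines in the common combined log format and counts /llms.txt requests per bot. The sample log lines and the bot-name list are illustrative assumptions, not real traffic:

```python
import re
from collections import Counter

# Sample access-log lines in combined log format (illustrative data).
LOG_LINES = [
    '1.2.3.4 - - [19/Apr/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 1840 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '5.6.7.8 - - [19/Apr/2026:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '9.8.7.6 - - [19/Apr/2026:11:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 1840 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
]

LINE_RE = re.compile(r'"GET (?P<path>\S+) [^"]*" \d+ \d+ "[^"]*" "(?P<ua>[^"]*)"')

def llms_txt_hits(lines):
    """Count requests for /llms.txt, grouped by recognizable bot name."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("path") == "/llms.txt":
            ua = m.group("ua")
            # Collapse the full user-agent string to a known bot name if one appears.
            bot = next((b for b in ("ClaudeBot", "PerplexityBot", "GPTBot") if b in ua), ua)
            hits[bot] += 1
    return hits

print(llms_txt_hits(LOG_LINES))  # Counter({'ClaudeBot': 1, 'PerplexityBot': 1})
```

In practice you would stream real log files through the same function instead of a hardcoded list, and extend the bot-name tuple as new AI crawlers appear.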

Q: What is typical llms.txt adoption by sector?

Technology and blockchain lead with roughly 20 to 30 percent adoption. Media, legal, and insurance sit under 10 percent. Adoption is rising quickly across all sectors in 2026.

Q: Does having llms.txt improve my AI citation rate?

Indirectly, yes. A well-structured llms.txt helps AI systems find your canonical pages faster and cite them instead of peripheral pages. The causal uplift is modest but real, especially for sites with sprawling navigation or weak internal linking.

Q: Can llms.txt hurt my visibility if written poorly?

Yes. An llms.txt full of Disallow directives, pointing only at marketing pages, or contradicting robots.txt can signal hostile or confused intent and suppress retrieval. A clear, permissive llms.txt is better than a bad one.

Strategy

Q: Should I publish llms.txt if my industry has low adoption?

Yes, especially then. Low-adoption sectors reward early movers. Publishing a clean llms.txt before competitors signals quality to AI systems and establishes your brand as a canonical source.

Q: Can I use llms.txt to block AI training?

Partially. llms.txt supports directives that discourage training use of specific URLs, and some platforms honor them. For hard blocking, combine with robots.txt blocks on training-focused crawlers (GPTBot, ClaudeBot, CCBot).
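The robots.txt side of that hard block might look like the fragment below. The user-agent strings shown are the names these crawlers publish; confirm them against each vendor's current documentation before relying on them:

```text
# robots.txt — block training-focused crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```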

Q: How does llms.txt relate to AI licensing?

It does not grant or deny licenses. It expresses preferences. Actual licensing agreements with AI platforms are handled through direct deals or via intermediaries. llms.txt is a complementary signal layer.

Q: Should I include contact info in llms.txt?

Yes, as an optional field. A contact email or licensing URL in your llms.txt metadata gives AI platforms a channel to reach you for rate limits, licensing, or data correction.

Tactics and Common Mistakes

Q: What pages should I list first in llms.txt?

Canonical pages that AI should cite when asked about your brand: your homepage, core product page, pricing, main documentation index, and a definitive about page. Follow with high-value content clusters like comparison pages and research reports.

Q: Should I include a summary blockquote?

Yes. A single blockquote that describes your brand in one short paragraph gives AI systems a canonical description to quote. Write it carefully; it is some of the highest-leverage copy you will write.

Q: How often should I update llms.txt?

Monthly is a reasonable baseline. Update whenever you ship a major new page, retire an old one, or change core brand positioning. Treat it like a curated index, not a static file.

Q: Can llms.txt contain multiple languages?

Yes. Publish a single llms.txt with sections per language or separate files per locale. Match whatever URL structure your multilingual content uses.
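One possible single-file layout uses a per-language H2 section. Everything here is a placeholder sketch, not a prescribed convention:

```text
# Example Corp

## Core Pages (English)

- [Product Overview](https://example.com/en/product): What the platform does

## Kernseiten (Deutsch)

- [Produktübersicht](https://example.com/de/product): German-language product overview
```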

Q: What is the most common llms.txt mistake?

Copying a template and never customizing it. The power of llms.txt is in the curated description and the specific URL selection. A generic llms.txt is worse than none because it signals low investment.

Q: Should I list blog posts in llms.txt?

Only your most evergreen, citation-worthy posts. Skip timely news posts. llms.txt is for canonical content, not feed content.

Q: Does llms.txt help with AI hallucinations about my brand?

It helps by pointing AI at authoritative canonical pages. It does not retroactively fix hallucinations in already-trained models, but it reduces the chance of fresh hallucinations in retrieval-based responses.

Q: Is llms.txt enough on its own?

No. It is one layer in the AI access stack, alongside robots.txt, schema.org structured data, sitemap.xml, and strong internal linking. llms.txt amplifies the rest of the stack rather than replacing any of it.

Frequently Asked Questions

Q: Is llms.txt an official standard?

It is a community convention with strong adoption momentum and partial support from major AI platforms. It is not formally standardized by a body like IETF. Treat it as durable and influential even while the spec stabilizes.

Q: Do I need both robots.txt and llms.txt?

Yes. They serve different purposes. robots.txt controls access. llms.txt provides AI-specific curation and context that robots.txt cannot express. A good access strategy uses both.

Q: Do AI systems cite llms.txt directly?

Rarely. llms.txt itself is not typically cited. Instead, AI systems use it as a guide to decide which of your pages to crawl, quote, and cite.

Q: What is the fastest way to get started?

Start with your site name as the H1, a one-paragraph brand summary as a blockquote, and three H2 sections: "Core Pages," "Documentation," and "Research." Under each, list 5 to 10 URLs with one-line descriptions. Ship that, then refine.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.