Story Needle

How will bots see your content?

Your customers aren’t that into your website anymore. Most websites have noticed a drop in traffic as users query bots and bots supply answers. Bots generate few clicks to web pages, and the proportion of referral clicks seems to be falling.

Web publishers are aware of the existential threat they face. So far, they’ve tried to make themselves more lovable to bots. They scheme to get noticed by bots (GEO, or generative engine optimization), or they try to make their pages “friendlier” for bots (Google’s WebMCP is the latest example). The legacy thinking still frames the problem as one of visibility: getting noticed in a crowd.

Yet bots aren’t people, and the old psychology of wooing doesn’t apply. If bots need something, they will take it from your website, whether you invite them or not. In many cases, they will take content even if you don’t want them to.

The problem websites must solve now is how to ensure bots extract the right content from your site. If your organization cares about the accuracy and relevance of what bots provide, your existing HTML content, built for web browsers and human surfers, isn’t what bots need. JavaScript, the foundation of modern websites, is a liability for bots.

AI platforms are evolving quickly. They are pivoting away from indiscriminate web scraping for “training” and towards RAG, where they search first for information before generating answers. AI platforms have also embraced the Model Context Protocol (MCP) standard, which, when enabled, allows them to access enterprise content directly. Already, third-party MCP platforms such as Scite and Tollbit have emerged to connect content publishers with AI platforms.

Publishers will continue to publish webpages for human readers, but they need to ensure that AI platforms access the right content for bot users. The best practices for doing this are still emerging, and several initiatives are underway to define protocols and standards.

What’s becoming apparent is that MCP will play an important role in controlling bot access and content governance. The diagram below illustrates a potential content pipeline for a scholarly publisher. A similar pipeline might be adopted by a website publisher — but some additional steps are needed to transform HTML-centric content into bot-ready content.

Example pipeline. Source: Scholarly Kitchen

How are publishers getting ready? Consider Tollbit, which works with the Associated Press and other publishers to make their content ready for AI platforms.

The first task is to “clean” the web content to remove material that’s not relevant or canonical. This can be done through DOM filtering to exclude certain classes of content, such as navigation text, promotional assets, or customer comments.
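As a rough illustration, DOM filtering of this kind can be sketched with Python’s standard-library HTML parser. The excluded class names below are hypothetical placeholders, not any publisher’s actual configuration, and a production pipeline would handle many more edge cases (void elements, malformed markup, and so on).

```python
from html.parser import HTMLParser

# Hypothetical block list of CSS classes to exclude from bot-facing content.
EXCLUDED_CLASSES = {"nav", "promo", "comments"}

class ContentFilter(HTMLParser):
    """Re-emits HTML while skipping any subtree whose class is excluded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an excluded subtree

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or classes & EXCLUDED_CLASSES:
            self.skip_depth += 1  # enter (or descend within) a skipped subtree
            return
        attr_str = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
        self.out.append(f"<{tag}{attr_str}>")

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

def clean(html: str) -> str:
    """Return the HTML with excluded subtrees removed."""
    f = ContentFilter()
    f.feed(html)
    return "".join(f.out)
```

Navigation menus, promotional blocks, and comment threads are dropped wholesale, while the substantive article markup passes through untouched.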

Additional filtering can be done by excluding pages or directories that are procedural or administrative rather than substantive in focus.
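Page-level exclusion is often just pattern matching on URL paths. A minimal sketch, with made-up patterns for illustration:

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns; the real list depends on the site's structure.
EXCLUDED_PATTERNS = ["/legal/*", "/careers/*", "*/privacy-policy"]

def is_substantive(path: str) -> bool:
    """Keep a page unless its URL path matches an excluded pattern."""
    return not any(fnmatch(path, pattern) for pattern in EXCLUDED_PATTERNS)
```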

Next, the content should be transformed by removing clunky HTML tags to convert the content into a bot-readable format. Many organizations opt to convert content into Markdown, which preserves heading hierarchies (useful for bots) while stripping away extraneous markup that bots don’t need.
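The essence of the conversion can be sketched in a few lines: headings become `#` prefixes and everything else is reduced to plain text. This is a deliberately crude sketch; real HTML-to-Markdown converters handle links, lists, emphasis, tables, and much more.

```python
from html.parser import HTMLParser

class MarkdownConverter(HTMLParser):
    """Rough sketch: maps <h1>..<h6> to '#' prefixes and drops other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            # '#' repeated by heading level preserves the hierarchy.
            self.prefix = "#" * int(tag[1]) + " "

    def handle_endtag(self, tag):
        self.prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(self.prefix + text)

def to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    return "\n\n".join(converter.lines)
```

The output keeps what a bot can use, the document outline and the text, and discards the presentational scaffolding.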

Bots benefit from metadata, but need help identifying it. The content transformation process should surface metadata that’s not visible to human readers. This includes descriptive metadata (such as schema.org markup) that describes the content to external systems like search engines, and internal administrative and technical metadata (such as geolocation coordinates) used for web page delivery. This conversion, known as re-serialization, makes the metadata queryable, so it can be “hydrated” into the bot’s payload.
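For schema.org metadata, this often means pulling JSON-LD blocks out of `<script>` tags and attaching them to the payload handed to the bot. A minimal sketch, where the payload shape is a hypothetical example rather than any standard:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects schema.org JSON-LD blocks embedded in <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

def hydrate(body_markdown: str, html: str) -> dict:
    """Combine converted body text with re-serialized metadata
    into a single bot-facing payload (shape is illustrative)."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    return {"content": body_markdown, "metadata": extractor.blocks}
```

The metadata that was invisible boilerplate in the page source becomes a structured, queryable part of what the bot receives.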

AI platforms, ever motivated to increase the sophistication of their products, will take advantage of these content enhancements.

Getting content “bot-ready” will become crucial as AI platforms expand their agentic capabilities. Publishers will need to define access rights and permissions. What materials can bots read, re-publish, or process?
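One way a publisher might express such permissions is a policy mapping content paths to allowed actions, with a default of deny. The policy shape and action names below are purely illustrative; no standard for this has yet settled.

```python
# Hypothetical access policy: path prefixes map to the actions a bot
# may perform. Action names ("read", "republish", "process") are examples.
POLICY = {
    "/articles/": {"read", "process"},
    "/press-releases/": {"read", "republish", "process"},
}

def allowed(path: str, action: str) -> bool:
    """Default-deny check: a bot action is permitted only if the path
    falls under a policy entry that lists that action."""
    for prefix, actions in POLICY.items():
        if path.startswith(prefix):
            return action in actions
    return False  # unlisted paths are off-limits
```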

Publishers will shape these affordances through both explicit statements and implicit decisions that influence the ease with which bots can perform actions.

— Michael Andrews
