Categories
Content Engineering

First-party AI in the post-webpage era

My previous post on the demise of webpages and the need for AI-native content has elicited good feedback and questions. I wanted to elaborate more on how publishers will need to take greater ownership of AI applications as users visit webpages less and less.

Some questions concerned how consumers will access AI-native content. Many folks imagined that customers would access the content through a third-party AI platform such as ChatGPT, Google, Claude, Perplexity, X, or Microsoft Bing Copilot. That’s certainly possible, but it is not what I envision as the default.

The goal of AI-native content is for publishers to take ownership of their AI pipeline rather than delegate that responsibility to a third party. The result is first-party AI tools, where the process and outcomes are entirely under the control and supervision of the publisher.

In the current era, third parties such as Google scrape webpages, extract information, rewrite the content, and publish it themselves. Much of the resulting traffic stays with Google rather than flowing to the publisher, which is why publishers' traffic levels are down.

But numerous risks are associated with the third-party extraction of webpage content. The major one is that the third party won’t represent the content in the same way that the original publishers would. The third party is interpreting your content based on their bot’s internal (often opaque) criteria.

No one will care more about your content than you will. What’s good enough for a third party may be damaging for your organization in some cases. Consider how a third party might get their summary wrong, even if their technology is generally robust and popular with users:

  • Leaving out information you or your customers would consider essential
  • Using the wrong tone of voice
  • Substituting words that have specific meaning to your customers
  • Providing misleading information by drawing on similar products or different timeframes that aren’t relevant to the user’s needs

All these potential issues can be quality checked, but only when the AI bot is overseen by the publisher who understands these nuances.

But today, even enterprises that are developing their own AI tools tend to rely on general-purpose third-party platforms with generic settings that claim to provide everything needed in a single package. Results, unsurprisingly, have been disappointing.

Few publishers have yet invested in the foundations necessary for AI-native content:

  • a stable LLM controlled by the publisher that can be tuned if necessary;
  • organization of resources according to their role in content generation;
  • mappings to other resources the AI engine must access (for example, RAG and MCP connectors); or
  • libraries of repeatable prompts, output patterns, and rule engines.
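These foundations are abstract, so here is a rough sketch of what a publisher-owned stack declaration might look like in code. Everything in it, from the field names to the connector types and file names, is hypothetical rather than any real product's configuration format.

```python
# Hypothetical sketch of the foundations a publisher might declare for
# first-party AI. All names, fields, and values are illustrative.

first_party_ai_stack = {
    # A stable model under the publisher's control, tunable if necessary
    "model": {"name": "in-house-llm", "version": "2025.1", "finetuned": True},
    # Resources organized according to their role in content generation
    "resources": {
        "canonical_facts": "facts.db",
        "tone_guidelines": "style-guide.md",
        "product_catalog": "catalog.json",
    },
    # Mappings to other resources the AI engine must access
    "connectors": [
        {"type": "rag", "index": "support-articles"},
        {"type": "mcp", "server": "inventory-service"},
    ],
    # A library of repeatable prompts and output patterns
    "prompts": {
        "product_summary": "Summarize {product} in 80 words using approved terms.",
    },
}

def render_prompt(stack, name, **variables):
    """Fill a repeatable prompt template from the publisher's library."""
    return stack["prompts"][name].format(**variables)

print(render_prompt(first_party_ai_stack, "product_summary", product="Model X"))
```

The point of the sketch is only that each foundation becomes an explicit, versionable asset the publisher owns, rather than a hidden setting inside a third-party platform.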

With AI-native content, there are no webpages for third parties to crawl and misinterpret. Third parties can’t mislead customers because they are starved of the source material on which to base their summaries.

Instead, customers will get content directly from the source organization using first-party AI tools.

First-party AI will be a radical shift from past decades, when Google supplied answers directly and was always the first, and sometimes only, port of call. In the post-webpage era, users will interact with many AI bots, both directly and indirectly.

If your enterprise is an airline or a major retailer that customers use regularly, those customers will access your AI tools via an app. Infrequent or first-time customers may start with a traditional search, but instead of getting a full website, they will get a URL that is a portal to your AI tools.

It may also be possible for publishers to supply AI-native content directly to third parties, such as Google or ChatGPT, as a feed. What's important is that publishers retain control over how AI-generated content is provided. This arrangement is unlike the current wave of licensing deals, in which certain publishers grant permission for their content to be crawled in exchange for payment, and the third party assumes responsibility for generating the summary.

With first-party AI, publishers can gate access to content in terms of topics, details, and quantity.

Already, we see examples in the market of vendors such as Cloudflare offering “pay per crawl” tools that prevent AI platforms from using publishers’ content unless those platforms pay a license fee to access it. This kind of contractual arrangement can easily be extended to AI-native content. And the growing availability of AI connection protocols will support controlling access much the same way APIs do.
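A minimal sketch of what gating by topic, license, and quantity could look like, assuming a simple policy table enforced at the publisher's API boundary. The policy fields and values are invented for illustration, not drawn from any real pay-per-crawl product.

```python
# Hypothetical access policy for AI-native content, keyed by topic.
# Field names and quotas are illustrative only.
ACCESS_POLICY = {
    "support_faq": {"detail": "full",    "daily_quota": 5000, "needs_license": False},
    "pricing":     {"detail": "summary", "daily_quota": 100,  "needs_license": True},
}

def check_access(topic, used_today, has_license):
    """Return True if a third-party platform may retrieve this content."""
    policy = ACCESS_POLICY.get(topic)
    if policy is None:
        return False  # topics not listed are never exposed to third parties
    if policy["needs_license"] and not has_license:
        return False  # pay-per-crawl style: no fee, no access
    return used_today < policy["daily_quota"]  # quantity gating

print(check_access("support_faq", 10, has_license=False))  # True
print(check_access("pricing", 10, has_license=False))      # False
```

The same check could also consult the "detail" field to decide whether a platform gets a summary or the full content, mirroring the topic/detail/quantity gating described above.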

For high-value content and interactions, firms will want to steer customers directly to their AI tools, and they will limit third parties from intermediating these interactions.

But for lower-value content and interactions, firms may allow AI platforms limited direct access to their AI-native content. The publishers retain control over how the content is offered but gain wider exposure through the third-party platform’s reach.

For content that is entirely promotional in nature, firms may supply AI-native content to third-party platforms on a fee basis, paying the platforms to show this content in generic queries, similar to how search ads work today. Despite the reliance on the platform for visibility, the publisher retains control over how messages appear, instead of allowing third parties to decide for themselves.

AI-native content enables publishers to provide first-party AI experiences. Publishers can control many parameters to ensure that generated content aligns with their goals.

In my previous post, I mentioned the need for a new kind of schema to support AI-native content. This schema will be richer than a traditional content model or data model. It will allow the mixing of structured data within semistructured narrative (text, video, audio). It will describe recurring word patterns that should appear in an exact way, while allowing for adaptable text that must merely conform in a general way to style or other governance guidelines. It will allow defined content variables to be referenced by prompts or agents. It may include factual rules against which statements are checked.
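As a rough illustration of such a schema, the sketch below models three of those ingredients: exact recurring phrases, named content variables, and factual rules that generated statements must satisfy. All of the names and the validation logic are hypothetical, a thought experiment rather than a proposed standard.

```python
# Hypothetical AI-native content schema combining exact word patterns,
# content variables, and factual rules. Everything here is illustrative.

schema = {
    # Variables that prompts or agents can reference by name
    "variables": {"battery_life_hours": 12},
    # Phrases that must appear verbatim in generated output
    "exact_patterns": ["Terms and conditions apply."],
    # Facts that any generated statement must not contradict
    "factual_rules": {"battery_life_hours": lambda v: v == 12},
}

def validate(output, claimed_values, schema):
    """Check generated text and its claimed values against the schema."""
    errors = []
    for pattern in schema["exact_patterns"]:
        if pattern not in output:
            errors.append(f"missing exact phrase: {pattern!r}")
    for name, rule in schema["factual_rules"].items():
        if name in claimed_values and not rule(claimed_values[name]):
            errors.append(f"factual rule violated: {name}")
    return errors

good = "The battery lasts 12 hours. Terms and conditions apply."
print(validate(good, {"battery_life_hours": 12}, schema))  # []

bad = "The battery lasts 24 hours."
print(validate(bad, {"battery_life_hours": 24}, schema))
```

Adaptable text, the fourth ingredient, is harder to show in a toy example because it needs style-level checks rather than string matching, but it would slot into the same validation loop.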

While we are still in the earliest days of this transition, I am impressed by how quickly language models have become commoditized and open-sourced, and how widespread RAG and MCP tools have become. Medium- and large-sized firms now have the opportunity to build first-party AI tools without outsourcing their customers’ AI experience to third parties.

— Michael Andrews


The death of the webpage and rise of AI-native content

The internet is undergoing its most fundamental shift since the rise of the World Wide Web in the 1990s. The shift is so significant that many content professionals don’t appreciate how radically it will change current practices. The webpage is dying, yet what will replace it has yet to be defined.

Organizations have been building webpages for three decades; only a few of us remember the internet before webpages. Webpages are all that most of us have ever known.

AI promises to make building webpages even easier. Some experts imagine AI will trigger an explosion of webpages, increasing their number manyfold. According to this thinking, AI will make it easier to build personalized webpages. We will finally realize the dream of having webpages designed for an audience of one.

That vision is one embraced by developers, for whom building webpages has been the major preoccupation.

But AI isn’t revolutionary because it makes doing the same thing easier. AI is disruptive because it changes user behavior — not developer behavior.

Already, we see evidence that visitor traffic to webpages is down significantly. Users aren’t that into webpages anymore.

It would be a mistake to assume that if webpages became more personalized, people would visit them more often. The website is a declining channel. There is little possibility that it will regain its historic status.

AI bots and agents can provide information more directly than a webpage. Publishers are working on how AI can:

  • Answer questions and provide updates
  • Book travel or tickets
  • Plan tasks
  • Find and buy the best product
  • Solve customer problems

These topics are the bread and butter of webpages. As attention spans get ever shorter, information must be delivered immediately to be used. Hardly anyone wants to scroll through a webpage anymore if they don’t have to. That’s especially true for users whose expectations have been conditioned by algorithmic feeds such as TikTok.

Some will doubt that webpages will be displaced. They believe that many people prefer webpages over other channels. Or else they believe that webpages will remain necessary.

Sceptics imagine AI bots and agents will be just another channel, and that webpages will remain vital in the future.

In the short run, as the internet undergoes its dramatic transition, we can expect a mixed environment, with AI and webpages coexisting. Organizations will need to support both.

But in the longer term, webpages will become unnecessary for most topics currently addressed by websites. This could happen faster than many people expect.

Three factors will influence how quickly webpages disappear:

  1. How quickly user behavior shifts to the adoption of AI tools
  2. How effectively AI bots and agents can address both customer and enterprise tasks
  3. How readily organizations can support AI bots and agents without creating webpages

The first two factors are somewhat interrelated. User adoption depends partly on the quality of AI tools in addition to their perceived convenience. The evolution of tools will depend on practicalities relating to AI infrastructure, such as models, orchestration, and ecosystems. While uncertainties remain with both, the amazing strides realized already and phenomenal investments underway suggest that progress on both will continue.

However, the third factor — webpage-free content for AI — remains largely unaddressed.

Unfortunately, the AI engineers developing tools have little expertise in content management. Their tools assume webpages, at least as an initial input: webpages are published, then crawled and tokenized.

But it makes little sense to publish webpages when the webpage’s main audience will be bots seeking to crawl them. It would be better to create content in a format that AI tools can use without needing to convert it.

AI methods and protocols require access to information in machine-readable formats. While they process human-readable text, they also rely on parameters that convey information about the text.

What’s been missing is a definition of what constitutes AI-native content. So far, most efforts have been focused on retrofitting webpage content so that AI tools understand it. Examples include:

  • Creating parallel llms.txt pages
  • Adding additional schema.org structured data to webpages to help orient AI bots
  • Training bots to understand XML tags, such as the NISO Standards Tag Suite, used to exchange or assemble content that becomes webpages
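The second approach can be illustrated concretely. The sketch below builds a schema.org JSON-LD block of the kind embedded in a webpage’s script tag to orient AI bots; the Product and Offer types and their properties are real schema.org vocabulary, while the values themselves are made up for the example.

```python
import json

# Sketch of retrofitting a webpage with schema.org structured data.
# Product/Offer and their properties are real schema.org vocabulary;
# the example values are invented.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "description": "A widget used here only to illustrate the markup.",
    "offers": {
        "@type": "Offer",
        "price": "19.99",
        "priceCurrency": "USD",
    },
}

# On a webpage this block would be embedded in a
# <script type="application/ld+json"> element in the page head.
print(json.dumps(product_jsonld, indent=2))
```

Note how even this approach presumes a webpage: the structured data is an annotation bolted onto a page, not a page-free delivery format.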

None of these approaches is truly AI-native, because they still presume the creation of webpages before making content available to AI tools.

Once legacy webpages have been made AI-ready, the focus will shift to how to create new content efficiently — how best to create AI-native content.

Some original content can be database-generated. But much narrative content will still require editorial oversight. Writers will need to decide on the messages to use, the emphasis of information, and the best phrasing. And some new content will be unique in the sense that it isn’t derived from prior content, and must be drafted by humans.

We are still missing the content authoring and management tools that support the development of AI-native content. Human writers need guidance on what bots need so they make good decisions and don’t get confused. Bots require predictability that the information needed to address a question or task is available.

The current approach of creating more webpages and expecting bots to untangle them and find what’s needed won’t be sustainable. Bots are thrown off by duplication. And crunching through repetitive webpages wastes time, money, and environmental resources.

We have many of the pieces to build an AI-native content development system. (I’m not calling it a CMS, since CMSs are intrinsically linked to the website era we are leaving.)

What will be needed is to combine:

  • AI editorial writing tools such as prompts, guidelines, and information QA
  • Schemas to shape message fragments, roles, descriptions, and provenance information (building on the granular structures developed by the translation industry)
  • Connection points to protocols for agents and other databases
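To make the combination concrete, here is a toy sketch of the three pieces working together: a repeatable editorial prompt, a schema-style QA check on message fragments, and a stand-in for a RAG or MCP connector. Every function and value is illustrative, not a real tool or protocol binding.

```python
# Hypothetical sketch combining the three pieces of an AI-native
# content development system. All names and data are illustrative.

def editorial_prompt(topic, facts):
    """AI editorial writing tool: a repeatable, guideline-laden prompt."""
    return (f"Write a 50-word answer about {topic}. "
            f"Use only these facts: {facts}. Follow the house style guide.")

def schema_check(draft, required_fragment):
    """Schema-driven QA: required message fragments must survive drafting."""
    return required_fragment in draft

def fetch_facts(connector, topic):
    """Connection point: stand-in for a RAG index or MCP server call."""
    return connector.get(topic, [])

# A dict standing in for an external knowledge source
facts_store = {"returns": ["30-day window", "original packaging required"]}

facts = fetch_facts(facts_store, "returns")
prompt = editorial_prompt("returns", facts)
draft = "Returns are accepted within a 30-day window with original packaging."
print(schema_check(draft, "30-day window"))  # True
```

In a real system each stub would be replaced by the corresponding tool, but the flow would be the same: fetch governed facts, generate from a repeatable prompt, and verify the draft against the schema before anything is released.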

AI-native content will be very different from the webpage era. Content creators who are comfortable thinking in terms of small pieces will be well-positioned to make the transition.

— Michael Andrews