Attestation across the AI Supply Chain

Nick Vincent April 9, 2026

April 27, 2026: I made some edits throughout to condense this longer post and improve flow. I also wrote this additional bullet point summary:

I am feeling hopeful that momentum in the AI evaluation space can help improve data markets and data ecosystems more broadly. I wrote a longer post on the reasons why, currently framing this as “attestations across the AI supply chain”. Here’s the bullet point summary:

This post argues that a shared family of attestation objects could improve several markets across the AI supply chain: consumer choice among AI products, upstream markets for training and retrieval data, markets for audits and evaluations, and downstream markets for AI-assisted artifacts. Broadly, I'll use the term "attestation object" here to refer to a verifiable record about who contributed what, in what role, to which model or output, under what validation process. We can enable this "shared family" by designing an appropriate schema for the objects (technical/protocol work), implementing supporting policy and regulation, and building relevant platforms that enable markets for attested evaluation and other data.
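To make the definition above a bit more concrete, here is a minimal sketch of what one attestation object could look like as a data structure. The class name, field names, and example values are all hypothetical and chosen only to mirror the "who contributed what, in what role, to which model or output, under what validation process" wording; this is not a proposed standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Attestation:
    """One record: who contributed what, in what role, to which target."""

    contributor: str         # who (a person, collective, or organization)
    contribution: str        # what (e.g. a dataset or a batch of reviews)
    role: str                # in what role (e.g. "author", "evaluator")
    target: str              # which model or output the record is about
    validation_process: str  # how the claim was checked


def to_canonical_json(record: Attestation) -> str:
    """Serialize with sorted keys so equal records give identical strings."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example values, loosely based on the health running example.
example = Attestation(
    contributor="physician-collective-001",
    contribution="reviews of model outputs in cardiology",
    role="evaluator",
    target="health-assistant-v1",
    validation_process="credential check by a third-party verifier",
)
print(to_canonical_json(example))
```

A stable serialization like this is one plausible starting point for interoperability, since different parties need to agree on exactly which bytes a record consists of before they can sign or compare it.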

The ideas discussed here build on work already underway in at least three adjacent areas: the dataset documentation space (inclusive of efforts such as datasheets, model cards, and data provenance); the Frontier AI Auditing world; and the content authenticity and provenance (e.g. C2PA) world. If we get the design of attestation objects right, interoperable schemas could improve visibility and traceability across training, evaluation, and usage.

I am especially excited about the broad idea here because progress in one part of the AI supply chain can bootstrap progress in others. Existing momentum around audits could foster more training data transparency and better markets for training and retrieval data. Parties interested in provenance for AI-generated outputs, such as people assessing AI-generated code, could help promote transparency upstream. And if communities adopt norms around transparent AI usage, that may drive further demand for transparency up the chain.

Specifically, as of April 2026, I think evaluation as a field has much more momentum, attention, and funding than advocacy around training data documentation or retrieval markets, and thus I think it is the most likely "entry point" that can enable full-chain provenance.

I'll note that, in addition to momentum around auditing, we're starting to see some evidence that consumers might care about attestations for AI products. See, for instance, the "Introducing ChatGPT Health" blog post, which emphasizes working with "260 physicians who have practiced in 60 countries and dozens of specialties" to obtain "feedback on model outputs over 600,000 times across 30 areas of focus". I'll refer to attestation in the context of an AI health application as a running example throughout.

One reason I think that discussing attestation matters right now is that data policy conversations often involve two somewhat different strategies for improving data markets. "Data protection" strategies are focused on trying to stop scraping, distillation, or reuse through a combination of law, platform rules, anti-scraping, and poisoning; all these serve to incentivize market participation by introducing friction and means to exclude information. An "attestation-forward" strategy tries to get AI users and/or regulators to care about verified provenance relationships upstream of AI products. These approaches can complement each other (in the short term, effective data protection may produce more pressure to get attestations working) but also likely benefit from somewhat divergent policy approaches. Data protection efforts try to block unwanted scraping and distillation; attestation efforts that succeed in making verified provenance, evaluation, and output identity matter could make "just scraping anyway" less valuable.

Very concretely, an attestation-forward data strategy would involve roughly four broad types of attestation objects:

Here's a short version of the key claims that motivate an attestation-forward strategy:

If the schema for attestation and provenance can be synchronized across the supply chain, we can achieve a win-win-win-win:

The main downside -- the answer to "why aren't we literally doing this today if it's so good?" -- is that attestation will impose new costs on various actors. Training and evaluation labor may cost more under this paradigm (though hopefully, with real returns in terms of model utility, safety, etc.). And, we still need to answer a number of technical and design questions to make attestation work.

While much of my previous writing has focused on Win 1, I'm excited about all of these angles. (Of course this is the optimistic view -- see the end of this post for a more negative set of "likely objections".)

Most of this post will be aimed at motivating attestations across the supply chain. I hope to write more in the future about realistic pathways towards achieving this vision. I'll briefly note that there are at least five ways to create incentives for attestations and that I do think we have some viable low-hanging fruit in the policy space. On the incentives side, the main channels I see are: direct mandates for some kind of attestation in narrow high-stakes settings; pressure via procurement rules from enterprise buyers and public bodies; asks from insurers and other liability-sensitive counterparties; self-interest from labs seeking product differentiation and/or anti-scraping incentives; and direct user preference / consumer demand. There is already real progress around auditing, public procurement guidance, and regulatory transparency requirements. The clearest consumer-facing progress so far is probably around output provenance via standards such as C2PA, and insurance has some early movement from firms such as Marsh.

On the policy side, some "minimal asks" include: getting policymakers to "bless" a small number of interoperable formats and a credible process for deciding who counts as a verifier; requiring attestations in certain contexts; disincentivizing bogus provenance and evaluation claims using consumer protection tactics; and maintaining some data protection and licensing "backdrop" since contributors are less likely to participate in markets at all if the baseline expectation is still that their work will simply be scraped and reused anyway.

First, I'll describe roughly how I think buyers are choosing AI today and why the current approach is incomplete. Second, I'll summarize the state of play for upstream markets for training data and evaluation labor. Third, I'll describe how interoperable attestation objects can link contributions, evaluations, and outputs. Finally, I'll try to tie the whole thing together by arguing that this kind of attestation layer can improve incentives for buyers, labs, and knowledge workers. More broadly, I want to show how these existing approaches can be treated as parts of a common AI trust stack, and to argue that connecting them has underappreciated consequences for market design, labor visibility, and downstream product choice.

Because this post also revisits some points from earlier Data Leverage posts and related references, I'll add an appendix at the end briefly noting those connections.

Or put another way, this is yet another blog post making the case for data transparency and healthy market flow as a coalition-building cause that can unite auditors, content creators, AI labs, and the general public around a mutually beneficial set of incentives.

Core assumptions behind the proposal

The claims above rest on a core set of assumptions:

With that overview in place, here's the longer argument:

I. The market problem: buyers of AI products can see outputs, but not the chain behind them

Imagine someone who is choosing from a menu of AI products. Perhaps this person is a consumer who just wants to use AI for small personal projects ("help me build personalized note-taking software"), a researcher seeking some occasional help with research tasks ("help me debug this LaTeX issue"), a hobbyist software developer seeking to find a tool to use heavily, somebody seeking a rigorously tested AI system that can offer medical advice, or a company's CTO looking to make an enterprise contract for org-wide AI access.

Across all these contexts and scales, a person buying an AI product is (for now) mainly buying "information outputs" such as answers to questions, documents that fulfill some requested purpose, or pieces of code that can complete certain tasks. A user buys an AI subscription or AI credits, and now can enter an input (a prompt/query) and get an output. Modern frontier models can provide a dazzlingly varied set of outputs: a single tool can first give you something like a search engine results page, followed by a complex working piece of code, a working spreadsheet, and then finally a poem and an accompanying piece of visual art. Some systems are even agentic and can produce outputs that are themselves actionable (e.g., the output is a series of actions, and then the system performs some "actuation," often using the user's computer and/or API keys).

Generally (and we'll get more precise about this later in the article), we choose to spend money (perhaps a lot of money!) on AI if we think those outputs will be "good" in some sense. We might think the AI is good in a statistical sense because we inspected the outputs, or because we read an evaluation study with robust evaluation of many outputs, or because we make some assumptions about how an AI was developed that make it "good." However, users could also consider facts about how the model was built (such as information about the training data).

If you are using AI to make personalized note-taking software, facts about the training data may not be very high stakes. However, if we want to buy an AI product to ask medical questions, we might care deeply about both AI outputs and inputs to said AI. We most likely want to know that a doctor has looked at some of the model's outputs and attested they are "good"; we might also want to know that there were some inputs from actual doctors such as content from medical textbooks and research papers used in the development of the original model. Finally, when we get a medically relevant output from an AI system -- e.g. the system we paid to access has printed out some text telling us to take a medicine or take some action -- we ideally want that output to be verifiably linked to information about the evaluation of the model, and ideally the training data as well.
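As a rough illustration of what "verifiably linked" could mean structurally, the sketch below walks from one output back to the model, evaluation, and training-data records it depends on. The record ids, the `depends_on` field, and the `upstream` helper are hypothetical, and the sketch only shows the linking, not how each record would be verified.

```python
from typing import Dict, List

# Hypothetical records: each id maps to its kind and the ids it depends on.
RECORDS: Dict[str, dict] = {
    "output-123": {"kind": "output", "depends_on": ["model-v1"]},
    "model-v1": {
        "kind": "model",
        "depends_on": ["eval-cardiology", "train-textbooks"],
    },
    "eval-cardiology": {"kind": "evaluation", "depends_on": []},
    "train-textbooks": {"kind": "training_data", "depends_on": []},
}


def upstream(record_id: str, records: Dict[str, dict]) -> List[str]:
    """Return every record reachable from record_id, depth-first."""
    seen: List[str] = []
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Reverse so dependencies are visited in their listed order.
        stack.extend(reversed(records[current]["depends_on"]))
    return seen


chain = upstream("output-123", RECORDS)
kinds = [RECORDS[record_id]["kind"] for record_id in chain]
print(kinds)
```

In this toy graph, a person holding only the output id could reach the evaluation and training-data records behind the model that produced it.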

II. How AI products are chosen today: four channels of evidence

If we look at the key factors that might cause someone shopping for AI products today to pick one product over another, I think we can bucket these into four groups. Put another way, most "AI buyer choices" can be traced to evidence or beliefs from one of four channels of information:

The product channel matters a lot if you're buying an AI product today. However, it's the channel I'll discuss the least in this post, in part because the focus here is on data-related differentiation but also because generally getting information about product factors currently works pretty well and thus enables relatively healthy market dynamics. We also won't talk much about the internals channel, e.g. making decisions based on the results of interpretability-based studies, though I think this will be increasingly important to think about going forward. Instead, this essay focuses on the missing value of the inputs channel, and on how attestations can connect inputs to outputs.

At present, I think most AI users are choosing primarily based on beliefs about outputs plus some broad assumptions about inputs. Those beliefs about outputs may range from "my friends told me Claude Code outputs good code" to "I read the recent model cards" to "I run my own private eval set." The corresponding assumptions about inputs are usually something like: the frontier labs probably sourced strong data, had capable people inspect and filter it, and used best-in-class training practices.

That is often reasonable. Outputs matter most for many use-cases, and if I only care about outputs, information about training is mainly useful as a proxy for output quality. Still, distinguishing these channels can improve AI literacy by more clearly separating the facts about outputs and inputs that might drive consumer behavior. That helps both consumers choose and developers differentiate.

III. Why outputs alone may not always be enough

How do we know if the information outputs from an AI system are good? Let's return to our first example: imagine a user who is looking to buy an AI product to help write personal note-taking software (and more generally play around with AI-coded personal software). This user probably plans to send a prompt to an AI model that looks something like this: "write me the code for software I can use for daily note-taking" (perhaps with a bit more detail regarding their personal preferences for how the software should work).

Say this user can do a "trial run" and send this prompt to three different AI providers, A, B, and C. How will this user decide which provider product to buy?

Option 1: inspect outputs yourself

One option would be to look at the output from each model and assess the quality of that output. If the user is already an expert in this domain, perhaps they can just read the code produced by all three models and identify an obvious "best" model. Of course, if they wanted to have some statistical rigor in evaluation they'd probably want to look at more than one sample per model, though this will create some budgeting challenges and basically means the user needs to conduct their own benchmarking study. For a fuller treatment of the topic of looking at model outputs statistically, see e.g. evalstats from Ian Arawjo.
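To see why a single sample per model is thin evidence, here is a small sketch that puts an uncertainty range around a pass rate using the Wilson score interval. This is one common choice that I am using for illustration, not necessarily the approach taken in evalstats, and the pass counts below are made-up numbers.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical trial results: (outputs judged acceptable, prompts tried).
results = {"A": (8, 10), "B": (6, 10), "C": (9, 10)}
for provider, (passes, trials) in results.items():
    low, high = wilson_interval(passes, trials)
    print(f"{provider}: {passes}/{trials} passes, interval {low:.2f} to {high:.2f}")
```

With only ten prompts per provider, the intervals overlap heavily, which is the budgeting problem in miniature: separating providers with confidence takes many more samples.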

Option 2: rely on published evaluations

Another option that also uses the outputs channel would be to look at the results of an already published benchmarking study, either from the model provider or from a third party. In this case, the user would basically be making the assumption that "other people tried similar inputs, and performance on those related tasks will probably proxy for performance on the task I care about." Here the rigor of the benchmark matters: how many total variants were tried, who checked that the output worked, etc.

When AI developers themselves or third party evaluation organizations produce a benchmarking study, the results of these studies will impact the organizations' reputation, so generally we should expect these kinds of studies to be reasonably consistent in terms of providing real signal. Benchmarking will always be noisy, but over time, there are incentives to evaluate models in ways that accurately reflect differences in capabilities. In other words, this approach should be pretty decent, and if we only ever have access to these kinds of signals for consumer decision-making, we could sustain a reasonably efficient market for AI products.

Option 3: rely on vibes and reputation

A third option after "do your own evals" or "read the best recent evaluation report" is to simply seek out a more general vibes-based ranking for the "most intelligent" model or the "best" AI developer. I think this is what many people are doing in practice right now, though there is certainly a small population of power users who diligently run their own personal benchmarking processes each time a new model releases.

This approach is also not necessarily that ineffective. A vibes-based approach likely does integrate reputation and social proof in a way that matches other kinds of consumer decision-making and creates decent outcomes. Not everybody needs to read the auditing deep cuts or full review history for every product they buy.

What's interesting is that using vibes to generally rank AI companies actually starts to mix information from the outputs channel and the inputs channel, because people are making some assumptions about training data and the training process itself. I think most people choosing with vibes are indeed assuming that AI developers themselves and third parties have evaluated the outputs of the model and found those outputs to be good, and perhaps that AI developers have special internal-only evaluation practices and employees that are especially skilled at AI evaluation. But this approach likely also involves assuming that the labs have used "best in class" training approaches, have sourced the best data they can get their hands on, and have a large team doing whatever they can to make models better. Critically, vibes-based decision-making is not entirely inputs-agnostic, and it's important to acknowledge this because I think it means people will care about inputs in the long run.

The issues with the current status quo

However, this is also where we can start to envision some room for improvement in the AI consumer experience. If consumers had more specific knowledge about the inputs to an AI product, both broad facts about the training data and facts about the evaluation process, this could be useful to them. Even as underlying training techniques enable models to become more general and more broadly intelligent, it seems likely that the presence of certain types of data, data about certain topics, and data from certain experts in model inputs will proxy for output quality.

In the medical case, information about the outputs might be more important. If sketchy training data leads to outputs that top doctors give a thumbs up to, we might take that. And overemphasis on training data may create metric chasing, e.g. including social media posts from medical doctors in pre-training data to drive up the volume of tokens from MDs. But input information should still provide some signal. Either we want real medical data in the training set or we want solid evidence our AI system has figured out medicine from first principles.

More broadly, both output studies and input details are proxies for future quality. For statistical systems, that is often the best we're going to get. The under-emphasized problem is that as models become more general, comprehensive output evaluation across all the things they can do becomes prohibitively expensive. Many people have commented on this; see, for example, the "Evaleval" project and its "Every Eval Ever" effort, Raji et al. on "everything in the whole wide world benchmarking", and recent online discussions in which people contrast their personal experience with a model and that model's benchmark results.

The labor required to really, fully, comprehensively evaluate increasingly general technologies will be very costly. We're going to need a whole lot of doctors and scientists to give feedback on a whole lot of LLM outputs to be sure we can use LLMs effectively in those contexts. But we're also going to need a whole lot of general users, with distinct combinations of niche interests, needs, preferences, etc., to give feedback as well. All this feedback will be relevant to evaluation of current models, but also relevant to the training of new models, the design of reinforcement learning environments, etc. It's going to take time to do all this evaluation, and will in some sense involve a kind of commandeering of the entire knowledge economy.

Simple napkin math suggests the relevant numbers get big quickly. Suppose we want rigorous evaluation coverage across 30 major medical specialties, with 1,000 representative cases per specialty, 10 independent physician reviews per case, and 30 minutes per review. That is 300,000 physician judgments, or 150,000 physician-hours, for one pass over one model version; at $200/hour for specialist time, that is about $30 million just in doctor-review labor. And that is before benchmark design, adjudication, compliance, or program management.
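The napkin math above can be written out as a few lines of arithmetic, which makes it easy to swap in different assumptions. The variable names are mine; the numbers are the ones from the paragraph.

```python
# Assumptions taken from the paragraph above.
specialties = 30
cases_per_specialty = 1_000
reviews_per_case = 10
hours_per_review = 0.5  # 30 minutes
hourly_rate_usd = 200

judgments = specialties * cases_per_specialty * reviews_per_case
physician_hours = judgments * hours_per_review
labor_cost_usd = physician_hours * hourly_rate_usd

print(f"{judgments:,} physician judgments")
print(f"{physician_hours:,.0f} physician-hours")
print(f"${labor_cost_usd:,.0f} in review labor for one pass over one model version")
```

Every factor multiplies through, so halving the review time or the number of reviewers per case halves the total, and each additional model version adds another full pass.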

It's worth thinking about how many hours of physician labor you want upstream of a chatbot prescribing your medicines!

Note that here I've focused on a running example of a consumer from the general population. I think this story is useful, but you might be thinking "I just don't believe consumers will care that much". You may be right -- and critically, all of the above argumentation applies directly to enterprise AI users, who have much more bargaining power and are very likely to care about exactly the issues described above.

IV. What current "upstream markets" look like: deals, marketplaces, and labor

Before an AI product is brought to market for consumers to consider, developers must also participate in two important upstream markets: acquiring training data and acquiring evaluations. Some companies may get their training and evaluation data without direct outside assistance, e.g. by scraping the training data and using in-house employees to do evals, but I expect that as the industry matures both of these domains will become more market-like, with a broader set of training data sellers and evaluation providers to choose from. Note that here I'll sometimes use "data" a bit loosely to cover both training inputs and evaluation labor, since both are forms of human knowledge work that developers are trying to source.

There is already an informal market for evaluators. It is not yet as legible or standardized as something like AWS Marketplace, but frontier labs already hire or contract with domain experts, benchmark designers, red teamers, and specialized third-party evaluators. Organizations like METR and Apollo, along with similar evaluation and auditing groups, are early examples of a more specialized evaluation layer emerging around frontier AI systems.

Now, let's step away from the perspective of someone picking between AI products and instead consider a developer who wants data to build their AI product, or a seller, either an individual or an organization, looking to monetize their content.

Three current market forms

I'll bucket the ways we can buy training data and evaluation labor right now into three broad categories. These are ideal types rather than perfectly separate boxes, but they capture a lot of the current landscape:

One nearby fourth form, if one wanted to break the taxonomy out further, would be recurring API or feed access rather than transfer of a bulk dataset: paying for ongoing, metered access to a live corpus or stream. I think that form mostly collapses back into Categories 1 and 2 depending on how standardized the arrangement is, so I am leaving it as an adjacent case rather than promoting it to a main bucket.

See also the data deals tracker here from the authors of "A Sustainable AI Economy Needs Data Deals That Work for Generators".

Collectives and intermediaries

Another idea that's gained some popularity over the last decade is the notion of joining a data cooperative, collective, union, or intermediary. Generally, the idea here is that you join the coop, contribute data, and some actor in the coop transacts in a market on your behalf. In fact, there are some ways to join a data cooperative today. Organizations such as Swash and Brave Rewards monetize various online actions and tasks. The user bases of data-coop-style tools are very small compared to general Internet usage. For reference, Brave reported 101M monthly active users as of September 30, 2025, while StatCounter has recently put Brave at roughly 1% of worldwide desktop browser share. Some of the means of monetization, e.g. watching ads, are also quite different from an aspirational vision of data markets in which data sellers are spending their time crafting valuable documents and records and then being rewarded for their contributions to shared epistemics. I do really believe that a useful North Star is trying to shoot for a world in which more overall human work looks like the good parts of journalism, scientific peer review, editing Wikipedia, and participating in Q&A, with the bad parts minimized or automated away.

While the above data cooperatives involve a technical approach to enabling cooperation, and participation remains niche, I actually think there is a lot of low-hanging fruit for applying the collectives/coops/unions idea to the three types of markets described above.

First, for the big boardroom deals, we could imagine pools of users directly weighing in on the deal conditions. For example, when it comes time for Reddit to renegotiate with Google or others, Reddit could loop users in, either voluntarily or under threat of collective action. Second, data coops can post their own quality-assessed CSV file for sale on AWS Data Marketplace right now, but they still need social and technical support to actually organize in the first place and produce a good CSV. And there is a large body of work, including ongoing efforts, that aims to support collective bargaining for data workers; see e.g. the body of work from Dr. Saiph Savage and Data Workers' Inquiry from DAIR.

One consideration is that successful organization of data workers may actually involve a shift from crowdwork markets to collectives that sell bundled data, i.e. moving more overall information from Category 3 to Category 2, or even 1.

Two policy levers: data protection and attestation

There is also an emerging set of actors interested in supporting various forms of what we might term "data protection for the AI age". Cloudflare's public support of anti-scraping is one example.

I think it is useful to start from a fairly grim baseline: many people assume their data will be scraped, distilled, and reused unless they actively block it through anti-scraping, anti-distillation, poisoning, preference signals, legal enforcement, or some mix of these.

In general, supporting data protection action can make it more likely that data sellers will engage with markets rather than give up because everything will be scraped anyway. In other words, some actor, public or private, needs to do some level of policing for a market to emerge. This is a bitter pill from an open-culture perspective, since open culture is foundational to modern ML, including both peer production of critical training data and open-source implementations of algorithms and tooling. Still, there is likely a middle ground in which open culture continues in many domains while others become more financialized. Any actions that move us away from the current situation are probably net good in the short term, though we should be cautious about a pendulum-swing-too-far data-cartel scenario in which nobody can use any information without paying.

Under an attestation-centered regime, data might still be readable or even openly available, but what becomes scarce is the ability to make a trusted downstream claim about provenance, licensing, evaluation, or output identity. Scraping without the attestation may no longer be enough to compete in markets where buyers, enterprise customers, insurers, or regulators care about verified inputs and verified outputs.

This is why attestation is particularly appealing for public AI strategy. It suggests a path where openness and bargaining do not always have to be direct opposites. In the strongest version, a dataset or corpus might be free for research or public-interest AI use, while commercial actors must pay if they want to rely on it in an attested product or otherwise market the trusted provenance relationship. These approaches are complements rather than substitutes: better data protection may be needed to get participation off the ground, but attestation changes the long-run market game by moving value toward attestations rather than raw tokens.

V. Why attestation improves incentives

We need to address three fundamental constraints that affect value measurement and value transfer in the context of AI:

We could somewhat address these challenges by just trying to coarsely distribute value to data creators: just worry less about provenance and distribute resources through something like a data-dividend public wealth fund. I think that may be a useful transitional approach in the short term, but likely not the ideal end state.

Why AI buyers care

For buyers, the benefit is straightforward: inputs and outputs become inspectable product features. Instead of relying only on benchmarks, reputation, or vibes, a buyer could ask more specific questions: How many credentialed authors contributed to training data? What process verified their work? What upstream sources were actually licensed? Was this particular output generated by a model version whose provenance and evaluation records I can inspect? In higher-stakes domains, that still does not replace output inspection, but it gives downstream choice much more structure.

In some settings, we can expect transparency around model outputs themselves to become a requirement rather than a premium feature. A regulator or enterprise buyer may increasingly want not just "trust us" but verifiable disclosure that a given artifact came from a given model and from a model with a certain attested pedigree.

If those attestations become known to be predictive -- e.g. the AI model with stronger attestations around its training data is actually better at giving health advice -- they also reduce some of the appeal of indiscriminate scraping or distillation. A rival might still imitate outputs, but the attested relationship itself becomes part of what the buyer is purchasing.

Why contributors and collectives care

For creators, evaluators, and data workers, the important shift is from selling anonymous tokens to participating in visible, renewable labor relationships. Attestations for training and evaluation can be managed and aggregated by some kind of intermediary, collective, or union so people do not need to transact individually with AI companies.

If verified attestations become important to consumers or regulators, that would immediately bring more data work out into the open. It would make it easier to treat these jobs as real careers and harder to treat data workers as invisible or precarious labor. Note that while I've been using doctors as a convenient running example -- a relatively small group with easy credentials to verify and existing social infrastructure for collective action -- the same logic applies to many other kinds of knowledge work.

The economics of maintaining a trusted relationship are also very different from the economics of acquiring tokens. The actual bundles of information in training data are and will remain mostly non-rival, and often hard to exclude. But contracts with people or collectives are not just about buying rights to transfer bits. These are contracts for labor which must be renewed and maintained. A world of data-with-attestations is therefore a world with more ongoing relationships and more room for bargaining.

Why labs, regulators, and public AI builders care

For labs, credible attestation could become a meaningful source of differentiation, though adopting any particular standard is a collective action problem. This is where auditing, evaluation, and safety cases begin to connect to regulation, insurance, procurement rules, and industry self-governance.

The building blocks for this market already exist in the upstream markets described above. The main difference with attestations is that the fact a person made or evaluated some portion of the data would no longer be hidden, but rather made prominent. If an AI builder who uses attested or full-consent data early on is able to produce good models and prove that buyers care about those records, that creates pressure for other AI developers to play along as good-faith actors in the market.

An immediate downside for some labs is that data and evaluation may cost more.

A version of an attestation-based provenance system that gets things really right could be good for AI developers, data creators who want to add knowledge via pretraining, content creators who want to be rewarded when their content is retrieved, public-interest model builders who want to distinguish civic from purely commercial use, and people who want to use AI to produce things and sell them, e.g. software developers selling vibe-coded software. In the strongest version, the flow of attested data and AI outputs also becomes a governance lever, in the sense that healthy markets for anything act as both a kind of computation over preferences and a kind of governance.

VI. Implementation and caveats

Enforcement will be partly technical and partly social

So far I've talked in broad strokes about attestation, provenance, and verification, but have remained relatively agnostic about implementation details. Attestation and verification will involve a mix of technical and social means of enforcement. They might lean more heavily on technical innovations, e.g. new approaches for embedded cryptographic signatures, or more heavily on social forces, e.g. organizations that enforce attestations through reputation.
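To make the "technical" end of that spectrum a bit more concrete, here is a minimal sketch of signing and checking an attestation record. It rests on loud assumptions: HMAC-SHA256 from the Python standard library stands in for a real signature scheme (a deployed system would more likely use public-key signatures so that verifiers never hold the secret), and the record fields, key, and function names are all hypothetical.

```python
import hashlib
import hmac
import json

def canonical_bytes(record: dict) -> bytes:
    # Serialize deterministically so signer and verifier hash identical bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(record: dict, key: bytes) -> str:
    # HMAC-SHA256 is a shared-secret stand-in, not a public-key signature.
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()

def verify(record: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(record, key), signature)

# Hypothetical record and key, for illustration only.
record = {"attester": "eval-lab-1", "claim": "A physician reviewed 100 outputs."}
key = b"demo-key-not-for-real-use"
signature = sign(record, key)

# Changing the claim after signing should invalidate the signature.
tampered = dict(record, claim="A physician reviewed 10,000 outputs.")
print(verify(record, signature, key))    # True
print(verify(tampered, signature, key))  # False
```

The social half of enforcement is exactly what this sketch cannot show: a valid signature only says who stood behind a claim, not whether that party deserves trust.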

At a high level, there are at least six areas of existing/ongoing work that could support attestation:

How we might transition towards attestations

Recent trends in dataset documentation suggest a world of transparent data for AI models is not yet around the corner. Dataset details in model and system cards remain extremely short and vague, as reflected in the Foundation Model Transparency Index, and understandably are likely to remain short and vague until additional legal clarity is achieved. That said, for both researchers and consumers, it will be very valuable to keep an eye out for new open releases, such as models from EleutherAI, AI2, and academic groups (see e.g. recent work from Fan et al. on model training using highly compliant web data).

But critically, almost all the actions that data protectors might take now, including developing and offering anti-scraping technologies, helping to socialize anti-scraping and AI preference signals, and supporting related research on data value estimation, full-consent LLMs, partnerships with national labs, etc., are likely to also make it easier to work toward attested data.

Appendix I: how this proposal connects to earlier posts

This appendix is optional background for returning readers (and so I can keep track of what points I've been making in too many distinct blogs!). It shows how the proposal connects back to the "quasi-enclosure" piece (itself a follow-up to "tipping points"), the "data rules" piece, and an earlier series of posts discussing evaluation and data labor.

On "quasi-enclosure" and "tipping points"

Those posts argued that AI can increase useful knowledge work while still routing the resulting data into private pools, creating a precarious quasi-enclosure and raising the risk of content-ecosystem tipping points. If labs compete partly on verifiable claims about who contributed, who evaluated, and under what terms, then some of the value currently captured silently through closed transcript pools can instead flow through explicit labor relationships, collective bargaining, and public-facing quality claims. That does not solve the commons problem by itself, but it does create a more plausible "golden path" in which AI products remain broadly useful while preserving stronger incentives for the people and institutions that keep producing the knowledge those systems depend on. To retain the benefits of maintaining knowledge commons in general, we also need interventions and social norms aimed explicitly at maintaining those commons.

On "data rules"

The "data rules" piece argued for clearer and more enforceable options governing how data can be used across training, retrieval, and evaluation, in ways that help both creators and model builders. Attestation is one way to make such rules practical. A rule only matters if someone can later verify what happened: whether a dataset was licensed for training but not evaluation, whether an evaluation set was kept separate from training, whether a model output can be linked back to a model version and a body of supporting evidence. In that sense, attestation is not a rival proposal to clearer data rules; it is one concrete way to operationalize them.

Of particular note: designing a schema for attestation could help to surface the types of contracts available to data creators and AI developers.

On "selling AGI like AG1"

That post argued that as AI products get more expensive, buyers will want more than vibes and vague claims about "more intelligence". The attestation proposal gives a direct answer to that consumer problem.

On the evaluation-related posts

The earlier evaluation posts argued that output evaluation is costly, incomplete, and increasingly central as models become more general. They also argued that evaluation labor itself could become a major site of bargaining power. The attestation proposal follows up on these ideas. An evaluation attestation can say who did the work, what they reviewed, how much they reviewed, and under what process. That helps buyers interpret quality claims, but it also helps workers and collectives bargain over the provision of evaluation labor because the work is more visible.

Appendix II: objections and replies

Q: What if buyers simply do not care about attestations?

A: This is one strong practical objection. If most buyers continue to choose AI products based mainly on outputs, price, latency, UX, and enterprise features, then attestations will not do much work on their own. Attestations will not magically displace those factors, but they can become an additional decision-relevant signal, especially in higher-stakes domains and in procurement settings where compliance, liability, and reputation matter. If consumer and enterprise demand never materializes, then the proposal likely depends much more heavily on regulation, insurance, or industry self-governance than on pure market pull. See the short discussion in the intro about practical incentives.

Q: Couldn't this cause people to confuse provenance with "explainability" or "justification"?

A: Definitely possible. A provenance chain is not the same thing as a causal explanation of why a specific output is correct. Knowing that doctors contributed to training data or reviewed some model outputs does not by itself prove that a particular medical answer is trustworthy. The point of attestation is therefore not to replace epistemic validation, but to supplement it. It gives buyers more information about the system behind an output, not a complete proof that any one answer is right.

A critical part of making this system work would be having the organizations that participate in the attestation chain (e.g., medical organizations) ensure that attestations are, for the most part, epistemically valid.

Q: Won't companies just game the metrics once "attested inputs" become a selling point?

A: They might. If the market starts rewarding visible counts like "number of doctors involved" or "number of licensed sources," firms will have incentives to optimize for what is easiest to attest rather than what most improves the system. That is a classic Goodhart problem. The best response is to design attestations carefully! Even then, some gaming pressure is inevitable.

Similarly, this issue can be partially addressed by ensuring that the organizations providing attestations have an incentive to keep their attestations aligned with "real value" in their domain. Put another way, if medical organizations continue to police the medical competence of their members, the same organizational quality-control mechanisms can be used to combat metric gaming.

Q: Doesn't this just move trust from AI labs to auditors and certifiers?

A: In part, yes. Any attestation regime creates a new bottleneck around whoever verifies claims. That raises familiar worries about certification cartels, regulatory capture, and barriers to entry for smaller labs or open projects. Regulators and consumers will both have to continue contributing to governance questions around who gets to be trusted as a verifier and under what standards.

Q: How can useful attestations coexist with privacy, secrecy, and anti-gaming concerns?

A: There is a genuine tension here. If attestations are highly specific, they may reveal private contributor information, proprietary dataset composition, or evaluation details that make benchmarks easier to game. If they are too vague, they collapse into branding or marketing copy. There is no perfect resolution. The practical goal is likely to be selective disclosure: enough specificity to support meaningful external checking, without assuming that every contributor identity, dataset row, or eval item can be made public.

New research may also make this question less of a concern.

Q: Why would attestation reduce scraping or distillation if a rival can still match output quality?

A: If a competitor offers similar or better outputs at lower cost, many buyers will still switch regardless of cleaner provenance. A more modest claim is that attestation can create some product differentiation and make fully consent-based development more legible, not that it eliminates the economic pull of imitation. Scraping and distillation remain live incentives unless buyers, regulators, or counterparties attach real value to the attested relationship itself, whether that relationship concerns training data, retrieval sources, evaluation labor, or the identity of the output-producing model.

Importantly: for something like health advice, people are paying, in part, for some amount of confidence or peace of mind. Here, the presence of attestations really matters and makes a material difference between model A with attestations and model B with very similar outputs but no attestations.

Q: Does visibility alone actually create bargaining power for contributors?

A: Not by itself. Making data work legible is not the same thing as guaranteeing payment, better labor conditions, or stronger negotiating leverage. Those outcomes also require institutions that can enforce terms, whether through law, contracts, unions, collectives, or platform governance. Attestation can make bargaining easier by making contributions auditable and portable, but it does not substitute for the underlying political and organizational work.

Appendix III: a simple attestation schema for AI inputs, evaluations, and outputs

Here is one toy worked example of a concrete schema. This sketch borrows from several existing approaches described above. I expect this section to be made pretty redundant by future "full spec docs", but I think it's clarifying to include some toy examples alongside this post.

Design goals

The object model should:

Minimal outer wrapper

At a high level, the same outer structure can be reused even when the subject matter changes:

{
  "schema_version": "0.1",
  "object_type": "training_contribution",
  "subject": {
    "kind": "dataset_shard",
    "id": "med-001"
  },
  "attester": {
    "kind": "publisher",
    "id": "licensed-med-publisher"
  },
  "contributor": {
    "kind": "author",
    "credential": "MD"
  },
  "claim": {
    "statement": "A credentialed contributor supplied content used in model development."
  },
  "evidence": {},
  "verifier": {
    "kind": "third_party_auditor",
    "id": "auditor-a"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {},
  "signatures": [],
  "disclosure": {
    "public": true
  }
}

The wire format does not have to be this JSON shape. The same logical object could be encoded using a C2PA manifest, an in-toto statement plus predicate, or another signed envelope.
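As a rough illustration of what "reusing the same outer structure" could mean in practice, the sketch below checks that a candidate object carries the wrapper fields shown above with the expected JSON types. The function name and the choice of required fields are assumptions. In particular, `contributor` is treated as optional here because the output and downstream examples later in this appendix omit it.

```python
# Wrapper fields taken from the example above, mapped to expected Python types.
# Treating "contributor" as optional is an assumption of this sketch.
REQUIRED_FIELDS = {
    "schema_version": str,
    "object_type": str,
    "subject": dict,
    "attester": dict,
    "claim": dict,
    "evidence": dict,
    "verifier": dict,
    "timestamp": str,
    "links": dict,
    "signatures": list,
    "disclosure": dict,
}

def wrapper_problems(obj: dict) -> list[str]:
    # Return human-readable problems; an empty list means the wrapper looks complete.
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            problems.append(f"missing field: {name}")
        elif not isinstance(obj[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems

candidate = {
    "schema_version": "0.1",
    "object_type": "output_provenance",
    "subject": {"kind": "model_output", "id": "response-abc123"},
    "attester": {"kind": "model_provider", "id": "provider-y"},
    "claim": {"statement": "This output was generated by model-x version 2026-03-15."},
    "evidence": {},
    "verifier": {"kind": "provider_signature", "id": "provider-y-signing-key"},
    "timestamp": "2026-04-08T00:00:00Z",
    "links": {},
    "signatures": "sig:provider-y:ghi789",  # deliberately wrong: should be a list
    # "disclosure" deliberately left out
}
print(wrapper_problems(candidate))
# ['wrong type for field: signatures', 'missing field: disclosure']
```

A check like this only covers shape. Whether the claim is true, and whether the signatures are valid, are separate questions.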

Suggested object types

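The four core user-facing object types, each illustrated in the toy examples below, are training_contribution, evaluation_record, output_provenance, and downstream_use.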

Two plausible auxiliary types are:

Toy examples

Training contribution

{
  "schema_version": "0.1",
  "object_type": "training_contribution",
  "subject": {
    "kind": "dataset_shard",
    "id": "med-042"
  },
  "attester": {
    "kind": "publisher",
    "id": "licensed-med-publisher"
  },
  "contributor": {
    "kind": "author",
    "credential": "MD",
    "organization": "licensed medical publisher"
  },
  "claim": {
    "statement": "A peer-reviewed cardiology chapter authored by a credentialed physician was included in licensed corpus shard med-042.",
    "usage": "pretraining"
  },
  "evidence": {
    "artifact_id": "chapter-8841",
    "license_id": "license-332"
  },
  "verifier": {
    "kind": "third_party_auditor",
    "id": "auditor-a"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "downstream_model_run": "model-x-pretrain-run-07",
    "downstream_model": "model-x-2026-03-15"
  },
  "signatures": [
    "sig:licensed-med-publisher:abc123"
  ],
  "disclosure": {
    "public": true,
    "auditor_only_fields": []
  }
}
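The `usage` and `license_id` fields in the record above hint at one kind of automated check: does the claimed use fall within what the license allows? The sketch below is a toy version under stated assumptions. The license registry, its `allowed_usage` field, and the function name are all hypothetical, since this post does not define how license terms would be represented.

```python
# Hypothetical license registry; the terms listed here are invented for illustration.
LICENSES = {
    "license-332": {"allowed_usage": {"pretraining", "retrieval"}},
}

def usage_permitted(attestation: dict, licenses: dict) -> bool:
    # Compare the claimed usage against the terms of the referenced license.
    license_id = attestation.get("evidence", {}).get("license_id")
    usage = attestation.get("claim", {}).get("usage")
    terms = licenses.get(license_id)
    # Unknown licenses and missing usage claims are treated as not permitted.
    return terms is not None and usage in terms["allowed_usage"]

training = {"claim": {"usage": "pretraining"}, "evidence": {"license_id": "license-332"}}
evaluation = {"claim": {"usage": "evaluation"}, "evidence": {"license_id": "license-332"}}

print(usage_permitted(training, LICENSES))    # True
print(usage_permitted(evaluation, LICENSES))  # False
```

This is the "licensed for training but not evaluation" case from the data rules discussion, reduced to a lookup.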

Evaluation record

{
  "schema_version": "0.1",
  "object_type": "evaluation_record",
  "subject": {
    "kind": "model_version",
    "id": "model-x-2026-03-15"
  },
  "attester": {
    "kind": "evaluation_org",
    "id": "eval-lab-1"
  },
  "contributor": {
    "kind": "reviewer",
    "credential": "MD",
    "experience_years": 20
  },
  "claim": {
    "statement": "A practicing physician reviewed 100 outputs across 10 health-related tasks.",
    "task_set": "health-general-v2"
  },
  "evidence": {
    "sample_size": 100,
    "hours": 100,
    "report_id": "eval-report-77"
  },
  "verifier": {
    "kind": "assurance_process",
    "level": "independent-third-party"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "benchmark_report": "eval-report-77",
    "related_model": "model-x"
  },
  "signatures": [
    "sig:eval-lab-1:def456"
  ],
  "disclosure": {
    "public": true,
    "auditor_only_fields": [
      "sampled_output_ids"
    ]
  }
}
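The `disclosure` block above suggests a simple form of selective disclosure: publish the record, but strip fields reserved for auditors. Here is a minimal sketch, assuming that names in `auditor_only_fields` refer to keys inside `evidence` (the example record does not say where such fields live) and using made-up sampled output ids.

```python
import copy

def public_view(obj: dict) -> dict:
    # Assumption: names in auditor_only_fields refer to keys inside "evidence".
    hidden = set(obj.get("disclosure", {}).get("auditor_only_fields", []))
    view = copy.deepcopy(obj)  # leave the auditor's full copy untouched
    view["evidence"] = {
        key: value for key, value in obj.get("evidence", {}).items() if key not in hidden
    }
    return view

record = {
    "object_type": "evaluation_record",
    "evidence": {
        "sample_size": 100,
        "report_id": "eval-report-77",
        "sampled_output_ids": ["out-001", "out-002"],  # hypothetical values
    },
    "disclosure": {"public": True, "auditor_only_fields": ["sampled_output_ids"]},
}

print(sorted(public_view(record)["evidence"]))     # ['report_id', 'sample_size']
print("sampled_output_ids" in record["evidence"])  # True
```

One question this leaves open is how a signature over the full record should relate to the redacted view, since removing fields changes the bytes that were signed.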

Output provenance

{
  "schema_version": "0.1",
  "object_type": "output_provenance",
  "subject": {
    "kind": "model_output",
    "id": "response-abc123"
  },
  "attester": {
    "kind": "model_provider",
    "id": "provider-y"
  },
  "claim": {
    "statement": "This output was generated by model-x version 2026-03-15."
  },
  "evidence": {
    "request_hash": "req-8fd2",
    "model_version": "model-x-2026-03-15"
  },
  "verifier": {
    "kind": "provider_signature",
    "id": "provider-y-signing-key"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "model": "model-x",
    "evaluation_attestations": [
      "eval-report-77"
    ],
    "training_attestations": [
      "train-attest-8841"
    ]
  },
  "signatures": [
    "sig:provider-y:ghi789"
  ],
  "disclosure": {
    "public": true
  }
}

Downstream use

{
  "schema_version": "0.1",
  "object_type": "downstream_use",
  "subject": {
    "kind": "software_release",
    "id": "health-app-v12"
  },
  "attester": {
    "kind": "developer",
    "id": "health-app-studio"
  },
  "claim": {
    "statement": "This iPhone health app release used GPT-6.0-med1000 during development.",
    "usage": [
      "code_generation",
      "documentation_drafting"
    ]
  },
  "evidence": {
    "build_id": "ios-build-991",
    "model_version": "gpt-6.0-med1000"
  },
  "verifier": {
    "kind": "workflow_log",
    "id": "build-pipeline-attestor"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "model": "gpt-6.0-med1000",
    "related_outputs": [
      "response-abc123"
    ]
  },
  "signatures": [
    "sig:health-app-studio:jkl012"
  ],
  "disclosure": {
    "public": true
  }
}
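The `links` fields in the four examples above can, in principle, be followed mechanically. The sketch below walks from a downstream artifact back toward output, evaluation, and training records. It assumes a hypothetical registry that maps ids to attestation objects. How ids are assigned and resolved is not specified in this post, so ids without a registry entry are reported but not expanded.

```python
from collections import deque

def linked_ids(obj: dict) -> list[str]:
    # Flatten every value under "links" (single ids or lists of ids) into one list.
    ids = []
    for value in obj.get("links", {}).values():
        ids.extend(value if isinstance(value, list) else [value])
    return ids

def trace(start_id: str, registry: dict) -> list[str]:
    # Breadth-first walk over links, visiting each id once.
    seen, queue = [], deque([start_id])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(linked_ids(registry.get(current, {})))
    return seen

# Hypothetical registry holding only the "links" portion of the toy examples.
registry = {
    "health-app-v12": {
        "links": {"model": "gpt-6.0-med1000", "related_outputs": ["response-abc123"]}
    },
    "response-abc123": {
        "links": {
            "model": "model-x",
            "evaluation_attestations": ["eval-report-77"],
            "training_attestations": ["train-attest-8841"],
        }
    },
    "eval-report-77": {
        "links": {"benchmark_report": "eval-report-77", "related_model": "model-x"}
    },
}

print(trace("health-app-v12", registry))
# ['health-app-v12', 'gpt-6.0-med1000', 'response-abc123', 'model-x',
#  'eval-report-77', 'train-attest-8841']
```

A real traversal would also verify signatures and respect disclosure rules at each hop rather than trusting the registry outright.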

If objects like these are interoperable, a buyer or auditor can move:
