Content Provenance for Publishers

Every article and image you publish can now carry cryptographic proof of origin. That proof travels through wire services and aggregators. It cannot be stripped without breaking verification.

The Problem: Distribution Without Documentation

A publisher sends a story to AP, which distributes it to 1,500 subscribers worldwide. Three months later, that story appears in an AI company's training corpus with no attribution, no license, and no payment. The AI company claims they had no way to know the content was owned.

That claim is difficult to contest when the only proof of ownership is a byline in HTML that was stripped during ingestion. Traditional metadata, EXIF data, and even visible watermarks can be removed without detection. Once that metadata is gone, so is the ownership record.

Content provenance solves this at the infrastructure level. The ownership record is embedded cryptographically into the content itself - not in a field that can be deleted, but in the structure of the text and the file container. Removing it requires deliberately modifying the content, which breaks verification and creates its own evidentiary record.

How Provenance Works for Text

Encypher uses two complementary encoding modes for text provenance. The default mode embeds C2PA manifest data invisibly within text content. The encoding is undetectable to readers and survives copy-paste across digital platforms.

The alternative mode uses zero-width characters for environments such as Microsoft Word that handle certain Unicode ranges differently. Both modes produce invisible output, so signed text looks identical to the unsigned original. Both survive copy-paste across browsers, email clients, and text editors.
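As a rough illustration of the zero-width idea, the sketch below hides a short payload after a visible sentence using two zero-width code points. The bit mapping, the function names, and the choice to place the hidden run at the end of the text are assumptions made for this example; they are not Encypher's actual encoding.

```python
# Illustrative only: a toy zero-width encoding, not Encypher's format.
ZW0 = "\u200b"  # ZERO WIDTH SPACE, used here to mean bit 0 (assumption)
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER, used here to mean bit 1 (assumption)


def embed(visible: str, payload: bytes) -> str:
    """Append the payload to the visible text as invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    hidden = "".join(ZW1 if bit == "1" else ZW0 for bit in bits)
    return visible + hidden


def extract(text: str) -> bytes:
    """Recover the payload by reading back only the zero-width characters."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def strip_hidden(text: str) -> str:
    """Return what a reader sees: the text without zero-width characters."""
    return "".join(ch for ch in text if ch not in (ZW0, ZW1))


signed = embed("Council approves budget.", b"pub:example")
print(strip_hidden(signed))  # the visible sentence, unchanged
print(extract(signed))       # the hidden payload
```

Because the hidden characters are part of the string itself, copying the sentence copies the payload with it, which is the property the paragraph above relies on.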

The manifest embedded in each article records:

  • Publisher identity (verified cryptographic key)
  • Publication timestamp (tamper-evident)
  • Content hash (detects any modification)
  • Rights terms (machine-readable, supports Bronze/Silver/Gold tiers)
  • Author attribution (sentence-level granularity)
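To make the list above concrete, here is a minimal sketch of such a record as a plain data structure, with a SHA-256 content hash used to detect modification. The class name, field names, and hash choice are hypothetical and chosen for readability; a real C2PA manifest has its own serialization and is cryptographically signed, which this sketch does not attempt.

```python
# Illustrative only: hypothetical field names, not the C2PA manifest schema.
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

RIGHTS_TIERS = {"Bronze", "Silver", "Gold"}  # tiers named in the list above


@dataclass(frozen=True)
class ManifestRecord:
    publisher_id: str   # stands in for the verified publisher key
    published_at: str   # ISO 8601 timestamp in UTC
    content_hash: str   # hex SHA-256 of the article text
    rights_tier: str    # one of RIGHTS_TIERS
    author: str


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_record(text: str, publisher_id: str, author: str,
                 rights_tier: str, published_at: datetime) -> ManifestRecord:
    if rights_tier not in RIGHTS_TIERS:
        raise ValueError(f"unknown rights tier: {rights_tier}")
    return ManifestRecord(
        publisher_id=publisher_id,
        published_at=published_at.astimezone(timezone.utc).isoformat(),
        content_hash=content_hash(text),
        rights_tier=rights_tier,
        author=author,
    )


def is_unmodified(text: str, record: ManifestRecord) -> bool:
    """True when the text still matches the hash stored in the record."""
    return content_hash(text) == record.content_hash


record = build_record("Story text.", "example-publisher", "A. Reporter",
                      "Silver", datetime(2024, 1, 2, tzinfo=timezone.utc))
print(is_unmodified("Story text.", record))  # unchanged text matches
print(is_unmodified("Story text!", record))  # any edit changes the hash
```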

Sentence-level granularity is Encypher's proprietary technology. It authenticates each sentence individually using a Merkle tree structure, so verification can confirm not just that a document was published but which specific sentences were used - critical for licensing disputes involving partial reproduction.
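The general Merkle tree technique can be sketched as follows: hash each sentence into a leaf, hash pairs upward to a single root, and prove that one sentence belongs to the document by supplying only its sibling hashes. This is a generic textbook construction under stated assumptions (SHA-256, domain-separation prefixes, duplicating the last node on odd levels); the source does not describe Encypher's proprietary scheme, and this is not it.

```python
# Illustrative only: a generic Merkle tree over sentences.
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(sentence: str) -> bytes:
    return _h(b"\x00" + sentence.encode("utf-8"))  # 0x00 marks a leaf


def _parent(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)  # 0x01 marks an interior node


def _padded(level: list[bytes]) -> list[bytes]:
    # Assumption: an odd-length level repeats its last node.
    return level + [level[-1]] if len(level) % 2 else level


def build_levels(sentences: list[str]) -> list[list[bytes]]:
    """Return every level of the tree; the last level holds the root."""
    if not sentences:
        raise ValueError("need at least one sentence")
    level = [leaf_hash(s) for s in sentences]
    levels = [level]
    while len(level) > 1:
        level = _padded(level)
        level = [_parent(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each flagged if it sits on the left."""
    proof = []
    for level in levels[:-1]:
        level = _padded(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(sentence: str, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = leaf_hash(sentence)
    for sibling, sibling_is_left in proof:
        node = _parent(sibling, node) if sibling_is_left else _parent(node, sibling)
    return node == root


sentences = ["The council met Tuesday.", "The budget passed.", "Two members dissented."]
levels = build_levels(sentences)
root = levels[-1][0]
print(verify(sentences[1], proof_for(levels, 1), root))   # a sentence from the article
print(verify("The budget failed.", proof_for(levels, 1), root))  # an altered sentence
```

Only the root needs to be signed. A verifier holding a single reproduced sentence and its proof can check membership without seeing the rest of the article, which is what makes partial reproduction checkable.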

Wire Service Distribution

Wire services distribute content at scale to hundreds or thousands of subscribers. Each copy is a potential licensing event and each copy can now carry the same cryptographic record as the original.

The Encypher API supports organizational signing, where a wire service signs content on behalf of member publishers using delegated credentials. AP can sign content for its member newspapers. Reuters can sign its own wire feeds. Each signed asset carries the originating publisher's identity, even when signing occurs at the distribution layer.

This means the provenance chain is established at the point of widest distribution, not just at the original publication. Every downstream subscriber receives content with embedded proof of origin, machine-readable rights terms, and a verified publishing timestamp.

The Willful Infringement Shift

US copyright law treats willful infringement differently from innocent infringement. Innocent infringement means the infringer did not know the work was protected. Willful means they knew, or should have known, and infringed anyway.

Statutory damages under 17 U.S.C. § 504(c) run from $750 to $30,000 per work in the ordinary case and up to $150,000 per work for willful infringement. Where the infringer proves the infringement was innocent, a court may reduce the award to as little as $200 per work. For a publisher with thousands of articles in an AI training corpus, the gap between those figures is the difference between a nuisance claim and a material liability.
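A quick back-of-the-envelope calculation shows how the two per-work ceilings scale with corpus size. The count of 5,000 works is a hypothetical figure for illustration, and the results are upper bounds only; actual awards are set by the court within the statutory range.

```python
# Illustrative arithmetic only: statutory ceilings, not predicted awards.
NON_WILLFUL_CAP = 30_000   # per-work ceiling without a willfulness finding
WILLFUL_CAP = 150_000      # per-work ceiling for willful infringement


def max_exposure(works: int, cap_per_work: int) -> int:
    """Upper bound on statutory damages for a given number of works."""
    return works * cap_per_work


works = 5_000  # hypothetical number of infringed articles
print(f"non-willful ceiling: ${max_exposure(works, NON_WILLFUL_CAP):,}")
print(f"willful ceiling:     ${max_exposure(works, WILLFUL_CAP):,}")
```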

When content carries a C2PA manifest with machine-readable rights terms, no party that uses the content can credibly claim it did not know the content was owned. The manifest is formal notice, embedded in every copy, in every downstream location. The "we did not know" defense is eliminated before the lawsuit is filed.

Publishers who have signed their archives hold a fundamentally different legal position than those who have not. The signed archive is not just a compliance artifact - it is the documentation that supports a willful infringement argument if licensing negotiations fail.

Licensing Leverage Before Litigation

Litigation is expensive and slow. Most publishers want licensing revenue, not court battles. Content provenance supports licensing by making the ownership case self-evident before any formal dispute begins.

When an AI company receives a formal notice with an Encypher evidence package, they receive cryptographic proof that is independently verifiable - it does not depend on trusting Encypher or the publisher's assertions. The verification libraries are open source. The signature was made against the publisher's own key. The AI company's legal team can verify every claim in the package without third-party involvement.

This changes the negotiating dynamic. Instead of a publisher asserting ownership and an AI company disputing it, the dispute is over licensing terms on content whose provenance is already documented. That is a more tractable negotiation, and it typically resolves faster.

Image Provenance

Encypher supports 13 image formats, including JPEG, PNG, WebP, TIFF, AVIF, HEIC, and DNG. C2PA manifests are embedded in the file container as a JUMBF box, placed in the file structure in the format-specific way defined by the C2PA specification.

EXIF metadata is routinely stripped when images are uploaded to social platforms, aggregators, and CDNs. C2PA manifests are not EXIF data. They are embedded in the file container itself and survive most distribution pathways. When an image is downloaded and re-uploaded, the manifest travels with the file.
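For JPEG specifically, JUMBF data is carried in APP11 marker segments rather than in the EXIF block. The sketch below walks a JPEG's header segments and reports whether any APP11 segment is present, which is a quick way to check whether a pipeline has dropped the container-level data. The function names are hypothetical, the parser handles only the simple case of length-prefixed segments before the image data, and finding a segment is only a hint: it does not parse or validate the manifest.

```python
# Illustrative only: detects APP11 segments, does not validate a manifest.
import struct

SOI = b"\xff\xd8"   # start of image
APP11 = 0xEB        # marker used for JUMBF data in JPEG
SOS = 0xDA          # start of scan: compressed image data follows
EOI = 0xD9          # end of image


def app11_segments(data: bytes) -> list[bytes]:
    """Return the payloads of all APP11 segments found before the image data."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG file")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # malformed or unsupported layout; stop scanning
        marker = data[pos + 1]
        if marker in (SOS, EOI):
            break
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == APP11:
            segments.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return segments


def may_carry_jumbf(data: bytes) -> bool:
    """True if any APP11 payload starts with the 'JP' identifier."""
    return any(payload.startswith(b"JP") for payload in app11_segments(data))


# Usage with a real file (path is a placeholder):
# with open("photo.jpg", "rb") as f:
#     print(may_carry_jumbf(f.read()))
```

Running the check before and after a distribution step shows whether that step preserved the container data or rewrote the file without it.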

For publishers whose photojournalists' work ends up on social platforms and in AI image generation training sets, image provenance creates a documented ownership record for every frame. See image provenance for format-specific details.

Signing Your Archive

The most valuable content for AI training is often the oldest - years of reporting, analysis, and photography accumulated before anyone thought to protect it. The Encypher API supports retroactive signing of existing archives.

Batch signing tools in the Python and TypeScript SDKs let you sign thousands of articles in a single job. A publisher with 500,000 articles in their CMS can typically complete a full archive signing over a weekend. The free tier covers 1,000 documents per month. Volume pricing is available for publishers with large archives.

Retroactive signing does not alter an article's original publication date. The manifest records the time of signing separately from the publication date. This distinction matters for licensing: the manifest documents that the content existed and was owned as of the signing date, which is sufficient for most dispute resolution purposes.

Start Signing Your Content

The free tier covers 1,000 documents per month. No credit card required. Start signing today and your archive begins building its ownership record.