Content Provenance for Publishers

Every article and image you publish can now carry cryptographic proof of origin. That proof travels through wire services and aggregators. It cannot be stripped without breaking verification.

The Problem: Distribution Without Documentation

A publisher sends a story to AP, which distributes it to 1,500 subscribers worldwide. Three months later, that story appears in an AI company's training corpus with no attribution, no license, and no payment. The AI company claims it had no way to know the content was owned.

That claim is difficult to contest when the only proof of ownership is a byline in HTML that was stripped during ingestion. Traditional metadata, EXIF data, and even watermarks visible to human eyes can be removed without detection. Once the header is gone, so is the ownership record.

Content provenance solves this at the infrastructure level. The ownership record is embedded cryptographically into the content itself - not in a field that can be deleted, but in the structure of the text and the file container. Removing it requires deliberately modifying the content, which breaks verification and creates its own evidentiary record.

How Provenance Works for Text

Encypher uses two complementary encoding modes for text provenance. The default mode embeds C2PA manifest data invisibly within the text itself, so the encoding is undetectable to readers.

The alternative mode uses zero-width characters, for environments such as Microsoft Word that handle certain Unicode ranges differently. Both modes produce output that looks identical to the unsigned text, and both survive copy-paste across browsers, email clients, and text editors.
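As a rough illustration of the zero-width idea, the sketch below hides a short payload in a string by mapping each bit to one of two invisible Unicode characters. The bit mapping, the function names, and the choice to append the hidden run at the end are all assumptions made for this example; it is not Encypher's actual encoding.

```python
# Hypothetical illustration only: NOT Encypher's actual encoding scheme.
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here to encode bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here to encode bit 1


def embed(visible: str, payload: bytes) -> str:
    """Append the payload to the visible text as a run of invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return visible + hidden


def extract(text: str) -> bytes:
    """Recover the payload by reading back the invisible characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def visible_part(text: str) -> str:
    """Return the text a reader would see, with the hidden run removed."""
    return "".join(ch for ch in text if ch not in (ZERO, ONE))


signed = embed("Breaking news.", b"pub:42")
print(extract(signed))       # the hidden payload
print(visible_part(signed))  # the text as a reader sees it
```

Because the hidden characters are ordinary Unicode code points, they travel with the string through any copy-paste path that preserves them.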

The manifest embedded in each article records:

- Publisher identity (verified cryptographic key)
- Publication timestamp (tamper-evident)
- Content hash (detects any modification)
- Rights terms (machine-readable, supports Bronze/Silver/Gold tiers)
- Author attribution (sentence-level granularity)
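The fields above can be pictured as a small record. The sketch below is a simplified stand-in: the field names and the `build_manifest` and `content_matches` helpers are hypothetical, and it omits the cryptographic signature entirely. Real C2PA manifests are serialized and signed according to the C2PA specification rather than stored as a Python dataclass.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical, unsigned stand-in for the record described above."""
    publisher_id: str    # would be bound to a verified key in a real manifest
    published_at: str    # ISO 8601 timestamp
    content_sha256: str  # hash of the article text
    rights_tier: str     # "Bronze", "Silver", or "Gold"
    author: str


def build_manifest(text: str, publisher_id: str, published_at: str,
                   rights_tier: str, author: str) -> Manifest:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return Manifest(publisher_id, published_at, digest, rights_tier, author)


def content_matches(manifest: Manifest, text: str) -> bool:
    """True only if the text is byte-for-byte what was hashed at signing time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() == manifest.content_sha256


article = "City council approves the transit budget."
manifest = build_manifest(article, "example-publisher", "2024-01-15T09:00:00Z",
                          "Silver", "A. Reporter")
print(content_matches(manifest, article))              # unmodified text
print(content_matches(manifest, article + " (edited)"))  # modified text
```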

Sentence-level granularity is Encypher's proprietary technology. It authenticates each sentence individually using a Merkle tree structure, so verification can confirm not just that a document was published but which specific sentences were used - critical for licensing disputes involving partial reproduction.
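To make the Merkle tree idea concrete, here is a generic sketch that hashes each sentence as a leaf, combines hashes pairwise up to a single root, and then proves that one sentence belongs to the document without revealing the others. This is a textbook construction under assumed conventions (SHA-256, a duplicated last node on odd levels, prefix bytes to separate leaves from internal nodes); Encypher's proprietary implementation may differ in every one of those details.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(sentence: str) -> bytes:
    return _h(b"\x00" + sentence.encode("utf-8"))  # prefix marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)  # prefix marks an internal node


def build_levels(sentences: list[str]) -> list[list[bytes]]:
    """Return every level of the tree, from the leaves up to the root."""
    if not sentences:
        raise ValueError("need at least one sentence")
    level = [leaf_hash(s) for s in sentences]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(sentences: list[str]) -> bytes:
    return build_levels(sentences)[-1][0]


def prove(sentences: list[str], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in build_levels(sentences)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(sentence: str, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    current = leaf_hash(sentence)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root


doc = ["The vote passed 7-2.", "Two members dissented.", "Funding begins in July."]
root = merkle_root(doc)
print(verify(doc[2], prove(doc, 2), root))           # a sentence from the document
print(verify("Invented quote.", prove(doc, 2), root))  # a sentence that is not
```

A verifier holding only the root and a short proof can confirm that a quoted sentence came from the signed document, which is the property that matters when a dispute concerns partial reproduction.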

Wire Service Distribution

Wire services distribute content at scale to hundreds or thousands of subscribers. Each copy is a potential licensing event and each copy can now carry the same cryptographic record as the original.

The Encypher API supports organizational signing, where a wire service signs content on behalf of member publishers using delegated credentials. AP can sign content for its member newspapers. Reuters can sign its own wire feeds. Each signed asset carries the originating publisher's identity, even when signing occurs at the distribution layer.

This means the provenance chain is established at the point of widest distribution, not just at the original publication. Every downstream subscriber receives content with embedded proof of origin, machine-readable rights terms, and a verified publishing timestamp.

The Willful Infringement Shift

US copyright law treats willful infringement differently from innocent infringement. Innocent infringement means the infringer did not know the work was protected. Willful means they knew, or should have known, and infringed anyway.

Statutory damages under 17 U.S.C. 504(c) ordinarily range from $750 to $30,000 per work. A court may reduce the award to as little as $200 per work for innocent infringement, and may raise it to as much as $150,000 per work for willful infringement. For a publisher with thousands of articles in an AI training corpus, that gap separates a nuisance claim from a material liability.
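A quick calculation shows how the per-work ceilings scale with archive size. The figure of 5,000 works is a hypothetical chosen for illustration, and the results are statutory upper bounds, not predictions of what a court would award.

```python
CEILING_NON_WILLFUL = 30_000   # per-work ceiling cited above, in USD
CEILING_WILLFUL = 150_000      # per-work ceiling for willful infringement, in USD


def exposure_ceiling(works: int, per_work_ceiling: int) -> int:
    """Upper bound on statutory damages if every work drew the maximum award."""
    return works * per_work_ceiling


works = 5_000  # hypothetical number of infringed articles
non_willful = exposure_ceiling(works, CEILING_NON_WILLFUL)
willful = exposure_ceiling(works, CEILING_WILLFUL)

print(f"Non-willful ceiling: ${non_willful:,}")
print(f"Willful ceiling:     ${willful:,}")
print(f"Ratio: {willful // non_willful}x")
```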

When content carries a C2PA manifest with machine-readable rights terms, any party that uses the content has a harder time claiming they did not know it was owned. The manifest can serve as evidence of notice, embedded in every copy, in every downstream location. It can help rebut the "we did not know" defense. Legal outcomes depend on the facts and counsel.

Publishers who have signed their archives hold a fundamentally different legal position than those who have not. The signed archive is not just a compliance artifact - it is the documentation that supports a willful infringement argument if licensing negotiations fail.

Licensing Leverage Before Litigation

Litigation is expensive and slow. Most publishers want licensing revenue, not court battles. Content provenance supports licensing by making the ownership case self-evident before any formal dispute begins.

When an AI company receives a formal notice with an Encypher evidence package, they receive cryptographic proof that is independently verifiable - it does not depend on trusting Encypher or the publisher's assertions. The verification libraries are open source. The signature was made against the publisher's own key. The AI company's legal team can verify every claim in the package without third-party involvement.

This changes the negotiating dynamic. Instead of a publisher asserting ownership and an AI company disputing it, the dispute is over licensing terms on content whose provenance is already documented. That is a more tractable negotiation, and it typically resolves faster.

Image Provenance

Encypher supports 13 image formats including JPEG, PNG, WebP, TIFF, AVIF, HEIC, and DNG. C2PA manifests are embedded in the file container: a JUMBF box is placed in the file structure in a format-specific way defined by the C2PA specification.

EXIF metadata is routinely stripped when images are uploaded to social platforms, aggregators, and CDNs. C2PA manifests are not EXIF data. They are embedded in the file container itself and survive most distribution pathways. When an image is downloaded and re-uploaded, the manifest travels with the file.
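For JPEG specifically, the C2PA specification places JUMBF data in APP11 marker segments, separate from the APP1 segment that typically holds EXIF. The sketch below walks a JPEG's header segments and reports where APP11 segments carrying the "JP" identifier begin. It is a simplified scanner written for this example (the function name is an assumption, and it does not reassemble or validate the manifest), not a substitute for a real C2PA verification library.

```python
import struct

APP11 = 0xEB  # marker that carries JUMBF boxes in JPEG files
SOS = 0xDA    # start of scan: compressed image data follows
EOI = 0xD9    # end of image


def find_jumbf_segments(jpeg: bytes) -> list[int]:
    """Return byte offsets of APP11 segments whose payload starts with b'JP'."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    offsets = []
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            break  # lost marker sync; stop rather than guess
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):
            break  # header segments are over
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])  # includes its own 2 bytes
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == APP11 and payload[:2] == b"JP":
            offsets.append(pos)
        pos += 2 + length
    return offsets


def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment for the synthetic example below."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic header: an EXIF-style APP1 segment followed by a JUMBF-style APP11 segment.
exif = segment(0xE1, b"Exif")
jumbf = segment(APP11, b"JP" + b"\x00\x01" + b"\x00\x00\x00\x01" + b"\x00\x00\x00\x08" + b"jumb")
with_manifest = b"\xff\xd8" + exif + jumbf + b"\xff\xd9"
exif_only = b"\xff\xd8" + exif + b"\xff\xd9"

print(find_jumbf_segments(with_manifest))
print(find_jumbf_segments(exif_only))
```

Because the two kinds of data live in different segments, a tool that deletes only the EXIF segment leaves the APP11 segments in place.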

For publishers whose photojournalists' work ends up on social platforms and in AI image generation training sets, image provenance creates a documented ownership record for every frame. See image provenance for format-specific details.

Signing Your Archive

The most valuable content for AI training is often the oldest - years of reporting, analysis, and photography accumulated before anyone thought to protect it. The Encypher API supports retroactive signing of existing archives.

Batch signing tools in the Python and TypeScript SDKs let you sign thousands of articles in a single job. A publisher with 500,000 articles in their CMS can typically complete a full archive signing over a weekend. Bulk archive backfill is a paid add-on priced per document; ongoing publishing signing is free and unlimited for normal use.
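As a back-of-the-envelope check on the weekend estimate, the calculation below works out the sustained signing rate a 500,000-article job needs. The 48-hour window and the 0.5-second per-document latency are assumptions chosen for illustration, not measured Encypher figures.

```python
import math


def required_rate(documents: int, hours: float) -> float:
    """Documents per second needed to finish within the time window."""
    return documents / (hours * 3600)


def workers_needed(documents: int, hours: float, seconds_per_doc: float) -> int:
    """Parallel workers needed if each document takes seconds_per_doc to sign."""
    return math.ceil(required_rate(documents, hours) * seconds_per_doc)


rate = required_rate(500_000, 48)            # assumed 48-hour weekend window
workers = workers_needed(500_000, 48, 0.5)   # assumed 0.5 s per document

print(f"Required rate: {rate:.2f} documents/second")
print(f"Workers needed at 0.5 s/doc: {workers}")
```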

Retroactive signing does not backdate content. The manifest records the signing time separately from the original publication date, and the publication date itself is left unchanged. This distinction matters for licensing: the manifest documents that the content existed and was owned as of the signing date, which is sufficient for most dispute resolution purposes.

Brand Protection Through Provenance

Publishers face a second problem beyond licensing: misattribution. AI-generated content is being falsely attributed to established news brands. Deepfake news articles circulate under masthead names. Images are re-captioned with fabricated context.

Authentic content signed by a publisher carries a verifiable identity credential. Readers and platforms can verify that a given article or image was actually produced and signed by the claimed publisher. Content without that signature, or with a broken signature, is distinguishable as potentially inauthentic.

This creates a two-sided value: publishers protect their brand by making genuine content verifiable, and readers gain a mechanism to distinguish authentic journalism from fabricated content that borrows a publisher's name.

Fits Your Existing Publishing Workflow

Encypher integrates at the CMS or distribution layer. You do not need to change editorial workflows. The signing API is called at the point of publication, adding a C2PA manifest to the outgoing content before it enters the distribution pipeline.

Integration points

- CMS publish hook (WordPress, Arc, Brightspot, proprietary CMSs)
- Wire service submission pipeline
- CDN or asset management layer for images
- RSS feed generation for text provenance
- API-direct for publishers building custom distribution systems
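A publish hook of the kind listed above might look like the following sketch. Everything here is hypothetical: the `Article` type, the `on_publish` function, and the `Signer` callable are stand-ins invented for this example, and the stub signer does no real signing. An actual integration would call the Encypher SDK or API in place of the stub.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Article:
    slug: str
    body: str
    signed: bool = False


# Hypothetical signer interface: takes plain text, returns text with provenance embedded.
Signer = Callable[[str], str]


def on_publish(article: Article, sign: Signer) -> Article:
    """Sign the article body once, just before it enters the distribution pipeline."""
    if article.signed:
        return article  # already signed, e.g. on a republish; do not sign twice
    return replace(article, body=sign(article.body), signed=True)


def stub_signer(text: str) -> str:
    """Placeholder that appends one invisible character; performs no real signing."""
    return text + "\u200b"


draft = Article(slug="transit-budget", body="City council approves the transit budget.")
published = on_publish(draft, stub_signer)
print(published.signed)
print(on_publish(published, stub_signer) is published)  # second call is a no-op
```

Passing the signer in as a parameter keeps the hook independent of any particular CMS, so the same function can sit behind a WordPress hook, a wire submission step, or a custom pipeline.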

Embedded provenance also complements existing enforcement channels. It does not replace DMCA takedown processes; it supplements them with machine-readable documentation of ownership that predates any infringement and can function as a form of constructive notice. Where licensing negotiations or litigation fit better than takedowns, the provenance record supports those approaches.

Start Signing Your Content

The free publisher signing tier covers normal publishing use. Start signing today and your archive begins building its ownership record.
