What Is Content Provenance?
Content provenance is the cryptographic record of a piece of content's origin, authorship, and modification history. It is embedded directly into the content - not stored in a separate database - so the record travels wherever the content goes.
The C2PA open standard defines how content provenance manifests are structured and verified. Encypher authored Section A.7 of the C2PA 2.3 specification, which covers text provenance - the framework for articles, social posts, and any unstructured text.
Why Content Provenance Matters Now
Three forces converged in 2024 and 2025 to make content provenance a practical necessity rather than a theoretical concern.
EU AI Act Deadline: August 2, 2026
Article 50 requires providers of AI systems that generate synthetic images, audio, video, or text to mark outputs as artificially generated in a machine-readable format. C2PA manifests satisfy this requirement. Providers who miss the deadline face fines of up to EUR 15 million or 3% of global annual turnover, whichever is higher.
Synthetic Media Explosion
AI-generated images, audio deepfakes, and synthetic text appear across news, social media, and enterprise content pipelines. Without provenance, determining what was created by a human versus an AI system requires statistical guessing - which carries false positives. Cryptographic proof eliminates the guessing.
Publisher Rights Erosion
AI training and RAG systems use publisher content at scale. Without machine-readable rights embedded in the content, publishers cannot establish formal notice - a prerequisite for willful infringement claims. Content provenance converts passive copyright into active, machine-readable rights terms.
The C2PA standard - backed by Adobe, Microsoft, Google, BBC, and OpenAI, with over 200 member organizations - provides the open infrastructure for content provenance across media types. C2PA published version 2.3 of the specification on January 8, 2026, including Section A.7 on text provenance, which Encypher authored.
How Content Provenance Works
Content provenance works through three steps: signing at creation, embedding in the content, and verification by anyone.
Cryptographic Signing at Creation
When content is created or published, the Encypher API signs it using the publisher's private key. The signature covers the content's hash - a fixed-length fingerprint of the content - along with metadata about the creator, creation time, and any rights terms. If the content is later altered, its recomputed hash no longer matches the signed hash and verification fails.
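The hash-and-sign idea can be sketched in a few lines. This is an illustration only, not the Encypher API: the function names, the claim fields, and the demo key are all assumptions, and HMAC stands in for the asymmetric signature so the sketch runs on the Python standard library alone.

```python
import hashlib
import hmac
import json

# Stand-in for the publisher's private key. Real C2PA signing uses an
# asymmetric key pair; HMAC is swapped in here only to keep the sketch
# dependency-free.
DEMO_KEY = b"demo-signing-key"


def _serialize(claim: dict) -> bytes:
    # Canonical JSON so signer and verifier hash identical bytes.
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()


def sign_content(content: bytes, metadata: dict, key: bytes = DEMO_KEY) -> dict:
    """Bind the content hash to its metadata in one claim, then sign the claim."""
    claim = dict(metadata, content_sha256=hashlib.sha256(content).hexdigest())
    signature = hmac.new(key, _serialize(claim), hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_content(content: bytes, record: dict, key: bytes = DEMO_KEY) -> bool:
    """True only if the claim is unaltered and the content hash still matches."""
    expected = hmac.new(key, _serialize(record["claim"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["signature"]):
        return False
    return record["claim"]["content_sha256"] == hashlib.sha256(content).hexdigest()


article = b"Provenance travels with the content."
record = sign_content(article, {"creator": "Example News", "created": "2026-01-08T00:00:00Z"})
```

Changing a single byte of `article`, or any field of the claim, makes `verify_content` return `False`.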
Manifest Embedding
For images, audio, and video, the C2PA manifest is embedded in the file's binary container using a JUMBF (JPEG Universal Metadata Box Format) structure. For text, Encypher uses proprietary invisible embedding that persists through copy-paste, email, and most distribution pathways. The embedded data includes the signed claim, the public key certificate chain, and any ingredient references (links to source content used to create the piece).
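For text, one generic way to carry invisible data is to encode bytes as Unicode variation selectors, which most renderers do not display. The sketch below shows that general technique only. It is not Encypher's proprietary encoding, and the function names and payload are assumptions made for illustration.

```python
def byte_to_selector(b: int) -> str:
    """Map a byte to one of the 256 Unicode variation selectors."""
    # U+FE00..U+FE0F cover values 0-15, U+E0100..U+E01EF cover 16-255.
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def selector_to_byte(ch: str):
    """Inverse mapping; returns None for ordinary characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def embed(text: str, payload: bytes) -> str:
    """Attach the payload as invisible selectors after the first character."""
    if not text:
        raise ValueError("need at least one visible character to carry the payload")
    hidden = "".join(byte_to_selector(b) for b in payload)
    return text[0] + hidden + text[1:]


def extract(text: str):
    """Split text into (visible text, recovered payload bytes)."""
    visible, payload = [], bytearray()
    for ch in text:
        b = selector_to_byte(ch)
        if b is None:
            visible.append(ch)
        else:
            payload.append(b)
    return "".join(visible), bytes(payload)


marked = embed("Breaking news: provenance works.", b'{"sig":"abc123"}')
visible, payload = extract(marked)
```

Because the payload is made of ordinary Unicode code points, it is copied along with the visible characters. This simple version would misread text that already contains variation selectors (some emoji sequences do), so a real encoder needs a way to tell its own markers apart.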
Free Verification by Anyone
Anyone can verify content provenance using open-source C2PA libraries or the Encypher verification tool. Verification extracts the manifest, checks the cryptographic signature against the content hash, validates the certificate chain, and returns the provenance record. No account required. No Encypher servers involved in verification - it works entirely on the content itself and the publisher's public key.
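The order of checks described above can be sketched as a small function. The manifest layout, status names, and the two pluggable check functions are assumptions for illustration. A real verifier would use an open-source C2PA library for signature and certificate validation instead of the stubs shown here.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    status: str                    # "verified" or the first check that failed
    record: Optional[dict] = None  # provenance record, set only on success


def verify(content: bytes,
           manifest: Optional[dict],
           signature_ok: Callable[[dict], bool],
           chain_ok: Callable[[list], bool]) -> Result:
    """Run the checks in order and stop at the first failure."""
    if manifest is None:
        return Result("no_manifest")
    if not signature_ok(manifest):
        return Result("bad_signature")
    if manifest["claim"]["content_sha256"] != hashlib.sha256(content).hexdigest():
        return Result("content_modified")
    if not chain_ok(manifest["cert_chain"]):
        return Result("untrusted_signer")
    return Result("verified", manifest["claim"])


content = b"Signed article body"
manifest = {
    "claim": {"creator": "Example News",
              "content_sha256": hashlib.sha256(content).hexdigest()},
    "signature": "demo",            # placeholder value, not a real signature
    "cert_chain": ["leaf", "root"]  # placeholder certificate chain
}
accept = lambda _: True             # stub check that always passes
result = verify(content, manifest, accept, accept)
```

Returning a distinct status for each failure lets a caller tell unsigned content apart from content that was signed and then altered.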
Content Provenance Across 31 Media Types
The C2PA standard supports content provenance across five media categories. Encypher implements all 31 MIME types through a single unified API.
Images (13 formats)
JPEG, PNG, WebP, TIFF, HEIC, HEIF, AVIF, GIF, SVG, BMP, DNG, JPEG 2000, JPEG XL
Image provenance guide →
Audio (6 formats)
WAV, MP3, AAC, FLAC, AIFF, M4A
Video (4 formats)
MP4, MOV, M4V, MKV
Documents (5 formats)
PDF, DOCX, PPTX, XLSX, plain text (TXT). Section A.7 of C2PA 2.3 covers unstructured text.
Text provenance guide →
Fonts (3 formats)
TTF, OTF, WOFF2
The Encypher API handles format detection automatically. Submit any supported file and the API identifies the MIME type, selects the appropriate embedding method, and returns the signed content.
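Format detection of this kind is commonly done by inspecting a file's leading "magic" bytes. The sketch below covers only a handful of formats and is not Encypher's implementation. The function names, the method labels, and the `format-specific` catch-all are assumptions for illustration.

```python
# (magic prefix, MIME type) pairs for a few well-known formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"fLaC", "audio/flac"),
    (b"wOF2", "font/woff2"),
]


def detect_mime(data: bytes) -> str:
    """Guess a MIME type from the first bytes of a file."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # RIFF containers name their payload type at bytes 8-11.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def embedding_method(mime: str) -> str:
    """Pick an embedding strategy from the MIME type (labels are hypothetical)."""
    if mime == "text/plain":
        return "invisible-text-markers"
    if mime.split("/")[0] in ("image", "audio", "video"):
        return "jumbf-container"
    return "format-specific"


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
method = embedding_method(detect_mime(png_header))
```

Checking content bytes instead of the file extension means a mislabeled upload is still routed to the right embedding method.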
The C2PA Open Standard
The Coalition for Content Provenance and Authenticity (C2PA) is the standards body that defines how content provenance manifests are structured, embedded, and verified. With over 200 member organizations - including Adobe, Microsoft, Google, BBC, OpenAI, Qualcomm, and Intel - C2PA is the dominant open standard for digital content provenance.
C2PA 2.3, published January 8, 2026, introduced Section A.7: the text provenance framework. Encypher authored this section. Erik Svilich co-chairs the C2PA Text Provenance Task Force.
C2PA operates at the document level: a C2PA manifest authenticates a document as a whole. Encypher's proprietary sentence-level Merkle tree technology works within this framework to provide granular attribution at the sentence or paragraph level - a capability not defined by C2PA itself, but compatible with it.
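To show how sentence-level attribution can sit under a single document-level signature, here is a sketch of a generic Merkle tree over sentences. Encypher's Merkle technology is proprietary, so this is the textbook construction and not their design. The function names and sample sentences are assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Hash pairs of nodes upward until one root remains (needs >= 1 leaf)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_sentence(sentence: str, proof, root: bytes) -> bool:
    """Check that one sentence belongs to the document with the given root."""
    node = h(sentence.encode())
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


sentences = ["First sentence.", "Second sentence.", "Third sentence."]
leaves = [h(s.encode()) for s in sentences]
root = merkle_root(leaves)        # the value a document-level claim could sign
proof = merkle_proof(leaves, 1)   # proof for "Second sentence."
```

Only the root needs to be signed. A quoted sentence can then be checked against it with a short proof, without access to the rest of the document.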
Content Provenance vs. Content Detection
Content detection tools - AI detectors, deepfake detectors, and statistical watermark readers like SynthID - attempt to identify content by analyzing statistical patterns. Content provenance proves origin through cryptographic verification. The distinction matters in practice.
| Property | Content Provenance | Content Detection |
|---|---|---|
| Method | Cryptographic signature | Statistical pattern analysis |
| Accuracy | 100% - deterministic | Variable - probabilistic |
| False positives | Zero - verification either succeeds or fails | Significant - human text flagged as AI |
| False negatives | None if manifest present | Frequent after paraphrasing or editing |
| Tamper evidence | Yes - any change breaks signature | No - content can be edited to evade |
| Legal standing | Cryptographic proof suitable for litigation | Statistical inference not accepted as evidence |
| Works without original? | Yes - manifest is self-contained | Depends on model training data |
Content Provenance vs. Blockchain
Blockchain-based provenance systems record content hashes on a distributed ledger. This approach stores proof externally - on the chain - rather than embedding it in the content. The practical consequence: if the content is separated from the chain reference (which happens with copy-paste, re-posting, and B2B data distribution), the provenance record is lost.
C2PA manifests are embedded in the content itself. The same piece of text or image carries its provenance record wherever it travels, with no lookup to an external ledger required. Verification works offline, with no network dependency.
Blockchain also introduces latency (block confirmation times), cost (transaction fees), and governance complexity (which chain, which standard). C2PA uses public key infrastructure - the same cryptographic foundation as TLS and code signing - with no per-operation cost.
Full comparison: C2PA vs. blockchain provenance →
Implementing Content Provenance
Three integration paths cover the full range of publisher and enterprise use cases.
API Integration
REST API with SDKs in Python, TypeScript, Go, and Rust. Sign a document in a single POST request. Batch endpoints for bulk archive signing. Under 50ms p99 latency.
API documentation →
WordPress Plugin
One-click activation. Automatic signing on publish. No engineering required. Compatible with WooCommerce, Yoast, and Elementor.
WordPress plugin →
Chrome Extension
Verify content provenance on any web page. Instant visual indicators for signed content. Available in the Chrome Web Store.
Chrome extension →
Content Provenance by Audience and Use Case
For Publishers
Machine-readable rights, licensing infrastructure, and formal notice capability.
For AI Companies
EU AI Act compliance, coalition licensing, and publisher relationship management.
For Enterprises
Audit trails, AI governance, and tamper-evident documentation for regulated industries.
EU AI Act
Article 50 compliance before the August 2, 2026 deadline.
Text Provenance
Section A.7 implementation for articles, posts, and unstructured text.
Image Provenance
C2PA manifests for 13 image formats including JPEG, PNG, WebP, and HEIC.
Audio and Video
Provenance for synthetic voice, AI-generated video, and media files.
Verification
How to verify content provenance, what results mean, and what to do with them.
Related Topics
The C2PA Standard
The open standard for content provenance. How JUMBF containers, COSE signatures, and manifest structure work.
Cryptographic Watermarking
How deterministic proof of origin differs from statistical watermarking and why it survives distribution.
Content Provenance Glossary
Definitions for C2PA, JUMBF, COSE, variation selector markers, Merkle tree authentication, willful infringement, and 40+ terms.
Provenance vs. Detection
Why cryptographic proof differs from statistical detection and what that means for accuracy and legal standing.
Frequently Asked Questions
What is content provenance?
Content provenance is a cryptographic record of a piece of content's origin, authorship, and history. It is embedded directly into the file or text so the record travels with the content wherever it goes. Anyone can verify it for free using the C2PA standard, without trusting a third party.
How is content provenance different from metadata?
Traditional metadata - like EXIF data in photos or ID3 tags in audio - is not cryptographically bound to the content, so it can be stripped or altered without detection. Content provenance uses cryptographic signatures so any tampering is immediately visible. If the manifest is removed or the content is edited, verification fails.
What media types support content provenance?
The C2PA standard supports 31 MIME types across five categories: 13 image formats (JPEG, PNG, WebP, TIFF, HEIC, AVIF, and others), 6 audio formats (WAV, MP3, AAC, FLAC, AIFF, M4A), 4 video formats (MP4, MOV, M4V, MKV), 5 document formats (PDF, DOCX, PPTX, XLSX, and plain text), and 3 font formats. Text provenance - covering articles, social posts, and any unstructured text - is defined in Section A.7, which Encypher authored.
Who can verify content provenance?
Anyone. The C2PA verification libraries are open source and the standard is free to implement. Verification does not require an account or API key. Publishers, AI companies, journalists, courts, and regulators can all verify independently.
Does content provenance work after copy-paste?
For text, yes. Encypher embeds provenance markers using proprietary encoding that survives copy-paste across browsers, email clients, and text editors. For images and documents, C2PA manifests are embedded in the file container and survive most distribution pathways. Compression and format conversion can sometimes strip manifests from images - this is an active area of development in the C2PA community.
What does the EU AI Act require for content provenance?
EU AI Act Article 50, which applies from August 2, 2026, requires providers of AI systems that generate synthetic images, audio, video, or text to mark their outputs as artificially generated in a machine-readable format. C2PA manifests satisfy this requirement. Obligations for general-purpose AI models, set out in Articles 53 to 55, have applied since August 2, 2025. Encypher provides API and SDK tooling to implement compliant marking before the deadline.
How does content provenance help publishers with AI licensing?
When content carries a C2PA manifest with machine-readable rights terms, AI companies that use that content without a license cannot claim innocent infringement - they had formal notice embedded in every copy. This converts a weak copyright claim into a strong willful infringement claim, increasing statutory damages from up to $30,000 per work to up to $150,000 per work under US copyright law.
Is content provenance the same as watermarking?
Cryptographic watermarking and content provenance serve the same purpose - proving origin - but are implemented differently. Cryptographic watermarking embeds proof directly into the content using steganographic or structural techniques. C2PA manifests are stored in a structured JUMBF container embedded in the file. Both approaches are deterministic: verification either succeeds or fails, with no false positives. This is fundamentally different from statistical watermarking (like SynthID), which produces probabilities, not proof.
What happens if someone removes the content provenance record?
Removal is itself evidence. If a C2PA manifest was present when the content was signed but is absent when the content appears elsewhere, that absence documents tampering. For text embedded using Encypher's proprietary provenance markers, an attempt to strip the markers alters the content in ways that break cryptographic verification against the original source.
How do I add content provenance to my content?
Three main paths: the Encypher API (REST, works with any language), the WordPress plugin (one-click activation), or the Python/TypeScript/Go/Rust SDKs for batch processing existing archives. The free tier covers 1,000 documents per month. Enterprise tiers support millions of documents with custom workflows.
Start Protecting Your Content
The free tier covers 1,000 documents per month. No credit card required. API keys available instantly.