Text Content Provenance
Text provenance embeds cryptographic proof of authorship directly into the text using invisible characters. No markup changes, no visible differences, no server dependency. The proof is in the text itself.
C2PA Section A.7: The Text Standard
The C2PA 2.3 specification defines content provenance for all major media types. Section A.7 - "Embedding Manifests into Unstructured Text" - defines how provenance manifests are embedded into plain text, articles, and other unstructured content.
Erik Svilich, Encypher's founder, contributed Section A.7 and co-chairs the C2PA Text Provenance Task Force. The specification defines encoding approaches that carry C2PA manifest data within text content invisibly to readers, covering web and digital contexts, Microsoft Office workflows, and print and scan pipelines.
The full specification is available on the C2PA website: C2PA 2.3 Section A.7.
Default Encoding: Optimized for Web and Digital
Encypher's default encoding embeds C2PA provenance credentials as invisible characters placed within the signed text. From the reader's perspective, the text is identical to unsigned text. Reading tools, accessibility software, and search engines treat the text normally.
The embedded credentials survive copy-paste across browsers, email clients, text editors, and messaging platforms. When a reader copies text from a signed article and pastes it elsewhere, the credentials copy with the text. The provenance data is present in every copy.
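The idea of carrying data in invisible characters can be sketched in a few lines. The example below maps each payload byte to a Unicode variation selector and appends the result to the visible text. It is a minimal illustration of the general technique, not the exact C2PA A.7 wire format or Encypher's encoder. The function names and the choice to append the payload at the end of the text are assumptions made for this sketch.

```python
from typing import Optional, Tuple

# Illustrative sketch only: NOT the exact C2PA A.7 format or Encypher's encoder.
# Each payload byte becomes one invisible Unicode variation selector.

def byte_to_selector(b: int) -> str:
    # Bytes 0-15 map to U+FE00..U+FE0F, bytes 16-255 to U+E0100..U+E01EF.
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))

def selector_to_byte(ch: str) -> Optional[int]:
    # Returns the encoded byte, or None if ch is an ordinary character.
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None

def embed(text: str, payload: bytes) -> str:
    # Assumption for this sketch: the payload is appended after the visible text.
    return text + "".join(byte_to_selector(b) for b in payload)

def extract(text: str) -> Tuple[str, bytes]:
    # Separates the visible characters from the embedded payload bytes.
    visible, data = [], bytearray()
    for ch in text:
        b = selector_to_byte(ch)
        if b is None:
            visible.append(ch)
        else:
            data.append(b)
    return "".join(visible), bytes(data)

signed = embed("Hello, world.", b"manifest-bytes")
visible, payload = extract(signed)
```

Because the payload is made of ordinary Unicode code points, any copy-paste path that preserves the full character sequence carries it along. Any path that drops or normalizes those code points loses it.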
This mode is appropriate for any publisher distributing text on the web, through email, or via CMS platforms. Verification is cryptographic and deterministic: a given piece of text either carries a valid credential or it does not.
Word-Compatible Encoding: Optimized for Microsoft Office
Microsoft Word handles certain text content differently from web browsers and general text processors. The default encoding can behave unexpectedly in Word documents, so Encypher provides a Word-compatible encoding mode that is stable across the Microsoft Office processing pipeline.
Publishers distributing content in Word format - legal documents, press briefings, editorial drafts, B2B document distribution - use this mode. The signing API accepts a parameter to select the encoding mode. Verification handles both modes automatically.
Sentence-Level Attribution
Standard C2PA provenance authenticates a document as a whole. The C2PA manifest records a credential for the entire document, and verification confirms that the document matches that credential. Any modification to the document breaks verification.
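Whole-document binding can be illustrated with a hash and a keyed tag. In the sketch below, an HMAC with a shared demo key stands in for the certificate-based signature a real C2PA manifest would carry. The key and the function names are hypothetical, and the code shows only why a single changed character breaks verification.

```python
import hashlib
import hmac

# Hypothetical demo key. A real C2PA credential uses a certificate-based
# signature, not a shared secret; HMAC is a stand-in for illustration.
DEMO_KEY = b"demo-signing-key"

def sign_document(text: str) -> str:
    # Hash the exact character sequence, then tag the hash with the key.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return hmac.new(DEMO_KEY, digest, hashlib.sha256).hexdigest()

def verify_document(text: str, credential: str) -> bool:
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(sign_document(text), credential)

original = "Rates rose in March. Analysts expect a pause."
credential = sign_document(original)

unchanged_ok = verify_document(original, credential)
tampered_ok = verify_document(original.replace("rose", "fell"), credential)
```

Here `unchanged_ok` is `True` and `tampered_ok` is `False`: the credential covers the document as a single unit, so an edit anywhere invalidates it.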
Encypher's proprietary sentence-level attribution authenticates each sentence individually. Verification confirms not only that a document was published by a specific author, but also which sentences are authenticated as original. A party can prove that a specific sentence belongs to a specific document and was published by a specific author, without needing the full document.
Sentence-level attribution is Encypher's proprietary technology. It is not defined in the C2PA specification. Encypher implements it as an extension to the C2PA manifest structure, compatible with standard C2PA verification while providing additional granularity for Encypher-signed content.
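The difference from whole-document signing can be sketched generically. The example below tags each sentence together with a document identifier and author name, so a single sentence can be checked without the rest of the document. This is not Encypher's proprietary method, which is not described here. The naive sentence splitter, the HMAC stand-in for a real signature, the demo key, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import re
from typing import Dict, List

DEMO_KEY = b"demo-signing-key"  # hypothetical; stands in for a real signing key

def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def _tag(sentence: str, doc_id: str, author: str) -> str:
    # Bind the sentence to its document and author in a single message.
    message = f"{doc_id}\n{author}\n{sentence}".encode("utf-8")
    return hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()

def sign_sentences(text: str, doc_id: str, author: str) -> Dict[str, str]:
    # One tag per sentence, keyed by the sentence text.
    return {s: _tag(s, doc_id, author) for s in split_sentences(text)}

def verify_sentence(sentence: str, tag: str, doc_id: str, author: str) -> bool:
    # Needs only the sentence and its tag, not the full document.
    return hmac.compare_digest(_tag(sentence, doc_id, author), tag)

tags = sign_sentences("First claim. Second claim!", "doc-1", "A. Writer")
quote_ok = verify_sentence("Second claim!", tags["Second claim!"], "doc-1", "A. Writer")
```

In this sketch a quoted sentence verifies on its own, while the same tag fails against an altered sentence or a different claimed document or author.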
Quote Integrity Verification
When a sentence from a signed article is reproduced in another context - a summary, a citation, a quote in another article - the sentence carries its provenance credentials. Verification confirms:
- The sentence was published in the claimed source article
- The sentence is unmodified from the original publication
- The publication date and author identity match the claimed source
For RAG pipelines, quote integrity verification provides a quality signal that improves citation accuracy. Retrieved passages can be verified against the signed original before being included in AI-generated responses. If the retrieved text has been modified - by scraping errors, OCR artifacts, or deliberate manipulation - verification fails and the passage can be flagged.
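A retrieval filter of this kind might look like the sketch below. It assumes the pipeline already holds a set of digests for sentences from the signed original. A real deployment would verify the credential embedded in each passage rather than look up a bare hash. The function names and the digest-set representation are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

def digest(passage: str) -> str:
    # Exact-match fingerprint of the passage's character sequence.
    return hashlib.sha256(passage.encode("utf-8")).hexdigest()

def partition_passages(
    retrieved: Iterable[str], signed_digests: Set[str]
) -> Tuple[List[str], List[str]]:
    # Split retrieved passages into those matching the signed original
    # and those that should be flagged before reaching the generator.
    verified: List[str] = []
    flagged: List[str] = []
    for passage in retrieved:
        if digest(passage) in signed_digests:
            verified.append(passage)
        else:
            flagged.append(passage)
    return verified, flagged

# Stand-in for digests taken from the publisher's signed article.
signed_digests = {digest("The rate rose 2% in March.")}

verified, flagged = partition_passages(
    ["The rate rose 2% in March.", "The rate rose 20% in March."],
    signed_digests,
)
```

The second passage differs from the original by one character and is flagged. The pipeline can then drop it, down-rank it, or cite it with a warning.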
For copyright enforcement, sentence-level verification enables partial reproduction claims. If an AI company reproduces sentences from a publisher's archive across multiple outputs, each sentence can be individually verified as originating from the publisher's signed content. This supports claims about specific, documented reproduction rather than statistical inference about training data composition.
What Text Provenance Cannot Do
Text provenance has limits worth stating clearly. The embedded credentials survive copy-paste and most text distribution. They do not survive OCR (scanning physical text), heavy paraphrasing (rewriting the text in different words), or translation into another language.
If an AI system paraphrases a sentence rather than reproducing it, the paraphrase does not carry the original sentence's credentials. The provenance credentials are bound to the specific text sequence, not to the semantic content. Paraphrase detection is a different problem from provenance verification, and text provenance does not solve it.
Text provenance is designed for exact reproduction detection and ownership documentation. Combined with licensing terms in the manifest, it converts exact reproduction from an ambiguous infringement to a documented infringement with provable formal notice.
Add Text Provenance to Your Content
Free tier covers 1,000 documents per month. Python and TypeScript SDKs available. Sentence-level attribution included at all tiers.