Cryptographic Watermarking for Text

Invisible provenance markers embedded in every article, post, and document you publish. Readers see nothing different. Verification tools see cryptographic proof of ownership.

Invisible by Design

Encypher embeds cryptographic provenance markers directly in the text character stream. The markers produce no visible output: the text looks identical to readers, screen readers handle it correctly, and search engine indexes treat it as the underlying text content.

There is no markup change, no visible tag, no alteration to the reading experience. A signed article and an unsigned article are indistinguishable to any reader. The difference only appears when a verification tool examines the content and extracts the embedded proof.
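To make the idea concrete, here is a minimal sketch of hiding a payload in a character stream. It assumes two zero-width Unicode code points as a 1-bit alphabet; the page does not specify Encypher's actual marker encoding, so the code points, the append-at-end placement, and the function names embed, extract, and visible are all illustrative assumptions.

```python
# Illustrative only: NOT Encypher's encoding. Two zero-width code points
# are assumed here as a 1-bit alphabet for the hidden payload.
ZERO = "\u200b"  # ZERO WIDTH SPACE, stands for bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, stands for bit 1


def embed(text: str, payload: bytes) -> str:
    """Append the payload to the text as a run of invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    marker = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return text + marker


def extract(text: str) -> bytes:
    """Recover the payload by reading back the invisible characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def visible(text: str) -> str:
    """Return only the characters a reader would see."""
    return "".join(ch for ch in text if ch not in (ZERO, ONE))
```

In this sketch, embed("Hello, world.", b"pub:42") returns a longer string that renders the same as the input, visible() recovers the original text, and extract() recovers the payload.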

Copy-Paste Survival

Text processors preserve the full character stream in copy operations. When text is copied from a web page, the character stream is copied intact - including embedded provenance markers. When that text is pasted into an email, a document, a CMS, or a messaging platform, the markers are present in the pasted text.

This has been verified across:

  • Major web browsers (Chrome, Firefox, Safari, Edge)
  • Email clients (Gmail, Outlook, Apple Mail)
  • Document editors (Google Docs, Pages, LibreOffice)
  • Messaging platforms (Slack, Teams, WhatsApp)
  • CMS platforms (WordPress, Contentful, Drupal)

Microsoft Word is handled separately with the zero-width character (ZWC) marker encoding, which uses a different character set suited to Word's document processing behavior.

B2B Distribution and Wire Services

Wire service distribution is the highest-volume text distribution channel for professional content. AP and Reuters distribute thousands of stories per day to hundreds or thousands of subscriber outlets. Each distribution event is a potential ownership record.

When signed text passes through a wire service, the markers travel with the article. The subscriber outlet receives the article with the markers intact, and its CMS ingests it with the markers. When readers copy and paste from the published article, the markers are carried along too.

The chain of custody is not just documented at the source - it is present in every copy at every distribution point. Any downstream party that has the text also has the cryptographic proof of where it came from and whose rights terms apply to it.

Aggregator Scraping and AI Training Data

Web scrapers that collect text from HTML pages extract the page's text content, which includes the Unicode character stream with embedded markers. Standard scraping tools that use HTML parsers preserve Unicode characters in the extracted text.

AI training corpus builders that scrape web content and process HTML to extract readable text are subject to the same behavior. If a scraper extracts the text content of a signed article, the markers are present in the scraped text unless the scraper explicitly strips invisible provenance markers - which is not a standard scraping operation.
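The scraping behavior described above can be sketched with Python's standard-library HTML parser. The marker here is a hypothetical run of zero-width characters standing in for a real provenance marker. The sketch also shows what an explicit stripping step would look like, to illustrate that removal takes a deliberate extra pass.

```python
import unicodedata
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a page, as a basic scraper would."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Hypothetical invisible marker (assumption, not a real Encypher marker).
MARKER = "\u200b\u200c\u200c\u200b"
page = f"<article><h1>Title</h1><p>Signed sentence.{MARKER}</p></article>"

# Plain text extraction keeps the invisible characters.
scraped = extract_text(page)

# Removing them requires an explicit pass, here dropping Unicode
# format characters (general category "Cf").
stripped = "".join(ch for ch in scraped if unicodedata.category(ch) != "Cf")
```

Here scraped still ends with the marker, while stripped does not.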

This means that signed articles that end up in AI training corpora carry their provenance markers. An AI company that trained on the content holds the cryptographic evidence of the content's origin in its own training data. This is the mechanism that establishes formal notice for willful infringement claims.

Sentence-Level Granularity

Encypher's proprietary sentence-level Merkle tree authenticates each sentence individually. When a specific sentence from a signed article is reproduced in another context - an AI output, a summary, a quote in another article - the sentence carries its own proof of origin.

Verification can confirm, for a given sentence, that it came from a specific article by a specific publisher on a specific date. This supports two distinct use cases:

  • Quote integrity checking: confirm that a quoted passage matches the original publication
  • Partial reproduction claims: trace specific sentences in AI outputs back to their source documents

For publishers with large archives, this means that even partial reproduction of their content - a few sentences from an article, not the full text - can be cryptographically attributed to the original source. The evidentiary record covers specific sentences, not just document-level ownership.
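Encypher's sentence-level tree is proprietary and its details are not given here. As a rough illustration of the general technique, the following is a textbook Merkle tree over sentences with an inclusion proof for a single sentence. The naive sentence splitter, the SHA-256 hash, the leaf and node prefixes, and the duplicate-last-node padding are all assumptions made for this sketch.

```python
import hashlib
import re


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(sentence: str) -> bytes:
    # Prefix bytes keep leaf hashes distinct from interior-node hashes.
    return h(b"\x00" + sentence.encode("utf-8"))


def split_sentences(text: str) -> list[str]:
    # Naive splitter for illustration: break after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_levels(sentences: list[str]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first, root last."""
    if not sentences:
        raise ValueError("need at least one sentence")
    level = [leaf_hash(s) for s in sentences]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Inclusion proof: (sibling hash, sibling_is_left) for each level."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(sentence: str, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = leaf_hash(sentence)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"\x01" + pair)
    return node == root
```

Under these assumptions, a publisher would sign only the root (levels[-1][0]). A verifier holding one sentence and its proof can check membership without seeing the rest of the article.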

Works in Any Text Context

The text watermarking approach is not tied to any specific content format or distribution channel. It works in:

  • Web articles (HTML text content)
  • Email newsletters
  • Social media posts (where platform character limits allow)
  • API responses delivering text content
  • CMS-managed content
  • Word documents (with ZWC encoding)
  • PDF documents (via the document signing path)

The signing API accepts a plain text string and returns the same string with embedded markers. How that string is subsequently stored, formatted, or displayed does not affect the provenance.
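The string-in, string-out contract can be sketched as follows. This is not the Encypher API: sign_text and verify_text are hypothetical names, HMAC-SHA256 with a shared key stands in for whatever signature scheme the real service uses, and the zero-width encoding is an assumption. The sketch also assumes the input text does not already contain the two marker code points.

```python
import hashlib
import hmac
import json

ZERO, ONE = "\u200b", "\u200c"  # assumed invisible 1-bit alphabet


def sign_text(text: str, key: bytes) -> str:
    """Return the same text with an invisible authentication tag appended."""
    tag = hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()
    bits = "".join(f"{byte:08b}" for byte in tag)
    return text + "".join(ONE if bit == "1" else ZERO for bit in bits)


def verify_text(signed: str, key: bytes) -> bool:
    """Check that the embedded tag matches the visible text."""
    bits = "".join("1" if ch == ONE else "0" for ch in signed if ch in (ZERO, ONE))
    if len(bits) != 256:  # a SHA-256 tag is exactly 256 bits
        return False
    tag = bytes(int(bits[i:i + 8], 2) for i in range(0, 256, 8))
    visible = "".join(ch for ch in signed if ch not in (ZERO, ONE))
    expected = hmac.new(key, visible.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def store_and_load(signed: str) -> str:
    """Simulate storage: serialize to JSON bytes, then read the string back."""
    stored = json.dumps({"body": signed}).encode("utf-8")
    return json.loads(stored.decode("utf-8"))["body"]
```

In this sketch, a signed string that goes through store_and_load still verifies, while editing the visible text or deleting the invisible characters makes verify_text return False.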

Add Invisible Watermarks to Your Text

No visible changes. Copy-paste durable. Sentence-level granularity included. Free for up to 1,000 documents per month.
