Cryptographic vs. Statistical Watermarking
Two different approaches to embedding identity into content. One produces proof. One produces estimates. What that means technically, legally, and practically.
Statistical Watermarking: How It Works
Statistical watermarking - exemplified by Google's SynthID - embeds watermarks by manipulating the statistical properties of content during generation. For text, SynthID biases token sampling probabilities in systematic ways. For images, it adds imperceptible perturbations to pixel values.
The watermark is a property of the content's distribution, not a discrete embedded artifact. Detection works by running a trained classifier over the content and estimating whether its statistical properties match those expected from a watermarked generation. The classifier returns a probability rather than a yes-or-no answer: the content is "possibly", "likely", or "confidently" watermarked.
Statistical watermarks have a fundamental fragility: they are properties of the original content that can be disrupted by editing. Paraphrasing text changes the token sequence. Resizing or recompressing an image changes pixel values. Translation produces new token sequences. Each of these operations degrades or eliminates the statistical signal that the watermark relies on.
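The mechanism and its fragility can be illustrated with a toy example. The sketch below is not SynthID's algorithm; it is a simplified keyed "green list" scheme over a made-up vocabulary, and every name in it (VOCAB, KEY, is_green, generate, z_score, paraphrase) is hypothetical. It biases sampling toward a keyed half of the vocabulary, scores text by how far the green-token count exceeds chance, and shows the score dropping once tokens are replaced.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(1000)]  # toy vocabulary standing in for model tokens
KEY = b"demo-key"                       # hypothetical secret shared by generator and detector


def is_green(prev_token: str, token: str) -> bool:
    """Keyed pseudo-random split: roughly half of all tokens are 'green' after any given token."""
    digest = hashlib.sha256(KEY + prev_token.encode() + b"|" + token.encode()).digest()
    return digest[0] % 2 == 0


def generate(n_tokens: int, bias: float, rng: random.Random) -> list:
    """Sample tokens uniformly, but with probability `bias` reject a non-green candidate."""
    tokens = ["<s>"]
    while len(tokens) <= n_tokens:
        candidate = rng.choice(VOCAB)
        if rng.random() < bias and not is_green(tokens[-1], candidate):
            continue  # resample; this is the bias applied to token sampling
        tokens.append(candidate)
    return tokens[1:]


def z_score(tokens: list) -> float:
    """Standard deviations by which the green count exceeds the 50% expected by chance."""
    prev, green = "<s>", 0
    for tok in tokens:
        green += is_green(prev, tok)
        prev = tok
    n = len(tokens)
    return (green - 0.5 * n) / math.sqrt(0.25 * n)


def paraphrase(tokens: list, fraction: float, rng: random.Random) -> list:
    """Crude stand-in for editing: replace a fraction of tokens with random ones."""
    return [rng.choice(VOCAB) if rng.random() < fraction else t for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    marked = generate(400, bias=0.9, rng=rng)
    plain = generate(400, bias=0.0, rng=rng)
    print("watermarked z-score:  ", round(z_score(marked), 2))
    print("unwatermarked z-score:", round(z_score(plain), 2))
    print("after 50% replacement:", round(z_score(paraphrase(marked, 0.5, rng)), 2))
```

The detector never finds a discrete marker. It only measures how unlikely the token statistics are under chance, and that score shrinks as more of the original token sequence is replaced.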
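Verification also requires infrastructure. Detecting SynthID watermarks in content from Google's production models requires Google's detector and watermarking keys, which are not publicly available. Google has open-sourced a reference implementation of SynthID Text, but that does not let a third party independently verify output from Google's hosted models. Any party that wants to verify those watermarks depends on Google's detection service.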
Cryptographic Watermarking: How It Works
Cryptographic watermarking embeds a structured, signed manifest into content. The manifest contains explicit claims about the content (signer identity, creation time, rights terms) and a COSE cryptographic signature that mathematically binds the claims to the signer's private key.
Verification is binary: either the signature is valid and the content hash matches, or it is not. There is no probability estimate. There is no trained classifier. The verification algorithm is defined in open standards (C2PA, COSE, X.509) and implemented in open-source code.
Cryptographic watermarks are either present or absent - not degraded. If the content is modified after signing, its hash no longer matches the signed hash, and verification reports tampering. If the markers are removed, there is no manifest to verify. Either case is a definitive result, not a probability estimate.
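The discrete outcomes described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the C2PA or COSE algorithm: it uses HMAC-SHA256 from the Python standard library as a stand-in for a public-key signature, so the same key signs and verifies, whereas a real manifest is signed with a private key and checked against an X.509 certificate. The names SIGNING_KEY, sign, and verify, and the manifest layout, are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

SIGNING_KEY = b"demo-signing-key"  # hypothetical; stands in for the signer's private key


def sign(content: str, claims: dict) -> dict:
    """Build a manifest that binds the claims to a hash of the content, then sign it."""
    payload = {
        "claims": claims,
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
    }
    encoded = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, encoded, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify(content: str, manifest: Optional[dict]) -> str:
    """Return one of a fixed set of discrete results; there is no score or threshold."""
    if manifest is None:
        return "absent"
    encoded = json.dumps(manifest["payload"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, encoded, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "invalid_signature"  # manifest was altered or signed with another key
    content_hash = hashlib.sha256(content.encode()).hexdigest()
    if content_hash != manifest["payload"]["content_sha256"]:
        return "tampered"  # signature is genuine, but the content changed after signing
    return "valid"


if __name__ == "__main__":
    manifest = sign("Original article text.", {"signer": "Example News"})
    print(verify("Original article text.", manifest))
    print(verify("Edited article text.", manifest))
    print(verify("Original article text.", None))
```

Changing a single character of the content or of the signed claims moves the result from one discrete state to another; nothing in the check produces an intermediate confidence value.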
For text, provenance markers are invisible Unicode characters embedded in the text. For media, provenance is a JUMBF container in the file structure. In both cases, the watermark is a discrete artifact that is either present and valid, present but invalid, or absent.
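To make the idea of invisible text markers concrete, here is a rough sketch. The text above does not specify which Unicode characters or layout are used, so the encoding here is an assumption chosen for illustration: one zero-width character per payload bit, appended to the visible text. The names ZERO, ONE, embed, extract, and visible are hypothetical.

```python
from typing import Optional

# Assumed encoding: zero-width space = bit 0, zero-width non-joiner = bit 1.
ZERO, ONE = "\u200b", "\u200c"


def embed(text: str, payload: bytes) -> str:
    """Append the payload as a run of invisible characters after the visible text."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return text + "".join(ONE if bit == "1" else ZERO for bit in bits)


def extract(text: str) -> Optional[bytes]:
    """Recover the payload, or return None when no complete marker run is present."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    if not bits or len(bits) % 8:
        return None
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def visible(text: str) -> str:
    """Strip the marker characters, leaving only what a reader sees."""
    return "".join(ch for ch in text if ch not in (ZERO, ONE))


if __name__ == "__main__":
    marked = embed("Breaking news.", b"manifest-v1")
    print(visible(marked) == "Breaking news.")  # the readable text is unchanged
    print(extract(marked))                      # the payload travels with the text
    print(extract(visible(marked)))             # stripping the markers leaves nothing to verify
```

In a real system the payload would be the signed manifest rather than a short label, and a paraphrase or retyped copy would carry no markers at all, which is why such text verifies as unsigned rather than as weakly watermarked.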
Technical Comparison
| Property | Cryptographic (Encypher) | Statistical (SynthID) |
|---|---|---|
| Verification output | Binary: valid or invalid | Probability estimate |
| Survives paraphrasing | No (paraphrase is new content with no markers) | Partially; signal degrades |
| Survives translation | No | No (signal lost) |
| Survives copy-paste | Yes (Unicode markers copy with text) | Yes (statistical property preserved) |
| Third-party verification | Yes, with open-source libraries | No, requires Google's service |
| Author identity included | Yes, in certificate | No |
| Tamper detection | Yes, hash mismatch is detectable | Limited; editing degrades signal |
| Legal defensibility | High; mathematical proof | Limited; statistical inference |
The Evasion Asymmetry
Statistical watermarks can be evaded by disrupting the statistical properties they rely on. Paraphrasing, adding noise, or applying generative post-processing to AI-generated text can reduce SynthID detection confidence below the threshold for positive identification. This is documented in academic research on watermark robustness.
Cryptographic watermarks respond differently to evasion attempts. If someone removes the Unicode markers from Encypher-signed text, the manifest is gone and the text verifies as unsigned - which is the correct result. The content is no longer provably owned. If someone modifies the text while keeping the markers, the hash mismatch is detectable and verification reports tampering.
Producing a fraudulent valid signature without access to the private key is computationally infeasible. This is the fundamental security property of public key cryptography. Statistical watermarks have no equivalent guarantee: they can be disrupted by operations that do not require access to any secret.
The Legal Weight Difference
In copyright litigation, a claimant has to prove ownership of the work, and statistical inference is weak support for that. A detection result such as "this content is 87% likely to be AI-generated" is a statistical estimate that can be challenged by any competent expert witness.
A valid C2PA signature is different in kind. It is a mathematical proof that a specific party signed specific content at a specific time, and that the content has not been modified since. Challenging it requires demonstrating that the cryptographic system was broken, which is a much higher bar.
This matters for publishers pursuing copyright claims and for AI companies that receive formal notices with cryptographic evidence packages. The evidence type determines the legal weight of the claim and the cost of defending against it.
When Statistical Approaches Have Value
Statistical watermarking is useful for a specific problem: tracing AI-generated content back to a specific model when the content was generated without proactive provenance. If you need to determine whether content was generated by Gemini specifically, and the content was not signed at generation, SynthID detection provides information that cryptographic verification cannot.
For organizations building proactive provenance infrastructure - signing content at creation before distribution - cryptographic watermarking provides stronger and more durable documentation. For organizations doing reactive analysis of content they did not sign, statistical tools provide signal they would not otherwise have.