Format-Preserving Encryption (FPE)

AES ciphertext of an SSN is an opaque binary blob (at least one 128-bit block, plus any IV or authentication tag); a downstream system expecting NNN-NN-NNNN cannot validate it, index it, or route on it. In production systems that is usually fine: decrypt at the boundary. But for non-prod environments (CI, load tests, demos, vendor integration testing) you need realistic-looking data that passes through schema validators, regexes, and join keys without breaking them.

Format-preserving encryption (FPE) encrypts a value into a ciphertext with the same format: a 9-digit SSN in, a different 9-digit SSN out; a 16-digit credit-card number in, a different 16-digit number out (keeping it Luhn-valid takes extra handling; see section 6). NIST SP 800-38G specifies FF1 and FF3, and its draft revision replaces FF3 with FF3-1; all of these modes are built on AES.

1. When FPE Is the Right Tool

Non-prod data — realistic test data without exposing production PII.
Legacy systems with hard-coded length/format constraints that cannot accept arbitrary ciphertext.
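Partial exposure: you need to show part of a value in the clear (the last 4 digits of an account number) while the rest stays protected. Apply FPE to the leading digits only, so the visible suffix never requires decrypting the whole string.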
Join keys across systems where plaintext cannot be shared, but a consistent pseudonym must match across databases.

2. FF1 vs FF3-1

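FF1: 10-round Feistel construction with a variable-length tweak. Handles short inputs (6 or more decimal digits) and is the recommended default.
FF3-1: 8-round Feistel construction, so somewhat faster, with a fixed 56-bit tweak. NIST declared the original FF3 unsuitable after a 2017 cryptanalysis (Durak and Vaudenay); FF3-1 is the corrected variant.

Both use AES-128/192/256 as the underlying block cipher. The draft revision of SP 800-38G requires a domain of at least one million values (radix^minlen >= 1,000,000, i.e. 6 decimal digits) for both modes. Known attacks become practical mainly on small domains, so follow the domain-size and tweak rules, and avoid rolling your own mode.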
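
The domain-size rule can be checked before any cipher is constructed. A minimal sketch, assuming the one-million floor from the draft revision of SP 800-38G; `domain_size`, `domain_size_ok`, and `MIN_DOMAIN` are names made up for this example, not part of any library:

```python
# Hypothetical helper: is a format large enough to be a sensible FPE domain?
MIN_DOMAIN = 1_000_000  # assumed floor: radix ** length >= 1,000,000


def domain_size(radix: int, length: int) -> int:
    """Number of distinct strings of `length` symbols over a `radix`-symbol alphabet."""
    return radix ** length


def domain_size_ok(radix: int, length: int, minimum: int = MIN_DOMAIN) -> bool:
    return domain_size(radix, length) >= minimum


for label, radix, length in [
    ("4-digit PIN", 10, 4),
    ("9-digit SSN", 10, 9),
    ("5-letter lowercase code", 26, 5),
]:
    size = domain_size(radix, length)
    print(f"{label}: {size:,} values, ok={domain_size_ok(radix, length)}")
```

Running a check like this at schema-design time catches fields (PINs, short codes) that are too small for FPE before they reach a pipeline.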

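3. Example: Feistel-Based FPE in Python (pyffx)

pyffx implements a generic FFX-style Feistel cipher with an HMAC-based round function. It is not NIST FF1 and takes no tweak parameter, so the example below shows the shape of an FPE API and emulates the tweak by deriving one key per field. Where FF1 itself is required, use a library that implements it.

import hashlib
import hmac
import string

from pyffx import Integer, String

# Demo key only (16 bytes). pyffx feeds it to its HMAC-based round function.
# In real use, load the key from a KMS or secret store.
KEY = bytes.fromhex("0123456789abcdef" * 2)


def field_key(tweak: bytes) -> bytes:
    # pyffx has no tweak input, so separate fields by deriving one key per tweak.
    return hmac.new(KEY, tweak, hashlib.sha256).digest()


def encrypt_ssn(ssn: str, tweak: bytes = b"ssn-v1") -> str:
    # Treat the SSN as a 9-digit integer so the ciphertext is also 9 digits.
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected an SSN of the form NNN-NN-NNNN")
    cipher = Integer(field_key(tweak), length=9)
    enc_str = f"{cipher.encrypt(int(digits)):09d}"  # restore leading zeros
    return f"{enc_str[:3]}-{enc_str[3:5]}-{enc_str[5:]}"


def encrypt_account(name: str, tweak: bytes = b"acct-v1") -> str:
    # Alphabetic handle that must remain alphabetic (lowercased first).
    handle = name.lower()
    if not handle or any(c not in string.ascii_lowercase for c in handle):
        raise ValueError("expected a non-empty handle containing only a-z")
    cipher = String(field_key(tweak), alphabet=string.ascii_lowercase, length=len(handle))
    return cipher.encrypt(handle)


print(encrypt_ssn("123-45-6789"))   # e.g. "482-19-3071"
print(encrypt_account("acmecorp"))  # e.g. "rjkqwhfm"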

Because FPE is deterministic for a given (key, tweak), the same SSN always produces the same pseudonym — which is exactly what you want for joins. That also means it leaks equality: if the attacker sees the ciphertext of a known SSN, they can match other ciphertexts of the same value. See section 5 for when this matters.
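
To make the determinism concrete, here is a toy Feistel network over decimal strings built only from the standard library. It is a teaching sketch, not FF1 and not safe for real data: the HMAC-SHA-256 round function, the round count, and the names `toy_fpe_encrypt` and `toy_fpe_decrypt` are all assumptions made for illustration.

```python
import hashlib
import hmac

ROUNDS = 10  # arbitrary even number for the toy; not taken from any standard


def _round_value(key: bytes, tweak: bytes, i: int, half: str, width: int) -> int:
    # Pseudorandom integer in [0, 10**width) bound to key, tweak, round and half.
    msg = len(tweak).to_bytes(2, "big") + tweak + bytes([i]) + half.encode()
    mac = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % 10 ** width


def _split(digits: str):
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("need a string of at least two decimal digits")
    u = len(digits) // 2
    return u, len(digits) - u, digits[:u], digits[u:]


def toy_fpe_encrypt(key: bytes, tweak: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in range(ROUNDS):
        width = u if i % 2 == 0 else v  # length of the half being replaced
        c = (int(a) + _round_value(key, tweak, i, b, width)) % 10 ** width
        a, b = b, f"{c:0{width}d}"
    return a + b


def toy_fpe_decrypt(key: bytes, tweak: bytes, digits: str) -> str:
    u, v, a, b = _split(digits)
    for i in reversed(range(ROUNDS)):
        width = u if i % 2 == 0 else v
        prev_b = a  # each round moved the old right half to the left
        prev_a = (int(b) - _round_value(key, tweak, i, prev_b, width)) % 10 ** width
        a, b = f"{prev_a:0{width}d}", prev_b
    return a + b


key = bytes(16)  # all-zero demo key; never use a fixed key in practice
ssn = "123456789"
c1 = toy_fpe_encrypt(key, b"ssn-v1", ssn)
c2 = toy_fpe_encrypt(key, b"ssn-v1", ssn)
c3 = toy_fpe_encrypt(key, b"acct-v1", ssn)  # different tweak, same plaintext

print(c1 == c2)                                    # True: same inputs, same pseudonym
print(toy_fpe_decrypt(key, b"ssn-v1", c1) == ssn)  # True: the mapping is reversible
print(len(c1), c1.isdigit())                       # 9 True: format is preserved
```

`c3` will almost certainly differ from `c1`, which is the tweak acting as a domain separator. The equality `c1 == c2` is the property that makes joins work and, equally, the property an attacker exploits when matching a known value.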

4. Tweaks & Key Management

Tweak = domain separator. Use distinct tweaks per field (ssn-v1, acct-v1) so the same plaintext in two columns encrypts to different ciphertexts.
Per-environment keys. The staging key must differ from every prod key; with a shared key, staging pseudonyms are identical to prod pseudonyms, and anyone who holds the staging key can decrypt prod data.
Envelope encryption. Wrap FPE keys with a KMS CMK; log every key-release event.
Rotation. Rotation invalidates existing pseudonyms. Plan for cutover windows or keep the old key available for decryption only.
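
One way to get distinct keys per environment, field, and key version is to derive them from a wrapped master secret with HMAC. The sketch below uses only the standard library; `derive_fpe_key`, the label layout, and the hard-coded master secrets are assumptions for illustration, and a real setup would unwrap the master secret through the KMS rather than embed it in code.

```python
import hashlib
import hmac


def derive_fpe_key(master: bytes, env: str, field: str, version: int) -> bytes:
    """Derive a 16-byte key bound to one environment, field and key version."""
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    label = b"".join(
        len(part).to_bytes(2, "big") + part
        for part in (env.encode(), field.encode(), str(version).encode())
    )
    return hmac.new(master, b"fpe-key" + label, hashlib.sha256).digest()[:16]


# Hypothetical master secrets; each environment gets its own.
STAGING_MASTER = b"staging-master-secret-demo-only!"
PROD_MASTER = b"prod-master-secret-demo-only!!!!"

staging_ssn_v1 = derive_fpe_key(STAGING_MASTER, "staging", "ssn", 1)
staging_ssn_v2 = derive_fpe_key(STAGING_MASTER, "staging", "ssn", 2)
staging_acct_v1 = derive_fpe_key(STAGING_MASTER, "staging", "acct", 1)
prod_ssn_v1 = derive_fpe_key(PROD_MASTER, "prod", "ssn", 1)

# After rotation: encrypt with the newest version, keep older keys for decryption only.
current_version, current_key = 2, staging_ssn_v2
decrypt_only_keys = {1: staging_ssn_v1}
```

Storing the key version next to each pseudonym (or encoding it in the tweak name, as in ssn-v1) tells the decrypt path which entry of `decrypt_only_keys` to use during a cutover window.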

5. FPE vs Tokenization vs Deterministic AEAD

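Tokenization (vault-based): random token, with the mapping stored in a vault. Generally the most secure option: no key can turn a token back into the plaintext, so there is no cryptographic leakage (a vault that reuses one token per value still reveals equality). Requires a lookup service on the reversal path.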
FPE — no vault needed, reversal is cryptographic. Preserves format. Leaks equality (same input → same output).
Deterministic AEAD (AES-SIV) — preserves equality but not format; ciphertext is longer than plaintext.

Choose FPE only when format preservation is the requirement. For most production redaction, the vault-based tokenizer described on the PII redaction page is the safer default.
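
For comparison, the vault-based approach can be sketched in a few lines. This is an in-memory toy built around a hypothetical `TokenVault` class; a real vault is a hardened, access-controlled service with durable storage and audit logging.

```python
import secrets


class TokenVault:
    """In-memory toy vault: random same-length digit tokens plus a reverse map."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        if not value.isdigit():
            raise ValueError("this toy vault only handles digit strings")
        if value in self._token_by_value:  # stable token, so joins still work
            return self._token_by_value[value]
        # Retry on collisions; fine for a demo, but it would spin forever
        # if the domain were ever exhausted.
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token != value and token not in self._value_by_token:
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]  # KeyError if the token is unknown


vault = TokenVault()
token = vault.tokenize("123456789")
print(len(token), token.isdigit())           # 9 True
print(vault.tokenize("123456789") == token)  # True
print(vault.detokenize(token))               # 123456789
```

The token carries no information about the input, but reversal now depends on the vault being reachable, which is the operational cost FPE avoids.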

6. Gotchas

Small domains are weak: a 4-digit PIN has only 10 000 possible values, so an attacker who can observe or request a modest number of plaintext/ciphertext pairs can build the entire codebook without ever recovering the key.
Luhn preservation requires care: apply FF1 to the first 15 digits and recompute the check digit, rather than applying FF1 to all 16; otherwise most ciphertexts fail card validation.
Not a substitute for access control — FPE ciphertexts are still regulated data. Treat them as PII in logs, backups, and third-party integrations.
Audit every decryption. A decrypt-to-plaintext path exists; that is the riskiest operation and must be logged with actor, tweak, and purpose.
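
The Luhn point above can be sketched as follows. `encrypt_body` is a placeholder for whatever FPE call you apply to the first 15 digits (it is not a real library function); the Luhn arithmetic itself is standard.

```python
from typing import Callable


def luhn_check_digit(body: str) -> str:
    """Return the digit that makes body + digit pass the Luhn test."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for i, ch in enumerate(reversed(body)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    return (
        len(number) > 1
        and number.isdigit()
        and luhn_check_digit(number[:-1]) == number[-1]
    )


def encrypt_pan(pan: str, encrypt_body: Callable[[str], str]) -> str:
    """Encrypt the first 15 digits, then append a freshly computed check digit."""
    if len(pan) != 16 or not luhn_valid(pan):
        raise ValueError("expected a Luhn-valid 16-digit number")
    body = encrypt_body(pan[:15])
    if len(body) != 15 or not body.isdigit():
        raise ValueError("encrypt_body must return 15 digits")
    return body + luhn_check_digit(body)


# Stand-in for a real FPE call: reversing the digits is NOT encryption.
pseudonym = encrypt_pan("4111111111111111", lambda body: body[::-1])
print(luhn_valid(pseudonym))  # True
```

Decryption mirrors this: drop the check digit, decrypt the 15-digit body, and recompute the original check digit, so the check digit itself never goes through the cipher.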
