This article shows how we can leverage equivariant transformations to run LLM inference on encrypted data without losing model performance.
The problem is clear: you want to use powerful LLMs for sensitive data, but sending your private information to cloud providers is a privacy nightmare. Medical records, financial documents, legal contracts — all this sensitive stuff needs LLM processing, but traditional inference means exposing everything in plaintext.
What if we could encrypt the input, run inference on the encrypted data, and get back meaningful results? That's exactly what equivariant encryption enables.
Equivariance
Equivariance is a mathematical property where transforming the input leads to a predictable transformation of the output:
f(g(x)) = h(f(x))
Where:
- g is our input transformation (encryption)
- h is the corresponding output transformation
- f is our model
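Before applying this to LLMs, here is a toy instance of the property itself, unrelated to encryption: sorting a list commutes with scaling every element, so f = sorted is equivariant to g(x) = 2x, with h = g on the output side.

```python
# Toy equivariance: sorting commutes with positive scaling.
# f = sorted, g doubles every element, and h happens to equal g here.
f = sorted
g = lambda xs: [2 * v for v in xs]

x = [3, 1, 2]
assert f(g(x)) == g(f(x))   # f(g(x)) = h(f(x)) with h = g
```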
For LLM inference, this means we can encrypt user inputs, process them through a modified model, and decrypt the outputs to get the same results as if we processed the original data directly.
But implementing this is tricky. Most encryption schemes destroy the semantic relationships that LLMs depend on, or add significant latency. We need a scheme that encrypts the data while preserving enough structure for the model to work, and that stays fast.
Vocabulary Permutation
We're starting with the simplest approach: vocabulary permutation. Think of it as a secret code where every word maps to a different word, but the relationships stay intact.
Let's consider the example: "The weather is sunny"
"The" → token 45 → encrypted token 892
"weather" → token 156 → encrypted token 23
"is" → token 89 → encrypted token 445
"sunny" → token 234 → encrypted token 67
Encrypted sequence: [892, 23, 445, 67]
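Here's a minimal sketch of that round trip in Python. The token ids above are illustrative; this toy version uses a fixed seed and a 1,000-token vocabulary for readability, so its encrypted ids will differ from the example.

```python
import random

VOCAB_SIZE = 1000                          # toy vocabulary for illustration
rng = random.Random(0)                     # fixed seed so the sketch is reproducible
pi = list(range(VOCAB_SIZE))
rng.shuffle(pi)                            # secret permutation π

pi_inv = [0] * VOCAB_SIZE
for original, encrypted in enumerate(pi):
    pi_inv[encrypted] = original           # inverse permutation π⁻¹

tokens = [45, 156, 89, 234]                # "The weather is sunny"
encrypted_tokens = [pi[t] for t in tokens]
decrypted_tokens = [pi_inv[t] for t in encrypted_tokens]
assert decrypted_tokens == tokens          # round trip recovers the original ids
```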
The beautiful part? We can adapt the model to understand this encrypted vocabulary while maintaining all the semantic relationships it learned during training.
Three Key Components
Encrypted Tokenizer:
encrypted_id = π(original_id)
Where π is our secret permutation function generated from a client key.
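One way π could be derived from a client key is to hash the key into a seed, as the SHA-256 setup described later suggests. This is only a sketch: the helper name permutation_from_key is hypothetical, and Python's random shuffle stands in for what would need to be a cryptographically secure shuffle in production.

```python
import hashlib
import random

def permutation_from_key(client_key: str, vocab_size: int) -> list[int]:
    """Derive a deterministic secret permutation π from a client key (hypothetical helper)."""
    # Hash the key so any string yields a uniform 256-bit seed.
    seed = int.from_bytes(hashlib.sha256(client_key.encode()).digest(), "big")
    rng = random.Random(seed)              # illustration only; not a secure shuffle
    pi = list(range(vocab_size))
    rng.shuffle(pi)
    return pi

pi = permutation_from_key("my-client-key", 1000)
```

The same key always yields the same π, so the client can reconstruct π⁻¹ locally to decrypt outputs.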
Permuted Embedding Layer: Instead of the original embedding matrix E, we create:
E_new[encrypted_id] = E_original[original_id]
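In index terms, this is a scatter of the embedding rows: row i of the original matrix moves to row π(i) of the new one. A minimal numpy sketch, with toy shapes:

```python
import numpy as np

def permute_embedding(E: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Build E_new with E_new[pi[i]] = E[i]: row i moves to row π(i)."""
    E_new = np.empty_like(E)
    E_new[pi] = E                          # scatter rows to their encrypted ids
    return E_new

rng = np.random.default_rng(0)
E = rng.normal(size=(8, 4))                # toy embedding matrix
pi = rng.permutation(8)
E_new = permute_embedding(E, pi)
assert np.array_equal(E_new[pi[3]], E[3])  # token 3's vector now lives at row π(3)
```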
Permuted Output Head: The language modeling head outputs probabilities over encrypted tokens, maintaining the same distributions but in permuted space.
This creates perfect equivariance:
Model_encrypted(π(tokens)) = π(Model_original(tokens))
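We can check this identity end to end on a toy "model" that embeds the last token and greedily scores the whole vocabulary with a tied output head. This is a stand-in for a real LLM forward pass, not the actual Llama architecture, but the equivariance argument is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 50, 8
E = rng.normal(size=(V, D))                # toy embedding matrix (tied output head)
pi = rng.permutation(V)                    # secret permutation π

def greedy_next(token_ids, emb):
    # Toy "LM": embed the last token, score every vocab row, pick the argmax.
    h = emb[token_ids[-1]]
    return int(np.argmax(emb @ h))

E_enc = np.empty_like(E)
E_enc[pi] = E                              # permuted embedding and output head

tokens = [3, 17, 42]
plain = greedy_next(tokens, E)
enc = greedy_next([int(pi[t]) for t in tokens], E_enc)
assert enc == int(pi[plain])               # Model_encrypted(π(x)) = π(Model_original(x))
```

Because E_enc[π(t)] equals E[t], the hidden state is identical in both runs, and every logit simply moves to its permuted index, so the greedy choice permutes along with it.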
Testing on Llama 3.2 1B
We implemented this on Llama 3.2 1B to test real-world performance:
- Model: Llama 3.2 1B Instruct
- Hardware: T4 GPU (CUDA required)
- Encryption: Cryptographically secure permutation from SHA-256 derived seeds
- Test prompts: Various categories from factual to creative
The implementation replaces only the embedding and output layers — the core transformer blocks remain unchanged. This is crucial for maintaining the model's learned representations.
We only rearrange the embedding rows — no additional mathematical transformations. This preserves the semantic distances that make LLMs work:
W_new[i] = W_original[π⁻¹(i)]
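This gather form (indexing by π⁻¹) describes the same mapping as the scatter form E_new[encrypted_id] = E_original[original_id] given earlier; a quick numpy check with toy shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 4))               # original weight rows
pi = rng.permutation(10)
pi_inv = np.argsort(pi)                    # π⁻¹ as an index array

W_new = W[pi_inv]                          # gather form: W_new[i] = W[π⁻¹(i)]
assert np.array_equal(W_new[pi], W)        # matches the scatter form W_new[π(i)] = W[i]
```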
Practical Performance
- Encryption/decryption: <1ms overhead per request
- Inference speed: Same as original model
- Memory usage: Identical to base model
- Model quality: Preserved (semantic relationships intact)
What's Next?
This permutation approach is just the tip of the iceberg. The equivariance framework opens up possibilities for more sophisticated encryption schemes:
- Orthogonal transformations in embedding space
- Multi-layer encryption with different transformations per layer
- Dynamic permutations that change based on context
Can we build encryption schemes that are both cryptographically secure and preserve model performance? How do we balance privacy, security, and utility in production systems?