This article shows how we can leverage equivariant transformations to run LLM inference on encrypted data without losing model performance.
The problem is clear: you want to use powerful LLMs for sensitive data, but sending your private information to cloud providers is a privacy nightmare. Medical records, financial documents, legal contracts — all this sensitive stuff needs LLM processing, but traditional inference means exposing everything in plaintext.
What if we could encrypt the input, run inference on the encrypted data, and get back meaningful results? That's exactly what equivariant encryption enables.
Equivariance
Equivariance is a mathematical property where transforming the input leads to a predictable transformation of the output:
f(g(x)) = h(f(x))
Where:
- g is our input transformation (encryption)
- h is the corresponding output transformation
- f is our model
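Before applying this to LLMs, here is a toy instance of the property itself, unrelated to encryption: sorting a list commutes with scaling every element, so f = sorted is equivariant to g(x) = 2x, with h = g on the output side.

```python
# Toy equivariance: sorting commutes with positive scaling.
# f = sorted, g doubles every element, and h happens to equal g here.
f = sorted
g = lambda xs: [2 * v for v in xs]

x = [3, 1, 2]
assert f(g(x)) == g(f(x))   # f(g(x)) = h(f(x)) with h = g
```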
For LLM inference, this means we can encrypt user inputs, process them through a modified model, and decrypt the outputs to get the same results as if we processed the original data directly.
But implementing this is tricky. Most encryption schemes destroy the semantic relationships that LLMs depend on, or add significant latency. We need a scheme that encrypts the data while preserving enough structure for the model to work, and that stays fast.
Vocabulary Permutation
We're starting with the simplest approach: vocabulary permutation. Think of it as a secret code where every word maps to a different word, but the relationships stay intact.
Let's consider the example: "The weather is sunny"
"The" → token 45 → encrypted token 892
"weather" → token 156 → encrypted token 23
"is" → token 89 → encrypted token 445
"sunny" → token 234 → encrypted token 67
Encrypted sequence: [892, 23, 445, 67]
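Here's a minimal sketch of that round trip in Python. The token ids above are illustrative; this toy version uses a fixed seed and a 1,000-token vocabulary for readability, so its encrypted ids will differ from the example.

```python
import random

VOCAB_SIZE = 1000                          # toy vocabulary for illustration
rng = random.Random(0)                     # fixed seed so the sketch is reproducible
pi = list(range(VOCAB_SIZE))
rng.shuffle(pi)                            # secret permutation π

pi_inv = [0] * VOCAB_SIZE
for original, encrypted in enumerate(pi):
    pi_inv[encrypted] = original           # inverse permutation π⁻¹

tokens = [45, 156, 89, 234]                # "The weather is sunny"
encrypted_tokens = [pi[t] for t in tokens]
decrypted_tokens = [pi_inv[t] for t in encrypted_tokens]
assert decrypted_tokens == tokens          # round trip recovers the original ids
```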
The beautiful part? We can adapt the model to understand this encrypted vocabulary while maintaining all the semantic relationships it learned during training.
Three Key Components
Encrypted Tokenizer:
encrypted_id = π(original_id)
Where π is our secret permutation function generated from a client key.
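One way π could be derived from a client key is to hash the key into a seed, as the SHA-256 setup described later suggests. This is only a sketch: the helper name permutation_from_key is hypothetical, and Python's random shuffle stands in for what would need to be a cryptographically secure shuffle in production.

```python
import hashlib
import random

def permutation_from_key(client_key: str, vocab_size: int) -> list[int]:
    """Derive a deterministic secret permutation π from a client key (hypothetical helper)."""
    # Hash the key so any string yields a uniform 256-bit seed.
    seed = int.from_bytes(hashlib.sha256(client_key.encode()).digest(), "big")
    rng = random.Random(seed)              # illustration only; not a secure shuffle
    pi = list(range(vocab_size))
    rng.shuffle(pi)
    return pi

pi = permutation_from_key("my-client-key", 1000)
```

The same key always yields the same π, so the client can reconstruct π⁻¹ locally to decrypt outputs.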
Permuted Embedding Layer: Instead of the original embedding matrix E, we create:
E_new[encrypted_id] = E_original[original_id]
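In index terms, this is a scatter of the embedding rows: row i of the original matrix moves to row π(i) of the new one. A minimal numpy sketch, with toy shapes:

```python
import numpy as np

def permute_embedding(E: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Build E_new with E_new[pi[i]] = E[i]: row i moves to row π(i)."""
    E_new = np.empty_like(E)
    E_new[pi] = E                          # scatter rows to their encrypted ids
    return E_new

rng = np.random.default_rng(0)
E = rng.normal(size=(8, 4))                # toy embedding matrix
pi = rng.permutation(8)
E_new = permute_embedding(E, pi)
assert np.array_equal(E_new[pi[3]], E[3])  # token 3's vector now lives at row π(3)
```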
Permuted Output Head: The language modeling head outputs probabilities over encrypted tokens, maintaining the same distributions but in permuted space.
This creates perfect equivariance:
Model_encrypted(π(tokens)) = π(Model_original(tokens))
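We can check this identity end to end on a toy "model" that embeds the last token and greedily scores the whole vocabulary with a tied output head. This is a stand-in for a real LLM forward pass, not the actual Llama architecture, but the equivariance argument is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 50, 8
E = rng.normal(size=(V, D))                # toy embedding matrix (tied output head)
pi = rng.permutation(V)                    # secret permutation π

def greedy_next(token_ids, emb):
    # Toy "LM": embed the last token, score every vocab row, pick the argmax.
    h = emb[token_ids[-1]]
    return int(np.argmax(emb @ h))

E_enc = np.empty_like(E)
E_enc[pi] = E                              # permuted embedding and output head

tokens = [3, 17, 42]
plain = greedy_next(tokens, E)
enc = greedy_next([int(pi[t]) for t in tokens], E_enc)
assert enc == int(pi[plain])               # Model_encrypted(π(x)) = π(Model_original(x))
```

Because E_enc[π(t)] equals E[t], the hidden state is identical in both runs, and every logit simply moves to its permuted index, so the greedy choice permutes along with it.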
Testing on Llama 3.2 1B
We implemented this on Llama 3.2 1B to test real-world performance:
- Model: Llama 3.2 1B Instruct
- Hardware: T4 GPU (CUDA required)
- Encryption: Cryptographically secure permutation from SHA-256 derived seeds
- Test prompts: Various categories from factual to creative
The implementation replaces only the embedding and output layers — the core transformer blocks remain unchanged. This is crucial for maintaining the model's learned representations.
We only rearrange the embedding rows — no additional mathematical transformations. This preserves the semantic distances that make LLMs work:
W_new[i] = W_original[π⁻¹(i)]
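This gather form (indexing by π⁻¹) describes the same mapping as the scatter form E_new[encrypted_id] = E_original[original_id] given earlier; a quick numpy check with toy shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 4))               # original weight rows
pi = rng.permutation(10)
pi_inv = np.argsort(pi)                    # π⁻¹ as an index array

W_new = W[pi_inv]                          # gather form: W_new[i] = W[π⁻¹(i)]
assert np.array_equal(W_new[pi], W)        # matches the scatter form W_new[π(i)] = W[i]
```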
Practical Performance
- Encryption/decryption: <1ms overhead per request
- Inference speed: Same as original model
- Memory usage: Identical to base model
- Model quality: Preserved (semantic relationships intact)
What's Next?
This permutation approach is just the tip of the iceberg. The equivariance framework opens up possibilities for more sophisticated encryption schemes:
- Orthogonal transformations in embedding space
- Multi-layer encryption with different transformations per layer
- Dynamic permutations that change based on context
Can we build encryption schemes that are both cryptographically secure and preserve model performance? How do we balance privacy, security, and utility in production systems?