LiteLLM CVSS 9.6 Vulnerability: When Your AI Proxy Becomes the Attack Vector
A critical vulnerability in LiteLLM AI proxy could intercept all AI API traffic. Here is why local-first, open-source AI security beats cloud proxy architectures.
The AI Proxy That Became an Attack Vector
In 2025, security researchers disclosed a critical vulnerability in LiteLLM, a popular open-source AI proxy and gateway used by thousands of organizations to manage their LLM API traffic. The vulnerability received a CVSS score of 9.6 out of 10 -- classified as Critical.
LiteLLM is used as a unified interface to multiple AI providers. Organizations route their OpenAI, Anthropic, Google, and other LLM API calls through LiteLLM to manage API keys, track usage, enforce rate limits, and switch between providers. Thousands of companies had deployed it in production.
The vulnerability meant that an attacker could potentially intercept, modify, or exfiltrate ALL AI API traffic passing through the compromised proxy. Every prompt, every response, every API key, every piece of proprietary code sent through LiteLLM was exposed.
The Architectural Problem
The LiteLLM vulnerability highlights a fundamental architectural risk with cloud-hosted AI proxies: they are single points of failure for all AI traffic. When you route all your LLM API calls through a single proxy, a vulnerability in that proxy exposes everything.
This is not a LiteLLM-specific problem. It is an architectural problem with any centralized AI proxy:
- **Centralized API key storage**: The proxy holds API keys for multiple providers. One compromise exposes all keys.
- **Full traffic visibility**: The proxy sees every prompt and every response in plaintext. An attacker gains access to all AI interactions.
- **Shared infrastructure**: In hosted or multi-tenant deployments, one customer's compromise can affect all customers.
- **Trust chain dependency**: You trust the proxy with the most sensitive data in your AI pipeline. That trust must be warranted.
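The first two risks are visible in the shape of a typical gateway's key store. The sketch below is a hypothetical config shape, not LiteLLM's actual schema; it illustrates why a single compromised store leaks credentials for every provider and every team behind the proxy.

```typescript
// Hypothetical shape of a centralized proxy's key store -- NOT LiteLLM's
// actual schema. A single compromise of this one object exposes every
// provider credential for every team routed through the gateway.
interface ProviderCredential {
  provider: "openai" | "anthropic" | "google";
  apiKey: string;  // plaintext or decryptable at the proxy
  teams: string[]; // every team sharing this credential
}

interface GatewayKeyStore {
  credentials: ProviderCredential[];
}

const store: GatewayKeyStore = {
  credentials: [
    { provider: "openai", apiKey: "sk-...", teams: ["payments", "ml"] },
    { provider: "anthropic", apiKey: "sk-ant-...", teams: ["payments"] },
  ],
};

// Blast radius of one proxy compromise: every key, every team.
const exposedProviders = store.credentials.map((c) => c.provider);
const exposedTeams = new Set(store.credentials.flatMap((c) => c.teams));
```

Contrast this with the local-first model described below, where each developer holds only their own key and there is no shared store to steal.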
```
Developer Machine           Cloud Proxy (LiteLLM)           AI APIs
        |                            |                         |
        |-- Send prompt with ------->|                         |
        |   real code                |                         |
        |                            |-- VULNERABILITY HERE -->|
        |                            |   Attacker can:         |
        |                            |   - Read all prompts    |
        |                            |   - Steal API keys      |
        |                            |   - Modify responses    |
        |                            |   - Exfiltrate code     |
        |<-- Receive response -------|<------------------------|
```
---
What CVSS 9.6 Means in Practice
A CVSS score of 9.6 is about as bad as it gets. For context:
- **CVSS 7.0-8.9** (High): Exploitable but may require some conditions
- **CVSS 9.0-10.0** (Critical): Easily exploitable, severe impact, requires immediate remediation
A 9.6 score typically means: network-exploitable, low complexity, no authentication required, and high impact on confidentiality, integrity, and availability. In practical terms, an attacker on the network could exploit the vulnerability without needing credentials or special access.
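The base-score arithmetic behind numbers like 9.6 comes straight from the CVSS v3.1 specification and fits in a few lines. The vector string used in the example below is illustrative only, one combination (network vector, low complexity, no privileges, scope change, high impact on confidentiality, integrity, and availability) that produces exactly 9.6; the actual vector in the LiteLLM advisory may differ.

```typescript
// CVSS v3.1 base-score calculation (spec section 7.1).
// Metric weights from the specification tables.
const W: Record<string, Record<string, number>> = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  UI: { N: 0.85, R: 0.62 },
  C: { H: 0.56, L: 0.22, N: 0 },
  I: { H: 0.56, L: 0.22, N: 0 },
  A: { H: 0.56, L: 0.22, N: 0 },
};
// Privileges Required weight depends on whether Scope is changed.
const PR: Record<string, Record<string, number>> = {
  U: { N: 0.85, L: 0.62, H: 0.27 },
  C: { N: 0.85, L: 0.68, H: 0.5 },
};

// Spec-defined rounding: smallest 1-decimal value >= input.
function roundup(x: number): number {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function cvssBaseScore(vector: string): number {
  const m: Record<string, string> = {};
  for (const part of vector.replace(/^CVSS:3\.[01]\//, "").split("/")) {
    const [k, v] = part.split(":");
    m[k] = v;
  }
  const changed = m.S === "C";
  const iss = 1 - (1 - W.C[m.C]) * (1 - W.I[m.I]) * (1 - W.A[m.A]);
  const impact = changed
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const exploitability =
    8.22 * W.AV[m.AV] * W.AC[m.AC] * PR[m.S][m.PR] * W.UI[m.UI];
  if (impact <= 0) return 0;
  const sum = impact + exploitability;
  return roundup(Math.min(changed ? 1.08 * sum : sum, 10));
}

// An illustrative vector that scores 9.6 -- not necessarily the advisory's:
cvssBaseScore("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"); // → 9.6
```

Running the same function on the all-high, no-interaction variants gives the familiar 9.8 (scope unchanged) and 10.0 (scope changed) ceilings, which shows how little headroom sits above a 9.6.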
For organizations using LiteLLM as their AI proxy, this meant:
- Every prompt sent through the proxy could have been intercepted
- Every API key stored in the proxy could have been stolen
- Every response could have been modified (injecting malicious code suggestions)
- The complete AI interaction history could have been exfiltrated
```
// Request: developer debugging payment code
{
  "model": "claude-sonnet-4-20250514",
  "messages": [{
    "role": "user",
    "content": "Fix the bug in this function:\n\n" +
      "async function processStripeRefund(chargeId: string) {\n" +
      "  const charge = await stripe.charges.retrieve(chargeId);\n" +
      "  const refund = await stripe.refunds.create({\n" +
      "    charge: chargeId,\n" +
      "    amount: charge.amount,\n" +
      "    reason: 'requested_by_customer',\n" +
      "    metadata: { internalRefundId: generateRefundId() }\n" +
      "  });\n" +
      "}"
  }],
  "api_key": "sk-ant-api03-REDACTED" // API key in transit
}
```
---
The Local-First Alternative
Pretense takes a fundamentally different architectural approach. Instead of routing all AI traffic through a centralized cloud proxy, Pretense runs locally -- on the developer's machine or on your organization's infrastructure.
```
Developer Machine                                       AI APIs
        |                                                  |
        |  Pretense proxy (localhost:9339)                 |
        |  1. Mutate identifiers locally                   |
        |  2. No centralized server                        |
        |  3. No shared infrastructure                     |
        |  4. Mutation map stays on YOUR machine           |
        |                                                  |
        |-- What leaves your network: -------------------->|
        |   _fn4a2b, _cls7d4e, _v3b1c                      |
        |   (not processStripeRefund, stripe.charges)      |
        |                                                  |
        |<-- Response with synthetic names ----------------|
        |                                                  |
        |   Reverse mutation locally                       |
        |<- Real code returned to developer                |
```
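From the client's perspective, routing through a local proxy is usually just a base-URL change on an OpenAI-compatible chat request. The sketch below assumes a proxy listening on localhost:9339 (the port shown above); the helper name and request shape are illustrative, not a documented Pretense API.

```typescript
// Sketch: building an OpenAI-compatible chat request aimed at a LOCAL
// proxy instead of a hosted gateway. Illustrative only -- adapt the base
// URL and auth to your own setup.
const PROXY_BASE = "http://localhost:9339/v1"; // local, never leaves the machine

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(prompt: string, apiKey: string): ChatRequest {
  return {
    url: `${PROXY_BASE}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Each developer supplies their own key; nothing is pooled server-side.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "claude-sonnet-4-20250514",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

Because the endpoint is loopback-only, the raw prompt exists on the wire only between the editor and the local proxy; mutation happens before anything crosses a real network boundary.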
The security advantages of local-first are significant:
**No single point of failure.** Each developer's Pretense instance is independent. Compromising one instance does not expose any other developer's data.
**No shared API keys.** Each developer uses their own API keys. There is no centralized key store to compromise.
**Mutation before transit.** Even if the network is compromised (man-in-the-middle, compromised VPN, malicious WiFi), the attacker sees only synthetic identifiers. The real code never leaves the local machine.
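The mutation idea can be sketched in a few lines. This is a deliberately minimal illustration of the concept, not Pretense's actual implementation: known sensitive identifiers are swapped for fixed-length synthetic names before a prompt leaves the machine, and the map needed to reverse them never does.

```typescript
// Minimal sketch of identifier mutation -- illustrative only, NOT
// Pretense's actual implementation.
type MutationMap = Map<string, string>; // synthetic name -> real name

function mutate(
  source: string,
  sensitive: string[],
): { text: string; map: MutationMap } {
  const map: MutationMap = new Map();
  let text = source;
  sensitive.forEach((name, i) => {
    // Fixed-length synthetic names avoid prefix collisions on reversal.
    const synthetic = `_v${String(i).padStart(4, "0")}`;
    map.set(synthetic, name);
    // Replace whole-word occurrences only.
    text = text.replace(new RegExp(`\\b${name}\\b`, "g"), synthetic);
  });
  return { text, map };
}

function reverse(text: string, map: MutationMap): string {
  let out = text;
  for (const [synthetic, real] of map) {
    out = out.replace(new RegExp(synthetic, "g"), real);
  }
  return out;
}

const src = "async function processStripeRefund(chargeId) { return chargeId; }";
const { text, map } = mutate(src, ["processStripeRefund", "chargeId"]);
// `text` now contains only synthetic names; `map` never leaves this machine.
```

An interceptor who captures `text` in transit learns nothing about the business logic; only the holder of `map` can run `reverse` and recover the original.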
**Open source and auditable.** Unlike hosted proxy services where you trust the provider's security claims, Pretense's code is fully auditable. You can verify exactly what it does with your data.
---
The Real-World Impact
Consider two organizations, both using AI coding tools:
**Organization A** routes all AI traffic through a hosted LiteLLM instance. A CVSS 9.6 vulnerability is disclosed. Until they patch (which requires testing, staging, and production deployment -- typically 2-7 days for critical patches), all their AI traffic is exposed. Every developer's prompts, every piece of code, every API key.
**Organization B** uses Pretense on each developer's machine. Even if a vulnerability were discovered in Pretense (no software is immune), the impact is contained: only that developer's synthetic identifiers are exposed, and those identifiers are meaningless without the local mutation map.
```typescript
// Organization A: what the attacker sees through the LiteLLM vulnerability
class FraudDetectionPipeline {
  private mlModel: XGBoostFraudModel;
  private velocityChecker: TransactionVelocityAnalyzer;

  async evaluateTransaction(txn: Transaction): Promise<FraudScore> {
    const features = await this.extractFeatures(txn);
    const mlScore = await this.mlModel.predict(features);
    const velocityScore = await this.velocityChecker.check(txn);
    return this.combineScores(mlScore, velocityScore, txn.amount);
  }
}

// Organization B: what any interceptor sees (Pretense mutation applied)
// Synthetic identifiers, architecture hidden
class _cls4f2a {
  private _v3b1c: _cls7d4e;
  private _v8a2f: _clsb3c1;

  async _fn2c3d(_v1e5f: _cls6a7b): Promise<_cls9d8e> {
    const _v5f6a = await this._fn4a2b(_v1e5f);
    const _v7b8c = await this._v3b1c._fn8c9d(_v5f6a);
    const _v2d3e = await this._v8a2f._fn1a2b(_v1e5f);
    return this._fn6f7a(_v7b8c, _v2d3e, _v1e5f._v3e4f);
  }
}
```
---
Lessons for AI Infrastructure Security
The LiteLLM vulnerability teaches three lessons:
**1. Minimize the blast radius.** Centralized AI proxies create centralized risk. Local-first architecture limits the impact of any single vulnerability to a single developer's synthetic data.
**2. Assume the network is hostile.** Even if your proxy is perfectly secure today, network paths are not. Mutation ensures data is protected in transit regardless of the security posture of intermediate systems.
**3. Trust but verify.** Open-source, auditable code is the only way to verify security claims about data handling. Hosted services ask you to trust their security team. Open source lets you verify.
Pretense was built on these principles from day one. It is local-first, open-source, and treats every network path as potentially hostile. Your code is mutated before it touches any wire, any proxy, any third-party service.
Protect Your Code Today
Pretense is the AI firewall that mutates proprietary code before it reaches any LLM API. Install in 30 seconds and protect your team's intellectual property.
```
curl -fsSL https://pretense.ai/install.sh | sh
pretense init
pretense start
```