LiteLLM CVSS 9.6 Vulnerability: When Your AI Proxy Becomes the Attack Vector
A critical vulnerability in LiteLLM AI proxy could intercept all AI API traffic. Here is why local-first, open-source AI security beats cloud proxy architectures.
The AI Proxy That Became an Attack Vector
In 2025, security researchers disclosed a critical vulnerability in LiteLLM, a popular open-source AI proxy and gateway used by thousands of organizations to manage their LLM API traffic. The vulnerability received a CVSS score of 9.6 out of 10 -- classified as Critical.
LiteLLM is used as a unified interface to multiple AI providers. Organizations route their OpenAI, Anthropic, Google, and other LLM API calls through LiteLLM to manage API keys, track usage, enforce rate limits, and switch between providers. Thousands of companies had deployed it in production.
The vulnerability meant that an attacker could potentially intercept, modify, or exfiltrate ALL AI API traffic passing through the compromised proxy. Every prompt, every response, every API key, every piece of proprietary code sent through LiteLLM was exposed.
The Architectural Problem
The LiteLLM vulnerability highlights a fundamental architectural risk with cloud-hosted AI proxies: they are single points of failure for all AI traffic. When you route all your LLM API calls through a single proxy, a vulnerability in that proxy exposes everything.
This is not a LiteLLM-specific problem. It is an architectural problem with any centralized AI proxy:
- **Centralized API key storage**: The proxy holds API keys for multiple providers. One compromise exposes all keys.
- **Full traffic visibility**: The proxy sees every prompt and every response in plaintext. An attacker gains access to all AI interactions.
- **Shared infrastructure**: In hosted or multi-tenant deployments, one customer's compromise can affect all customers.
- **Trust chain dependency**: You trust the proxy with the most sensitive data in your AI pipeline. That trust must be warranted.
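The first two risks are visible in the shape of a typical gateway's key store. The sketch below is a hypothetical config shape, not LiteLLM's actual schema; it illustrates why a single compromised store leaks credentials for every provider and every team behind the proxy.

```typescript
// Hypothetical shape of a centralized proxy's key store -- NOT LiteLLM's
// actual schema. A single compromise of this one object exposes every
// provider credential for every team routed through the gateway.
interface ProviderCredential {
  provider: "openai" | "anthropic" | "google";
  apiKey: string;  // plaintext or decryptable at the proxy
  teams: string[]; // every team sharing this credential
}

interface GatewayKeyStore {
  credentials: ProviderCredential[];
}

const store: GatewayKeyStore = {
  credentials: [
    { provider: "openai", apiKey: "sk-...", teams: ["payments", "ml"] },
    { provider: "anthropic", apiKey: "sk-ant-...", teams: ["payments"] },
  ],
};

// Blast radius of one proxy compromise: every key, every team.
const exposedProviders = store.credentials.map((c) => c.provider);
const exposedTeams = new Set(store.credentials.flatMap((c) => c.teams));
```

Contrast this with the local-first model described below, where each developer holds only their own key and there is no shared store to steal.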
```
Developer Machine           Cloud Proxy (LiteLLM)           AI APIs
        |                            |                         |
        |-- Send prompt with ------->|                         |
        |   real code                |                         |
        |                            |-- VULNERABILITY HERE -->|
        |                            |   Attacker can:         |
        |                            |   - Read all prompts    |
        |                            |   - Steal API keys      |
        |                            |   - Modify responses    |
        |                            |   - Exfiltrate code     |
        |<-- Receive response -------|<------------------------|
```
---
What CVSS 9.6 Means in Practice
A CVSS score of 9.6 is about as bad as it gets. For context:
- **CVSS 7.0-8.9** (High): Exploitable but may require some conditions
- **CVSS 9.0-10.0** (Critical): Easily exploitable, severe impact, requires immediate remediation
A 9.6 score typically means: network-exploitable, low complexity, no authentication required, and high impact on confidentiality, integrity, and availability. In practical terms, an attacker on the network could exploit the vulnerability without needing credentials or special access.
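The base-score arithmetic behind numbers like 9.6 comes straight from the CVSS v3.1 specification and fits in a few lines. The vector string used in the example below is illustrative only, one combination (network vector, low complexity, no privileges, scope change, high impact on confidentiality, integrity, and availability) that produces exactly 9.6; the actual vector in the LiteLLM advisory may differ.

```typescript
// CVSS v3.1 base-score calculation (spec section 7.1).
// Metric weights from the specification tables.
const W: Record<string, Record<string, number>> = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  UI: { N: 0.85, R: 0.62 },
  C: { H: 0.56, L: 0.22, N: 0 },
  I: { H: 0.56, L: 0.22, N: 0 },
  A: { H: 0.56, L: 0.22, N: 0 },
};
// Privileges Required weight depends on whether Scope is changed.
const PR: Record<string, Record<string, number>> = {
  U: { N: 0.85, L: 0.62, H: 0.27 },
  C: { N: 0.85, L: 0.68, H: 0.5 },
};

// Spec-defined rounding: smallest 1-decimal value >= input.
function roundup(x: number): number {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function cvssBaseScore(vector: string): number {
  const m: Record<string, string> = {};
  for (const part of vector.replace(/^CVSS:3\.[01]\//, "").split("/")) {
    const [k, v] = part.split(":");
    m[k] = v;
  }
  const changed = m.S === "C";
  const iss = 1 - (1 - W.C[m.C]) * (1 - W.I[m.I]) * (1 - W.A[m.A]);
  const impact = changed
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const exploitability =
    8.22 * W.AV[m.AV] * W.AC[m.AC] * PR[m.S][m.PR] * W.UI[m.UI];
  if (impact <= 0) return 0;
  const sum = impact + exploitability;
  return roundup(Math.min(changed ? 1.08 * sum : sum, 10));
}

// An illustrative vector that scores 9.6 -- not necessarily the advisory's:
cvssBaseScore("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"); // → 9.6
```

Running the same function on the all-high, no-interaction variants gives the familiar 9.8 (scope unchanged) and 10.0 (scope changed) ceilings, which shows how little headroom sits above a 9.6.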
For organizations using LiteLLM as their AI proxy, this meant:
- Every prompt sent through the proxy could have been intercepted
- Every API key stored in the proxy could have been stolen
- Every response could have been modified (injecting malicious code suggestions)
- The complete AI interaction history could have been exfiltrated
```
// Request: developer debugging payment code
{
  "model": "claude-sonnet-4-20250514",
  "messages": [{
    "role": "user",
    "content": "Fix the bug in this function:\n\n" +
      "async function processStripeRefund(chargeId: string) {\n" +
      "  const charge = await stripe.charges.retrieve(chargeId);\n" +
      "  const refund = await stripe.refunds.create({\n" +
      "    charge: chargeId,\n" +
      "    amount: charge.amount,\n" +
      "    reason: 'requested_by_customer',\n" +
      "    metadata: { internalRefundId: generateRefundId() }\n" +
      "  });\n" +
      "}"
  }],
  "api_key": "sk-ant-api03-REDACTED" // API key in transit
}
```
---
The Local-First Alternative
Pretense takes a fundamentally different architectural approach. Instead of routing all AI traffic through a centralized cloud proxy, Pretense runs locally -- on the developer's machine or on your organization's infrastructure.
```
Developer Machine                                       AI APIs
        |                                                  |
        |  Pretense proxy (localhost:9339)                 |
        |  1. Mutate identifiers locally                   |
        |  2. No centralized server                        |
        |  3. No shared infrastructure                     |
        |  4. Mutation map stays on YOUR machine           |
        |                                                  |
        |-- What leaves your network: -------------------->|
        |   _fn4a2b, _cls7d4e, _v3b1c                      |
        |   (not processStripeRefund, stripe.charges)      |
        |                                                  |
        |<-- Response with synthetic names ----------------|
        |                                                  |
        |   Reverse mutation locally                       |
        |<- Real code returned to developer                |
```
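From the client's perspective, routing through a local proxy is usually just a base-URL change on an OpenAI-compatible chat request. The sketch below assumes a proxy listening on localhost:9339 (the port shown above); the helper name and request shape are illustrative, not a documented Pretense API.

```typescript
// Sketch: building an OpenAI-compatible chat request aimed at a LOCAL
// proxy instead of a hosted gateway. Illustrative only -- adapt the base
// URL and auth to your own setup.
const PROXY_BASE = "http://localhost:9339/v1"; // local, never leaves the machine

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(prompt: string, apiKey: string): ChatRequest {
  return {
    url: `${PROXY_BASE}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Each developer supplies their own key; nothing is pooled server-side.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "claude-sonnet-4-20250514",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

Because the endpoint is loopback-only, the raw prompt exists on the wire only between the editor and the local proxy; mutation happens before anything crosses a real network boundary.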
The security advantages of local-first are significant:
**No single point of failure.** Each developer's Pretense instance is independent. Compromising one instance does not expose any other developer's data.
**No shared API keys.** Each developer uses their own API keys. There is no centralized key store to compromise.
**Mutation before transit.** Even if the network is compromised (man-in-the-middle, compromised VPN, malicious WiFi), the attacker sees only synthetic identifiers. The real code never leaves the local machine.
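The mutation idea can be sketched in a few lines. This is a deliberately minimal illustration of the concept, not Pretense's actual implementation: known sensitive identifiers are swapped for fixed-length synthetic names before a prompt leaves the machine, and the map needed to reverse them never does.

```typescript
// Minimal sketch of identifier mutation -- illustrative only, NOT
// Pretense's actual implementation.
type MutationMap = Map<string, string>; // synthetic name -> real name

function mutate(
  source: string,
  sensitive: string[],
): { text: string; map: MutationMap } {
  const map: MutationMap = new Map();
  let text = source;
  sensitive.forEach((name, i) => {
    // Fixed-length synthetic names avoid prefix collisions on reversal.
    const synthetic = `_v${String(i).padStart(4, "0")}`;
    map.set(synthetic, name);
    // Replace whole-word occurrences only.
    text = text.replace(new RegExp(`\\b${name}\\b`, "g"), synthetic);
  });
  return { text, map };
}

function reverse(text: string, map: MutationMap): string {
  let out = text;
  for (const [synthetic, real] of map) {
    out = out.replace(new RegExp(synthetic, "g"), real);
  }
  return out;
}

const src = "async function processStripeRefund(chargeId) { return chargeId; }";
const { text, map } = mutate(src, ["processStripeRefund", "chargeId"]);
// `text` now contains only synthetic names; `map` never leaves this machine.
```

An interceptor who captures `text` in transit learns nothing about the business logic; only the holder of `map` can run `reverse` and recover the original.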
**Open source and auditable.** Unlike hosted proxy services where you trust the provider's security claims, Pretense's code is fully auditable. You can verify exactly what it does with your data.
---
The Real-World Impact
Consider two organizations, both using AI coding tools:
**Organization A** routes all AI traffic through a hosted LiteLLM instance. A CVSS 9.6 vulnerability is disclosed. Until they patch (which requires testing, staging, and production deployment -- typically 2-7 days for critical patches), all their AI traffic is exposed. Every developer's prompts, every piece of code, every API key.
**Organization B** uses Pretense on each developer's machine. Even if a vulnerability were discovered in Pretense (no software is immune), the impact is contained: only that developer's synthetic identifiers are exposed, and those identifiers are meaningless without the local mutation map.
```typescript
// Organization A: what the attacker sees through the LiteLLM vulnerability
class FraudDetectionPipeline {
  private mlModel: XGBoostFraudModel;
  private velocityChecker: TransactionVelocityAnalyzer;

  async evaluateTransaction(txn: Transaction): Promise<FraudScore> {
    const features = await this.extractFeatures(txn);
    const mlScore = await this.mlModel.predict(features);
    const velocityScore = await this.velocityChecker.check(txn);
    return this.combineScores(mlScore, velocityScore, txn.amount);
  }
}

// Organization B: what any interceptor sees (Pretense mutation applied)
// Synthetic identifiers, architecture hidden
class _cls4f2a {
  private _v3b1c: _cls7d4e;
  private _v8a2f: _clsb3c1;

  async _fn2c3d(_v1e5f: _cls6a7b): Promise<_cls9d8e> {
    const _v5f6a = await this._fn4a2b(_v1e5f);
    const _v7b8c = await this._v3b1c._fn8c9d(_v5f6a);
    const _v2d3e = await this._v8a2f._fn1a2b(_v1e5f);
    return this._fn6f7a(_v7b8c, _v2d3e, _v1e5f._v3e4f);
  }
}
```
---
Lessons for AI Infrastructure Security
The LiteLLM vulnerability teaches three lessons:
**1. Minimize the blast radius.** Centralized AI proxies create centralized risk. Local-first architecture limits the impact of any single vulnerability to a single developer's synthetic data.
**2. Assume the network is hostile.** Even if your proxy is perfectly secure today, network paths are not. Mutation ensures data is protected in transit regardless of the security posture of intermediate systems.
**3. Trust but verify.** Open-source, auditable code is the only way to verify security claims about data handling. Hosted services ask you to trust their security team. Open source lets you verify.
Pretense was built on these principles from day one. It is local-first, open-source, and treats every network path as potentially hostile. Your code is mutated before it touches any wire, any proxy, any third-party service.
Protect Your Code Today
Pretense is the AI firewall that mutates proprietary code before it reaches any LLM API. Install in 30 seconds and protect your team's intellectual property.
```
curl -fsSL https://pretense.ai/install.sh | sh
pretense init
pretense start
```