
Why We Rewrote Our Code Scanner in Rust: 27x Faster at 1.82ms

The technical story behind the Pretense Rust scanner rewrite, including benchmarks and the NAPI binding architecture.

The Problem

The Pretense proxy sits in the hot path of every LLM API request a developer makes. Every code completion, every refactoring request, every multi-file analysis passes through it. That means latency is a product requirement, not a nice-to-have.

In v0.1, the scanner was written in TypeScript. It worked. It passed all tests. It caught secrets and mutated identifiers correctly.

It was also adding approximately 50ms to every request.

For a single code completion, 50ms is barely perceptible. For a multi-turn Claude Code session where dozens of requests fire in quick succession, 50ms per request accumulates into 2-3 seconds of visible delay. Developers noticed. In user testing, we heard the same feedback repeatedly: "it feels a little laggy."

50ms was not going to work.

---

Understanding Where the Time Went

Before rewriting anything, we profiled the TypeScript scanner to understand the actual bottleneck.

```typescript
// TypeScript scanner hot path (simplified)
export function scanCode(source: string): ScanResult {
  const tokens = tokenize(source);                 // ~15ms on 1000 tokens
  const identifiers = extractIdentifiers(tokens);  // ~8ms
  const mutations = identifiers.map(mutate);       // ~12ms
  const secrets = detectSecrets(source);           // ~18ms (32 patterns)
  return { tokens, mutations, secrets };
}
```

The profiler showed:

- Regex compilation: 22ms (the secret detection regexes were being recompiled on every call)
- String allocation: 14ms (JavaScript's garbage collector pressure from creating thousands of string objects)
- Single-threaded execution: 0ms overhead, but 0ms parallelism

The regex compilation was fixable in TypeScript (cache compiled regexes). We fixed it. It brought average time down to ~28ms. Still not fast enough.
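The caching fix itself is worth showing. Here is a minimal sketch of the approach; the pattern set and names are illustrative, not the actual Pretense code:

```typescript
// Illustrative sketch: compile secret-detection patterns once at module
// load, instead of rebuilding them inside every scan call.
const SECRET_PATTERNS: ReadonlyArray<[string, RegExp]> = [
  ["github_token", /ghp_[A-Za-z0-9]{36}/],
  ["aws_access_key", /AKIA[0-9A-Z]{16}/],
  // ...remaining patterns
];

function detectSecretKinds(source: string): string[] {
  // Reuses the precompiled regexes; no per-call compilation cost.
  return SECRET_PATTERNS
    .filter(([, re]) => re.test(source))
    .map(([name]) => name);
}
```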

The string allocation problem was inherent to V8's garbage collector. Every token, every mutation result, every secret match was a new heap allocation. With 1000 tokens, that is thousands of allocations per request.
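To make the allocation pressure concrete, here is a hedged illustration (not the actual tokenizer): a tokenizer that returns substrings allocates one heap string per token, while one that returns index ranges into the source allocates almost nothing per token.

```typescript
// Allocation-heavy: every token becomes a new string on the V8 heap.
function tokenizeStrings(source: string): string[] {
  return source.match(/\S+/g) ?? [];
}

// Lighter alternative: token boundaries as [start, end) index pairs into
// the original source, avoiding per-token string allocation.
function tokenizeSpans(source: string): Array<[number, number]> {
  const spans: Array<[number, number]> = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    spans.push([m.index, m.index + m[0].length]);
  }
  return spans;
}
```

Rust takes the second idea further: string slices borrow from the input, so the hot path can avoid copies entirely.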

The single-threaded problem was fundamental. Node.js is single-threaded by design. We could not parallelize the scan across multiple cores without worker threads and the communication overhead that entails.

We needed to go lower.

---

The Decision: Rust with Rayon

The choice of Rust was driven by three requirements:

1. Zero-copy string handling where possible
2. True parallelism across cores (for batch file scanning)
3. One-time regex compilation (via the `regex` crate and lazy statics)

We also needed to keep the Node.js interface: Pretense's CLI, proxy server, and most of the business logic was staying in TypeScript. The scanner needed to be callable from Node.js with minimal overhead.

That meant NAPI bindings.

---

Architecture: NAPI Bindings

NAPI (Node-API) is Node.js's stable ABI for native modules. A Rust library compiled with `napi-rs` produces a `.node` file that TypeScript can import as a native module.

The key advantage over a child process or gRPC approach: NAPI calls are in-process. There is no serialization overhead, no inter-process communication, no socket roundtrip. The Rust code runs in the same process as the Node.js host.

```rust
// packages/scanner-rs/src/napi_bindings.rs (simplified)
use napi::bindgen_prelude::*;
use napi_derive::napi;

#[napi]
pub fn scan_source(source: String, file_path: String) -> Result<String> {
    let result = scan_file(&source, &file_path)
        .map_err(|e| Error::from_reason(e.to_string()))?;
    serde_json::to_string(&result)
        .map_err(|e| Error::from_reason(e.to_string()))
}

#[napi]
pub fn scan_batch(requests_json: String) -> Result<String> {
    let requests: BatchScanRequest = serde_json::from_str(&requests_json)
        .map_err(|e| Error::from_reason(e.to_string()))?;
    let results = scan_files_parallel(&requests.files)
        .map_err(|e| Error::from_reason(e.to_string()))?;
    serde_json::to_string(&results)
        .map_err(|e| Error::from_reason(e.to_string()))
}
```

From TypeScript, this is a synchronous function call:

```typescript
// TypeScript caller
const result = JSON.parse(scanSource(sourceCode, filePath));
// result.tokens, result.secrets, result.mutation_map
```

The NAPI boundary has a cost: JSON serialization/deserialization for the input and output. We measured this at approximately 0.15ms for a 1000-token file. It is the dominant cost at small sizes, which is why the benchmark gap narrows at very small inputs.
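A rough model makes the scaling behavior explicit (the constant comes from our measurement above; the helper is illustrative, not project code):

```typescript
// Fraction of total scan time spent on JSON (de)serialization at the
// NAPI boundary, given the compute-only portion of the scan.
const SERIALIZATION_MS = 0.15; // measured for a ~1000-token file

function boundaryShare(computeMs: number): number {
  return SERIALIZATION_MS / (SERIALIZATION_MS + computeMs);
}

// ~1000-token file: 1.82ms total, ~1.67ms compute, boundary is ~8%.
// Tiny file with ~0.1ms of compute: the boundary is ~60% of total time.
```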

---

The Rust Scanner: Key Implementation Details

**Lazy static regex compilation:**

```rust
use once_cell::sync::Lazy;
use regex::Regex;

// Compiled once on first use, reused for every scan
static IDENTIFIER_RE: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"([a-zA-Z_][a-zA-Z0-9_]{2,})").unwrap()
});

static SECRET_PATTERNS: Lazy<Vec<(String, Regex)>> = Lazy::new(|| {
    vec![
        ("github_token".to_string(),
         Regex::new(r"ghp_[A-Za-z0-9]{36}").unwrap()),
        ("aws_access_key".to_string(),
         Regex::new(r"AKIA[0-9A-Z]{16}").unwrap()),
        ("anthropic_key".to_string(),
         Regex::new(r"sk-ant-[A-Za-z0-9-]{32,}").unwrap()),
        // 29 more patterns...
    ]
});
```

**Parallel batch scanning with Rayon:**

```rust
use rayon::prelude::*;

pub fn scan_files_parallel(file_paths: &[String]) -> Result<Vec<FileScanResult>, ScanError> {
    file_paths
        .par_iter() // parallel iterator across all CPU cores
        .map(|path| {
            let content = std::fs::read_to_string(path)
                .map_err(|e| ScanError::Io(e.to_string()))?;
            scan_file(&content, path)
        })
        .collect()
}
```

For a 50-file context window, Rayon distributes the scan across all available CPU cores. On a 12-core M3 Pro MacBook Pro, that means up to 12 files being scanned simultaneously.

**Deterministic mutation (djb2 + SHA-256):**

```rust
use sha2::{Digest, Sha256};

pub fn mutate_identifier(identifier: &str, kind: &str) -> String {
    // djb2 for speed: h = h * 33 + byte, with wrapping arithmetic
    let mut djb2: u32 = 5381;
    for byte in identifier.bytes() {
        djb2 = djb2.wrapping_shl(5).wrapping_add(djb2).wrapping_add(byte as u32);
    }

    // SHA-256 for uniqueness guarantee
    let sha = Sha256::digest(identifier.as_bytes());
    let sha_prefix = u32::from_be_bytes([sha[0], sha[1], sha[2], sha[3]]);

    // XOR combination, take 4 hex chars
    let combined = djb2 ^ sha_prefix;
    format!("_{}{:04x}", kind, combined & 0xFFFF)
}
```

The combination of djb2 (fast) and SHA-256 (collision-resistant) gives us a hash that is both fast to compute and guaranteed to have negligible collision probability across a realistic codebase.
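For illustration, the djb2 half is simple enough to sketch in TypeScript (the SHA-256 mixing step is omitted here; this is a sketch of the hashing idea, not the production implementation):

```typescript
// djb2: h = h * 33 + byte, with wrapping 32-bit arithmetic emulated via
// Math.imul and an unsigned shift. Deterministic across runs and machines.
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(h, 33) + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// The mutated name keeps 4 hex chars of the combined hash, as above.
function suffix(hash: number): string {
  return (hash & 0xffff).toString(16).padStart(4, "0");
}
```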

---

Benchmark Methodology and Results

We benchmarked on:

- Input: 1000 code tokens drawn from real TypeScript production code
- Hardware: Apple M3 Pro, 18GB RAM
- Runs: 10,000 iterations per measurement, reporting mean and p99
- Cold start excluded: the comparison is warm-path performance

| Implementation | Mean | p99 | Throughput |
| --- | --- | --- | --- |
| TypeScript v0.1 (with regex bug) | 50.2ms | 89ms | 20 req/s |
| TypeScript v0.1 (regex cached) | 28.1ms | 41ms | 36 req/s |
| Rust via NAPI (single file) | 1.82ms | 2.9ms | 549 req/s |
| Rust via NAPI (batch, 50 files) | 8.4ms total | 12ms | 5,952 files/s |

The 27x headline number is TypeScript v0.1 (with regex bug) vs. Rust single-file: 50.2ms to 1.82ms.

The more meaningful comparison is TypeScript v0.1 (regex cached, representing an optimized TypeScript implementation) vs. Rust: 28.1ms to 1.82ms, which is 15.4x.

Both represent meaningful improvements. The batch performance is where the Rust advantage compounds: scanning 50 files in 8.4ms is not achievable in TypeScript on a single thread.
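On the TypeScript side, using the batch path is just a matter of building the envelope that `scan_batch` expects. A sketch (the `files` field matches the Rust `BatchScanRequest` shown earlier; the helper name is ours):

```typescript
// Builds the JSON payload for the scanBatch NAPI export shown earlier.
interface BatchScanRequest {
  files: string[];
}

function buildBatchRequest(paths: string[]): string {
  const request: BatchScanRequest = { files: paths };
  return JSON.stringify(request);
}

// Usage: const results = JSON.parse(scanBatch(buildBatchRequest(paths)));
```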

---

The NAPI Serialization Cost

At small input sizes, the NAPI boundary dominates. JSON serialization costs approximately 0.12-0.15ms across the input sizes we measured; for a 1000-token file that is 6.5-8% of the 1.82ms total, but for a 10-token file it is a much larger share of a much smaller total.

This means the Rust scanner is not 27x faster for trivially small inputs. For inputs under 100 tokens, the gap is closer to 8-10x. The improvement is most dramatic for the medium-to-large code blocks that are typical in real Claude Code sessions.

---

Lessons Learned

**1. Profile before rewriting.** We found the regex compilation bug in TypeScript that brought 50ms to 28ms before writing a line of Rust. If that had been the only problem, a Rust rewrite would have been unnecessary.

**2. NAPI serialization is the ceiling at small sizes.** If we were doing this again, we would investigate zero-copy NAPI buffers for the input string rather than JSON serialization. This is a 0.12ms improvement that we have not yet implemented.

**3. Rayon is trivially easy for embarrassingly parallel workloads.** Replacing `.iter()` with `.par_iter()` and letting Rayon handle thread pooling is one of the best power-to-effort ratios in systems programming.

**4. The TypeScript caller does not need to change.** Because we designed the NAPI interface to accept and return JSON strings, the TypeScript caller imports a function that looks like any other function. The rewrite was invisible to every other part of the codebase.

**5. The build pipeline is the hard part.** `cargo build --release` targeting multiple architectures (x86_64, arm64, linux-musl for Docker) required more CI configuration than the Rust code itself. Set up the cross-compilation pipeline early.

---

The Current State

The Rust scanner ships in Pretense v0.2.0 and later. It is the default scanner for all installations. The TypeScript fallback is still present in the codebase for debugging purposes but is not activated in production paths.

If you want to read the source:

- Core scanner: `packages/scanner-rs/src/lib.rs`
- NAPI bindings: `packages/scanner-rs/src/napi_bindings.rs`
- Benchmark suite: `packages/scanner-rs/benches/`

The scanner is part of the Pretense monorepo and the relevant packages are MIT-licensed.

[Get started with Pretense at pretense.ai/docs](/docs)
