When One Cache Isn’t Enough: Diagnosing a Bidirectional Expiry Bug Humans and AI Both Miss

🧩 TL;DR:

A subtle expireAfterAccess bug in a dual-cache setup caused thousands of evictions in a cache sized for only 300–400 entries, even for keys that were constantly reused. The root cause: reads touched only one side of the bidirectional cache, so entries in the reverse map expired silently. This engineering postmortem breaks down how engineers at different experience levels (from junior to principal) would likely observe, diagnose, and resolve the issue, and how AI fits into both the creation and correction of this kind of fault. Spoiler: abstraction was the missing link.

Below is an analysis of the cache desynchronization bug in getOrCreateHash(...), focused on engineering-level diagnostics, along with a candid look at how AI-generated code can affect fault detection and resolution.

🔍 Problem Recap

public String getOrCreateHash(Map<String, String> keyData) {
	String canonical = buildCanonicalKeyString(keyData);

	String existing = keyToHash.getIfPresent(canonical);
	if (existing != null) {
		return existing; // ❌ only touches keyToHash
	}

	String hash = computeHash(canonical);
	insertMapping(canonical, hash);
	return hash;
}

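Bug: A cache hit reads keyToHash only, so just one side of the bidirectional Caffeine cache pair is touched.
With expireAfterAccess (as used here), each cache tracks access times independently. Entries in the reverse cache hashToKey are never read on this path, so they expire silently even while the same keys are in constant use, which leads to excessive evictions despite key reuse.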

In this scenario, the issue becomes apparent when engineers observe thousands of evictions in a cache sized for only 300–400 entries — a clear signal that something is wrong.
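
The mechanism can be reproduced without Caffeine and without waiting on real time. The sketch below is a toy model, not the production code: ExpiringMap, AsymmetricExpiryDemo, the 10-tick lifetime, and the manually advanced clock are assumptions made for illustration. Only keyToHash, hashToKey, and insertMapping are names taken from the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an access-expiring cache. This is NOT Caffeine; time is a manually advanced counter. */
class ExpiringMap {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> lastAccess = new HashMap<>();
    private final long ttl;
    private final long[] clock; // shared fake clock; clock[0] is "now"

    ExpiringMap(long ttl, long[] clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Stores the value and starts (or restarts) its expiry timer. */
    void put(String key, String value) {
        values.put(key, value);
        lastAccess.put(key, clock[0]);
    }

    /** Returns the value and resets its expiry timer, or null if absent or expired. */
    String getIfPresent(String key) {
        Long seen = lastAccess.get(key);
        if (seen == null) {
            return null;
        }
        if (clock[0] - seen >= ttl) { // lifetime elapsed since the last access
            values.remove(key);
            lastAccess.remove(key);
            return null;
        }
        lastAccess.put(key, clock[0]);
        return values.get(key);
    }
}

class AsymmetricExpiryDemo {
    final long[] clock = {0};
    final ExpiringMap keyToHash = new ExpiringMap(10, clock);
    final ExpiringMap hashToKey = new ExpiringMap(10, clock);

    void insertMapping(String key, String hash) {
        keyToHash.put(key, hash);
        hashToKey.put(hash, key);
    }

    /** Mirrors the read path in the recap: only the forward side is touched. */
    String readForwardOnly(String key) {
        return keyToHash.getIfPresent(key);
    }

    /** Candidate fix: a hit also reads the reverse side so both timers reset together. */
    String readBothSides(String key) {
        String hash = keyToHash.getIfPresent(key);
        if (hash != null) {
            hashToKey.getIfPresent(hash);
        }
        return hash;
    }

    /** Reads the same key every 6 ticks, three times, then reports whether the reverse entry survived. */
    static boolean reverseSurvives(boolean touchBoth) {
        AsymmetricExpiryDemo demo = new AsymmetricExpiryDemo();
        demo.insertMapping("k", "h");
        for (int i = 0; i < 3; i++) {
            demo.clock[0] += 6;
            if (touchBoth) {
                demo.readBothSides("k");
            } else {
                demo.readForwardOnly("k");
            }
        }
        return demo.hashToKey.getIfPresent("h") != null;
    }
}
```

In this model, reverseSurvives(false) returns false: the key is read well within its lifetime every time, yet the reverse entry is gone. reverseSurvives(true) returns true. Against the real caches, the analogous change would be a hashToKey.getIfPresent(existing) call on the hit path, probably with a re-insert when that lookup comes back null. That is a suggestion, not necessarily the fix that was shipped.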

👨‍💻 Engineering Level Analysis

Role	Can Spot Bug?	Can Fix Bug?	Reasoning & Diagnostic Path
Junior Engineer (0–2 years)	✅ Notices something’s wrong (evictions) 🚫 Root cause unclear	⚠️ Maybe	Would observe high eviction counts but likely trust cache behavior. May not recognize that asymmetric access is causing silent expiry. Unfamiliar with bidirectional cache semantics.
Mid-Level Engineer (2–5 years)	✅ Likely suspects expiry issue	✅ Yes	Can correlate eviction logs or metrics with recent accesses. Might miss that only one cache is touched unless familiar with `expireAfterAccess`. Needs to trace code carefully.
Senior Engineer (5–10 years)	✅ Likely identifies root cause	✅ Yes	More experience with caching patterns. Recognizes asymmetry and confirms it through inspection or instrumentation. May suggest better observability or fixes.
Lead/Principal Engineer (10+ years)	✅ Immediately questions the design	✅ Yes	Will question dual cache use without encapsulation. Sees this as a design flaw, not just an implementation bug. Likely to abstract into a safer construct.

💡 Does AI Help or Hinder?

Perspective	AI Helpfulness	Reasoning
Code Generation	✅ Helpful	AI can generate cache setup and hash computation quickly, including eviction and ordering logic. Boosts productivity.
Code Correctness	⚠️ Mixed	AI-generated code may overlook subtle behaviors like `expireAfterAccess` side effects, especially when bidirectional caches are split.
Debugging Assistance	✅ Helpful	With the right prompts, AI can explain `expireAfterAccess`, describe why stale keys are evicted, and suggest cache touch strategies.
Fault Resolution	✅ Helpful	AI can summarize edge cases, provide precise fixes, and suggest more robust abstractions like `BiDirectionalCache`.

🎯 Final Summary

Role	AI Productivity Boost	AI Introduced Fault?	AI Helped Fix?
Junior Engineer	✅ High	⚠️ Possibly	✅ Yes
Mid-Level Engineer	✅ Moderate	⚠️ Possibly	✅ Yes
Senior Engineer	✅ High	⚠️ Unlikely	✅ Yes
Principal Engineer	✅ High	❌ No	✅ Yes


🧠 Key Insight

The bug is not inherently caused by AI. It stems from a subtle property of expireAfterAccess: each cache tracks accesses on its own, so two caches that are meant to mirror each other drift apart unless every read touches both. That nuance is easy to miss without deep familiarity.

AI accelerated prototyping and didn’t hinder resolution, especially when paired with strong observability. However, the true root cause was architectural: a missing abstraction. Two independent caches were used with nothing coordinating access, and therefore expiry, between them.
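
One way to picture the missing abstraction is a single structure in which both directions share one expiry timer. This is a minimal sketch under stated assumptions, not the design that was adopted: the class shape, the method names hashForKey and keyForHash, the injected clock, and the lazy expiry on read are all hypothetical, and the sketch has no size bound or background cleanup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch: one Pair object per mapping, so both directions share one expiry timer. */
class BiDirectionalCache {
    private static final class Pair {
        final String key;
        final String hash;
        long lastAccess;

        Pair(String key, String hash, long lastAccess) {
            this.key = key;
            this.hash = hash;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Pair> byKey = new HashMap<>();
    private final Map<String, Pair> byHash = new HashMap<>();
    private final long ttl;          // lifetime, in the same unit the clock returns
    private final LongSupplier clock; // injected so tests can control time

    BiDirectionalCache(long ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Inserts a pair, first dropping any older pair that shares its key or hash. */
    synchronized void put(String key, String hash) {
        remove(byKey.get(key));
        remove(byHash.get(hash));
        Pair pair = new Pair(key, hash, clock.getAsLong());
        byKey.put(key, pair);
        byHash.put(hash, pair);
    }

    /** Forward lookup; a hit refreshes the timer shared by both directions. */
    synchronized String hashForKey(String key) {
        Pair pair = touch(byKey.get(key));
        return pair == null ? null : pair.hash;
    }

    /** Reverse lookup; a hit refreshes the same shared timer. */
    synchronized String keyForHash(String hash) {
        Pair pair = touch(byHash.get(hash));
        return pair == null ? null : pair.key;
    }

    /** Expires the pair lazily if its lifetime has elapsed, otherwise refreshes it. */
    private Pair touch(Pair pair) {
        if (pair == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - pair.lastAccess >= ttl) {
            remove(pair); // both directions disappear together
            return null;
        }
        pair.lastAccess = now;
        return pair;
    }

    private void remove(Pair pair) {
        if (pair == null) {
            return;
        }
        byKey.remove(pair.key);
        byHash.remove(pair.hash);
    }
}
```

Because a pair can only be refreshed or removed as a unit, a caller cannot touch one direction and forget the other. A production version would more likely wrap the existing Caffeine caches than reimplement expiry by hand; the point here is only where the coordination lives.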

⚠️ AI Code Risk Depends on Engineering Experience

Role	AI Code Risk Level	Why
Junior Engineer	🔴 High	May accept AI output at face value, with limited understanding of hidden runtime behavior.
Mid-Level Engineer	🟠 Moderate	Can spot issues but may still miss deeper problems like expiry symmetry or lifecycle mismatches.
Senior Engineer	🟢 Low	More equipped to critique cache design, expiry assumptions, and test coverage depth.
Principal Engineer	🟢 Very Low	Likely to question abstraction integrity, assumptions, and enforce long-term robustness.

✅ Safe AI-Assisted Development Requires Discipline

Always thoroughly review and test AI-generated code. Don’t assume correctness — assume shortcuts.
Prefer small, isolated code blocks when using AI. The more targeted the request, the easier it is to verify.
Do not blindly trust AI-generated unit tests. Like humans, AI tends to test the happy path or what it just wrote, not edge cases, failure paths, or misbehaving integrations.
Strong peer review is critical. Design, expiry behavior, and system-level contracts must be checked, not just method-level correctness.

🧪 AI is a force multiplier — but without deep review and real-world testing, it can amplify both quality and error at scale.

The Daily Kebab

The ramblings of a technomuse