Memory-Aware Caching in Lighthouse
Implementing dynamic, byte-limited beacon state caching to prevent OOM crashes during extended non-finality periods
As part of my work during the Ethereum Protocol Fellowship (Cohort 6), I implemented dynamic memory-aware beacon state caching in Lighthouse, an Ethereum Consensus Layer client written in Rust. This post details the incident that motivated this work, the technical implementation, and the benchmarks that validated the approach.
The Pectra Holesky Incident
On February 24th, 2025, shortly after the Pectra upgrade on Holesky testnet, an invalid block was accepted by Geth, Nethermind, and Besu due to a misconfigured depositContractAddress value. Since these three clients represented a super-majority of the network, most validators attested to this invalid block, causing it to become part of a justified checkpoint.
This triggered a prolonged period of non-finality where the network couldn't finalize new blocks, activating "inactivity leak" mode that quadratically increased penalties for validators.
For Lighthouse specifically, this created severe operational challenges:
- Memory bloat: Inactivity penalties added ~70MB of data to each Beacon State at epoch boundaries (memory optimizations weren't applied to the inactivity_scores field)
- Disk explosion: Usage ballooned from 20GB to over 1TB as the hot database couldn't perform normal migrations to the more efficient freezer database
- OOM crashes: Nodes experienced out-of-memory crashes due to expensive state lookups when serving non-finalized blocks to peers attempting to sync
The incident exposed critical gaps in client resilience during adverse network conditions.
The Problem with Count-Based Caching
Lighthouse used a fixed-size LRU cache for beacon states that performed well during normal operations but became dangerously inefficient during network stress. The cache was bounded by count (default 32 states), which translated to roughly 500MB during normal conditions.
However, as Beacon State size increases during non-finality, the same 32 states could expand to multiple gigabytes. This risks validator liveness and network stability, particularly affecting solo stakers with limited hardware resources who can't simply throw more RAM at the problem.
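To make the scale of the mismatch concrete, here is a back-of-the-envelope sketch. The per-state sizes are illustrative only, derived from the "~500MB across 32 states" figure above and the inactivity-score growth described earlier; they are not measured values:

```rust
fn main() {
    let cache_entries = 32;

    // Normal operation: ~500 MB across 32 states is roughly 16 MB each.
    let normal_state_mb = 16;
    // During extended non-finality, states bloat; assume ~100 MB each
    // (illustrative figure, not a measurement).
    let stressed_state_mb = 100;

    let normal_total = cache_entries * normal_state_mb;
    let stressed_total = cache_entries * stressed_state_mb;

    println!("normal: ~{normal_total} MB, stressed: ~{stressed_total} MB");
    // Same entry count, several times the memory: count is a poor proxy for bytes.
    assert!(stressed_total > 6 * normal_total);
}
```

The count limit holds steady at 32 entries in both scenarios while actual memory use differs by multiples, which is exactly the failure mode the byte limit addresses.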
The Solution: Byte-Limited Caching
My implementation moves Lighthouse from a rigid, count-based cache to an optional byte-limit based caching system. The key insight is that we should cap memory usage directly rather than using item count as a proxy.
Core Components
1. Memory Size Calculation
At the foundation, I leveraged Milhouse's MemorySize trait for BeaconState inside the consensus/types crate. Using Lighthouse's existing macros, the implementation traverses all relevant sub-trees and lists to calculate an accurate memory footprint.
impl<E: EthSpec> MemorySize for BeaconState<E> {
    fn memory_size(&self, tracker: &mut MemoryTracker) -> usize {
        // Traverse all fields, handling shared Arc structures
        self.validators.memory_size(tracker)
            + self.balances.memory_size(tracker)
            + self.inactivity_scores.memory_size(tracker)
            // + ... other fields
    }
}
2. Differential Memory Tracking
A critical implementation detail: the MemoryTracker ensures that shared Arc structures (such as committee caches and other shared sub-trees) are only counted once. This prevents over-counting when multiple cached states share references to the same underlying data.
pub struct MemoryTracker {
    seen_arcs: HashSet<usize>, // Track Arc pointer addresses
    total_bytes: usize,
}

impl MemoryTracker {
    pub fn track_arc<T>(&mut self, arc: &Arc<T>) -> bool {
        let ptr = Arc::as_ptr(arc) as usize;
        self.seen_arcs.insert(ptr) // Returns false if already seen
    }
}
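The deduplication behavior can be demonstrated with a self-contained sketch. The names mirror the snippet above, but this is a simplified stand-in rather than the actual Lighthouse code:

```rust
use std::collections::HashSet;
use std::sync::Arc;

struct MemoryTracker {
    seen_arcs: HashSet<usize>, // Track Arc pointer addresses
}

impl MemoryTracker {
    fn new() -> Self {
        Self { seen_arcs: HashSet::new() }
    }

    /// Returns true only the first time a given allocation is seen,
    /// so shared sub-trees contribute their bytes exactly once.
    fn track_arc<T>(&mut self, arc: &Arc<T>) -> bool {
        self.seen_arcs.insert(Arc::as_ptr(arc) as usize)
    }
}

fn main() {
    let mut tracker = MemoryTracker::new();
    // Stand-in for a committee cache shared between two cached states.
    let shared = Arc::new(vec![0u64; 1024]);
    let clone = Arc::clone(&shared);

    assert!(tracker.track_arc(&shared)); // first sighting: count its bytes
    assert!(!tracker.track_arc(&clone)); // same allocation: skip it
}
```

Because `Arc::as_ptr` returns the address of the shared allocation, two clones of the same `Arc` hash to the same key, and only the first contributes to the total.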
3. Batched Measurement
Calculating memory on every cache operation would be prohibitively expensive. Instead, the implementation tracks insert counts and performs full measurement passes every N inserts:
impl StateCache {
    pub fn insert(&mut self, key: StateKey, state: BeaconState) {
        self.entries.insert(key, state);
        self.inserts_since_measure += 1;
        if self.inserts_since_measure >= MEASURE_INTERVAL {
            self.measure_and_maybe_prune();
            self.inserts_since_measure = 0;
        }
    }

    fn measure_and_maybe_prune(&mut self) {
        let mut tracker = MemoryTracker::new();
        let mut total_bytes: usize = self.entries.values()
            .map(|state| state.memory_size(&mut tracker))
            .sum();

        // Update metrics
        metrics::set_gauge(
            &BEACON_STATE_CACHE_MEMORY_SIZE,
            total_bytes as i64,
        );

        // Prune if over limit
        if let Some(max_bytes) = self.max_cached_bytes {
            while total_bytes > max_bytes && !self.entries.is_empty() {
                self.evict_oldest_batch();
                // Re-measure after eviction
                total_bytes = self.measure_total();
            }
        }
    }
}
4. Batch Eviction
When memory usage exceeds the configured limit, the cache prunes states in batches. The batch size is derived from the existing state_cache_headroom parameter and clamped to a safe range, with re-measurement between batches until usage falls back under the limit.
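A minimal sketch of that batch sizing follows. The clamp bounds here are illustrative (the actual derivation from state_cache_headroom in Lighthouse may differ); the point is that the batch is always large enough to make progress and small enough to avoid over-evicting between re-measurements:

```rust
/// Illustrative batch-size derivation: start from the configured
/// headroom (in cache entries) and clamp it to a safe range.
/// The bounds 1 and 32 are assumptions for this sketch.
fn eviction_batch_size(state_cache_headroom: usize) -> usize {
    state_cache_headroom.clamp(1, 32)
}

fn main() {
    assert_eq!(eviction_batch_size(0), 1);    // always evict at least one entry
    assert_eq!(eviction_batch_size(8), 8);    // in-range headroom passes through
    assert_eq!(eviction_batch_size(100), 32); // capped to avoid mass eviction
}
```

Evicting in batches rather than one state at a time amortizes the cost of re-measurement: each full measurement pass is expensive, so the cache frees a meaningful chunk of memory before paying for another scan.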
CLI Integration
The new behavior is exposed via a --state-cache-max-mb CLI flag:
lighthouse bn \
--state-cache-max-mb 8192 \
--network holesky
This flag is disabled by default to preserve current behavior unless operators explicitly opt in. This conservative approach ensures existing deployments aren't affected while giving operators a tool to constrain memory during adverse conditions.
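Internally, the flag value has to become a byte cap, with absence meaning "no cap". A sketch of that conversion (the function name here is hypothetical; the Option encodes the opt-in default described above):

```rust
/// Convert the optional --state-cache-max-mb value into a byte limit.
/// `None` preserves the existing count-based behavior.
/// (Illustrative helper, not the actual Lighthouse function.)
fn max_cached_bytes(flag_mb: Option<u64>) -> Option<u64> {
    flag_mb.map(|mb| mb * 1024 * 1024)
}

fn main() {
    assert_eq!(max_cached_bytes(None), None); // default: byte cap disabled
    assert_eq!(max_cached_bytes(Some(8192)), Some(8_589_934_592)); // 8 GiB
}
```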
Metrics
I added dedicated metrics for both whole-cache and per-state memory sizing to make performance overhead visible in production:
- store_beacon_state_cache_memory_size - Total bytes used by cached states
- store_beacon_state_cache_size - Number of cached states
- beacon_state_memory_size_calculation_time - Time spent in measurement code
Benchmarks and Validation
To evaluate the implementation, I recreated Holesky-like conditions by running Lighthouse and hammering it with HTTP requests to force the state cache to fill up with large beacon states.
Test Setup
- Continuous HTTP spam to keep the state cache hot
- Consistent node configuration across all tests
- Only variable: whether --state-cache-max-mb was set
- Metrics visualized in Grafana
Results: Without Byte Cap (Baseline)
Running without the memory limit:
State Cache Size: 128 (fixed at max)
State Cache Memory: N/A (not tracked)
Node Memory Usage: 42 GiB → 49 GiB (climbing)
The cache fills to its count limit (128 states) while total node memory climbs uncontrollably as each state grows larger during simulated non-finality.
Results: With Byte Cap (8 GiB)
Running with --state-cache-max-mb=8192:
State Cache Size: 60-70 entries (variable)
State Cache Memory: ~7.51 GiB (bounded)
Node Memory Usage: ~40.2 GiB (stable)
The cache dynamically adjusts its entry count to stay within the memory budget. During the benchmark:
- Whole-cache recomputation: ~15-18 seconds
- Per-item measurement cost: 300-350ms
- CPU overhead: ~40-45% of a single core during recomputation
Live Metrics Snapshots
$ curl -s http://127.0.0.1:5054/metrics \
| grep -E 'store_beacon_state_cache_(memory_size|size)'
store_beacon_state_cache_memory_size 7380783997
store_beacon_state_cache_size 37
store_beacon_state_cache_memory_size 8015966929
store_beacon_state_cache_size 28
store_beacon_state_cache_memory_size 7576144761
store_beacon_state_cache_size 17
These samples show the cache oscillating between 6.87 and 7.47 GiB while the entry count shrinks from 37 to 17. This is exactly the expected behavior: the cache keeps pruning states until measured memory sits just below the configured 8 GiB ceiling, even if that means discarding a large fraction of cached entries.
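The GiB figures come straight from the raw gauge values; a quick conversion reproduces them (byte values copied from the metrics output above):

```rust
/// Convert a raw byte count to GiB (1 GiB = 1024^3 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    // (bytes, entry count) pairs sampled from /metrics.
    for (bytes, entries) in [
        (7_380_783_997u64, 37),
        (8_015_966_929, 28),
        (7_576_144_761, 17),
    ] {
        println!("{entries} entries -> {:.2} GiB", to_gib(bytes));
    }

    // The samples span roughly 6.87 to 7.47 GiB, all under the 8 GiB cap.
    assert!((to_gib(7_380_783_997) - 6.87).abs() < 0.01);
    assert!((to_gib(8_015_966_929) - 7.47).abs() < 0.01);
}
```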
Key Insights
1. Memory measurement has overhead, but it's bounded
Yes, there's CPU cost for measuring memory (~15-18 seconds per full scan), but it's predictable and doesn't saturate cores. The batched approach keeps this overhead manageable.
2. Count-based limits are a poor proxy for memory
During normal operation, "128 states ≈ 500MB" is a reasonable assumption. During non-finality, "128 states ≈ 10GB" breaks everything. Direct byte limits are the right abstraction.
3. Shared data structures complicate measurement
Beacon states share significant structure via Arc. Without careful tracking, you'll either over-count (leading to premature eviction) or under-count (failing to prevent OOM). The MemoryTracker pattern solves this elegantly.
4. Operators need tunables for their hardware
Solo stakers running on 16GB machines have very different constraints than institutional operators with 128GB servers. The --state-cache-max-mb flag gives everyone a knob to tune for their environment.
Conclusion
The memory-aware caching implementation transforms Lighthouse's state cache into an adaptive component that respects actual memory constraints. During the kind of stressed, non-finalizing conditions that broke Holesky, operators now have a concrete tool to prevent OOM crashes and maintain validator liveness.
The approach demonstrates a broader principle: when system resources are the constraint, measure and limit those resources directly rather than using proxies that break down under stress.
This work was done as part of the Ethereum Protocol Fellowship (Cohort 6), mentored by Michael Sproul and Dapplion from the Lighthouse team. The PR can be found at sigp/lighthouse#7803.