Project Bi-Weekly Update: HashSet vs Bloom Filter Insertion Memory Benchmarking
Student: Jonathan Ami
Date: March 21, 2025
The DDoS benchmarking module was significantly refactored to better isolate and analyze memory usage. The major improvements include:
Instead of being pre-generated into a Vec<Ipv4Addr>, packets are now generated and inserted directly into the data structure.
is_malicious() calls were removed. This ensures the benchmark only measures the memory and performance costs of insertion.
These changes help ensure that any memory reported by heaptrack comes strictly from the data structure itself.
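As a cross-check on heaptrack, the insertion-only idea can also be sketched in-process with a counting allocator. This is a minimal sketch, not the project's code: CountingAlloc, nth_ip, and bytes_allocated_by_inserts are hypothetical names, and the counter records cumulative bytes requested (including table growth), not peak usage.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::HashSet;
use std::net::Ipv4Addr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts
/// every byte requested (cumulative, not peak).
struct CountingAlloc;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Hypothetical generator: derives the i-th address inline, so no
/// intermediate Vec<Ipv4Addr> is ever built. Multiplying by an odd
/// constant is a bijection on u32, so distinct i give distinct addresses.
fn nth_ip(i: u32) -> Ipv4Addr {
    Ipv4Addr::from(i.wrapping_mul(2_654_435_761))
}

/// Inserts n generated addresses into a HashSet and returns
/// (number of stored entries, bytes requested during insertion).
fn bytes_allocated_by_inserts(n: u32) -> (usize, usize) {
    let before = ALLOCATED.load(Ordering::Relaxed);
    let mut set = HashSet::new();
    for i in 0..n {
        set.insert(nth_ip(i));
    }
    let after = ALLOCATED.load(Ordering::Relaxed);
    (set.len(), after - before)
}

fn main() {
    let (len, bytes) = bytes_allocated_by_inserts(100_000);
    println!("{} entries, {} bytes requested during insertion", len, bytes);
}
```

Because the snapshot is taken immediately before and after the insertion loop, allocations made by argument parsing or runtime start-up fall outside the measured window.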
Tests were run using the following commands:
HashSet:
heaptrack cargo run --release -- --test ddos-performance
Bloom Filter:
heaptrack cargo run --release -- --test ddos-performance --bloom
Here are the key results from heaptrack for both implementations:
HashSet:
peak heap memory consumption: 11.2MB
peak RSS (including overhead): 24.0MB
total memory leaked: 416.9kB
calls to allocation functions: 251,395
temporary allocations: 81,548 (32.44%)
total runtime: 00.114s
Bloom Filter:
peak heap memory consumption: 11.2MB
peak RSS (including overhead): 23.9MB
total memory leaked: 417.0kB
calls to allocation functions: 251,398
temporary allocations: 81,548 (32.44%)
total runtime: 00.114s
Despite the expectation that a Bloom filter should use significantly less memory than a HashSet, both implementations showed the same peak heap consumption (11.2MB) and almost identical allocation behavior. The Bloom filter run even made slightly more calls to allocation functions than the HashSet run (251,398 vs. 251,395).
This was surprising because:
Possible explanations include:
Vec<u8> or boxed internal buffers.
This contradiction highlights a key real-world insight: theoretical memory savings may be masked by surrounding allocations unless benchmarking is tightly controlled.
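To put a number on the theoretical savings, the standard Bloom filter sizing formulas can be evaluated directly. This is a rough sketch: the entry count (100,000) and false-positive rate (1%) are assumed values for illustration, not the parameters used in the benchmark, and the HashSet side is only a lower bound (4 bytes of payload per address, ignoring table overhead).

```rust
/// Optimal Bloom filter size in bits for n items at false-positive
/// rate p: m = -n * ln(p) / (ln 2)^2
fn bloom_bits(n: u64, p: f64) -> u64 {
    let ln2 = std::f64::consts::LN_2;
    (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as u64
}

/// Optimal number of hash functions: k = (m / n) * ln 2
fn bloom_hashes(m_bits: u64, n: u64) -> u32 {
    ((m_bits as f64 / n as f64) * std::f64::consts::LN_2).round() as u32
}

fn main() {
    let n = 100_000; // assumed blacklist size
    let p = 0.01; // assumed false-positive rate
    let bits = bloom_bits(n, p);
    println!(
        "Bloom filter: {} bits (~{} KiB), k = {}",
        bits,
        bits / 8 / 1024,
        bloom_hashes(bits, n)
    );
    // Lower bound for a set storing raw IPv4 addresses: 4 bytes each.
    println!("Raw IPv4 payload: ~{} KiB", n * 4 / 1024);
}
```

At a 1% false-positive rate the formula works out to roughly 9.6 bits per entry. If the blacklist were of this order, both structures would occupy well under the 11.2MB peak reported above, which would be consistent with surrounding allocations dominating the measurement.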
Refactored ddos.rs to remove packet lookups and retain only direct insertion benchmarking
Removed intermediate Vecs and replaced them with inline IP generation
Updated main.rs to pass benchmark parameters directly (e.g., blacklist size, FPR)
u32 instead of Ipv4Addr to reduce allocation overhead
no_std to benchmark raw memory use
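For the u32 item, a quick size check may help set expectations before making the change. The sketch below uses only standard-library conversions; ip_to_u32 is a hypothetical helper name. On current Rust toolchains both types occupy 4 bytes and neither allocates on the heap, so any gain from the switch would more likely come from hashing or alignment than from allocation, and is worth measuring rather than assuming.

```rust
use std::mem::size_of;
use std::net::Ipv4Addr;

/// Hypothetical helper: converts an address to its big-endian u32 form
/// using the standard From<Ipv4Addr> for u32 implementation.
fn ip_to_u32(ip: Ipv4Addr) -> u32 {
    u32::from(ip)
}

fn main() {
    let ip = Ipv4Addr::new(192, 168, 0, 1);
    let raw = ip_to_u32(ip);

    // The conversion round-trips without loss.
    assert_eq!(Ipv4Addr::from(raw), ip);

    println!("size_of::<Ipv4Addr>() = {}", size_of::<Ipv4Addr>());
    println!("size_of::<u32>()      = {}", size_of::<u32>());
}
```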