Project Bi-Weekly Update: Enhanced Memory & Time Benchmarking with HashSet vs BloomFilter
Student: Jonathan Ami
Date: April 4, 2025
proc_benchmark.sh and insertion_time_benchmark.sh
Two new benchmarking tools were introduced:
insertion_time_benchmark.sh: Runs time-based insertion tests from 100k to 5M entries using both data structures, writing results to CSV.
proc_benchmark.sh: Profiles /proc/<pid>/status data after 1-second runs to capture memory usage for Bloom Filter and HashSet, exporting to bloomfilter_mem.csv and hashset_mem.csv.

VmRSS (Resident Memory) showed expected trends as insertions scaled:

| N | VmRSS (KB) - HashSet | VmRSS (KB) - BloomFilter |
|---|---|---|
| 100k | 3244 | 2740 |
| 1M | 20428 | 3680 |
| 2M | 38976 | 4932 |
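
The VmRSS column above is read from /proc/<pid>/status. As a rough illustration of that step, here is a minimal Python sketch of extracting the field. It is not the project's proc_benchmark.sh; the function names `parse_vmrss_kb` and `read_vmrss_kb` are hypothetical, and it assumes the usual Linux layout of the `VmRSS:` line (value followed by `kB`).

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in KB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmrss_kb(pid):
    """Read VmRSS for a live process. Linux only: relies on the /proc filesystem."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kb(status_file.read())
```

Keeping the parser separate from the file read means it can be checked against a saved status dump without a running benchmark process.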
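
Observations:
HashSet VmRSS grew roughly 12x from 100k to 2M entries (3244 KB to 38976 KB), while BloomFilter VmRSS grew roughly 1.8x (2740 KB to 4932 KB). At 2M entries the HashSet process used about 7.9x the resident memory of the BloomFilter process.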
Insertion timing, recorded using insertion_time_benchmark.sh, favored HashSet: BloomFilter insertions took roughly 1.8x to 2.2x as long at the sizes shown:

| N | Time (ms) - HashSet | Time (ms) - BloomFilter |
|---|---|---|
| 100k | ~1.47 | ~2.69 |
| 5M | ~81.10 | ~182.17 |
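
For context on what a BloomFilter insert involves, here is a minimal Python sketch. It is not the implementation that was benchmarked: the class name `BloomFilterSketch`, the default `num_bits` and `num_hashes`, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration. It shows the two properties relevant to the tables above: the structure stores only a bit array sized up front, not the keys, and each insert sets several bit positions.

```python
import hashlib


class BloomFilterSketch:
    """Illustrative Bloom filter: a fixed bit array with k probes per item."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Allocated once; inserting more items never grows this array.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so probes differ
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        # One insert touches num_hashes bits.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A set, by contrast, keeps every key, so its memory grows with the number of entries while the sketch's bit array stays at the size chosen in the constructor.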

Analysis:
benchmark_results/ folder for all generated CSV data.
.sh scripts to handle batch timing and /proc memory capture.