AI News HubLIVE
站内改写4 分钟阅读

待翻译:Show HN: Opthash – Rust implementations of Elastic and Funnel hashing

AI 服务暂时不可用,以下为来源摘要,待恢复后补全翻译:Notifications You must be signed in to change notification settings Fork 0 Star 6 BranchesTags Open more actions menu Folders and files NameName Last commit message Last commit date Latest commit History 210 Commits 210…

来源Hacker News AI作者: aayd

AI 服务暂时不可用,以下为来源正文,待恢复后补全翻译。

Notifications You must be signed in to change notification settings Fork 0 Star 6 BranchesTags Open more actions menu Folders and files NameName Last commit message Last commit date Latest commit History 210 Commits 210 Commits .github/workflows .github/workflows .vscode .vscode assets assets benches benches ci ci opthash opthash paper paper scripts scripts src src tests tests .gitattributes .gitattributes .gitignore .gitignore .pre-commit-config.yaml .pre-commit-config.yaml AGENTS.md AGENTS.md CHANGELOG.md CHANGELOG.md CLAUDE.md CLAUDE.md Cargo.lock Cargo.lock Cargo.toml Cargo.toml LICENSE LICENSE README.md README.md pyproject.toml pyproject.toml uv.lock uv.lock Repository files navigation Rust implementations of Elastic Hashing and Funnel Hashing from Optimal Bounds for Open Addressing Without Reordering (Farach-Colton, Krapivin, Kuszmaul, 2025) — see References 1. Both are open-addressing hash maps that achieve optimal expected probe complexity without reordering elements after insertion. Data Structures Both maps share a common core: a single-Arena allocation per map indexed by per-level descriptors, 7-bit fingerprint control bytes, SIMD control-byte scans for occupancy + lookup, and tombstone accounting 2 3 4. Per-level salt re-randomization 5 decorrelates probe paths across levels. The default BuildHasher is foldhash 6. The two maps differ in how they probe within a level: ElasticHashMap — Levels with geometrically halving capacities, each probed by a SwissTable-style triangular sequence ((idx + delta) & mask); inserts follow a per-level probe budget. FunnelHashMap — Bucketed levels (paper §5): a key maps to one bucket per level; overflow spills to the next level, not other buckets. Plus a split special array: primary (odd-step group probe) and fallback (two-choice buckets). Both maps mirror std::collections::HashMap's API and support the same operations. Each map starts with zero allocation (new()) and grows dynamically on demand. The reserve_fraction headroom knob is exposed via dedicated constructors. Usage Rust cargo add opthash use opthash::{ElasticHashMap, FunnelHashMap}; let mut map = ElasticHashMap::new(); map.insert("key", 42); assert_eq!(map.get("key"), Some(&42)); let mut map = ElasticHashMap::with_capacity_and_reserve_fraction(1024, 0.10); map.insert("key", 42); assert_eq!(map.get("key"), Some(&42)); let mut map = FunnelHashMap::with_capacity_and_reserve_fraction(1024, 0.10); map.insert("key", 42); assert_eq!(map.get("key"), Some(&42)); Python pip install opthash from opthash import ElasticHashMap, FunnelHashMap m = ElasticHashMap() m["key"] = 42 assert m["key"] == 42 assert "key" in m and len(m) == 1 m = ElasticHashMap.with_options(capacity=1024, reserve_fraction=0.10) m = FunnelHashMap.with_options(capacity=1024, reserve_fraction=0.10) Layout Sketch Arena (one allocation per map) ============================== fp = fingerprint (7-bit control byte) kv = key-value entry, = empty (CTRL_EMPTY = 0x00), xx = tombstone (CTRL_TOMBSTONE = 0x80) All control bytes pack first, then alignment padding so the slot region starts at align_of::>(), then all slots: arena::ptr ► [fp fp xx ... ][fp xx fp ...][fp fp ...][ pad ][kv kv kv ...][kv ... ] └─── ctrl L0 ────┘└─── ctrl L1 ───┘└── ... ──┘ └─ slots L0 ─┘└─ ... ─┘ ▲ each descriptor caches ctrl_ptr + data_ptr into the arena. Each descriptor stores cached ctrl_ptr, data_ptr, capacity, plus per-shape metadata (salt, mask, etc). All slot/ctrl/SIMD ops live on the ArenaSlots trait (src/common/arena.rs). ElasticHashMap ============== levels: Box (descriptors only) Level 0 ctrl_ptr, data_ptr, capacity (~half of total slots) Level 1 geometrically halved Level 2 ... per-level group_count, group_count_mask, salt, len, tombstones, half_reserve_slot_threshold, budget_cap arena: Arena (the single allocation backing every level above) map-wide len, total_slots, max_insertions, reserve_fraction, batch_plan, current_batch_index, batch_remaining, max_populated_level, hash_builder, alloc FunnelHashMap ============= levels: Box (descriptors) Level 0 ctrl region fp xx fp ... fp fp xx ... fp ... slot region kv kv kv ... kv kv kv __ ... kv ... └── bucket 0 ──┘└── bucket 1 ──┘ (xx in ctrl marks a removed slot; the slot bytes may still hold the stale kv physically, but are logically uninit — never read.) Level 1 (same layout, smaller buckets) per-level bucket_count_mask, bucket_size_log2, salt, len, tombstones special: SpecialArray primary group-probed (paper B) group_count_mask, len, tombstones fallback two-choice bucketed (paper C) bucket_count, bucket_size_log2, len, tombstones arena: Arena (covers every level + both special regions) map-wide len, total_slots, max_insertions, reserve_fraction, primary_probe_limit, max_populated_level, hash_builder, alloc Benchmarks See benches/README.md for comparison charts. References Footnotes Martín Farach-Colton, Andrew Krapivin, William Kuszmaul. Optimal Bounds for Open Addressing Without Reordering (2025). arXiv: https://arxiv.org/abs/2501.02305. Establishes the elastic and funnel hashing schemes implemented in src/elastic.rs and src/funnel.rs; the funnel "special array" split into primary (group-probed, paper B) and fallback (two-choice, paper C) follows the paper's construction directly. ↩ Abseil. SwissTable design notes. https://abseil.io/about/design/swisstables. Source of the 7-bit fingerprint control-byte layout + SIMD group scans used by the shared ArenaSlots trait (see src/common/arena.rs, src/common/control.rs, src/common/simd.rs) and the triangular (idx + delta) & mask probe sequence used in Level::triangular_group_start. ↩ Matt Kulukundis. Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step (CppCon 2017). https://www.youtube.com/watch?v=ncHmEUmJZf4. Talk introducing the SwissTable design referenced above. ↩ hashbrown — Rust port of SwissTable. https://github.com/rust-lang/hashbrown. Used as the absolute throughput ceiling in the Criterion benches (see benches/README.md). ↩ J. Lawrence Carter, Mark N. Wegman. Universal Classes of Hash Functions (STOC 1977 / JCSS 1979). DOI: https://doi.org/10.1016/0022-0000(79)90044-8. Foundational hash-based probing model the FKK bounds rely on; the per-level salt re-randomization in Level/BucketLevel (see level_salt in src/common/math.rs) follows the universal-hashing assumption. ↩ foldhash crate. https://crates.io/crates/foldhash. Default BuildHasher (foldhash::fast::RandomState) wired up in src/common/mod.rs. ↩ About Optimal open-addressing hash maps (Elastic Hashing & Funnel Hashing) in Rust, with Python bindings. Topics python rust data-structures hashmap open-addressing pyo3 funnel-hashing elastic-hashing Resources Readme License Apache-2.0 license Uh oh! There was an error while loading. Please reload this page. Activity Stars 6 stars Watchers 0 watching Forks 0 forks Report repository Releases 4 v0.10.1 Latest May 31, 2026 + 3 releases Packages 0 Uh oh! There was an error while loading. Please reload this page. Uh oh! There was an error while loading. Please reload this page. Contributors Uh oh! There was an error while loading. Please reload this page. Languages Rust 88.2% Python 10.4% Shell 1.4%