2025 Theses Doctoral
Unlocking Storage Performance: A Systems Approach to Kernel Bypass, Replication, and Caching
Modern storage technologies, such as NVMe SSDs and RDMA-enabled network fabrics, offer unprecedented performance, significantly raising the bar for software infrastructure efficiency. However, traditional storage software stacks, particularly within the Linux kernel, have emerged as critical performance bottlenecks, limiting the full utilization of these hardware capabilities. This thesis addresses this challenge through a systematic exploration of kernel bypass, replication efficiency, and caching strategies, each aimed at significantly improving storage system performance.
The first component of this thesis introduces BPF-KV, a key-value store designed around XRP, an in-kernel execution framework built on eBPF (extended Berkeley Packet Filter). BPF-KV leverages XRP to safely bypass the kernel storage stack, executing critical storage functions directly from the NVMe interrupt handler. By embedding simple, user-defined functions such as index traversals and aggregations in kernel space, BPF-KV minimizes kernel-user crossings, significantly reducing latency and improving throughput. Unlike full kernel-bypass techniques such as SPDK, which sacrifice safety and CPU efficiency, this approach maintains compatibility with existing kernels and preserves essential system properties such as isolation and security. Experimental results demonstrate that BPF-KV achieves up to 2.5× higher throughput and substantially lower tail latency than the standard system call path.
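To make the in-kernel resubmission idea concrete, below is a minimal user-space sketch of a BPF-KV-style index traversal step. The node layout, fan-out, and names (`bpfkv_node`, `traverse_step`) are illustrative assumptions, not the actual BPF-KV on-disk format or the XRP interface; in the real system such a function must also pass the eBPF verifier, which is why the scan is bounded.

```c
#include <stdint.h>
#include <stdio.h>

#define FANOUT 31  /* keys per node; illustrative, not BPF-KV's layout */

struct bpfkv_node {
    uint32_t nkeys;                 /* number of valid keys            */
    uint32_t leaf;                  /* 1 if children point at values   */
    uint64_t keys[FANOUT];          /* sorted separator keys           */
    uint64_t child[FANOUT + 1];     /* disk offsets of children/values */
};

/* One traversal step: choose the child offset to read next for `key`.
 * XRP runs logic like this on each completed NVMe read and resubmits
 * the next read from the interrupt handler, so the lookup never
 * crosses back into user space between index levels. */
static uint64_t traverse_step(const struct bpfkv_node *n, uint64_t key,
                              int *done)
{
    uint32_t i = 0;
    while (i < n->nkeys && key >= n->keys[i])  /* bounded scan: the   */
        i++;                                   /* eBPF verifier needs */
    *done = (int)n->leaf;                      /* provable termination */
    return n->child[i];
}

int main(void)
{
    struct bpfkv_node leaf = { .nkeys = 2, .leaf = 1,
                               .keys  = { 10, 20 },
                               .child = { 100, 200, 300 } };
    int done = 0;
    uint64_t off = traverse_step(&leaf, 15, &done);
    printf("next block offset: %llu, done=%d\n",
           (unsigned long long)off, done);
    return 0;
}
```

With a conventional design, each of these per-level lookups would cost a full system call and a kernel-user crossing; running the step in the interrupt handler collapses the entire traversal into a single submission.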
The second contribution, RubbleDB, addresses inefficiencies in replicated log-structured merge tree (LSM-tree) key-value stores, which are prevalent in modern databases. Traditional replication mechanisms redundantly execute compaction on every replica, consuming excessive CPU resources. RubbleDB eliminates this redundancy by using NVMe-over-Fabrics (NVMe-oF), an efficient, CPU-friendly networked storage protocol, to share compacted SSTable files directly between replicas. By pre-allocating storage space and employing direct I/O via NVMe-oF, RubbleDB ensures strong consistency and efficient data replication without redundant CPU work. Evaluations demonstrate that RubbleDB significantly reduces CPU usage and network overhead, thereby enhancing the scalability and performance of distributed LSM-tree databases under write-intensive workloads.
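The sketch below illustrates the core mechanism under stated assumptions: once a replica's NVMe-oF namespace is connected (e.g., via `nvme connect`), it appears as a local block device, and the primary can write a compacted SSTable directly into a pre-allocated slot using direct I/O. The device path, slot size, and function names are hypothetical, not RubbleDB's actual code.

```c
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define SLOT_SIZE (64ULL << 20)  /* hypothetical fixed slot per SSTable */
#define ALIGN     4096           /* O_DIRECT offset/buffer alignment    */

/* Write a compacted SSTable into the replica's pre-allocated slot.
 * `nvmf_dev` is the block device exposed by the NVMe-oF connection;
 * the path and slot scheme are illustrative assumptions. */
static int ship_sstable(const char *nvmf_dev, uint32_t slot,
                        const void *buf, size_t len)
{
    int fd = open(nvmf_dev, O_WRONLY | O_DIRECT);
    if (fd < 0)
        return -1;
    /* Direct I/O straight into the slot: the transfer is handled by
     * the fabric and the target-side NVMe controller, so the replica
     * spends no CPU re-running compaction. */
    ssize_t n = pwrite(fd, buf, len, (off_t)(slot * SLOT_SIZE));
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

int main(void)
{
    size_t len = 1ULL << 20;    /* SSTable bytes; multiple of ALIGN */
    void *buf;
    if (posix_memalign(&buf, ALIGN, len) != 0)
        return 1;
    /* ... fill buf with the compacted SSTable's contents ... */
    return ship_sstable("/dev/nvme1n1", 0, buf, len) ? 1 : 0;
}
```

Pre-allocating fixed slots is what lets the primary write at known offsets without any coordination round-trip, which is why the replica-side CPU stays idle during replication.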
Finally, the thesis presents DFUSE, a distributed file system that delivers strongly consistent kernel-level write-back caching, overcoming limitations inherent in existing FUSE-based distributed systems. Traditionally, FUSE systems face a trade-off between strong consistency (write-through caching) and high performance (write-back caching with weak consistency). DFUSE resolves this trade-off by embedding distributed coordination logic directly into the kernel’s FUSE driver. By offloading consistency control to the kernel, DFUSE effectively synchronizes the page cache across nodes, ensuring strong consistency while leveraging the performance advantages of write-back caching. Experimental evaluations show DFUSE achieves up to 68% higher throughput and 40% lower latency compared to traditional FUSE designs.
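A simplified model of this coordination might look like the following. The lease states and message flow are hypothetical, invented here for illustration; the actual mechanism lives inside the kernel FUSE driver and operates on page-cache pages directly.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lease states for one file on one node. */
enum lease { L_NONE, L_READ, L_WRITE };

struct node_cache {
    enum lease lease;   /* this node's lease on the file   */
    bool dirty;         /* write-back data not yet flushed */
};

/* A local write may hit the page cache only under a write lease;
 * acquiring one revokes the cached copies on all other nodes first,
 * which is what preserves strong consistency. */
static void local_write(struct node_cache *c)
{
    if (c->lease != L_WRITE) {
        printf("acquire WRITE lease (coordinator revokes peers)\n");
        c->lease = L_WRITE;
    }
    c->dirty = true;    /* buffered locally, not yet written back */
}

/* Revocation triggered by a peer's access: flush dirty pages
 * before downgrading, so the peer never reads stale data. */
static void on_revoke(struct node_cache *c)
{
    if (c->dirty) {
        printf("flush dirty pages before downgrade\n");
        c->dirty = false;
    }
    c->lease = L_READ;
}

int main(void)
{
    struct node_cache c = { L_NONE, false };
    local_write(&c);    /* fast write-back path under a lease  */
    on_revoke(&c);      /* peer read forces flush, keeps cache */
    return 0;
}
```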
Together, BPF-KV, RubbleDB, and DFUSE represent a holistic systems approach to unlocking the performance potential of modern storage hardware. By systematically addressing kernel overhead, replication inefficiencies, and caching consistency, this thesis offers concrete, widely applicable solutions that significantly enhance storage system performance in diverse deployment scenarios.
Files
This item is currently under embargo. It will be available starting 2026-09-08.
More About This Work
- Academic Units: Computer Science
- Thesis Advisors: Cidon, Asaf
- Degree: Ph.D., Columbia University
- Published Here: October 15, 2025