Theses Doctoral

Cryptographic Data Structures

Yeo, Kevin Wei Li

We study cryptographic data structures that enable efficient organization and access of information while simultaneously protecting the privacy of data access patterns. Cryptographic data structures have been studied for nearly four decades under different names including oblivious RAM (ORAM), private information retrieval (PIR) and symmetric searchable encryption (SSE). Even though cryptographic data structures have been well-studied for a long time, our understanding remains limited especially compared to our knowledge of standard (non-cryptographic) data structures. In this work, we develop techniques for faster constructions as well as establish inherent limitations for different cryptographic data structures.

First, we study oblivious RAM (ORAM) that considers the setting of cryptographically protecting accesses and updates performed by a client into random access memory (i.e., an array) stored by a server. Even though arrays are a very simple functionality, it turns out that cryptographic arrays are critically important as they may be used to generically compile any standard data structure into a cryptographic one. We present the first logarithmic overhead construction of oblivious RAMs with 𝑂(log 𝑛・log log 𝑛) overhead for an n-entry database that nearly matches the best lower bounds of 𝛀(log 𝑛). This is a near quadratic improvement over the prior best constructions requiring 𝑂(log² 𝑛 / log log 𝑛) overhead.

Secondly, we extensively study the landscape of trade-offs between efficiency and privacy by examining different privacy guarantees for cryptographic dynamic arrays. We consider several privacy notions including symmetric searchable encryption, differential privacy, snapshot security and multiple non-colluding servers. Surprisingly, for all these natural privacy notions, we prove an 𝛀(log 𝑛) lower bound that is also required by an ORAM. Our lower bounds seem to imply thatweakening the privacy guarantees of an ORAM in small ways does not enable speed-ups in efficiency. We also extend our techniques to prove logarithmic lower bounds for other dynamic data structures (including set membership, predecessor/successor and disjoint sets) that satisfy any of the above privacy notions.

Finally, we study private information retrieval (PIR) enabling a client to privately accessing entries of an array that are stored by one or more server(s) in read-only memory (akin to static data structures). It turns out that restricting server memory to be read-only substantially increases the difficulty of obtaining privacy. Without any additional preprocessing of an n-bit database, it is well known that each PIR client query requires 𝛀(𝑛) server computation linear in the database size. Thus, many recent works considered two approaches to obtain sublinear query time: batch queries for retrieving multiple entries at once and preprocessing.

In this dissertation, we present improvements in both directions. For batch queries, we construct the most efficient explicit blackbox reduction from batch PIR to single-query PIR (also known as a batch code). To do this, we present a cuckoo hashing scheme with negligible failure obtaining quadratically smaller query overhead compared to prior works. In the preprocessing setting, we present the highest lower bounds to date for PIR with public and private preprocessing. Additionally, we construct a PIR with private preprocessing scheme that obtains optimal trade-offs between preprocessing space and query times matching our lower bound (up to logarithmic factors) for all parameter regimes.

Files

  • thumbnail for Yeo_columbia_0054D_19485.pdf Yeo_columbia_0054D_19485.pdf application/pdf 1.82 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Malkin, Tal G.
Degree
Ph.D., Columbia University
Published Here
October 15, 2025