Theses Doctoral

Making Data Storage Efficient in the Era of Cloud Computing

Tang, Yang

We enter the era of cloud computing in the last decade, as many paradigm shifts are happening on how people write and deploy applications. Despite the advancement of cloud computing, data storage abstractions have not evolved much, causing inefficiencies in performance, cost, and security.

This dissertation proposes a novel approach to make data storage efficient in the era of cloud computing by building new storage abstractions and systems that bridge the gap between cloud computing and data storage and simplify development. We build four systems to address four data inefficiencies in cloud computing.

The first system, Grandet, solves the data storage inefficiency caused by the paradigm shift from upfront provisioning to a variety of pay-as-you-go cloud services. Grandet is an extensible storage system that significantly reduces storage costs for web applications deployed in the cloud. Under the hood, it supports multiple heterogeneous stores and unifies them by placing each data object at the store deemed most economical. Our results show that Grandet reduces their costs by an average of 42.4%, and it is fast, scalable, and easy to use.

The second system, Unic, solves the data inefficiency caused by the paradigm shift from single-tenancy to multi-tenancy. Unic securely deduplicates general computations. It exports a cache service that allows cloud applications running on behalf of mutually distrusting users to memoize and reuse computation results, thereby improving performance. Unic achieves both integrity and secrecy through a novel use of code attestation, and it provides a simple yet expressive API that enables applications to deduplicate their own rich computations. Our results show that Unic is easy to use, speeds up applications by an average of 7.58x, and with little storage overhead.

The third system, Lambdata, solves the data inefficiency caused by the paradigm shift to serverless computing, where developers only write core business logic, and cloud service providers maintain all the infrastructure. Lambdata is a novel serverless computing system that enables developers to declare a cloud function's data intents, including both data read and data written. Once data intents are made explicit, Lambdata performs a variety of optimizations to improve speed, including caching data locally and scheduling functions based on code and data locality. Our results show that Lambdata achieves an average speedup of 1.51x on the turnaround time of practical workloads and reduces monetary cost by 16.5%.

The fourth system, CleanOS, solves the data inefficiency caused by the paradigm shift from desktop computers to smartphones always connected to the cloud. CleanOS is a new Android-based operating system that manages sensitive data rigorously and maintains a clean environment at all times. It identifies and tracks sensitive data, encrypts it with a key, and evicts that key to the cloud when the data is not in active use on the device. Our results show that CleanOS limits sensitive-data exposure drastically while incurring acceptable overheads on mobile networks.


  • thumnail for Tang_columbia_0054D_15773.pdf Tang_columbia_0054D_15773.pdf application/pdf 1.31 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Yang, Junfeng
Ph.D., Columbia University
Published Here
February 27, 2020