2014 Theses Doctoral
Accelerating Similarly Structured Data
The failure of Dennard scaling [Bohr, 2007] and the rapid growth of data produced and consumed daily [NetApp, 2012] have made two challenges among the most important for modern computer architecture: mitigating the dark silicon phenomenon [Esmaeilzadeh et al., 2011], and providing fast computation over large volumes and a wide variety of data while consuming minimal energy. This thesis introduces the concept that grouping data structures already defined in software and processing them with an accelerator can significantly improve application performance and energy efficiency.
To measure the potential performance benefits of this hypothesis, this research begins by examining the cache impact of accelerating commonly used data structures and the applicability of such acceleration to popular benchmarks. We found that accelerating similarly structured data can provide substantial benefits; however, most popular benchmark suites do not contain shared acceleration targets and therefore cannot obtain significant performance or energy improvements from a handful of accelerators. To further examine this hypothesis in an environment where common data structures are widely used, we target the database application domain, using tables and columns as the similarly structured data, accelerating the processing of such data, and evaluating the resulting performance and energy efficiency. Given that data partitioning is widely used in database applications to improve cache locality, we architect and design a streaming data partitioning accelerator to assess the feasibility of big data acceleration. The results show that we are able to achieve an order-of-magnitude improvement in partitioning performance and energy. To improve upon the present ad hoc communication between accelerators and general-purpose processors [Vo et al., 2013], we also architect and evaluate a streaming framework that can be used by the data partitioner and other streaming accelerators alike. The streaming framework can provide at least 5 GB/s per stream per thread using software control, and handles interrupts and context switches with a simple save/restore. As a final evaluation of this hypothesis, we architect a class of domain-specific database processors, or Database Processing Units (DPUs), to further improve the performance and energy efficiency of database applications. As a case study, we design and implement one DPU, called Q100, to execute industry-standard analytic database queries.
Despite Q100's sensitivity to on-chip and off-chip communication bandwidth, we find that the low-power configuration of Q100 improves energy efficiency by three orders of magnitude over a state-of-the-art software Database Management System (DBMS), while the high-performance configuration outperforms the same DBMS by 70X.
Based on these experiments, we conclude that grouping similarly structured data and processing it with accelerators vastly improves application performance and energy efficiency for a given application domain. This is primarily because creating specialized, encapsulated instructions, data accesses, and datapaths allows us to mitigate unnecessary data movement and take advantage of data and pipeline parallelism, and consequently to provide substantial energy savings while obtaining significant performance gains.
More About This Work
- Academic Units: Computer Science
- Thesis Advisors: Kim, Martha Allen
- Degree: Ph.D., Columbia University
- Published Here: July 7, 2014