Academic Commons

Theses Doctoral

Fast, Scalable, and Accurate Algorithms for Time-Series Analysis

Paparrizos, Ioannis

Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis.
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input---in the form of feature vectors---to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world, applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features yielded the greatest impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.

Files

  • thumnail for Paparrizos_columbia_0054D_14479.pdf Paparrizos_columbia_0054D_14479.pdf application/pdf 2.17 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Gravano, Luis
Degree
Ph.D., Columbia University
Published Here
July 21, 2018
Academic Commons provides global access to research and scholarship produced at Columbia University, Barnard College, Teachers College, Union Theological Seminary and Jewish Theological Seminary. Academic Commons is managed by the Columbia University Libraries.