Theses Doctoral

Causality inference between time series data and its applications

Chen, Siyuan

Ever since Granger first proposed the idea of quantitatively testing the causal relationship between data streams, the endeavor of accurately inferring causality in data and using that information to predict the future has not stopped. Artificial Intelligence (AI), by utilizing massive amounts of data, helps to solve complex problems, whether they involve the diagnosis and detection of disease through medical imaging, email spam detection, or self-driving vehicles. Perhaps this thesis will seem trivial ten years from now. AI has pushed humankind to the next technological level. Today, in most machine learning inquiries, statistical relationships are determined using correlation measures. When data are fed into machine learning algorithms, computers update the algorithm’s parameters iteratively, extracting features and mapping them to learning targets until the correlation is high enough to stop the training process. However, even as AI grows more powerful, causality in data remains underexplored.

It is almost self-evident that "correlation is not causality." Sometimes, the strong correlation established between variables through machine learning can be absurd and meaningless. Providing insight into causality through data, which most machine learning methods fail to do, is of paramount importance.

The subsequent chapters detail four endeavors in studying causality: in financial markets, in earthquakes, in animal and human brain signals, and in the predictivity of data sets. In Chapter 2, we further developed the concept of causality networks into a higher-order causality network. We applied these networks to financial data and tested their validity and their ability to capture the system’s causal relationships. In Chapter 3, we examined another type of time series: earthquakes. Violent seismic activity decimates people's lives and destroys entire cities and areas. This compels us to understand how earthquakes work and to make predictions that are reliable and actionable for evacuation. The causal relationships of seismic activities in different areas are studied and established. Biological data, specifically brain signals, are also time-series data, and their causal patterns are explored and studied. In Chapter 4, different human and mouse brain signals are analyzed and clustered by their unique causal patterns to understand different brain cell activity. Finally, we realized that the causal pattern in a time series can be used to compress data. A causal compression ratio is introduced and used as the data stream’s predictivity index. We describe this in Chapter 5.



More About This Work

Academic Units
Mechanical Engineering
Thesis Advisors
Lipson, Hod
Ph.D., Columbia University
Published Here
January 24, 2020