Theses Doctoral

Harnessing Simulated Data with Graphs

Maia, Henrique Teles

Physically accurate simulations allow for unlimited exploration of arbitrarily crafted environments. From a scientific perspective, digital representations of the real world are useful because they make it easy validate ideas. Virtual sandboxes allow observations to be collected at-will, without intricate setting up for measurements or needing to wait on the manufacturing, shipping, and assembly of physical resources. Simulation techniques can also be utilized over and over again to test the problem without expending costly materials or producing any waste.

Remarkably, this freedom to both experiment and generate data becomes even more powerful when considering the rising adoption of data-driven techniques across engineering disciplines. These are systems that aggregate over available samples to model behavior, and thus are better informed when exposed to more data. Naturally, the ability to synthesize limitless data promises to make approaches that benefit from datasets all the more robust and desirable.

However, the ability to readily and endlessly produce synthetic examples also introduces several new challenges. Data must be collected in an adaptive format that can capture the complete diversity of states achievable in arbitrary simulated configurations while too remaining amenable to downstream applications. The quantity and zoology of observations must also straddle a range which prevents overfitting but is descriptive enough to produce a robust approach. Pipelines that naively measure virtual scenarios can easily be overwhelmed by trying to sample an infinite set of available configurations. Variations observed across multiple dimensions can quickly lead to a daunting expansion of states, all of which must be processed and solved. These and several other concerns must first be addressed in order to safely leverage the potential of boundless simulated data.

In response to these challenges, this thesis proposes to wield graphs in order to instill structure over digitally captured data, and curb the growth of variables. The paradigm of pairing data with graphs introduced in this dissertation serves to enforce consistency, localize operators, and crucially factor out any combinatorial explosion of states. Results demonstrate the effectiveness of this methodology in three distinct areas, each individually offering unique challenges and practical constraints, and together showcasing the generality of the approach. Namely, studies observing state-of-the-art contributions in design for additive manufacturing, side-channel security threats, and large-scale physics based contact simulations are collectively achieved by harnessing simulated datasets with graph algorithms.


  • thumnail for Maia_columbia_0054D_17506.pdf Maia_columbia_0054D_17506.pdf application/pdf 7.85 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Grinspun, Eitan
Zheng, Changxi
Ph.D., Columbia University
Published Here
September 28, 2022