Theses Doctoral

Fossil, data, and information driven paleontology

Yu, Congyu

Paleontology is based on fossils, but the link between fossil specimens and our reconstruction of life history remains ambiguous. The majority of paleontological studies focus on fossil morphology to infer phylogenetic status, but an increasing number of recent studies emphasize the role of paleontological data rather than particular specimens. Dataset construction and data processing are still rudimentary in many paleontological studies, which hampers the transition towards data-driven paleontology. More importantly, the difference between data and the information embedded in them has been poorly understood. In this thesis, I present examples of three kinds of paleontological studies, driven by fossils, data, and information, respectively, which show how evolutionary history can be reconstructed from different levels of features in fossils.

Chapter 1 shows the evolution and development of ceratopsian dinosaurs, with emphasis on fossil materials from the Gobi Desert, Mongolia. Chapter 1.1 reports Beg tse, a neoceratopsian dinosaur that is sister to all other known neoceratopsians and is morphologically and temporally intermediate between neoceratopsians and more basal ceratopsians. In chapter 1.2, to further explore the development of Protoceratops and other ornithischian dinosaurs, two embryonic Protoceratops skulls are CT-scanned and compared with more mature Protoceratops and other ornithischian dinosaurs. The results show strong peramorphosis in ceratopsian dinosaurs and conservative cranial development in stem ornithischians. Chapter 1.3 reports a new species of Protoceratops, P. tengri, which bears a regular wavy pattern along its neck frill that is absent in almost all previously reported Protoceratops. Such a structure may have functioned in display, as it seems to be the ancestral form of other patterned cranial structures in more derived ceratopsids.

Chapter 2 focuses on data-driven paleontological studies, especially the applications of artificial intelligence (AI). Chapter 2.1 is based on the data from chapter 1.2: deep neural networks (DNNs) are used to segment CT slices of embryonic Protoceratops fossils and reach human-comparable performance, but the generalization ability of such models remains questionable. Chapter 2.2 shows DNN-based localization and segmentation of osteons in histological thin sections from alvarezsaurian dinosaurs. The results indicate a truncated developmental pathway rather than compressed development during the miniaturization of this group. Chapter 2.3 is a short review of previous AI applications in paleontology, a large portion of which are based on data from foraminifera, insects, and other microfossils, while only a few work with vertebrate fossils. There is an approximately 10-year gap in algorithms and datasets between paleontology and mainstream AI studies.

Chapter 3 explores an even more basic level of data-driven paleontology: information. Under the framework of information theory and communication system engineering, chapter 3.1 introduces the basic concepts of information theory and how they are represented in paleontological studies. Chapter 3.2 quantifies the information entropy, mutual information, and channel capacity in morphological character matrices of various groups of vertebrates. The results suggest an alternative weighting strategy in phylogenetic analysis and question the current construction strategy of morphological character matrices. Chapter 3.3 offers further perspectives on the application of information theory in paleontological study by treating it as a communication system.
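
As a minimal illustration of two of the quantities named for chapter 3.2, the sketch below computes Shannon entropy for a single character (one column of a character matrix) and mutual information between two characters. It is not the procedure used in the thesis: the toy matrix, the function names, and the choice to drop taxa scored "?" or "-" are all assumptions made for this example, and channel capacity is not covered.

```python
from collections import Counter
from math import log2

# Assumption: "?" and "-" mark missing or inapplicable scores and are skipped.
MISSING = {"?", "-"}


def entropy(column):
    """Shannon entropy (bits) of the observed states of one character."""
    states = [s for s in column if s not in MISSING]
    n = len(states)
    if n == 0:
        return 0.0
    counts = Counter(states)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two characters.

    Only taxa scored for both characters contribute.
    """
    pairs = [
        (a, b)
        for a, b in zip(col_a, col_b)
        if a not in MISSING and b not in MISSING
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


# Hypothetical toy matrix: one string per taxon, one symbol per character.
toy_matrix = ["0010", "0110", "1101", "1?01"]
columns = list(zip(*toy_matrix))  # transpose to one tuple per character

per_character_entropy = [entropy(col) for col in columns]
mi_first_third = mutual_information(columns[0], columns[2])
```

In this toy matrix the first and third characters always change together, so their mutual information equals the entropy of either one; a character pair like this carries redundant signal, which is the kind of observation that could motivate down-weighting.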

During the last two decades, the increase in data and the appearance of novel methods have led many research fields to become data driven. However, in the construction of datasets, the harnessing of novel data processing methods, and the establishment of a general theory, paleontology lags significantly behind many other research fields. This thesis provides initial examples of data-driven paleontological studies.


More About This Work

Academic Units
Earth and Environmental Sciences
Thesis Advisors
Meng, Jin
Degree
Ph.D., Columbia University
Published Here
May 25, 2022