Statistical methods for indirectly observed network data

Tyler H. McCormick

Thesis Advisor(s):
Zheng, Tian
Ph.D., Columbia University.
Social networks have become an increasingly common framework for understanding and explaining social phenomena. Yet, despite an abundance of sophisticated models, social network research has yet to realize its full potential, in part because of the difficulty of collecting social network data. In many cases, particularly in the social sciences, collecting complete network data is logistically and financially challenging. In contrast, Aggregated Relational Data (ARD) measure network structure indirectly by asking respondents how many connections they have with members of a given subpopulation (e.g., "How many individuals with HIV/AIDS do you know?"). These data require no special sampling procedure and are easily incorporated into existing surveys. This dissertation proposes statistical methods for estimating social network and population characteristics from ARD, a type of social network data that can be collected using standard surveys. First, it proposes a method to estimate both individual social network size (i.e., degree) and the distribution of network sizes in a population. A second method estimates the demographic characteristics, or latent demographic profiles, of hard-to-reach groups. These groups, such as those with HIV/AIDS, unlawful immigrants, or the homeless, are often excluded from the sampling frame of standard social science surveys. A third method develops a latent space model for ARD. This model is similar in spirit to previous latent space models for networks (see Hoff, Raftery and Handcock (2002), for example) in that the dependence structure of the network is represented parsimoniously in a multidimensional geometric space.
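
To make concrete what ARD contain, here is a minimal sketch of the basic network scale-up estimator of degree. This is not the method developed in this dissertation; it is shown only to illustrate how counts of acquaintances in subpopulations of known size can be turned into a degree estimate. It assumes each respondent's contacts are a random draw from a population of known total size N and that the subpopulation sizes N_k are known, giving d_i = N * sum_k y_ik / sum_k N_k. The function name `scale_up_degree` and the numbers in the example are hypothetical.

```python
import numpy as np


def scale_up_degree(y, subpop_sizes, total_pop):
    """Basic network scale-up estimate of each respondent's degree.

    y            : (n_respondents, n_groups) ARD counts; y[i, k] is how many
                   members of subpopulation k respondent i reports knowing.
    subpop_sizes : (n_groups,) known subpopulation sizes N_k.
    total_pop    : total population size N.

    Assumes random mixing: the share of a respondent's contacts falling in
    the surveyed subpopulations equals those subpopulations' share of N.
    """
    y = np.asarray(y, dtype=float)
    sizes = np.asarray(subpop_sizes, dtype=float)
    # d_i = N * (sum_k y_ik) / (sum_k N_k)
    return total_pop * y.sum(axis=1) / sizes.sum()


# Hypothetical example: two respondents, three subpopulations.
d = scale_up_degree([[3, 1, 6], [0, 0, 1]], [10_000, 5_000, 20_000], 1_000_000)
print(d)
```

Here the first respondent knows 10 people across subpopulations totaling 35,000 out of 1,000,000, so the estimate is roughly 285.7; the second respondent's estimate is roughly 28.6. Because the estimator pools all groups with equal weight per member, it cannot capture variation in respondents' propensity to know members of particular groups, which is the kind of structure model-based approaches to ARD aim to represent.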
The key distinction from the complete network case is that instead of conditioning on the (latent) distance between two members of the network, the latent space model for ARD conditions on the expected distance between a survey respondent and the center of a subpopulation in the latent space. A spherical latent space facilitates tractable computation of this expectation. This model estimates relative homogeneity between groups in the population and variation in the propensity for interaction between respondents and group members.
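
The abstract does not give the model's equations, so the sketch below only illustrates the geometric quantity involved, under assumptions that are ours rather than the dissertation's. It places a respondent and a subpopulation center on the unit sphere S^2, scatters group members around the center according to a von Mises-Fisher distribution with concentration `kappa` (a hypothetical choice for illustration), and uses great-circle distance. It estimates the expected distance from the respondent to group members by Monte Carlo simulation. The abstract indicates that the spherical geometry makes this kind of expectation tractable to compute directly, so the simulation here is only a way to build intuition, not the dissertation's computation. The function names are hypothetical.

```python
import numpy as np


def sample_vmf_s2(mu, kappa, n, rng):
    """Draw n points from a von Mises-Fisher(mu, kappa) distribution on S^2.

    Requires kappa > 0. Uses the closed-form inverse CDF available on the
    2-sphere for w = cos(angle to mu), whose density is proportional to
    exp(kappa * w) on [-1, 1].
    """
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    u = 1.0 - rng.uniform(size=n)  # u in (0, 1], avoids log(0)
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # Orthonormal basis (a, b) for the plane perpendicular to mu.
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    a = helper - helper.dot(mu) * mu
    a /= np.linalg.norm(a)
    b = np.cross(mu, a)
    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    return (r * np.cos(theta))[:, None] * a + (r * np.sin(theta))[:, None] * b + w[:, None] * mu


def expected_geodesic_distance(z, mu, kappa, n=100_000, seed=0):
    """Monte Carlo estimate of E[arccos(z . x)] for x ~ vMF(mu, kappa) on S^2.

    z is the respondent's (hypothetical) latent position; mu is the group center.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z)
    x = sample_vmf_s2(mu, kappa, n, rng)
    return np.arccos(np.clip(x @ z, -1.0, 1.0)).mean()


# A tightly concentrated group (large kappa) centered at the north pole:
print(expected_geodesic_distance([0, 0, 1], [0, 0, 1], kappa=50.0))   # respondent near the group
print(expected_geodesic_distance([0, 0, -1], [0, 0, 1], kappa=50.0))  # respondent on the far side
```

A larger `kappa` makes the group more homogeneous in the latent space, so the expected distance depends mostly on where the respondent sits relative to the center; a small `kappa` spreads members over the sphere and the expected distance approaches pi/2 regardless of the respondent's position. This is one simple way to picture how such a model could separate group homogeneity from a respondent's propensity to interact with group members.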
Suggested Citation:
Tyler H. McCormick, Statistical methods for indirectly observed network data, Columbia University Academic Commons.
