2018 Theses Doctoral
Through a Dark Mirror: Answers, Questions, and the Creation of Machine Knowledge
This dissertation addresses the question of the creation of scientific knowledge, in the context of an online question-and-answer forum: Stack Overflow. The project starts from the claim advanced in the philosophy and sociology of science that Nature is not sufficient to settle disputes; for a fact to be created by being accepted as true, it must be justified in discourse. Actor-Network Theory claims that this justification occurs through the mobilisation of resources, which are marshalled to make a truth-claim unassailable without also defeating all the supporting resources. I adapt ANT’s claim to the methods of Social Network Analysis. In network terms, the facticity of a claim is measured by its indegree (the number of other claims which are justified by it), and its outdegree measures the count of the resources which justify it.
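As a minimal sketch of these two measures, assume each link is recorded as a hypothetical `(citing_post, cited_post)` pair, so that an edge points from a claim to a resource that justifies it. The post identifiers and the `degree_counts` helper are illustrative only and are not taken from the dissertation.

```python
from collections import defaultdict


def degree_counts(links):
    """Count indegree and outdegree from (citing_post, cited_post) pairs.

    Assumption: an edge runs from the post making a claim to the
    resource it cites as justification.
    """
    indegree = defaultdict(int)   # how many claims each post justifies
    outdegree = defaultdict(int)  # how many resources each post cites
    for citing, cited in set(links):  # count each distinct link once
        outdegree[citing] += 1
        indegree[cited] += 1
    return dict(indegree), dict(outdegree)


# Hypothetical toy network of four answers.
links = [("a1", "a2"), ("a1", "a3"), ("a4", "a2")]
indegree, outdegree = degree_counts(links)
```

Here `a2` is cited by two other posts, so it has the highest indegree, while `a1` cites two resources and so has the highest outdegree.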
I first consider answers to Stack Overflow questions, because they are the most straightforward to describe, and the closest object to what previous analyses of citation networks of academic papers have considered. The central question is how an answer is made to be true, where I find that the more other answers an answer links to, the higher its indegree. In fact, when the measure of outdegree is expanded to include the resources which resources themselves link to, this association is higher still. While referring to users doesn’t matter in itself, referring to users’ posts does, as does having more code. The best predictor of indegree is the number of users linked to. I also find that although more modalised answers matter more, black-boxed answers which are not modalised at all matter the most. Finally, while more resources make an answer more likely to be edited, they do not make it more likely to be defeated and deleted entirely.
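One way to picture the expanded outdegree is as a count of the distinct resources reachable within a few link steps. The sketch below is an assumption about how such a measure could be computed, not the dissertation's own procedure: the `extended_outdegree` helper, the `max_depth` parameter, and the toy links are all hypothetical.

```python
from collections import defaultdict


def extended_outdegree(links, start, max_depth=2):
    """Count distinct resources reachable from `start` in at most
    `max_depth` link steps (max_depth=1 is the ordinary outdegree).

    Assumption: links are (citing_post, cited_post) pairs.
    """
    graph = defaultdict(set)
    for citing, cited in links:
        graph[citing].add(cited)

    seen = set()
    frontier = {start}
    for _ in range(max_depth):
        next_frontier = set()
        for node in frontier:
            for target in graph.get(node, ()):
                # Skip the starting post and anything already counted.
                if target != start and target not in seen:
                    seen.add(target)
                    next_frontier.add(target)
        if not next_frontier:
            break
        frontier = next_frontier
    return len(seen)


# Hypothetical toy network: a1 cites a2 and a3, and a3 cites a4.
links = [("a1", "a2"), ("a2", "a3"), ("a3", "a4"), ("a1", "a3")]
direct = extended_outdegree(links, "a1", max_depth=1)
expanded = extended_outdegree(links, "a1", max_depth=2)
```

With these links, the expanded count for `a1` also picks up `a4`, a resource that `a1` reaches only through `a3`.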
But Stack Overflow is also (and primarily) made of questions to which these answers respond, and questions are less addressed by existing theory. The basic conflict is between two views: that questions are just transparent references to answers, or that they make their own independent contributions to knowledge. Having established that the network of questions is structurally distinct from the network of answers, I find that questions’ indegree and outdegree are also correlated, but less so than for answers. I find that linking to answers matters, especially when the appropriateness of the answers to the question is taken into account. However, the contributions that questions make independently of their answers, topicality and uniqueness, do not generally matter. Like answers, questions can also be closed rather than answered, and I find that questions that are closed for different reasons have different patterns of answering.
In the final chapter, I consider how the techno-social infrastructure of the site encourages the creation of knowledge. The chapter studies the existence and nature of a Matthew Effect on Stack Overflow, going beyond the results for the distribution of reputation to examine the mechanisms involved in assigning reputation. Both the distributions of posts’ upvotes and the attachment kernel indicate a rich-get-richer effect, but they suggest that the effect is not proportional as in common models of the phenomenon. However, users with higher reputation do not seem to receive more upvotes solely because of that reputation: a regression discontinuity design on displayed user reputation produces a null result, suggesting that the reputation-rich are good, and the good get richer.
- Obeng_columbia_0054D_14306.pdf (application/pdf, 2.15 MB)
More About This Work
- Academic Units
- Thesis Advisors: Bearman, Peter Shawn
- Degree: Ph.D., Columbia University
- Published Here: November 10, 2017