Using Machine Learning to improve Internet Privacy

Sebastian Zimmeck

Thesis Advisor(s):
Bellovin, Steven Michael
Ph.D., Columbia University
Computer Science
Internet privacy lacks transparency, choice, quantifiability, and accountability, especially as the deployment of machine learning technologies becomes mainstream. However, these technologies can be privacy-protective as well as privacy-invasive. This dissertation advances the thesis that machine learning can be used to improve Internet privacy. Starting with a case study that shows how to estimate the potential of a social network to learn the ethnicity and gender of its users from geotags, it explores various strands of machine learning technologies that further privacy. While the quantification of privacy is the subject of well-known privacy metrics, such as k-anonymity or differential privacy, I discuss how some of those metrics can be leveraged in tandem with machine learning algorithms to quantify the privacy-invasiveness of data collection practices. Further, I demonstrate how the current notice-and-choice paradigm can be realized by automatic machine learning privacy policy analysis. The implemented system notifies users efficiently and accurately of applicable data practices. In addition, analyzing software data flows enables users to compare actual to described data practices and regulators to enforce them at scale. The emerging cross-device tracking practices of ad networks, analytics companies, and others can likewise be supplemented by machine learning technologies that notify users of privacy practices across devices and give them the choice they are entitled to by law. Ultimately, cross-device tracking is a harbinger of the emerging Internet of Things, for which I envision intelligent personal assistants that help users navigate the increasing complexity of privacy notices and choices.
Computer security
Machine learning
Internet--Law and legislation
Computer security--Law and legislation
Internet--Security measures
Computer science
Suggested Citation:
Sebastian Zimmeck, Using Machine Learning to improve Internet Privacy, Columbia University Academic Commons.
