A traffic jam in Chicago appears on the on-screen map, traced in a familiar red line. To the end user, it’s a sign of gridlock and a warning of a grueling commute. To the company aggregating the mobile data, the grid is locking in on much more.
It may seem simple, but every tidbit gathered by a big data company delivers more information than most people expect. The traffic grid on Google Maps is made possible by hundreds of thousands of pieces of data, collected from individual people at different times throughout the day. When aggregated, the data can form a full picture that reveals personal aspects of each person’s life.
With funding from the National Science Foundation, Emilee Rader, Ph.D., associate professor of Media & Information, is studying how people’s beliefs, values and attitudes about their personal data affect the way they navigate digital privacy norms. Her research takes a closer look at derived data: automated guesses and predictions about people, made possible by combining data collected over time across the multiple online apps and platforms people use.
“The same data can also be aggregated to reveal information about people’s travel patterns and used to make inferences about people over time,” said Rader. “Where they live, where they work, where they go to church, where their children go to school, even how much money they make, guessed based on general income information about people in their zip code.”
Derived data is what allows Google Maps to work so effortlessly, as traffic and highway congestion are mapped in real time. For some users, however, there may be drawbacks.
“Even though people may be aware of the benefits they’re getting from technology, the data they are providing can essentially be used to make harmful inferences about people,” said Rader.
The Dilemma at the Intersection of Value and Data
Data privacy holds value for technology users. Rader suggests that data privacy is essential, allowing for autonomy, creativity, development, well-being, mental health and liberty.
People know they should be concerned about data privacy. However, when confronted with terms of agreement to access a new app or device, they quickly consent without reading the fine print.
“People often say that that doesn’t stop them from buying these devices,” said Rader. “Privacy, then, is assumed not to be important to people based on these behaviors.”
Therein lies the catch.
Until now, digital privacy has relied on the self-management model. The model assumes people won’t use services that have privacy policies they don’t like, and companies won’t violate their own privacy policies because of legal or regulatory consequences.
This has proven risky for consumers, as more and more companies make the news for data breaches or policy violations. Whether through Amazon Ring’s camera use or Facebook’s data harvesting, companies are gleaning more personal data than many people realize.
“The self-management model for privacy doesn’t work,” said Rader. “The logic of self-management basically makes it okay for companies to blame the user if something happens that they don’t like.”
While many people agree to be monitored by Google Maps so they can track the route to their destination, they are not necessarily consenting to share derived data from their activities—at least not knowingly.
In earlier studies, Rader found that people are aware of data collection only in three circumstances: when it’s visible, when it relates to the purpose of the technology, or when it makes sense based on the technology’s function. People were often unaware that inferences could be drawn from raw data, producing complex sets of derived data about their personal lives.
Simple data, such as how much time a person slept as recorded in a fitness app, could lead to derived data about how many times the person overslept in one week. The number of voices in a room, as heard by Alexa, could lead to derived data about household size and when the children go to school. The Facebook Events where a person RSVPs could lead to derived data about political affiliation or party membership.
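The sleep example above can be sketched in code. This is a minimal, hypothetical illustration (the data, threshold and function names are invented, not any real fitness app’s API): a few nightly sleep durations are the raw data, and a single derived value is computed from them.

```python
from datetime import date

# Hypothetical raw data: nightly sleep durations (in hours) logged by a fitness app.
sleep_log = {
    date(2024, 3, 4): 7.5,
    date(2024, 3, 5): 5.0,
    date(2024, 3, 6): 9.5,
    date(2024, 3, 7): 6.0,
    date(2024, 3, 8): 9.0,
}

def count_oversleeps(log, threshold=9.0):
    """Derived data: count the nights a user slept at or past a threshold."""
    return sum(1 for hours in log.values() if hours >= threshold)

# The raw log says only how long someone slept each night; the derived
# count supports a new claim about their habits that week.
oversleeps = count_oversleeps(sleep_log)  # → 2
```

The point of the sketch is that the derived value never appears in the raw data; it is manufactured by the aggregator, which is exactly the kind of behind-the-scenes processing Rader’s research examines.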
Delivering Derived Data to the User
The raw data being collected is only the tip of the iceberg. One solution is under development at ComArtSci’s Behavior, Information & Technology Lab (BITLab), where researchers are trying to help people understand how this data can be used to make assumptions and predictions about their future choices and behaviors, in ways they may not like. Students in the lab are developing a prototype that would make derived data more visible to the user. Building on the API of the Automatic Smart Driving Assistant, a device used to record driving data, they are developing a web app with consumer education in mind.
“We want to collect data from a group of people using the device, and share that data with them in a way that is explicit about all the things that are happening behind the scenes,” said Joe Freedman, a master’s student in computer science. “The goal is to educate people about the behind-the-scenes data that’s being used and the processing that’s happening without their consent.”
In a second project, students are taking a closer look at how Facebook and Google categorize people in order to target content and advertising to them.
“We’re using an app to parse Facebook and Google for ad inferences, so we can separate those categories out and view the categories independently,” said Anjali Munasinghe, a junior in computer science.
The researchers plan to show people this information, which makes up a kind of hidden profile, and discuss how well they think the data represents them as a person.
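The category-splitting step Munasinghe describes could look something like the following sketch. It is purely illustrative (the export format, category labels and function name are invented, not the students’ actual app): given ad-inference categories pulled from two platforms, it separates the shared inferences from the platform-specific ones so each can be shown independently.

```python
import json

# Hypothetical export of ad-inference categories, keyed by platform.
export = json.loads("""
{
  "facebook": ["Parenting", "Politics (liberal)", "Frequent travelers"],
  "google": ["Parenting", "Home ownership", "Frequent travelers"]
}
""")

def split_categories(data):
    """Separate inferences both platforms share from platform-specific ones."""
    fb, goog = set(data["facebook"]), set(data["google"])
    return {
        "shared": sorted(fb & goog),          # inferred by both platforms
        "facebook_only": sorted(fb - goog),   # inferred only by Facebook
        "google_only": sorted(goog - fb),     # inferred only by Google
    }

profile = split_categories(export)
```

Viewing the categories side by side like this is one way to surface the kind of hidden profile the researchers plan to discuss with participants.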
“The goal of this is to understand how people think about these categories and where they come from,” said Rader. “By understanding this, we will have a better idea of what kinds of inferences people were already aware of, and what they are uncomfortable with. This will help us understand new norms that are forming for data inferences and digital privacy.”
Working with the student researchers, Rader has identified challenges and opportunities related to data privacy. She aims to design systems that empower people to create and enforce their own norms. Ultimately, Rader hopes to find a new approach to managing data privacy—one that would allow people to have a say in how their own data is collected and used.
“Privacy is not dead,” she said. “We need a new way to think about managing privacy.”
This material is based upon work supported by the National Science Foundation under Grant No. CNS-1524296.
By Melissa Priebe