How useful is the content we create online?
By Dr Peter Mooney, Department of Computer Science
The Brainstorm long read: we generate a massive amount of online content, from Facebook posts to YouTube videos, but it's nearly impossible for us to manage the privacy of this data
 

In recent times, scandals such as the Facebook-Cambridge Analytica story have brought people’s online privacy to the forefront of public attention. The ever-expanding use of social media throughout everyday life offers a tantalisingly rich data resource for scientists and industry. Data analysts and scientists are using this so-called "Big Data" to find out what our shopping preferences and our health and wellbeing concerns are, how we move around cities, how we consume products such as food, movies and TV online, and how we form and maintain friendships.

User-generated content (UGC) is any data or content which we create and purposely make available online. By far the most popular source of UGC, whether text, links, images, videos or sound, is social media. UGC covers Facebook posts, tweets on Twitter, videos posted to video-sharing websites such as YouTube, photographs shared on Instagram or Flickr, text written on message boards and forums, comments or ratings on websites such as TripAdvisor, and comments at the end of online news articles such as those found on the RTE website.

This type of content is very useful for scientific research and analysis. One of the most popular approaches in this area of research is a technique called data aggregation. Data aggregation is a very powerful scientific approach and an easy concept to explain.

We’ve all had that feeling when we put two or three pieces of seemingly different information together about a person. As a result, we find out something completely unexpected or revealing. We put "two and two together" and gain some new insight into that person.

This is how data aggregation works in the Big Data world. People make rational decisions about each piece of UGC they generate online when they consider it in isolation. However, most people naturally struggle to factor in how their data might be aggregated with other UGC data in the future: for example, how their online photographs might be combined with their posts, tweets or messages on online message boards. Putting "two and two together" from different sources of data may reveal sensitive facts about people. Modern data analytics has the potential to deduce extensive information about people from these individual clues.
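The "two and two together" effect can be sketched in a few lines of Python. This is a minimal illustration rather than a description of how any real analytics system works: the three data sources, the usernames and the field names are all invented, and the sketch assumes the same username appears in every source, which real-world record linkage cannot take for granted.

```python
from collections import defaultdict

# Invented records for illustration only; no real users or services.
# Each source, taken alone, reveals just one harmless-looking fact.
photo_posts = [
    {"user": "user_a", "photo_taken_in": "Town X"},
    {"user": "user_b", "photo_taken_in": "Town Y"},
]
forum_posts = [
    {"user": "user_a", "forum_topic": "managing a health condition"},
]
reviews = [
    {"user": "user_a", "reviewed_business": "a gym near Town X"},
    {"user": "user_b", "reviewed_business": "a cafe near Town Y"},
]


def aggregate(*sources):
    """Merge records from several sources into one profile per user.

    Assumption: the 'user' field is a shared key across all sources.
    """
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            for key, value in record.items():
                if key != "user":
                    profiles[record["user"]][key] = value
    return dict(profiles)


profiles = aggregate(photo_posts, forum_posts, reviews)
for user, facts in sorted(profiles.items()):
    print(user, facts)
```

Each source on its own says little, but the merged profile for the hypothetical user_a links a location, a health-related topic and a regular haunt. That combined picture is something the person never chose to publish in one place.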

Unfortunately, this aggregation effect makes it nearly impossible for us to manage the privacy of our own UGC data. As popularised in TV shows such as Criminal Minds, it has always been possible to combine various pieces of information to learn something new about a person. However, the power and reach of data aggregation are much greater in this digital age.

So how are we, as online users, protected? There are two principal ways. Firstly, there are ethical practices which govern the types of access that researchers and scientists have to this data and what research can subsequently be performed. Ethicists (those who study ethics) are constantly studying modern-day innovations (drones, genetics, smartphones, etc) to ensure ethical thinking and arguments are up to date. Universities and research bodies usually have ethics boards and committees which try to offer a guiding hand to ensure that academic integrity is maintained, and that people’s civil liberties and privacy are not diminished or damaged in research.

Secondly, protection is provided by legal frameworks. In May 2018, the General Data Protection Regulation (GDPR) came into effect. This is a user-focused regulation which is very much concerned with protecting people’s data online, including UGC. GDPR specifies that organisations may process data only for specific, explicit and legitimate purposes. Any processing of data should be relevant to a company or organisation’s core business. Crucially, GDPR states that data must be accurate and, with reasonable effort, kept up to date, and that it must be kept in a form that permits identification of individuals for no longer than is necessary. Organisations and companies risk huge fines if they do not undertake measures to implement GDPR.

As users of the Internet and generators of UGC, we must always be aware of the content we are generating and should take steps towards managing our online privacy. We cannot know how UGC generated today could bring about privacy issues when linked to other sources of data in the future, so it is impossible to guard completely against these types of privacy violations. One example comes from the increased popularity of Unmanned Aerial Vehicles (UAVs) or drones. Because a drone may record video constantly while flying, it can capture information about random people on the ground, and privacy violations can occur as a result.

Today, we expect our online experiences to be real-time, up to date and personalised. Unfortunately, giving up one’s digital privacy is often the default charge for these ‘free’ experiences. Privacy has become something that online consumers use as a form of payment for them.

Should people be afraid or worried about their habits and behaviours in generating UGC? The issues outlined in this article should not prevent people from enjoying and engaging with UGC in safe, responsible and legal ways. Privacy self-management and common sense should be exercised. People should ask themselves the following question when they are knowingly generating UGC: "am I comfortable with the possibility of this content or data being used by third parties (scientists, analysts, companies, etc.) now or in the future?"