Putting people at the heart of big data

Big data has given scientists – and companies – a treasure trove of new information for analysing, understanding and predicting human behaviour, but it’s also thrown up a raft of questions about privacy and ownership.

Our smartphone societies are inundated with new services, and we donate data to them in ways we seldom consider, with privacy implications that echo far beyond a light tap on the ‘Accept’ button. So how do we make sure that the data revolution benefits both individual people and the society we live in?

Fosca Giannotti, project coordinator of SoBigData, an open ecosystem for ‘ethic-sensitive scientific discoveries’, sees the need for alternatives that help avoid the concentration of big data in a few hands.

Big data, which she defines as ‘the mass digital traces of human activities, captured as our activities are mediated through IT services’, comes with both risks and benefits.

‘(People) are fascinated with using new services, thus donating our data to these — meaning that there are many new opportunities for scientists to study human behaviour,’ explained Giannotti. ‘On the other side, this data goes to companies, and there is the risk of data being centralised in bigger and bigger silos — e.g. with Google — creating imbalance between such owners and individuals.’

Novel questions

Giannotti, a research director at the Information Science and Technology Institute ‘Alessandro Faedo’ in Pisa, Italy, says opening up access to big data for analysis by non-specialists can help it be used for social good, for example, in examining topics like medical research, public transport and epidemics.

‘Many researchers are demonstrating how human mobility data like that from mobile phones can be used to indicate the health of a country. Or, in social debate, we can understand better what happened by analysing social media use during Brexit, looking at fake news or detecting bots,’ said Giannotti.

‘Our virtual environment will support non-experts in creating such experiments.’

SoBigData, which is short for the Social Mining & Big Data Ecosystem, is a safe virtual research environment that allows researchers, economists, decision makers, and innovators to ask novel questions of big data, ‘to fully unleash the power of big data analytics for all’.

The infrastructure offers access to facilities in the form of huge datasets, libraries of algorithms, and ready-to-use data toolkits provided by 12 European research institutions experienced in big data analytics.

Privacy

But the project’s focus on ethics also takes another form.

It draws on the expertise of data scientists to help transform research questions into big data analytical processes which are based on the concept of ‘privacy-by-design’ — posing the appropriate legal and ethical questions that a data scientist must ask themselves right from the beginning. And it’s coming just in time.

Late May will see the introduction of the General Data Protection Regulation (GDPR), a new EU law created to govern personal data protection in Europe.

The idea of the GDPR is to give people greater control of their data, by allowing them to know what data organisations hold about them and how it is used, as well as making it easy to change permissions.

One area that will be heavily affected by these changes is the biomedical sector, which is one of the reasons why the work of a project called My Health My Data (MHMD) may prove very useful.

MHMD coordinator Professor Edwin Morley-Fletcher, president of e-health consultancy Lynkeus, says the aim is to design a network that gives people full control of their personal healthcare data.

The project, which is due to finish next year, would complement hospital data systems with an open biomedical information interface allowing hospitals, researchers, and businesses to use de-identified data for open research, at the same time as letting patients manage their personal data account from an electronic device.

‘With the GDPR coming, whenever you deal with re-identifiable data, the privacy of the data subjects must be strongly guaranteed,’ he said. ‘Normally, all hospitals have systemised data that can be traced back to the patients, which implies a strong need to have direct consent from the patient, and the capacity to full traceability of it, to know what happens with the data.’

Blockchain

This is why MHMD’s project leaders decided to build a distributed, peer-to-peer network based on blockchain, which is essentially a decentralised digital ledger system. It creates a secure management layer for encrypted and anonymised data, opening it up for shared use while ensuring the privacy of the patient.
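To give a rough sense of what a ‘digital ledger’ means in practice, the Python sketch below chains entries together by hash, so that altering an earlier entry becomes detectable. It is an illustration only and not MHMD’s design: the `Ledger` class, the `block_hash` helper and the event names are hypothetical, and the sketch keeps a single copy of the ledger, leaving out the peer-to-peer network, encryption and consensus that a real blockchain relies on.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only ledger (hypothetical; one copy, no network or consensus)."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        # Each new block stores the hash of the block before it.
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        block = {
            "index": index,
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        # Recompute every hash; any edited block breaks the chain.
        prev_hash = "0" * 64
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            if block["hash"] != block_hash(i, block["payload"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True


ledger = Ledger()
ledger.append({"event": "record_registered", "record": "pseudonym-001"})
ledger.append({"event": "access_granted", "record": "pseudonym-001"})
intact = ledger.is_valid()

# Quietly rewriting history is caught when the chain is re-checked.
ledger.blocks[0]["payload"]["event"] = "record_deleted"
tampered_detected = not ledger.is_valid()
```

In a decentralised system many participants hold copies of such a chain and compare them, which is what removes the need for a single trusted record keeper.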

Moreover, the possibility of ‘smart contracts’ in certain types of blockchain means that patients can set and update the consent conditions controlling how their data are used, with these contracts automatically dictating how the data can be accessed or re-used in any given circumstance.
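The consent logic described here can be pictured with a short Python sketch. It is a simplified stand-in for a smart contract, not code from MHMD: the `ConsentContract` class, the purpose labels and the requester names are all hypothetical, and on a real blockchain such rules would be executed by the network rather than by a single program.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentContract:
    """Hypothetical stand-in for a patient's consent smart contract."""

    patient_id: str
    allowed_purposes: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def update(self, purposes):
        # The patient replaces their consent conditions at any time.
        self.allowed_purposes = set(purposes)
        self.history.append(("update", tuple(sorted(purposes))))

    def request_access(self, requester, purpose):
        # Access is decided automatically from the current conditions,
        # and every request is logged for traceability.
        granted = purpose in self.allowed_purposes
        self.history.append(("request", requester, purpose, granted))
        return granted


contract = ConsentContract("pseudonym-001")
contract.update({"open_research"})

research_ok = contract.request_access("university_lab", "open_research")
marketing_ok = contract.request_access("ad_broker", "marketing")

# Withdrawing consent takes effect for all later requests.
contract.update(set())
after_withdrawal = contract.request_access("university_lab", "open_research")
```

Here the research request is granted while consent is in place, the marketing request is refused, and the same research request is refused once the patient withdraws consent. The history list plays the role of the audit trail that lets a patient see what happened with their data.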

‘It’s an aspect of empowerment, the democratisation of data in a sense,’ said Prof. Morley-Fletcher. ‘The goal is to make it as frictionless as possible, with no bureaucracy, so that hospital data controllers and individuals can make clear decisions on what happens with data.’

The approaches of MHMD and SoBigData align with Europe’s vision of a shared online repository making all data from publicly funded research available for all — the European Open Science Cloud.

Late 2017’s European Open Science Cloud event in Brussels made it clear that the EU would like to see this science cloud become a reality by 2020, and around €272 million of the Horizon 2020 budget for 2018-2020 is already earmarked for its implementation.

Originally published on Horizon