Access to data

Access to data means that you decide to whom you make your data available, how you provide access, and under which conditions.

Conducting research is often a team effort. Even before collecting the data, it is important to consider who will get access to the data, under which conditions and what permissions they will have.

If your data are personal, confidential, or contain copyrighted material, you have both a legal and ethical obligation to make sure that only the research project members can access the data during data collection and processing.
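To make these planning questions concrete, here is a minimal Python sketch of a project access list. It assumes three hypothetical roles (`pi`, `research_assistant`, `student_coder`) and three permission levels. It records who may do what and denies access to anyone outside the project. In practice, the file share or platform you use enforces such rules, so all names here are illustrative only.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical permission levels for a project data share."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    ADMIN = auto()


# Hypothetical access list: only project members appear here.
PROJECT_ACCESS = {
    "pi": Permission.READ | Permission.WRITE | Permission.ADMIN,
    "research_assistant": Permission.READ | Permission.WRITE,
    "student_coder": Permission.READ,
}


def can(user: str, needed: Permission) -> bool:
    """Return True only if the user is a project member holding the needed permission."""
    granted = PROJECT_ACCESS.get(user, Permission.NONE)  # outsiders get no rights
    return needed in granted


print(can("student_coder", Permission.WRITE))
print(can("outsider", Permission.READ))
```

Writing the list down before data collection starts makes it easy to check that, for example, a student can read but not change the data, and that nobody outside the project can open them at all.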

What about sensitive data?

Sensitive data can be FAIR without being open. Their FAIRness comes from a clear description of how access to the data can be granted, e.g. for research purposes.

A lot of research is based on sensitive personal data, data protected by IPR (Intellectual Property Rights) agreements or confidential data. This means that access to the data must be managed and restricted.  

Example 1

One solution is to store your data on a file share platform with backup and strictly controlled access. Christian Andreas Schultz, Department of Politics and Society, Aalborg University, from the ISSP project, tells more.

What other options do you have for providing access to sensitive personal data?

While you work, access to sensitive data is restricted to the researchers conducting the project.

 

To share sensitive data with others, you can anonymise them (change to impersonal IDs) or de-identify them (remove IDs), but this approach has several problems:

The first problem is that anonymisation is not trivial and may even be impossible. For instance, if you have a CT scan of a patient's head, others might be able to make a surface reconstruction that, in some cases, could break the intended anonymisation, for example if the patient has specific anomalies.
If you can’t anonymise the data, you can’t give direct access to them.
 

The second problem is that new data cannot be added to anonymised patient information. If, five years later, you observe that the patient is still alive, you cannot simply add that piece of information.
This limits the reusability of the data.
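The two approaches mentioned above, changing to impersonal IDs and removing IDs, can be sketched in Python. This is a minimal illustration, assuming a hypothetical secret key held only by the project and hypothetical patient IDs such as `P-0001`. A keyed hash (HMAC) is one common way to create impersonal IDs, not a method the source prescribes. The sketch shows why follow-up information, such as survival after five years, can still be linked to records with impersonal IDs but not to records whose IDs have been removed. That same linkability is why data with impersonal IDs are usually still treated as personal (pseudonymised) data under the GDPR for as long as the key exists.

```python
import hashlib
import hmac

# Hypothetical secret key, kept only by the research project and never shared.
PROJECT_KEY = b"replace-with-a-long-random-secret"


def impersonal_id(person_id: str) -> str:
    """Replace a direct identifier with a keyed hash (an impersonal ID)."""
    digest = hmac.new(PROJECT_KEY, person_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical records that contain a direct identifier.
records = [
    {"person_id": "P-0001", "tumour_stage": 2},
    {"person_id": "P-0002", "tumour_stage": 3},
]

# Approach 1: change to impersonal IDs.
pseudonymised = [
    {"pid": impersonal_id(r["person_id"]), "tumour_stage": r["tumour_stage"]}
    for r in records
]

# Approach 2: remove IDs altogether.
de_identified = [{"tumour_stage": r["tumour_stage"]} for r in records]

# Five years later, follow-up information arrives, keyed by the original identifier.
follow_up = {"P-0001": {"alive_5y": True}}

# With impersonal IDs, the key holder can still link the new information.
lookup = {impersonal_id(pid): info for pid, info in follow_up.items()}
for row in pseudonymised:
    row.update(lookup.get(row["pid"], {}))

# With IDs removed, there is nothing to match on, so the follow-up cannot be added.
linkable = any("pid" in row for row in de_identified)  # False
```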
 

The classical way to work with data from different institutes is to pool them on a central server that hosts the analysis tools, creating a data lake. You collect the data, perform your analysis and then effectively throw the data away.
The data are often not reusable and therefore not fully FAIR.
 

Example 2

How do you make your sensitive data reusable by others while you work? Carsten Brink explains one possible solution.

Distributed learning

Within Carsten Brink’s research area, researchers from all over the world work together. To predict the outcome of a treatment for their patients, they develop a model. They can base this model on patient data from their own hospital, but in some cases the model requires more data than a single hospital can provide. In that case, they can send their model from institute to institute, collecting results along the way. This is called distributed learning.

In this way, they can analyse a large pool of results without moving sensitive data.
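The idea of a model travelling from institute to institute can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a plain linear model, gradient descent as the local training step, and three hypothetical institutes with synthetic data. The source does not specify the model type. Passing the model sequentially from site to site is one variant of distributed learning, and averaging updates from all sites (federated averaging) is another. Only the two model parameters ever leave a site.

```python
import random


def local_update(params, data, lr=0.05, epochs=20):
    """Run a few epochs of gradient descent on one institute's local data.
    Only the updated parameters leave the institute, never the data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error of y ≈ w * x + b.
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2 / n * sum((w * x + b - y) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def make_site(n, seed):
    """Hypothetical local dataset: y = 3x + 1 plus a little noise, x in [0, 1)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs]


# Each entry stands for the data held inside one institute.
sites = [make_site(50, seed) for seed in (1, 2, 3)]

params = (0.0, 0.0)
for _round in range(30):
    for site_data in sites:  # the model travels from institute to institute
        params = local_update(params, site_data)

w, b = params
print(f"w = {w:.2f}, b = {b:.2f}")
```

After enough rounds, the parameters end up close to the values a single model trained on the pooled data would find, even though no site ever saw another site's records.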

 

Depending on the size of the project, distributed learning can be a FAIR way to handle sensitive personal data responsibly, but it requires substantial overhead to map the data to a standard format.
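The mapping overhead mentioned above can be illustrated with a minimal Python sketch. It assumes two hypothetical hospitals that record the same information under different column names and units, and a hypothetical common schema (`age_years`, `dose_gy`, `survived_2y`). Real projects typically map to an established standard, which this sketch does not attempt to reproduce.

```python
# Hypothetical common schema that every participating site maps its data to.
STANDARD_FIELDS = ("age_years", "dose_gy", "survived_2y")

# Hypothetical per-site mappings: local column name -> (standard field, conversion).
SITE_A_MAPPING = {
    "Age": ("age_years", float),
    "Dose_cGy": ("dose_gy", lambda v: float(v) / 100),  # centigray -> gray
    "Alive2y": ("survived_2y", lambda v: v == "yes"),
}
SITE_B_MAPPING = {
    "patient_age": ("age_years", float),
    "total_dose": ("dose_gy", float),  # already in gray
    "status_24m": ("survived_2y", lambda v: v == 1),
}


def to_standard(record: dict, mapping: dict) -> dict:
    """Translate one local record into the common schema, failing loudly on gaps."""
    out = {}
    for local_name, (std_name, convert) in mapping.items():
        out[std_name] = convert(record[local_name])
    missing = set(STANDARD_FIELDS) - set(out)
    if missing:
        raise ValueError(f"record lacks standard fields: {sorted(missing)}")
    return out


site_a_row = {"Age": "63", "Dose_cGy": "6000", "Alive2y": "yes"}
site_b_row = {"patient_age": 58, "total_dose": 60.0, "status_24m": 0}

print(to_standard(site_a_row, SITE_A_MAPPING))
print(to_standard(site_b_row, SITE_B_MAPPING))
```

Each site has to write and maintain its own mapping, and every unit or coding difference (here centigray versus gray, "yes" versus 1) must be caught before the travelling model can treat the sites' data as comparable.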

Example 3

In Carsten Brink’s case, the data will stay at his institute after the research project ends, so that new data can be added continuously.

Other researchers can apply for access, and, if approved, they can send a model to analyse the local data. They will never get physical access to the raw data.   
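The arrangement described above, where a model travels to the data rather than the other way round, can be sketched as a small gatekeeper function at the host institute. This is a minimal Python illustration under stated assumptions: a hypothetical register of approved applicants, a hypothetical local dataset, and a simple rule that only numeric summaries may leave the institute. Real setups need much stronger safeguards, since even numeric outputs can leak information about individuals.

```python
from collections.abc import Callable

# Hypothetical register of approved applications (applicant -> approved purpose).
APPROVED = {"researcher@example.org": "dose-response model"}

# Hypothetical local dataset that never leaves the institute.
LOCAL_DATA = [
    {"dose_gy": 60.0, "survived_2y": True},
    {"dose_gy": 66.0, "survived_2y": True},
    {"dose_gy": 54.0, "survived_2y": False},
    {"dose_gy": 70.0, "survived_2y": True},
    {"dose_gy": 50.0, "survived_2y": False},
]


def run_submitted_analysis(applicant: str, analysis: Callable[[list], dict]) -> dict:
    """Run an external analysis on local data and return only its summary output."""
    if applicant not in APPROVED:
        raise PermissionError(f"{applicant} has no approved access")
    result = analysis(list(LOCAL_DATA))  # the analysis runs here, on site
    # Only plain numbers leave the institute; anything else is refused.
    if not all(isinstance(v, (int, float)) for v in result.values()):
        raise ValueError("analysis must return numeric summaries only")
    return result


def survival_by_dose(rows):
    """Hypothetical analysis submitted by an approved researcher."""
    high = [r["survived_2y"] for r in rows if r["dose_gy"] >= 60]
    low = [r["survived_2y"] for r in rows if r["dose_gy"] < 60]
    return {
        "n": len(rows),
        "survival_high_dose": sum(high) / len(high),
        "survival_low_dose": sum(low) / len(low),
    }


print(run_submitted_analysis("researcher@example.org", survival_by_dose))
```

An analysis that tries to return the rows themselves, or a request from someone without an approved application, is refused before anything leaves the institute.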

Example 4

In Nikola Vasiljević’s case, the data are not sensitive and, thanks to a progressive approach by Vestas Wind Systems A/S, they are not even kept confidential to protect commercial interests.

Nikola Vasiljević can share his data openly. This is how he made sure that access to his non-sensitive data would be FAIR: for the wind turbine wake project, the experimental data and associated metadata will be uploaded to a DTU data repository that lets the scientist give other people access to the data, or only to the metadata describing them. In some scientific projects, it is agreed not to share data openly. In such cases, a comprehensive metadata record can describe the data and explain how access may be obtained.
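What such a metadata record might contain can be sketched as a small Python data structure. This is a minimal illustration: the field names are hypothetical and only loosely resemble common repository schemas such as DataCite, and the values are placeholders, not details of the project above. Check your repository's actual schema before relying on any of these fields.

```python
import json

# Hypothetical minimum fields we want every record to carry.
REQUIRED = ("title", "creators", "description", "access", "contact")

record = {
    "title": "Wind turbine wake measurements (example record)",
    "creators": ["Example Researcher"],
    "description": "Experimental wake data with associated processing notes.",
    "keywords": ["wind energy", "wake"],
    "access": {
        "level": "restricted",  # e.g. "open", "restricted" or "closed"
        "conditions": "Available for research purposes on request.",
        "data_files_public": False,  # metadata public, data behind a request
    },
    "contact": "data-steward@example.org",
}


def missing_fields(rec: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED if not rec.get(f)]


print(missing_fields(record))
print(json.dumps(record, indent=2))
```

Even when the data themselves stay closed, a record like this is findable and tells other researchers what exists and whom to ask for access.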

Where else can you preserve your research data?

Many institutions and public organisations have established data repositories, where scientists can upload and share their data. Projects funded by the European Union are expected to make generated data or their metadata available to the public, for example through data repositories.

To find a suitable repository for your research data, you can visit re3data.org, a global registry of research data repositories from different academic disciplines; FAIRsharing, which lets you discover databases grouped by domain, species or organisation; or the links page, which lists more resources on data repositories.