Metadata are data about data. Research data need metadata to become findable, accessible, interoperable and reusable, for both humans and machines.
Metadata play an important role in making your data FAIR. Add metadata to your research data continuously, not just at the beginning or at the end of a project. Metadata can be added manually or automatically, preferably according to a disciplinary standard. From a FAIR perspective, metadata are even more important than the data themselves, because metadata can always be made openly available, and they link research data and publications in the Internet of FAIR Data and Services. The distinction between data and metadata is not ontological but grounded in use. What counts as “data” and what counts as “metadata” is therefore a matter of perspective: one researcher's metadata can be another researcher's data.
While data documentation is meant to be read and understood by humans, metadata (which are sometimes a part of the documentation) are primarily meant to be processed by machines.
The ISSP project uses a standardised codebook and a translation protocol to ensure comparability of the different national surveys. The codebook contains structural metadata for the background variables used in the national surveys, e.g. variable name, measurement goals, variable definition, example questions, and so on.
The Language Technology Group has published their dataset, the Danish Parliament Corpus (2009-2017), in CLARIN-DK, a repository for language-based, textual data. CLARIN-DK is the Danish part of CLARIN ERIC, a European research infrastructure for the Humanities. During the data submission process, CLARIN-DK guides the user to systematically upload relevant administrative, descriptive, and structural metadata together with the dataset.
Here you can see screenshots of the data submission pages of CLARIN-DK, where the three types of metadata are marked in different colours (light green = administrative metadata; yellow = descriptive metadata; dark green = structural metadata).
Here you can see a single plot from the Multi-lidar observations of the Vestas multi-rotor turbine wake project with its attached metadata.
The descriptive metadata (yellow) describe the plot for discovery and identification purposes and include elements such as title, abstract, author, and keywords.
The administrative metadata (light green) provide information that helps manage the digital object, such as when and how it was created, its file type, licence and access rights. Finally, the data carry a project ID that indicates which project they belong to.
The structural metadata (dark green) specify the internal structure of the digital object, in this example: the plot variable names and units that define the relationship between the variables in the plot.
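As a rough illustration of the three metadata types described above, the record below groups a few fields by type. This is only a minimal Python sketch: the field names and every value are placeholders invented for illustration, not the actual metadata attached to the plot.

```python
# Hypothetical metadata record for a single plot; every value is a placeholder.
record = {
    "descriptive": {
        "title": "Example wake plot",
        "abstract": "Placeholder abstract.",
        "author": "A. Researcher",
        "keywords": ["lidar", "wake"],
    },
    "administrative": {
        "created": "2020-01-01",
        "file_type": "image/png",
        "licence": "CC BY 4.0",
        "access_rights": "open",
        "project_id": "EXAMPLE-PROJECT",
    },
    "structural": {
        "variables": [
            {"name": "x", "unit": "m"},
            {"name": "wind_speed", "unit": "m/s"},
        ],
    },
}


def metadata_type(field_name, rec=record):
    """Return which of the three metadata types a field belongs to, or None."""
    for kind, fields in rec.items():
        if field_name in fields:
            return kind
    return None


print(metadata_type("licence"))   # administrative
print(metadata_type("keywords"))  # descriptive
```

Grouping the fields this way is a design choice for readability; real repositories usually store all three types in a single schema defined by their metadata standard.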
Here you can see screenshots of a database of radiography images, where the three types of metadata are marked in different colours (light green = administrative metadata; yellow = descriptive metadata; dark green = structural metadata).
You can see that the three types of metadata apply not only to the database, but also to individual images and sets of data.
The quality of your metadata has a huge impact on the reusability of your research data. It is best practice to use a discipline-specific metadata standard and/or an ontology commonly used in your field to describe your data. Some data repositories can help you in choosing the appropriate metadata standard for your data.
Have a look at the links page to get started.
All ISSP metadata are published in the technical reports together with the data sets on the ISSP website. The metadata are structured according to the Data Documentation Initiative (DDI). The DDI provides a very detailed framework for structuring and describing survey data and other types of observational data in the Social Sciences.
Using an ontology helps others to understand the structure and content of your data, making your data searchable, interoperable and reusable. Carsten Brink talks about the importance of ontologies for his research.
Metadata standards often start as schemas developed by a particular research group or community to enable the best possible description of their data.
Since there was no metadata standard specific to his field, Nikola Vasiljević joined a team of researchers working on a taxonomy to describe collected data. The team based their taxonomy on the Dublin Core Metadata card, which contains 15 descriptive metadata entries. To make the taxonomy more useful within their field, they added five more metadata entries: External Conditions, Activities, Instruments, Models, and Materials.
A taxonomy was developed for several of the entries in the metadata card. When you catalogue your data with such metadata cards, search engines can explore the metadata cards and end-users can easily find the right dataset for their own research.
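The metadata card idea can be sketched in a few lines of Python. The example assumes that the 15 entries are the elements of the Dublin Core Metadata Element Set and uses the five additional entries named above. The key spellings, the helper functions `make_card` and `search`, and the two sample cards are hypothetical and do not represent the team's actual implementation.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]

# The five field-specific entries named in the text (key spelling is our own).
EXTRA_ENTRIES = [
    "external_conditions", "activities", "instruments", "models", "materials",
]

ALLOWED = set(DUBLIN_CORE) | set(EXTRA_ENTRIES)


def make_card(**entries):
    """Build a metadata card, rejecting entries outside the 20 allowed ones."""
    unknown = set(entries) - ALLOWED
    if unknown:
        raise ValueError(f"Unknown metadata entries: {sorted(unknown)}")
    return dict(entries)


def search(cards, entry, term):
    """Return cards whose given entry contains the term (case-insensitive)."""
    term = term.lower()
    return [c for c in cards if term in str(c.get(entry, "")).lower()]


# Placeholder cards, invented for illustration.
cards = [
    make_card(title="Wake measurements A", instruments="scanning lidar"),
    make_card(title="Tower loads B", instruments="strain gauge"),
]

hits = search(cards, "instruments", "lidar")
print([c["title"] for c in hits])  # ['Wake measurements A']
```

Restricting cards to a fixed set of entries is what lets a search tool query every card in the same way. A controlled vocabulary for the values, as a taxonomy provides, would make such searches more reliable still.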
The available metadata subsets in TEI (Text Encoding Initiative) are not specifically suited to the annotation of parliamentary speeches and debates. Therefore, the research community of the Language Technology Group is currently collaborating to define a relevant metadata schema for this purpose. Read more about this in the workshop description and proceedings of ParlaCLARIN-II.
Publishing your research data and metadata online provides you with an extra location for people to find your work. Even though your publications contain your results, your data may still not be findable. Metadata are machine-readable, and when they have a persistent identifier, search engines can easily find them.
To be FAIR, your data (and metadata) must have a persistent identifier through which they can be found. The persistent identifier is typically assigned when a digital resource is placed in a data repository.
Nikola Vasiljević plans to make both the research data and metadata from the wake simulation project open to all. He intends to publish the data in his university repository with a rich descriptive metadata record to make his data FAIR.
Descriptive metadata such as title and keywords are machine-readable and can make a data set easier to find. The link to publications, references and licences will also be part of the metadata record.
Have a look at Nikola's dataset in the institutional repository, DTU Data.
The Language Technology Group has published their data, the Danish Parliament Corpus (2009-2017), in CLARIN-DK, a repository for language-based textual data. During the data submission process, CLARIN-DK guides the user to systematically upload relevant metadata together with the dataset.
Many researchers cannot openly publish their data. However, you can always publish rich metadata about your data. For sensitive data, publishing metadata gives you a place to state clearly under which conditions the data can be accessed and how they may be reused. Carsten Brink shares his thoughts.
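Publishing metadata without the data themselves can be sketched as follows. This is a minimal Python illustration under stated assumptions: the field names, the `PRIVATE_FIELDS` set, the `public_metadata` helper, and all values are hypothetical and do not follow any particular repository's schema.

```python
import json

# Hypothetical metadata record for a dataset that cannot be shared openly.
record = {
    "title": "Example patient cohort",
    "keywords": ["example", "cohort"],
    "access_rights": "restricted",
    "access_conditions": "Available on request after approval by a data access committee.",
    "reuse_conditions": "Research use only; no redistribution.",
    "contact": "data-steward@example.org",
    # Internal field that must never be published.
    "storage_path": "/secure/example/cohort.csv",
}

PRIVATE_FIELDS = {"storage_path"}


def public_metadata(rec):
    """Return the publishable part of a record: everything except private fields."""
    return {k: v for k, v in rec.items() if k not in PRIVATE_FIELDS}


# The public record states the access and reuse conditions but not where the data live.
print(json.dumps(public_metadata(record), indent=2))
```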