Sikt’s research data archive is committed to ensuring that the research data it archives meets the FAIR criteria. This is achieved through workflows that comply with the OAIS model, implementation of DDI-Lifecycle as the metadata standard for describing data, and a modern, cloud-based infrastructure that ensures user-friendliness, availability, authenticity and persistence.

The research data archive in Sikt believes that the highest level of FAIRness is demonstrated by actual reuse of archived data, rather than measured programmatically by evaluating minimal, generic metadata in a specific format.

The research data archive in Sikt emphasizes reuse in publications, master’s and bachelor’s theses, and educational programs. Additionally, we strive to make research data reusable for the public and private sectors, including government officials at the state, county and municipal levels, private analysis companies, journalists and citizen science.

Sikt’s research data archive operates with five curation levels. The highest level of FAIRness is achieved at levels 1 and 2. Data at the lower curation levels still complies with the FAIR criteria, but to a lesser degree.

This document presents a manual examination of the FAIRness of Sikt’s Research Data Archive (referred to as “Sikt Archive”) against the FAIRsFAIR Data Object Assessment Metrics (Devaraju, A., et al. 2022) (referred to as “FDOA Metrics”).

OVERALL FAIR Level: Advanced

The archive provides the ability to search and discover metadata at the variable level for most of the archive’s holdings (curation levels 1 and 2). However, the descriptive metadata elements covered by the FDOA Metrics are limited to the archive’s minimum requirements and do not include variable-level descriptive metadata elements.

Findability for humans

Depending on the type of research data and the information provided by the depositor, metadata is registered at every level from data collection down to variable. This metadata is made available for search, analysis and discovery in Surveybanken, our dissemination service.

Findability for machines

Research data archived with Sikt’s research data archive is assigned a globally unique identifier that locates it within our catalogue. All metadata is available through our API and OAI-PMH for harvesting and extraction.

Details in FAIRsFAIR Data Object Assessment Metrics

All access information defined by the metrics is contained in the archive’s metadata, but not all of it is exposed, for security reasons.

Accessible for humans

All research data are assigned access conditions that specify who can access the data and how. Most of the data in Sikt’s research data archive is restricted to specific groups, primarily researchers and students at NRC- or Eurostat-approved research institutions.

Accessible for machines

All metadata and research data are available in open formats and are served over standardized communication protocols. Metadata is openly available, but access to research data requires authorization.

Details in FAIRsFAIR Data Object Assessment Metrics

The archive converts all relevant information about the research data (as defined in the CESSDA Metadata Model), from project level down to code values, into machine-readable metadata. This procedure links all information objects together into a single digital object.

The archive has actively chosen not to embed the metadata in landing pages, for performance and security reasons. The DDI-L metadata for a single record can exceed 50 000 lines, and the total number of metadata objects in the archive exceeds 10 000 000. The archive could opt to present fragments of its DDI, but because of the reference structure of DDI-L, these fragments would not contain all relevant information.

Additionally, the archive believes that no end user could make sense of this metadata without communicating with the archive. We would provide access in response to any inquiry.

Interoperable for humans

Ingested research data, metadata and related resources are combined into a single metadata record. Data and metadata are converted to open formats, which are monitored and migrated to newer formats when deemed necessary. When end users submit requests, metadata is provided in an open format, and data is converted to an open or proprietary format depending on their needs.

Interoperable for machines

Research data and metadata are available in open formats that can be read by machines. Controlled vocabularies are implemented where relevant and applicable to enhance semantic categorization of the research data.

Details in FAIRsFAIR Data Object Assessment Metrics

The archive implements the most comprehensive metadata model, DDI-L, for describing the research data it holds. This is done by adopting the CESSDA Metadata Model, the acknowledged best practice for the kinds of research data the archive holds. Together, these give end users, both human and machine, the information needed to understand, evaluate and reuse the research data.

Most of the archive’s holdings are access-controlled, and their reuse cannot be regulated by licenses such as those listed in SPDX. Instead, the archive operates with bespoke licenses, which are enforced through contractual agreements with end users. For the open data the archive holds, SPDX licenses should be implemented in the future.

Reusable for humans

At curation levels 1 and 2, research data is described using DDI, the best metadata standard for the kind of research data curated at these levels. Sikt’s research data archive follows the recommendations for implementing DDI specified in the CESSDA Metadata Model. Additionally, at these curation levels, quality control of the research data is carried out to ensure completeness and understandability, from project level down to code values.

Reusable for machines

Data descriptors, provenance and access are all registered in the metadata of the research data. This enables machines to make sense of the research data provided by the archive. Persistent identifiers on all metadata elements and data files, combined with controlled vocabularies, enable machine-actionability.

Details in FAIRsFAIR Data Object Assessment Metrics

FAIR scoring details

Detailed report of the manual examination of the FAIRness of Sikt’s Research Data Archive against FAIRsFAIR Data Object Assessment Metrics (Devaraju, A., et al. 2022). 

Findable

FsF-F1-01D – Data is assigned a globally unique identifier

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

The archive uses DOIs as its globally unique identifiers. DOIs are minted for the DDI StudyUnit object, which in research terms corresponds to the research project. A StudyUnit can consist of one or more research data objects.

FsF-F1-01D-1 Identifier is resolvable and follows a defined unique identifier syntax (IRI, URL)

All DOIs we mint are resolvable. This is checked automatically every 30 minutes by our DOI publication service through API communication with DataCite. If any DOI present in our metadata is not resolvable, the archive is alerted and corrects the error.
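A periodic check of this kind could look roughly like the following Python sketch. It is not the archive’s DOI publication service; it assumes the public DataCite REST API (`https://api.datacite.org/dois/{doi}`), whose JSON response carries a DOI’s `state` and registered target `url` under `data.attributes`, and it treats `registered` and `findable` DOIs with a target URL as resolvable (draft DOIs are not). The function names and example DOIs are hypothetical, and the 30-minute schedule would be handled by an external scheduler.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List, Optional

# Public DataCite REST API (assumption: the archive's own service may call other endpoints).
DATACITE_API = "https://api.datacite.org/dois/"

# DataCite states whose DOIs are registered in the handle system (drafts are not).
RESOLVABLE_STATES = {"registered", "findable"}


def fetch_doi_record(doi: str) -> Optional[dict]:
    """Fetch the DataCite JSON record for a DOI, or None if it cannot be retrieved."""
    url = DATACITE_API + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.load(response)
    except (OSError, ValueError):  # network/HTTP errors (URLError is an OSError) or invalid JSON
        return None


def is_resolvable(record: Optional[dict]) -> bool:
    """Treat a DOI as resolvable if it is registered or findable and has a target URL."""
    if not record:
        return False
    attributes = record.get("data", {}).get("attributes", {})
    return attributes.get("state") in RESOLVABLE_STATES and bool(attributes.get("url"))


def find_unresolvable(
    dois: Iterable[str],
    fetch: Callable[[str], Optional[dict]] = fetch_doi_record,
) -> List[str]:
    """Return the DOIs that fail the check, so that the archive can be alerted."""
    return [doi for doi in dois if not is_resolvable(fetch(doi))]


if __name__ == "__main__":
    # Offline demo: a stubbed fetch function and hypothetical DOIs instead of live API calls.
    stub = {
        "10.1234/example-1": {"data": {"attributes": {"state": "findable", "url": "https://example.org/1"}}},
        "10.1234/example-2": {"data": {"attributes": {"state": "draft", "url": None}}},
    }
    print(find_unresolvable(["10.1234/example-1", "10.1234/example-2", "10.1234/missing"], fetch=stub.get))
```

Injecting `fetch` keeps the decision logic separate from the network call, so the check can be tested without contacting DataCite.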

FsF-F1-01D-2 Identifier is not resolvable but follows an UUID or HASH type syntax

Not relevant

FsF-F1-02D – Data is assigned a persistent identifier

Because the archive uses DOIs as its PIDs, it meets the criteria for both a persistent and a unique identifier defined in the metrics. The persistence of the archive’s DOIs was demonstrated when the archive migrated from its legacy systems to its new research data platform: the DOIs minted in the legacy system now resolve to the same digital objects on the new platform.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-F1-02D-1 Identifier follows a defined identifier syntax

The archive implements DOIs as recommended by DataCite, using the archive’s DOI prefix supplemented by the archive identifier as the suffix.
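To illustrate the prefix/suffix structure, the Python sketch below combines a prefix with an archive identifier, splits a DOI back into its parts and builds the `https://doi.org/` resolver URL. The prefix `10.1234` and the identifier `ARCHIVE-0001` are hypothetical placeholders, not the archive’s actual prefix or identifier scheme, and the regular expression is a simplified check of the `10.<registrant>/<suffix>` shape rather than a full implementation of the DOI syntax rules.

```python
import re
import urllib.parse
from typing import Tuple

# Simplified DOI shape: "10." + numeric registrant code (optionally dot-separated) + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9}(?:\.\d+)*)/(\S+)$")


def make_doi(prefix: str, archive_id: str) -> str:
    """Combine a DOI prefix with an archive identifier used as the suffix."""
    doi = f"{prefix}/{archive_id}"
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return doi


def split_doi(doi: str) -> Tuple[str, str]:
    """Return the (prefix, suffix) parts of a DOI."""
    match = DOI_PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi: str) -> str:
    """Return the https://doi.org/ URL for a DOI, percent-encoding unsafe suffix characters."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{urllib.parse.quote(suffix)}"


if __name__ == "__main__":
    doi = make_doi("10.1234", "ARCHIVE-0001")  # hypothetical prefix and identifier
    print(doi, resolver_url(doi))
```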

FsF-F1-02D-2 Persistent identifier is resolvable

See FsF-F1-01D-1

FsF-F2-01M – Metadata includes descriptive core elements (creator, title, data identifier, publisher, publication date, summary and keywords) to support data findability.

The metadata elements defined by this metric represent the absolute minimum that the archive registers for its digital objects.

Self-assessed FAIR level: 2.5 of 3

Self-assessed Score: 1.5 of 2

FsF-F2-01M-1 Metadata has been made available via common web methods

The archive does not provide its metadata in the formats specified by the metric. The primary reason is that the amount of metadata we register for our digital objects is too large to be stored in these formats. Our web pages use JavaScript to process the information. All metadata is available through our API and OAI-PMH for anyone interested in accessing it programmatically.

FsF-F2-01M-2 Core data citation metadata is available

All elements specified by the metric are present in the metadata for all our digital objects.

FsF-F2-01M-3 Core descriptive metadata is available

All elements specified by the metric are present in the metadata for all our digital objects.

FsF-F3-01M – Metadata includes the identifier of the data it describes

For our most highly curated data (levels 1 and 2), we provide identifiers (physicalInstanceReference) for the data and its contents (variables), which uniquely identify them within the archive. The landing page of each digital object provides information on access and a link either to direct download or, where applicable, to an order form.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-F3-01M-1 Metadata contains data content related information (file name, size, type)

File names are present in the metadata and are assigned to files on delivery. Specifying the file type is unnecessary, since the end user chooses the type and the archive automatically converts the data to the preferred format. Size information is stored internally only, as it is not deemed relevant to end users for the research data the archive disseminates.

FsF-F3-01M-2 Metadata contains a PID or URL which indicates the location of the downloadable data content.

Most of the archive’s research data is access-controlled, and for obvious reasons the URL to such data is not exposed. The PID of the research data is present in the metadata and is used to communicate with our data delivery service. All landing pages include a URL to our data ordering system, but this URL is not provided in the metadata.

FsF-F4-01M – Metadata is offered in such a way that it can be retrieved programmatically

All the archive’s metadata is available through the archive’s API or OAI-PMH in DDI and Dublin Core formats. Where applicable, DDI elements are chosen over Dublin Core elements, since they give a better description of the research data the archive holds. Mapping from DDI to Dublin Core is done where relevant and possible. We do not aggregate our detailed DDI descriptions into Dublin Core, since we do not see that this adds any value.
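For readers who want to harvest the Dublin Core records programmatically, a minimal Python sketch of an OAI-PMH `ListRecords` request and response parsing follows. The `ListRecords` verb, the `metadataPrefix=oai_dc` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL is a hypothetical placeholder rather than the archive’s actual endpoint, and the metadata prefix used for the DDI records is not stated here, so it is not shown. Parsing is demonstrated on an inline sample response so the sketch runs offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hypothetical endpoint; substitute the archive's published OAI-PMH base URL.
BASE_URL = "https://oai.example.org/oai"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; when paging, the resumptionToken replaces metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text: str) -> Tuple[List[Dict], Optional[str]]:
    """Extract identifiers and Dublin Core titles from one page, plus the next resumptionToken."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [title.text for title in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    # An empty resumptionToken element marks the last page of the list.
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    return records, (token or None)


# Hypothetical sample response, shaped like an OAI-PMH 2.0 ListRecords page.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:study-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example study</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(parse_records(SAMPLE_RESPONSE))
```

A harvester would fetch `list_records_url(...)`, parse the page, and repeat with the returned token until it is `None`.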

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 2 of 2

FsF-F4-01M-1 Metadata is given in a way major search engines can ingest it for their catalogues (JSON-LD, Dublin Core, RDFa)

The archive provides some Dublin Core metadata (as mentioned above). However, the archive’s focus is not on ingestion into other data catalogues but on enabling discovery through Google.

FsF-F4-01M-2 Metadata is registered in major research data registries (DataCite)

We use DataCite as our DOI provider, so our records are registered there by design.

Findable – result and comment

Self-assessed FAIR-Level: Advanced

Self-assessed Total Score: 6.5 of 7

The archive provides the ability to search and discover metadata at the variable level for most of the archive’s holdings (curation levels 1 and 2). These variable-level descriptive elements are not captured by the findability scoring at all, as the descriptive metadata elements covered by the metrics are only the archive’s minimum requirements.

Accessible

FsF-A1-01M – Metadata contains access level and access conditions of the data

Both data and metadata hold access information. Study level metadata is always accessible, but variable level metadata can be withheld for IP reasons.

Self-assessed FAIR level: 3 of 3

Self-assessed score: 1 of 1

FsF-A1-01M-1 Information about access restrictions or rights can be identified in metadata

Information is present in elements defined by the CESSDA Metadata Model, which is the recommended practice for the archive. Additionally, internal, archive-specific access information is present and available in the metadata.

FsF-A1-01M-2 Data access information is machine readable

The specified access information drives the archive’s access systems, which demonstrates in practice that it is machine-readable.

FsF-A1-01M-3 Data access information is indicated by (not machine readable) standard terms

The archive implements nine access levels in human-readable form. Information on personally identifiable data is specific to each dataset: general information is provided in the metadata, and specific information is given on request.

FsF-A1-03D – Data is accessible through a standardized communication protocol

Provided the end user, human or machine, has the correct permissions, data is accessible through HTTPS.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-A1-03D-1 Metadata includes a resolvable link to data based on standardized web communication protocols.

This is not the case, since most of the research data held by the archive is under restricted access, and exposing such links in metadata poses a security risk. Resolvable links are given after the user has been authenticated.

FsF-A1-02M – Metadata is accessible through a standardized communication protocol

Metadata is available through HTTPS.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

Accessible – result and comment

Self-assessed FAIR level: Advanced

Self-assessed Total Score: 3 of 3

All access information defined by the metrics is contained in the archive’s metadata, but not all of it is exposed, for security reasons.

Interoperable

FsF-I1-01M – Metadata is represented using a formal knowledge representation language

The metadata is exposed through OAI-PMH endpoints in DDI and Dublin Core (DC) formats, but it is not embedded in the studies’ landing pages because of size and performance concerns.

Self-assessed FAIR level: 1 of 3

Self-assessed Score: 1 of 2

FsF-I1-01M-1 Parsable, structured metadata (JSON-LD, RDFa) is embedded in the landing page XHTML/HTML code

The archive has chosen not to make this metadata available in the landing pages because of performance concerns for the end user. The raw text file containing the metadata for a record is in the range of 10 to 30 MB, depending on the number of variables. JSON can be extracted from the repository but requires special permission, since the JSON files can approach 300 MB because of how DDI-Lifecycle references are handled.

FsF-I1-01M-2 Parsable, graph data (RDF, JSON-LD) is accessible through content negotiation, typed links or sparql endpoint

Information is available through the API and OAI-PMH, but JSON output is restricted because of the amount of metadata the archive registers.

FsF-I2-01M – Metadata uses semantic resources

Controlled vocabularies and standards are used extensively in the archive’s metadata. These include vocabularies from the DDI Alliance, CESSDA and ISO, as well as archive-specific vocabularies. Information on vocabulary, version and identifier is present in the metadata.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-I2-01M-1 Vocabulary namespace URIs can be identified in the metadata

As mentioned above, they are present.

FsF-I2-01M-2 Namespaces of known semantic resources can be identified in metadata

As mentioned above, known semantic resources are implemented.

FsF-I3-01M – Metadata includes links between the data and its related entities

All records in the archive have related materials and resources registered in their metadata, either with a URL (internally maintained) or a DOI, together with an internal identifier for the object. Additionally, records are grouped into data collections where relevant. Both data files and variables are linked to the metadata through identifiers, and metadata is also provided at this level.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-I3-01M-1 Related resources are explicitly mentioned in metadata

They are, as described above.

FsF-I3-01M-2 Related resources are indicated by machine readable links or identifiers

Related resources are given internal identifiers that are resolvable within our metadata repository. Some also have external identifiers (DOIs), which are resolved externally. Where no DOI exists for a resource, the archive creates and maintains URLs for it where relevant, although these do not carry the same commitment to persistence as DOIs.

Interoperable – result and comment

Self-assessed FAIR-level: Advanced

Self-assessed Total Score: 3 of 4

The archive has actively chosen not to embed the metadata in our landing pages. The primary concerns are security and performance, given that a record can comprise roughly 50 000 lines of metadata. The large size of the metadata stems mainly from how DDI-Lifecycle is structured through references. We could opt to present fragments of our DDI, but these would not hold all relevant information: they would include references that, by design, can only be resolved within the archive’s metadata repository. To present genuinely useful metadata, we would therefore need to include all resolved references, which in turn results in a large record.

Additionally, the archive believes that no end user could make sense of this metadata without communicating with the archive. We would provide access in response to any inquiry.

Reusable

FsF-R1-01MD – Metadata specifies the content of the data

For the archive’s most highly curated data (levels 1 and 2), metadata is registered about the data file, variables, questions and code values. This documentation follows the recommendations of the CESSDA Metadata Model for describing social science data using DDI.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 3 of 4

FsF-R1-01MD-1 Minimal information about available data content is given in metadata

  • Resource type (e.g. dataset) is given in metadata

This is specified through a controlled vocabulary.

  • Information about data content (e.g. links) is given in metadata

For levels 1 and 2, all relevant information about data content is given in the metadata, down to the level of code values. We do not provide metadata for cell values.

FsF-R1-01MD-2 Verifiable data descriptors (file info, measured variables or observation types) are specified in metadata

  • File size and type information are specified in metadata

The archive holds this information internally but does not consider it relevant to expose to the end user. The archive uses DDI metadata together with data stored in the Parquet format to automatically transform files into the format specified by the end user.

  • Measured variables or observation types are specified in metadata

All variables have metadata about them in alignment with the CESSDA Metadata Model. Additionally, archive-specific metadata is implemented to enable online analysis, still aligning with the DDI-Lifecycle metadata standard where applicable.

FsF-R1-01MD-3 Data content matches file type and size specified in metadata

As the archive does not expose this information, it cannot be checked externally. However, internal systems are in place to check the validity of the digital objects held. All data files, variables and code lists are registered with internally unique identifiers.

FsF-R1-01MD-4 Data content matches measured variables or observation types specified in metadata

The archive ensures this match through the identifiers assigned to data files, variables, questions and code lists. The information presented on the landing page of a research data object is the same information used to create the data file delivered to end users.

FsF-R1.1-01M – Metadata includes license information under which data can be reused

Because the majority of the archive’s holdings are under restricted access, they are covered by a bespoke, archive-specific license that is not registered in SPDX. For open data, SPDX licenses can be chosen, but this is not mandatory for depositors.

Conditions of use are presented elsewhere and are actively enforced by the archive through a user agreement that includes conditions of use and a declaration of secrecy.

Self-assessed FAIR level: 0 of 3

Self-assessed Score: 0 of 2

FsF-R1.1-01M-1 License information is given in an appropriate metadata element

For the small number of studies that have a license within the archive, this information is present in the relevant license element.

FsF-R1.1-01M-2 Recognized license is valid (community specific or registered at SPDX)

A small number of records hold SPDX licenses, but the majority have no license recorded in their metadata.

FsF-R1.2-01M – Metadata includes provenance information about data creation or generation

All minimum requirements of the metrics are present in the metadata, but information on some modifications to the data is withheld because of disclosure risks. In the archive’s workflow, many edits are made to both data and metadata after deposit. All of these edits are communicated to and discussed with the depositor and are registered in internal systems. These edits go beyond the requirements specified in the metrics. Details of the edits can be provided on request, as long as there is no disclosure risk. Additionally, the original SIPs (Submission Information Packages) can be provided, as long as they do not contain data whose release would violate regulations.

Neither the edits nor their recording follow formal provenance ontologies such as PROV-O. Even so, the archive assesses its provenance information to be at an advanced FAIR level, as the edits can be mapped to SDTL (Structured Data Transformation Language).

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 2

FsF-R1.2-01M-1 Metadata contains elements which hold provenance information and can be mapped to PROV

Information provided for the minimum requirements can be mapped to PROV-O.

FsF-R1.2-01M-2 Metadata contains provenance information using formal provenance ontologies (PROV-O)

This is not the case for the archive.

FsF-R1.3-01M – Metadata follows a standard recommended by the target research community of the data

The archive implements the DDI-Lifecycle metadata standard for describing its research data. DDI-Lifecycle is acknowledged as the best metadata standard for describing research data from the social, behavioral and economic sciences, which are the primary disciplines of the archive’s holdings.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-R1.3-01M-1 Community specific metadata standard is detected using namespaces or schemas found in provided metadata or metadata services output

This is present: DDI-Lifecycle.

FsF-R1.3-01M-2 Community specific metadata standard is listed in the re3data record of the responsible repository

This is present.

FsF-R1.3-01M-3 Multidisciplinary but community endorsed metadata (RDA Metadata Standards Catalog, fairsharing) standard is listed in the re3data record or detected by namespace.

This is present in both places for Dublin Core (DC).

FsF-R1.3-02D – Data is available in a file format recommended by the target research community

For the highest curation levels (1 and 2), the archive automatically converts data into the formats requested by end users, whether open or proprietary. Listing these formats is deemed unnecessary, since the files are created on request. For openly downloadable data, machines can request a format through the API.

Self-assessed FAIR level: 3 of 3

Self-assessed Score: 1 of 1

FsF-R1.3-02D-1 The format of a data file given in the metadata is listed in the long term file formats, open file formats or scientific file formats controlled list

  • The format of the data file is an open format

CSV is provided, with HTML and XML for documentation.

  • The format of the data file is a long term format

CSV, XML and HTML are long-term formats.

  • The format of the data file is a scientific format

Data can be provided in formats for all statistical software packages.

Reusable – result and comment

Self-assessed FAIR Level: Moderate

Self-assessed Total Score: 6 of 10

As the archive does not have SPDX licenses for the majority of its data, the level is set to moderate. The archive would argue that this metric conflicts with the nature of most of the research data it holds: the majority of it is restricted, governed by legislation, and cannot be covered by licenses found in SPDX. Instead, a bespoke, archive-specific license is issued. A simple string in a single metadata element is not enough to convey the regulations specified by this license, so this information is held elsewhere than where the metrics specify.

Total Score

Total FAIR-Level: Advanced

Total Score: 18.5 of 24

Total percentage: 77%
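The totals above can be reproduced from the per-metric scores reported in this document. The short Python sketch below only restates those self-assessed scores (no new values) as (score, maximum) pairs and sums them per principle and overall; the maximum for FsF-F3-01M is taken as 1, consistent with the Findable total of 7.

```python
from typing import Dict, Tuple

Score = Tuple[float, int]  # (self-assessed score, maximum score)

# Per-metric self-assessed scores, as listed in this report.
SCORES: Dict[str, Dict[str, Score]] = {
    "Findable": {"FsF-F1-01D": (1, 1), "FsF-F1-02D": (1, 1), "FsF-F2-01M": (1.5, 2),
                 "FsF-F3-01M": (1, 1), "FsF-F4-01M": (2, 2)},
    "Accessible": {"FsF-A1-01M": (1, 1), "FsF-A1-03D": (1, 1), "FsF-A1-02M": (1, 1)},
    "Interoperable": {"FsF-I1-01M": (1, 2), "FsF-I2-01M": (1, 1), "FsF-I3-01M": (1, 1)},
    "Reusable": {"FsF-R1-01MD": (3, 4), "FsF-R1.1-01M": (0, 2), "FsF-R1.2-01M": (1, 2),
                 "FsF-R1.3-01M": (1, 1), "FsF-R1.3-02D": (1, 1)},
}


def totals(scores: Dict[str, Dict[str, Score]]):
    """Sum scores and maxima per principle, then overall."""
    per_principle = {
        name: (sum(s for s, _ in metrics.values()), sum(m for _, m in metrics.values()))
        for name, metrics in scores.items()
    }
    score = sum(s for s, _ in per_principle.values())
    maximum = sum(m for _, m in per_principle.values())
    return per_principle, score, maximum


if __name__ == "__main__":
    per_principle, score, maximum = totals(SCORES)
    for name, (s, m) in per_principle.items():
        print(f"{name}: {s} of {m}")
    print(f"Total: {score} of {maximum} ({round(100 * score / maximum)}%)")
```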