Practical guidance for researchers on writing data sharing plans

Professor Kay-Tee Khaw

When you apply for a CRUK grant, we ask you to fill in a data sharing plan. These are some of the issues you may wish to consider as you complete this plan, and then as you seek to implement the planned activities in the course of your research.

This guidance should be read in conjunction with our policy on data sharing, which sets out our high-level expectations with which grantholders must comply. In this guidance, “data” means quantitative or qualitative information generated in the course of research, which could take the form of datasets, code and software, images or data generated through analysis of samples or biological materials.

We ask for data to be shared to support good science and to make the most of our research investments.

We support research data management and sharing as a fundamental element of good research practice. In line with other funders in the UK and internationally, such as RCUK, Wellcome Trust and NIH, we view research data as a public good. Sharing data enhances the transparency, integrity and reproducibility of research.

As a publicly funded charity, we have an obligation to make the most of all the research we fund. This includes getting the maximum possible benefit out of the data our funded research generates, and reducing duplication and waste in research.

We are additionally motivated by the need to accelerate progress towards 3 out of 4 patients surviving cancer by 2034, as outlined in our Research Strategy. To achieve this ambitious aim, pre-existing data will need to be linked with newly generated data, within and across disciplines, to generate new insights that can benefit patients.

This position is consistent with a number of international statements of principle.

As discussed above, we see data sharing as a route to maximising patient benefit from our research funding. However, we recognise that research data management and sharing can be complex processes. A robust data sharing plan is therefore a useful first step. The plan requires you to consider how you will manage data generated in the lifetime of your grant, and how you will make data shareable with scientists throughout and beyond the grant cycle.

Writing the plan will encourage applicants to take data sharing seriously from the first stages of a grant proposal. It allows you to consider the resources you will need for data management and sharing (which you can now cost in your proposals), to anticipate any challenges with data sharing and to consider potential solutions.

Table 1. Summary of the key terms of our policy on data sharing and preservation.

| Issue | Our position |
| --- | --- |
| Data outputs covered | You should make all research data (unpublished data, negative results, code etc.) available for sharing, provided it is safe and feasible to do so. It is especially crucial to share data underlying publications, at the same time as, or very shortly after, publication. |
| Data sharing plan | Applicants have to submit a data sharing plan, which the relevant committee will assess. Progress with the plan will be subject to regular review. |
| Costs | Costs can be included in applications (see our costs guidance for details), provided these are well justified and reasonable in the context of your research. |
| Metadata | Community standards/minimum information guidelines are encouraged where available. |
| Time limits for data release | No later than acceptance for publication of the main findings from the final dataset (unless restrictions from third party agreements or IP protection apply), or on a timescale in line with the procedures of the relevant research field. |
| Modes of data sharing | Deposition in an appropriate data repository is encouraged. Alternatively, direct (and transparent) data sharing by the investigator or institution may be appropriate when justified. |
| Preservation | Data should be retained for at least five years after grant end. |
| Discoverability | Groups should be proactive in communicating the contents of their datasets and making clear the conditions of access to that data. |
| Exclusive use | Limited period of exclusive use permitted where justified at the grant outset. |
| Acknowledgement | Secondary users of data should credit the original researcher and acknowledge CRUK for funding the original research. |

You will generally need to complete a data sharing plan when applying for funding. The format of this plan depends on the committee you are applying to:

  • Applicants to the Population Research Committee must upload a formally structured template as an appendix to their FlexiGrant application¹
  • Applicants to other committees will be asked to complete a free text data sharing plan when applying through the electronic grant management system (FlexiGrant)

Depending on the committee you are applying to, we may follow up with you during the lifetime of your grant to check on progress with data sharing and adherence to your data sharing plan. Please refer to the guidelines for your specific committee for more details. 

We also understand that you may need to adapt the method and timelines for sharing discussed in your data sharing plan during the course of the study – for example, when potential intellectual property arises unexpectedly.

¹ These special requirements reflect the critical importance of sharing data from cohort and epidemiological studies.

The free text data sharing plan gives you the option to discuss the areas of data management and sharing most relevant to your proposed research.

Your answers should be concise, precise and realistic. The detail should be proportionate to the complexity of the study, the types of data being managed, the anticipated long-term value of the data, and the anticipated data security requirements.

You can consider structuring your plan by asking yourself the seven questions below, as a basis for the points/paragraphs in your plan. You may not find all the questions are relevant for your planned research, and these questions are only intended as suggestions. You can also consult the subject-specific guidance for discovery, clinical and population research.

1)    What types of data from your project are shareable? Which are likely to be particularly useful for the scientific community?

  • Sub-divide your data if possible, and briefly summarise whether the type of data can be easily shared (and if not, why not). Examples of different data types may include: imaging data; genotypic data; clinical measurements; and data generated from surveys.
  • Highlight elements of your project data which are likely to have long-term value to the scientific community and are therefore most likely to be the focus of your data preservation efforts.

2)    Can you foresee any likely restrictions on data sharing, linked for example to intellectual property rights (IPR) or patient confidentiality? Are there ways to limit the impact of these restrictions?

  • If IPR is a serious concern, you may be able to regulate access to the data by: discussing options with a repository; justifying a longer time period for your team's exclusive use of the data; or material transfer agreements (MTAs) or data access agreements with clear terms which protect intellectual property.
  • If designing consent procedures for your study, you should think about the potential for future use of the study data. Therefore, where appropriate and feasible, in the consent form you should consider discussing secondary use of the data by other researchers.

3)    How will you store your data during the lifetime of the project and preserve it in the longer term for future reuse by other researchers?

  • Your answer could reference a number of methods for data storage, including: on your department's server; in an institutional repository; a discipline-specific repository which accepts specific types of structured data; or generalist repositories which are domain-agnostic. We would encourage you to store the data in a way which makes sharing of the data further down the line as easy as possible.
  • Consult the lists of repositories on our website (categorised into repositories for discovery research, clinical research, and population research). We would encourage the use of these specific disciplinary repositories if they exist; if not then we would encourage exploration of generalist repositories such as Figshare and Zenodo if appropriate, or repositories maintained by your institution.
  • We recognise that there is currently a shortage of third-party repositories for hosting academic clinical trial data, and that a lot of data sharing exists between research groups in this area via collaboration agreements. Clinical researchers should still store and document their data in formats which are conducive to sharing with bona fide researchers, and be as transparent as possible about procedures for data access and sharing.
  • You may also wish to consult BioSharing, which maintains a register of standards and databases in the life sciences. This allows you to search for any particularly specific databases which may not be included in the lists later in this document.

4)    How will you document, annotate and describe your data such that another research group could meaningfully interpret and use it? Are there formal metadata or other community-agreed standards you can use?

  • Where agreed community best-practice or formal standards for metadata provision are in place, you should adopt them to make the data usable (e.g. the Minimum Information About a Microarray Experiment guidelines for microarray data, or the CDISC standards for clinical research). Please provide links to these if available.
  • There are a number of resources which can provide guidance in developing high-quality metadata. Many of these are community specific, and can be found on the website of the relevant community repository. For example, the metabolomics repository MetaboLights only accepts submissions in a specific format that encourages standardised reporting and management of metabolomics metadata.
  • Several key metadata resources and standards for biological sciences are compiled on the website of the Digital Curation Centre. The Digital Curation Centre also maintains a list of more generic, domain-agnostic metadata standards.
  • BioSharing, the searchable portal of repositories, also allows users to search for metadata standards and reporting guidelines. 
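
As a rough illustration of the kind of descriptive record that question 4 asks about, the sketch below builds a small machine-readable metadata record and checks it for gaps. The field names, the example values and the `missing_fields` helper are assumptions made for this example only. They are not a CRUK requirement or a community standard, and where a relevant community standard exists you should follow that instead.

```python
import json

# Hypothetical minimal set of descriptive fields (an assumption for this
# sketch, not an official or community-agreed standard).
REQUIRED_FIELDS = (
    "title",
    "description",
    "data_type",
    "file_format",
    "contact",
    "access_conditions",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative values only; none of these describe a real dataset.
example_record = {
    "title": "Example imaging dataset",
    "description": "Anonymised scans collected for a hypothetical study.",
    "data_type": "imaging data",
    "file_format": "DICOM",
    "contact": "pi@example.org",
    "access_conditions": "",  # left blank on purpose to show the check
}

if __name__ == "__main__":
    print(json.dumps(example_record, indent=2))
    print("Fields still to complete:", missing_fields(example_record))
```

Keeping a record like this alongside the data from the start of a project can make later deposition in a repository easier, because most of the descriptive information has already been captured.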

5)    How will you make sure other research groups outside your consortium can discover the contents of the datasets generated through your research?

  • Making your data discoverable to other researchers is a crucial first step in data sharing. You should indicate in your plan whether you will use registries, repositories, indexes, word of mouth or publications to publicise the availability and accessibility of your data.
  • Consider how you can cross-reference and provide links to your dataset(s) – in relevant journal publications for example – to ensure data can be readily found.
  • Data discoverability is a particularly important consideration when you are not making your data available through third party repositories. If you as PI have to oversee data access, you must make clear - for example on your study website - what data are available for sharing and whom prospective secondary users of the data should contact to discuss data access. One example of this approach is the data sharing page of the CRUK-UCL Cancer Trials Centre. You can also consider publishing metadata about your dataset via a repository to ensure discoverability of your dataset (even if the data itself cannot be shared via a repository).
  • If your datasets are available through third-party discipline-specific or generalist repositories, you can consider other approaches to simultaneously improve the discoverability of your data and receive academic credit for doing so. For example:
    • You can deposit your data in repositories that provide datasets with unique, persistent Digital Object Identifiers (DOIs) using DataCite.
    • You can consider writing a ‘data descriptor’, as published by journals like Nature: Scientific Data. These are peer-reviewed, open-access articles that describe scientifically valuable datasets, thereby promoting data sharing and reuse, and allowing authors to receive credit for their data outputs. One example of a data descriptor in cancer imaging can be found here.

6)    Under what conditions will other researchers be able to access and re-use your data? For example, will your data be openly accessible, or will the PI make the decision to grant or refuse access - and on what grounds?

  • This part of your plan will depend on the chosen modes for storing and sharing data. You should make clear whether secondary users would be able to openly access data, or whether access would be controlled through an assessment process (e.g. requiring data sharing agreements and/or project proposals, or approval by a data access committee).
  • You can also explain the logistics of sharing the data (e.g. via secure file transfer for sensitive data), and describe the conditions placed on secondary users of the data by any data transfer agreements.

7)    Can you define any yearly milestones for data sharing? Are you planning for a period to enjoy exclusive use of your study data?

  • If you already have a sense of your publication timeframes, your plan can also discuss the likely milestones for sharing significant datasets. Ideally, we would expect datasets which underpin findings in journal publications to be released simultaneously or shortly afterwards. If you plan to share significant chunks of raw data from your study, we understand that this may take longer to prepare for sharing.
  • Regarding the period of time during which you would enjoy exclusive use of the data, our policy accepts that a limited period of exclusive use of data for primary research is reasonable, according to the nature and value of the data and the way they are generated and used. This period should be clearly defined and justified in your plan.
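
When setting milestones, it can also help to work out the earliest date until which data need to be retained under the preservation term in Table 1 (at least five years after grant end). The short sketch below does that arithmetic. The function name and the example grant end date are hypothetical, and the handling of a 29 February grant end date is an assumption of this sketch rather than anything stated in the policy.

```python
from datetime import date

RETENTION_YEARS = 5  # minimum retention period stated in Table 1


def earliest_retention_end(grant_end, years=RETENTION_YEARS):
    """Return the date `years` after grant_end.

    A grant ending on 29 February is mapped to 28 February when the
    target year is not a leap year (an assumption of this sketch).
    """
    try:
        return grant_end.replace(year=grant_end.year + years)
    except ValueError:
        # Only reached when grant_end is 29 February and the target
        # year has no 29 February.
        return grant_end.replace(year=grant_end.year + years, day=28)


if __name__ == "__main__":
    hypothetical_grant_end = date(2020, 3, 31)  # example value only
    print(earliest_retention_end(hypothetical_grant_end))
```

This gives a minimum only. Data with long-term value to the scientific community may warrant a longer preservation period.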

Generic guidance on data sharing

Guidance on using data repositories

Please also consult our documents on repositories pertaining to:

Resources for discovering data

Contact us

We would be grateful for any comments and feedback to help improve this guidance.

researchdata@cancer.org.uk
