Handling Peer Review
When, Where, and How to Share Data
Dr. Clark Holdsworth, Senior Manager Communications & Partnerships
The treatment of data within the scientific community has not remained static in what we might consider the modern era of scientific discourse. Concepts such as reproducibility have been challenged, whether by ethical considerations or by the sheer growth in the volume of the literature. Data itself has fundamentally changed in scope for the same reasons, as data collection methods have reached dizzying levels of throughput. Further, advances in technology and statistical methods now allow new analyses of existing data.
When – The purpose of sharing
I’ll discuss the purpose of data sharing from two perspectives. The first relates to the rigor of the research community. Investment in science has grown steadily since the 1940s, and that investment demands greater accountability. As science shifted away from patron-funded intellectual circles toward an industrial-academic complex supported by governments and multinational organizations, there have been long periods of effort focused on holding the community accountable in order to combat fraud and low-quality research. In the modern era, data sharing is chief among these efforts. The community and society at large came to share the view that data, as a de novo output of research, requires transparency, archiving, and, should the need arise, verification.
The second perspective on the purpose of data sharing relates to the efficacy of the community. This is properly framed as the opportunity cost of failing to share data. With advances in technology and methods, it has become apparent that data is a commodity: the scientific findings and conclusions drawn from the data, while valuable, cannot replace the raw data, which may have further, or future, utility. Arguably, the part of the scientific process most improved by technology has been throughput. Order-of-magnitude increases in throughput mean that old data becomes part of a collective pool that can contribute to new, independent studies, findings, and conclusions. The community’s view, therefore, is that all researchers should share data as a matter of accountability, so that potential findings and conclusions are not lost.
The message here should be clear: Data sharing should be a constant and routine consideration for all research.
How – Preparing shareable data
For the reasons outlined above, data sharing has become an obligation within the community. However, as with any obligation, meeting it takes resources. In the case of data sharing, the main resource is the time needed for preparation.
Preparation of shareable data involves three key components:
Clear, consistent documentation is required for the data to be processed and used in future research.
Data must be stored in a reliable database on a reliable platform, so that it remains accessible long after it was collected.
Metadata is crucial for reusing data in new research, because it is what makes the data discoverable. Without discovery, data sharing is merely theoretical, since no one can find the data in order to use it.
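To make the documentation and metadata points more concrete, here is a minimal Python sketch of a machine-readable metadata record bundled with per-file checksums. It assumes fields loosely modeled on the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type). The snake_case keys, the helper names `sha256_of`, `build_metadata`, and `write_metadata`, and any placeholder values are hypothetical and do not represent any repository’s actual submission format. The checksums let a future user confirm that the files they download are the files you deposited.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical keys, loosely modeled on the DataCite mandatory properties;
# a real repository will define its own field names and requirements.
REQUIRED_FIELDS = ("identifier", "creators", "title", "publisher",
                   "publication_year", "resource_type")


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(fields: dict, data_files: list[Path]) -> dict:
    """Check that required fields are present, then attach a file manifest."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    record = dict(fields)
    # One manifest entry per data file: name, size in bytes, and checksum.
    record["files"] = [
        {"name": p.name, "bytes": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(data_files)
    ]
    return record


def write_metadata(record: dict, out_path: Path) -> None:
    """Save the record as human-readable JSON alongside the data."""
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
```

In practice, the record would be saved as something like `metadata.json` next to the data files, with the documentation (methods, units, variable definitions) kept in a companion file. The repository or funder you deposit with will ultimately dictate which fields are required.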
The exact protocol you follow will depend on the stage at which you choose to share your data and where it will be preserved. Many options exist, and they can be combined for redundancy, depending on whether you tie the data to your manuscript or treat it independently. Below, we outline three avenues for data sharing.
Where – Accomplishing data sharing as a routine practice
How you share data will be determined, to varying degrees, by your funder, your publisher, and your choice among independent platforms.
Your funder may require data sharing as a matter of policy. In that case, the “where” is determined first by the funder’s specific requirements. In some instances, you may submit data as part of your grant management process, using a proprietary or affiliated tool of the funder. Alternatively, it may mean sharing the data through your manuscript publication process (the second avenue, described below) in a way that meets the funder’s requirements.
Data can be shared as part of the manuscript publishing process if the venue you choose offers a pathway for data sharing. This is relatively common, although availability currently varies by field. You can use this route at your own discretion, as a personal choice to share data, or as a qualifying way to meet your funder’s data-sharing requirements.
Finally, there are independent data repositories, not tied to your funder or publisher, that allow you to share data at your discretion or as a suitable way to meet your funder’s requirements. These repositories are often younger and less established than those of major funders and publishers, so choose carefully to ensure your data remains permanently available.