Raw diffraction data preservation and reuse: overview, update on practicalities and metadata requirements

Loes M. J. Kroon-Batenburg, John R. Helliwell, Brian McMahon, Thomas C. Terwilliger
  • IUCrJ, January 2017, International Union of Crystallography
  • DOI: 10.1107/s2052252516018315

What is it about?

The importance of retaining and making available the experimental data that support scientific results is becoming increasingly evident to policy makers, funders and scientists themselves. Crystallography has always had a tradition of data sharing, but historically this has taken the form of processed experimental data and the atomic and molecular structures derived from those processed data. It is now much easier to collect and store the complete set of underlying, i.e. primary or ‘raw’, data collected in a crystallography experiment. There are, however, some difficulties in archiving the raw experimental data for later reuse, which amount to a ‘Big Data’ level of challenge; there are also important scientific benefits to be gained by preserving these raw data, which we discuss in this article. The most obvious difficulty is the large size of raw data sets, but even more important is the need to ensure that the raw data can subsequently be interpreted and re-evaluated. This requires a detailed understanding of all the relevant experimental parameters, known as the metadata associated with a raw data set. The International Union of Crystallography (IUCr) has taken a keen interest in specifying what metadata are essential for understanding and reusing data, including the raw data from crystallographic experiments. This article is an outcome of a workshop sponsored by the IUCr and held in Croatia in 2015 to engage the community in agreeing these essential metadata and in collecting them systematically. The workshop has been well regarded by the community, as monitored by downloads of the talks from the workshop website (http://www.iucr.org/resources/data/dddwg/rovinj-workshop). The article expands and extends the coverage of the topic.

Why is it important?

In our article we describe, for example, funding agencies' policies and initiatives on research data management in Asia, Europe and the USA, so we hope that those agencies' funding managers and committees will be interested and regard our article as important. Moreover, 2014 was the International Year of Crystallography, sponsored by the United Nations, which certainly brought crystallography to the fore. We also hope that lay readers, whose taxes pay for all of this scientific research, will at least find the general aspects of our article accessible and regard it as an informative example, a benchmark against which other scientific fields can be measured. Since we describe ‘Big Data’ challenges in crystallography, we hope that school children, who may well be set homework or projects on ‘Big Data’, could select crystallography as the science research field they describe. Finally, the politicians who approve the funding, a share of which we hope to receive, may be attracted to read our article as well, not least by starting with this Kudos general summary!

Perspectives

Professor John Richard Helliwell
University of Manchester

I have had a long-standing interest in the importance of raw data in my field; for example, I presented a lecture on aspects of this for methods improvements within the community at the World Congress of Crystallography in 1981 in Ottawa (see https://zenodo.org/record/166325). It has also been a great pleasure to work with my co-authors on researching and writing on this topic in our article. We have also undertaken much more besides as colleagues in the IUCr’s Diffraction Data Deposition Working Group; our most recent report can be found here: http://forums.iucr.org/viewtopic.php?f=21&t=347

Brian McMahon
IUCr

I believe that this article demonstrates the IUCr's commitment to best scientific practice in several ways. The IUCr set up the Diffraction Data Deposition Working Group (DDDWG) to analyse carefully the rationale behind the potentially expensive trend towards large-scale raw data archiving. The DDDWG has attracted experts on a wide range of pertinent topics to its workshops, the full proceedings of which are available through its website. The continuing CIF project to characterize data (and associated metadata) provides a robust software framework, and interactions between the CIF developers and other data format designers ensure that this work has benefits beyond the strict confines of crystallography.

The following have contributed to this page: Professor John Richard Helliwell and Brian McMahon