At-Risk Data Commons Background

In our increasingly interdependent and connected world, sound decision-making at all levels, from individual households to nations and beyond, requires factual information in the form of reliable, easy-to-use data.  Yet even the most well-resourced repositories have difficulty keeping their holdings available in usable forms.  For example, while teaching at UIUC, Ms. Duerr routinely asked agencies such as NASA, NOAA, and the Geological Survey of Alabama to provide datasets needing help so that her students could get hands-on data curation experience.  She never had a shortage of requests.  NASA’s Space Science Data Center, for instance, provided a list of more than 600 datasets from its earliest rocket and satellite missions.  These datasets were in formats that predated ASCII.  NASA needed them reformatted into today’s CDF format and needed metadata created from PDFs of the original documentation.  While her students were able to process a few of these datasets to NASA’s satisfaction, more than 600 datasets remain to be processed.  The situation at other data centers is similar, and the horror stories from researchers about obtaining data from their colleagues are even more compelling.

Compounding this, the last US presidential election left many citizens, companies, and researchers concerned about the availability of the data they need to perform their work.  This led to the establishment of a number of data rescue events.  However well-meaning, these events were often not informed by data management expertise.  In response, the ESIP community published the white paper “Stronger Together: The Case for Cross-sector Collaboration in Identifying and Preserving At-risk Data”, which clearly spells out the issues and suggests that the various communities need to work together.  That paper proposed no mechanisms for actually doing so, in contrast to the mechanisms proposed here.

The At-Risk Data Commons and the associated prototype Data Nomination Tool had their genesis at a workshop sponsored by an IMLS grant awarded to the Sheridan Libraries at Johns Hopkins University.  The workshop brought together participants who had individually explored data archive issues and solutions to examine current issues and challenges with data archiving and data “rescue” activities.  One of the participants mentioned that a tool that matched data needs with those who could help would be useful in resolving some of these data problems.  Thus the At-Risk Data Commons was born.

Considerable expertise exists in our very young organization, and a significant amount of activity has already taken place:

  1. The prototype Data Nomination Tool was developed by Cloud BIRST.
  2. We are considering becoming an ESIP cluster and a Gateway Projects participant.
  3. We have attended a Science Gateways Bootcamp to begin developing a business and sustainability plan, and have submitted a Demo and Poster proposal to the Gateways 2019 meeting.
  4. Demonstrations/presentations are planned for:
    • 2019 RDA Spring Plenary,
    • 2019 IUGG,
    • ESIP Summer Meeting 2019,
    • USGS CDI 2019 Meeting,
    • 2019 EarthCube Community Meeting,
    • 2019 RDA Fall meeting.