Data, Data Everywhere

A recent report from the Council on Library and Information Resources (CLIR), entitled “The Problem of Data”, delves into the increasingly crucial issue of data management and its associated challenges. The report opens with this quote: “Every day, we create 2.5 quintillion bytes of data—so much that 90% of the data in the world today has been created in the last two years alone.” (IBM, Bringing Big Data to the Enterprise)

I was unfamiliar with the word quintillion, so I had to look it up. Apparently it comes after a quadrillion. That is a lot of data being generated daily! On a side note, this is what a quintillion pennies would look like.

CLIR and the Digital Library Federation were commissioned by the Alfred P. Sloan Foundation to complete a study of data curation practices among scholars at five institutions of higher education: Penn State, Lehigh University, Bucknell University, Johns Hopkins, and the University of Pennsylvania. They conducted ethnographic interviews with faculty, postdoctoral fellows, graduate students, and other researchers in a variety of social sciences disciplines. The goals of the study were to identify barriers to data curation, to recognize unmet researcher needs within the university environment, and to gain a holistic understanding of the workflows involved in the creation, management, and preservation of research data (pg. 5).

Some key findings included:

-None of the researchers interviewed for the study had received formal training in data management practices, nor did they express satisfaction with their level of expertise. Researchers are learning on the job in an ad hoc fashion.

-Few researchers, especially among those who are early in their career, think about long-term preservation of their data.

-The demands of publication output overwhelm long-term considerations of data curation. Metadata and documentation are of interest only if they help a researcher complete his or her work.

-There is a great need for more effective collaboration tools, as well as online spaces that support the volume of data generated and provide appropriate privacy and access controls.

-Few researchers are aware of the data services that the library might be able to provide and seem to regard the library as a dispensary of goods (e.g., books, articles) rather than a locus for real-time research/professional support.

It seems to me that there need to be some fundamental changes made in education to prepare researchers for the challenges of dealing with all of this data. Alternatively, there could be a push for more collaborative efforts in data management. Several participants in the study expressed a desire for help with their data management and would welcome the opportunity to collaborate on some level with a colleague with greater expertise (pg. 15). The authors end their report with recommendations for the future, as well as for the role that libraries can play in shaping that future. “There is a clear need for libraries to move beyond passively providing technology to embrace the changes in scholarly production that emerging technologies have brought. Few researchers see the library as a partner, and most of the researchers in this study seemed to regard the library as a dispensary of goods (i.e., books, articles) rather than a locus for badly needed, real-time professional support.” (pg. 16).

As we move forward with the “Problem of Data”, it is important to remember that we cannot expect researchers to know that we are here to help them with their data management issues. Librarians have long been known for their outreach efforts, and it’s time to redouble those efforts, not only for the benefit of researchers but also to remain a relevant and integral part of the great data machine.


One thought on “Data, Data Everywhere”

  1. “It seems to me that there need to be some fundamental changes made in education to prepare researchers for the challenges of dealing with all of this data.”

    Very true. I wrote a couple of posts over at the e-Science Community Blog about that. What is the best way to do it? As a standalone course, or integrated within the curriculum of science and engineering courses? I think the latter would be better, but not as feasible. The former might be easier to get going. Then there’s the question of how to get students to attend the courses. Course credit? A certificate? There are still many questions to answer.

