There just so happens to be a new resource for librarians and researchers alike to assist in the process of curating data. Databib, a joint venture of the Institute of Museum and Library Services (IMLS) and the Purdue and Penn State University Libraries, provides useful information about data repositories: the title and URL of a repository; who maintains it; its access, deposit, and reuse policies; a concise abstract; annotations from other users; and Library of Congress Subject Headings, which link to other repositories in the same subject area. It is still in its infancy and is by no means comprehensive, but it has great potential to grow and is already quite usable! Just don't confuse it with this sort of data bib.

From their website:

Databib is a tool for helping people identify and locate online repositories of research data.

Users and bibliographers create and curate records that describe data repositories that users can search.

  • What repositories are appropriate for a researcher to submit his or her data to?
  • How do users find appropriate data repositories and discover datasets that meet their needs?
  • How can librarians help patrons locate and integrate data into their research or learning?

Databib attempts to address these needs for the research community, including:

  • data users
  • data producers
  • publishers and professional societies
  • librarians
  • research funding agencies

Data Management De-Mystified

Data management is obviously a huge buzzword in the scientific and library communities right now, but what does it actually mean, and how does an individual or organization go about implementing a data management plan (DMP)? A lot of current literature bats the term around, but it is often discussed in vague terms. Even the funding agencies that require DMPs as a condition of their financial support are not clear about their expectations and often give very little practical help with implementing a DMP. A recent study by librarians from Cornell and graduate students from Syracuse looked at 22 different policies from 10 of the major funders of research in the United States.

They found that, “these policies reveal gaps between data management goals and implementation realities, as policy requirements were vague. Many funders had policies stating data be made accessible yet did not supply implementation details. Funding for data preservation efforts continues to be an area of concern for both PIs and information professionals, yet we found that detailed language about funding for data management activities was only included in eight of the policies we studied.”

Expecting researchers to sift through the myriad of literature on DMPs just to satisfy vague requirements seems like a waste of their precious time. This is yet another area where librarians can offer their services to help.

“Understanding the data requirements researchers face with respect to data management is key: knowing which requirements are vague or under-supported reveals opportunities for outreach, education, and participation with researchers. As we noted, many of the data policies are more likely to be general rather than specific, and so information professionals have the opportunity to become involved, especially in these specific data management areas. Providing guidance in the selection of appropriate standards is one avenue for involvement; other avenues may include providing documentation on evaluation criteria for metadata and data standards or participating in the development of discipline-specific metadata and data standards.”

Dietrich, D., Adamus, T., Miner, A., & Steinhart, G. (2012). De-mystifying the data management requirements of research funders. Issues in Science and Technology Librarianship, 70.

Retrieved from


data: noun plural but singular or plural in construction, often attributive \ˈdā-tə, ˈda- also ˈdä-\

1: factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation

2: information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful

3: information in numerical form that can be digitally transmitted or processed


management: noun \ˈma-nij-mənt\

1: the act or art of managing : the conducting or supervising of something (as a business)

2: judicious use of means to accomplish an end

3: the collective body of those who manage or direct an enterprise


Data, Data Everywhere

A recent report from the Council on Library and Information Resources (CLIR), entitled "The Problem of Data", delves into the increasingly crucial issue of data management and the challenges that come with it. The report opens with this quote: "Every day, we create 2.5 quintillion bytes of data—so much that 90% of the data in the world today has been created in the last two years alone." (IBM, Bringing Big Data to the Enterprise)

I was unfamiliar with the word quintillion, so I had to look it up. It is the number that comes after a quadrillion: a 1 followed by 18 zeros (10^18) in American usage. That is a lot of data being generated daily! On a side note, this is what a quintillion pennies would look like.

CLIR and the Digital Library Federation were commissioned by the Alfred P. Sloan Foundation to complete a study of data curation practices among scholars at five institutions of higher education: Penn State, Lehigh University, Bucknell University, Johns Hopkins University, and the University of Pennsylvania. They conducted ethnographic interviews with faculty, postdoctoral fellows, graduate students, and other researchers in a variety of social science disciplines. The goals of the study were to identify barriers to data curation, to recognize unmet researcher needs within the university environment, and to gain a holistic understanding of the workflows involved in the creation, management, and preservation of research data (pg. 5).

Some key findings included:

- None of the researchers interviewed for the study had received formal training in data management practices, nor did they express satisfaction with their level of expertise. Researchers are learning on the job in an ad hoc fashion.

- Few researchers, especially among those who are early in their career, think about long-term preservation of their data.

- The demands of publication output overwhelm long-term considerations of data curation. Metadata and documentation are of interest only if they help a researcher complete his or her work.

- There is a great need for more effective collaboration tools, as well as online spaces that support the volume of data generated and provide appropriate privacy and access controls.

- Few researchers are aware of the data services that the library might be able to provide and seem to regard the library as a dispensary of goods (e.g., books, articles) rather than a locus for real-time research/professional support.

To me, this suggests that some fundamental changes are needed in education to prepare researchers for the challenges of dealing with all of this data. Alternatively, there could be a push for more collaborative efforts in data management. Several participants in the study expressed a desire for help with their data management and said they would welcome the opportunity to collaborate with a colleague who has greater expertise (pg. 15). The authors end their report with recommendations for the future, as well as the role that libraries can play in shaping it: "There is a clear need for libraries to move beyond passively providing technology to embrace the changes in scholarly production that emerging technologies have brought. Few researchers see the library as a partner, and most of the researchers in this study seemed to regard the library as a dispensary of goods (i.e., books, articles) rather than a locus for badly needed, real-time professional support." (pg. 16).

As we move forward with "the Problem of Data", it is important to remember that we cannot expect researchers to know that we are here to help them with their data management issues. Librarians have long been known for their outreach, and it's time to redouble those efforts, not only for the benefit of researchers but also to remain a relevant and integral part of the great data machine.