

By Kristin Vanderbilt (EDI) & Colin Smith (EDI)

Scientists, students, and information management professionals often find themselves in need of code or software to help clean, process, document, and manage data. The Information Management Code Registry (IMCR) aims to make it easier to find existing tools by guiding users to discover software that performs the functions they require. The IMCR has scoured code repositories and located 187 open source programs, typically written in R or Python, to include in the registry.

The IMCR is implemented in OntoSoft, a framework for creating software catalogs developed by the NSF EarthCube project (Gil et al. 2015). OntoSoft is intended to describe software written by scientists themselves, and it allows users to specify detailed metadata about catalog entries to improve software discovery and reuse. Metadata are divided into six categories, and the OntoSoft display for a given software entry includes a pie chart showing how much of the metadata in each category has been completed (Figure 1). The greener the pie piece, the more metadata have been entered for that category.

 


Figure 1. The IMCR entry for the R package ecocomDP in OntoSoft. The pie chart on the right side indicates that metadata have been provided to help a user 1) identify software, 2) understand and assess software, 3) execute software, 4) get support for the software, 5) do research with the software, and 6) update the software.
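To make the idea of per-category completeness concrete, here is a minimal Python sketch of how such a score could be computed. It is not OntoSoft's implementation: the category names paraphrase the six categories in Figure 1, while the field names, the example entry, and the completeness function are all hypothetical.

```python
# Hypothetical field names grouped under the six categories from Figure 1.
# OntoSoft's actual metadata schema is not reproduced here.
CATEGORY_FIELDS = {
    "identify": ["name", "description", "keywords"],
    "understand": ["documentation_url", "language"],
    "execute": ["install_instructions", "dependencies"],
    "get support": ["contact_email"],
    "do research": ["citation", "license"],
    "update": ["repository_url", "contribution_guide"],
}


def completeness(entry):
    """Return the fraction of non-empty fields in each category."""
    scores = {}
    for category, fields in CATEGORY_FIELDS.items():
        # A field counts as filled only if it is present and not empty.
        filled = sum(1 for field in fields if entry.get(field))
        scores[category] = filled / len(fields)
    return scores


# A made-up registry entry, used only to exercise the function.
example_entry = {
    "name": "example-tool",
    "description": "Hypothetical data-cleaning helper",
    "keywords": ["quality control"],
    "language": "Python",
    "license": "MIT",
    "repository_url": "",  # present but empty, so it does not count
}

for category, score in completeness(example_entry).items():
    print(f"{category}: {score:.0%}")
```

A display like the one in Figure 1 could then shade each pie piece according to its score, with a fully documented category drawn in the deepest green.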

The IMCR has developed a controlled vocabulary of terms for describing software, often according to its intended use. Indexing the registry on these terms greatly improves discoverability. The vocabulary is maintained in TemaTres, an open source vocabulary server (Gonzales-Aguilar et al. 2012) (Figure 2). The terms in the IMCR Thesaurus are aligned with selected terms from other vocabularies, including the Software Ontology (SWO) (Malone et al. 2014), the Global Change Master Directory (GCMD) (Lief et al. 2005), and the LTER Controlled Vocabulary (Porter 2010).


Figure 2. The TemaTres interface to the IMCR Thesaurus showing the highest level terms, within which other terms are nested. Users are welcome to browse this resource and suggest additional terms.
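One common use of nested terms is to expand a broad term into all of the narrower terms beneath it, so that an index lookup on the broad term also finds software tagged more specifically. The Python sketch below illustrates that idea only. It is not TemaTres code, and the NARROWER mapping and every term in it are hypothetical rather than taken from the IMCR Thesaurus.

```python
# Hypothetical broader -> narrower relationships, for illustration only.
NARROWER = {
    "data processing": ["quality control", "format conversion"],
    "quality control": ["outlier detection"],
}


def expand(term):
    """Return the term followed by every term nested beneath it (depth-first)."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(expand(child))
    return terms


print(expand("data processing"))
```

The sketch assumes the hierarchy has no cycles, which holds when each term is nested under a single broader term.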

Users can search within the OntoSoft interface for appropriate software using several facets, including keywords, author, license, and language (Figure 3). Each registry entry links to the software in the repository where it resides (e.g., GitHub, CRAN).

Figure 3. Six registry entries are returned by a search for Keyword = “quality control” and language = “Python”.
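A faceted search of this kind can be thought of as applying every supplied filter at once and keeping only the entries that pass all of them. The following Python sketch shows that logic under stated assumptions: the REGISTRY entries, their field names, and the search function are hypothetical and do not reflect OntoSoft's data model or API.

```python
# Made-up registry entries; names, authors, and URLs are placeholders.
REGISTRY = [
    {"name": "tool-a", "keywords": ["quality control"], "authors": ["A. Author"],
     "license": "MIT", "language": "Python", "url": "https://example.org/tool-a"},
    {"name": "tool-b", "keywords": ["quality control"], "authors": ["B. Author"],
     "license": "GPL-3.0", "language": "R", "url": "https://example.org/tool-b"},
    {"name": "tool-c", "keywords": ["metadata"], "authors": ["A. Author"],
     "license": "MIT", "language": "Python", "url": "https://example.org/tool-c"},
]


def search(entries, keyword=None, author=None, license=None, language=None):
    """Return entries matching every facet that was supplied (case-insensitive)."""
    matches = []
    for entry in entries:
        if keyword and keyword.lower() not in [k.lower() for k in entry["keywords"]]:
            continue
        if author and author.lower() not in [a.lower() for a in entry["authors"]]:
            continue
        if license and entry["license"].lower() != license.lower():
            continue
        if language and entry["language"].lower() != language.lower():
            continue
        matches.append(entry)
    return matches


# Mirrors the facets used in Figure 3, applied to the placeholder data above.
for hit in search(REGISTRY, keyword="quality control", language="Python"):
    print(hit["name"], hit["url"])
```

Facets left as None are ignored, so the same function covers a search on a single facet or on any combination of them.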

The IMCR, which has evolved gradually over the last few years, is also a platform for building community around the principle of sharing open source software to support information management. Earth Science Information Partners (ESIP) supports this project by hosting the IMCR Cluster, whose members provide valuable input toward the development of this resource. Interested in getting involved? Please see the contact information on the Cluster webpage.

The IMCR is a work in progress. Comments on this beta version would be much appreciated.

References

Gil, Y., Ratnakar, V., Garijo, D. 2015. OntoSoft: Capturing Scientific Software Metadata. In: K-CAP 2015: Proceedings of the 8th International Conference on Knowledge Capture. October 2015. Article No. 32, pp. 1-4. https://doi.org/10.1145/2815833.2816955

Gonzales-Aguilar, A., Ramírez-Posada, M., Ferreyra, D. 2012. TemaTres: servidor de vocabularios controlados para gestión de tesauros. El profesional de la información 21: 319-325. http://dx.doi.org/10.3145/epi.2012.may.14

Lief, C., Olsen, L., Major, G.R. 2005. Global observing systems datasets in the Global Change Master Directory. ESRI. https://proceedings.esri.com/library/userconf/proc05/papers/pap2070.pdf

Malone, J., Brown, A., Lister, A.L. et al. 2014. The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation. Journal of Biomedical Semantics 5: 25. https://doi.org/10.1186/2041-1480-5-25

Porter, J. H. 2010. A Controlled Vocabulary for LTER Datasets. LTER Databits, Spring 2010.