This article is part of DataBits, stories about data management, techniques, and tools. DataBits is curated by the LTER Information Managers. For more information or to contribute a DataBits article, reach out to the Network Office or Marina Frantz, current editor of DataBits.

by: Nick Lyon – LTER Network Office Data Scientist
Angel Chen – LTER Network Office Data Scientist

The LTER Network Office is proud to announce a new R package: ltertools!

One of R’s main strengths is that any user can turn their code into functions and then bundle these tools together into a package for use by others. However, developing the behind-the-scenes architecture and tests required for a full package is an insurmountable hurdle for many scientists. This means that many useful functions are never shared across the community.

We developed ltertools to allow members of the LTER community to contribute their functions to a shareable, open source R package without needing to take all those background steps themselves.

The purpose of ltertools is simply to collect ‘functions developed by the LTER community,’ so we are not concerned with functions fitting a particular theme (at least not at this point). We welcome functions that are broadly useful to coders everywhere, need to be shared easily across a team, make a particularly thorny task easier, and more. 

The current primary functions of ltertools are those for streamlining data harmonization. Extensive harmonization is often required for synthesis science, and we’ve had the chance to develop one method for harmonization based on our collaborations with LTER Synthesis Working Groups.

Harmonization via column key

The current suite of functions facilitates a “column key” approach to data harmonization. This involves creating a “column key” file that lists the column names in each data file and specifies what each name should become so that the files can be combined. This method has been hugely popular with working groups because the column key can be created and edited collaboratively in platforms like Google Sheets and then easily integrated into a reproducible scripted workflow. The fundamental function here is called ‘harmonize’, and a helper function called ‘begin_key’ may also be of value because it aims to avoid manual copy/pasting of the original column names.
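To make the column key idea concrete, here is a minimal base R sketch of the general approach. It does not call ltertools and is not the package's implementation. The toy data, the key column names (`source`, `raw_name`, `tidy_name`), and the helper `rename_with_key` are all assumptions made for illustration.

```r
# Two toy tables that record the same variables under different column names
# (file names and values are made up for illustration)
raw <- list(
  "site_a.csv" = data.frame(Site = c("A1", "A2"), temp_c = c(12.5, 14.1),
                            stringsAsFactors = FALSE),
  "site_b.csv" = data.frame(plot_id = c("B1", "B2"), Temperature = c(9.8, 11.2),
                            stringsAsFactors = FALSE)
)

# Start a key skeleton from the existing column names so nothing is typed by hand
key <- do.call(rbind, lapply(names(raw), function(f) {
  data.frame(source = f, raw_name = names(raw[[f]]), tidy_name = NA_character_,
             stringsAsFactors = FALSE)
}))

# Fill in the harmonized names (in practice this step could happen in a shared spreadsheet)
key$tidy_name <- c("site", "temperature_c", "site", "temperature_c")

# Hypothetical helper: keep the columns listed in the key and rename them
rename_with_key <- function(df, file_name, key) {
  sub_key <- key[key$source == file_name, ]
  df <- df[, sub_key$raw_name, drop = FALSE]
  names(df) <- sub_key$tidy_name
  df$source <- file_name  # record which file each row came from
  df
}

# Apply the key to every table, then stack the now-compatible tables
tidy_list <- Map(rename_with_key, raw, names(raw), MoreArgs = list(key = key))
combined <- do.call(rbind, unname(tidy_list))
rownames(combined) <- NULL
print(combined)
```

Because the key is an ordinary table, changing how a column is harmonized means editing one row of the key rather than editing the script.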

Aside from harmonizing tools, this package also includes functions for reading in multiple files simultaneously, calculating coefficients of variation (CV), and identifying solar day information for a given set of coordinates, among others.
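As a reminder of what a coefficient of variation measures, the short base R sketch below divides the sample standard deviation by the mean. The function name `cv_sketch` and its `na_rm` argument are hypothetical and are not taken from the package, whose own function may differ in name, arguments, and behavior.

```r
# Hypothetical helper: coefficient of variation = standard deviation / mean
cv_sketch <- function(x, na_rm = TRUE) {
  stats::sd(x, na.rm = na_rm) / mean(x, na.rm = na_rm)
}

# Toy values for illustration only
cv_sketch(c(2, 4, 4, 4, 5, 5, 7, 9))
```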

Contribute to the package

If you do decide to contribute a function to this package, we will happily grant you co-authorship on the package. If you share an idea about a function with us, we’ll list you as a contributor in the package. Similarly, if you contribute function code that isn’t written in R (e.g., Python) and we do the translation into R, we’ll add you as a package contributor. For specifics, please see our contributing guidelines. Generally, we’re hoping many functions eventually find their home in this package, so please reach out if you’d like to discuss some facet of the contribution process.

Hopefully ltertools is valuable with the functions it already has, but we would also love to add some of your functions to this package! Please feel free to open a GitHub Issue on the package if you have function code or a concept to contribute. If you’d like to discuss this further before going straight to GitHub, you’re welcome to reach out to the package maintainer: Nick Lyon.