
–by John Porter, Virginia Coast Reserve Information Manager

Late in 2020, the LTERHub was established (https://lternetwork.force.com/lterhub/s/article/What-is-LTERHub). One of the valuable features of the hub is that it serves as the LTER Personnel Directory and powers the “Directory” link on the LTER home page. The web interface allows individuals to search by name, site, role, committee membership, and interests. Participants can modify their own information to keep their LTERHub directory entry up to date. However, for some tasks, such as linking to other data systems or checking what needs to be updated, you need data in a machine-readable form, not just as a web page.

[Screenshot of the LTERHub home page]

Enter the Application Programming Interface (API) for the Directory within LTERHub. It allows downloading of the directory entries for a site, or of the information for a specific individual. The information is requested using a REST web service and is returned in JavaScript Object Notation (JSON), a structured format that is easily parsed by a wide variety of computer tools, including Python, Java, and R.

The full documentation on the API is available to LTER Information Managers in the LTER-committee-information-management shared Google Drive, under Resources/personnel-updates/LTERHub-API. Using the API to retrieve site listings requires an 18-character Account ID that is specific to each site. The list of Account IDs is available in the “Site IDs” document in the same folder. The Site ID list should not be shared outside of LTER, because it could be misused, for example to generate mass spam emails.

To download a site listing, you can use a web browser or any software tool that accesses the web (e.g., curl) to request the URL: http://lternet.force.com/services/apexrest/v1/Site/XXX_mySiteId_XXX/Directory/ where XXX_mySiteId_XXX is the unique 18-character Account ID of the site for which data are requested.
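As a minimal sketch of such a request in Python, using only the standard library: the URL pattern comes from the text above, while the function names are hypothetical, and the code assumes the service responds to a plain GET with a JSON body.

```python
import json
import urllib.request

# URL pattern as given in the article; everything else here is illustrative.
BASE_URL = "http://lternet.force.com/services/apexrest/v1/Site"


def site_directory_url(account_id: str) -> str:
    """Build the site-listing URL for an 18-character Account ID."""
    if len(account_id) != 18:
        raise ValueError("Account ID must be exactly 18 characters")
    return f"{BASE_URL}/{account_id}/Directory/"


def fetch_site_directory(account_id: str, timeout: float = 30.0):
    """Request the site listing and parse the JSON response (assumed format)."""
    url = site_directory_url(account_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_site_directory with your site's Account ID should return the parsed listing. Because the Account IDs should not be shared outside of LTER, it is safer to read the ID from a local file or environment variable than to hard-code it in a script.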

The JSON returned includes each individual’s name, email address, list of sites (along with the site contact IDs and the role played at each site), list of committees, list of interests, and links to relevant identifiers or social media accounts such as ORCID, Twitter, and LinkedIn. It also includes an 18-character “contact_id”, a unique identifier for each individual within the Directory database, which can be used to disambiguate individuals who share the same name. This unique “contact_id” can also be used with a second web service to return the information for a single individual, instead of the listing for the entire site.
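To illustrate how the contact_id can serve as a key, here is a hedged Python sketch. It assumes the parsed listing is a list of records, each a dictionary with "name" and "contact_id" keys. Only "contact_id" is named in this article, so check the API documentation for the actual structure and field names. The function names are hypothetical.

```python
from collections import defaultdict


def index_by_contact_id(records):
    """Map each contact_id to its record for direct lookup."""
    return {rec["contact_id"]: rec for rec in records}


def shared_names(records):
    """Return {name: [contact_id, ...]} for names held by more than one person."""
    ids_by_name = defaultdict(list)
    for rec in records:
        # "name" is an assumed field name, not confirmed by the article.
        ids_by_name[rec["name"]].append(rec["contact_id"])
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
```

Any name that appears in the result of shared_names cannot be matched safely by name alone; those individuals need to be identified by contact_id.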

Use of the API has the potential to eliminate the need for an LTER site to maintain a separate database of personnel working at the site. For example, when generating Ecological Metadata Language (EML) metadata from a metadatabase, identifying individuals within the metadatabase by their contact_ids would allow rapid retrieval of up-to-date contact information for each individual. An important feature of the Directory that facilitates such use is that the database contains information on both active and inactive site participants, so that even if a graduate student moves on from a site, their information remains accessible in the Directory database for linking to metadata. Unlike the web interface, the API returns all individuals associated with a site, including former or inactive members.

Another use of the API is to help prepare lists of bulk personnel changes at a site, particularly creating accounts for new participants and marking departing researchers as inactive. The protocol for bulk updating of personnel lists (https://lter.github.io/im-manual/site-personnel) recommends submitting a spreadsheet containing information for each individual to be added (Status=Current) or marked inactive (Status=Former). At sites that may have 100 or more participants in various roles, from high-school interns to investigators, keeping track of who is involved can be arduous and time-consuming. However, developing such lists is a standard part of preparing NSF Annual Reports. We have therefore developed R code that uses the API to access our LTERHub site list and compares it to the most recent personnel spreadsheet from NSF on an individual-by-individual basis. The program then prepares a bulk change list as recommended. Names that are in the NSF list but not in LTERHub need to be added, and names that are in LTERHub but not in the NSF list are marked “Former.”
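The comparison logic can be sketched in a few lines. The example below is in Python and is not the R code described above; the function names and the name-normalization rule (lowercasing and collapsing whitespace) are assumptions made for illustration.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def bulk_change_list(nsf_names, hub_names):
    """Return (name, status) rows: 'Current' to add, 'Former' to mark inactive."""
    nsf = {normalize(n): n for n in nsf_names}
    hub = {normalize(n): n for n in hub_names}
    # In the NSF list but not in LTERHub: accounts to be created.
    to_add = [(nsf[k], "Current") for k in sorted(nsf.keys() - hub.keys())]
    # In LTERHub but not in the NSF list: candidates to mark as former.
    to_retire = [(hub[k], "Former") for k in sorted(hub.keys() - nsf.keys())]
    return to_add + to_retire
```

For example, comparing an NSF list of “Joan Q. Doe” and “Sam Roe” against an LTERHub list of “Sam Roe” and “Pat Poe” yields one addition (Joan Q. Doe, Current) and one removal (Pat Poe, Former). Matching on names alone is fragile, so the resulting rows are best treated as candidates for review rather than a final list.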

This process is not without its challenges. The API returns the name as a single string (e.g., “Joan Q. Doe”), but the bulk entry spreadsheet requires that first, middle, and last names be separated (e.g., “Joan”, “Q.”, “Doe”), so the name from the API needs to be parsed. A bigger limitation is that the API does not return any status information (“active” vs. “inactive”), so we need to go back to the directory web interface (which shows only “active” participants) to determine which individuals are newly “inactive” and which are already marked “inactive.” Additionally, the bulk personnel change form requests information on past roles (and their dates) that is not available from either the API or the web interface. Nonetheless, using the API allowed us to substantially speed up the process of updating the LTERHub directory. In principle, the process could be reversed, using LTERHub data to drive creation of the NSF listing, particularly with respect to new participants.
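The name-parsing step can be approximated with a simple whitespace heuristic. The Python sketch below uses a hypothetical function name and assumes the first token is the first name, the last token is the last name, and anything in between is the middle name.

```python
def split_name(full_name: str):
    """Split a 'First Middle Last' string into (first, middle, last)."""
    parts = full_name.split()
    if not parts:
        return ("", "", "")
    if len(parts) == 1:
        # A single token is treated as a first name with nothing else known.
        return (parts[0], "", "")
    return (parts[0], " ".join(parts[1:-1]), parts[-1])
```

This handles the “Joan Q. Doe” case, but it will mis-split multi-word surnames and names with suffixes such as “Jr.”, so the parsed names should be checked by hand before the spreadsheet is submitted.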