By Jon Ide and Mark Servilla, Environmental Data Initiative

ezEML is a form-based online tool to streamline the creation of metadata in the Ecological Metadata Language (EML). It was created by the Environmental Data Initiative (EDI) and employs EDI’s Metapype library, which is a general-purpose framework for creating and validating metadata, along with Metapype’s implementation of validation rules for the EML 2.2.0 standard. ezEML continues to be under active development by EDI, and we welcome feedback from the information management and science communities.

The EML standard was designed to handle an enormous variety of data scenarios, so it is complex and has a steep learning curve. In many data scenarios, however, only a relatively small subset of EML fields is needed. In such cases, ezEML provides a do-it-yourself tool that greatly simplifies EML document creation, especially for users who are new to EML or use it only infrequently.

ezEML can be used as a “wizard” leading the user through EML document creation step by step, or it can be used in a more user-directed fashion. Among other things, ezEML can upload data tables and other data entities and automatically infer most of their characteristics, import EML content from other ezEML documents, check the EML for correctness and completeness, and download the finished EML document as an EML XML file, together with associated data files, for downstream use in a data workflow.

ezEML was built mainly for scientists who are not themselves proficient in the details of EML. Some of its features, however, may make it attractive to a professional data manager as well. ezEML comes with extensive built-in Help and an associated User Guide. It is designed to be largely self-explanatory and easy to learn. We won’t attempt a comprehensive description here, but the following should give the reader a sense of what it’s like to create an EML document using ezEML.

Two YouTube videos are available that demonstrate ezEML’s use. A very short introduction is available at https://www.youtube.com/watch?v=lhtq7iSQIyM, and a more comprehensive demo is available at https://www.youtube.com/watch?v=2Q0ryNoYK4E&t=7s.

Managing EML Documents in ezEML

To use ezEML, users need to sign in with EDI, Google, GitHub, or ORCID credentials. ezEML maintains a collection of EML documents saved to a user’s account. Users can create new documents, modify existing ones, import portions of one document into another, and download documents as XML files. A document can be edited in multiple sittings, since all progress is saved.

Navigating in ezEML

As a form-based web application, ezEML organizes data entry into pages of logically grouped fields. All fields are marked as required, recommended, or optional. The menu on the left side provides easy navigation through the pages, and all entered information is saved automatically unless the user explicitly cancels.

When you open a new or existing EML document, you’ll see a screen that looks like this:

 

Notice the help buttons designated by the “circled question mark” icons, which appear throughout ezEML. They provide guidance both on the EML standard itself and on how to use various ezEML features. Notice also the User Guide link in the toolbar at the top. The Save and Continue button on every page moves the user through the sequence of pages in “wizard” fashion. The Contents menu on the left also gives direct access to all sections while automatically saving all changes. 

Sections that allow multiple entries use the layout shown below, with Edit, Remove, and ordering functions for each entry. Adding a new entry or editing an existing one opens a form for that item, and Save and Continue returns you to the list of entries. For instance, the Creators screen looks something like this:

 

Clicking Add Creator or Edit takes you to a form for an individual Creator.

All of the other sections are similar in structure, consisting of a single form, like Title, or incorporating several forms, like Creators.

Uploading Data Tables

A data package typically contains one or more data tables, which need to be described in the EML metadata. Entering this metadata by hand can be laborious and error-prone. ezEML assists by letting you upload your data tables in CSV (comma-separated values) format and then inferring many of the needed metadata attributes for you.

Although we refer to the file as a CSV file, separators other than commas are supported, including tabs, vertical bars, colons, and semicolons. In addition, the quote character used to enclose values in the table can be specified.
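To make the role of the separator and quote character concrete, here is a minimal sketch using Python's standard csv module. It is only an illustration and not ezEML's own code; the function name read_delimited and the sample data are hypothetical.

```python
import csv
import io

def read_delimited(text, delimiter=",", quotechar='"'):
    """Parse delimited text into a header row and a list of data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    if not rows:
        return [], []
    return rows[0], rows[1:]

# A table separated by vertical bars, with single quotes enclosing values.
# The first value contains the separator itself, so it must be quoted.
sample = "site|species|count\n'Plot A|north'|'Acer rubrum'|12\n"
header, rows = read_delimited(sample, delimiter="|", quotechar="'")
print(header)
print(rows)
```

If the quote character were left at its default here, the first data row would be split into four fields instead of three, which is why both settings need to match the file.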

ezEML will infer much of the metadata for the table:

 

In addition, ezEML does its best to infer the needed metadata for the columns, which may be manually edited. Alternatively, column information may be cloned (imported) from another table or from a metadata file previously created in ezEML. Clicking Edit Column Properties brings up a page like the following, which lets you change the variable type and/or properties of each column.
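The general idea of inferring a column's variable type can be sketched as follows. This is a simplified illustration under our own assumptions, not the algorithm ezEML actually uses; the type names, the date formats tried, and the max_categories threshold are all hypothetical.

```python
from datetime import datetime

# Hypothetical list of date formats to try; a real tool would likely support more.
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%m/%d/%Y")

def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def _is_datetime(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False

def infer_column_type(values, max_categories=10):
    """Guess a variable type for one column of string values."""
    present = [v.strip() for v in values if v.strip() != ""]
    if not present:
        return "text"
    if all(_is_number(v) for v in present):
        return "numerical"
    if all(_is_datetime(v) for v in present):
        return "datetime"
    distinct = set(present)
    # Treat a column as categorical only if it has few distinct values
    # and at least one of them repeats.
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "categorical"
    return "text"

print(infer_column_type(["1.5", "2", "", "3e2"]))
print(infer_column_type(["oak", "maple", "oak", "maple"]))
```

Any guess of this kind can be wrong (numeric site codes are one common case), which is why the inferred values remain editable.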

 

Edit Properties for a column provides several advanced features. For categorical variables, ezEML collects the variable codes from the data table, assuring complete code definitions. Additional codes may be added manually, if desired. Standard units may be looked up for numerical attributes, assuring correct spelling. Custom units may also be entered and described manually.
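Collecting codes directly from the data is what guarantees that no code goes undefined. A minimal sketch of the idea, with hypothetical function names and an assumed set of missing-value markers, might look like this:

```python
def collect_codes(values, missing_markers=("", "NA")):
    """Return the sorted distinct codes in a column, skipping missing-value markers."""
    return sorted({v.strip() for v in values} - set(missing_markers))

def undefined_codes(values, definitions):
    """List the codes found in the data that have no definition yet."""
    return [code for code in collect_codes(values) if not definitions.get(code)]

column = ["F", "M", "F", "", "NA", "U"]
definitions = {"F": "female", "M": "male"}
print(collect_codes(column))
print(undefined_codes(column, definitions))
```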

The foregoing gives just a taste of the many detailed forms available in ezEML to enter the information required to describe a data set in EML. A look at the Contents list shows that we’ve only scratched the surface here. For example, keywords may be looked up in the LTER controlled vocabulary. Tables may be re-uploaded if any problems with the data are discovered during the metadata creation process, without causing metadata to be lost. Taxonomic names may be searched in a taxonomic authority database and full taxonomic trees inserted.

Checking Metadata

Certain metadata values are required for an EML document to be considered valid. Various other metadata values are not strictly required by the EML standard but are highly recommended.

The Check Metadata feature displays an indicator that is red, yellow, or green depending on whether the metadata is missing one or more required values, is missing only recommended values, or is problem-free. Clicking Check Metadata provides a list of problems, each linked directly to the form page that needs editing.
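The red, yellow, and green logic described above can be sketched in a few lines. This is an illustration only: the document is represented as a plain dictionary, and the field names and the check_status function are hypothetical rather than taken from ezEML or Metapype.

```python
def check_status(document, required, recommended):
    """Return (color, problems) for a metadata document held as a dict."""
    def is_missing(field):
        return not str(document.get(field) or "").strip()

    missing_required = [f for f in required if is_missing(f)]
    missing_recommended = [f for f in recommended if is_missing(f)]
    if missing_required:
        color = "red"        # at least one required value is missing
    elif missing_recommended:
        color = "yellow"     # only recommended values are missing
    else:
        color = "green"      # problem-free
    return color, missing_required + missing_recommended

doc = {"title": "Lake temperature 2019-2021", "creators": "J. Doe", "abstract": ""}
status = check_status(doc, required=["title", "creators"],
                      recommended=["abstract", "keywords"])
print(status)
```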

 

Once all problems have been addressed, the EML document is ready to download for submission to EDI or another data repository that accepts metadata in the EML standard. 

Future Directions

ezEML includes a great many capabilities beyond what has been described above, and new features are being added all the time. We have several things in mind for the near future.

For example, ezEML has an Export Data Package feature that zips up the metadata document and the accompanying data files to ready them for submission to a data repository such as the repository hosted by EDI. Guidance is provided on how to obtain a Data Package ID from EDI and Submit to EDI. We intend to streamline this process, making it more of a unified, one-click step.
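Conceptually, an export of this kind bundles the EML file and its data files into a single archive. The sketch below uses Python's standard zipfile module. The file names and the export_data_package function are hypothetical, and the archive layout ezEML actually produces may differ.

```python
import tempfile
import zipfile
from pathlib import Path

def export_data_package(eml_path, data_paths, zip_path):
    """Write the EML file and its data files into one zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in [eml_path, *data_paths]:
            # Store each file under its bare name, without directory components.
            archive.write(path, arcname=Path(path).name)
    return Path(zip_path)

# Demonstration with throwaway files; "<eml/>" is a placeholder, not valid EML.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    eml = tmp / "metadata.xml"
    eml.write_text("<eml/>", encoding="utf-8")
    table = tmp / "observations.csv"
    table.write_text("site,count\nA,12\n", encoding="utf-8")
    package = export_data_package(eml, [table], tmp / "package.zip")
    with zipfile.ZipFile(package) as archive:
        names = sorted(archive.namelist())
print(names)
```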

We’d also like to add various quality checks to be performed on data tables as they are uploaded. DateTime formats could be checked, for example, to determine whether they conform to the ISO 8601 standard. We could check whether column headers are well-formed, whether data values within a column are of consistent types, and so on.
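As a rough idea of what such checks could look like, the sketch below tests values against a small subset of ISO 8601 (calendar dates, optionally followed by a time) and flags column headers that break a simple naming rule. Both the accepted formats and the definition of a "well-formed" header are assumptions made for illustration, not planned ezEML behavior.

```python
import re
from datetime import datetime

# Covers only YYYY-MM-DD and YYYY-MM-DDThh:mm:ss, a small subset of ISO 8601.
ISO_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?")
# Assumed rule: a header starts with a letter and uses letters, digits, underscores.
HEADER_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_iso8601(value):
    """Check both the shape of the string and the validity of the date itself."""
    match = ISO_SHAPE.fullmatch(value)
    if not match:
        return False
    fmt = "%Y-%m-%dT%H:%M:%S" if match.group(1) else "%Y-%m-%d"
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def malformed_headers(headers):
    """Return headers that break the naming rule or duplicate an earlier header."""
    seen, bad = set(), []
    for header in headers:
        if not HEADER_RULE.fullmatch(header) or header in seen:
            bad.append(header)
        seen.add(header)
    return bad

print([is_iso8601(v) for v in ["2021-06-01", "06/01/2021", "2021-06-31"]])
print(malformed_headers(["site", "air temp", "2nd_count", "site"]))
```

The shape check and the calendar check are kept separate because a value such as 2021-06-31 looks like ISO 8601 but names a day that does not exist.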

Longer term, we’d like to explore ways that ezEML can leverage other online sources, such as standardized vocabularies and ontologies. ezEML already has the ability to query online taxonomic authorities to fill in a taxonomic hierarchy, and we should be able to expand on this kind of access to exploit other online resources. 

It may be possible as well for ezEML to be “smarter” about coaching and assisting users in the creation of EML documents, providing hints and suggestions beyond the current checks for errors and missing values.

Doubtless many other possibilities exist. As we explore them, though, we intend never to lose sight of what we view as a key success factor for ezEML, namely that it should remain easy to learn and use. That is, it should live up to its name.

Acknowledgements

ezEML has benefitted from ideas contributed by everyone on the EDI team. Various users of ezEML have contributed ideas as well, and we are grateful for all of them.

In particular, we’d like to single out Duane Costa, formerly a member of the EDI team, who did initial programming on what later became ezEML, in addition to his substantial contributions to the Metapype library and Metapype’s implementation of rules for the EML 2.2.0 standard. Much of Duane’s code lives on in both ezEML and Metapype.