Even the best ecoinformatics tool requires a skilled hand: Best practices for taxonomic name resolution in biodiversity science

Increasingly, researchers using informatics approaches in biology and ecology are realizing that ‘Names are key to the big new biology’. Did you know that in many large biological, ecological, and evolutionary datasets and databases, on the order of 50% of the species names are in some way wrong or uncertain? (See also this additional recent study, which reached a similar conclusion.) As a macroecologist and biodiversity scientist, I found this not only a shock but also a realization that any science using large biodiversity databases (including genomic studies of many different taxa) will likely face a fundamental hurdle that limits the quality of the science.

A few years back we released the first version of a tool to clear the names hurdle and to help standardise biological names and taxonomy. The Taxonomic Name Resolution Service (TNRS) website is now used by over 50 people per day. The TNRS is also accessible via an API, which is available through the taxize taxonomic standardisation R package. Since we released the TNRS there have been over 90,000 page views of http://tnrs.iplantcollaborative.org, with 51,000 sessions made by 20,800 users from all over the world, and this is not counting usage via the API. We have continued to update it and have just released version 4.0, which now includes more taxonomic sources, including The Plant List from Kew.

While the TNRS can help standardise names, the default output from the TNRS (or other taxonomic services) should not be taken as the definitive last word on taxonomic standardisation. In any large dataset, there is inevitably a handful of names that cannot be resolved with certainty and whose meaning must be researched manually. Indeed, we have found that there continue to be questions on how best to use the TNRS, how to interpret its output, and in particular what to do when the TNRS fails to provide an accepted name.

The TNRS standardises names by checking them against authoritative taxonomic databases such as The Plant List (www.theplantlist.org), the Missouri Botanical Garden’s Tropicos (www.tropicos.org), USDA Plants (http://plants.usda.gov), and others. With so many potential sources, the question often arises: which is best?

Recently, my longtime collaborator, lab associate, and good friend Brad Boyle wrote up a guide for taxonomic name resolution. For any biodiversity scientist who understands the need to standardise taxonomic names for their analyses, this is a must-read. We have added this guide to the TNRS site, and I have included his slightly edited and modified thoughts below.


A guide to taxonomic name resolution – Brad Boyle

  1. Which (taxonomic) sources to use? In general, I recommend you use TPL, GCC, ILDIS and Tropicos. The first three, TPL+GCC+ILDIS, provide the same taxonomic coverage as the online version of The Plant List (www.theplantlist.org). Our data use agreement with The Plant List (source TPL in the TNRS) requires that we obtain taxonomic data separately from sources GCC and ILDIS (families Asteraceae and Fabaceae, respectively). If you use TPL alone, it will be missing those two families, so please remember to include all three. I also recommend you include Tropicos, even though it is a data contributor to The Plant List. The reason for this is that Tropicos is updated much more frequently than The Plant List. I have found many cases where a name is missing from TPL but present in Tropicos. If your list of species comes only from the USA or Canada, you might consider using USDA (USDA Plants) in addition to, or instead of, the global sources listed above. Indeed, some US government agencies may be required to follow USDA taxonomy. However, you should avoid using this source if your list is likely to include species from outside North America north of Mexico.

In theory, The Plant List includes all known vascular plant names, world wide. In practice, I have encountered cases where legitimate, validly published names are missing. So far most have turned out to be Old World species, mostly from southeast Asia; by contrast, coverage should be good for the New World. Tropicos is nearly comprehensive for the New World, but much less complete for the Old World.
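The source recommendations above can be written down as a small checklist. This is only a sketch of the guide's advice: the function name `check_sources` and the `north_america_only` flag are hypothetical, and nothing here calls the TNRS itself. Only the source codes (TPL, GCC, ILDIS, Tropicos, USDA) come from the text.

```python
# Companion sources that TPL needs, and the family each one supplies
# (as stated in the guide above).
TPL_COMPANIONS = {"GCC": "Asteraceae", "ILDIS": "Fabaceae"}


def check_sources(sources, north_america_only=False):
    """Return a list of warnings for a chosen set of taxonomic sources.

    Hypothetical helper: it encodes the guide's advice, not TNRS behaviour.
    """
    chosen = set(sources)
    warnings = []

    if "TPL" in chosen:
        # TPL alone lacks the families supplied by GCC and ILDIS.
        for source, family in TPL_COMPANIONS.items():
            if source not in chosen:
                warnings.append(
                    f"TPL selected without {source}: {family} will be missing"
                )
        # Tropicos is updated more often and may hold names absent from TPL.
        if "Tropicos" not in chosen:
            warnings.append("Consider adding Tropicos alongside TPL")

    # USDA is only appropriate for lists restricted to the USA or Canada.
    if "USDA" in chosen and not north_america_only:
        warnings.append("Avoid USDA for species outside the USA and Canada")

    return warnings


if __name__ == "__main__":
    for message in check_sources(["TPL", "USDA"]):
        print(message)
```

An empty list means the selection is consistent with the recommendations; it says nothing about whether every name will actually be found.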

  2. Taxonomic status. The most important thing to know is that Taxonomic_status refers to the matched name (Name_matched in the TNRS results file), not the accepted name (Accepted_name). An accepted name is ‘Accepted’, by definition.

If Accepted_name is provided, you should use it, regardless of the taxonomic status of Name_matched (a possible exception is Taxonomic_status=‘Misapplied’, see below).

If no accepted name is provided, I do not recommend using the matched name uncritically. I would research the name, taking into account its taxonomic status. The TNRS does not determine Taxonomic_status itself; it simply transmits the opinion provided by the taxonomic sources (TPL, GCC, etc.). The values of Taxonomic_status are as follows:

  • Accepted: Name_matched is accepted by the taxonomic sources, therefore Name_matched = Accepted_name. Use Accepted_name.

  • Synonym: Name_matched is legitimate and validly published under the botanical code, but is a synonym of a different name, therefore Name_matched <> Accepted_name. Use Accepted_name.
  • Illegitimate: Name_matched is illegitimate under the botanical code, meaning it was validly published but violates some other nomenclatural rule; for example it is a posterior homonym of the same name published previously for a different taxon. The Name_matched should not be used. There will be no Accepted_name because an illegitimate name cannot be a synonym of anything. Figuring out which name should have been used can be difficult or impossible in some cases.
  • Invalid: Name_matched may or may not be legitimate, but that is a moot point because it was never validly published. These are often informal or proposed names that have been used in the scientific literature without being formally published as new species. Such a name should not be used. There will be no Accepted_name because an invalid name cannot be a synonym of anything. Figuring out which name should have been used can be difficult or impossible in some cases.
  • Rejected name: Name_matched is legitimate and validly published, but was rejected for some other reason by a special committee at a botanical congress. Such names should not be used. There will be no Accepted_name because a rejected name cannot be a synonym of anything. Figuring out which name should have been used can be difficult or impossible in some cases.
  • Misapplied: Name_matched has been incorrectly applied to the wrong taxon in a taxonomic publication (often indicated by ‘auct. non’ as part of the authority in Name_matched_author). In theory, there should be two entries for a misapplied name: one with Taxonomic_status=‘Accepted’ and Name_matched=Accepted_name, and another with Taxonomic_status=‘Misapplied’ and Name_matched<>Accepted_name (if the correct meaning is known) or no Accepted_name (if the correct meaning is unknown). However, TPL and ILDIS frequently indicate only the incorrect (misapplied) usage of a name, not the correct (accepted) usage. Even if an Accepted_name is provided for a misapplied name, I would recommend checking how and where the name was used in your original data. This is because you need to know if the correct meaning or the misapplied meaning was intended. Figuring out which name should have been used can be difficult or impossible in some cases.
  • No opinion: Name_matched is present in one or more source databases, but no opinion as to taxonomic status is provided. ‘No opinion’ is not a term from the botanical code. Instead, it indicates that the source did not have enough information to reach a decision as to whether the name should be used. Tropicos returns “No opinion” for such names; TPL returns “Unresolved”. They have the same meaning, and the TNRS uses “No opinion” in all cases. In theory, Accepted_name should always be missing if Taxonomic_status=‘No opinion’. However, TPL sometimes marks a matched name as “Unresolved” (=“No opinion”) yet returns an Accepted_name. My understanding is that these are cases where one or more taxonomic references have suggested that the Name_matched is a synonym of the Accepted_name, but other references disagree. Therefore, uncertainty is high. It is up to you to decide whether or not to use the Accepted_name when Taxonomic_status=‘No opinion’. If Accepted_name is not provided, you should always research the Name_matched before using it.
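The rules in this list can be condensed into a short triage step over a TNRS results file. The sketch below assumes each result row has been read into a dictionary keyed by the column names used above (Taxonomic_status, Accepted_name); the function name `choose_name` and the placeholder names "Aus bus" and "Aus cus" are hypothetical. It does not replace the manual research described here, it only separates rows that can be used directly from rows that need a closer look.

```python
# Statuses for which an Accepted_name, even if present, should be checked
# by hand before use (per the Misapplied and No opinion entries above).
REVIEW_EVEN_IF_ACCEPTED = {"Misapplied", "No opinion"}


def choose_name(row):
    """Return (name_to_use, needs_manual_review) for one result row.

    `row` is assumed to be a dict with the keys 'Taxonomic_status' and
    'Accepted_name'; missing or empty values are treated as absent.
    """
    status = (row.get("Taxonomic_status") or "").strip()
    accepted = (row.get("Accepted_name") or "").strip()

    if status in REVIEW_EVEN_IF_ACCEPTED:
        # An accepted name may be offered, but uncertainty is high.
        return (accepted or None, True)
    if accepted:
        # Accepted or Synonym: use Accepted_name as given.
        return (accepted, False)
    # Illegitimate, Invalid, Rejected name, or anything else without an
    # accepted name: do not use Name_matched uncritically.
    return (None, True)


if __name__ == "__main__":
    rows = [
        {"Name_matched": "Aus bus", "Taxonomic_status": "Synonym",
         "Accepted_name": "Aus cus"},
        {"Name_matched": "Aus bus", "Taxonomic_status": "Invalid",
         "Accepted_name": ""},
    ]
    for row in rows:
        print(row["Name_matched"], "->", choose_name(row))
```

Returning a review flag rather than silently falling back to Name_matched keeps the unresolved names visible, so they can be researched by hand.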

To research a name, you can begin by clicking on the link that takes you to the entry for that name in one or more of the taxonomic sources. Tropicos (www.tropicos.org) is particularly good as it provides information on the literature where the name has been used. In many cases, by going to the original literature, I have been able to resolve a name manually, even though Tropicos and TPL declare it unresolved. One simple trick you can use is to go with the opinion of the latest publication and ignore the earlier ones.

I hope I haven’t made this sound more complicated than it is. Resolving those last few “No opinion” names can be challenging, but in the majority of cases you should be able to use the accepted name without worry.


1 Comment »

  1. Thanks, Brian and Brad, that is very useful and also an interesting insight into the ongoing evolution of the landscape of names databases for biodiversity users and machine clients.