A better way to tag taxonomic ranks

Hi mappers,

This thread was originally created for gathering feedback on a proposal. But with discussions came new insights. I’ve cancelled the proposal in favour of the discussion, which (hopefully) will lead to a new kind of proposal.

The original proposal can still be found here. But please join the discussion for a better solution!

I’ve rewritten the proposal to be clearer and more concise, but made no substantive changes.

The explanation and rationale haven’t justified this. taxon=* is precisely not meant to be parsed; it’s a way to fill in the taxonomy without ranking it. What’s the problem with using species=* and taxon:*=*?

That’s a fair question; we’re having a similar discussion in the Dutch community. What I’m aiming at with this proposal is infraspecific names such as subspecies and cultivar. I think there is a lack of consensus over how to tag these. One person prefers to add everything to the species=* tag, another uses taxon=* in combination with unnecessary infraspecific tags. For example, taxon:cultivar=Flame says something about taxon=*. That’s a tag that says something about another tag and is useless by itself, which (I heard) is something we should avoid.

So based on that I’m already deviating from my original proposal and was thinking about the following two options:

  1. Put infraspecific names, i.e. everything that comes after species in the hierarchy (subspecies, variety, form, cultivar, etc.), inside the species=* tag, like species=Fraxinus angustifolia var. pilosa. Document that way of tagging and deprecate/discourage infraspecific taxon sub-keys like taxon:cultivar=* & taxon:subspecies=*, which don’t hold any valuable information by themselves and only say something about taxon=*. Or;
  2. Put infraspecific names in their own taxon:<taxon_rank>=* key. Such as taxon:cultivated_species=Fraxinus angustifolia 'Flame'. This is different from my original proposal for taxon:rank=cultivated_species, because that would be a tag saying something about taxon=*. The taxon:<taxon_rank>=* key is useful by itself.

The first option is less accurate and harder to parse. The second option is more explicit, but therefore also more complex.

Either way, I think it’s good to have consensus over which way to go when tagging infraspecific names.

From what I understand, species is basically a shorthand for taxon:species and below, i.e. it says something about taxon. taxon itself is supposed to be reserved for use when the specific subkeys aren’t known; it allows later mappers to categorise it accordingly.

taxon won’t usually be used, but it’s a good fallback as outlined above.

I’d prefer the second way of tagging, and where it is too complex, one can put the (full) name into the taxon key or species information with subspecies and cultivar into species.

E.g. Fraxinus angustifolia subsp. oxycarpa 'Raywood' would become
Full tagging:

taxon:genus=Fraxinus
taxon:species=angustifolia
taxon:subspecies=oxycarpa
taxon:cultivar=Raywood

A bit simpler at the expense of machine readability is

taxon:genus=Fraxinus
species=angustifolia subsp. oxycarpa 'Raywood'

And the simplest tagging for other mappers is
taxon=Fraxinus angustifolia subsp. oxycarpa 'Raywood'

Of course, a taxon:wikidata could alleviate the need for any other taxon:* key.
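
To make the three levels above concrete, here is a minimal sketch that builds each tag set from the same name parts. The function name `tagging_variants` and the plain-dict representation of tags are assumptions for illustration only, not an existing tool or an agreed scheme.

```python
def tagging_variants(genus, species, subspecies=None, cultivar=None):
    """Return (full, simpler, simplest) tag dicts for one plant name.

    Hypothetical helper: it only mirrors the three examples in the post.
    """
    # Written form of everything after the genus,
    # e.g. "angustifolia subsp. oxycarpa 'Raywood'".
    parts = [species]
    if subspecies:
        parts.append(f"subsp. {subspecies}")
    if cultivar:
        parts.append(f"'{cultivar}'")
    epithets = " ".join(parts)

    # Full tagging: one key per rank.
    full = {"taxon:genus": genus, "taxon:species": species}
    if subspecies:
        full["taxon:subspecies"] = subspecies
    if cultivar:
        full["taxon:cultivar"] = cultivar

    # Simpler: genus split off, the rest in species=*.
    simpler = {"taxon:genus": genus, "species": epithets}

    # Simplest: everything in taxon=*.
    simplest = {"taxon": f"{genus} {epithets}"}
    return full, simpler, simplest
```

Calling it with `"Fraxinus", "angustifolia", "oxycarpa", "Raywood"` reproduces the three examples, which also shows that the simpler forms can always be generated from the full one, but not the other way around without parsing.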

Personally, I’m not a big fan of splitting the genus from the species. The species is a binomial name; it always consists of two parts. Splitting the species from the genus would make no sense, because the taxon:species(=angustifolia) tag would be unusable without the taxon:genus(=Fraxinus) tag.

The same goes for subspecies, cultivar, form, etc. There is no point in splitting each of those up into their own tags, because they all depend on each other. They’re only usable when they form a complete taxonomic name.

That’s why I’m still suggesting to go for these two options:

  1. Whole taxonomic name in the species= tag (e.g. species=Fraxinus angustifolia subsp. oxycarpa ‘Raywood’).
  2. Whole taxonomic name in their own dedicated taxonomic rank tag (e.g. taxon:cultivated_species=Fraxinus angustifolia ‘Flame’).

Afaict that’s for human consumption, so you can always preprocess like this
<taxon:genus> <taxon:species> to get Fraxinus angustifolia or even F. angustifolia.
On the other hand, it becomes much easier to search for all trees of genus Fraxinus which may not be the case if someone decides that F. angustifolia is a valid tag value.
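
A rough sketch of that preprocessing, assuming tags arrive as a plain dict; `display_name` and `is_genus` are hypothetical names, not part of any existing OSM library.

```python
def display_name(tags, abbreviate_genus=False):
    """Join taxon:genus and taxon:species into a human-readable name."""
    genus = tags.get("taxon:genus")
    species = tags.get("taxon:species")
    if not genus:
        return None  # nothing sensible to show without a genus
    if abbreviate_genus and species:
        genus = genus[0] + "."  # "Fraxinus" -> "F."
    return f"{genus} {species}" if species else genus


def is_genus(tags, wanted):
    """Exact-match genus search; no string parsing of a combined value needed."""
    return tags.get("taxon:genus") == wanted
```

With split tags the genus search is a plain equality test, whereas a combined value such as F. angustifolia would need extra handling.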

But the other way around is useful? It’s not ideal, better would be a taxon:wikidata tag. That would also include left out classes like family, order, and kingdom (though kingdom can probably be inferred from natural=tree).
And arguably it’s quite easy to check that every taxon:species also has a taxon:genus.
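
Such a check could look roughly like this. The element shape (a dict with an `id` and a `tags` dict) and the name `missing_genus` are assumptions for the sketch, not the format of any particular QA tool.

```python
def missing_genus(elements):
    """Return the ids of elements that have taxon:species but no taxon:genus."""
    return [
        element["id"]
        for element in elements
        if "taxon:species" in element["tags"]
        and "taxon:genus" not in element["tags"]
    ]
```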

And already, you show the issue with this tagging. Above, you used 'Flame', here you use ‘Flame’, making parsing incredibly difficult.
Edit: OK, the difference isn’t visual in my answer, because they get turned into the same quotation marks. Still, there are different types of quotation marks that cause issues within other tags, and here we have an opportunity to prevent that.
Second Edit: Using code blocks makes it easier to see.
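
To illustrate the quotation mark problem, a data consumer would need something like the following before comparing values. This is only a sketch: the list of variants is an assumption and certainly not exhaustive, and `normalise_quotes` is a made-up name.

```python
# Characters that are sometimes typed instead of the plain apostrophe (U+0027).
QUOTE_VARIANTS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201a": "'",  # single low-9 quotation mark
    "\u0060": "'",  # grave accent
    "\u00b4": "'",  # acute accent
}
_QUOTE_TABLE = str.maketrans(QUOTE_VARIANTS)


def normalise_quotes(value):
    """Map the listed quote variants to a plain apostrophe."""
    return value.translate(_QUOTE_TABLE)
```

A bare taxon:cultivar=Flame never needs this step, which is the opportunity I mean.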

I’m not actually against this tagging practice, as long as it’s clear that its use is purely for human consumption, i.e. basically like description or fixme.

Whether it’s for human or machine consumption, a machine can preprocess the value and split it into several fields if needed.

I’m not a big fan of this, but it’s the current practice in OSM.

I think the taxon:<taxon_rank>= tag is the best of both worlds. It implies a structure that both humans and machines can read/parse. For example, cultivated_subspecies implies the format <genus> <species> subsp. <subspecies> '<cultivar>'. I’ve added a table of more examples on my personal OSM Wiki page.
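
As a sketch of what "implies a structure" could mean for a parser: the snippet below covers only the two formats spelled out in this thread, uses deliberately simplified patterns (no hybrids, plain apostrophes only), and the names `PATTERNS` and `parse_ranked` are hypothetical.

```python
import re

# One pattern per rank key; each rank fixes the layout of its value.
PATTERNS = {
    "cultivated_species": re.compile(
        r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z-]+) '(?P<cultivar>[^']+)'$"
    ),
    "cultivated_subspecies": re.compile(
        r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z-]+)"
        r" subsp\. (?P<subspecies>[a-z-]+) '(?P<cultivar>[^']+)'$"
    ),
}


def parse_ranked(tags):
    """Split the first known taxon:<taxon_rank>=* value into its parts.

    Returns a dict of parts, or None if no known key is present or the
    value does not follow the layout its key implies.
    """
    for rank, pattern in PATTERNS.items():
        value = tags.get(f"taxon:{rank}")
        if value is None:
            continue
        match = pattern.match(value)
        return match.groupdict() if match else None
    return None
```

Because the key names the layout, a value that doesn’t fit (for instance one with curly quotes) can be flagged instead of silently misread.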

This also leaves the purpose of the species tag intact, which is already used almost 2 million times.

The only compromise with this way of tagging is that, if you want to gather e.g. only the genus, you have to search across multiple keys. But as long as they are well documented (like in a table such as on my personal wiki page), that shouldn’t be a huge problem.
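
Roughly, gathering the genus would then look like this. The key list is an illustrative subset (the full list would come from the documented table), it assumes every listed value starts with the scientific genus name, and `genus_of` is a hypothetical helper.

```python
# Keys whose value is a full name starting with the genus (illustrative subset).
FULL_NAME_KEYS = (
    "species",
    "taxon:cultivated_species",
    "taxon:cultivated_subspecies",
    "taxon:cultivated_variety",
)


def genus_of(tags):
    """Return the genus from genus=* or from the first word of a full-name key."""
    if tags.get("genus"):
        return tags["genus"]
    for key in FULL_NAME_KEYS:
        words = tags.get(key, "").split()
        if words:
            return words[0]
    return None
```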

I do agree that different kinds of quotation marks are incredibly annoying :sweat_smile:

I don’t like the original taxon+taxon:rank proposal. It just complicates things in my opinion. Also, people could edit one value without editing the other, creating unnecessary maintenance.

I’m okay with more specific suffixes, but isn’t that what we were doing already?

At the moment it’s basically:

  • Do I know the species? I tag species
  • I don’t know the species but I know the genus? I tag genus
  • I know the species but also the subspecies/cultivar etc.? I tag species + taxon:suffix=*
  • I read what this tree is somewhere but I don’t know what species/genus/cultivar etc. means? I tag a generic taxon.
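
Written out as code, that decision list is roughly the following. It is only a sketch of current practice as I described it; `choose_tags` and its parameters are made-up names, and `infraspecific` is assumed to map a rank suffix to its value.

```python
def choose_tags(species=None, genus=None, infraspecific=None, unranked_name=None):
    """Pick tags according to what the mapper knows about the plant."""
    if species:
        # Species known: species=*, plus taxon:<suffix>=* for anything below it.
        tags = {"species": species}
        for rank, value in (infraspecific or {}).items():
            tags[f"taxon:{rank}"] = value
        return tags
    if genus:
        # Only the genus is known.
        return {"genus": genus}
    if unranked_name:
        # A name was read somewhere but its rank is unknown.
        return {"taxon": unranked_name}
    return {}
```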

I’d add that I’m in favour of putting the whole name in every tag. So Fraxinus angustifolia 'Flame' and not just 'Flame', Fraxinus angustifolia and not just angustifolia, and so on.

I cancelled the original proposal for taxon:rank= in favour of this ongoing discussion.

In my experience, most cultivated species are tagged this way:

  • taxon=Fraxinus angustifolia 'Flame'
  • species=Fraxinus angustifolia
  • taxon:cultivar=Flame

Are you suggesting we should tag like this?

  • species=Fraxinus angustifolia
  • taxon:subspecies=Fraxinus angustifolia subsp. oxycarpa
  • taxon:cultivar=Fraxinus angustifolia subsp. oxycarpa 'Raywood'

Because then I’d still prefer the following, which avoids duplicating information while implying the same information:

  • taxon:cultivated_subspecies=Fraxinus angustifolia subsp. oxycarpa 'Raywood'
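
To show that the single tag implies the same information, here is a small sketch that derives the three tags from the previous list out of the one value. It assumes the value uses " subsp. " and a plain apostrophe exactly as written above; `expand_cultivated_subspecies` is a hypothetical name.

```python
def expand_cultivated_subspecies(value):
    """Derive species, taxon:subspecies and taxon:cultivar (whole names)."""
    # Everything before the opening quote is the subspecies name.
    subspecies_name, quote, _ = value.partition(" '")
    # Everything before " subsp. " is the species name.
    species_name, marker, _ = subspecies_name.partition(" subsp. ")
    if not quote or not marker:
        return None  # value does not follow the expected layout
    return {
        "species": species_name,
        "taxon:subspecies": subspecies_name,
        "taxon:cultivar": value,
    }
```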

What I see is a lot of duplication. E.g. what happens if there is a cultivated variant?
The base structure needs to be simple from a data perspective and that clashes with complex value syntax. Then we can add more tags that simplify things.
Otherwise we get stuff like node 4944112209. What exactly is Magnolia sp. as a species? I can only assume that Magnolia virginiana, as the type species, is meant.

This is an example of what I envision:

| Future tag | Description | Comment |
|---|---|---|
| taxon:cultivar | Only contains the cultivar part of the taxonomic name if applicable | Structured |
| taxon:form | Only contains the form if applicable | Structured |
| taxon:genus | Only contains the latin genus | Structured |
| taxon:species | Only contains the latin species identifier without genus; Requires taxon:genus | Structured |
| taxon:subspecies | Only contains the subspecies if applicable | Structured |
| taxon:variant | Only contains the variant if applicable | Structured |
| taxon | Full taxonomic name to the knowledge of the mapper; May be a common name | Unchanged |
| genus | Only contains the genus; May be the genus’ common name | Unchanged |
| species | Species’ name including genus, as precise as possible; May be a common name | Unchanged |
| cultivated_species | Identifies the plant as a cultivated species. Human readable taxonomic name | New tag |
| cultivated_subspecies | Identifies the plant as a cultivated subspecies. Human readable taxonomic name | New tag |

taxon:* may include the common name within the taxon:<lang>:* namespace as additional information.

That’s my understanding as well. Are there written down rules of how to use them to achieve some sort of standardisation?

I’m okay with any species + taxon:suffix=*, so I’m fine with this as well. This creates duplication, but imho species is the main value that should always be present.

Then we’ll add a new taxon_rank to the table called cultivated_variety. So taxon:cultivated_variety=* stands for [genus] [species] var. [variety] ‘[cultivar]’.


Our goals are the same: a structured way to tag (infraspecific) taxonomic names. But then the question is, do we structure them by splitting the name up into multiple smaller tags like this:

  • taxon:genus=Fraxinus
  • taxon:species=angustifolia
  • taxon:cultivar=Flame

Or do we structure the tags themselves, like this:

  • taxon:cultivated_species=Fraxinus angustifolia 'Flame'
    Where cultivated_species stands for [genus] [species] ‘[cultivar]’.

I think we both have a preference; I’m curious to hear what others prefer.

Cast your vote! For examples, see the reply above this one.

  • Structuring by splitting name into smaller rank-specific tags
  • Use structured tags
  • None of these / Other

Elaboration of your vote would be appreciated.

While I understand your concern, I think a very clear and consistent scheme would work a lot better. Especially with the use of ‘subsp.’ which is something we don’t want to do.

By the way, it isn’t semantically correct, but I know I’m not alone in treating species=* as a general slot for any species-related information I can get my hands on, whether it’s a binomial, trinomial, or genus, or whether it comes with a variety or cultivar. This is all for human consumption. After all, how would we even attempt to structure a hybrid across multiple subkeys? Semicolons would risk confusion with a less specific, mixed-species feature.

As far as I know, the main reason we started using scientific nomenclature in species=* was to avoid confusion among common names in various languages, not out of a need for machine readability. species:wikidata=* is the key that really matters for anyone who needs machine readability. Or I suppose taxon:wikidata=* if it’s important enough to avoid the impression that we’re only tagging binomials, but I don’t think laypeople have that impression, based on all that’s in species=* these days…

I support splitting into structured tags as proposed by @Jofban, since they’re less likely to be misused. But the Wiki for the species=* tag should be updated to state that infraspecific names are allowed there and that it’s only intended for human readability. The only thing I disagree with is also allowing common names inside the species=* & genus=* tags; we have species:en=*, genus:en=* & taxon=* for that purpose.
