This thread was originally created for gathering feedback on a proposal. But with discussions came new insights. I’ve cancelled the proposal in favour of the discussion, which (hopefully) will lead to a new kind of proposal.
The original proposal can still be found here. But please join the discussion for a better solution!
The explanation and reasoning haven’t justified this. taxon= is explicitly not meant to be parsed; it’s a way to fill in the taxonomy without ranking it. What’s the problem with using species=, taxon:*=?
That’s a fair question; we’re having a similar discussion in the Dutch community. What I’m aiming at with this proposal is infraspecific names such as subspecies and cultivar. I think there is a lack of consensus on how to tag these. One person prefers to add everything to the species=* tag, another uses taxon=* in combination with unnecessary infraspecific tags. For example, taxon:cultivar=Flame says something about taxon=*. That’s a tag saying something about another tag, and a tag that is useless by itself, which (I heard) is something we should avoid.
So based on that I’m already deviating from my original proposal and was thinking about the following two options:
Put infraspecific names, so everything that comes after species in the hierarchy (subspecies, variety, form, cultivar, etc.), inside the species=* tag. Document that, and deprecate/discourage infraspecific taxon sub-keys like taxon:cultivar=* & taxon:subspecies=*, which don’t hold any valuable information and only say something about taxon=*, in favour of that way of tagging. For example, species=Fraxinus angustifolia var. pilosa. Or;
Put infraspecific names in their own taxon:<taxon_rank>=* key. Such as taxon:cultivated_species=Fraxinus angustifolia 'Flame'. This is different from my original proposal for taxon:rank=cultivated_species, because that would be a tag saying something about taxon=*. The taxon:<taxon_rank>=* key is useful by itself.
The first option is less accurate and harder to parse. The second option is more explicit, but therefore more complex.
Either way, I think it’s good to have consensus on which way to go when tagging infraspecific names.
From what I understand, species is basically a shorthand for taxon:species and below, i.e. it says something about taxon. taxon itself is supposed to be reserved for use when the specific subkeys aren’t known; it allows later mappers to categorise it accordingly.
taxon won’t usually be used, but it’s a good fallback as outlined above.
I’d prefer the second way of tagging, and where it is too complex, one can put the (full) name into the taxon key or species information with subspecies and cultivar into species.
E.g. Fraxinus angustifolia subsp. oxycarpa 'Raywood' would become
Full tagging:
Personally, I’m not a big fan of splitting the genus from the species. The species is a binomial name; it always consists of two parts. Splitting the species from the genus would make no sense, because the taxon:species(=angustifolia) tag would be unusable without the taxon:genus(=Fraxinus) tag.
The same goes for subspecies, cultivar, form, etc. There is no point in splitting each of those up into their own tags, because they all depend on each other. They are only usable when they form a complete taxonomic name.
That’s why I’m still suggesting to go for these two options:
Whole taxonomic name in the species= tag (e.g. species=Fraxinus angustifolia subsp. oxycarpa ‘Raywood’).
Whole taxonomic name in their own dedicated taxonomic rank tag (e.g. taxon:cultivated_species=Fraxinus angustifolia ‘Flame’).
Afaict that’s for human consumption, so you can always preprocess like this <taxon:genus> <taxon:species> to get Fraxinus angustifolia or even F. angustifolia.
On the other hand, it becomes much easier to search for all trees of genus Fraxinus which may not be the case if someone decides that F. angustifolia is a valid tag value.
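The preprocessing mentioned above, joining `<taxon:genus>` and `<taxon:species>` for display, could look roughly like this. It is only a sketch: the function name `binomial` and the plain-dict representation of an element's tags are assumptions made for illustration.

```python
from typing import Optional


def binomial(tags: dict, abbreviate: bool = False) -> Optional[str]:
    """Join taxon:genus and taxon:species into a human-readable name.

    Returns None when either part is missing, since the species
    epithet alone is not usable.
    """
    genus = tags.get("taxon:genus")
    epithet = tags.get("taxon:species")
    if not genus or not epithet:
        return None
    if abbreviate:
        # "Fraxinus" -> "F."
        genus = genus[0] + "."
    return f"{genus} {epithet}"


# Hypothetical tag set of a single tree
tags = {"natural": "tree", "taxon:genus": "Fraxinus", "taxon:species": "angustifolia"}
print(binomial(tags))
print(binomial(tags, abbreviate=True))
```

Searching for every tree of genus Fraxinus then only needs an exact match on `taxon:genus`, with no value parsing.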
But the other way around is useful? It’s not ideal, better would be a taxon:wikidata tag. That would also include left out classes like family, order, and kingdom (though kingdom can probably be inferred from natural=tree).
And arguably it’s quite easy to check that every taxon:species also has a taxon:genus.
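As a rough illustration of that check, here is a minimal sketch that lists elements carrying taxon:species without taxon:genus. The element structure (a dict with `id` and `tags`) and the sample ids are made up for the example; a real QA tool would read them from an OSM extract or an API response.

```python
def missing_genus(elements: list) -> list:
    """Return the ids of elements that have taxon:species but no taxon:genus."""
    return [
        element["id"]
        for element in elements
        if "taxon:species" in element["tags"] and "taxon:genus" not in element["tags"]
    ]


# Hypothetical sample data
elements = [
    {"id": 1, "tags": {"taxon:genus": "Fraxinus", "taxon:species": "angustifolia"}},
    {"id": 2, "tags": {"taxon:species": "angustifolia"}},
    {"id": 3, "tags": {"natural": "tree"}},
]
print(missing_genus(elements))
```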
And already, you show the issue with this tagging. Above, you used 'Flame', here you use ‘Flame’, making parsing incredibly difficult.
Edit: OK, the difference isn’t visual in my answer, because they get turned into the same quotation marks. Still, there are different types of quotation marks that cause issues within other tags, and here we have an opportunity to prevent that.
Second Edit: Using code blocks makes it easier to see.
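To illustrate the quotation mark problem: a data consumer would have to normalise typographic quotes before comparing values. A minimal sketch, assuming only the four most common curly quote characters need handling (the names `QUOTE_MAP` and `normalize_quotes` are hypothetical):

```python
# Map typographic quotes to their plain ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(value: str) -> str:
    """Replace curly quotes in a tag value with straight ones."""
    return value.translate(QUOTE_MAP)


curly = "Fraxinus angustifolia \u2018Flame\u2019"
straight = "Fraxinus angustifolia 'Flame'"
print(curly == straight)
print(normalize_quotes(curly) == straight)
```

Keeping the cultivar in its own tag without any quotes would make this step unnecessary.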
I’m not actually against this tagging practice, as long as it’s clear that its use is purely for human consumption, i.e. basically like description or fixme.
I think the taxon:<taxon_rank>= tag is the best of both worlds. It implies a structure that both humans and machines can read/parse. For example, cultivated_subspecies implies the format <genus> <species> subsp. <subspecies> '<cultivar>'. I’ve added a table of more examples on my personal OSM Wiki page.
This also leaves the purpose of the species tag intact, which is already used almost 2 million times.
The only compromise with this way of tagging is that, if you want to gather e.g. only the genus, you have to search across multiple keys. But as long as they are well documented (like in a table such as on my personal wiki page), that shouldn’t be a huge problem.
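To show what parsing such values and gathering the genus across several keys could involve, here is a sketch with one regular expression per rank key. The patterns are my own reading of the formats quoted in this thread, not a documented standard, and they only accept straight single quotes around the cultivar.

```python
import re
from typing import Optional

# Assumed shape of a lowercase epithet (letters and hyphens).
EPITHET = r"[a-z][a-z-]+"

# Hypothetical value formats, one per rank key.
FORMATS = {
    "taxon:cultivated_species": re.compile(
        rf"^(?P<genus>[A-Z][a-z]+) (?P<species>{EPITHET}) '(?P<cultivar>[^']+)'$"
    ),
    "taxon:cultivated_subspecies": re.compile(
        rf"^(?P<genus>[A-Z][a-z]+) (?P<species>{EPITHET}) "
        rf"subsp\. (?P<subspecies>{EPITHET}) '(?P<cultivar>[^']+)'$"
    ),
}


def parse_taxon(tags: dict) -> Optional[dict]:
    """Return the name parts from the first rank key whose value matches its format."""
    for key, pattern in FORMATS.items():
        match = pattern.match(tags.get(key, ""))
        if match:
            return match.groupdict()
    return None


# Hypothetical tag set of a single tree
tree = {
    "natural": "tree",
    "taxon:cultivated_subspecies": "Fraxinus angustifolia subsp. oxycarpa 'Raywood'",
}
parts = parse_taxon(tree)
print(parts["genus"], parts["cultivar"])
```

Every additional rank key needs its own entry in `FORMATS`, which is the cost of searching across multiple keys.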
I don’t like the original taxon+taxon:rank proposal. It just complicates things in my opinion; also, people could edit one value without editing the other, creating unnecessary maintenance.
I’m okay with more specific suffixes, but isn’t that what we were doing already?
At the moment it’s basically:
Do I know the species? I tag species
I don’t know the species but I know the genus? I tag genus
I know the species but also the subspecies/cultivar etc.? I tag species + taxon:suffix=*
I read somewhere what this tree is but I don’t know what species/genus/cultivar etc. means? I tag a generic taxon.
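Written down as code, the decision steps above might look like this. It is only a sketch of my reading of current practice: the function `suggest_tags` and its parameters are invented for illustration, and values are passed through exactly as the mapper supplies them.

```python
from typing import Optional


def suggest_tags(
    name: Optional[str] = None,
    genus: Optional[str] = None,
    species: Optional[str] = None,
    rank: Optional[str] = None,
    infraspecific: Optional[str] = None,
) -> dict:
    """Pick tags based on how much the mapper knows about the plant."""
    if species:
        tags = {"species": species}
        # Known subspecies/cultivar etc.: add a taxon:<suffix> tag next to species.
        if rank and infraspecific:
            tags[f"taxon:{rank}"] = infraspecific
        return tags
    if genus:
        return {"genus": genus}
    if name:
        # Rank unknown: fall back to the generic taxon key.
        return {"taxon": name}
    return {}


print(suggest_tags(species="Fraxinus angustifolia", rank="cultivar", infraspecific="Flame"))
print(suggest_tags(genus="Fraxinus"))
```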
I’d add that I’m in favour of putting the whole name in every tag. So Fraxinus angustifolia 'Flame' and not just 'Flame', Fraxinus angustifolia and not just angustifolia, and so on.
What I see is a lot of duplication. E.g. what happens if there is a cultivated variant?
The base structure needs to be simple from a data perspective and that clashes with complex value syntax. Then we can add more tags that simplify things.
Otherwise we get things like node 4944112209 on OpenStreetMap. What exactly is Magnolia sp. as a species? I can only assume that Magnolia virginiana, as the type species, is meant.
This is an example of what I envision:
| Future tag | Description | Comment |
| --- | --- | --- |
| taxon:cultivar | Only contains the cultivar part of the taxonomic name if applicable | Structured |
| taxon:form | Only contains the form if applicable | Structured |
| taxon:genus | Only contains the latin genus | Structured |
| taxon:species | Only contains the latin species identifier without genus; Requires taxon:genus | Structured |
| taxon:subspecies | Only contains the subspecies if applicable | Structured |
| taxon:variant | Only contains the variant if applicable | Structured |
| taxon | Full taxonomic name to the knowledge of the mapper; May be a common name | Unchanged |
| genus | Only contains the genus; May be the genus’ common name | Unchanged |
| species | Species’ name including genus, as precise as possible; May be a common name | Unchanged |
| cultivated_species | Identifies the plant as a cultivated species. Human readable taxonomic name | New tag |
| cultivated_subspecies | Identifies the plant as a cultivated subspecies. Human readable taxonomic name | New tag |
taxon:* may include the common name within the taxon:<lang>:* namespace as additional information.
That’s my understanding as well. Are there written down rules of how to use them to achieve some sort of standardisation?
I’m okay with any species + taxon:suffix=*, so I’m fine with this as well. This creates duplication, but imho species is the main value that should always be present.
Then we’ll add a new taxon_rank to the table called cultivated_variety. So taxon:cultivated_variety=* stands for [genus] [species] var. [variety] ‘[cultivar]’.
Our goals are the same: a structured way to tag (infraspecific) taxonomic names. But then the question is, do we structure them by splitting the name up into multiple smaller tags like this:
taxon:genus=Fraxinus
taxon:species=angustifolia
taxon:cultivar=Flame
Or do we structure the tags themselves, like this:
taxon:cultivated_species=Fraxinus angustifolia 'Flame'
Here cultivated_species stands for [genus] [species] ‘[cultivar]’.
I think we both have a preference; I’m curious to hear what others prefer.
While I understand your concern, I think a very clear and consistent scheme would work a lot better. Especially with the use of ‘subsp.’ which is something we don’t want to do.
By the way, it isn’t semantically correct, but I know I’m not alone in treating species=* as a general slot for any species-related information I can get my hands on, whether it’s a binomial, trinomial, or genus, or whether it comes with a variety or cultivar. This is all for human consumption. After all, how would we even attempt to structure a hybrid across multiple subkeys? Semicolons would risk confusion with a less specific, mixed-species feature.
As far as I know, the main reason we started using scientific nomenclature in species=* was to avoid confusion among common names in various languages, not out of a need for machine readability. species:wikidata=* is the key that really matters for anyone who needs machine readability. Or I suppose taxon:wikidata=* if it’s important enough to avoid the impression that we’re only tagging binomials, but I don’t think laypeople have that impression, based on all that’s in species=* these days…
I support splitting into structured tags as proposed by @Jofban, since they’re less likely to be misused. But the Wiki for the species=* tag should be updated to state that infraspecific names are allowed there and that it’s only intended for human readability. The only thing I disagree with is also allowing common names inside the species=* & genus=* tags; we have species:en=*, genus:en=* & taxon=* for that purpose.