[Massive edit] Fixing typos and other errors in species=* for trees

Hello everyone,

I’m planning to fix some typos in the species tag and, since it’s quite a massive edit, I would like to get your feedback before I proceed.

The background:

It’s no secret that species names are not particularly consistent, but I’ve found some very popular values that are mostly just typos.

There are some more cases with count<500, but let’s start with the ones described above.

Why this matters:
Tags should be consistent and useful for data consumers (including myself: I would like to use the species tag in UrbanEye3D for trees).

How I plan to do it:
OSM files with the necessary changes will be created by a Python script and then uploaded back to OSM via JOSM.
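
A minimal sketch of what such a script could look like, not the actual script. It assumes the data is a JOSM-style .osm XML file and that JOSM picks up changed elements via the action="modify" attribute. The FIXES table and the sample nodes are hypothetical placeholders, not the real list of values.

```python
import xml.etree.ElementTree as ET

# Hypothetical typo table (wrong value -> corrected value); illustration only.
FIXES = {"Tilia cordota": "Tilia cordata"}


def fix_species(root, fixes=FIXES):
    """Correct species=* on natural=tree nodes in place; return how many changed."""
    changed = 0
    for node in root.iter("node"):
        tags = {t.get("k"): t for t in node.findall("tag")}
        natural = tags.get("natural")
        if natural is None or natural.get("v") != "tree":
            continue  # only natural=tree objects are touched
        species = tags.get("species")
        if species is not None and species.get("v") in fixes:
            species.set("v", fixes[species.get("v")])
            # Assumption: JOSM treats action="modify" as "upload this element".
            node.set("action", "modify")
            changed += 1
    return changed


# Made-up sample data in .osm layout.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="0" lon="0" version="1">
    <tag k="natural" v="tree"/>
    <tag k="species" v="Tilia cordota"/>
  </node>
  <node id="2" lat="0" lon="0" version="1">
    <tag k="natural" v="tree"/>
    <tag k="species" v="Acer rubrum"/>
  </node>
</osm>"""

root = ET.fromstring(SAMPLE)
n_changed = fix_species(root)
```

Only the nodes that actually changed are marked, so untouched trees are not re-uploaded.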

When I plan to do it:
The change will be applied on Thursday, May 14.

If anyone has suggestions, concerns, or objections, I’d be happy to hear them before I start.

Best regards,
Zkir

13 Likes

No objections. I made a MapRoulette challenge to fix species tags, but most values could be fixed with a mass fix, as you are proposing.

1 Like

Another easy mass fix could target species tags with one-word values. If that value appears in this list, then species should either be changed to genus if missing or simply removed otherwise.

For example, we currently have 406 instances of species=Prunus

We also have values in the “genus + sp.” form. The logic would be the same: move the value to genus (without the “sp.” suffix) or delete. For example, we have 531 instances of species=Prunus sp..
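
As a rough sketch of the “genus + sp.” rule described here (the function name and regular expression are my own assumptions, not an agreed implementation): the genus part is moved to genus=* only if that tag is missing, and species=* is dropped either way.

```python
import re

# Matches values such as "Prunus sp." (the trailing dot is optional).
SP_FORM = re.compile(r"^(\S+)\s+sp\.?$")


def move_sp_to_genus(tags):
    """Return a copy of tags with a 'Genus sp.' species value moved to genus."""
    tags = dict(tags)
    match = SP_FORM.match(tags.get("species", ""))
    if match:
        # Keep an existing genus tag; only fill it in when it is missing.
        tags.setdefault("genus", match.group(1))
        del tags["species"]
    return tags
```

For instance, `move_sp_to_genus({"natural": "tree", "species": "Prunus sp."})` yields a tree with genus=Prunus and no species tag, while a full binomial such as species=Prunus cerasus is left untouched.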

We even have 411 trees tagged as… species=tree :face_with_crossed_out_eyes:

1 Like

@ivanbranco thank you for the quick reply :slight_smile:

The MapRoulette challenge is a good thing indeed, and it may serve us yet, but the most obvious offences can be fixed in one go :slight_smile:

The one-worders are obvious mistakes, but it may not be that easy to fix them all automatically.
The first random example I’ve encountered is:

genus=Prunus
taxon=Prunus cerasus
natural=tree
produce=cherry
species=cerasus
leaf_type=broadleaved
leaf_cycle=deciduous

Is it safe to just delete the species tag in this case? It seems it should rather be changed to species=Prunus cerasus …

We even have 411 trees tagged as… [species=tree]

I guess I can just delete the species tag for those :slight_smile:

1 Like

Same with me. The tagging of genus/species of trees is a bit messy and everybody doing some corrective work there is highly welcome!

+1, but one has to be aware that the single word itself is often wrong. I have already manually corrected things like species=Eiche or species=Linde and the like.

1 Like

EDIT: Oooooh, I see now. Cerasus is a genus as well :upside_down_face:

Then the workflow could be:

  • species is a single word, and its value is identical to the genus value. Fix :white_check_mark:
  • species is a single word, and the genus tag is missing. Fix :white_check_mark:
  • species is a single word, but its value differs from the genus value. Do not fix, manual check required :cross_mark:
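
The three cases above could be sketched roughly like this. It is only an illustration: the function name and the status strings are made up, and `known_genera` is a hypothetical set standing in for the genus list mentioned earlier, so that single words like species=Eiche are sent to manual review instead of being copied into genus.

```python
def handle_single_word_species(tags, known_genera):
    """Return (new_tags, status); status is 'skip', 'fixed' or 'manual'."""
    new = dict(tags)
    species = tags.get("species", "")
    if len(species.split()) != 1:
        return new, "skip"  # species missing or not a single word
    genus = tags.get("genus")
    if genus is not None:
        if genus == species:
            del new["species"]  # species merely duplicates genus
            return new, "fixed"
        return new, "manual"  # values differ, needs a human
    if species in known_genera:
        new["genus"] = new.pop("species")  # genus missing: move the value
        return new, "fixed"
    return new, "manual"  # single word that is not a known genus


# Hypothetical genus list for the example.
GENERA = {"Prunus", "Cerasus"}
example = {"natural": "tree", "genus": "Prunus", "species": "cerasus"}
result, status = handle_single_word_species(example, GENERA)
```

With the Prunus cerasus example from earlier in the thread, the status is "manual" and the tags are returned unchanged.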
1 Like

I support this :slight_smile:


Also maybe worth deciding on this:

TL;DR: For a Pacific Sunset maple (Q127947609)

1 Like

Good idea. One remark:

What about changing the species tag but adding a species:en=*? The English name isn’t wrong. Simply overwriting it deletes valid information!

That may be valid, but it is not typo fixing.

Can you change the thread title?
I also suggest moving it to species:en rather than removing it.

This is also not typo fixing at all.
Would it be removed only from natural=tree objects?

The total instances of that tag are 418. 411 of those are natural=tree, the number Zkir listed.

3 Likes

Sure, only natural=tree objects are affected.

I also suggest moving it to species:en rather than removing it.

OK, if you guys think that species:en=Linden Littleleaf makes sense, I will do it that way.

The initial post has been updated.

3 Likes

To my knowledge, the correct British English term for Tilia cordata is “littleleaf lime”, whereas “littleleaf linden” is American English.

2 Likes

Given that these values all seem to exist in large numbers, has it been ensured that whatever or whoever generated them has been corrected?

2 Likes

Ensuring this is impossible. For example, Node History: 3683985924 | OpenStreetMap is part of a species=tree cluster added over a decade ago in something that looks like HOT mapping or a similar activity, see overpass turbo.

It is impossible to ensure that it will not happen again, and “HOT mappers do bad imports/add weird unneeded or wrong tags” has not been solved.

I guess I can add them to Dubious tags, MK list | Projects | OpenStreetMap Taginfo, but that does not really ensure anything either, as thousands of dubious ones are already known, more than enough for people interested in cleanup.

Making this proposed edit a recurrent bot edit is also an option (I can schedule those, at least for the near future).

Investigating the sources of these tags may be worth doing, though, to see whether there is some obvious source of mistakes. But that can also be done after cleaning up the current ones and checking later whether any new cases appear.

4 Likes

Hmm, I guess we should also decide on the formatting of the (genus|species|taxon):en=* tags for trees. I have been using US English names in Title Case, e.g.:

    <tag k="species" v="Tilia cordata"/>
    <tag k="species:en" v="Littleleaf Linden"/>
    <tag k="species:wikidata" v="Q158746"/>

    <tag k="species" v="Fraxinus angustifolia"/>
    <tag k="species:en" v="Narrowleaf Ash"/>
    <tag k="species:wikidata" v="Q518949"/>
    <tag k="taxon" v="Fraxinus angustifolia ‘Flame’"/>
    <tag k="taxon:en" v="Flame Narrowleaf Ash"/>
    <tag k="taxon:wikidata" v="Q128601714"/>

Additional examples:

    <tag k="species" v="Acer rubrum"/>
    <tag k="species:en" v="Red Maple"/>
    <tag k="species:wikidata" v="Q161364"/>
    <tag k="species:wikipedia" v="en:Acer rubrum"/>
    <tag k="start_date" v="2006-05-15"/>
    <tag k="taxon" v="Acer rubrum ʽOctober Glory’"/>
    <tag k="taxon:en" v="October Glory Red Maple"/>
    <tag k="taxon:wikidata" v="Q110765852"/>

    <tag k="species" v="Fraxinus americana"/>
    <tag k="species:en" v="White Ash"/>
    <tag k="species:wikidata" v="Q1193369"/>
    <tag k="taxon" v="Fraxinus americana 'Empire'"/>
    <tag k="taxon:en" v="Empire Ash"/>
    <tag k="taxon:wikidata" v="Q115916809"/>

    <tag k="taxon" v="Acer truncatum × Acer platanoides"/>
    <tag k="taxon:en" v="Pacific Sunset Maple"/>

    <tag k="species" v="Acer saccharum"/>
    <tag k="species:en" v="Sugar Maple"/>
    <tag k="species:wikidata" v="Q214733"/>
    <tag k="taxon" v="Acer saccharum subsp. grandidentatum"/>
    <tag k="taxon:en" v="Rocky Mountain Glow Sugar Maple"/>
    <tag k="taxon:wikidata" v="Q15286462"/>

    <tag k="species" v="Acer × freemanii"/>
    <tag k="species:en" v="Freeman Maple"/>
    <tag k="species:wikidata" v="Q9577378"/>
    <tag k="taxon" v="Acer × freemanii 'Armstrong'"/>
    <tag k="taxon:en" v="Armstrong Freeman Maple"/>
    <tag k="taxon:wikidata" v="Q18471079"/>

See: Overpass Turbo - Trees in Seattle, WA, US with (genus|species|taxon):en=* tags

(I am happy to change both the formatting and special characters to whatever is decided by consensus - just let me know :slight_smile:)

2 Likes

@Lumikeiju, I am afraid special characters are not something that can be decided here. No standard requires ‘single quotes’ to be curly :slight_smile:

Regarding the multiplication sign, there are currently examples of everything:

And it’s not something I personally would like to enforce, at least right now.

Maybe Platanus xacerifolia is worth changing to Platanus × acerifolia :slight_smile:

If you don’t mind, may I add another set that should be replaced?

It’s type=palm on palm tree nodes. type=* has been reserved for relations; thus, they should not be present on nodes or ways. The wiki’s replacement tag is taxon=Arecaceae. Thanks!

2 Likes

That should be a separate thread, as it does not change species= at all, and it should not be assumed that people who have already reviewed this thread would approve of it.

6 Likes

Let’s do it as the next iteration, after I commit the changes already discussed here (planned for tomorrow).

2 Likes

I didn’t see this kind of fix in the main post: Node History: 1136100349 | OpenStreetMap

As I wrote, imho it would be better to just move those values to genus.

So in this case:
species=Malus unidentified species → genus=Malus
instead of
species=Malus unidentified species → species=Malus sp.

See also this related discussion where I go more into detail.