Fixing an old `natural=tree` import in Bolivia (circumference in wrong unit)

Hello everyone,

I’m planning to fix up a data issue in Bolivia and would like to get your feedback before I proceed.

The background:
Quite a while ago, user rodolfovargas imported 10,000 trees (natural=tree) in Bolivia. During that import, the circumference tag was set in centimeters, but the unit of measurement was not specified.

According to the OSM wiki, the value of circumference should be in meters or, following common practice, the unit should be stated explicitly (e.g., circumference=250 cm).

Currently, all 10,000 nodes still have the original values (e.g., circumference=250), which are clearly meant to be centimeters.

What I plan to do:

  • For 9,999 of these trees, I will divide the value by 100, thus converting centimeters to meters, like this:
    circumference=123 → circumference=1.23
    circumference=250 → circumference=2.50

  • For 1 tree, the value is circumference=0. Since infinitely thin trees do not really exist, I will delete this tag entirely.
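
The two rules above could be sketched in Python roughly as follows. This is only an illustration, not the actual script: the function name `convert_cm_to_m` is hypothetical, and treating a trailing ".0" as whole centimeters is an assumption. `Decimal` is used so that 250 becomes 2.50 rather than 2.5, keeping the original 1 cm resolution.

```python
from decimal import Decimal, InvalidOperation


def convert_cm_to_m(value):
    """Return the new circumference value in meters, or None to delete the tag.

    Hypothetical helper: assumes `value` is a unitless number in centimeters.
    """
    try:
        cm = Decimal(value.strip())
    except InvalidOperation:
        raise ValueError(f"not a plain number: {value!r}")
    if not cm.is_finite() or cm < 0:
        raise ValueError(f"unexpected value: {value!r}")
    if cm == 0:
        # A zero circumference carries no information, so the tag is dropped.
        return None
    # Assumption: "329.0" means 329 whole centimeters, so the ".0" is removed.
    if cm == cm.to_integral_value():
        cm = cm.to_integral_value()
    # Shifting the exponent by -2 divides by 100 and keeps every digit.
    return format(cm.scaleb(-2), "f")


if __name__ == "__main__":
    for sample in ["123", "250", "329.0", "0"]:
        print(sample, "->", convert_cm_to_m(sample))
```

Anything that is not a plain non-negative number raises `ValueError`, so such values would be left for manual review instead of being converted silently.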

Why this matters:
Besides the fact that the current tagging is simply incorrect, these unrealistically thick trees are distorting statistics. I need accurate statistical data to estimate the height of trees in a JOSM plugin I’m developing (UrbanEye3D).

Fixing these values will greatly improve the plugin’s functionality, the overall data quality in the region, and tree species statistics worldwide.

How I plan to do it:
I’ve written a simple Python script that:

  1. Reads node IDs from an OSC file of the import in question.
  2. Fetches the current version of each node from a fresh planet.osm file.
  3. Compares the current circumference value with the imported one.
    If they are still the same (i.e., no one has edited them since the import), the script applies the changes described above.
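
As a rough sketch of the comparison in step 3 (not the actual script), the check could look like the code below. The function names `read_circumferences` and `select_untouched` are hypothetical, and the whole XML is loaded into memory, which only works for small extracts; a real planet file would need a streaming reader.

```python
import xml.etree.ElementTree as ET


def read_circumferences(xml_text):
    """Map node id -> circumference value for every node in an OSM or OSC XML string.

    Assumption: the input is small enough to parse in one go.
    """
    values = {}
    for node in ET.fromstring(xml_text).iter("node"):
        for tag in node.iter("tag"):
            if tag.get("k") == "circumference":
                values[int(node.get("id"))] = tag.get("v")
    return values


def select_untouched(imported, current):
    """Return the ids whose current value still equals the imported one.

    Nodes that were edited, lost the tag, or no longer exist are skipped.
    """
    return sorted(nid for nid, value in imported.items() if current.get(nid) == value)


if __name__ == "__main__":
    osc = (
        '<osmChange version="0.6"><create>'
        '<node id="1" lat="0" lon="0" version="1">'
        '<tag k="natural" v="tree"/><tag k="circumference" v="250"/></node>'
        '<node id="2" lat="0" lon="0" version="1">'
        '<tag k="circumference" v="123"/></node>'
        '</create></osmChange>'
    )
    osm = (
        '<osm version="0.6">'
        '<node id="1" lat="0" lon="0" version="1">'
        '<tag k="circumference" v="250"/></node>'
        '<node id="2" lat="0" lon="0" version="2">'
        '<tag k="circumference" v="1.3"/></node>'
        '</osm>'
    )
    print(select_untouched(read_circumferences(osc), read_circumferences(osm)))
```

In the sample data, node 2 has been edited since the import, so only node 1 would be selected for conversion.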

As of now, all 10,000 nodes still carry the original values.

If anyone has suggestions, concerns, or objections, I’d be happy to hear them before I start.

Best regards,
Zkir

10 Likes

Why not circumference=250 → circumference=2.50?

5 Likes

Obviously, it’s an option too.

In my opinion, however, the most straightforward way to fix this import is to add the unit (it was missed at the time, and now we are adding it). There are already a lot of circumference values with a unit (like this), so we do not change anything for better or worse.

Also, if the original author wanted centimeters, why not :smiley:

If the precision is 10 cm, circumference=2.5 would be another option.

There are “only” 6.859 trees with cm in the circumference value out of a total of 958.752 trees with the circumference tag (0.71%).

In my opinion, keeping plain numeric values makes the data easier to use, especially for less experienced users like me who use tree data, since the values are already normalized and don’t require additional parsing or unit conversion.

That said, this is just my opinion.

It doesn’t seem to be the case for this import. I see values like 164, 324, 249, etc., so it seems to have a precision of 1 cm. I would keep the extra 0.

6 Likes

Or just 2.5?

We would lose information about the original resolution. It would be like converting 245 to 2.4 instead of 2.45.

Cm precision, really? How often do you intend to update the data? Nevertheless, OK to keep the initial “precision”.
249 means 1/1000 precision, impressive especially for living matters.

The values of all tags have a “.0” that appears to be an artificial precision.

Sample: Node: 8955437596 | OpenStreetMap

1 Like

Yep, the “.0” is artificial, I think we all agree on that. I also notice it appears in height and ele. If those are artificial as well, they could be removed, because they give the false impression of higher precision than actually recorded.

What I was referring to is the part before it: for example, this tree has circumference=329.0. I suggest converting it to 3.29 rather than 3.2 to preserve the original resolution.

It’s not unusual for trees. The Italian Ministry of Forests uses cm precision for monumental trees, and those are quite large.

I don’t intend to update the data (I’m not a local), and the dataset is at least four years old, so the trees have grown in the meantime. But if we’re keeping the original data, I see no reason not to preserve their original resolution[1]. That’s all I’m saying.


  1. Here is an example from the Paris import

2 Likes

In my opinion, whether 1% or 10% or 100% of values have explicit units makes no difference for the ease of using the data – you must support them for your application to work correctly.

And personally, I would prefer if mappers added explicit units more often. This mistake would likely not have happened in the first place if doing so was common practice. And neither would many similar mistakes have happened – it’s quite common that people make wrong assumptions about the default units for circumference=* (defaults to m) or diameter=* (defaults to mm).
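
To illustrate what supporting explicit units means for a data consumer, here is a minimal sketch that normalizes a circumference value to meters. The function name and the set of accepted units are assumptions chosen for the example, not an official list; a unitless value is read as meters, and anything unrecognized yields `None`.

```python
import re

# Assumed unit table for illustration; "" is the unitless default (meters).
_TO_METRES = {
    "": 1.0, "m": 1.0, "cm": 0.01, "mm": 0.001,
    "in": 0.0254, '"': 0.0254, "ft": 0.3048, "'": 0.3048,
}
_PATTERN = re.compile(r"""^\s*(\d+(?:\.\d+)?)\s*([a-z'"]*)\s*$""", re.IGNORECASE)


def circumference_in_metres(value):
    """Return the circumference in meters as a float, or None if unparseable."""
    match = _PATTERN.match(value)
    if match is None:
        return None
    number, unit = match.groups()
    factor = _TO_METRES.get(unit.lower())
    if factor is None:
        return None  # unknown unit: leave the decision to the caller
    return float(number) * factor


if __name__ == "__main__":
    for sample in ["2.5", "250 cm", '98"', "10 furlongs"]:
        print(repr(sample), "->", circumference_in_metres(sample))
```

A parser like this cannot detect a value such as circumference=250 that was meant as centimeters; it would still be read as 250 m.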

5 Likes

well, it is fine to just throw away any malformed data, there is no mandate to parse every value

(what counts as malformed depends on specific data and data user)

3 Likes

Hello!
This is the author of those 10,000 trees uploaded to OSM. Indeed, the “problem” is merely one of metric units. That’s correct: the diameter uploaded to OSM is in centimeters (“cm”), but the values have to be converted to METERS (1 m = 100 cm). Obviously, given the amount of data uploaded, the operation has to be done automatically, using Python code. I would be sincerely grateful if the specialists in databases and code handling could apply this transformation.

I look forward to your kind comments.

4 Likes

Hi Rodolfo, thank you for your input.
Since there are no objections from your side, I will proceed with the fix on Monday.


Folks in the comments above have convinced me that the more proper way is just to convert centimeters to meters, without adding a unit. I’ve updated the post.

Current statistics of used units in circumference for nodes with natural=tree. (JFYI)

No unit : 10 732 075
m: 8 454
cm: 6 705
": 325
inches: 89
meter: 19
см: 13
in: 12
м: 10
ft: 8
inch: 7
': 5
feet: 3
meters: 3
ca: 2
o: 1
estimation: 1
metres: 1
metros: 1
metri: 1
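
A tally of this kind could be produced along the following lines. This is a sketch under assumptions, not the method used for the figures above: the function names are hypothetical, the unit is taken to be whatever follows a leading number, and values without a leading number are counted as "unparsed".

```python
import re
from collections import Counter

_LEADING_NUMBER = re.compile(r"^\s*\d+(?:[.,]\d+)?\s*")


def unit_suffix(value):
    """Return the text after the leading number, "No unit", or "unparsed"."""
    match = _LEADING_NUMBER.match(value)
    if match is None:
        return "unparsed"
    return value[match.end():].strip() or "No unit"


def tally_units(values):
    """Count how often each unit suffix occurs in an iterable of tag values."""
    return Counter(unit_suffix(v) for v in values)


if __name__ == "__main__":
    sample = ["2.5", "250 cm", "3 m", '98"', "1.2 m", "about two"]
    for unit, count in tally_units(sample).most_common():
        print(f"{unit}: {count}")
```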

6 Likes

OSM database updated.

There are a few more left: I found Node: 8955500908 | OpenStreetMap with “310.0” by searching “10.0” at circumference | Keys | OpenStreetMap Taginfo

It would be good if one could order the values there by numeric value in meters. That might keep problems like this from going undetected for so long.

1 Like

Well, it’s another changeset, for another 1959 trees, also imported by @rodolfovargas.

I will process it in the same way.

Perhaps @rodolfovargas could list all of the import changesets to save people hunting for data?

2 Likes

(continuing the OT - mods may want to move all these)

Some of those trees were reverted years ago (DWG Ticket#2021061910000101). Unfortunately, that was a complaint about some specific changesets (106490346 and 106490431). I’ve alerted the perpetrator of the import to the problem via a “message that they have to read before continuing to edit”. If anyone has any changesets showing the same problem by the same user please email data@openstreetmap.org with a subject line of “[Ticket#2021061910000101] More poor quality Bolivian trees (forum)”.

Edit: updated with link to block

2 Likes

@SomeoneElse thank you very much. So the OSM Data Working Group is involved in this case directly, that’s great.