Documenting use of (manually copied bits of) third party data

JeroenHoek · March 12, 2019, 5:36pm

I have a small CC0 (public domain) licensed dataset (monumental trees with plant date and species or genus) for use in a single Dutch municipality. The licensing is clear, and permission has been explicitly granted.

The plan is to use this dataset as a layer in JOSM from which trees can be copied into OSM for edits where the existence and location of the trees have been confirmed by the mapper. That is, this is not an automated import.

What are the requirements (beyond getting the licensing right) for us to start using this data? Import/Guidelines seems way too involved for this type of non-automated use of a quite localised third party resource.

I am thinking of creating a page on the wiki that documents this dataset and its application. Would that suffice? The JOSM data itself is already documented (by me) here: https://github.com/jdhoek/bomen-leeuwarden-osm (in Dutch).

SK53 · March 12, 2019, 10:44pm

This looks to be around 2000 trees. Is this the complete set of street & park trees in the municipality?

If I were you I’d go & check a sampled subset. In general my experience is that some of these will no longer exist (& of course others will have been planted). One hopes that they are accurately located, but it is as well to check. When I’ve sampled tree register data the following issues have arisen: the tree has gone (2-3% of the time); the tree is inaccurately identified (~5%, often for trivial reasons); non-trees exist in the data (stumps, tree pits, planned trees, groups of trees too difficult to survey individually, and shrubs which aren’t trees, such as Hazel, Buddleja and Lilac).
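A minimal sketch of drawing such a sample, assuming the register has been loaded as a list of tree identifiers. The function name `sample_for_survey`, the 5% default fraction and the seed are hypothetical choices for illustration, not something the dataset or JOSM prescribes.

```python
import random


def sample_for_survey(tree_ids, fraction=0.05, seed=2019):
    """Return a reproducible random subset of tree identifiers to check on the ground.

    `fraction` and `seed` are illustrative defaults, not recommendations.
    """
    ids = sorted(tree_ids)
    if not ids:
        return []
    # Always pick at least one tree, even for very small registers.
    k = max(1, round(len(ids) * fraction))
    # A dedicated Random instance keeps the draw repeatable for other mappers.
    return random.Random(seed).sample(ids, k)


if __name__ == "__main__":
    # Stand-in identifiers for a register of roughly 2000 trees.
    picked = sample_for_survey(range(1, 2001))
    print(len(picked), "trees selected for field checking")
```

Fixing the seed means two mappers running the script get the same list, so the survey work can be split up without overlap.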

I’m surprised to see young oak trees planted in 1904 labelled as natural monuments. 1 km from where I’m sitting are perhaps a thousand trees of that age, and I can only think of 3-4 which qualify as natural monuments. The oldest of these is around 600 years old. Natural monument trees should be significant at a national level and assigned using consistent criteria (such as those used by The Tree Register in the UK).

Leaf cycle is useful too (e.g., for rendering, or for mappers verifying a tree, who then don’t need to be able to identify the species). With the exception of Yews (Taxus) & Cedars (Cedrus) most seem to be deciduous.

Species remarks:

Either Platanus hispanica or Platanus acerifolia, not Platanus hispanica (acerifolia). You should use whichever of the two names is preferred in the Netherlands, which I think is acerifolia, whereas *hispanica* is currently the name used in the UK literature.
Cultivar names are problematic in the species key: e.g., Carpinus betulus ‘Fastigiata’. It is better to populate taxon with this value and keep species for the species Carpinus betulus. For trees with only a genus name & a cultivar name, taxon and genus are the appropriate keys.
Adding genus may appear to be redundant but often improves data quality, insofar as most misidentifications are still correct at genus level. Because the species key often contains lots of values which are not valid specific names, parsing species to find the genus is not as foolproof as one may wish.
*Populus* x *canadensis* or Populus canadensis. The former is preferred.
A couple of other hybrid trees I noted: Alnus spaethii should be Alnus x spaethii; Populus canescens should be Populus x canescens; Tilia europea should be Tilia x europea; and according to my latest botanica source Platanus should have an x too. Probably not too important in the latter cases as these are common trees, but relevant in the first case. The “x” is easy to remove in parsing, but not easy to add. Probably best to stick to names widely used by https://waarneming.nl/ or the most common existing values on OSM.
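The key split described in these remarks can be sketched as follows. This is only an illustration under stated assumptions: the helper names `name_to_tags` and `without_hybrid_marker` are hypothetical, cultivar names are assumed to appear in quotes at the end of the name, and the hybrid marker is assumed to be a standalone “x” or “×”.

```python
import re

# Assumption: a cultivar name is quoted (straight or curly quotes)
# and sits at the end of the botanical name.
CULTIVAR = re.compile(r"\s*['‘’]([^'‘’]+)['‘’]\s*$")


def name_to_tags(name: str) -> dict:
    """Split a botanical name into genus / species / taxon tags (hypothetical helper)."""
    name = " ".join(name.split())  # normalise whitespace
    match = CULTIVAR.search(name)
    base = name[:match.start()].strip() if match else name
    parts = base.split()
    tags = {"genus": parts[0]}
    if len(parts) > 1:
        # Keep the hybrid "x" in species; it is easy to drop later, hard to add.
        tags["species"] = base
    if match:
        # The full name, cultivar included, goes into taxon.
        tags["taxon"] = name
    return tags


def without_hybrid_marker(species: str) -> str:
    """Drop a standalone hybrid marker, e.g. for matching against unmarked names."""
    return " ".join(p for p in species.split() if p not in ("x", "×"))


if __name__ == "__main__":
    for example in ("Carpinus betulus ‘Fastigiata’", "Populus x canadensis"):
        print(example, "->", name_to_tags(example))
```

Storing the hybrid form and stripping the marker only when comparing names follows the point that the “x” is easy to remove but not easy to add.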

None of this of course answers how you approach an import, but this volume is an import. Verifying the individual trees takes around a minute each in my experience, so catching issues at this point is notionally valuable even if you have no immediate expectation of any field verification occurring.

JeroenHoek · March 13, 2019, 8:12am

Thanks for the hints regarding the species naming. The source data contained a number of mistakes that I have already fixed (and reported upstream); this will help improve the data.

As I’ve mentioned above, the intent of this data set is not to import it verbatim. Local mappers intend to use the data from within JOSM to augment their mapping. So when we work on a couple of streets or a small park, we have the trees ready to copy over after confirming their presence. That is, tens of trees at a time, not 2000.

The local municipality considers these monumental. Should I use monument=yes instead?