Sorry, I haven’t read all the replies as I’ve been looking at the data. Broadly it looks good:
Basic values are roughly in expected ranges (but note 0 in the pflanzejahr field):
Suggestions of some outliers, which need to be checked (circumference cf. height):
More detailed breakdown with potential outliers (sorry graph labels are slightly offset):
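For anyone wanting to repeat these two checks outside R, here is a rough Python sketch. Only pflanzejahr is a field name taken from the data; circumference_cm and height_m are placeholder column names I have made up, and the median/MAD rule with k=5 is just one arbitrary way to pick candidates for a manual look, not the method behind the graphs above.

```python
from statistics import median


def flag_suspect_rows(rows, year_field="pflanzejahr",
                      circ_field="circumference_cm",  # hypothetical column name
                      height_field="height_m",        # hypothetical column name
                      k=5.0):
    """Return (row index, reason) pairs for rows worth checking by hand."""
    flagged = []
    ratios = []
    for i, row in enumerate(rows):
        # A planting year of 0 is almost certainly a placeholder for "unknown".
        if int(row[year_field]) == 0:
            flagged.append((i, "planting year is 0"))
        circ = float(row[circ_field])
        height = float(row[height_field])
        if circ > 0 and height > 0:
            ratios.append((i, circ / height))

    if ratios:
        values = [r for _, r in ratios]
        med = median(values)
        # Median absolute deviation: a spread estimate that outliers barely move.
        mad = median(abs(v - med) for v in values)
        for i, r in ratios:
            if mad > 0 and abs(r - med) / mad > k:
                flagged.append((i, "circumference/height ratio far from median"))
    return flagged
```

Rows can come straight from csv.DictReader. The flagged rows are only candidates: a very thick, short tree may simply be an old pollard.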
I used the data as CSV, imported into R for some basic cross-checks. I think an actual geographical location check has already been mentioned (by @milet). An existence check should be carried out on a sample (at least 200 trees, I’d suggest) and should involve basic checks that the other values are correct. On the Birmingham tree dataset I did this for just under 200 trees, or 1% of the total, and the basic error rate was around 6%. The oddest case was a few dozen trees which did not belong to the city, but most errors were trees which had died, been replaced, or been removed. There were also species errors (e.g. Turkish Hazel, a tree, recorded as ordinary Hazel (Nussbaum)), but these were mainly obvious on inspection.
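If it helps, drawing the sample and summarising the result could look something like the sketch below. The function names, the fixed seed and the normal-approximation interval are my own choices for illustration, not how the Birmingham check was actually done.

```python
import math
import random


def draw_sample(tree_ids, n=200, seed=42):
    """Pick n distinct tree ids at random; the fixed seed makes the list repeatable."""
    ids = list(tree_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))


def error_rate(checked):
    """checked maps tree id -> True if the survey found any error for that tree.

    Returns (rate, lower, upper), where lower/upper are an approximate
    95% interval from the normal approximation to the binomial.
    """
    n = len(checked)
    errors = sum(1 for has_error in checked.values() if has_error)
    p = errors / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

With 200 trees and a 6% error rate the interval is wide, roughly 3% to 9% under this approximation, so a sample of this size gives a feel for data quality rather than a precise figure.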
Remarks on tags:
Keeping genus is extremely useful for general use and, as @streckenkundler remarks, it is likely to be reliable even when the species may be incorrect. In my experience this is particularly true with rare/unusual trees in a genus which were not familiar to the person entering the data, who then assigned them to a common species.
I would retain sorten and follow the convention used in the Vienna tree import of taxon:cultivar='Globosum' for sorten_latein="Acer platanoides 'Globosum'", etc. Many street trees will be chosen from particular cultivars for aesthetic or practical reasons, and some (e.g., Populus nigra ‘Italica’) are familiar to most people. The entire construction can be placed in the taxon tag if desired.
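Pulling the cultivar out of sorten_latein should be fairly mechanical. A minimal sketch, assuming the cultivar is always the final quoted part of the value (I have not checked this against every value in the data); the function name is made up:

```python
import re

# Base name, then a cultivar in straight or curly single quotes at the end.
CULTIVAR_RE = re.compile(
    r"^\s*(?P<base>.+?)\s*['‘’](?P<cultivar>[^'‘’]+)['‘’]\s*$"
)


def split_cultivar(sorten_latein):
    """Return (base name, cultivar); cultivar is None when nothing is quoted."""
    match = CULTIVAR_RE.match(sorten_latein)
    if match is None:
        return sorten_latein.strip(), None
    return match.group("base"), match.group("cultivar")
```

For "Acer platanoides 'Globosum'" this gives ("Acer platanoides", "Globosum"); the second element would go into taxon:cultivar. Values that do not match the pattern come back unchanged, so they can be listed and looked at by hand.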
A few remarks about taxonomy of the trees (not exhaustive):
- Sophora is an old name and the current generic name is Styphnolobium
- Sorbus might be problematic (although I don’t think so) because it was split recently into 7 different genera. Although widely adopted in the UK, at least, the experts believe this change is premature.
- Genera which are mainly shrubs should be checked, for example Cornus, Corylus, Euonymus, Sambucus and possibly Rhus (Essigbaum). Salix may also include shrubs.
- Mischbestand (fortunately only 1) and those with genus absent need to be treated with caution. In the UK one often finds things like “new tree”, “empty pit” in these entries.
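The points above could be turned into a simple review list, along these lines. This is only a sketch: the genus sets just repeat the names mentioned above, I am assuming Mischbestand appears as a value in the genus field, and the function name is made up.

```python
SHRUBBY_GENERA = {"Cornus", "Corylus", "Euonymus", "Sambucus", "Rhus", "Salix"}
RENAMED_GENERA = {"Sophora": "Styphnolobium"}
SPLIT_GENERA = {"Sorbus"}


def review_notes(genus):
    """Return a list of reasons why a genus value deserves a manual look."""
    value = (genus or "").strip()
    if not value:
        return ["genus absent"]
    if value.lower() == "mischbestand":
        return ["Mischbestand: not a single taxon"]

    name = value.capitalize()  # tolerate SALIX / salix in the source data
    notes = []
    if name in SHRUBBY_GENERA:
        notes.append("genus is mainly or partly shrubs")
    if name in RENAMED_GENERA:
        notes.append("old name, current genus is " + RENAMED_GENERA[name])
    if name in SPLIT_GENERA:
        notes.append("genus recently split")
    return notes
```

An empty list means nothing was flagged by these rules, not that the entry is correct.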
For wikidata tags I would recommend using one corresponding to the actual taxon entered (e.g., if Sophora is retained, use the wikidata item for Sophora japonica). Wikidata handles current synonymies reasonably well, and this allows better cross-referencing to the original data. This can be important because at a national level the accepted scientific name of a species may be different from that advocated by places like Wikipedia, IPNI or iNaturalist. Often there are good reasons for this, and it helps anyone used to the regular names in floras (e.g., Rothmaler) and other works.
Hope this helps
PS. I’ll look at species & cultivar values and add anything else which crops up in this thread.