Following the discussion in this post, I’d like to suggest an import of data for German wind power plants from the “Marktstammdatenregister” (MaStR) data set, which was approved as a data source for OSM in 2023. The import is currently in the planning stage and I’ll link the proposal later on (probably tomorrow or so). Basically, the dataset contains every detail of every generator/power plant that feeds energy into the German power grid.
As the dataset is really large and rather complex, I’d like to limit myself to improving data quality for already mapped onshore wind power plants.
I’ve written some custom software to download and process the data before importing.
The idea would be to first import ref:mastr (the unique ID of every generator in DE) by matching already mapped OSM objects and confirming these matches with existing tags where possible. After that, importing missing tags like power values, height etc. will be very easy, either automatically or by hand. The goal is not to cover everything, but to improve data quality where it’s reasonably “easy” to do so without too much headache. Improving already existing tags, e.g. those with minor differences, will probably need further discussion later on.
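To illustrate the "add missing tags only" idea, here is a minimal sketch. It is not taken from the import code; the function name `add_missing_tags` and the example values (including the placeholder ID) are made up for illustration. It only adds keys that are absent and reports differing values as conflicts instead of overwriting them.

```python
def add_missing_tags(osm_tags, proposed_tags):
    """Return (merged_tags, conflicts) without overwriting existing values.

    osm_tags      -- tags currently on the OSM object
    proposed_tags -- tags derived from the external data set (hypothetical input)
    """
    merged = dict(osm_tags)  # copy, so the input is not modified
    conflicts = {}
    for key, value in proposed_tags.items():
        if key not in merged:
            # Tag is missing in OSM: safe to add.
            merged[key] = value
        elif merged[key] != value:
            # Tag exists with a different value: keep OSM, report for review.
            conflicts[key] = (merged[key], value)
    return merged, conflicts


# Example with made-up values; the ref:mastr value is a placeholder.
existing = {"power": "generator", "manufacturer": "Enercon"}
proposed = {"ref:mastr": "SEE000000000000", "manufacturer": "ENERCON GmbH"}
merged, conflicts = add_missing_tags(existing, proposed)
```

In this example `ref:mastr` would be added, while the differing manufacturer spelling ends up in `conflicts` for later discussion.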
Of the ~29k currently active wind turbines (> 300 kW) in the MaStR and the ~31k mapped wind turbines in OSM, I was able to generate ~26.5k matches within 50 m based on coordinates. Then I looked at how many of these matches can be confirmed with existing tags, i.e. the ones which, I think, could readily be imported without conflicts:
- ~3k based on turbine model
- ~3.6k based on rotor diameter
- ~3.5k based on hub height
- ~4.5k based on power values in kW
- ~6k based on power values in MW
- ~2k based on start_date
- all of the above, minus duplicates: ~12.6k
- ~12k based on manufacturer
- all of the above minus duplicates: ~15.7k
- without any confirming tag: ~11k
These numbers are based on strict identity matching, which I am working on relaxing for some of the tags, e.g. model; this should improve the numbers significantly. I’d like to avoid matching on manufacturer only where possible, because that might not be very robust, e.g. with repowering.
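For readers who want to reproduce the coordinate matching, a minimal sketch of a nearest-neighbour match within 50 m could look like this. It is an illustration only, not the code from the repository: the record layout (`id`, `lat`, `lon`), the function names and the brute-force search are assumptions, and a real run over ~30k objects would probably want a spatial index.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_by_distance(mastr_units, osm_turbines, max_dist_m=50.0):
    """Map each MaStR unit id to (osm id, distance) of the nearest turbine.

    Units without any OSM turbine within max_dist_m are left out.
    Brute force: every unit is compared with every OSM turbine.
    """
    matches = {}
    for unit in mastr_units:
        best = None
        for osm in osm_turbines:
            d = haversine_m(unit["lat"], unit["lon"], osm["lat"], osm["lon"])
            if d <= max_dist_m and (best is None or d < best[1]):
                best = (osm["id"], d)
        if best is not None:
            matches[unit["id"]] = best
    return matches
```

Note that this sketch does not prevent two MaStR units from matching the same OSM object; such cases would need a separate check.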
There are cases where OSM tags are formatted in non-standard ways, such as dates or wrong separators in power values. And then there are many different spellings of the same manufacturer. I’m currently looking at how to correct, or rather improve, these either manually or automatically before importing, depending on the impact that might have on confirming matches.
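As an illustration of what such a normalisation could look like, here is a small sketch that converts power values to kW (accepting a decimal comma) and rewrites German-style dates to ISO format. The function names and the set of accepted formats are assumptions for this example; anything that does not parse returns `None` so it can be reviewed by hand.

```python
import re
from datetime import datetime

# Conversion factors to kW. Units are compared case-insensitively,
# so "mw" is treated as megawatt, which is the sensible reading here.
_UNIT_TO_KW = {"w": 0.001, "kw": 1.0, "mw": 1000.0, "gw": 1_000_000.0}
_POWER_RE = re.compile(
    r"^\s*([0-9]+(?:[.,][0-9]+)?)\s*([kmg]?w)\s*$", re.IGNORECASE
)


def parse_power_kw(value):
    """Parse strings like '3 MW', '3000 kW' or '2,5MW' into kW, else None."""
    m = _POWER_RE.match(value)
    if m is None:
        return None
    number = float(m.group(1).replace(",", "."))  # accept decimal comma
    return number * _UNIT_TO_KW[m.group(2).lower()]


def normalize_date(value):
    """Return YYYY-MM-DD for ISO or DD.MM.YYYY input, else None."""
    for fmt in ("%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

With both sides normalised like this, a value tagged as `2,5 MW` and one tagged as `2500 kW` compare as equal, which is the kind of relaxed matching that should confirm more of the coordinate matches.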
Use of ref:mastr seems to have increased recently, and in preparation I’d also rename ref:MaStR to ref:mastr where it was used instead.
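The key rename itself is simple; a sketch under the assumption that tags are held as a plain dict (the function name `rename_ref_key` is made up) might be:

```python
def rename_ref_key(tags, old="ref:MaStR", new="ref:mastr"):
    """Return (tags, changed). Renames `old` to `new` unless they conflict."""
    if old not in tags:
        return dict(tags), False
    if new in tags and tags[new] != tags[old]:
        # Both keys present with different values: leave as is for manual review.
        return dict(tags), False
    renamed = {k: v for k, v in tags.items() if k != old}
    renamed[new] = tags[old]
    return renamed, True
```

Objects carrying both keys with different values are deliberately left untouched, since those would need a manual look.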
The code is available on GitHub. This includes some generated maps where the matches (from the above example) can be compared visually to OSM data. Feel free to have a look at these maps, the code or some other part of the data set and report any issues. And of course I’ll very much welcome help with this import later on, too.
If there are questions already, feel free to ask, so that I can include them in the writeup or we can discuss them here.