Identifying Importable Datasets From Sheffield City Council Open Data

The Sheffield City Council Open Data website contains many datasets that would be excellent candidates for importing into OSM. I wanted to begin identifying which datasets would be suitable for an import and to classify them by priority.

High Priority (Direct Import Potential)

| Dataset | OSM Utility & Drawbacks | Potential Associated Tags | License |
| --- | --- | --- | --- |
| Benches on the highway network | Excellent for micro-mapping infrastructure. Misses benches on private property unfortunately. | amenity = bench | OGLv3 |
| Street Lights on the highway network | Huge time saver for mapping street lights. Worried about alignment with the road network though. | highway = street_lamp | OGLv3 |
| Public Toilets | Huge amount of incredibly useful data. | amenity = toilets | OGLv3 |
| Litter Bins on the Highway Network | Highly useful for micro-mapping. | amenity = waste_basket | OGLv3 |
| CCTV and Traffic Cameras | Locations of cameras. | man_made = surveillance or highway = traffic_signals + camera:type = * | OGLv3 |
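
To give a concrete idea of what preparing one of these for import might involve, here is a rough sketch of converting a benches record into an OSM-ready point. The column names, file name, and the assumption that the coordinates are in British National Grid are guesses on my part, not something I have checked against the actual CSV yet:

```python
# Rough sketch: convert one row of a (hypothetical) benches CSV into an
# OSM-ready point. Column names, file name, and the assumption that the
# coordinates are British National Grid (EPSG:27700) are all guesses.
import csv
from pyproj import Transformer

# British National Grid -> WGS84, since council asset data is often in OSGB.
transformer = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)

def bench_row_to_node(row):
    """Return (lat, lon, tags) for one dataset row, ready for review in JOSM."""
    lon, lat = transformer.transform(float(row["EASTING"]), float(row["NORTHING"]))
    tags = {
        "amenity": "bench",
        "source": "Sheffield City Council Open Data",
    }
    return lat, lon, tags

with open("benches.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(bench_row_to_node(row))
```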

Medium Priority (Import or Tracing/Verification Potential)

| Dataset | OSM Utility | Potential Associated Tags | License |
| --- | --- | --- | --- |
| Libraries | Unlikely to need an import as there aren’t many in the dataset, but still excellent for verification purposes. | amenity = library | OGLv3 |
| Bring Recycling Sites | Locations of recycling facilities (e.g., bottle banks). | amenity = recycling | INSPIRE |
| Household Waste Recycling Centres | Locations of waste management services. | amenity = recycling | INSPIRE |
| Parks & Countryside Service Sites | Boundaries of parks, woodlands, playgrounds, and sports sites. Useful for checking and improving existing boundaries, though it will need special care. | leisure = park, natural = wood, leisure = pitch | None listed |
| Historic Parks And Gardens | Boundaries for historic sites. | leisure = park, historic = yes | OGLv3 |
| Conservation Areas | Boundaries of protected historical/environmental areas. Useful for context, but usually mapped as OSM relations. | boundary = protected_area, protection_title = Conservation Area | INSPIRE |
| Ancient Monument | Locations of protected heritage features. | historic = monument | OGLv3 |
| Listed Buildings | Locations of listed buildings. | listed_status = * | OGLv3 |
| Local Nature Reserve | Boundaries of protected nature sites. | leisure = nature_reserve, boundary = protected_area | None listed |
| Public Rights Of Way Working Copy | Paths that should be mapped as rights of way. Caution is needed as it’s a working copy, not the Definitive Map. | highway = path, designation = public_footpath or designation = public_bridleway | OGLv3 |

License

Some of the data is under the INSPIRE license or doesn’t have a license listed at all. Those datasets will need contact with the council to determine whether the data can be re-released under OGL or otherwise licensed compatibly with OSM. Everything under OGLv3 looks ready to go, but will still need care when importing, as well as proper attribution.

Feedback, Help and Discussion

From here I need feedback on this list of datasets: where do the community’s priorities lie in Sheffield, and have I missed any important datasets from the Sheffield City Council Open Data site?

It would also be a massive help if more experienced OSM members were willing to lend a hand evaluating the import potential and the problems associated with each dataset. Feedback and discussion are greatly appreciated!

I used the grit bin data a while ago; it was good. I considered using the waste bin data, but it was questionable around The Moor, so I wasn’t sure about the age of the data. I can’t remember if that was the show-stopper, or the technical side, or because I moved away.

The grit bin data has been incredibly useful in my experience surveying around Sheffield using StreetComplete! Thank you for the work you put into that! I want to see if it’d be possible to set up a semi-automated import from the live data for grit bins at some point in the future.

As someone with experience doing imports in Sheffield, what would be your recommendation for evaluating these datasets for import viability?

Many of these datasets will probably need some form of manual conflation rather than just a straight import, but it would be great if people can make use of them. Any sort of conflation/matching and ongoing maintenance is much easier if there is a suitable ID for each object in the dataset. It’s also helpful if the datasets are complete (in the sense that they include all objects of a certain type in that area), for then you can look for unexpected objects in OSM as well as missing ones.
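
For example, a minimal ID-matching sketch might look like this (the `asset_id` column and the `ref:sheffield` tag are made-up names, just to illustrate the idea):

```python
# Minimal sketch of ID-based matching. The "asset_id" column and the
# "ref:sheffield" tag are illustrative names, not taken from real data.
import csv
import json

def dataset_ids(csv_path):
    """IDs present in the council dataset."""
    with open(csv_path, newline="") as f:
        return {row["asset_id"] for row in csv.DictReader(f)}

def osm_ids(overpass_json_path):
    """IDs present in OSM, from a saved Overpass query result (JSON output)."""
    with open(overpass_json_path) as f:
        elements = json.load(f)["elements"]
    return {e["tags"]["ref:sheffield"]
            for e in elements
            if "ref:sheffield" in e.get("tags", {})}

council = dataset_ids("litter_bins.csv")
osm = osm_ids("osm_waste_baskets.json")

print("In the dataset but not OSM (candidates to add):", sorted(council - osm))
print("In OSM but not the dataset (worth checking on the ground):", sorted(osm - council))
```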

If anyone is interested in using the Public Rights of Way data, do let me know, as I’d be happy to add it to my comparison tool at https://osm.mathmos.net/prow/progress/ if it would be useful to someone.

By “live data” do you just mean updated data, or is there something more available?

It would be interesting to compare current OSM to the new data, to see where OSM has changed since then and whether the council has been adding, removing or moving them.

Definitely check the data out before doing anything: check your local ones and see whether they are up to date and geospatially accurate. Consider each dataset separately; they may be made by different people, made years apart, and have different attributes.

Some of this data won’t be good enough; sometimes it’s just part of the dataset, sometimes it’s the whole dataset. That’s fine: move on, don’t be a completionist.

Just looking at the “Public Toilets” dataset, it is so small (41 locations) that you should do it all manually. However, I spot-checked one (100052216711) and it seems like the toilets are long gone. The metadata says “20 February 2025”, but that’s just when the file was uploaded; the data must be older. This is pretty typical: my new council has a toilets dataset that is really outdated too. Do manual surveys/open notes, but don’t trust the dataset.
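
Something like this is enough for a first pass at spot-checking (just a sketch; the `LAT`/`LON` column names are a guess at the dataset’s format):

```python
# Sketch of a spot-check helper: for each council toilet record, ask Overpass
# whether OSM already has amenity=toilets nearby, and flag the rest for a
# survey or a note. The "LAT"/"LON" column names are a guess at the format.
import csv
import time
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def has_nearby_toilets(lat, lon, radius_m=75):
    query = f"""
    [out:json][timeout:25];
    nwr(around:{radius_m},{lat},{lon})["amenity"="toilets"];
    out count;
    """
    response = requests.post(OVERPASS_URL, data={"data": query}, timeout=30)
    response.raise_for_status()
    counts = response.json()["elements"][0]["tags"]
    return int(counts.get("total", "0")) > 0

with open("public_toilets.csv", newline="") as f:
    for row in csv.DictReader(f):
        lat, lon = float(row["LAT"]), float(row["LON"])
        if not has_nearby_toilets(lat, lon):
            print(f"Nothing in OSM near {lat},{lon} - survey or open a note")
        time.sleep(2)  # be gentle with the public Overpass API
```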

The “Litter Bins on the Highway Network” dataset has a “startdate” attribute; presumably the newer ones will be more trustworthy than the older ones, but check it out. Have a survey and see.

Oh, nothing actually real-time, but some of those datasets are supposedly updated nightly. The idea is to monitor them for changes, with some automation to identify anything that has changed and generate a changeset that can be reviewed and then pushed to OSM, so it doesn’t become an import that needs re-doing every time the data changes.
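
As a first pass, the nightly monitoring could be as simple as fingerprinting each snapshot and diffing it against the previous one, with a human reviewing the result before anything goes near OSM. This is only a sketch, and the `asset_id` column is a guess at a stable identifier:

```python
# Sketch of nightly change detection between two snapshots of a dataset.
# It only produces a report for human review; nothing is uploaded to OSM
# automatically. The "asset_id" column is a guess at a stable identifier.
import csv
import hashlib
import json

def fingerprint(csv_path):
    """Map each asset_id to a hash of its whole row, so any attribute change shows up."""
    prints = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            digest = hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()
            prints[row["asset_id"]] = digest
    return prints

old = fingerprint("grit_bins_yesterday.csv")
new = fingerprint("grit_bins_today.csv")

report = {
    "added": sorted(set(new) - set(old)),
    "removed": sorted(set(old) - set(new)),
    "changed": sorted(i for i in set(old) & set(new) if old[i] != new[i]),
}

# A human reviews this report before anything is prepared as an OSM edit.
with open("grit_bin_changes.json", "w") as f:
    json.dump(report, f, indent=2)
```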

And thank you for all the tips! I’ll publish updates here when I get round to surveying the accuracy of the different datasets, and I absolutely need to keep my perfectionism in check.