alltheplaces.xyz - (Businesses / Opening Hours / contact attributes). Also discussed a number of times: with careful guardrails, some of this data is useful when coupled with human review. Solves a bit of "where is an X" and "when can I go to X?".
Building footprints - Microsoft/RapiD for example. The main use case is 3D maps, or disaster response (e.g. HOTOSM mapping buildings for the Ebola outbreak, etc.)
LIDAR measurements of building height - much less widely available, but often collected by aerial mapping companies and occasionally permissively licensed by local governments.
Public amenities or facilities - e.g. council or state governments publishing public parks, forestry, playgrounds, fire stations, police stations, bike parking, public toilets or even public picnic/BBQ spots
GTFS transit data from a variety of agencies is generally useful, though OSM tends to lag
Pushing this further: what's the next level of data that could be inferred from open data and made easier for a human to review, validate and add, versus the value it provides?
Mapillary doing street sign image recognition (mixed results, but actually useful with a human in the loop using it simply as a detection signal; adds speed limit data or a stop sign with the embedded tools in iD)
There are plenty of worldwide or local datasets that would deserve to be compared with OSM, with a view to gardening our own lawn. Anyone can raise needs here; the list could be pretty long.
I want to advocate for a methodology for handling these situations, to ensure quality in the long run.
Whatever the dataset may be, we need a systematic QA approach so that we can grow together. Osmose QA is my preferred automation toolchain for that. It already provides a reviewed and shared framework for dealing with external datasets and getting warnings about what is missing in OSM, without going for a full import, instead encouraging individual review of each warning.
Globally, I guess the highest-value yet still incomplete data for map consumers includes accurate and comprehensive addresses, detailed road networks with surface and condition information, up-to-date business listings, pedestrian and cycling infrastructure, indoor maps, and reliable opening hours and accessibility details. But we already know all that.
Isn't the real "next level" simply the data that already exists in public databases but hasn't been released under an OSM-compatible license yet - like complete address datasets - so the real innovation would just be finally being allowed to use what's already there?
A complete map?
Updates. Updates, oh, and: updates. How to ensure quality control, prevent vandalism, and deal with outdated import data and licensing issues.
Automatic conflation checks, rollback plans, monitoring, reviews, documentation, and so on.
In my opinion, anything that is difficult to achieve with human-scale best efforts should be kept elsewhere, not in OSM, because if it cannot be added by humans, then humans will likewise not be able to sustain it and keep it accurate. We should make it easier for people to "mix in" third-party sources when they render maps, instead of mixing these third-party sources into OSM.
The task of doing these comparisons or value-adds is best placed in the end-user mapping application. We can think about how to make data joins easier, but the selection of which data joins to do isn't something that OSM itself should be spending too much brain power on.
With the caveats stated by others, I'd strongly encourage this brainstorming exercise at a more local level. Many of us are fortunate to hail from localities where there are half-decent local datasets for a variety of themes, often published by government agencies under licenses that are suitable (or almost there, just need to ask). A local community can decide priorities relevant to them and their neighbors; that will motivate them to diligently clean up and maintain the data, working towards making OSM the gold standard for coverage of that feature type in that locality. Involving local community members early in the planning process will hopefully make them more invested in the outcome.
In this global forum, we can trade notes about strategies that have worked well in our communities. For example, thanks to the particular datasets available in my locality, we were able to import buildings and parcel-derived addresses, then use that coverage as a foundation for importing points of interest. Based on a foundation of imported streets and sidewalks, we were able to map crosswalks, bike lanes, and bus stops in larger numbers than we would've been able to otherwise. The procedures and code for each of these initiatives are available as open source, potentially enabling other communities to benefit from our experience.
External datasets can be valuable at a broader scale too, but the global community has less control over specific outcomes. In fact, national or global bulk imports could easily preempt more careful local imports by burdening local, less resourced communities with the difficult task of conflation. So at that scale, external datasets are more valuable for less invasive purposes, such as gauging completeness and flagging discrepancies.
One of the main draws of OSM for data users is that it's a single global dataset that does not require puzzling together hundreds of local sources from various open data portals. While a strategy of letting others do the mixing allows us to avoid dealing with that challenge ourselves, it makes OSM less attractive than it could be. There's also the issue that integrating many of these datasets will require at least some manual work.
While it's not an easy problem, I think the OSM community would benefit from improving tools to make it easier for mappers to use external sources. That way, we would raise the bar of what's possible to achieve with human-scale best efforts. Much like how the availability of aerial imagery means that each mapper can now achieve more than when all they had were GPS tracks.
And, of course, these tools and processes should ideally be reusable to avoid each local community having to roll their own.
A word of caution: many mappers, including me, have a tendency to import objects and details just because they are available. A possible (future) use case is very easy to conceive for practically any dataset. Currently, import and conflation are a PITA, which prevents a lot of dataset hoarding. Making import and conflation easier will probably also increase the prevalence of this kind of data in OSM.
Maybe it's important to think about the "add value to the map" part.
One of the main draws of OSM for data users is that it's a single global dataset that does not require puzzling together hundreds of local sources from various open data portals. While a strategy of letting others do the mixing allows us to avoid dealing with that challenge ourselves, it makes OSM less attractive than it could be.
I am thinking of something like the MS building footprints. The dataset is too low-quality to "just import it", but some commercial providers have chosen to render those buildings into their maps nonetheless.
With vector tiles it would be relatively easy for someone to publish a vector tile dataset that has "all MS building footprints that do not intersect with an OSM building", allowing someone who builds a web map to integrate these buildings into their map with just a line of code or two. The work required to publish this "add-on" dataset is a fraction of the work required to make a good import, and the existence of such an "add-on" dataset would relieve the pressure that drives some people to drop our quality standards and import low-quality building outlines wholesale in their region.
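Just to illustrate what "a line of code or two" could look like on the consumer side - the tile URL, source-layer name and layer ids below are invented - mixing such an "add-on" tileset into a MapLibre GL JS map might be roughly:

```ts
import maplibregl from "maplibre-gl";

const map = new maplibregl.Map({
  container: "map",
  style: "https://example.com/osm-style.json", // the consumer's existing OSM-based style
  center: [4.35, 50.85],
  zoom: 15,
});

map.on("load", () => {
  // Hypothetical tileset: MS footprints that do not intersect any OSM building.
  map.addSource("ms-footprint-addon", {
    type: "vector",
    tiles: ["https://example.com/ms-addon/{z}/{x}/{y}.pbf"],
    minzoom: 13,
    maxzoom: 16,
  });

  // Draw the extra footprints beneath the OSM building layer, so OSM data stays visually authoritative.
  map.addLayer(
    {
      id: "ms-addon-buildings",
      type: "fill",
      source: "ms-footprint-addon",
      "source-layer": "buildings", // assumed layer name inside the add-on tiles
      paint: { "fill-color": "#d9cfc3", "fill-opacity": 0.6 },
    },
    "osm-buildings" // assumed id of the existing OSM building layer in the style
  );
});
```

The heavy lifting (the "does not intersect an OSM building" filter) would happen once, on the publisher's side, not in every map that consumes the tiles.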
I am sure similar approaches are possible elsewhere.
While it's not an easy problem, I think the OSM community would benefit from improving tools to make it easier for mappers to use external sources.
As long as it doesn't make it so easy that it essentially becomes a cloaked import (and when challenged, the puzzled mapper says "I just clicked on the big flashing 'improve the map' button in my editor"), that's fine with me.
While we see a lot of potential for accelerating and improving power mapping via QA (as @infosreseaux suggested), automation and tooling, I agree with @woodpeck that we should not bulk-upload data to OpenStreetMap without the human capacity to maintain it. The high quality and global coverage of the transmission grid and power plants was only possible thanks to the human-in-the-loop approach.
This does not mean that we cannot empower these mappers with better tools, training and "hint" datasets, but ultimately a human should validate and create the data, using open imagery and publicly verifiable information as a source. What is really needed is human capacity building on the mapping side.
Unfortunately, this is becoming increasingly difficult as many false prophets are telling people all around the world that AI will automate most software and data work. Much of today's impressive AI technology would not be possible without access to large amounts of high-quality open data and open-source resources. However, these resources are becoming smaller and smaller over time because most AI use cases do nothing to support the resources they exploit.
One area in which mappers could be given more power is the mapping of rooftop and ground-based solar farms. As part of MapYourGrid, we have identified multiple AI-generated datasets around the world. Most of these AIs used OpenStreetMap data to train their detection algorithms. Feeding this data back into OSM could create significant value, as in many countries the rapid growth of rooftop solar and solar farms has created a situation where grid operators and even governments don't know what is actually out there, making energy planning extremely challenging. This also includes use cases like the solar nowcasting @CloCkWeRX mentioned. Here is some data we have found that still needs to be harmonized:
The challenge for us is to help data consumers appreciate the value that our data brings beyond consuming that external dataset wholesale. For example, Mapbox used to include OSM address data in the U.S. but later switched to some other dataset, probably proprietary. At just that time, my local community was carrying out a very careful import of addresses that avoided the systematic errors found in other datasets' coverage of the area, including Mapbox's OSM replacement.
To us, it felt kind of like dumping OSM's streets in favor of TIGER. Yuck! But whoever made that decision wasn't thinking about one city. They wanted the assurance of coverage everywhere in the country, even rural areas far away from any local mapping community, even if it meant inferior coverage in some urban areas. OSM didn't have enough of a critical mass of address coverage to justify figuring out address conflation (which is harder than building conflation). OSM does have that critical mass of buildings, but we wouldn't if we had been more conservative about building imports and tasking manager projects. We know that our value goes beyond the raw numbers, but it can be difficult to make that case to developers and product managers who are focused on their own selling points.
By the way, if I'm not mistaken, publishing a conflated or intersected dataset of OSM buildings is permissible but could trigger the ODbL's share-alike provisions depending on how it's done. Of course the Microsoft building dataset is also under the ODbL, and the other mentioned external datasets are presumably compatibly licensed; otherwise we wouldn't be discussing them at all. But the overall message encouraging data consumers to join datasets would need to be careful not to incentivize violations of the license.
GTFS transit data from a variety of agencies is generally useful, though OSM tends to lag
The https://transitous.org/ project exists to create a worldwide open database of links to GTFS files. Instead of importing data from all of them, I would see it in a good light if, in the future, there were a way to more properly link OSM and Transitous (especially stop and platform/quay locations, for accurate pedestrian routing).
The same applies to similar open projects. I think that we should strive to create an interoperable ecosystem of them, which could then allow for the creation of truly complete and open map clients by combining these linked databases.
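As a rough illustration of what such stop-level linking could look like - the 30 m threshold and the flat Stop shape are assumptions, and every proposed match would still need human review - something like:

```ts
interface Stop {
  id: string;
  name: string;
  lat: number;
  lon: number;
}

// Haversine distance in metres between two WGS84 points.
function distanceMeters(a: Stop, b: Stop): number {
  const R = 6371000;
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// For each GTFS/Transitous stop, propose the nearest OSM stop within maxDist metres
// as a link candidate; everything else is left for a human to look at.
function proposeLinks(gtfsStops: Stop[], osmStops: Stop[], maxDist = 30) {
  return gtfsStops.map((g) => {
    let best: Stop | null = null;
    let bestDist = Infinity;
    for (const o of osmStops) {
      const d = distanceMeters(g, o);
      if (d < bestDist) {
        best = o;
        bestDist = d;
      }
    }
    return best && bestDist <= maxDist
      ? { gtfs: g.id, osm: best.id, distance: Math.round(bestDist) }
      : { gtfs: g.id, osm: null, distance: null };
  });
}
```

In practice one would also compare names and refs, not just distance, but the idea is the same: produce link candidates for review, not edits.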
This is a good example where part of the data can be useful (stop locations) while another part is utterly out of scope (daily changes to timetables) and should be integrated by data consumers.
I would rather look for human-scale drudgery that can be supported or partially automated by using extra datasets.
For example, an automated notification that a new road was built and is now visible on aerial imagery and can be traced.
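A very rough sketch of what such a notification could build on - the input shape and the 30 m radius are made-up assumptions - would be to take road segments published in an external open dataset and ask Overpass whether OSM already has a highway nearby; segments with no match become "please trace me" hints:

```ts
// Midpoint of a road segment from an external open dataset (hypothetical shape).
interface RoadSegment {
  id: string;
  lat: number;
  lon: number;
}

// Ask the public Overpass API whether any OSM highway exists within `radius` metres.
async function hasNearbyOsmHighway(seg: RoadSegment, radius = 30): Promise<boolean> {
  const query = `[out:json][timeout:25];way(around:${radius},${seg.lat},${seg.lon})["highway"];out ids;`;
  const res = await fetch("https://overpass-api.de/api/interpreter", {
    method: "POST",
    body: query,
  });
  const data = await res.json();
  return data.elements.length > 0;
}

// Collect segments that look unmapped; these become hints for a human to trace.
async function findUnmappedRoads(segments: RoadSegment[]): Promise<RoadSegment[]> {
  const unmapped: RoadSegment[] = [];
  for (const seg of segments) {
    if (!(await hasNearbyOsmHighway(seg))) unmapped.push(seg);
    await new Promise((r) => setTimeout(r, 1000)); // be polite to the public Overpass instance
  }
  return unmapped;
}
```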
Can relate, I used to tediously import buildings in Belgium and burned myself out of OSM entirely due to this (enough to delete my account out of sheer frustration, though I made a new one recently to start afresh).
That and getting scolded because I didn't keep the history for this or that specific item…
Welcome back, I hope you will have a more satisfying run this time!
In the Netherlands, importing buildings and addresses from the BAG has become a cornerstone of the mapping community.
We are already linking datasets for power plants. OpenStreetMap power plants are linked to more detailed information in Global Energy Monitor via Wikidata. We are also planning similar approaches for power lines and substations.
However, I have to say that the tooling for linking Wikidata in JOSM is still in its infancy. Better tooling and better cooperation with Wikidata could really be helpful here.
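As a small sketch of where that linking starts on the OSM side - the country filter is only an example - one can already list which plants in a region carry a wikidata=* tag (and can therefore be joined to Global Energy Monitor on the Wikidata side) versus those that still need a human to find or create the item:

```ts
// Overpass query for power plants in one country (example: the Netherlands).
const query = `
[out:json][timeout:60];
area["ISO3166-1"="NL"][admin_level=2]->.a;
(
  way["power"="plant"](area.a);
  relation["power"="plant"](area.a);
);
out tags center;
`;

async function listPlantLinks() {
  const res = await fetch("https://overpass-api.de/api/interpreter", {
    method: "POST",
    body: query,
  });
  const data = await res.json();
  // Plants with a wikidata=* tag can be joined to external registers via the Wikidata item;
  // the rest are candidates for a human to review and link.
  const linked = data.elements.filter((e: any) => e.tags?.wikidata);
  const unlinked = data.elements.filter((e: any) => !e.tags?.wikidata);
  console.log(`${linked.length} plants linked via wikidata=*, ${unlinked.length} still to review`);
}

listPlantLinks();
```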