About mapping features with computer vision

daavoo · March 23, 2025, 7:17am

Hello community!

I am the author of a recently release open-source project GitHub - mozilla-ai/osm-ai-helper: Blueprint by Mozilla.ai for mapping features in OpenStreetMap with Computer Vision. Sorry for the long post, I swear I wrote every word.

I am here to:

First, apologize to any OSM member whose time has been wasted because of the project outputs.
I should have discussed with the community before making the project public.
I am happy to assist addressing any negative outcome the project might have generated.

Second, I want to help improving openstreetmap, open-source AI and the union of both. I am happy to learn from anyone willing to discuss how to improve a project aiming to contribute features generated by computer vision.

On a side node, I perceived a little bit of aggressivity on some of the HN comments. I am not saying that those comments were from the OSM community but, still, I feel the need to justify myself and my decisions here as a member of the community so apologize if I sound over-defensive.

My not-ai-generated summary of the raised issues is the following. Happy to hear ideas on improvements.

Misunderstanding of the automation definition and/or data policies.

I did read different pages of the wiki (i.e. Automated Edits code of conduct - OpenStreetMap Wiki) and (miss)understood that the human verification step I added was enough to fulfill the automation policies.

Issues with traceability

I made a huge mistake on the first days the project was published where my personal token was being used when uploading data from the demo. Several swimming pools were added under my profile (daavoo).

Once I realized, I created a separated user for better traceability osm-ai-helper | OpenStreetMap .

Aside from that, any changeset included the created_by=https://github.com/mozilla-ai/osm-ai-helper.

I manually monitored the contributions and corrected mistakes, according to my criteria (see next point).

Quality of the predictions

My original goal for the swimming pools example was to be able to perform some simple spatial analysis around my local area. For these analysis a very accurate contour was not needed (i.e. I cared about location of the centers and orientation of the shape), so I might have been biased on what makes a good enough polygon.

I though that for a case like swimming pools, which are usually isolated “non-critical” features, the predictions were good enough.

My main worry was for the predicted features to contain too many nodes and overload the database unnecessarily. For that, I added a conservative polygon simplification which might made the predictions actually worse (in terms of adjusting to the actual shape).

Zverik · March 23, 2025, 10:13am

daavoo, the negativity comes from similar tools actively harming OSM data over the last five years. You did not do any research, and believe that you can do better than Meta and Google. They harmed the map, and your tool will also harm the map. The Aperture Science mindset is detrimental to the world around you, and the communities you touch. Even worse that this comes from Mozilla, the only company I trusted with my browsing experience.

SomeoneElse · March 23, 2025, 11:22am

Andy from the DWG here. Firstly - thanks for posting here. Many / most of the problematic swimming pools have been reverted, firstly in https://www.openstreetmap.org/changeset/163963906 by @Stereo last night, and secondly in https://www.openstreetmap.org/changeset/163476128 by me this morning. As you noted above different sorts of editing were mixed together in each of the two accounts used. I tried to exclude “regular iD editing” from that second revert, but unfortunately missed some (see changeset discussion for details) - sorry about that.

Overall the scope was much more limited and much less damaging than Facebook’s initial exploits 6 or 7 years ago, but it’s certainly not yet “the finished article” (some of the comments in this related topic are also relevant).

One thing that OSM isn’t good at is storing things that are “vaguely there or thereabouts”. For example, someone fighting a fire might want to know where there is a swimming pool full of water, but not be bothered what shape it is. One part of the culture of OSM is that you verify that something’s OK is by making sure that the geometric shape matches reality, so that it won’t look silly in a renderer. As noted in the AI building thread, detecting that there is a building is easy; the time-consuming part is accurately drawing it, and it’s often actually quicker to draw a building from scratch than try and rescue something that is a not-very-good-attempt.

No change there, then

More seriously, the comments from names that I recognise from OSM seem designed to be helpful (Stereo’s initial comment was very direct, but in a necessary “please let’s not make this situation any worse” kind of way).

With regard to the swimming pools, what possibility is there for you to take into account the different imagery offsets in any particular area? A human can easily compare available imagery (and other) sources, and can take account of imagery offsets. One problem I’ve personally see a lot of (like when Amazon did it a few years ago) was “pick one imagery source in a region, not noticing a significant offset”. I can still tell unmodified Amazon service roads locally because (a) they’re offset from reality and (b) they might technically be a patch of concrete, but in a place that other mappers may not add them.

More generally, I’m sure that there are many potential uses for this sort of thing beyond the basic “detect feature X in imagery” that I remember learning in the 1980s, taught be people who’d been doing it for a dozen or so years than. An example might be “how likely do I think this changeset is to contain accurate data, and how do I know?”. Anyone with a reasonable familiarity with OSM can look at the OSM history feed and think “that changeset needs checking” but often the cues that lead to that aren’t easy to write down.

Edit: added “from OSM” above to make meaning clearer.

daavoo · March 23, 2025, 11:39am

I will first reply this as I feel kind of obligated now because you brought my employer to the discussion and questioned my investments/research in the topic.

I will try to then be more proactive suggesting some ideas I have to improve the verification of AI systems that want to contribute to OSM.

I have never claimed to be better than anyone. Here is an outline of my research for the topic:

My first interaction with training models from OSM data was 6 years ago when I took part of several public and private projects. One example is Inicio | Inspeccións Intelixentes Avanzadas . During this project data from OSM was used and curated but never contributed back. I didn’t have any decision power to make it different. I did read every paper?(at least all from stuff like CVPR) and investigated every (I could find)? open source project mixing osm and ml.

That’s when I started wanting to work on a side project to use OSM data to train a model to then contribute to OSM back.

Over 2 years ago I got the opportunity to work on a similar project and I started manually mapping swimming pools (Changeset: 129636667 | OpenStreetMap). After some initial efforts I found that the existing polygon data was not reliable enough to train a polygon predictor, so I focused on training bounding boxes that I used to find the swimming pools and then manually draw the polygons.
A couple of weeks ago I got the opportunity to work on a project of my choice and I chose to work on this. I tried recent segmentation models that don’t rely on the quality of OSM polygons (SAM2) and found decent enough*(criteria discussed above) results. I discussed with my employee the idea to publish the project. I didn’t discuss it with the OSM and should have done it, as I already said.

First of all, I work for mozilla.ai which is a separate startup (funded by mozilla) focusing on exploring open source AI. That is not the same people behind your browsing exprience (which is also exploring AI, and we all should start accepting that there is no going back).
I know this company structure is not obvious for outcomers (or me) but just wanted to clarify that.

The only thing that mozilla.ai did was allow me to dedicate part of my time to a project of my choice, and I chose to do this.

silversurfer83 · March 23, 2025, 11:49am

It’s a predictor, not an “Intelligence” and no, I for one will not accept and keep fighting back where neccessary.

daavoo · March 23, 2025, 12:16pm

I use AI as it is the most commonly known term from time to time, sorry. As you might have seen I tried to use Computer Vision (also not accurate) as the closest replacement here.

I didn’t mean you should not fitght back whenever you feel like it.
I work building predictors and still fight back against use cases that don’t align with what I believe.

I once (a little bit also yesterday/today) had doubts about continuing working in the field and wanted to quit. I think of that as a “denial” because AI is going to affect you regardless of your wiling.

By “accepting” I mean to assume that AI is going to affect your life (and communities) and your money/taxes are going to be spent on it. That doesn’t mean to not work on trying to make the existing use cases align with the values of the majority and find new ones.

Mateusz_Konieczny · March 23, 2025, 12:25pm

one option, if such tool is very good at detecting exact location but not so great at detecting actual shape: maybe add them as points (nodes), rather than as areas?

obviously, this still requires human review of each entry or going through import process

okainov · March 23, 2025, 12:37pm

As a random stranger and just OSM enthusiast here, I probably first would like to appreciate @daavoo initiative and efforts! OSM community is incredibly conservative and you don’t need to dig deep to notice it. But there will be no progress without risks, it should be clear.

There can be errors of course, and that’s not specific to pure human inputs vs computer-assisted inputs. I don’t think there is author of the tool to blame (however, I have seen neighboring thread about MapRoulette where many conservative folks try to blame its creator for the contributions made by the users, so it’s inevitable).

So don’t let the loud reaction of few individuals put you down, learn from it and make it better next time!

SimonPoole · March 23, 2025, 12:49pm

We have at least 8 years experience of nonsense being dumped in the database in a larger scale by generating geometries by ML from imagery (not to mention similar import related issues).

Please don’t try to lecture the wider OSM community on such things, they are the ones doing the clean up after the fail.

Tatti_Barletta · March 23, 2025, 1:19pm

As another random stranger and OSM enthusiast here is my take: use whatever you want to pinpoint the features you need, then open JOSM or iD and draw them manually as accurately as possible. In the absence of this step, which you are trying to force us to do for you, I’m 100% in favor of rejecting every direct AI contribution without even looking at what it is.

SomeoneElse · March 23, 2025, 1:50pm

That … isn’t obvious from the website (4 of the 10 links are to mozilla.ai, 6 of the 10, including “legal”, to mozilla.org).

Ahem. Some of us are old enough to remember the gaps between previous periods of AI boosterism (from before someone at Gartner coined the term “hype cycle”). I was certainly working in a related field in the late 1980s / early 1990s and what we were doing then and what every man and his dog is doing now is not magic, it’s just programming - neither particularly artificial nor particularly intelligent**.

It’s true that tools for scaling computing and other resources have made the ability to “do X cheaply and quickly” much, much easier; and “X” may be managing a hierarchy of models at some distance removed what most people might call “just programming”, but it’s not magic. The same issues remain - how do I know that the thing that I have done is what I intended to do? How do I know that I have not broken something that I didn’t realise was there for a reason?

** it’ll certainly look intelligent to the average punter, but take an electronic greetings card that signs “happy birthday” back a couple of centuries and try explaining that actually, it’s not very “clever” at all…

Lumikeiju · March 23, 2025, 6:49pm

Personal and Professional Context and Disclaimers

I work at the Taskar Center for Accessible Technology (TCAT) at the University of Washington. To be clear, AI (and specifically/especially its use in the OSM ecosystem) is without a doubt a core part of our (though not my) focus.

This post is intentionally made under my personal account because it is a sharing of my own personal views. These opinions are, naturally, partly the product of my experience working directly with the AI-produced outputs of parts of our internal tools. My perspective is also informed by the uncompensated hobby work that I’ve done in OSM over the last year and my own views on AI and related technologies.

Thank you for engaging here (and elsewhere) with the community. That’s definitely a step in the right direction.

One thing that frustrates me when these topics come up is that the low quality of the final output often complicates discussions about the quality of specific parts of the process - I think it’s clear that the machine-generated ways produced by the currently available tools in this space are not at the quality level desired by many for inclusion directly into OSM.

If you develop a tool which can accurately and efficiently recognize specific features from aerial imagery and pass those detections to humans for verification - great! If you develop a tool which can take those verified detections and use them to insert low-quality geometry directly into OSM - that’s not so great.

Please consider narrowing the scope of the contributions that your tool makes to OSM and repositioning OSM within your project. In the swimming pools example, the human-vetted detections could result in swimming pool nodes (centroids calculated from the AI-inferred geometry) being contributed back to OSM. If you want to use the inferred geometry for something, such as answering “What percentage of the area of Example Town is covered by swimming pools?”, then that’s fine - the geometry itself just doesn’t belong in OSM. One could always make something like a follow-up MapRoulette challenge for “Upgrade leisure=swimming_pool nodes to areas” or similar! @mvexel might be able to help you there.

Additional reflections, based on work experience

[Again note that this work was mostly done before I started working for UW, and it’s not my focus - so “we” here generally means “our team, not including me”]
We (TCAT) made heavy use of AI to create the OS-CONNECT dataset (viewer) for the WA Proviso project, and we’re not shy about that - refer to the first sentence on the project’s main page:

Under the directive of the state legislature and using innovative technology by the Taskar Center for Accessible Technology, Washington is now creating an AI-generated, human-vetted network graph inventory of sidewalks across Washington State—the OS-CONNECT dataset.

This data is not added directly to OSM, because that’s not where it’s useful. It doesn’t mean it’s not useful (it is!) and that doesn’t mean we don’t contribute directly to OSM (we do!) but it does mean that we (meaning both our team and the broader OSM community) recognize AI-generated data, even when inextricably related to OSM, shouldn’t be OSM.

Also, you may be interested in [2303.02323] APE: An Open and Shared Annotated Dataset for Learning Urban Pedestrian Path Networks.

I respect and appreciate that you are excited about the technology you’re working with and want to use it to improve OSM - that’s a wonderful thing, and I hope that you’re not discouraged from continuing to work in this space because I definitely see the potential of these tools which your project highlights.

daavoo · March 23, 2025, 6:49pm

Hi there. Thanks for your quick response addressing the issues and your work on the DWG in general.

No worries, this was my mistake.

I understand it was necessary, as I said I was not pointing to any OSM members. I am helpful about the comment being one of the first so I could take (some) immediate actions (shut down the demo).

What I have and did use for my personal usage of the tool is a code that runs inference using 3 different tile providers (for my use case from Galicia, this were spanish PNOA, Bing Aerial and Mapbox). I will try to polish and release this code.
The best I could do with that code is detect heavy discrepancies in alignment and discard features affected by them (i.e. predictions across the 3 providers have low IoU). I haven’t found an useful way to combine the predictions when images are wrongly aligned, the aggregated results are just worse.

I agree, one of the most useful outcomes I found was to review the own model predictions against the training dataset. Heavy discrepancies there usually led me to existing features that were partially? wrongly mapped (i.e. an indoor swimming pool missing the indoor tag).

I didn’t have time to polish this code for release. I will try to do it

daavoo · March 23, 2025, 6:51pm

I was not trying to lecture anyone, and apologies if that is how I sounded. I was trying to apologize, offer help, and expose my points

daavoo · March 23, 2025, 6:54pm

The (or me) here was not ironic

SimonPoole · March 23, 2025, 10:58pm

I wasn’t replying to you.

Fizzie-DWG · March 23, 2025, 11:15pm

Just for interests sake, how does you tool go with partially covered pools e.g. OpenStreetMap. Will it recognise them?

daavoo · March 24, 2025, 2:00pm

Helo there!

I have published a new relase where any code to directly upload to OSM has been replaced with an export to OsmChange format.

I hope this is a step in the right direction.

There are a lot of interesting comments I want to catch up but I don’t have the time right now, will try to do it soon

daavoo · March 27, 2025, 10:15am

TLDR: As of today, there is no chance that the tool will generate the polygon outline correctly, for the case you shared. However, it *should be able to locate it so you can edit the prediction manually.

Any occluded example I have found with the tool, I have rejected it during the review step and labeled myself manually in OSM.

Longer explanation

The tool currently uses 2 models:

Bounding Box Predictor

It is trained with OSM data (the bounds of the swimming pool areas in this example) and it is in charge of predicting bounding boxes.

In theory this model would be able to predict the bounding box “correctly” (accounting for the occlusion) if you collect enough samples in that are labeled consistently. Spoiler: there are not enough in the dataset I used mozilla-ai/osm-swimming-pools · Datasets at Hugging Face so it will most likely predict a partially incorrect bounding box.

Polygon predictor

The real problem is this model, which receives the bounding box from the first one and tries to generate the polygon outline.

For this task, the data in OSM is not good enough to reliably train the model. Because of that, I opted to use a “general purpose” segmentor : SAM2.

This model doesn’t account for occlusions, as it was not trained for it. So there is no chance that predicts a polygon that matches what you shared.

You can test it here sam2/notebooks/image_predictor_example.ipynb at main · facebookresearch/sam2 · GitHub

daavoo · March 27, 2025, 10:41am

Thank you for your comment, it is very insightful. I love what you are working on.

So my point for debate here is:

do you think there are/will be some specific cases where the machine-generated features (will) match the desired level and belong to OSM?

If yes, how can someone validate the performance in a measurable/objective manner (assuming the community has been informed and agreed with the import)?

Here is what I did for this project, and maybe it could be extended to gather more human input for evaluation

I made some tools to compute metrics (i.e. IoU for the polygons) between changesets. I used those to compare the degree of agreement/consensus between different humans (mostly me and my wife ) but also against machine generated outputs. I only published the tool once I reach a point where the agreement score of the machine generated was higher than the human consensus (and, as we have seen, this was clearly not enough).

I am hoping to find the time to implement that, thanks for the idea.

I am not sure I understand how do you reach the conclusion that OSM is not where that info is useful. Is that the case even if there is a point in time (maybe unreachable) where that AI-generated data matches the level of what humans annotate today?