Hello community!
I am the author of a recently released open-source project: GitHub - mozilla-ai/osm-ai-helper: Blueprint by Mozilla.ai for mapping features in OpenStreetMap with Computer Vision. Sorry for the long post, I swear I wrote every word.
I am here to:
First, to apologize to any OSM member whose time has been wasted because of the project's outputs.
I should have discussed the project with the community before making it public.
I am happy to assist in addressing any negative outcome the project might have generated.
Second, to help improve OpenStreetMap, open-source AI, and the union of both. I am happy to learn from anyone willing to discuss how to improve a project that aims to contribute features generated by computer vision.
On a side note, I perceived a bit of aggressiveness in some of the HN comments. I am not saying that those comments were from the OSM community but, still, I feel the need to justify myself and my decisions here as a member of the community, so apologies if I sound over-defensive.
My not-AI-generated summary of the raised issues is the following. I am happy to hear ideas for improvements.
Misunderstanding of the automation definition and/or data policies.
I did read different pages of the wiki (e.g. Automated Edits code of conduct - OpenStreetMap Wiki) and (mis)understood that the human verification step I added was enough to comply with the automation policies.
Issues with traceability
I made a huge mistake in the first days after the project was published: my personal token was being used when uploading data from the demo, so several swimming pools were added under my profile (daavoo).
Once I realized, I created a separate user for better traceability: osm-ai-helper | OpenStreetMap.
Aside from that, every changeset included the tag created_by=https://github.com/mozilla-ai/osm-ai-helper.
I manually monitored the contributions and corrected mistakes according to my own criteria (see the next point).
Quality of the predictions
My original goal for the swimming pools example was to be able to perform some simple spatial analysis around my local area. For that analysis a very accurate contour was not needed (I cared about the location of the centers and the orientation of the shapes), so I might be biased about what makes a good enough polygon.
I thought that for a case like swimming pools, which are usually isolated, "non-critical" features, the predictions were good enough.
My main worry was that the predicted features would contain too many nodes and overload the database unnecessarily. To avoid that, I added a conservative polygon simplification, which might have actually made the predictions worse (in terms of matching the actual shape).
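For context, the kind of node-reduction I mean can be sketched as a plain Ramer-Douglas-Peucker pass over a predicted contour. This is only an illustration of the trade-off, not the project's actual code; the coordinates, tolerance values, and function names below are made up. A larger tolerance drops more nodes but can also erase real detail in the shape:

```python
import math


def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate chord (e.g. closed ring endpoints)
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: recursively drop vertices that lie closer
    than `tolerance` to the chord between the segment endpoints."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the start-end chord.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tolerance:
        # Everything in between is "close enough" to a straight line.
        return [points[0], points[-1]]
    left = simplify(points[: idx + 1], tolerance)
    right = simplify(points[idx:], tolerance)
    return left[:-1] + right


# A rectangular pool contour with a tiny bump, as a segmentation
# model might produce (coordinates are arbitrary).
contour = [(0, 0), (1, 0.02), (2, 0), (4, 0), (4, 2), (0, 2), (0, 0)]
print(simplify(contour, tolerance=0.05))   # bump removed, 5 nodes left
print(simplify(contour, tolerance=0.001))  # bump preserved, all 7 nodes
```

The tolerance is exactly where the bias I mentioned creeps in: a value that is conservative for database size can be aggressive for shape fidelity.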