Mapillary sourcing doesn't seem to have taken off. Is my analysis below reasonable?

I wanted to find out how much Mapillary imagery helps improve OpenStreetMap. To do so, I downloaded OpenStreetMap extracts for 3 countries (Slovakia, Norway, Lithuania) from the Geofabrik website. I created GeoJSONs from the extracts, then extracted the features whose source tag contained mapillary. I then counted these features and plotted them on a map using QGIS. Based on this, it seems to me that Mapillary images have not been widely used to improve OpenStreetMap.

Below I I) detail the steps I just described and II) show my results. I also III) give an overview of what other Mapillary related tools I looked at and finally IV) ask your opinion on the validity of my findings, and possible ways to improve them.


I: Extract features with source=mapillary

Starting from this gis.SE question, through the GDAL website, I figure I need an edited version of osmconf.ini. I put ,source at the end of lines 38, 58, 90, 108 and 126 and remove it from every other place. My new osm_source.ini:

#
# Configuration file for OSM import
#

# put here the name of keys, or key=value, for ways that are assumed to be polygons if they are closed
# see http://wiki.openstreetmap.org/wiki/Map_Features
closed_ways_are_polygons=aeroway,amenity,boundary,building,craft,geological,historic,landuse,leisure,military,natural,office,place,shop,sport,tourism,highway=platform,public_transport=platform

# Uncomment to avoid laundering of keys ( ':' turned into '_' )
#attribute_name_laundering=no

# Some tags, set on ways and when building multipolygons, multilinestrings or other_relations,
# are normally filtered out early, independent of the 'ignore' configuration below.
# Uncomment to disable early filtering. The 'ignore' lines below remain active.
#report_all_tags=yes

# uncomment to report all nodes, including the ones without any (significant) tag
#report_all_nodes=yes

# uncomment to report all ways, including the ones without any (significant) tag
#report_all_ways=yes

# uncomment to specify the the format for the all_tags/other_tags field should be JSON
# instead of the default HSTORE formatting.
# Valid values for tags_format are "hstore" and "json"
#tags_format=json

[points]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,barrier,highway,ref,address,is_in,place,man_made,source
# keys that, alone, are not significant enough to report a node as a OGR point
unsignificant=created_by,converted_by,time,ele,attribution
# keys that should NOT be reported in the "other_tags" field
ignore=created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

[lines]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,highway,waterway,aerialway,barrier,man_made,railway,source

# type of attribute 'foo' can be changed with something like
#foo_type=Integer/Real/String/DateTime

# keys that should NOT be reported in the "other_tags" field
ignore=created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

#computed_attributes must appear before the keywords _type and _sql
computed_attributes=z_order
z_order_type=Integer
# Formula based on https://github.com/openstreetmap/osm2pgsql/blob/master/style.lua#L13
# [foo] is substituted by value of tag foo. When substitution is not wished, the [ character can be escaped with \[ in literals
# Note for GDAL developers: if we change the below formula, make sure to edit ogrosmlayer.cpp since it has a hardcoded optimization for this very precise formula
z_order_sql="SELECT (CASE [highway] WHEN 'minor' THEN 3 WHEN 'road' THEN 3 WHEN 'unclassified' THEN 3 WHEN 'residential' THEN 3 WHEN 'tertiary_link' THEN 4 WHEN 'tertiary' THEN 4 WHEN 'secondary_link' THEN 6 WHEN 'secondary' THEN 6 WHEN 'primary_link' THEN 7 WHEN 'primary' THEN 7 WHEN 'trunk_link' THEN 8 WHEN 'trunk' THEN 8 WHEN 'motorway_link' THEN 9 WHEN 'motorway' THEN 9 ELSE 0 END) + (CASE WHEN [bridge] IN ('yes', 'true', '1') THEN 10 ELSE 0 END) + (CASE WHEN [tunnel] IN ('yes', 'true', '1') THEN -10 ELSE 0 END) + (CASE WHEN [railway] IS NOT NULL THEN 5 ELSE 0 END) + (CASE WHEN [layer] IS NOT NULL THEN 10 * CAST([layer] AS INTEGER) ELSE 0 END)"

[multipolygons]
# common attributes
# note: for multipolygons, osm_id=yes instantiates a osm_id field for the id of relations
# and a osm_way_id field for the id of closed ways. Both fields are exclusively set.
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,type,aeroway,amenity,admin_level,barrier,boundary,building,craft,geological,historic,land_area,landuse,leisure,man_made,military,natural,office,place,shop,sport,tourism,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

[multilinestrings]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,type,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

[other_relations]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,type,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
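
The manual edit described above (append source to every attributes= line, drop it from the other key lists) could also be scripted. This is only a minimal sketch: the function name move_key_to_attributes and the sample text are made up for illustration, and it assumes source only ever appears in the attributes=, ignore= and unsignificant= lines.

```python
def move_key_to_attributes(ini_text, key="source"):
    """Return ini_text with `key` listed in every attributes= line
    and removed from the ignore= and unsignificant= lines."""
    out = []
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name in ("attributes", "ignore", "unsignificant"):
            # Drop exact matches only, so e.g. source_ref is left alone.
            items = [v for v in value.split(",") if v and v != key]
            if name == "attributes":
                items.append(key)
            line = name + "=" + ",".join(items)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fragment, not the real osmconf.ini.
SAMPLE_INI = (
    "[points]\n"
    "attributes=name,highway\n"
    "unsignificant=created_by,source,time\n"
    "ignore=created_by,source,source_ref,note\n"
)

if __name__ == "__main__":
    print(move_key_to_attributes(SAMPLE_INI))
```

Commented-out lines and the z_order_sql line are left untouched, because their text before the first = is not one of the three key names.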

To download OSM data and create a GeoJSON with sourced features, I wrote osm_source.sh:

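#!/usr/bin/env bash
# Download a Geofabrik extract and write one GeoJSON per OSM layer,
# keeping only the features that carry a source tag.
set -o errexit
set -o nounset

geofabrik_name=$1

# --fail makes curl exit non-zero on an HTTP error instead of saving the error page
curl --fail --location "https://download.geofabrik.de/europe/${geofabrik_name}-latest.osm.pbf" -o latest.osm.pbf

# ogrinfo prints two header lines, then one "N: layer (type)" line per layer
for each in $(ogrinfo latest.osm.pbf | tail -n +3 | awk '{print $2}'); do

    rm -f "osm_source_${each}.geojson"

    ogr2ogr -f GEOJSON \
            -dialect sqlite \
            -sql "SELECT geometry, source FROM ${each} WHERE source IS NOT NULL" \
            "osm_source_${each}.geojson" latest.osm.pbf \
            -nln main \
            --config OSM_CONFIG_FILE osm_source.ini

done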

To run it for Slovakia, I do: ./osm_source.sh slovakia (after chmod +x osm_source.sh, of course). I have produced 5 files:

  • osm_source_points.geojson
  • osm_source_lines.geojson
  • osm_source_multilinestrings.geojson
  • osm_source_multipolygons.geojson
  • osm_source_other_relations.geojson

I would like to visualize features of these GeoJSONs where mapillary was given as a source. To do that, I use osm_source.py:

import sys

import geopandas as gpd
import pandas as pd

country = sys.argv[1]

# One GeoJSON per OSM layer, as written by osm_source.sh
points = gpd.read_file("osm_source_points.geojson")
lines = gpd.read_file("osm_source_lines.geojson")
multilinestrings = gpd.read_file("osm_source_multilinestrings.geojson")
multipolygons = gpd.read_file("osm_source_multipolygons.geojson")
other_relations = gpd.read_file("osm_source_other_relations.geojson")

df = gpd.GeoDataFrame(pd.concat([points, lines, multilinestrings, multipolygons, other_relations], ignore_index=True))

# Reduce every feature to its centroid so that all layers plot as points
df = df.assign(geometry=df.geometry.apply(lambda geom: geom.centroid))

# Case-insensitive substring match; na=False treats a missing source as "no match"
df = df[df.source.str.lower().str.contains("mapillary", na=False)]
print(f"Mapillary is cited {len(df)} times as a source in {country}")

df.to_file(f"mapillary_centroids_{country}.geojson")

I run this script via python3.11 osm_source.py slovakia. I repeat the above procedure for Norway and Lithuania.
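
Because the filter above is a plain substring match, it may help to see which source values are actually behind the count. The following is a rough sketch using only the standard library. It assumes that multi-valued source tags are separated by semicolons, and the helper names and sample values are hypothetical, not taken from the extracts.

```python
from collections import Counter


def count_source_tokens(source_values):
    """Tally the individual, lower-cased entries of semicolon-separated source values."""
    counts = Counter()
    for value in source_values:
        if not value:
            continue  # skip None and empty strings
        for token in value.split(";"):
            token = token.strip().lower()
            if token:
                counts[token] += 1
    return counts


def count_mentions(source_values, needle="mapillary"):
    """Count values containing `needle` anywhere (the same test as the script above)."""
    return sum(1 for value in source_values if value and needle in value.lower())


# Made-up source values, for illustration only.
SAMPLE_SOURCES = ["Bing; Mapillary", "mapillary", None, "survey;Mapillary 2019"]

if __name__ == "__main__":
    print(count_mentions(SAMPLE_SOURCES))
    print(count_source_tokens(SAMPLE_SOURCES).most_common())
```

The token tally shows how Mapillary is spelled or combined with other sources, while count_mentions mirrors the number the script prints.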


II: Results

Slovakia

python3.11 osm_source.py slovakia gives the textual output:

Mapillary is cited 1191 times as a source in slovakia

and the file mapillary_centroids_slovakia.geojson. I open this file in QGIS (over OpenStreetMap basemap):

Most contributions are around highways. Some other contributions here and there, but the map is clearly highway heavy.

Lithuania

Mapillary is cited 39 times as a source in lithuania

This low number is even more surprising if we see how good the Mapillary coverage is in Lithuania, even if we consider only panorama images:

Much better than most other countries.

Norway

Mapillary is cited 12 times as a source in norway

All points are in the proximity of Oslo.


III: Other explored connections between Mapillary and OSM

There is a Mapillary blog post titled "How to Use Mapillary Data in OpenStreetMap". No images load, and it seems abandoned. It leads me to Pic4Review, which, after logging in, repeatedly fails to load:

Oops ! Something went wrong when fetching missions (Failed to fetch)

There is also mapillary.com/osm. After clicking Mapillary in RapiD, I get to:

which seems like a standard iD editor. I haven't spent a lot of time here, but I haven't been able to figure out how I could use Mapillary imagery through this site.

IV: Conclusions

Based on these findings, contributing to Mapillary does not seem to be a good way to improve OSM. (I admit I probably could have analyzed more countries, but the trends uncovered in these 3 are quite worrying nevertheless.)

  • Is there anything fundamental my code misses?

  • Is there a much easier way to perform this source analysis than the one presented above?

  • If my view is wrong, and Mapillary is more useful for OSM than I realize, what tool can I use to make efficient use of Mapillary imagery?


Interesting start! If I'm reading this correctly you looked at certain map features with a source tag mentioning Mapillary? I'm not sure if that is the case for most changes that Mapillary was involved in.

Did you look at whether Mapillary might be mentioned in the source tag on the changeset instead?

There is also the mapillary=* key which has been used 300k times.
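
A feature-level check for the mapillary=* key could look roughly like the sketch below. It assumes the GeoJSON export also selects GDAL's other_tags field (the scripts in the first post do not) and that the field uses the default HSTORE text form "key"=>"value","key2"=>"value2". The function names are made up for illustration, and backslash escapes are left undecoded.

```python
import re

# One "key"=>"value" pair; a backslash may escape a quote inside either part.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')


def parse_hstore(text):
    """Turn an HSTORE-style other_tags string into a dict (escapes are kept as-is)."""
    if not text:
        return {}
    return dict(_PAIR.findall(text))


def has_mapillary_key(other_tags):
    """True if the feature carries a mapillary=* tag among its other tags."""
    return "mapillary" in parse_hstore(other_tags)


if __name__ == "__main__":
    # Hypothetical other_tags value, for illustration only.
    print(has_mapillary_key('"mapillary"=>"123456","surface"=>"asphalt"'))
```

Note that this only looks at the key: a feature whose source value mentions Mapillary but which has no mapillary=* tag is not matched.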

In iD, press 'u', then under Photo Overlays, activate Mapillary. If you do that and then make a map change, iD will add "mapillary" to the source tag on the changeset.


Your analysis is flawed because of your assumption that individual objects will carry a source=mapillary. You would (in addition) have to download the tags for all changesets that were applied in the respective country since Mapillary was available, to analyze if there are mentions of Mapillary being used in the changeset's comment or source tags. This would require downloading the changeset dump and looking at the bounding boxes (imprecise because it will include world-spanning boxes with no edits in the country you are looking at), or retrieving all relevant changeset IDs from the country's history PBF and cross-referencing them with the changeset dump.
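
The bounding-box variant of this could be sketched as follows. This is an illustration under assumptions, not a tested pipeline: it assumes the decompressed changeset dump is XML with <changeset> elements that carry id, min_lon, min_lat, max_lon and max_lat attributes and <tag k="..." v="..."/> children. The function names, the max_span_deg cut-off for over-wide boxes, the rough Slovakia box and the sample XML are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def bbox_intersects(a, b):
    """Boxes are (min_lon, min_lat, max_lon, max_lat)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mapillary_changesets(xml_source, country_bbox, keys=("source", "comment"), max_span_deg=None):
    """Yield ids of changesets whose `keys` tags mention Mapillary and whose
    bounding box intersects country_bbox. Boxes wider or taller than
    max_span_deg (if given) are skipped as probably world-spanning."""
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the enclosing root element
    for event, elem in context:
        if event != "end" or elem.tag != "changeset":
            continue
        tags = {t.get("k"): t.get("v", "") for t in elem.findall("tag")}
        mentioned = any("mapillary" in tags.get(k, "").lower() for k in keys)
        if mentioned and "min_lon" in elem.attrib:  # empty changesets have no box
            box = (float(elem.get("min_lon")), float(elem.get("min_lat")),
                   float(elem.get("max_lon")), float(elem.get("max_lat")))
            too_wide = max_span_deg is not None and (
                box[2] - box[0] > max_span_deg or box[3] - box[1] > max_span_deg)
            if not too_wide and bbox_intersects(box, country_bbox):
                yield int(elem.get("id"))
        root.clear()  # drop processed changesets to keep memory flat


# Rough, hand-picked box around Slovakia (an assumption, not an official extent).
ROUGH_SLOVAKIA_BBOX = (16.8, 47.7, 22.6, 49.6)

# Made-up changesets, for illustration only.
SAMPLE_XML = """<osm>
<changeset id="1" min_lat="48.1" min_lon="17.0" max_lat="48.2" max_lon="17.2"><tag k="source" v="Mapillary;Bing"/></changeset>
<changeset id="2" min_lat="48.1" min_lon="17.0" max_lat="48.2" max_lon="17.2"><tag k="comment" v="added shops"/></changeset>
<changeset id="3" min_lat="-80" min_lon="-170" max_lat="80" max_lon="170"><tag k="comment" v="from mapillary"/></changeset>
<changeset id="4" min_lat="59.9" min_lon="10.7" max_lat="60.0" max_lon="10.8"><tag k="source" v="mapillary"/></changeset>
</osm>"""

if __name__ == "__main__":
    stream = io.BytesIO(SAMPLE_XML.encode("utf-8"))
    print(list(mapillary_changesets(stream, ROUGH_SLOVAKIA_BBOX, max_span_deg=5)))
```

A rectangular box is still imprecise for a country with an irregular border, so the history-PBF cross-reference mentioned above would give cleaner numbers.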

Edit: or, what osmuser63783 said a minute before me :wink:


(trivia: I would also write about the changesets, but then I saw others drafting the reply)

But yes, for the author of the topic, @woodpeck described the extra steps. I think you should still be able to do it (you seem to know shell scripting very well and can do the other parts in Python). Go ahead! I haven't done exactly this kind of analysis yet, but I know it is feasible.

PS.: Maybe you could also consider writing a Diary entry on your account (even with a warning that it is still incomplete because the changeset tags are missing)? The second part (if you manage to achieve it, others here could review your steps) could come later. But the scripts themselves are more generic than Mapillary. So, yes, I think it's worth going ahead if you want to! In general, people who post this kind of thing eventually get cited on the OSM Weekly.


Thanks @osmuser63783 @woodpeck @fititnt for the useful suggestions. I'll look into changesets. Once I'm done, I plan to share my results again.
I haven't thought about a Diary post, I'll keep that option in mind!


Some additional things to consider:

It is possible some European countries were mapped pretty well before the advent of Mapillary. I don't think it is controversial to claim that Europeans took to OSM faster than anyone else. OSM dates to 2004; Mapillary to 2014.

Unfortunately, citation habits are not equal among contributors. Certainly, people have added objects from Mapillary imagery in the past and failed to cite it.

The same imagery may have been uploaded to both Mapillary and Kartaview, and only Kartaview was cited after the fact. Depending on the community, Kartaview (or even Yandex) may be the more popular tool.

It's not a ridiculous supposition that local contributors have at times uploaded imagery to Mapillary, but only after they finished making edits corresponding to data from their images. I know some people have observed a decrease in resolution level and this motivated them to initially contribute data directly from hard-drive images before sharing the same images publicly on the Mapillary platform.

It is possible that OSM contributors have personally collected images for an area, uploaded those images to Mapillary and after contributing new data cited only 'local/personal knowledge' as a source. This is not the most helpful habit, but there is nothing terribly wrong with this practice as they did visit in person and therefore have 'personal knowledge'.
