After seeing that Meta have released their own proprietary Gaussian Splat creator for Meta Quest VR, and that there's no Mapillary Android app on Meta Quest, I became a bit worried that Mapillary's street view images might be at risk in future. So I wrote a tool that downloads the data so it can be saved to the Internet Archive. If anyone is interested in helping, it can be installed with:
pip install mapillary-downloader
I've been running it for a few months and have a couple of other volunteers helping. We're focusing on the 1024px "thumbnails" (--quality=1024) and have archived ~10% of them:
If you clone the above on Linux, macOS, or WSL2 on Windows, drop the output of mapillary-downloader into the ./4.ship directory, then run ./scrip/4.ship.sh and it'll upload it for you.
The first upload will fail because your user doesn't have rights to the collection. Either set .meta/collection to opensource (use ./scrip/dip.sh for a bash TUI editor) and I'll bulk-move the items as they appear, or email me (garethdavidson@gmail.com) and I'll request that you're added to the collection.
It’s been a few months, so I thought I’d post an update:
======================================================================
Mapillary Downloader - Archive.org Statistics
======================================================================
Total Collections: 23,324
Total Users: 22,615
Total Images: 509,015,299
Unique Images: 506,591,498 (25.330% of 2B)
Total Size: 26951.81 GB
By Quality:
----------------------------------------------------------------------
original    445 collections     419,079 images  (0.021%)   308.48 GB
1024      21374 collections 476,322,249 images (23.816%) 24578.19 GB
2048       1501 collections  32,268,148 images  (1.613%)  2065.06 GB
256           4 collections       5,823 images  (0.000%)    78.82 MB
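As a quick sanity check, the per-quality rows can be summed to reproduce the totals. This is a minimal sketch that assumes the percentages are taken against a round 2 billion images (the "2B" in the output) and converts the 256 row from MB to GB at 1024 MB per GB; neither is documented by the downloader. ROWS, TARGET_IMAGES and the helper functions are illustrative names, not part of mapillary-downloader.

```python
# Per-quality rows from the statistics output: (quality, collections, images, size_gb).
ROWS = [
    ("original", 445, 419_079, 308.48),
    ("1024", 21_374, 476_322_249, 24_578.19),
    ("2048", 1_501, 32_268_148, 2_065.06),
    # The 256 row is reported in MB; converted here assuming 1 GB = 1024 MB.
    ("256", 4, 5_823, 78.82 / 1024),
]

UNIQUE_IMAGES = 506_591_498
# Assumed denominator behind the "of 2B" label: a round 2 billion images.
TARGET_IMAGES = 2_000_000_000


def totals(rows):
    """Return (collections, images, size_gb) summed over all quality rows."""
    collections = sum(row[1] for row in rows)
    images = sum(row[2] for row in rows)
    size_gb = sum(row[3] for row in rows)
    return collections, images, size_gb


def percent_of_target(images, target=TARGET_IMAGES):
    """Express an image count as a percentage of the assumed 2B target."""
    return 100 * images / target


if __name__ == "__main__":
    collections, images, size_gb = totals(ROWS)
    print(f"Total Collections: {collections:,}")
    print(f"Total Images: {images:,}")
    print(f"Total Size: {size_gb:.2f} GB")
    print(f"Duplicates: {images - UNIQUE_IMAGES:,}")
    print(f"Unique: {percent_of_target(UNIQUE_IMAGES):.3f}% of 2B")
    for quality, _, count, _ in ROWS:
        print(f"{quality:>8}: {percent_of_target(count):.3f}%")
```

The sums match the reported 23,324 collections, 509,015,299 images and 26951.81 GB. The per-quality percentages add up to slightly more than the 25.330% unique figure because Total Images exceeds Unique Images by 2,423,801, presumably images that were archived in more than one collection.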
My pipeline used to be "go to the web, find an interesting location and download the top 100 users for that area". I've now written a scraper for this and released the data, so nobody else has to run it.
My new approach is something like this (or some similar pattern):
./scripts/get-users.py --exclude-archived | grep $'^1..\t' | xargs mapillary-downloader --quality 1024
This helps with data diversity: people who have only uploaded a few hundred images tend to have much more carefully selected ones, and they are scattered all over the world. In future I'll do a proper API bbox search over the whole planet as a detail pass.
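For anyone adapting the pipeline on a system without bash's $'...' quoting, here is a rough Python equivalent of the filter step. It is a sketch under two assumptions: that get-users.py prints tab-separated lines with the image count in the first column and the username in the second (which is what ^1..\t implies, but the exact layout is a guess), and that mapillary-downloader accepts usernames as positional arguments, as the xargs call suggests. users_in_range and build_command are hypothetical helpers.

```python
def users_in_range(lines, low=100, high=199):
    """Yield usernames whose image count lies between low and high inclusive.

    Hypothetical input format: "count<TAB>username" per line. For numeric
    counts, 100 to 199 is what grep $'^1..\\t' selects.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        if low <= int(fields[0]) <= high:
            yield fields[1]


def build_command(usernames, quality="1024"):
    """Mirror the xargs step as a single argv: the downloader, the quality
    flag, then the usernames (assumed to be positional arguments).
    Note that xargs itself may split very long lists across several calls."""
    return ["mapillary-downloader", "--quality", quality, *usernames]
```

Because each line is parsed rather than passed through whole, only the username column ends up on the command line. The resulting list can be handed to subprocess.run(cmd, check=True), and widening low/high gives other slices of the user list for later passes.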