Archiving Mapillary data on archive.org

Hi all

After seeing that Meta have released their own proprietary Gaussian Splat creator for Meta Quest VR, and that there's no Mapillary Android app on Meta Quest, I was a bit worried that Mapillary's street view images might be at risk in future. So I wrote a tool that downloads the data so it can be saved to the Internet Archive. If anyone is interested in helping, it can be installed with:

pip install mapillary-downloader

I've been running it for a few months, and a couple of other volunteers are helping. We're focusing on the 1024p "thumbnails" (--quality=1024) and have archived ~10% of them so far.

I've got another tool that rips and uploads CDs/DVDs to the archive, which can be found on GitHub: bitplane/rip (Uploading old PC CDs to the Internet Archive).

If you clone that repo on Linux, macOS, or WSL2 on Windows, drop the output of mapillary-downloader into the ./4.ship directory and run ./scrip/4.ship.sh, it'll upload everything for you.

The first upload will fail because your user doesn't have rights to the collection. Either set .meta/collection to opensource (use ./scrip/dip.sh for a bash TUI editor) and I'll bulk-move the items as they appear, or email me (garethdavidson@gmail.com) and I'll request that you're added to the collection.
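With a lot of items waiting to ship, a small script can set the collection on all of them at once instead of going through dip.sh one by one. This is a minimal Python sketch, assuming each item in ./4.ship is a directory with its own .meta/collection text file; `set_collection` is a hypothetical helper, not part of rip, so check the layout against the repo before relying on it.

```python
from pathlib import Path


def set_collection(ship_dir: str, collection: str = "opensource") -> list[Path]:
    """Write `collection` into <item>/.meta/collection for each item in ship_dir.

    Assumption (not confirmed by the rip repo): every item waiting to ship is
    a directory holding its own .meta/collection file.
    """
    written = []
    for item in sorted(Path(ship_dir).iterdir()):
        if not item.is_dir():
            continue  # stray files are not items, so leave them alone
        meta = item / ".meta"
        meta.mkdir(exist_ok=True)
        target = meta / "collection"
        # Trailing newline, same as `echo opensource > .meta/collection`.
        target.write_text(collection + "\n")
        written.append(target)
    return written
```

Calling set_collection("./4.ship") from the root of the rip checkout, before running ./scrip/4.ship.sh, would update every queued item in one pass.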

Thanks!

Gareth

I couldn't link this above due to posting restrictions for new users, but here's the collection so far if anyone wants to take a look:

Nice work!

I've opened a couple of issues to improve a few things, with an eye to all the EXIF tags that would be nice to have before uploading to Panoramax…

I'll try to open some PRs ASAP.

Awesome, thank you! I merged your pull request and hopefully fixed the other issues :slight_smile:

It’s been a few months, so I thought I’d post an update:

======================================================================
Mapillary Downloader - Archive.org Statistics
======================================================================

Total Collections: 23,324
Total Users:       22,615
Total Images:      509,015,299
Unique Images:     506,591,498 (25.330% of 2B)
Total Size:        26951.81 GB

By Quality:
----------------------------------------------------------------------
  original    445 collections      419,079 images  (0.021%)    308.48 GB
  1024      21374 collections  476,322,249 images (23.816%)  24578.19 GB
  2048       1501 collections   32,268,148 images  (1.613%)   2065.06 GB
  256           4 collections        5,823 images  (0.000%)     78.82 MB
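The percentages in the table are each image count divided by a round figure of 2 billion (the "of 2B" on the Unique Images line), presumably the approximate size of Mapillary's whole catalogue. This Python sketch reproduces that arithmetic from the per-quality rows; it is not the script that generated the report, just a check that the totals and shares add up.

```python
# Per-quality (collections, images) figures copied from the statistics above.
BY_QUALITY = {
    "original": (445, 419_079),
    "1024": (21_374, 476_322_249),
    "2048": (1_501, 32_268_148),
    "256": (4, 5_823),
}
ESTIMATED_TOTAL = 2_000_000_000  # the "2B" denominator used in the report


def share(images: int, total: int = ESTIMATED_TOTAL) -> str:
    """Percentage of the estimated total, to three decimal places."""
    return f"{100 * images / total:.3f}%"


total_collections = sum(c for c, _ in BY_QUALITY.values())
total_images = sum(n for _, n in BY_QUALITY.values())

for quality, (collections, images) in BY_QUALITY.items():
    print(f"{quality:>8}  {collections:>6} collections  {images:>12,} images ({share(images)})")

print(f"Total collections: {total_collections:,}")
print(f"Total images:      {total_images:,}")
print(f"Unique share:      {share(506_591_498)}")
```

The per-quality rows sum exactly to the 23,324 collections and 509,015,299 images in the totals, and the same division gives the 25.330% unique share.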

My pipeline used to be "go to the web, find an interesting location and download the top 100 users for that area". Now I've written a scraper for the user list instead and released the data, so nobody else has to run it.

My new approach is ./scripts/get-users.py --exclude-archived | grep $'^1..\t' | xargs mapillary-downloader --quality 1024 (or some similar pattern). This helps with data diversity: people who have only uploaded a few hundred images tend to have selected them much more carefully, and they're scattered all over the world. I'll do a proper API bbox search over the planet as a detail pass in future :slight_smile:
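The grep $'^1..\t' filter keeps lines that start with "1", then any two characters, then a tab. Assuming get-users.py prints each user's image count in the first tab-separated column (an assumption, not confirmed here), that selects users with 100 to 199 images. A stricter Python equivalent that actually parses the count, so the range is easy to change, might look like this; `users_in_range` is a hypothetical helper, not part of mapillary-downloader.

```python
from typing import Iterable, Iterator


def users_in_range(lines: Iterable[str], lo: int = 100, hi: int = 199) -> Iterator[str]:
    """Yield lines whose first tab-separated field is an integer in [lo, hi].

    Assumption: each line starts with the user's image count followed by a tab.
    """
    # For ordinary counts, the defaults pick the same lines as grep $'^1..\t'.
    for line in lines:
        first, sep, _ = line.partition("\t")
        if not sep or not first.isdecimal():
            continue  # skip lines without a numeric count column
        if lo <= int(first) <= hi:
            yield line  # pass the whole line through unchanged, like grep
```

Because matching lines come out unchanged, the output could go to xargs exactly as grep's does; setting lo and hi to 200 and 299 would target the next band of users.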