What happened roughly every two hours between 12 Oct and 20 Oct 2022 with mapillary-sourced features?

I downloaded changesets-230508.osm.bz2 from https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/planet/2023/ 🙂

curl https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/planet/2023/changesets-230508.osm.bz2 -o changesets-230508.osm.bz2

I wanted to see all the changesets mentioning mapillary. I restricted my search to changesets overlapping with the approximate bounding box of Norway:

osmium changeset-filter --bbox -10,57,34,81 --progress -f opl changesets-230508.osm.bz2 > changesets_filtered.txt
grep -i mapillary changesets_filtered.txt > changesets_mapillary.txt

(Here I’m heavily relying on this post and the osmium documentation.)
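The two commands can be mimicked in plain Python, which makes the filtering logic explicit. This is a minimal sketch under assumptions: it assumes osmium's OPL changeset lines use the field letters c (id), s (created_at), x/y/X/Y (bounding box) and T (tags), and that `--bbox` keeps changesets whose bounding box overlaps the given box. The helper names and sample lines are hypothetical. One difference is deliberate: `grep -i mapillary` matches anywhere in the line, user names included, whereas this sketch searches only the tag field.

```python
def parse_opl_changeset(line: str) -> dict[str, str]:
    """Map each OPL field letter (c, k, s, e, ..., T) to its raw value."""
    return {token[0]: token[1:] for token in line.split()}


def overlaps_bbox(fields: dict[str, str], left: float, bottom: float,
                  right: float, top: float) -> bool:
    """True if the changeset's bounding box overlaps the query box."""
    try:
        min_lon, min_lat = float(fields["x"]), float(fields["y"])
        max_lon, max_lat = float(fields["X"]), float(fields["Y"])
    except (KeyError, ValueError):
        # Changesets without a bounding box (e.g. empty ones) are skipped
        return False
    return (min_lon <= right and max_lon >= left
            and min_lat <= top and max_lat >= bottom)


def mentions_in_tags(fields: dict[str, str], keyword: str) -> bool:
    """Case-insensitive search in the tag field only (not the user name).

    OPL escapes special characters with %...%; a plain word such as
    "mapillary" is unaffected, so no decoding is done here.
    """
    return keyword.lower() in fields.get("T", "").lower()


# Hypothetical sample lines, for illustration only
lines = [
    "c1 k3 s2022-10-12T10:20:00Z e2022-10-12T10:25:00Z d0 i1 uAlice "
    "x10.7 y59.9 X10.8 Y60.0 Tcreated_by=iD,source=Mapillary",
    "c2 k1 s2022-10-12T11:00:00Z e2022-10-12T11:05:00Z d0 i2 umapillary_fan "
    "x10.7 y59.9 X10.8 Y60.0 Tcreated_by=JOSM",
    "c3 k1 s2022-10-12T12:00:00Z e2022-10-12T12:05:00Z d0 i3 uBob "
    "x2.3 y48.8 X2.4 Y48.9 Tsource=mapillary",
]

bbox = (-10, 57, 34, 81)  # left, bottom, right, top, as passed to --bbox

# What the osmium + grep pipeline would keep: bbox overlap, keyword anywhere
grep_like = [line for line in lines
             if overlaps_bbox(parse_opl_changeset(line), *bbox)
             and "mapillary" in line.lower()]

# Stricter variant: bbox overlap, keyword in the tags only
kept = [f["c"] for f in map(parse_opl_changeset, lines)
        if overlaps_bbox(f, *bbox) and mentions_in_tags(f, "mapillary")]

print(len(grep_like), kept)
```

Here the second sample line would pass the grep only because of its user name; whether such matches matter in the real file is something to check rather than assume.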

To visualize how these changesets are distributed over time, I do:

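import csv
import pandas as pd
import matplotlib.pyplot as plt

# One OPL changeset per line, fields separated by single spaces;
# QUOTE_NONE stops stray quote characters from being treated as CSV quoting
df = pd.read_csv("changesets_mapillary.txt", sep=' ', header=None, quoting=csv.QUOTE_NONE)
# Column 2 holds the creation time with an "s" prefix, e.g. s2022-10-12T10:20:00Z
created = pd.to_datetime(df[2].str[1:], utc=True)
# Unix timestamp in seconds
df = df.assign(timestamps=(created - pd.Timestamp('1970-01-01', tz='UTC')) // pd.Timedelta('1s'))

# Changesets over time; the red line marks 1.66e9 s
df.timestamps.hist(bins=100)
plt.axvline(x=1.66*10**9, c='r', alpha=0.8)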

Giving me:

The horizontal axis is the Unix timestamp (seconds), the vertical axis is the number of changesets. As one can see, there is a big spike slightly after the red line.
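The red line sits at 1.66·10⁹ s. To read it as a date, the same conversion used further down for `lower` and `upper` can be applied; Unix timestamps count seconds in UTC, so the result is a UTC time.

```python
import pandas as pd

# Position of the red line in the first plot, in Unix seconds
red_line = pd.Timestamp(1_660_000_000, unit="s")
print(red_line)  # 2022-08-08 23:06:40 (UTC)
```

So "before the red line" below means changesets created before early August 2022.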

To see how the timings of changesets are distributed over the course of a day, I do:

# Minute of the UTC day (0 to 1439), with exactly one bin per minute
datetime_series = pd.to_datetime(df.timestamps, unit='s')
(datetime_series.dt.hour*60 + datetime_series.dt.minute).hist(bins=24*60, range=(0, 24*60))
plt.xlim([0, 24*60])

(Horizontal axis: minute within a day, in UTC; vertical axis: how many changesets were created in that minute.)
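A 2-hour period can show up in this 24-hour profile at all because 24 h is a multiple of 2 h: a process firing every 2 hours lands on the same 12 minutes of every day. A small synthetic check with NumPy (made-up timestamps, not the real changesets; the start date and the 5-minute offset are arbitrary assumptions) reproduces that shape with exact one-minute bins:

```python
import numpy as np

# Synthetic, for illustration only: one changeset every 7200 s, 5 minutes past
# every even UTC hour, for 8 days starting 2022-10-12 00:00:00 UTC
start = 1_665_532_800
t = start + np.arange(8 * 12) * 7200 + 300

# Minute of the UTC day, the same quantity as hour*60 + minute in the plot
minute_of_day = (t % 86_400) // 60

# 1440 one-minute bins covering [0, 1440)
counts, edges = np.histogram(minute_of_day, bins=24 * 60, range=(0, 24 * 60))
peaks = np.flatnonzero(counts)
print(peaks)          # 12 peaks, 120 minutes apart
print(counts[peaks])  # one changeset per peak per day
```

Only periods that divide 24 h fold this cleanly into the daily profile; for other periods, folding at the candidate period directly, as done further down with `% (120*60)`, is the more general check.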

There is a strong 2-hour periodic signal present. To see if this is the same signal as the spike above, I consider only the changesets before the red line from the first image, and replot this second distribution:

# Same daily profile, restricted to changesets created before the red line
datetime_series = pd.to_datetime(df.timestamps[df.timestamps<1.66*10**9], unit='s')
(datetime_series.dt.hour*60 + datetime_series.dt.minute).hist(bins=24*60, range=(0, 24*60))
plt.xlim([0, 24*60])
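Whether the 2-hour signal is present in a given subset can also be checked numerically rather than by eye. One standard option (a swapped-in technique, not part of the original analysis) is the Rayleigh test from circular statistics: map each timestamp to an angle within the 2-hour cycle and measure how concentrated the angles are. With n timestamps and mean resultant length R, the statistic is Z = n * R^2; without any periodicity, p is approximately exp(-Z) for large n. The sketch below runs it on synthetic data only; on the real data one would pass `df.timestamps` for the two subsets.

```python
import numpy as np


def rayleigh_z(t, period):
    """Rayleigh statistic Z = n * R**2 for the phases of t modulo period."""
    theta = 2 * np.pi * np.mod(np.asarray(t, dtype=float), period) / period
    # Mean resultant length R: 0 for uniform phases, 1 for identical phases
    r = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
    return len(theta) * r**2


rng = np.random.default_rng(1)
start = 1_665_532_800  # 2022-10-12 00:00:00 UTC
span = 8 * 86_400      # eight days, in seconds (a whole number of 2-hour cycles)

# Synthetic data, for illustration only
no_signal = rng.uniform(start, start + span, size=2000)
bursts = (start + 300
          + 7200 * rng.integers(0, span // 7200, size=2000)
          + rng.normal(0, 60, size=2000))

for name, t in [("uniform", no_signal), ("2-hour bursts", bursts)]:
    print(f"{name}: Z = {rayleigh_z(t, 7200):.1f}")
```

A Z of a few units is what chance alone produces; a Z in the hundreds or thousands means the timestamps cluster at a fixed phase of the 2-hour cycle.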

Clearly, the 2-hour signal disappeared. I replot the first plot, but now I zoom in on the spike:

# Zoom window around the spike, in Unix seconds
lower = 1.66557*10**9
upper = 1.66625*10**9

plt.figure(figsize=(10,5))
df.timestamps[(df.timestamps>lower) & (df.timestamps<upper)].hist(bins=500)
plt.xlim([lower,upper])

As expected, the signal looks periodic. Let’s try to fold it with a 2-hour period:

# Phase within a 2-hour (7200 s) cycle, 120 one-minute bins
(df.timestamps[(df.timestamps>lower) & (df.timestamps<upper)] % (120*60)).hist(bins=120, range=(0, 120*60))

Result:

I believe this demonstrates that there is a strong 2-hour periodic signal between lower and upper. Which dates do lower and upper actually correspond to?

print(pd.Timestamp(lower,unit='s').strftime('%Y-%m-%d %H:%M:%S'))
print(pd.Timestamp(upper,unit='s').strftime('%Y-%m-%d %H:%M:%S'))

I get:

2022-10-12 10:20:00
2022-10-20 07:13:20
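The fold with `% (120*60)` assumes the period is already known. A common way to find it from the data is an epoch-folding search (a technique from pulsar and X-ray timing, swapped in here, not part of the original analysis): fold the timestamps at many trial periods and score each folded histogram with a chi-square statistic against a flat profile; the true period gives the most peaked profile. The trial range of 90 to 150 minutes is an assumption chosen to exclude 1 hour and other divisors of 2 hours, because those fold the same bursts onto a single phase just as well. The data below are synthetic:

```python
import numpy as np


def fold_chi2(t, period, n_bins=60):
    """Chi-square of the folded histogram against a flat (no-signal) profile."""
    counts, _ = np.histogram(np.mod(t, period), bins=n_bins, range=(0, period))
    expected = len(t) / n_bins
    return np.sum((counts - expected) ** 2 / expected)


def best_period(t, trial_periods, n_bins=60):
    """Return the trial period with the most peaked folded profile."""
    scores = np.array([fold_chi2(t, p, n_bins) for p in trial_periods])
    return trial_periods[np.argmax(scores)], scores


# Synthetic data for illustration only: bursts of 20 changesets 5 minutes past
# every even UTC hour for 8 days, plus uniformly scattered background edits
rng = np.random.default_rng(0)
start = 1_665_532_800  # 2022-10-12 00:00:00 UTC
pulse_times = start + 300 + 7200 * np.arange(96)
bursts = np.repeat(pulse_times, 20) + rng.normal(0, 30, size=96 * 20)
background = rng.uniform(start, start + 8 * 86_400, size=2000)
t = np.concatenate([bursts, background])

trial = np.arange(5400, 9001, 10)  # 90 to 150 minutes in 10 s steps
period, scores = best_period(t, trial)

# Where within the cycle the bursts sit (start of the tallest folded bin)
counts, edges = np.histogram(np.mod(t, period), bins=60, range=(0, period))
print(period, edges[np.argmax(counts)])
```

On the real data, `t` would be `df.timestamps[(df.timestamps>lower) & (df.timestamps<upper)].to_numpy()`. Because Unix time starts at midnight UTC, the phase of the tallest bin at a 7200 s period reads directly as seconds past an even UTC hour.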

I don’t know what’s behind this observation. Hence, the question:

What happened between roughly 10:20 UTC on 12 Oct 2022 and 07:13 UTC on 20 Oct 2022, roughly every two hours, with mapillary-sourced features?

Are you aware of this thread:

No, I wasn’t aware… Thanks!

Hey @zabop,

I am torn between astonishment at the amount of detail you put into this and feeling bad about all that work, when the solution was so simple!

Was there any particular reason you were looking specifically for the mapillary key?

K

Hey @kmpoppe,

Don’t worry, it was fun to explore these things! I have two aims:

  1. Learn more about OSM & related tools.
  2. Come up with some quantification of how much Mapillary influences OpenStreetMap edits.

There is a lot of armchair mapping left to do, which I could spend my time on; alternatively, I could go and upload more imagery to Mapillary. Both seem fun.

To take nice Mapillary photos, I would need to invest a bit in equipment. If this actually helps OSM, I am happy to do so. If it doesn’t really help, I would be spending my money on making a Meta-owned product better. It’s not like I’m holding any particular grudge against Meta, but this way of spending my money would probably be suboptimal.
