As I’ve said in the title, I wonder if there’s a simple way to find the mappers that have contributed the most changes/changesets ever on a per-country basis (meaning the data should be restricted to the country’s bbox, or something even more precise).
The options I’ve tried are:
The country pages on OSMstats (these give exactly the information I need, but they’re limited to the last 2 months; brute-forcing the HDYC pages of the users would get me part of the way there, but unfortunately some mappers would be missed, and that’s a fairly time-consuming/complex process)
Manually querying the changeset dump to get that data (unfortunately the dump is slow to download and even slower to process, and I’ve even run into an XML parsing error, so I don’t think I’ll try that again)
This example extracts the top contributors from an OpenStreetMap history file, ranked by the total number of object versions (nodes/ways/relations) they created or modified (not changeset-based).
You can download the history file from the Geofabrik internal server by logging in with your OSM account.
| Step | Command | What it does |
|------|---------|--------------|
| 1 | `osmium cat monaco-internal.osh.pbf -f opl` | Converts the OSM history file to OPL (text) format |
| 2 | `cut -d' ' -f7` | Extracts the 7th field (user field) from each line |
| 3 | `sort` | Sorts lines alphabetically (required for `uniq`) |
| 4 | `uniq -ci` | Counts occurrences, case-insensitive |
| 5 | `sort -rn` | Sorts by count, highest first |
| 6 | `head -20` | Shows top 20 results |
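Chained together, the steps above form a single pipeline. Since the real output depends on the extract you downloaded, here is also a tiny hand-written OPL sample (hypothetical users, not real data) showing what the counting stages do:

```shell
# Full pipeline (run against a real history extract):
# osmium cat monaco-internal.osh.pbf -f opl | cut -d' ' -f7 | sort | uniq -ci | sort -rn | head -20

# Demo of the counting stages on three hand-written OPL lines (made-up users):
printf '%s\n' \
  'n1 v1 dV c10 t2023-01-01T00:00:00Z i100 uAlice' \
  'n1 v2 dV c11 t2023-02-01T00:00:00Z i100 uAlice' \
  'n2 v1 dV c12 t2023-03-01T00:00:00Z i101 uBob' \
| cut -d' ' -f7 | sort | uniq -ci | sort -rn | head -20
# Alice has two object versions, Bob one:
#   2 uAlice
#   1 uBob
```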
Output Format
```
21172 urene78
6345 uRatZillaS
5772 uVlad
```
The number represents total object versions (nodes, ways, relations) attributed to that user.
The u prefix is part of the OPL format, indicating the user field.
URL-encoded characters may appear (e.g. %20 for space).
Important Notes
The .osh.pbf extension indicates a history file containing all versions of each object.
The count includes every version, not unique objects. If a user edited the same road 50 times, it counts as 50.
PS: Some users may change their username over time, which can split their contributions across multiple names in the output. For more accurate results, consider aggregating by the User ID field (field “i” in OPL format) instead of the username.
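A minimal sketch of that user-ID aggregation, using two hand-written OPL lines (made-up data) where the same account (uid 100) appears under two usernames; switching `cut` to field 6 (the `i`-prefixed uid) merges them:

```shell
printf '%s\n' \
  'n1 v1 dV c10 t2023-01-01T00:00:00Z i100 uAlice' \
  'n1 v2 dV c11 t2023-06-01T00:00:00Z i100 uAlice_renamed' \
| cut -d' ' -f6 | sort | uniq -c | sort -rn
# Both versions are credited to the same account:
#   2 i100
```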
This works well enough that I can then query HDYC for the other data I need (like a more precise change count or changeset count), so I’m currently marking this as the solution (until a changeset-based solution, or one with lower RAM usage, gets posted).
```shell
# ------------------------------------------------------------
# Top 3 Most Active Years in the Region
# ------------------------------------------------------------
# Shows which years had the most editing activity.
# Explanation:
# - cut -d' ' -f5  - extracts the timestamp field
# - cut -c2-5      - extracts the year (characters 2-5, skipping the "t" prefix)
# - Useful to identify peak mapping periods
osmium cat monaco-internal.osh.pbf -f opl | cut -d' ' -f5 | cut -c2-5 | sort | uniq -c | sort -rn | head -3
```
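The timestamp/year extraction can be checked without a real extract by feeding it a few hand-written OPL lines (sample data, not real output):

```shell
printf '%s\n' \
  'n1 v1 dV c10 t2021-05-01T00:00:00Z i1 uA' \
  'n2 v1 dV c11 t2021-06-01T00:00:00Z i1 uA' \
  'n3 v1 dV c12 t2020-01-01T00:00:00Z i2 uB' \
| cut -d' ' -f5 | cut -c2-5 | sort | uniq -c | sort -rn | head -3
# 2021 had two object versions, 2020 one:
#   2 2021
#   1 2020
```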
```shell
# ============================================================
# Convert OSM OPL to gzipped CSV, then query with DuckDB
# ============================================================
#
# Performance note:
# The "osmium cat" command is the memory-intensive part of this pipeline.
# Processing large .osh.pbf files (e.g. country or planet-level history)
# may require significant RAM on the server.
#
# Recommended workflow for large files:
#   1. Generate the .csv.gz file once on a server with sufficient memory
#   2. Download the compressed CSV to your local machine
#   3. Run DuckDB queries locally (DuckDB handles gzipped CSV efficiently)
#
# This approach separates the heavy lifting (osmium) from the analysis (DuckDB).
# ============================================================

# Single pipeline: OSM -> OPL -> CSV -> gzip (run on server)
(echo "obj_type,obj_id,version,deleted,changeset,timestamp,user_id,username"; \
osmium cat monaco-internal.osh.pbf -f opl | awk '{
  print substr($1,1,1) "," substr($1,2) "," substr($2,2) "," substr($3,2) "," substr($4,2) "," substr($5,2) "," substr($6,2) "," substr($7,2)
}') | gzip > monaco.csv.gz
```
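As a sanity check, the awk field-splitting can be run on a single hand-written OPL line (made-up values) to confirm the one-character prefixes are stripped correctly:

```shell
echo 'n100 v2 dV c55 t2022-03-04T05:06:07Z i42 uAlice' \
| awk '{
  print substr($1,1,1) "," substr($1,2) "," substr($2,2) "," substr($3,2) "," \
        substr($4,2) "," substr($5,2) "," substr($6,2) "," substr($7,2)
}'
# → n,100,2,V,55,2022-03-04T05:06:07Z,42,Alice
```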
```shell
# DuckDB reads gzipped CSV directly (run on client)
duckdb <<'SQL'
-- Top 3 Users by Distinct Changeset Count
SELECT
    username,
    COUNT(DISTINCT changeset) AS changeset_count
FROM read_csv_auto('monaco.csv.gz')
GROUP BY username
ORDER BY changeset_count DESC
LIMIT 3;
SQL
```
Log
```
$ duckdb <<'SQL'
> -- Top 3 Users by Distinct Changeset Count
> SELECT
>     username,
>     COUNT(DISTINCT changeset) AS changeset_count
> FROM read_csv_auto('monaco.csv.gz')
> GROUP BY username
> ORDER BY changeset_count DESC
> LIMIT 3;
> SQL
┌────────────────┬─────────────────┐
│    username    │ changeset_count │
│    varchar     │      int64      │
├────────────────┼─────────────────┤
│ Oleg%20%Kholin │             541 │
│ rene78         │             529 │
│ Pa413          │             410 │
└────────────────┴─────────────────┘
```