The answer may be here: filtering on specific words. The other solution no longer works; it complains that it needs a var value but doesn't say which var.
There are various download scripts kicking around (the perl revert scripts have a version) that you should be able to modify to include the discussion.
Of course, the wiki “documentation” tends to tag along after the code rather than be maintained with it, and I haven’t tested it so there may be a problem with this example.
The complete list of comments can be found in the discussions planet dump.
I’m interested in changeset metadata too, but I had no luck finding a tool able to filter this file… maybe libosmium can help, but my current C++ knowledge isn’t up to it.
Anyway, I had experimented with Python scripts in the past, and I’ve now patched them to handle these multi-gigabyte XML files within a regular PC’s RAM.
I tested it with a 10 GB discussions dump from 2015, which my old i5 CPU @ 3.1 GHz processes at roughly 4.5M changesets per minute. I estimate that analysing the current dump should take about 30 minutes, less if you have a better processor.
The script loads the dump incrementally, scans for comments made by a given list of users, and outputs a CSV file with changeset id, user, comment time and comment text (carriage returns in this last field are transformed to \n for easier CSV ingestion in other tools).
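I can’t paste the whole script here, but a minimal sketch of the incremental approach looks like this. It assumes the discussions dump nests `<comment>` elements (with `user` and `date` attributes and a `<text>` child) under `<changeset>/<discussion>`; the user names and file names are placeholders, so check them against your copy of the dump:

```python
import csv
import xml.etree.ElementTree as ET

USERS = {"alice", "bob"}  # placeholder: user names to filter on

def scan(dump_path, csv_path):
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["changeset_id", "user", "date", "text"])
        # iterparse streams the file instead of building the whole tree,
        # which is what keeps a multi-gigabyte dump inside normal RAM
        context = ET.iterparse(dump_path, events=("start", "end"))
        _, root = next(context)  # grab the root <osm> element
        for event, elem in context:
            if event != "end" or elem.tag != "changeset":
                continue
            cs_id = elem.get("id")
            for comment in elem.iterfind("./discussion/comment"):
                if comment.get("user") in USERS:
                    text = comment.findtext("text", default="")
                    # keep each comment on a single CSV row
                    text = text.replace("\r", "").replace("\n", "\\n")
                    writer.writerow([cs_id, comment.get("user"),
                                     comment.get("date"), text])
            root.clear()  # drop the processed changeset to keep memory flat

scan("discussions-latest.osm", "comments.csv")
```

The `root.clear()` after each changeset is the important bit: without it, iterparse still accumulates every parsed element under the root and memory grows with the file.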
Let’s say that it’s far from perfect:
the input file needs to be plain XML, so you have to decompress the .bz2 first (though see the sketch below for a possible streaming workaround)
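That limitation could probably be lifted: iterparse accepts any file-like object, and Python’s bz2 module can decompress a stream on the fly, at the cost of extra CPU time, so the 30-minute estimate would grow. A sketch, reusing the per-changeset loop from above:

```python
import bz2
import xml.etree.ElementTree as ET

# feed iterparse a streaming decompressor instead of a plain file,
# so the .bz2 never has to be unpacked to disk first
with bz2.open("discussions-latest.osm.bz2", "rb") as f:
    context = ET.iterparse(f, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "changeset":
            # ... same per-changeset filtering as in scan() above ...
            root.clear()
```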