Not to worry, we’re working out all the edge cases. Your user accounts were just the first edge case to hit the test importer.
There are some interesting “conflicts” in the old forum (fluxbb) database. E.g.: Multiple user accounts with the same email address and different OSM IDs. (Example Scenario: User 1 logged into forum, User 1 email changed on osm.org, User 2 is created on osm.org with old User 1 email, User 2 logs into forum.)
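To illustrate the kind of conflict meant here, a rough sketch of how such rows could be spotted. This is not the importer's code: the `(osm_uid, email)` row shape and the function name `find_email_conflicts` are hypothetical, and treating emails as case-insensitive is an assumption.

```python
from collections import defaultdict


def find_email_conflicts(rows):
    """Return {email: [osm_uid, ...]} for emails tied to more than one OSM ID.

    `rows` is an iterable of (osm_uid, email) pairs; this shape is a
    hypothetical stand-in for whatever the fluxbb users table provides.
    """
    uids_by_email = defaultdict(set)
    for osm_uid, email in rows:
        # Skip rows that lack either value; they cannot form this conflict.
        if osm_uid is None or not email:
            continue
        # Assumption: emails are compared case-insensitively.
        uids_by_email[email.strip().lower()].add(osm_uid)
    return {
        email: sorted(uids)
        for email, uids in uids_by_email.items()
        if len(uids) > 1
    }
```

In the example scenario above, User 1 and User 2 would both show up under the old User 1 email.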
The test import account merging now appears to be working well.
The logic is approximately:
If there is no known osm.org ID (uid) and no known email for a user: create a new community.osm.org account with an invalid email address and suspend it immediately. (Admins can merge it manually after the import. Posts are imported OK and displayed normally.)
Content formatting will not be perfect, as a perfect conversion from fluxbb bbcode → discourse’s flavour of markdown is impossible, but the formatting now looks “good enough” to me.
The old forum.osm.org has a few old “mojibake” posts. I am unsure what happened here; it is likely a MySQL unicode issue. These posts are broken in the fluxbb database and it is unlikely they could be easily recovered.
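As a rough sketch of that rule only (the other branches of the merge logic are not spelled out here, so they are left out): the class names, the field names and the `@import.invalid` placeholder format are all hypothetical, not what the importer actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OldForumUser:
    # Hypothetical view of a fluxbb user row.
    fluxbb_id: int
    username: str
    osm_uid: Optional[int] = None
    email: Optional[str] = None


@dataclass
class NewAccount:
    username: str
    email: str
    suspended: bool


def import_unmatched_user(user: OldForumUser) -> Optional[NewAccount]:
    """Handle only the 'no uid and no email' case described above."""
    if user.osm_uid is not None or user.email:
        # Some other rule applies; not covered by this sketch.
        return None
    # Assumption: a placeholder under the reserved .invalid TLD, so no
    # mail can ever be delivered to it.
    placeholder = f"fluxbb-{user.fluxbb_id}@import.invalid"
    return NewAccount(username=user.username, email=placeholder, suspended=True)
```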
Note the test site does NOT currently allow login.
Do you know the original intended encoding? If so, it’s sometimes possible to recover the text from mojibake, though not always. If it’s only a few posts, then maybe their authors could patch them up after the fact if there’s any confusion.
However, in my experience MySQL itself often keeps enough information to reconstruct such messages, usually by dumping to a file, editing the file to change CHARSET=xxxx where needed (utf8? utf8mb4?), and then importing again. It is often the connecting client that mangles the text: if one uses SET NAMES with the same charset the database/table/column uses, one can usually get the data out in raw form, which can then be converted. If the client does not issue the correct commands on startup, however, it will get gibberish or ?.
I’ve had some experience with cp1250 and utf8 being stuffed into a database marked as charset=latin1 (which showed up as all kinds of corruption in the app/web) and recovering it without too much trouble; but I have no experience with Russian charsets (was it UTF8 or KOI8-R or something else initially?), and I have a feeling the multi-charset nature of the database might make it more problematic.
Perhaps a file with the result of mysqldump --hex-blob --skip-opt --where=.... containing several problematic messages (as well as a few non-problematic messages in other languages/charsets) might inspire someone to take a look, while not being problematic from the privacy side (which giving MySQL access to the test instance probably would be).
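A minimal sketch of that kind of recovery, assuming the text was read back as latin1 while the stored bytes were really in some other encoding. The function name and the candidate list are my own choices for illustration. Note that MySQL's latin1 is closer to cp1252 than to Python's `latin-1`, so real data may need more care than this.

```python
def recover_mojibake(text, read_as="latin-1",
                     candidates=("utf-8", "cp1250", "koi8-r")):
    """Try to undo 'bytes in encoding X were decoded as latin1'.

    Returns (encoding, recovered_text) for the first candidate that
    decodes cleanly, or None if nothing fits.
    """
    try:
        # Turn the mangled characters back into the original raw bytes.
        raw = text.encode(read_as)
    except UnicodeEncodeError:
        # Contains characters latin1 cannot produce, so it is not this
        # kind of mojibake.
        return None
    for encoding in candidates:
        try:
            # utf-8 goes first because it is strict; single-byte charsets
            # such as koi8-r accept almost any byte sequence.
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None


mangled = "Привет".encode("utf-8").decode("latin-1")
print(recover_mojibake(mangled))
```

A clean decode only means the bytes were valid in that charset, not that the guess is right, so the result still needs a human to look at it, especially in a multi-charset database.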
I agree that we don’t have to make the import ideal. But I think at least issues 3 and 5 are very important: asterisks are common in tagging discussions, and multi-level quotations are important. 4 was announced as finished by Harry (?) and I’m puzzled at why that didn’t work.
I think “good enough” means the migrated message keeps reasonable formatting and is no harder to understand than the original.
Weird parsing error, not caused by importer. I will look into a workaround, but it is unlikely to be fixed.
Colour based markdown [color=gray] (bbcode source) is not supported by discourse. Unlikely to be fixed.
Difficult. The importer’s input [building=*] gives the output [building=*], but discourse is swallowing the *. Likely not “good enough”, but I am unsure how I’d approach this.
This is already handled by the permalink redirect code. Take the old URL + parameters and use them on the test forum URL, e.g.: https://forum-import-test.openstreetmap.org/viewtopic.php?pid=145548#p145548
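One possible approach, untested against discourse itself: backslash-escape the wildcard in the importer's output so the markdown parser treats it as a literal character. The function name is hypothetical, and the sketch only touches an asterisk that directly follows `=`, so it would miss other shapes such as `*=yes`.

```python
import re

# An asterisk directly after "=", as in key=* tagging notation.
_TAG_WILDCARD = re.compile(r"=\*")


def escape_tag_wildcards(markdown: str) -> str:
    """Replace '=*' with '=\\*' so the asterisk is not read as emphasis."""
    # In the replacement template, "\\" yields one literal backslash.
    return _TAG_WILDCARD.sub(r"=\\*", markdown)


print(escape_tag_wildcards("[building=*] and [highway=*]"))
```

Running it twice does no harm, since an already escaped `=\*` no longer matches the pattern.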
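For anyone wanting to check a batch of old links, a small sketch of that rewrite: keep the path, query and fragment, and swap only the host. The helper name is hypothetical and the old forum.osm.org address in the example is assumed; this only builds the URL and does not touch the redirect code itself.

```python
from urllib.parse import urlsplit, urlunsplit

TEST_HOST = "forum-import-test.openstreetmap.org"


def to_test_forum_url(old_url: str) -> str:
    """Point an old forum link at the test import site, keeping everything else."""
    parts = urlsplit(old_url)
    # Only scheme and host change; path, query and fragment are preserved.
    return urlunsplit(("https", TEST_HOST, parts.path, parts.query, parts.fragment))


print(to_test_forum_url("https://forum.osm.org/viewtopic.php?pid=145548#p145548"))
```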