Discourse Translator - Pricing and Language Coverage

A few thoughts on translation volume and costs.

To minimise the translation volume and thus the cost risk, the translation function could be limited to a few categories and languages at first. Once experience has been gained in this “sandbox”, scaling up to translation in all languages should be easy and safe.

I’m not sure whether it’s realistic, but we could also offer the translation provider a cooperation: the OSM community corrects translations (for existing or new languages) in exchange for a free translation service from the provider. The prerequisite, of course, is that this is technically feasible.

I think this approach underlies the DeepL translation website (DeepL Translate). There, you can correct a translation by clicking on a word, which is probably used to train the translation AI. A win-win situation for provider and user.

Another advantage would be that the translation AI could learn how the OSM community communicates, which would also improve translation quality quickly.

The list of diary posts that Tom shared is very telling. I assume it is not biased by the communication channel people use (forum, mailing list or even Telegram), so it gives a good picture of OSM community activity.

Do we know the coverage of these languages for each translation engine? Maybe the choice should be optimised to cover at least the languages with more than 50 posts?

It looks like the plugin is calling the API more often than expected, only to detect which language a post is written in.

I discovered this because my free 500 KB DeepL quota was exhausted within a couple of days on an instance I was barely using.

There is another option to detect the language of a post: peterc/whatlanguage on GitHub, a language detection library for Ruby that uses Bloom filters for speed.

No API is needed. It runs entirely locally.
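
The idea behind whatlanguage, as its description suggests, is to store word lists per language in Bloom filters. The language whose filter matches the most words in a text wins. Here is a minimal Python sketch of that technique. It is not the library's code: the tiny word lists, filter size, hash count and the `min_words` cut-off are hypothetical placeholders, and a real detector would load large vocabularies for each language.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter: each item sets k bit positions in an m-bit array."""

    def __init__(self, size_bits: int = 4096, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical, deliberately tiny word lists (a real detector needs thousands per language).
WORD_LISTS = {
    "en": ["the", "and", "is", "of", "to", "this", "with", "not", "that", "translation"],
    "de": ["der", "die", "und", "ist", "nicht", "das", "mit", "ein", "zu", "übersetzung"],
    "fr": ["le", "la", "et", "est", "les", "des", "une", "pas", "avec", "traduction"],
}

FILTERS = {}
for lang, words in WORD_LISTS.items():
    bloom = BloomFilter()
    for word in words:
        bloom.add(word)
    FILTERS[lang] = bloom


def detect_language(text: str, min_words: int = 5):
    """Return the language whose filter matches the most words, or None if unsure."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < min_words:
        return None  # too short to judge reliably
    scores = {lang: sum(w in bloom for w in words) for lang, bloom in FILTERS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning `None` for very short texts mirrors the limitation discussed later in this thread: word-list detection needs enough words to be reliable.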

I’ve also sent an email to DeepL to ask whether they have special offers for free / open projects… it costs nothing to ask :wink:

The DeepL fork in particular or the translator plugin in general?

DeepL does not provide a language detection endpoint in its API, so the plugin translates the post into the instance’s default language in order to find out its language.

Google and Microsoft do have language detection endpoints. I don’t know whether calls to them count against the quota.

I don’t know much about Ruby, but the code looks quite simple, and it should not be too hard to improve it by detecting the language locally and/or combining APIs.
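
One way to read "detecting the language locally and/or combining APIs" is as a fallback chain. A cheap local detector runs first, and the paid API is called only when the local result is inconclusive. The result is also cached per post, so no post is detected twice. This is a Python sketch of that design, not the plugin's actual code. The `local_detect` and `remote_detect` callables and the caching are assumptions for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical detector signature: text in, language code (or None if unsure) out.
Detector = Callable[[str], Optional[str]]


class LanguageGate:
    """Decide whether to offer a translate button, using the paid API only as a fallback."""

    def __init__(self, local_detect: Detector, remote_detect: Detector) -> None:
        self.local_detect = local_detect
        self.remote_detect = remote_detect
        self.remote_calls = 0  # rough proxy for quota usage
        self.cache: Dict[str, Optional[str]] = {}

    def detect(self, post_id: str, text: str) -> Optional[str]:
        if post_id in self.cache:
            return self.cache[post_id]  # never detect the same post twice
        lang = self.local_detect(text)
        if lang is None:
            # Local detection was inconclusive: spend one API call.
            self.remote_calls += 1
            lang = self.remote_detect(text)
        self.cache[post_id] = lang
        return lang

    def show_translate_button(self, post_id: str, text: str, user_locale: str) -> bool:
        lang = self.detect(post_id, text)
        return lang is not None and lang != user_locale
```

With this structure, API usage grows only with the number of posts the local detector cannot classify. It no longer grows with every page view.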

Comparing the table posted by @TomH with the languages supported by each translation engine yields this overview:

Translation engine    % of translatable diary posts
LibreTranslate        97.0%
DeepL                 95.7%
Yandex                99.6%
Bing                  99.5%
Google                99.5%

I’m not sure this helps with the decision, given how high the coverage is overall.
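
Percentages like these are computed by summing the post counts of every language an engine supports and dividing by the total. The same data also answers the earlier question of which languages with more than 50 posts an engine would miss. This is a minimal Python sketch. The post counts, language codes and engine names are made up for illustration. They are neither Tom's data nor any engine's real language list.

```python
from typing import Dict, List, Set


def coverage(post_counts: Dict[str, int], supported: Set[str]) -> float:
    """Percentage of posts written in a language the engine supports."""
    total = sum(post_counts.values())
    covered = sum(n for lang, n in post_counts.items() if lang in supported)
    return 100.0 * covered / total if total else 0.0


def uncovered(post_counts: Dict[str, int], supported: Set[str], min_posts: int = 50) -> List[str]:
    """Languages with more than min_posts posts that the engine cannot translate."""
    return sorted(lang for lang, n in post_counts.items() if n > min_posts and lang not in supported)


# Hypothetical example data: diary posts per language code.
posts = {"en": 600, "de": 200, "fr": 140, "xx": 60}
engines = {
    "engine_a": {"en", "de", "fr"},
    "engine_b": {"en", "de"},
}

for name, langs in engines.items():
    print(f"{name}: {coverage(posts, langs):.1f}% covered, missing {uncovered(posts, langs)}")
```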

This sounds broken. So the plugin translates the text of a post automatically just to decide whether to show the :globe_with_meridians: (translate) button at all? Then it could just as well translate everything automatically, without user interaction, and that would result in fewer API calls. (Maybe this is why the mainline version of the plugin does not support DeepL?)

FYI, LibreTranslate quality decreases when translating neither to nor from English.

French → German is in fact French → English → German, and of course the quality suffers.
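
A small route finder illustrates this pivot effect. If a direct model exists for the language pair, it is used. Otherwise the translation is chained through English, and each hop can add its own errors. This Python sketch shows the idea only. The model set is a hypothetical subset, not the actual Argos model catalogue, and LibreTranslate's internals may differ.

```python
from typing import List, Set, Tuple


def translation_route(src: str, tgt: str, models: Set[Tuple[str, str]], pivot: str = "en") -> List[str]:
    """Return the chain of languages used to translate src -> tgt."""
    if src == tgt:
        return [src]
    if (src, tgt) in models:
        return [src, tgt]  # direct model available
    if (src, pivot) in models and (pivot, tgt) in models:
        return [src, pivot, tgt]  # two hops: errors from both models can accumulate
    raise ValueError(f"no route from {src} to {tgt}")


# Hypothetical model set in which every model translates from or to English.
MODELS = {("fr", "en"), ("en", "fr"), ("de", "en"), ("en", "de")}

print(translation_route("fr", "de", MODELS))  # ['fr', 'en', 'de']
print(translation_route("en", "de", MODELS))  # ['en', 'de']
```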

I installed it some time ago to do some local tests.


Language detection is really something that could be done locally. It is much simpler than translation, and plenty of code exists to do it without calling an API, which would help with DeepL and the other engines too.

Sure, but this goes beyond using a plugin or a fork of a plugin. It requires a (Ruby) developer.

It is of course biased by the communication channel that is used (the diary). For example, I am quite active in the German forum and on the Italian mailing list and Telegram channel (all in the local language). However, all my diary posts were in English, because I expected that language to have the farthest reach, and those posts were not aimed at a specific local audience.

I looked into the DeepL thing a bit.

  1. I found out why this fork has not been merged. See the Discourse Meta topics “DeepL integration for Translator plugin” (marketplace) and “DeepL support for Discourse Translator” (#28 by koen360, feature). Summary: according to the plugin author, the DeepL API is a bit dicey, and the fork is not in a state where it could be merged. The author estimated that it would take at least $500 to get it into a mergeable state. The issue @cquest encountered, that the plugin makes too many API calls, is known, and there seems to be no workaround for it.

  2. The Ruby language detection library peterc/whatlanguage on GitHub is 15 years old and hasn’t been updated much since, and it supports only a very limited number of languages. It is documented to work well only with sufficiently long texts (longer than a Twitter post). So if the translator plugin were modified to detect languages locally with that library, the change would probably never be merged upstream because of these limitations.

Given these two points, using the DeepL fork of the translator plugin seems risky to me. I don’t know what is missing in the DeepL fork, but given the amount the author estimated to finish it, I don’t have a good feeling about it. Unless DeepL sponsors us, the possible workarounds for the “too many API calls” issue look really dicey to me.

Do you have a source for this?

LibreTranslate is an API and web-app built on top of Argos Translate.

Here are the available models: Argos Open Tech

All of them (unless I missed one) translate from or to English.

Another option is to use LibreTranslate, which seems to support language detection…

I’m quite sure that with all the available tools we have, this plugin can be improved to better fit our needs.

I see that Bing has been mentioned as an option. They are an OSMF corporate member so might be open to sponsoring this. The named contact on the OSMF Advisory Board is Harsh Govind. Let me know if you’d like me to send him an email.

@RobJN I’m not sure anyone participating in this thread has the authority to give the go-ahead, but you could ask tentatively (de: unverbindlich, “without obligation”), as @cquest did for DeepL.

If Microsoft would sponsor automatic translation in this forum, I think it would be the best and easiest solution because:

  1. Bing has pretty much complete language coverage
  2. The Discourse Translator plugin can be used out of the box with basically no effort for the admins (unlike DeepL, where the fork with its rough edges would have to be used and some sort of workaround would probably have to be implemented)
  3. From the few people who have tried Bing translation so far, the quality sounded reasonably good, apparently better than Google and maybe even on par with DeepL

@Firefishy would this work based on the installation that was decided to try?

Having thought about it a little, I still don’t see the need for machine translation. Who is supposed to read all of it when the whole world is working through one topic? It seems sufficient to me if the language of a topic can be detected automatically. Then those who know the language can deal with it. Additional hints about what the topic is about would of course serve the purpose, because then only people who know about the subject could be addressed, and so on. In short: translation is an extra, detection is desirable.

Having thought about it a little, I still don’t see the need for machine translation. Who is supposed to read all of it when the whole world is working through one topic?

Surely hardly anyone will read everything, but at present most discussions, for example about finding the right tags, de facto take place in English. If you don’t understand English, you depend on translation or are excluded. The same goes for policy topics, and in general for everything at the international level.

@Hungerburg @dieterdreist It would be appreciated if you could post in English (even using a machine translation site) in topics that are already being discussed in that language. The forum doesn’t yet have the capability to translate your messages, and posting in English would help people here understand you.

Thanks!

2 Likes