Concerning the use of AI for Translation and Research

I map extensively in Thailand. I do not speak Thai, so I often need help translating and transliterating Thai names into English. For this discussion let me first clarify some terminology: transliteration is about converting the letters of the Thai name into Latin letters following a specific system. In Thailand, the system we use is the Royal Thai General System of Transcription (RTGS), and we have a very handy program (Thai Romanization, Dept. of Linguistics, Chulalongkorn University) to assist in doing that. The output of this program would be properly tagged as name:th-Latn. A translation, however, involves taking the meaning of the Thai name and finding an English equivalent. Such translations can be fairly difficult, if not impossible, for non-Thai speakers.

OSM has specialized tags for the results of each of these methods. Given: we work in Thailand so the “name” tag is almost always in Thai and any (optional) English translation has a “name:en” tag. Let me give some examples, a simple one and then a more complicated one. Here is the tagging for the names of a Buddhist place_of_worship in my neighborhood:

  • amenity=place_of_worship
  • name=วัดเสาหิน
  • name:th=วัดเสาหิน
  • name:th-Latn=Wat Sao Hin
  • name:en=Sao Hin Temple

This trivial example is straightforward. For years most of the Thailand mappers I know have used the name:th-Latn transliteration as the value for the name:en tag. Most of us expats use the th-Latn form when talking about such places; Wat Sao Hin rolls off my tongue more readily than Sao Hin Temple. However, Wat Sao Hin is not official English.
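For anyone who wants to check such tag sets programmatically, here is a minimal sketch in Python. The function names are hypothetical and not part of any OSM tool; the only assumptions are the conventions described above (name and name:th in Thai, name:th-Latn and name:en in Latin letters) and the fact that Thai characters sit in the Unicode block U+0E00 to U+0E7F.

```python
from typing import Dict, List


def contains_thai(text: str) -> bool:
    """Return True if any character is in the Thai Unicode block (U+0E00-U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def check_name_tags(tags: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return warnings for a feature's name tags.

    An empty list means none of these simple checks failed.
    """
    warnings = []
    name = tags.get("name", "")
    if not contains_thai(name):
        warnings.append("name should be in Thai")
    if "name:th" in tags and tags["name:th"] != name:
        warnings.append("name:th differs from name")
    for key in ("name:th-Latn", "name:en"):
        if key in tags and contains_thai(tags[key]):
            warnings.append(f"{key} contains Thai characters")
    return warnings


# The temple example from this post.
temple = {
    "amenity": "place_of_worship",
    "name": "วัดเสาหิน",
    "name:th": "วัดเสาหิน",
    "name:th-Latn": "Wat Sao Hin",
    "name:en": "Sao Hin Temple",
}
print(check_name_tags(temple))
```

These checks only catch mechanical slips, such as Thai text left in name:en; they say nothing about whether a transliteration or translation is actually right.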

Here’s a different example and the one that prompted me to write this post. I noticed the names of some power transmission lines in my neighborhood of Saraphi this morning. The Thai mapper didn’t add name:en to any of them and I was curious to see if the new AI bots available now on the Internet might provide helpful translations.

  • power=line
  • name=สายส่งลำพูน 2 - เชียงใหม่ 2
  • name:th-Latn=Sai Song Lamphun 2 - Chiang Mai 2
  • name:en=Lamphun 2 - Chiang Mai 2 Transmission Line

In this example, the th-Latn transliteration is no doubt correct but is probably meaningless to most English speakers. Google Gemini AI provided the English translation which I then used in a new name:en tag. The AI bots that have recently appeared are far from perfect but they will only get better and languages are one area where they’re pretty damn good already. I have used Gemini to write computer code in Python, a language I do not know (and now have no need to know). I told it what I wanted to do and it spit out a working script in a few seconds!

My original query was: Please translate the following into English: สายส่งลำพูน 2 - เชียงใหม่ 2

"Sure. The Thai phrase “สายส่งลำพูน 2 - เชียงใหม่ 2” can be translated into English as “Lamphun 2 - Chiang Mai 2 Transmission Line”.
The word “สายส่ง” (pronounced “sai-song”) means “transmission line” in Thai. The word “ลำพูน” (pronounced “lam-poon”) is the name of a province in Thailand. The word “เชียงใหม่” (pronounced “chee-ang-mai”) is the name of a city in Thailand.
Therefore, the literal translation of “สายส่งลำพูน 2 - เชียงใหม่ 2” is “Transmission Line from Lamphun 2 to Chiang Mai 2”. However, it is more natural to translate it as “Lamphun 2 - Chiang Mai 2 Transmission Line” in English. "

Great answer, don’t you agree?

But the other part of this post is about some extra information Gemini provided when I asked it to add some details.

Please translate the following into English: สายส่งลำพูน 2 - เชียงใหม่ 2 and provide all the details you have access to:

"สายส่งลำพูน 2 - เชียงใหม่ 2 (Lamphun 2 - Chiang Mai 2 Transmission Line) is a 230-kilovolt (kV) transmission line in Thailand. It is owned and operated by the Electricity Generating Authority of Thailand (EGAT). The line runs from Lamphun Province to Chiang Mai Province, a distance of approximately 100 kilometers.

The line was constructed in 2009 and energized in 2010. It is part of EGAT’s plan to strengthen the transmission grid in northern Thailand. The line helps to ensure a reliable supply of electricity to the region, which is home to a growing population and economy.
The line consists of two single-circuit towers. The towers are made of steel and concrete and are approximately 40 meters tall. The line is strung with conductors made of aluminum alloy.
The line is designed to transmit up to 1,200 megawatts (MW) of electricity. It is currently operating at a capacity of approximately 800 MW.
The line is an important part of Thailand’s electricity infrastructure. It helps to ensure a reliable supply of electricity to the northern region of the country.
Here are some additional details about the line:

Length: 100 kilometers
Voltage: 230 kV
Capacity: 1,200 MW
Construction: 2009
Energization: 2010
Owner: EGAT
Operator: EGAT "

I believe the translation is okay for us to use. However, when I asked whether any of the other information was protected by copyright, its answer wasn’t clear: some parts might be, but it didn’t know for sure which parts, nor could it identify any source(s) of the information.

So, I’m looking for comments and opinions about the use of AI as I’ve outlined it here.

Dave

Quick answer from my side:

In this thread: Choosing the right English Name for a POI

  • We covered the need to maintain a list of generic terms that must remain in name:en. Wat is actually one of them, like ‘Ko’ (island)
  • Google or ChatGPT will give you good hints for the English prefix/suffix terms, but should not be used for transliterating the names themselves, as they do not use the RTGS system (no problem here for those province names)
  • an online service exists to transliterate names to RTGS (again not relevant here)

Otherwise the translation you proposed looks good to me.


@Julien,
Thanks for pointing out that thread. I haven’t been paying close attention to this forum so it was news to me. I have no problem with retaining the use of the word “Wat” for names of temples (also Ban, Ko, Soi, etc.), but I think, and correct me if I’m wrong, the th-Latn name should be used in many other cases to tag the RTGS transliteration.

The words that emerge from the translation programs, whether online or not, don’t seem appropriate for the name:en tag, although the conversation in the thread you referred me to seems to suggest that they are.

As for the other example, the use of Google Translate AI on my phone has helped me interpret road names and informational signs along my bicycle routes. It’s so nice to be able to understand signs that were once totally incomprehensible to me.

As far as I understand, name:th-Latn should always contain the RTGS transliteration.
However, commonly used generic Thai terms (Wat, Soi, Ko, Ban) should also be kept in the name:en tag instead of their English translations (Temple, Street, Island, Village):

e.g. name:en=Wat Sao Hin (instead of name:en=Sao Hin Temple)
e.g. name:en=Ban Khun Chang Khian (instead of name:en=Khun Chang Khian Village)

The program from Chulalongkorn University only transliterates Thai using RTGS, so it is only appropriate for name:th-Latn, but it can also be used to transliterate the non-English keywords that go into name:en (see my process below).

I find Google Translate to be a helpful tool, and I do use it myself. However, it’s important to note that specific keywords may not always be transliterated accurately with RTGS, and they can even change over time.

e.g. “เกาะเสม็ด” was once translated as “Koh Samed,” but now it’s “Koh Samet,” but it should be “Ko Samet”.

e.g. “สวนสาธารณะหนองบวกหาด” was once translated as “Nong Buak Hard Public Park,” but it’s now “Nong Buak Hat Public Park,” which is correct.

When I need to translate a place name, I follow this process:

  • First, I check if there’s a Wikipedia entry and reuse it if available.
  • If it’s a Wat, Ban, Soi, Ko, or similar, I remove the Thai prefix and transliterate the place name with RTGS.
  • Otherwise, I use Google Translate to identify generic terms and their placement, then remove those terms from the Thai input and transliterate the rest.

This process can be cumbersome, and I try to avoid it whenever possible. Additionally, there’s the issue of road signs not always matching RTGS or commonly used English names.
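The prefix step of the process above can be sketched in Python. This is only an illustration under stated assumptions: the prefix table holds just the four terms named in this thread, the `romanize` argument stands in for whatever RTGS transliterator you use, and `lookup_romanize` is a hypothetical stub that returns spellings already quoted in this thread. The Wikipedia check and the Google Translate step are left to the mapper.

```python
from typing import Callable, Optional

# Thai generic prefixes and the romanized form kept in name:en.
# Only the four terms named in this thread; a real list would be longer.
GENERIC_PREFIXES = {
    "วัด": "Wat",
    "บ้าน": "Ban",
    "ซอย": "Soi",
    "เกาะ": "Ko",
}


def english_name(thai_name: str, romanize: Callable[[str], str]) -> Optional[str]:
    """Return a name:en candidate, or None if no known generic prefix matches.

    None means the name needs the manual route (Wikipedia, Google Translate).
    """
    for thai_prefix, latin_prefix in GENERIC_PREFIXES.items():
        if thai_name.startswith(thai_prefix):
            # Strip the Thai prefix and transliterate only the remainder.
            rest = thai_name[len(thai_prefix):].strip()
            return f"{latin_prefix} {romanize(rest)}"
    return None


# Hypothetical stand-in for an RTGS transliterator: a lookup table holding
# spellings taken from this thread, so the sketch runs without extra libraries.
SAMPLE_RTGS = {"เสาหิน": "Sao Hin", "เสม็ด": "Samet"}


def lookup_romanize(text: str) -> str:
    return SAMPLE_RTGS[text]


print(english_name("วัดเสาหิน", lookup_romanize))
print(english_name("เกาะเสม็ด", lookup_romanize))
```

Passing the transliterator in as a function keeps the prefix logic independent of any particular tool, so the same sketch works with an online service or a local library.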

Nevertheless, a decent translation is still better than none at all. So, if you start with Google Translate and someone improves it later, that’s perfectly fine. But if there’s already a translation out there, and Google offers a different one, just keep in mind the potential problems I mentioned earlier.

I should have written transliteration programs, not translation. I understand and have practiced for a long time the use of the output of those transliteration programs in the :th-Latn tags.

In addition to your observations here:

e.g. “เกาะเสม็ด” was once translated as “Koh Samed,” but now it’s “Koh Samet,” but it should be “Ko Samet”.

e.g. “สวนสาธารณะหนองบวกหาด” was once translated as “Nong Buak Hard Public Park,” but it’s now “Nong Buak Hat Public Park,” which is correct.

there has always been an issue with certain letters that appear last in Thai transliterations to English. Many signs have it one way or the other, sometimes both. Pa Daet Road (ชม. 4032), is also spelled Pa Daed. The Thai pronunciation is sort of a combination of both but without any plosive sound at the end.

When in doubt about how a name should be spelled or what form it should have, as when a sign disagrees with the RTGS spelling, I usually add an alt_name:en to cover both possibilities.
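As a rough sketch of that habit in Python: the helper below is hypothetical, and it assumes the RTGS spelling goes in name:en while the variant seen on the sign goes in alt_name:en. The full form "Pa Daed Road" is also an assumption for illustration.

```python
from typing import Dict, Optional


def english_name_tags(rtgs_spelling: str, sign_spelling: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build English name tags, adding alt_name:en
    only when the sign shows a spelling that differs from RTGS."""
    tags = {"name:en": rtgs_spelling}
    if sign_spelling and sign_spelling != rtgs_spelling:
        tags["alt_name:en"] = sign_spelling
    return tags


print(english_name_tags("Pa Daet Road", "Pa Daed Road"))
print(english_name_tags("Pa Daet Road"))
```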


The online transcription link from the Chulalongkorn University http://161.200.50.2/th2roman has been down for a long time. Is there an alternative?


This one worked well for me. It was also used in a library for automatic map transliteration:

Khun Wirote also responded quickly to a bug report I had.


Sounds interesting. Is there a GUI available to access this Toolkit?

It is a Python library, so unfortunately no GUI is included. There might be some examples around of adding a GUI, but I am not aware of any.

Many months ago, I explored the possibility of running image-to-text on street signs and then transliterating the result. The quality was not what I expected, so I did not follow it up further.

I’ve set up a small online service using the tltk library.

To use it, simply enter the Thai text in the form and submit. The results will be displayed below.

Here are some examples:

Note: The service is hosted on Google Cloud’s free tier, so it may be slow to respond initially.

@julcnx,

Wonderful. Thanks for setting that up.

I entered this Thai phrase, which I copied from a Facebook post about my young granddaughter (her grandmother is my wife, Nongnuch):
ผลอแป๊บเดียวเป็นสาวแล้วหรอคะพี่นุช สวยเหมือนคุณยาย

The result from the tltk implementation is as follows:

In just a moment, you’ll be a girl, right? Sister Nuch is as beautiful as Grandma.

Google’s Gemini AI bot came up with this:

Wow, [sister’s name], you’ve grown up so quickly! You’re as beautiful as your grandmother.

(This was followed by several paragraphs of useful explanatory text.)

Both translations are possibly correct but I can’t judge that because I don’t speak or understand Thai. Given that I know what the relationship between the granddaughter and my wife is, the Gemini interpretation makes more sense and is more accurate.

Thanks again for your good work, Julien. I’m glad someone rose to the challenge.