Flawless TM that can save your life

There is a popular maxim in management: you have to see both the forest and the trees. The same goes for Translation Memory (TM). In this article I will define what the forests and the trees are in a TM, and show how coding and artificial intelligence can help you on your journey through endless piles of translations.

 

The whole article is divided into two posts. This first post focuses on the general concepts. The second will focus on the actual coding and programming.

So, what does a TM consist of? A primary key, a source text, and a translation. In a TM, the primary key is called the context. If we could nail a perfect context, source, and translation, then boom! A bright future would lie ahead. In real life, rest assured, that never happens. So the burden falls on the translator to trim these flawed TM components into a spotless beauty.

Forests in TM

Let's say your client asks you to translate a line, "Got it." You immediately realize that you translated this phrase yesterday. So would it be okay to just copy-paste yesterday's translation? Well, it depends.

 

The line "Got it." could be used in many situations: acknowledging, approving, picking up an item, destroying an enemy, and so on. In some languages, copy-pasting the item-pickup "Got it." into the enemy-destroyed "Got it." can lead to a serious mistranslation.

 

To solve this, you need to specify the situation in which the phrase is used. This is called the context. So in this case:

  • Context: Picking up an item
  • Source: Got it.
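
Treating the context as part of the key can be sketched as a simple lookup. This is a minimal illustration, not a real CAT tool API: the `lookup` function, the in-memory dictionary, and the placeholder translations are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory TM: the key is (context, source), the value is the translation.
TranslationMemory = Dict[Tuple[str, str], str]


def lookup(tm: TranslationMemory, context: str, source: str) -> Optional[str]:
    """Return a stored translation only when BOTH context and source match exactly."""
    return tm.get((context, source))


# Placeholder translations; real entries would hold target-language text.
tm: TranslationMemory = {
    ("Picking up an item", "Got it."): "<translation for item pickup>",
    ("Destroying an enemy", "Got it."): "<translation for enemy destroyed>",
}

# Same source text, different context: each context gets its own translation.
print(lookup(tm, "Picking up an item", "Got it."))
# A context that was never translated returns None instead of a wrong match.
print(lookup(tm, "Acknowledging", "Got it."))
```

Keying on the pair rather than on the source alone is what keeps the two "Got it." lines from overwriting each other.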

After checking that it's the same context, you can finally copy-paste the previous translation safely. This only works if you have set up your contexts carefully. The one exception, where this approach breaks down, is the Narrative Audio-Visual (AV) Translation Memory.

 

Narrative AV TM

What's so special about Narrative AV TM? This kind of TM has both a narrative (storyline) side and an audio-visual (subtitle) side. Whenever you come across a client job with these characteristics, you had better be on full alert.

 

In video game translation, the most common example of Narrative AV is the Single Player (Campaign) mode. In Single Player scenes, conversations can take place in an almost infinite number of situations, depending on the location, the number of speakers, how far away the listeners are, the time of day, the items the characters are holding, their emotional mood, personality, ethnic background, education, and so on. Naturally, it is impossible to set up a perfect context for future copy-pasting.

 

Narrative AV translation is where a translator's true proficiency shines, and it is also heavily labor-intensive. Like a copywriter, you sometimes have to spend hours translating one simple sentence. But even in Narrative AV translation, a TM gives you a good well of resources to draw from, instead of starting everything from zero.

 

To sum up, forests in TM look like this.

 

Our goal is to set up a perfect TM so that you can reuse the translations you have done before. Now, let's move on to the trees in TM.

 

Trees in TM

The trees in a TM are its individual lines. Just like a real tree, a TM tree has a leaf, a stem, and a root, which correspond to the context, the source, and the translation respectively.

 

These are the fundamentals of a TM, and if anything happens to the integrity of these components, your TM is effectively doomed. Think of a hospital whose database has been compromised. You visit your doctor to get medicine for the flu, but what you get is a prescription for amnesia.

 

The most important thing in TM integrity: If it's the same job, context and source MUST be identical.

 

Sadly, clients often make errors not only in capitalization and spelling, but also in style, context rules, line breaks, whitespace, and symbols.

 

Context Errors

Let's first look at context errors. Say there is a line where a game character, "John", fires his assault rifle. The context would be:

 

voice_john_scene_attack_assault_rifle_01

But the next day the client delivers a similar context like this:

voice_scene_attack_assault_rifle_02_john

The client wrongly placed the character part at the end. So you have to take that trailing character part and move it to the position right after "voice".
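
The fix described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `normalize_context` function and the `KNOWN_CHARACTERS` set are hypothetical, and the canonical layout `voice_<character>_...` is inferred from the single example in this post.

```python
# Hypothetical list of character names that may appear in a context key.
KNOWN_CHARACTERS = {"john"}


def normalize_context(context: str) -> str:
    """Move a trailing character name to the slot right after "voice".

    Assumes the canonical layout is voice_<character>_<rest>; contexts that
    already follow it, or that do not start with "voice", are returned unchanged.
    """
    parts = context.split("_")
    misplaced = (
        len(parts) >= 3
        and parts[0] == "voice"
        and parts[-1] in KNOWN_CHARACTERS
        and parts[1] not in KNOWN_CHARACTERS
    )
    if misplaced:
        parts = [parts[0], parts[-1], *parts[1:-1]]
    return "_".join(parts)


print(normalize_context("voice_scene_attack_assault_rifle_02_john"))
# voice_john_scene_attack_assault_rifle_02
print(normalize_context("voice_john_scene_attack_assault_rifle_01"))
# voice_john_scene_attack_assault_rifle_01 (already canonical, left as-is)
```

A real job would need the full character list and whatever other layout rules the client actually follows.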

 

Source Errors

Then there are source errors.

It's john --- atticking -- with (beat) assault... (linebreak) *sigh* Rifle?! --

The client wrote John as john and Assault Rifle as assault Rifle, misspelled attacking, and overused dashes, punctuation marks, stage directions such as (beat), and exertion cues such as *sigh*. You need to fix this to:

It's John. Attacking. With Assault Rifle.
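
A rule-based cleanup for this one example might look like the sketch below. Every rule here is an assumption made for illustration (the `TYPOS` table, the `GLOSSARY` list, treating double dashes as sentence breaks, dropping ellipses, flattening "?!" to a period); a real job would need the rules of that client's own style.

```python
import re

# Hypothetical per-job resources: known typos and glossary terms with fixed casing.
TYPOS = {"atticking": "attacking"}
GLOSSARY = ["John", "Assault Rifle"]


def clean_source(text: str) -> str:
    # 1. Drop stage directions such as (beat) and exertion cues such as *sigh*.
    text = re.sub(r"\([^)]*\)|\*[^*]*\*", " ", text)
    # 2. Treat ellipses as pauses and remove them.
    text = re.sub(r"\.{2,}|…", " ", text)
    # 3. Turn runs of dashes and clusters of ?!. into a plain sentence break.
    text = re.sub(r"\s*(?:-{2,}|[?!.]+)\s*", ". ", text)
    # 4. Fix known typos.
    for wrong, right in TYPOS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)
    # 5. Collapse whitespace so multi-word glossary terms can match.
    text = re.sub(r"\s+", " ", text).strip()
    # 6. Enforce glossary casing.
    for term in GLOSSARY:
        text = re.sub(rf"\b{re.escape(term)}\b", term, text, flags=re.IGNORECASE)
    # 7. Capitalize the first letter of each sentence and rejoin.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return " ".join(s[0].upper() + s[1:] + "." for s in sentences)


raw = "It's john --- atticking -- with (beat) assault... (linebreak) *sigh* Rifle?! --"
print(clean_source(raw))
# It's John. Attacking. With Assault Rifle.
```

The order of the steps matters: whitespace is collapsed before the glossary pass so that "assault" and "Rifle" end up one space apart and can be matched as a single term.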

It is very important for a skilled programmer/translator to find the overarching rules of the translation task. Only then can you build your own AI, or even a deep-learning neural engine, to fix the errors your client made.

 

In my experience, without properly fixing context and source errors, you can copy-paste around 30% of your previous translations. If you properly fix either the context errors or the source errors, the percentage can go up to 50%. If you fix both, it can go as high as 100%, meaning that if you have a task of 100,000 words, you can copy-paste everything and return a translation of excellent quality.

AI and coding are essential in this process. You also need to understand the rules and principles in the client's work that even the client did not realize were there.
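
One way to see how fixes affect reuse on your own data is to count exact TM hits before and after normalization. The sketch below uses made-up toy data and toy fixers (`toy_fix_context`, `toy_fix_source`, and `reuse_rate` are all hypothetical names); its numbers only illustrate the mechanism and do not reproduce the percentages quoted above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fixer = Callable[[str], str]


def reuse_rate(
    tm: Dict[Tuple[str, str], str],
    job: List[Tuple[str, str]],
    fix_context: Optional[Fixer] = None,
    fix_source: Optional[Fixer] = None,
) -> float:
    """Fraction of job lines whose (context, source) pair is found in the TM."""
    fix_context = fix_context or (lambda s: s)
    fix_source = fix_source or (lambda s: s)
    if not job:
        return 0.0
    hits = sum((fix_context(c), fix_source(s)) in tm for c, s in job)
    return hits / len(job)


def toy_fix_context(context: str) -> str:
    # Toy rule: move a trailing "john" to the slot right after "voice".
    parts = context.split("_")
    if parts[0] == "voice" and parts[-1] == "john":
        parts = ["voice", "john", *parts[1:-1]]
    return "_".join(parts)


def toy_fix_source(source: str) -> str:
    # Toy rule: collapse stray whitespace.
    return " ".join(source.split())


tm = {
    ("voice_john_greet_01", "Got it."): "<translation 1>",
    ("voice_john_greet_02", "Move out."): "<translation 2>",
}
job = [
    ("voice_john_greet_01", "Got it."),     # clean line
    ("voice_greet_02_john", "Move out."),   # context error only
    ("voice_john_greet_01", "Got  it. "),   # source error only
    ("voice_greet_02_john", " Move out."),  # both errors
]

print(reuse_rate(tm, job))                                   # 0.25
print(reuse_rate(tm, job, fix_context=toy_fix_context))      # 0.5
print(reuse_rate(tm, job, toy_fix_context, toy_fix_source))  # 1.0
```

Lines with both kinds of error only become reusable when both fixers are applied, which is why fixing one side alone leaves part of the TM out of reach.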

 

In the next article, I'll explain the ComOBJ method, Safearray, and the Regex approach that make all of this possible.