Managing Identifiers for Translations

Issue #2 – Pros and cons of three different approaches

Jul 10, 2023

In this issue we look at three methods for managing the identifiers associated with translations, along with their pros and cons. And did you know that some people don’t count their age from their actual date of birth? Some of us are even 500 years older! More on that later.

a bunch of keys sitting on top of a table — Photo by rc.xyz NFT gallery on Unsplash

Key Management

I have found that the way identifiers (or keys) for translations are handled is a major upfront decision in software localization projects, because it is hard to switch to a different approach for managing keys later on. So let’s look at three popular approaches and evaluate their impact.

1)

The first approach is to use the source text as the key. We see this in formats that have been popular since the 1990s, like gettext’s PO files.

Example: “Hello world!” → “Bonjour le monde!”.

With this approach it is easy to see the translation together with the source text, since the values are stored side by side in the file. Though this obviously means a lot of repetition of the source text in files, and if that text is long, files become large.

Another benefit of this approach is that within source code, the source text acts as the lookup key for retrieving translations. Plus, if there is no translation provided, the code can easily fall back to the hardcoded source text.
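To make the lookup and fallback concrete, here is a minimal sketch in Python. The catalog is a plain dictionary and the names FRENCH and translate are made up for illustration; real libraries such as gettext add plural handling and file loading on top of the same idea.

```python
# Hypothetical in-memory catalog: the source text itself is the lookup key.
FRENCH = {
    "Hello world!": "Bonjour le monde!",
}


def translate(source_text: str, catalog: dict[str, str]) -> str:
    """Return the translation, or the source text when none is provided."""
    return catalog.get(source_text, source_text)


print(translate("Hello world!", FRENCH))  # Bonjour le monde!
print(translate("Goodbye!", FRENCH))      # Goodbye! (falls back to the source text)
```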

However, trouble brews when you need to update the source text, even if you are just adding a period or removing a space. When you alter a key, you need to sync that change across all existing and future translation files. And if the translators work asynchronously, merging their work afterwards could mean losing translations if you align them carelessly, because some translations could still point to the outdated key.
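A small check can surface this kind of drift before translations get lost. The sketch below assumes you already have the set of source texts used in code and a translation catalog keyed by source text; the function name find_drift and the sample data are hypothetical.

```python
def find_drift(source_texts: set[str], catalog: dict[str, str]) -> dict[str, list[str]]:
    """Compare the texts used in code with the keys of one translation file."""
    translated = set(catalog)
    return {
        # Used in code, but no translation points at this exact text.
        "missing": sorted(source_texts - translated),
        # Translated, but the key no longer matches any text in code.
        "orphaned": sorted(translated - source_texts),
    }


# The code gained an exclamation mark; the translation file still has the old key.
report = find_drift({"Hello world!"}, {"Hello world": "Bonjour le monde"})
print(report)
```

An orphaned entry that closely resembles a missing one is usually a renamed key, so it is worth reviewing before deleting anything.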

At first glance this looks like a bad approach. It is, however, suitable for small projects where you can easily oversee all translations. A neat little open source library for Node.js projects embodies this spirit: https://github.com/yargs/y18n.

2)

The second approach is to have a user-specified key for each source text. The translations refer to this key. We see this strategy with frameworks like i18next, FormatJS and .NET’s RESX files.

Example: “Hello world!” → [hello_world] → “Bonjour le monde!”.

The benefit of having this intermediate key is that you can give it meaning, for example by calling it [welcome-heading].

Additionally, keys can be categorized by giving them a specific prefix, e.g., [pages.welcome.heading]. This gives context to the translator, because related keys that share a prefix are grouped together.

Grouping also makes it possible to split translations along the same lines as your code, so that unused translations do not have to be loaded.
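As a rough illustration of prefix grouping, the sketch below picks only the keys that belong to one group. The catalog contents and the function name select_group are made up; frameworks usually offer their own namespace mechanism for this.

```python
# Hypothetical catalog with dotted prefixes acting as groups.
CATALOG = {
    "pages.welcome.heading": "Bonjour le monde!",
    "pages.welcome.intro": "Ravi de vous voir.",
    "pages.checkout.title": "Paiement",
}


def select_group(catalog: dict[str, str], prefix: str) -> dict[str, str]:
    """Return only the entries whose key starts with the given dotted prefix."""
    dotted = prefix.rstrip(".") + "."
    return {key: text for key, text in catalog.items() if key.startswith(dotted)}


# Only the welcome page translations need to be shipped with the welcome page.
welcome = select_group(CATALOG, "pages.welcome")
print(sorted(welcome))
```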

Having the ability to differentiate with the key can also help resolve ambiguity. For example [military.order] → “Order” vs [commercial.order] → “Order” would indicate the difference in meaning of the same words.

Duplication is a possible drawback, because different keys could point to the same source text, especially if people specify the keys independently.

Should the implementation hardcode the source text together with the key in source code, then you could end up with key collisions, where different source texts are associated with the same key. This is something to be aware of and scan for in your codebase.
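A collision scan can be quite small once you have the (key, source text) pairs extracted from the codebase. How you extract them depends on your framework, so the sketch below simply assumes a list of pairs; the name find_key_collisions is hypothetical.

```python
from collections import defaultdict


def find_key_collisions(usages: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return the keys that are paired with more than one distinct source text."""
    texts_by_key: dict[str, set[str]] = defaultdict(set)
    for key, source_text in usages:
        texts_by_key[key].add(source_text)
    return {
        key: sorted(texts)
        for key, texts in texts_by_key.items()
        if len(texts) > 1
    }


usages = [
    ("welcome.heading", "Hello world!"),
    ("welcome.heading", "Hello there!"),  # same key, different text: a collision
    ("checkout.title", "Checkout"),
    ("checkout.title", "Checkout"),       # same key, same text: fine
]
collisions = find_key_collisions(usages)
print(collisions)
```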

When the source text is in a language other than English, the de facto language for source code, the keys can also represent the source text in English, making the code easier for developers to understand.

Renaming keys could still cause the same issues as using the source text for the keys. Even worse, if you update the source text over time but keep the same key, you also need to keep track of whether the translations have been updated accordingly. Tooling could detect these problems though.
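One way tooling could detect a changed source text behind an unchanged key is to store a fingerprint of the source text next to each translation. This is only a sketch under that assumption; the entry layout with "text" and "source_hash" fields and the function names are invented for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable fingerprint of a source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(sources: dict[str, str], translations: dict[str, dict[str, str]]) -> list[str]:
    """Return keys whose translation was made against an older source text."""
    stale = []
    for key, entry in translations.items():
        current = sources.get(key)
        if current is not None and entry["source_hash"] != fingerprint(current):
            stale.append(key)
    return sorted(stale)


sources = {"welcome.heading": "Hello, world!"}  # the text was edited after translation
translations = {
    "welcome.heading": {
        "text": "Bonjour le monde!",
        "source_hash": fingerprint("Hello world!"),  # recorded at translation time
    },
}
stale_keys = find_stale(sources, translations)
print(stale_keys)
```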

3)

The third approach is auto-generated keys. This is sort of a combination of the previous two approaches. In this case the source text is used to generate a cryptographic hash, which is used as the key. The Angular web framework uses this approach.

Example: “Hello world!” → [86fb269d190d2c85f6e0468ceca42a20] → “Bonjour le monde!”.

Because this approach uses the source text as input for the key, it shares the issues of the source-text-as-key approach, like problems with renaming. The algorithm could ignore whitespace and special characters to soften this, but that could cause ambiguity.
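The sketch below shows the general idea of deriving a key from the source text, with an optional whitespace normalization step. MD5 and the function name make_key are assumptions chosen for illustration, and this is not Angular's actual algorithm.

```python
import hashlib


def make_key(source_text: str, normalize_whitespace: bool = False) -> str:
    """Derive a translation key from the source text (illustrative only)."""
    if normalize_whitespace:
        # Collapse runs of whitespace so small spacing edits keep the same key.
        source_text = " ".join(source_text.split())
    return hashlib.md5(source_text.encode("utf-8")).hexdigest()


# Any change to the text, even a single character, produces a different key.
print(make_key("Hello world!") == make_key("Hello world"))  # False

# With normalization, spacing differences no longer change the key.
print(make_key("Hello   world!", True) == make_key("Hello world!", True))  # True
```

The second comparison also hints at the ambiguity: once texts are normalized, two strings that differ only in spacing can no longer have separate translations.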

Auto-generated keys do help with reducing the file size when source texts are large.

Angular’s implementation of this approach abstracts the key generation away from the developer and translator. But because the keys are baked into the generated translation files, you could end up tied to this approach over time.

In conclusion, renaming keys or source text remains problematic regardless of the chosen approach. However, user-specified keys do offer some additional benefits like grouping, avoiding ambiguity, implicit usage indication and code splitting, if you are willing to handle the side effects. Ultimately, the key management approach you choose should depend on the specific needs of the project.

man in gold wedding band — Photo by Eduardo Barrios on Unsplash

Did You Know?

So some of us being over 500 years older, what’s up with that? Well, that cheeky age difference comes from the various calendar systems used around the world. For instance, Thailand uses a Buddhist Era calendar system, which is roughly 543 years ahead of the Gregorian calendar.
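For the curious, the year conversion is simple arithmetic. The sketch below assumes a fixed offset of 543 years and ignores historical differences in when the year started; the function name is made up.

```python
BUDDHIST_ERA_OFFSET = 543  # years the Buddhist Era is ahead of the Gregorian calendar


def to_buddhist_era(gregorian_year: int) -> int:
    """Convert a Gregorian year to a Buddhist Era year using a fixed offset."""
    return gregorian_year + BUDDHIST_ERA_OFFSET


print(to_buddhist_era(2023))  # 2566
```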

And talking about birthdays, in some places in East Asia (including China, Japan, and Korea), age is not counted as the number of years since birth. Instead, it starts at one year old and one year is added on each New Year's Day (traditionally the Lunar New Year, and January 1 in Korea's version). It’s called East Asian age reckoning.

This means that someone born on December 31st could be considered two years old on January 1st, because everyone's age increases by one year on New Year's Day, regardless of their actual birthday. This system is still used in some formal and informal settings, particularly among older generations, although South Korea officially stopped using the older system on June 28, 2023.
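Here is the December 31st example worked out in a small sketch. It assumes the year turns over on January 1, as in that example, rather than on the Lunar New Year; both function names are hypothetical.

```python
from datetime import date


def reckoned_age(birth: date, today: date) -> int:
    """Age that starts at one and increases every January 1."""
    return today.year - birth.year + 1


def international_age(birth: date, today: date) -> int:
    """Age counted in completed years since the date of birth."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


born = date(2022, 12, 31)
next_day = date(2023, 1, 1)
print(reckoned_age(born, next_day))       # 2
print(international_age(born, next_day))  # 0
```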

Thanks for reading!

If you liked this newsletter or know someone else who might, consider subscribing at www.l10n.email to get new issues into your mailbox or view the archives.
