gettext 1.0.8
Internationalization compatible with the GNU gettext utilities.
This package provides sub packages which can be used individually:
gettext:merge - Merge existing translations with a new template.
gettext:po2mo - Batch execution of gettext msgfmt.
gettext:todo - Find unmarked string literals.
Gettext
The GNU gettext utilities provide a well-established solution for the internationalization of software. They allow users to switch between natural languages without switching executables. Many commercial translation offices can work with GNU gettext message catalogs (Portable Object files, PO), and various editors exist that help with the translation process. The translation process and programming process can happen asynchronously and without knowledge of each other. New translations can be added without recompilation.
The use of GNU gettext in D has been enabled by the mofile package, which this Gettext package builds on. If you used mofile directly, you would depend on the GNU xgettext utility for the task of string extraction, hoping it can parse D code as if it were C code. You would also be dealing with a number of limitations that are native to GNU gettext.
This Gettext package removes the need for an external parser and provides a more powerful interface than GNU gettext itself. It combines convenient and reliable string extraction (enabled by D's unique language features) with comprehensive Dub integration, while leveraging a well-established ecosystem for translation into other natural languages.
Contents
- Features
- Installation
- Usage
- Example
- Impact on footprint and performance
- Limitations
- Credits
- Todo
Features
- Concise translation markers that can be aliased to your preference.
- All marked strings that are seen by the compiler are extracted automatically.
- All (current and future) D string literal formats are supported.
- Static initializers of fields, constants, immutables, manifest constants and anonymous enums can be marked as translatable (a D specialty).
- Translatable strings may be part of generated and mixed in code (another D specialty).
- Concatenations of translatable strings, untranslated strings and single chars are supported, even in initializers.
- Arrays of translatable strings are supported, also when statically initialized.
- Plural forms are language dependent, and play nice with format strings.
- Multiple identical strings are translated once, unless they are given different contexts.
- Notes to the translator can be attached to individual translatable strings.
- Code occurrences of strings are communicated to the translator.
- Available languages are discovered and selected at run-time.
- Platform independent, not linked with C libraries.
- Automated generation of the PO template.
- Automated merging into existing translations (requires GNU gettext utilities).
- Automated generation of Machine Object files (MO) (requires GNU gettext utilities).
- Includes utility for listing unmarked strings in the project.
Installation
Dub configuration
Add the following to your dub.json:
<details open>
<summary>dub.json</summary>
"targetType": "executable",
"dependencies": {
    "gettext": "~>1"
},
"configurations": [
    {
        "name": "default"
    },
    {
        "name": "i18n",
        "preGenerateCommands": [
            "dub run --config=xgettext",
            "dub run gettext:merge -- --popath=po --backup=none",
            "dub run gettext:po2mo -- --popath=po --mopath=mo"
        ],
        "copyFiles": [
            "mo"
        ]
    },
    {
        "name": "xgettext",
        "targetPath": ".xgettext",
        "versions": [ "xgettext" ],
        "subConfigurations": {
            "gettext": "xgettext"
        }
    }
]
</details>
or its equivalent dub.sdl:
<details>
<summary>dub.sdl</summary>
dependency "gettext" version="~>1"
configuration "default" {
    targetType "executable"
}
configuration "i18n" {
    targetType "executable"
    copyFiles "mo"
    preGenerateCommands \
        "dub run --config=xgettext" \
        "dub run gettext:merge -- --popath=po --backup=none" \
        "dub run gettext:po2mo -- --popath=po --mopath=mo"
}
configuration "xgettext" {
    targetType "executable"
    targetPath ".xgettext"
    subConfiguration "gettext" "xgettext"
    versions "xgettext"
}
</details>
This may seem like a lot of boilerplate, but it automates many steps without taking away your control over them. We'll discuss these further below.
Module import
import gettext;
main() function
Insert the following line at the top of your main function:
mixin(gettext.main);
Ignore generated files
The PO template and MO files are generated, and need not be kept under version control. The executable in the .xgettext folder is an artefact of the string extraction process. If you use Git, add these lines to .gitignore:
.xgettext
*.pot
*.mo
Usage
Marking strings
Prepend tr! to every string literal that needs to be translated. For instance:
writeln("This string will remain untranslated.");
writeln(tr!"This string is to be translated");
Note that you may rename tr to whatever you want:
import gettext : _ = tr;
writeln(_!"This string is to be translated");
No additional changes to any configurations are needed to make this work.
Plural forms
Sentences whose wording depends on a number should supply both the singular and the plural form, together with the number, like this:
// Before:
writefln("%d green bottle(s) hanging on the wall", n);
// After:
writeln(tr!("one green bottle hanging on the wall",
"%d green bottles hanging on the wall")(n));
Note that the format specifier (%d, or %s, etc.) is optional in the singular form.
Many languages have not just two forms like the English language does, and translations in those languages can supply all the forms that the particular language requires. This is handled by the translator, and is demonstrated in the example below.
Marking format strings
Translatable strings can be format strings, used with std.format, std.stdio.writefln, etc. These format strings support plural forms, but the argument that determines the form must be supplied to tr and not to format. The corresponding format specifier will not be seen by format, as tr will already have replaced it with a string. Example:
format(tr!("Welcome %s, you may make a wish",
"Welcome %s, you may make %d wishes")(n), name);
The format specifier that selects the form is the last specifier in the format string (here %d). In many sentences, however, the specifier that should select the form cannot be the last. In these cases, format specifiers must be given a position argument, where the highest position determines the form:
foreach (i, where; [tr!"hand", tr!"bush"])
format(tr!("One bird in the %1$s", "%2$d birds in the %1$s")(i + 1), where);
Again, the specifier with the highest position argument will never be seen by format. On a side note, some translations may need a reordering of words, so translators may need to use position arguments in their translated format strings anyway.
Note: Specifiers with and without a position argument must not be mixed.
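Position arguments can be tried in isolation with plain std.format, without this package. The sketch below only illustrates the reordering that position arguments allow. The strings and the reordered variant are made up for the occasion, and it does not show how tr consumes the specifier with the highest position.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    // Source word order: the place is argument 1, the count is argument 2.
    writeln(format("%2$d birds in the %1$s", "bush", 2));

    // A hypothetical translation that needs a different word order can
    // reorder the specifiers without any change to the argument list.
    writeln(format("In the %1$s there are %2$d birds", "bush", 2));
}
```

Because the arguments are addressed by position, the calling code stays the same regardless of the word order a translator chooses.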
Concatenations
Translators will be able to produce the best translations if they get to work with full sentences, like
auto message = format(tr!`Could not open the file "%s" for reading.`, file);
However, in support of legacy code, concatenations of strings do work:
auto message = tr!`Could not open the file "` ~ file ~ tr!`" for reading.`;
Passing attributes
Optionally, two kinds of attributes can be passed to tr, in the form of Comment and Context arguments. These are for passing notes to the translator and for disambiguating identical sentences with different meanings.
Passing notes to the translator
Sometimes a sentence can be interpreted to mean different things, and then it is important to be able to clarify things for the translator. Here is an example of how to do this:
auto name = tr!("Walter Bright", Comment("Proper name. Phonetically: ˈwɔltər braɪt"));
The GNU gettext manual has a section about the translation of proper names.
Disambiguate identical sentences
Multiple occurrences of the same sentence are combined into one translation by default. In some cases, that may not work well. Some languages, for example, may need to translate identical menu items in different menus differently. These can be disambiguated by adding a context like so:
auto labelOpenFile = tr!("Open", Context("Menu|File"));
auto labelOpenPrinter = tr!("Open", Context("Menu|File|Printer"));
Contexts and comments can be combined:
auto message1 = tr!("Review the draft.", Context("document"));
auto message2 = tr!("Review the draft.", Context("nautical"),
Comment(`Nautical term! "Draft" = how deep the bottom ` ~
`of the ship is below the water level.`));
They work on plural forms too:
writeln(tr!("One license.", "%d licenses.", Context("software"),
Comment("Notice to translator."))(n));
writeln(tr!("One license.", "%d licenses.", Context("driver's"))(n));
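How a context keeps identical strings apart can be pictured with a plain associative array. GNU gettext message catalogs store a message that has a context under a key formed by the context, an EOT byte (\x04) and the msgid. The sketch below is a stand-in for illustration only: the catalog contents and the helper names catalogKey and lookup are hypothetical and are not part of this package.

```d
import std.stdio : writeln;

// Key under which a message is stored: "context\x04msgid", or just the
// msgid when there is no context.
string catalogKey(string msgid, string context = null)
{
    return context.length ? context ~ "\x04" ~ msgid : msgid;
}

// Returns the translation, or the untranslated msgid when none is found.
string lookup(string[string] catalog, string msgid, string context = null)
{
    if (auto translation = catalogKey(msgid, context) in catalog)
        return *translation;
    return msgid;
}

void main()
{
    // Hypothetical catalog contents; the translations are placeholders.
    string[string] catalog = [
        catalogKey("Open", "Menu|File"): "<file: Open>",
        catalogKey("Open", "Menu|File|Printer"): "<printer: Open>",
    ];

    writeln(lookup(catalog, "Open", "Menu|File"));
    writeln(lookup(catalog, "Open", "Menu|File|Printer"));
    writeln(lookup(catalog, "Open")); // no entry without a context: falls back to the msgid
}
```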
Selecting a translation
Use the following functions to discover translation tables, get the language code for a table and activate a translation:
string[] availableLanguages(string moPath = null)
string languageCode() @safe
string languageCode(string moFile) @safe
void selectLanguage(string moFile) @safe
Note that any translation that happens before a language is selected results in the value of the hard-coded string.
Finding unmarked strings
To get an overview of all string literals in your project that are not marked as translatable, execute the following in your project root folder:
dub run gettext:todo -q
This prints a list of strings with their source file names and line numbers.
Fixing compilation errors
An attempt to translate a static string initializer will cause a compilation error, because the language is only selected at run-time. For example:
const string statically_initialized = tr!"Compile-time translation?";
will produce an error like this:
d:\SARC\gettext\source\gettext.d(285,20): Error: static variable `currentLanguage` cannot be read at compile time
source\mod1.d(7,24): called from here: `TranslatableString("Compile-time translation?").gettext()`
Unless you're initializing a mutable static variable, the solution is to remove the explicit string type and let the type be inferred:
const statically_initialized = tr!"Compile-time translation!";
The correct translation will then be retrieved at the places where this constant is used, at run-time.
The way this works is that the type of the constant is inferred as TranslatableString, which is a callable struct defined by this package. Whenever an instance of this struct is evaluated, the value of the translation is retrieved.
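The principle of deferring the lookup until evaluation can be sketched without this package. In the following illustration, LazyString and activeTable are hypothetical stand-ins. It is not the implementation of TranslatableString, only a minimal model of a value that is initialized at compile time and translated when it is evaluated.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the translation table of the selected language.
string[string] activeTable;

// Simplified stand-in for a lazily translated string: it stores only the
// untranslated text and looks up the translation each time it is evaluated.
struct LazyString
{
    string msgid;

    string toString() const
    {
        if (auto translation = msgid in activeTable)
            return *translation;
        return msgid; // fall back to the source string
    }
}

// Initialized at compile time: no lookup happens here.
const greeting = LazyString("Good morning");

void main()
{
    writeln(greeting); // no table active yet: the source string is printed
    activeTable = ["Good morning": "Goedemorgen"];
    writeln(greeting); // the entry from activeTable is printed
}
```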
But there are places where you wouldn't want to change the type away from string, like the initializer of a mutable static variable or an aggregate member. In these cases there is no other way than to defer the assignment to run time, after the language has been selected.
Added steps to the build process
Since the first configuration in your dub.json is empty (the "default" configuration), nothing special happens when you just do
dub run
So your normal code - compile - run - test cycle is not slowed down by any additional steps.
But when you do
dub run --config=i18n
the preGenerateCommands and copyFiles sections of the i18n configuration kick into action, which causes the following tasks to be performed:
- Translatable strings are extracted from the sources into a PO template.
- Translations in any existing PO files are updated according to the new template.
- PO files are converted into binary MO files.
- MO files are copied to the target directory.
We'll discuss these in a little more detail below.
Creating/updating the PO template automatically
In other languages, string extraction into a .pot file is done by invoking the xgettext command line tool from the GNU gettext utilities. Because xgettext does not know about all the string literal syntaxes in D, and cannot scan any generated code that may be mixed in, we employ D itself to perform this task.
This is how it works: The dub run --config=xgettext line in the preGenerateCommands section of your Dub configuration compiles your project into an alternative targetPath, runs it, and executes the code that you have mixed in at the top of your main() function. That code makes smart use of D language features (see credits) to collect all strings that are to be translated, together with information from your Dub configuration and the latest Git tag. The rest of your main() is not executed in this configuration, but strings are still extracted. In any other configuration the mixin is actually empty.
By default this creates (or overwrites) the PO template in the po folder of your project. This can be changed by using options. To see which options are accepted, run the command with the --help option:
dub run --config=xgettext -- --help
Example
The teohdemo test contained in this package produces the following teohdemo.pot:
# PO Template for teohdemo.
# Copyright © 2022, SARC B.V.
# This file is distributed under the BSL-01 license.
# Bastiaan Veelo, 2022.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: v1.0.4\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-07-09T20:52:52.4027136Z\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: source/app.d:10(main)
#, c-format
msgid "Selected language: %s"
msgstr ""
#: source/mod1.d:13(fun1) source/mod2.d:15(fun3)
msgid "Identical strings share their translation!"
msgstr ""
#: source/mod1.d:7
#, c-format
msgid "Hello! My name is %s."
msgstr ""
#: source/mod2.d:13(fun3)
msgid "Never used, but nevertheless translated!"
msgstr ""
#: source/mod2.d:8(fun2)
#, c-format
msgid "I'm counting one apple."
msgid_plural "I'm counting %d apples."
msgstr[0] ""
msgstr[1] ""
Updating existing translations automatically
The "dub run gettext:merge -- --popath=po" pre-generate command invokes the merge script that is included as a subpackage. This script runs the msgmerge utility from GNU gettext on the PO files that it finds. When needed, the path to msgmerge can be specified with the --gettextpath option. Any additional options are passed on to msgmerge directly; see its documentation. For example, you can use the --backup=numbered option to keep backups of original translations.
Note that if translatable strings were changed in the source, or new ones were added, the PO file is now incomplete. This is detected by the script, which then prints a warning. Changed strings are marked as #, fuzzy in the PO file, which can be picked up by editors as needing work. If a lookup in an outdated MO file does not succeed, the application will show the string as it occurs in the source.
Converting to binary form automatically
Similar to the previous step, the "dub run gettext:po2mo -- --popath=po --mopath=mo" pre-generate command invokes the po2mo subpackage, which runs the msgfmt utility from GNU gettext. This converts all PO files into MO files in the mo folder. This folder is then copied to the target directory for inclusion in the distribution of your package. Any additional options are passed on to msgfmt directly; see its documentation.
Adding translations
Each natural language that is going to be supported requires a .po file, which is derived from the generated .pot template file. This .po file is then edited to fill in the stubs with the correct translations.
There are various tools to do this, ranging from dedicated stand-alone editors and editor plugins or modes to web applications and command line utilities.
Currently my personal favourite is Poedit. You open the template, select the target language and start translating with real-time suggestions from various online translation engines. Or you let the AI give it its best effort and translate all messages at once, before reviewing the problematic ones (requires subscription). It supports marking translations that need work and adding notes.
Updating translations
Any translations that have fallen behind the template will need to be updated by a translator. To detect any such translations, you can scan for warnings in the output of this command:
dub run -q gettext:merge -- --popath=po
Warnings will also show if GNU gettext detected what it thinks is a mistake. Sadly it sometimes gets it wrong: Weekdays, for example, are capitalized in English but not in many other languages. If a translation string only consists of one word, a weekday, it guesses that it is the start of a sentence and will complain if the translation does not start with a capital letter. Therefore, translatable strings should be full sentences if at all possible.
PO file editors will typically allow translators to quickly jump between strings that need their attention.
After a PO file has been edited, MO files must be regenerated with this command:
dub run gettext:po2mo -- --popath=po --mopath=mo
Of course you can also simply rerun
dub run --config=i18n
to execute both of the above commands in succession.
Example
These are some runs of the included teohdemo test:
d:\SARC\gettext\tests\teohdemo>dub run -q
Please select a language:
[0] default
[1] en_GB
[2] nl_NL
[3] uk_UA
1
Hello! My name is Joe.
I'm counting one apple.
Hello! My name is Schmoe.
I'm counting 3 apples.
Hello! My name is Jane.
I'm counting 5 apples.
Hello! My name is Doe.
I'm counting 7 apples.
d:\SARC\gettext\tests\teohdemo>dub run -q
Please select a language:
[0] default
[1] en_GB
[2] nl_NL
[3] uk_UA
3
Привіт! Мене звати Joe.
Я рахую 1 яблуко.
Привіт! Мене звати Schmoe.
Я рахую 3 яблука.
Привіт! Мене звати Jane.
Я рахую 5 яблук.
Привіт! Мене звати Doe.
Я рахую 7 яблук.
Notice how the translation of "apple" in the last run takes three different endings, depending on the number of apples.
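The choice between the three endings follows a plural rule for the language. In a real catalog that rule is stated in the Plural-Forms entry of the PO file header and evaluated during lookup. The sketch below writes the commonly used two-form (English) and three-form (Ukrainian) rules out by hand as plain D functions. The function names are hypothetical, and the three word forms are taken from the output above.

```d
import std.format : format;
import std.stdio : writeln;

// Two forms: index 0 for exactly one, index 1 otherwise.
size_t pluralIndexEnglish(ulong n)
{
    return n != 1 ? 1 : 0;
}

// Three forms, selected on the last one or two digits of n.
size_t pluralIndexUkrainian(ulong n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}

void main()
{
    immutable forms = ["яблуко", "яблука", "яблук"];
    foreach (n; [1, 3, 5, 7])
        writeln(format("Я рахую %d %s.", n, forms[pluralIndexUkrainian(n)]));
}
```

Run as is, the loop reproduces the four counting lines of the Ukrainian run above.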
Impact on footprint and performance
The implementation of Gettext keeps generated code to a minimum. Although the tr template is instantiated many times with unique parameters, it does not instantiate a new function each time. All that is left of a tr instantiation after compilation are the references to the strings that were passed in.
The discovery of translatable strings happens at compile time in the xgettext configuration, and the generation of the PO template happens during execution of the result of that compilation. This process takes about as much time as a regular compilation of your project.
There is a run-time cost to the lookup of strings in the MO file. Currently, mofile reads the entire file into memory and does a binary search for the untranslated string to find the translated string. Should the cost of this lookup become noticeable, mofile could easily be modified to cache the search with std.functional.memoize. Even memoizing a small number of lookups could have a big impact on the evaluations in an event loop.
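What such caching could look like is sketched below on a stand-in table. This is not mofile code: the Entry type, the table contents, the lookup function and the lookups counter are all hypothetical, and only std.functional.memoize is a real library facility.

```d
import std.functional : memoize;
import std.stdio : writeln;

struct Entry
{
    string msgid;
    string msgstr;
}

// Hypothetical table, sorted by msgid so that it can be binary searched.
immutable Entry[] table = [
    Entry("Cancel", "Annuleren"),
    Entry("Open", "Openen"),
    Entry("Save", "Opslaan"),
];

size_t lookups; // counts how often the search actually runs

// Binary search for msgid; returns msgid itself when there is no entry.
string lookup(string msgid)
{
    ++lookups;
    size_t lo = 0, hi = table.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (table[mid].msgid < msgid)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < table.length && table[lo].msgid == msgid)
        return table[lo].msgstr;
    return msgid;
}

// Results are cached per argument; memoize!(lookup, 64) would bound the cache.
alias cachedLookup = memoize!lookup;

void main()
{
    foreach (i; 0 .. 1000)
        cachedLookup("Open"); // the search runs on the first call only
    writeln(cachedLookup("Open"), " after ", lookups, " search(es)");
}
```

A bounded cache keeps memory use predictable when the set of looked-up strings is large, at the cost of occasionally repeating a search.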
Limitations
Wide strings
Attempts to translate a wstring or dstring will result in a compilation error:
auto w = tr!"Hello"w; // Error: template `gettext.tr` does not match any template declaration
It would be pointless for this package to try and support all string widths. After all, the Hello literal above is assembled as an array of UTF-8 chars, which is then converted to wstring. GNU gettext works internally with UTF-8, so the wstring would need to be converted from UTF-16 back to UTF-8 and, after translation, to UTF-16 again before it is returned.
This limitation is easily dealt with by converting the translated string after lookup:
auto w = tr!"Hello".to!wstring;
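The conversion itself is ordinary transcoding with std.conv.to and can be tried without this package. In the sketch below a plain string literal stands in for a translated string; the text is made up.

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    string utf8 = "Grüße";           // stand-in for a translated string
    wstring utf16 = utf8.to!wstring; // transcodes UTF-8 to UTF-16

    writeln(utf8.length);  // number of UTF-8 code units
    writeln(utf16.length); // number of UTF-16 code units

    // Converting back yields the original string.
    assert(utf16.to!string == utf8);
}
```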
Forced string evaluation
In some cases it may be necessary to forcefully evaluate a translatable string as a string instead of a TranslatableString instance:
static const tr_and_tr = tr!"One " ~ tr!"sentence.";
assert (tr_and_tr.toString == tr!"One sentence.".toString); // Fails without `.toString`.
Justified Strings
Format strings accept a width argument so that
"hi".format!"%10s"; // "        hi";
produces a string of width 10 in which the contents are right justified. However, passing a translatable string directly will not work as intended:
tr!"hi".format!"%10s"; // "hi";
Justification can be made to work by forcing translation of the translatable string before feeding it into format, like so:
tr!"hi".toString.format!"%10s"; // "        hi";
But since std.format is known to be heavy on compile times, it is probably better to use std.string.rightJustify instead, with either of these two alternatives:
tr!"hi".rightJustify!string(10); // "        hi";
tr!"hi".toString.rightJustify(10); // "        hi";
Note that using rightJustify directly without explicit !string instantiation will not compile due to the isSomeString template requirement of rightJustify.
Named enums
Members of named enums need forced string evaluation, otherwise they resolve to the member identifier name instead:
enum E {member = tr!"translation"}
writeln(E.member); // "member"
writeln(E.member.toString); // "translation"
In contrast, anonymous enums and manifest constants do not require this treatment:
enum {member = tr!"translation"}
writeln(member); // "translation"
Credits
This package was sponsored by SARC B.V. The idea for automatic string extraction came from H.S. Teoh [1], [2], with optimizations by Steven Schveighoffer [3]. Reading of MO files was implemented by Roman Chistokhodov [4].
TODO
Investigate the merit of:
- Domains and Library support.
- Default language selection dependent on system Locale.
- Using Compendia.
Repository: veelo/gettext
License: BSL-1.0
Copyright © 2022, SARC B.V.
Sub packages: gettext:merge, gettext:po2mo, gettext:todo
Dependencies: mofile
Versions: 1.0.8 (2023-Sep-01), 1.0.7 (2023-Aug-27), 1.0.6 (2023-Aug-25), 1.0.5 (2023-Jul-14), 1.0.4 (2022-Aug-26)