Caution: You are browsing the legacy symfony 1.x part of this website.

Day 23: Internationalization

Previously on symfony

Now that you have learned how to transfer a symfony application to a production host, the askeet application can run anywhere. But what if someone decided to use it in a non-English speaking country like, say, France?

Since askeet is an open-source project, we hope that people from all over the world will soon use it. That means not only that all the files of the project have to be encoded in UTF-8, but also that the application has to offer a multilingual interface and content localization.

Think about the multinational companies that are going to install askeet on their Intranet to have a knowledge management base. They will definitely require that users can switch interface language or content rather than install one askeet per language... Fortunately, the choices made during the eighteenth day to implement universes will ease our task a lot, and symfony has native support for internationalized interfaces.

Localization

What if the call to an address like:

http://fr.askeet.com/

...displayed only the French questions? Well, this is quite easy, because since the eighteenth day, such a URI is understood as a universe.

Content

Creating a question in a language universe will have it tagged automatically with the language tag (here: 'fr'). And, if you browse the 'fr' universe, only the questions with the 'fr' tag will appear.

So the universe filter already takes care of content localization. That was an easy move.

Look and feel

The universes can have their own stylesheet. This means that the look and feel of a localized askeet can be easily adapted as well, with the same mechanism. Next, please.

Language-dependent functions

The database indexing system built during the twenty-first day relies on a stemming algorithm which is language-dependent. In a localized version, it has to be adapted.

For now, there is no stemming library available in PHP for languages other than English, but what if there was one, or what if someone decided to port one of the Perl stemming libraries to PHP?

Then, in the myTools::stemPhrase() method, we should call a factory method instead of a simple PorterStemmer (left as an exercise for now).
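As a starting point for that exercise, here is a minimal sketch of what such a factory could look like. All the names in it (Stemmer, EnglishStemmer, FrenchStemmer, StemmerFactory) are hypothetical, and the two stemmers are deliberately naive placeholders rather than real stemming algorithms; in askeet, the English one would wrap the PorterStemmer library instead.

```php
<?php
// Hypothetical interface: a language-specific stemmer reduces a word to its stem.
interface Stemmer
{
  public function stem($word);
}

// Placeholder for the English stemmer. It only lowercases the word and strips
// a final "s" so that the sketch stays self-contained.
class EnglishStemmer implements Stemmer
{
  public function stem($word)
  {
    return preg_replace('/s$/', '', strtolower($word));
  }
}

// Placeholder for a French stemmer that does not exist yet: it only lowercases.
class FrenchStemmer implements Stemmer
{
  public function stem($word)
  {
    return strtolower($word);
  }
}

class StemmerFactory
{
  // culture code => stemmer class name (assumed names)
  private static $stemmers = array(
    'en' => 'EnglishStemmer',
    'fr' => 'FrenchStemmer',
  );

  // Returns the stemmer for a culture, falling back to English.
  public static function create($culture)
  {
    $class = isset(self::$stemmers[$culture]) ? self::$stemmers[$culture] : self::$stemmers['en'];

    return new $class();
  }
}

echo StemmerFactory::create('en')->stem('Questions'), "\n"; // question
echo StemmerFactory::create('fr')->stem('Questions'), "\n"; // questions
```

With something like this in place, stemPhrase() would only need to ask the factory for a stemmer matching the current culture, and adding a language would come down to adding one class and one entry in the map.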

Database content

Imagine an international website offering a list of hotels around the world. Each hotel is shown with a text description of the rooms, the service and the opening hours. There are thousands of hotels, so this content has to be stored in a database. The problem is that there must be as many versions of the descriptions as there are translations of the site.

Symfony provides a way to structure data in order to handle such cases. For the example above, there would be a Hotel class for the fares, address and other content that is not translated, and a HotelI18n class for the localized content. As the Propel accessors abstract this separation, even if the description is stored in the HotelI18n table, you would still access it with a simple:

$description = $hotel->getDescription();

To understand how this works, refer to the i18n chapter of the symfony book.
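To give a rough idea of the principle, the following sketch imitates the separation with two hand-written classes. It is not the code that Propel generates: the class bodies, the addTranslation() method and the sample descriptions are assumptions made for the illustration. It only shows how an accessor on Hotel can delegate to the HotelI18n object matching the current culture.

```php
<?php
// Illustrative stand-in for the localized class: one object per culture.
class HotelI18n
{
  private $description;

  public function __construct($description)
  {
    $this->description = $description;
  }

  public function getDescription()
  {
    return $this->description;
  }
}

// Illustrative stand-in for the main class holding the untranslated data.
class Hotel
{
  private $culture = 'en';
  private $translations = array(); // culture => HotelI18n

  public function setCulture($culture)
  {
    $this->culture = $culture;
  }

  // Hypothetical method used here to attach a translation to the hotel.
  public function addTranslation($culture, HotelI18n $i18n)
  {
    $this->translations[$culture] = $i18n;
  }

  // Proxy accessor: delegates to the translation of the current culture,
  // and returns null when there is none.
  public function getDescription()
  {
    return isset($this->translations[$this->culture])
      ? $this->translations[$this->culture]->getDescription()
      : null;
  }
}

$hotel = new Hotel();
$hotel->addTranslation('en', new HotelI18n('Rooms with a sea view'));
$hotel->addTranslation('fr', new HotelI18n('Chambres avec vue sur la mer'));

$hotel->setCulture('fr');
echo $hotel->getDescription(), "\n"; // Chambres avec vue sur la mer
```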

Fortunately, the filter system of the askeet universes replaces the need for content adaptation, so we won't use it here.

Internationalization

As it is a long word, developers often refer to internationalization as 'i18n'. For those who don't know why, just count the letters in the word 'internationalization', and you will also understand why 'localization' is referred to as 'l10n'. In web application development, i18n mostly concerns the translation of text content and the use of local formats for the interface.

Set the culture

A lot of built-in i18n features in symfony are based on a parameter of the user session called the culture. The culture is the combination of the country and the language of the user, and it determines how the text and culture-dependent information will be displayed.

When the askeet application recognizes a universe as a localization, it has to set the corresponding culture. When should a permanent tag be recognized as a localization? We choose to allow only the ones for which the interface is translated (see below), so the fact that a universe is a localization is determined by the existence of an XML translation file in the application i18n/ directory.

The universes are discovered in the askeet/apps/frontend/lib/myTagFilter.class.php filter, so we just need to modify it a little bit:

public function execute ($filterChain)
{
  ...
  // is there a tag in the hostname?
  $request  = $this->getContext()->getRequest();
  $hostname = $request->getHost();
  if (!preg_match($this->getParameter('host_exclude_regex'), $hostname) && $pos = strpos($hostname, '.'))
  {
    $tag = Tag::normalize(substr($hostname, 0, $pos));
 
    // add a permanent tag constant
    sfConfig::set('app_permanent_tag', $tag);
 
    // add a custom stylesheet
    $request->setAttribute('app/tag_filter', $tag, 'helper/asset/auto/stylesheet');
 
    // is the tag a culture?
    if (is_readable(sfConfig::get('sf_app_i18n_dir').'/global/messages.'.strtolower($tag).'.xml'))
    {
      $this->getContext()->getUser()->setCulture(strtolower($tag));
    }
    else
    {
      $this->getContext()->getUser()->setCulture('en');
    }
  }
  ...
}

note

The language tags that will be recognized must be written as two lowercase characters, as described in the ISO 639-1 standard (for instance fr for French). When dealing with internationalization, always prefer ISO codes for countries and languages, so that your code can comply with international standards and be understood by foreign developers.

You will find more information about internationalization and cultures in the i18n chapter of the symfony book.
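The detection logic of the filter can also be tried outside of symfony. The sketch below is a simplified, standalone version of it: the cultureFromHostname() function is hypothetical, it looks for the dictionary directly in the directory it is given, it ignores the host_exclude_regex parameter, and the two-letter check is an extra safeguard that the filter above does not have.

```php
<?php
// Hypothetical helper: returns the culture matching the first label of the
// hostname, or 'en' when no messages.xx.xml file exists for it in $i18nDir.
function cultureFromHostname($hostname, $i18nDir)
{
  $pos = strpos($hostname, '.');
  if ($pos === false || $pos === 0)
  {
    // no subdomain at all (for instance "localhost")
    return 'en';
  }

  $tag = strtolower(substr($hostname, 0, $pos));

  // Assumption: only accept the two-letter shape of ISO 639-1 codes,
  // so that arbitrary input never ends up in a file path.
  if (!preg_match('/^[a-z]{2}$/', $tag))
  {
    return 'en';
  }

  return is_readable($i18nDir.'/messages.'.$tag.'.xml') ? $tag : 'en';
}

// Demo: a temporary directory containing only a French dictionary.
$dir = sys_get_temp_dir().'/askeet_i18n_'.uniqid();
mkdir($dir);
touch($dir.'/messages.fr.xml');

echo cultureFromHostname('fr.askeet.com', $dir), "\n";  // fr
echo cultureFromHostname('de.askeet.com', $dir), "\n";  // en
echo cultureFromHostname('www.askeet.com', $dir), "\n"; // en
```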

Dates, Times, Numbers, Currency, Measurements

The way to display a date in France is not the same as in the US. What an American would write:

December 16, 2005 9:26 PM

...a French person would write as:

16 décembre 2005 21:26

If you remember well, each time we had to display a date in an askeet template, we used the format_date() helper. This helper formats the date given as parameter according to the user culture. As the culture is set in the myTagFilter.class.php filter, the date formatting will be done automatically.

date formatting in French askeet

This is another good practice for international projects: always use the i18n helpers when you have to output a date, a time, a number, a currency or a measurement. Symfony provides helpers for most of them (see the i18n helpers chapter of the symfony book for more information).
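To make the idea concrete without relying on symfony, here is a deliberately simplified sketch of what a culture-aware date helper does. The formatDateForCulture() function is hypothetical and only knows two cultures; the real format_date() helper relies on symfony's culture data and is not implemented this way.

```php
<?php
// Hypothetical, simplified stand-in for a culture-aware date helper.
// It works in UTC (gmdate) so that the output does not depend on the server timezone.
function formatDateForCulture($timestamp, $culture)
{
  // array(1 => ...) makes the keys match the month numbers 1 to 12
  $frenchMonths = array(1 => 'janvier', 'février', 'mars', 'avril', 'mai', 'juin',
    'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre');

  if ($culture == 'fr')
  {
    // day, month name, year, 24-hour clock
    $month = $frenchMonths[(int) gmdate('n', $timestamp)];

    return gmdate('j', $timestamp).' '.$month.' '.gmdate('Y H:i', $timestamp);
  }

  // default: US English, month first, 12-hour clock
  return gmdate('F j, Y g:i A', $timestamp);
}

// December 16, 2005 at 21:26 UTC (arguments: hour, minute, second, month, day, year)
$timestamp = gmmktime(21, 26, 0, 12, 16, 2005);

echo formatDateForCulture($timestamp, 'en'), "\n"; // December 16, 2005 9:26 PM
echo formatDateForCulture($timestamp, 'fr'), "\n"; // 16 décembre 2005 21:26
```

The template never has to know which format applies: it passes the date and lets the culture decide, which is exactly why the helpers should be used everywhere.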

Interface translation

The interface of the askeet project contains text. In a localized version, the text of the interface should be displayed in the language of the user culture.

To enable interface translation, all the texts of the askeet templates have to be enclosed in a special i18n helper, __(). In addition, the helper must be declared at the top of the template. For instance, to enable interface translation in the home page, open the askeet/apps/frontend/modules/question/templates/listSuccess.php template and change it to:

<?php use_helper('I18N') ?>
 
<h1><?php echo __('popular questions') ?></h1>
 
<?php include_partial('list', array('question_pager' => $question_pager)) ?>

note

Instead of having to add the i18n helper on top of each template, you can just add it once to the application settings.yml in askeet/apps/frontend/config/:

all:
  .settings:
    standard_helpers: Partial,Cache,Form,I18N

For each language in which the interface is translated, a messages.xx.xml file must be created in the askeet/apps/frontend/i18n/ directory, where xx is the language of the translation. This XML file is an XLIFF dictionary, showing the translated version of the text from the source language (English for askeet).

For instance, to enable a French translation, you must create a messages.fr.xml with the following content:

<?xml version="1.0" ?>
<xliff version="1.0">
  <file original="global" source-language="en_US" datatype="plaintext">
    <body>  
      <trans-unit id="1">
        <source>popular questions</source>
        <target>questions populaires</target>
      </trans-unit>             
    </body>
  </file>
</xliff>

The syntax of the XLIFF file is explained in detail in the i18n chapter of the symfony book.
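If you are curious about how such a dictionary can be used, the following sketch reads the same XLIFF content with SimpleXML and looks up a translation. The loadDictionary() and lookup() functions are hypothetical and only illustrate the principle; they do not show how symfony itself reads the file.

```php
<?php
// The dictionary from above, embedded as a string to keep the sketch self-contained.
$xliff = <<<'XML'
<?xml version="1.0" ?>
<xliff version="1.0">
  <file original="global" source-language="en_US" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>popular questions</source>
        <target>questions populaires</target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML;

// Hypothetical helper: builds a "source text => translated text" map.
function loadDictionary($xliff)
{
  $dictionary = array();
  $xml = simplexml_load_string($xliff);

  // "trans-unit" contains a dash, hence the {'...'} property syntax
  foreach ($xml->file->body->{'trans-unit'} as $unit)
  {
    $dictionary[(string) $unit->source] = (string) $unit->target;
  }

  return $dictionary;
}

// Hypothetical lookup: falls back to the source text when no translation exists.
function lookup($text, $dictionary)
{
  return isset($dictionary[$text]) ? $dictionary[$text] : $text;
}

$dictionary = loadDictionary($xliff);

echo lookup('popular questions', $dictionary), "\n"; // questions populaires
echo lookup('latest questions', $dictionary), "\n";  // latest questions
```

The fallback to the source text is a design choice of this sketch: a string without a trans-unit stays in English instead of disappearing from the page.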

Now, the bulk of the job is to browse all the templates (and template fragments) to find the text to translate. Each time you find a sentence, you have to enclose it between <?php echo __(' and ') ?>, and create a new <trans-unit> tag in the messages.fr.xml file. Fortunately, all the templates in symfony projects are located in templates/ directories, so you don't need to browse all the files of your project.

note

A translation only makes sense if the translation files contain full sentences. However, as you sometimes have formatting or variables in the text, you can add a second argument to the __() helper to do substitution. For instance, to mark the following template text:

There are <?php echo count_logged() ?> persons logged.

...use only one __() call to avoid splitting the sentence into two parts that can't be understood on their own:

<?php echo __('There are %1% persons logged', array('%1%' => count_logged())) ?>
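The reason for keeping the sentence whole is easier to see in a sketch. The translate() function below is a hypothetical stand-in for the helper, and the French sentence is only an illustration: the lookup is done on the full source sentence first, and the placeholders are substituted afterwards, so the translator is free to move %1% wherever the target language needs it.

```php
<?php
// Hypothetical stand-in for the __() helper: translate first, substitute second.
function translate($text, $args = array(), $dictionary = array())
{
  // Fall back to the source sentence when there is no translation
  $translated = isset($dictionary[$text]) ? $dictionary[$text] : $text;

  // strtr() replaces every placeholder key by its value in one pass
  return strtr($translated, $args);
}

// Illustrative dictionary; in symfony the translation would come from the XLIFF file
$french = array(
  'There are %1% persons logged' => 'Il y a %1% personnes connectées',
);

echo translate('There are %1% persons logged', array('%1%' => 12), $french), "\n";
// Il y a 12 personnes connectées

echo translate('There are %1% persons logged', array('%1%' => 12)), "\n";
// There are 12 persons logged
```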

Finally, to allow the automatic translation, you have to set the i18n parameter to on in the application settings.yml:

all:
  .settings:

    i18n:                on

Now browse to fr.askeet.com and watch the translated interface:

askeet in French

Automated translation

Some tools exist to automate the task of enclosing source text and creating messages.xx.xml files. Unfortunately, none will be able to do the enclosing as well as you would. Only you can determine where to start and where to end the __() call. Although we don't use them, here are the tools for which you will find resources about automated translation:

  • The xgettext command from the GNU gettext tools provides a way to extract text from PHP code. It produces a .pot file (list of the terms) that can be turned into a series of .po files (list of the terms translated into one language).
  • The po2xliff command from the XLIFF tools turns .po files into messages.xx.xml XLIFF files.
  • For Windows users, the Okapi framework can be a good alternative.
  • To edit the translation files, poedit offers an intuitive interface (this is especially useful since most human translators understand neither XML nor .po files).

Don't forget

Once the text from the templates is marked for translation, there is still a close code inspection to be done. As a matter of fact, text messages can hide in unexpected parts of your application. Make sure you do an inventory to find the following "hidden" text:

  • Image folders (images can include text)

    If you need to localize images, put them in a subdirectory corresponding to their culture, and add the culture to the image_tag() helper call:

      <?php echo image_tag($sf_user->getCulture().'/myimage.png') ?>
    
  • The alternative text for images, the button labels and all the text messages that are parameters of <?php and ?> instructions.

  • The JavaScript messages, which can be located in helpers (as in link_to('click', '@rule', 'confirm=Are you sure?')), in JavaScript tags in your templates, or in included .js files.

All in all, if you don't design an application with i18n in mind from the beginning, there is a high risk that you will leave some untranslated text somewhere. Our best advice is to think about i18n before starting to develop, and if you know that your application will probably be translated, remember to use __('') each time you write text that will be displayed to the end user.

note

There are some hidden text messages in the validate/ directories of your modules, which appear when a form is not properly validated. The cool thing is that these texts need no special treatment as long as they appear in the XLIFF translation. Symfony will automatically find the translation in a <trans-unit> node, and use it instead of the original text of the YAML files.

error messages

See you Tomorrow

Askeet is on its way to becoming a really useful open-source application. Being an i18n-compatible application, it becomes available to non-English speakers (roughly 90% of the world population).

The modified source of the application, including i18n, is available in the SVN repository and can be browsed directly from the askeet trac. Your comments on the forum are welcome.

And tomorrow is already the last day of the symfony advent calendar series. Don't miss it.

This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.