Website Globalization

The CM Pros Globalization Community

We have localized the website navigation labels into nine languages (plus English) in time for the CM Pros Fall 2006 Summit, whose theme is globalization.

We decided to form a new topical community of those interested in problems of internationalization (getting a site ready to handle multilingual content), localization (adapting to the culture and language of each locale served), and translation (including workflow tools to manage the translation process). See the charter page.

Donald DePalma, our keynote speaker for the Summit, has made an independent assessment of the website globalization effort.

Globalizing (or localizing or internationalizing) the CM Pros website requires a number of steps.

  1. Navigation Labels translated into the target languages
  2. Controlled vocabulary translated into the target languages
  3. Specific (relatively static) content translated
  4. Website response to browser language preferences
    (recognize and act on the Accept-Language request header)
  5. Links in the banner to target languages (flags, perhaps?)
  6. What language is your browser requesting?

Navigation Labels translated into the target languages

We keep the different language versions in a multilingual translation memory (TM) XML file here, for convenient reference by our translators.

You can download this file and edit it with the TMX Editor from Heartsome. Register with Heartsome and get a 30-day trial copy of TMX Editor.

For those of you with Translation Memory (TM) Tools, we can convert this multilingual file into source-target language pairs. Contact Bob Doyle to get language-pair files, in SDLX or TRADOS formats.
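As a rough illustration of what converting a multilingual file into source-target pairs involves, here is a minimal Python sketch. It assumes the file follows the usual TMX layout (translation units `tu` holding one `tuv` per language, each with a `seg`). The sample labels, the `SAMPLE_TMX` text, and the function name `language_pairs` are made up for the example and are not taken from the actual CM Pros file. It does not produce SDLX or TRADOS files.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_pairs(tmx_text, source="en", target="fr"):
    """Return (source, target) segment pairs from a multilingual TMX document."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if lang and seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        # Keep only units that have both languages of the requested pair.
        if source in segments and target in segments:
            pairs.append((segments[source], segments[target]))
    return pairs


# Hypothetical, simplified sample (a real TMX header carries more attributes).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Home</seg></tuv>
      <tuv xml:lang="fr"><seg>Accueil</seg></tuv>
      <tuv xml:lang="de"><seg>Startseite</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Members</seg></tuv>
      <tuv xml:lang="de"><seg>Mitglieder</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(language_pairs(SAMPLE_TMX, "en", "fr"))  # [('Home', 'Accueil')]
```

Units that lack the target language are skipped, which also gives a quick way to see which labels still need translating.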

If you would prefer an Excel spreadsheet with the English-language source labels, here is a spreadsheet with the translations so far.

Contributors/Localizers

Arabic: Rana Allam
Dutch: Adriaan Bloem, Erik Hartman
French: Robert Bédard, Raymond Bissonnette, Jane McConnell, Benoît Secher
German: Anna Fuhrmann, Jörg Dennis Krüger
Hebrew: Yair Dembinsky
Italian: Paola Di Maio
Japanese: Tomoko Yamato
Spanish: Mario López de Ávila Muñoz

Controlled vocabulary translated into the target languages

Apart from CM Pros-specific choices for "branded" terminology (like preferring CM Pros to CMPros), as described in our Style Guide, we might consider multilingual versions of the CM Pros glossary.

We may be able to do this with the existing CMS Wiki where collaborative development is done for the Content Management Glossary.

This CM glossary is repurposed for glossaries in closely related disciplines like Information Architecture, Interaction Design, Knowledge Management, and Taxonomy.


Specific (relatively static) content translated

The content of the Join CM Pros and Mission pages is relatively static and should be translated first.

In general,

  • We must translate/localize pages on the server for each language we intend to support.
  • We must carefully name the files for the localized pages, so the server has a systematic way of locating them.
  • We need a method for serving a generic page when we don't have the requested language.
  • Alternatively, we could serve a machine translation (a gist) of the page. This will no doubt annoy language purists and make others laugh, but we can then ask them to supply the missing translation.

    Donald DePalma, our keynote speaker for the CM Pros Summit on Globalization, has pointed out that the choice is not just between machine translation (MT) and human translation (HT). For organizations like CM Pros and its potential international members, the choice is between MT and ZT - "Zero Translation." At a minimum, we should offer links on each English page to automatic machine translation services now available on the web - gists from Altavista Babelfish and Google, among others.


Website response to browser language preferences

How does the web browser make a request for specific languages?

The browser writes a value for the Accept-Language request header field that it sends to the web server. You can set this value in Preferences (Netscape) or Internet Options (Internet Explorer). If you choose multiple languages, they are sent to the server as a comma-delimited list in your preferred order.

For example, with German, US English, Italian, French, Brazilian Portuguese, and Spanish chosen in that order, the request header field will be Accept-Language: de,en-us,it,fr,pt-br,es.
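To make the header format concrete, here is a small Python sketch of how a server-side script might turn an Accept-Language value into an ordered list of language tags. Browsers often attach quality values (for example `en;q=0.5`) to express preference, so the sketch sorts by quality first and by position in the list second. The function name `parse_accept_language` is ours, and the parsing is simplified rather than a full implementation of the HTTP grammar.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, most preferred first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if quality > 0:
            # Sort key: higher quality first, then original order in the header.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


print(parse_accept_language("de,en-us,it,fr,pt-br,es"))
# ['de', 'en-us', 'it', 'fr', 'pt-br', 'es']
print(parse_accept_language("en-us,en;q=0.5"))
# ['en-us', 'en']
```

Tags with `q=0` are dropped, since that value marks a language the user does not accept.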

How does the web server determine which language to serve? Is there a naming convention for multi-lingual web pages?

Deciding which page to serve is called content negotiation.

Tim Berners-Lee's discussion of generic web pages and their language variants describes two different naming conventions, index.fr.html and index.html.fr.

The Apache Web Server compiles in content negotiation (the mod_negotiation module) by default. It appends the two-letter language code to the URL and looks for files to serve.

For example, if you set your browser preferred language to French and browse the CM Pros site, an Apache server would look for the file www.cmprosold.org/index.html.fr.

Unfortunately, appending the language code as a suffix is less desirable than infixing it (index.fr.html), because infixing keeps a familiar extension at the end of the file name for the server's operating system to use. The Apache docs on content negotiation say that you can choose between naming conventions, because files can have more than one extension, and the order of the extensions is normally irrelevant (see mod_mime documentation for details). We suggest infixing the language (and other) variants.
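The infix convention, together with the fallback to a generic page when a translation is missing, can be sketched in a few lines of Python. This illustrates the naming scheme only; it is not how Apache resolves variants internally. The helper names `infixed_name` and `resolve_variant` are hypothetical, and the sketch assumes simple two-letter language codes.

```python
import os
import tempfile


def infixed_name(generic_name, lang):
    """Insert the language code before the extension: index.html -> index.fr.html."""
    root, ext = os.path.splitext(generic_name)
    return f"{root}.{lang}{ext}"


def resolve_variant(directory, generic_name, languages):
    """Return the first existing language variant, else the generic page."""
    for lang in languages:
        candidate = os.path.join(directory, infixed_name(generic_name, lang))
        if os.path.isfile(candidate):
            return candidate
    # No requested language is available: fall back to the generic page.
    return os.path.join(directory, generic_name)


# Demonstration in a throwaway directory holding a generic and a French page.
with tempfile.TemporaryDirectory() as site:
    for name in ("index.html", "index.fr.html"):
        open(os.path.join(site, name), "w").close()
    print(os.path.basename(resolve_variant(site, "index.html", ["de", "fr"])))  # index.fr.html
    print(os.path.basename(resolve_variant(site, "index.html", ["ja"])))  # index.html
```

Because the variant name is derived mechanically from the generic name, localizers only need to follow one rule when naming their files.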

Microsoft Internet Information Server negotiates language with an optional ISAPI filter. There does not appear to be a standard ISAPI filter for language negotiation on the market.

Server-driven Negotiation (HTTP/1.1 specification, RFC 2616, Section 12.1)

If the selection of the best representation for a response is made by an algorithm located at the server, it is called server-driven negotiation. Selection is based on the available representations of the response (the dimensions over which it can vary; e.g. language, content-coding, etc.) and the contents of particular header fields in the request message or on other information pertaining to the request (such as the network address of the client).

Server-driven negotiation is advantageous when the algorithm for selecting from among the available representations is difficult to describe to the user agent, or when the server desires to send its "best guess" to the client along with the first response (hoping to avoid the round-trip delay of a subsequent request if the "best guess" is good enough for the user). In order to improve the server's guess, the user agent MAY include request header fields (Accept, Accept-Language, Accept-Encoding, etc.) which describe its preferences for such a response.
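A very small version of such a server-side selection algorithm might look like the Python sketch below. It takes the client's language tags, already ordered by preference, and the set of languages the site actually has, and returns a "best guess". The function name `choose_language`, the rule of falling back from a regional tag such as en-us to its primary language en, and the English default are all assumptions of this sketch, not the algorithm of any particular server.

```python
def choose_language(preferred, available, default="en"):
    """Pick the first preferred language the site can serve, else the default."""
    available = {lang.lower() for lang in available}
    for tag in preferred:
        tag = tag.lower()
        if tag in available:
            return tag
        # Fall back from a regional variant (pt-br) to its primary language (pt).
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default


site_languages = {"en", "fr", "it"}
print(choose_language(["de", "en-us", "it", "fr", "pt-br", "es"], site_languages))  # en
print(choose_language(["pt-br", "es"], {"en", "es", "pt"}))  # pt
print(choose_language(["ja"], site_languages))  # en
```

In the first call German is preferred but unavailable, so the second choice, en-us, is matched to the site's generic English pages.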

Do web servers handle multi-lingual requests automatically?

Not normally. The web server must have multiple language versions of a web page in order to serve them. It needs to know how the language-variant web pages are named. Besides naming them with a URI (index.fr.html), it may be possible to transmit the language variance as metadata in the HTTP header. This is the direction of the WebDAV protocol being developed by the IETF.

Finally, multi-lingual audiences may be served by XML/XSLT, with each structured page containing the separate language versions in the master XML file for the page. This is a bit more difficult for localizers, who may want to edit their own page content. CM Pros is primarily doing page-oriented web content management. Some pages, such as our controlled vocabulary best practice, are XML/XSLT as part of the DITA/XML community initiative.


What language is your browser requesting?

HTTP Accept-Language: en-us,en;q=0.5
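A line like the one above can be produced by any server-side script that echoes the request header back to the visitor. The following Python sketch does this as a minimal WSGI application. It is an illustration under our own assumptions (the name `application` and the "(not sent)" placeholder are ours) and not the script that runs on the CM Pros site.

```python
def application(environ, start_response):
    """Minimal WSGI app that echoes the browser's Accept-Language header."""
    # WSGI exposes request headers as HTTP_* keys in the environ dictionary.
    value = environ.get("HTTP_ACCEPT_LANGUAGE", "(not sent)")
    body = f"HTTP Accept-Language: {value}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly with a fake request to see what it would send back.
def _ignore_headers(status, headers):
    pass

response = application({"HTTP_ACCEPT_LANGUAGE": "en-us,en;q=0.5"}, _ignore_headers)
print(response[0].decode("utf-8"), end="")  # HTTP Accept-Language: en-us,en;q=0.5
```

To try it in a real browser, the app can be served locally with `make_server` from the standard-library module `wsgiref.simple_server`.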
 