Saturday, November 14, 2009

Internationalizing Numbers and Dates in Java

Locale
It's tempting to reach for a third-party tool for a specific job, such as formatting dates and numbers for different countries. If you find one, great, but it makes your application depend on another library, and it's redundant, because Java already provides an API that does the job. Java supports a large number of languages out of the box. If Klingon ever becomes an official language, I'm sure Java will support it, hopefully here on Earth.

A Locale in Java represents a language and, optionally, a country. It's also a class. Two of its properties matter most here: the language id and the country code, each usually two letters. For the German language, the locale is "de". If the language is still German but the country is Austria, the locale is "de_AT". The locale for the United States is "en_US". Plenty of websites list the codes for different countries, so I'll leave that out.
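A quick way to see both parts is to build a few Locale objects and ask them for their pieces. This is a minimal sketch: the class name LocaleParts and its describe method are made up for illustration, while Locale.GERMAN, Locale.US, getLanguage(), getCountry() and getDisplayName(Locale) are standard java.util.Locale members.

```java
import java.util.Locale;

class LocaleParts {

    // Hypothetical helper: shows a locale as "language|country|toString()".
    static String describe(Locale locale) {
        return locale.getLanguage() + "|" + locale.getCountry() + "|" + locale;
    }

    public static void main(String[] args) {
        Locale german = Locale.GERMAN;                  // language only
        Locale austrianGerman = new Locale("de", "AT"); // language + country
        Locale american = Locale.US;                    // language + country

        System.out.println(describe(german));         // de||de
        System.out.println(describe(austrianGerman)); // de|AT|de_AT
        System.out.println(describe(american));       // en|US|en_US

        // A human-readable name for the locale, rendered in English.
        System.out.println(austrianGerman.getDisplayName(Locale.ENGLISH));
    }
}
```

Note that a language-only locale such as Locale.GERMAN simply has an empty country.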

When instantiating a Locale, it's important to pay attention to the arguments passed to its constructor. In one of the applications I've worked on, I came across a bug where a line of code passed a language id of the form languageId_countryCode when it should have passed only languageId. If you need both values, pass them to the constructor separately. See the example below.

Locale wrongLocale = new Locale("de_at"); // wrong: the whole string "de_at" becomes the language id and the country stays empty
Locale germanLanguageId = new Locale("de"); // correct: language id only

If both the language id and the country code are needed, it should look like this:

Locale germanLanguageIdAndCountryCode = new Locale("de", "at");
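To see why the single-argument version is a bug, compare what each Locale reports. The sketch below also includes a small helper for the case where a combined string such as "de_at" arrives from a config file or a request; fromCombinedCode is a hypothetical name, not a JDK method. On Java 19 and later the Locale constructors are deprecated in favor of Locale.of(...), which takes the same arguments.

```java
import java.util.Locale;

class LocalePitfall {

    // Hypothetical helper: splits "languageId_countryCode" (or a bare "languageId")
    // and passes the parts to the constructor separately.
    static Locale fromCombinedCode(String code) {
        String[] parts = code.split("_", 2);
        return parts.length == 2 ? new Locale(parts[0], parts[1]) : new Locale(parts[0]);
    }

    public static void main(String[] args) {
        Locale wrong = new Locale("de_at");    // the whole string is treated as the language
        Locale right = new Locale("de", "at"); // the country is normalized to upper case

        System.out.println(wrong.getLanguage() + " / '" + wrong.getCountry() + "'"); // de_at / ''
        System.out.println(right.getLanguage() + " / '" + right.getCountry() + "'"); // de / 'AT'

        System.out.println(fromCombinedCode("de_at").equals(right)); // true
        System.out.println(fromCombinedCode("de"));                  // de
    }
}
```

If the incoming string uses the hyphenated BCP 47 form instead (de-AT), Locale.forLanguageTag, available since Java 7, already does this parsing for you.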

Localized or Internationalized Date
This is pretty straightforward. Use the DateFormat class and pass in the right locale to get the right date display, as shown below. To get an instance of the class, use the factory method getDateInstance(int style, Locale locale).

Locale locale = Locale.GERMAN; // the locale to format for
Date date = new Date(); // format() accepts a Date or a Number (milliseconds); any other object throws IllegalArgumentException
DateFormat dateFormat = DateFormat.getDateInstance(DateFormat.SHORT, locale);
String internationalizedDateString = dateFormat.format(date);

The format you get depends on the locale. For the German language id de, the SHORT style we chose produces day.month.year with a two-digit year:

dd.MM.yy // so today's date in German is 14.11.09 (thank goodness I decided not to write this yesterday!)

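For the locale en_US, the SHORT style gives 11/14/09. If you want a four-digit year, DateFormat.MEDIUM gives 14.11.2009 in German and Nov 14, 2009 in en_US. Either way, the same code takes care of the formatting for whatever locale the application supports.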
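Here is a runnable version of the snippet above. It's a sketch: the class name LocalizedDates and its format helper are made up, and it uses a fixed date (the day of this post) instead of new Date() so the output is repeatable. Note that the SHORT style uses a two-digit year in both German and en_US, while MEDIUM gives German a four-digit year.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class LocalizedDates {

    // Formats a date using one of DateFormat's predefined styles for the given locale.
    static String format(Date date, int style, Locale locale) {
        DateFormat dateFormat = DateFormat.getDateInstance(style, locale);
        return dateFormat.format(date);
    }

    public static void main(String[] args) {
        // GregorianCalendar and DateFormat both use the default time zone,
        // so the day doesn't shift between creating and formatting the date.
        Date date = new GregorianCalendar(2009, Calendar.NOVEMBER, 14).getTime();

        System.out.println(format(date, DateFormat.SHORT, Locale.GERMAN));  // 14.11.09
        System.out.println(format(date, DateFormat.SHORT, Locale.US));      // 11/14/09
        System.out.println(format(date, DateFormat.MEDIUM, Locale.GERMAN)); // 14.11.2009
    }
}
```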

Localized or Internationalized Number
The first thing to do is always define the pattern of the number we want to show in different locales. By pattern I mean, for example, how many digits to show and whether to show anything at all when the value is zero. The DecimalFormat class has a method applyPattern(String pattern) where we can plug in the pattern we want. We can get an instance of this class from the NumberFormat factory method and cast it to DecimalFormat, as shown below. The Javadoc notes that this cast works for the vast majority of locales but isn't guaranteed, so an instanceof check is a cheap safeguard.

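Locale locale = Locale.GERMAN; // the locale to format for
double aNumber = 1500.98; // a Java numeric literal can't contain a grouping comma
DecimalFormat decimalFormat = (DecimalFormat) NumberFormat.getInstance(locale); // factory method
decimalFormat.applyPattern("#,##0.00"); // our cool pattern
String internationalizedAndFormattedNumber = decimalFormat.format(aNumber); // "1.500,98" for German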

The pattern #,##0.00 groups the digits in thousands. The hash sign stands for a digit that is left out when it's zero. The zero stands for a digit that is always displayed, even when it's zero. The .00 part means we always show exactly two digits after the decimal point. Say we have the number 1,450.50 in the English locale (en_US); formatted for German (de), it should show up as 1.450,50: the comma becomes a period and the period becomes a comma. The same pattern also handles millions. The grouping separator repeats every three digits, so 1234567.8 comes out as 1,234,567.80 (1.234.567,80 in German) without needing a longer pattern such as #,###,##0.00.
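The sketch below puts the pieces together and shows what the pattern does in two locales. The class name LocalizedNumbers and its format helper are made up for illustration. The fallback branch, which builds a DecimalFormat from DecimalFormatSymbols.getInstance(locale), is one way to handle the rare case where the factory doesn't return a DecimalFormat; it's my addition, not something the post prescribes.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

class LocalizedNumbers {

    // Formats a number with an explicit pattern, using the locale's separators.
    static String format(double value, String pattern, Locale locale) {
        NumberFormat numberFormat = NumberFormat.getInstance(locale);
        DecimalFormat decimalFormat;
        if (numberFormat instanceof DecimalFormat) {
            decimalFormat = (DecimalFormat) numberFormat; // the usual case
            decimalFormat.applyPattern(pattern);
        } else {
            // Fallback if the factory ever returns another NumberFormat subclass.
            decimalFormat = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale));
        }
        return decimalFormat.format(value);
    }

    public static void main(String[] args) {
        String pattern = "#,##0.00";
        System.out.println(format(1450.5, pattern, Locale.US));        // 1,450.50
        System.out.println(format(1450.5, pattern, Locale.GERMAN));    // 1.450,50
        System.out.println(format(1234567.8, pattern, Locale.GERMAN)); // 1.234.567,80
        System.out.println(format(0, pattern, Locale.GERMAN));         // 0,00 (the 0 forces a digit)
        System.out.println(format(1450.5, "#,##0.##", Locale.US));     // 1,450.5 (# drops the trailing zero)
    }
}
```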

If the pattern is missing, some numbers display incorrectly in some locales while others display just fine. If we format the number 0 for German, it might appear as 0.0E3, which just confuses the business people looking at our localized spreadsheet. Always supply a pattern so the right number shows up in the right format. In another case, a number in our localized CSV file was written as 12.61, and when the file was opened in Excel, that number was turned into a date, December 1967, presumably because with German regional settings Excel treats the period as a date separator rather than the decimal separator. That is the side effect of not supplying a pattern.
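The post doesn't show the code that wrote the CSV, so the following is only an assumption about how a value like 12.61 can end up with a period in a German file: String.valueOf (like Double.toString) is locale-independent and always uses a period. The sketch contrasts that with a locale-aware NumberFormat using its default digits and with an explicit pattern. The class PatternOrNot and its three methods are hypothetical names.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

class PatternOrNot {

    // Locale-independent: always uses a period as the decimal separator.
    static String plain(double value) {
        return String.valueOf(value);
    }

    // Locale-aware, but with default digit settings (no minimum fraction digits).
    static String localeDefault(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // Locale-aware with an explicit pattern: always exactly two decimals.
    static String withPattern(double value, Locale locale) {
        DecimalFormat decimalFormat = (DecimalFormat) NumberFormat.getInstance(locale);
        decimalFormat.applyPattern("#,##0.00");
        return decimalFormat.format(value);
    }

    public static void main(String[] args) {
        System.out.println(plain(12.61));                        // 12.61
        System.out.println(localeDefault(12.61, Locale.GERMAN)); // 12,61
        System.out.println(localeDefault(3.0, Locale.GERMAN));   // 3
        System.out.println(withPattern(3.0, Locale.GERMAN));     // 3,00
    }
}
```

Only the patterned version gives every value the same shape, which is what a spreadsheet column needs.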

One last tip: if an application processes data and converts it into CSV for viewing in Excel, remember to replace every carriage return or line feed embedded in a String value with a space (" "). Say we have the String "The value\n is supposed to be\n entered here". This value should stay in one cell, right? But because of the newline characters \n, Excel will actually break the string across 3 rows. It would look like this:

          Column A
Row 1     The value
Row 2     is supposed to be
Row 3     entered here
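A minimal sketch of that replacement, assuming values are cleaned one at a time before they are written to the CSV file. CsvCells and flatten are hypothetical names; the regular expression covers Windows (\r\n), old Mac (\r) and Unix (\n) line endings.

```java
class CsvCells {

    // Replaces every embedded line break (\r\n, \r or \n), together with any
    // whitespace around it, with a single space so the value stays in one cell.
    static String flatten(String value) {
        return value.replaceAll("\\s*(?:\\r\\n|\\r|\\n)\\s*", " ");
    }

    public static void main(String[] args) {
        String raw = "The value\n is supposed to be\n entered here";
        System.out.println(flatten(raw)); // The value is supposed to be entered here
    }
}
```

Collapsing the surrounding whitespace is a choice made here so the example string doesn't end up with double spaces; a plain replace of each line break with " " works too if that doesn't matter.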

Have fun with I18N!