Apparently the BlackBerry does a good job of rendering decomposed Unicode characters into readable text. But the BlackBerry does not appear to be able to render precomposed characters (which is what most pages on the Internet use).
So to make a long story short, if you copy, let's say, some Vietnamese from a web page, and paste it into an email ... that email will not render very well on a BlackBerry.
Well, here is a hack that may help someone out: decompose the characters, and send those decomposed characters in an email. If you do that, the email will likely render "fine" in a normal email client as well as in a BlackBerry email client. I call this a hack because the W3C generally recommends exchanging text in NFC ... but most BlackBerry email clients will not render all precomposed (NFC) characters.
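To make "decompose the characters" concrete, here is a minimal Java sketch using the standard java.text.Normalizer class (available since Java 6). The class name DecomposeDemo is just for illustration. A precomposed letter such as "ú" is a single code point (U+00FA); after decomposition it becomes a plain "u" followed by a combining acute accent. Vietnamese letters with two marks, such as "ế", split into three code points.

```java
import java.text.Normalizer;

public class DecomposeDemo {

    // Formats each code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String u = "\u00FA";  // u with acute, one precomposed code point (NFC)
        String e = "\u1EBF";  // e with circumflex and acute, one precomposed code point (NFC)
        for (String s : new String[] { u, e }) {
            String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
            // prints "U+00FA  ->  U+0075 U+0301", then "U+1EBF  ->  U+0065 U+0302 U+0301"
            System.out.println(codePoints(s) + "  ->  " + codePoints(decomposed));
        }
        // Same letter on screen, different strings in memory: prints false
        System.out.println(u.equals(Normalizer.normalize(u, Normalizer.Form.NFKD)));
    }
}
```

Both strings display as the same letter; only the underlying encoding differs, which is exactly what the hack relies on.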
For example, if you are NOT using a Windows box, try cutting and pasting the following into an email. You'll find that an email sent with these characters renders fine in a normal desktop email client, or even in a web email client like Gmail, but does not render correctly in the BlackBerry email client:
Subject: NFC (like most of the web)
Chúa yêu em lòng em vui thay
Kia Kinh Thánh đã tỏ cho hay
Các con thơ thuộc Jê-sus đây
Chúng yếu nhưng Ngài khỏe mạnh hoài
Jê-sus yêu em lắm
Phải em được Chúa yêu
Jê-sus yêu em lắm
Chính trong lời Chúa dạy nhiều
Now, try the same with this "decomposed" text. While the two texts may look identical on this web page, they are different, trust me. You'll find that these characters render well in Gmail, Outlook, and Evolution, and also on the BlackBerry:
Subject: NFKD (a decomposed form)
Chúa yêu em lòng em vui thay
Kia Kinh Thánh đã tỏ cho hay
Các con thơ thuộc Jê-sus đây
Chúng yếu nhưng Ngài khỏe mạnh hoài
Jê-sus yêu em lắm
Phải em được Chúa yêu
Jê-sus yêu em lắm
Chính trong lời Chúa dạy nhiều
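If you would rather not take "trust me" on faith, java.text.Normalizer.isNormalized reports whether text is already in a given form. The following Java sketch (the class and method names are illustrative, not part of the original program) reads text from standard input, says whether it is in NFC, NFKD, both, or neither, and counts the combining marks (Unicode category Mn), which is where the separated Vietnamese diacritics end up.

```java
import java.text.Normalizer;
import java.util.Scanner;

public class WhichForm {

    // Describes which normalization form(s) the text already satisfies.
    static String describe(String text) {
        boolean nfc = Normalizer.isNormalized(text, Normalizer.Form.NFC);
        boolean nfkd = Normalizer.isNormalized(text, Normalizer.Form.NFKD);
        if (nfc && nfkd) return "both NFC and NFKD (nothing to compose or decompose)";
        if (nfc) return "NFC (precomposed)";
        if (nfkd) return "NFKD (decomposed)";
        return "neither NFC nor NFKD";
    }

    // Counts combining marks (general category Mn), i.e. separated diacritics.
    static long combiningMarks(String text) {
        return text.codePoints()
                .filter(cp -> Character.getType(cp) == Character.NON_SPACING_MARK)
                .count();
    }

    public static void main(String[] args) {
        // Read all of standard input as UTF-8 in one token ("\\A" = start of input).
        Scanner sc = new Scanner(System.in, "UTF-8").useDelimiter("\\A");
        String text = sc.hasNext() ? sc.next() : "";
        System.out.println("Form: " + describe(text));
        System.out.println("Combining marks: " + combiningMarks(text));
    }
}
```

Pasting each sample above into a file and running, say, java WhichForm < sample.txt should show the NFC sample with zero combining marks and the NFKD sample with many, assuming the characters survived copying from the page unchanged.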
I'm not sure why I feel like including a small Java program I wrote to help folks create emails that render better on the BlackBerry, but here it is:
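import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.text.Normalizer;
import java.util.Scanner;

public class d {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Read all of standard input as UTF-8; the "\\A" (beginning of input)
        // delimiter makes next() return the whole stream as one token.
        Scanner sc = new Scanner(System.in, "UTF-8");
        sc.useDelimiter("\\A");
        String input = sc.hasNext() ? sc.next() : "";

        // NFKD = compatibility decomposition: base letters followed by combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);

        // Write UTF-8 so the combining marks are not mangled by the platform's default charset.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(Normalizer.Form.NFKD + " Compatibility Decomposed:\n" + decomposed);
    }
}
This program would be used as follows from a command line: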
$ cat myFileWithPrecomposedCharacters | java d
So you could paste NFC characters from, say, a Vietnamese web page into a file, run the file through the program to generate NFKD, and then paste the output into the email you're sending. That email should render readably in a desktop email client, a web email client, or on a BlackBerry.
I've only tested this method with Vietnamese, and because the incident at the Tower of Babel was so confusing, all bets are off with other languages.
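One way to check a particular text before trusting NFKD with it is to compare NFKD against canonical decomposition (NFD). When the two are equal, as they are for Vietnamese, NFKD only split accented letters into base letter plus combining marks. When they differ, NFKD also replaced compatibility characters (superscripts, full-width letters, ligatures and so on) with plain equivalents, so the text actually changed. The Java sketch below is illustrative (the class and method names are hypothetical); whether a BlackBerry renders NFD text as well as NFKD was not tested in this post, although for Vietnamese the two forms are identical anyway.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NfkdCheck {

    // True if NFKD only splits accented letters into base letter + combining
    // marks, i.e. it gives exactly the same result as canonical NFD.
    // False means compatibility characters were replaced by "plain" ones,
    // so the decomposed text no longer matches the original character for character.
    static boolean nfkdOnlySplitsAccents(String text) {
        return Normalizer.normalize(text, Form.NFKD)
                .equals(Normalizer.normalize(text, Form.NFD));
    }

    public static void main(String[] args) {
        System.out.println(nfkdOnlySplitsAccents("Ch\u00FAa y\u00EAu em"));  // Vietnamese: prints true
        System.out.println(nfkdOnlySplitsAccents("x\u00B2"));                 // superscript two: prints false
        System.out.println(nfkdOnlySplitsAccents("\uFF21\uFF22\uFF23"));      // full-width ABC: prints false
    }
}
```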
By the way, it looks to me like neither NFC nor NFKD renders correctly in AndroidMail as of today's build, so we may end up seeing complaints about Vietnamese not rendering well on the G1 unless the developers fix it soon. Maybe we will have a follow-up post with more on that subject.
UPDATE: A great write-up on Unicode and the BlackBerry is here on the LogicMail website. LogicMail is a J2ME email client supporting IMAP and POP, designed to run on RIM BlackBerry handheld devices.
NOTE:
1) For more help with the definitions of the normalization forms mentioned above, try here. It is a good document to be familiar with if you are planning to do i18n or l10n.
- i18n stands for internationalization
- l10n stands for localization
2) For those of you who just want a quick overview:
- These terms are roughly equivalent for this discussion:
   - compatibility decomposition (NFKD)
   - canonically decomposed characters
   - composite unicode
   - composite characters
   - the "separated" diacritical marks and letters used in Vietnamese without combining
- These terms are also roughly equivalent, and should not be confused with those just above:
   - compatibility composition (NFKC)
   - NFC - normalization form canonical composition
   - precompound unicode
   - unicode dựng sẵn (Vietnamese for "pre-built Unicode")
   - composed characters
   - recomposed characters (by canonical equivalence)
   - precomposed characters
   - decomposable characters
   - pre-composite characters
   - ligatures
   - the set of completed characters (including all markings)
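The two groups above differ along two separate axes: composed versus decomposed, and canonical versus compatibility. A small Java sketch (the class name FourForms is illustrative) that prints the code points of all four forms for a Vietnamese letter and for the "ﬁ" ligature makes this visible. For "ế" the canonical and compatibility forms coincide, which is why NFKD and canonically decomposed text can be treated as equivalent for Vietnamese. The ligature, by contrast, is left alone by NFC and NFD and is only split into "f" + "i" by the compatibility forms NFKC and NFKD, so "ligatures" belongs to the second group only loosely.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class FourForms {

    // Formats each code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // e with circumflex and acute (Vietnamese), and the "fi" ligature.
        String[] samples = { "\u1EBF", "\uFB01" };
        Form[] forms = { Form.NFC, Form.NFD, Form.NFKC, Form.NFKD };
        for (String sample : samples) {
            System.out.println("Input " + codePoints(sample) + ":");
            for (Form form : forms) {
                System.out.println("  " + form + "\t" + codePoints(Normalizer.normalize(sample, form)));
            }
        }
    }
}
```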
If you think there is a problem with the terms or equivalencies drawn above, please let's discuss it via email. If there are things that need to be corrected, I am open to that, just let me know.