The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software

A basic explanation of character encodings: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software. For its impact on the .NET world, see Strings in C# and .NET and Unicode and .NET.

Late edit: Eric Sink shows us how Microsoft has become trapped with UTF-16.

They committed to UCS-2 too early. When computers had to handle more characters than UCS-2 could represent (as support for the large character sets of many Asian languages arrived), their installed base was already too large to switch to UTF-8. So they chose UTF-16, simply because UCS-2 and UTF-16 match as long as you ignore the extra characters. UCS-4 was too big, so a variable-length encoding was mandatory, but UTF-16 is the wrong compromise: you have to deal with the complexity of a variable-length encoding without the compactness of UTF-8. Even worse, it tempts you to ignore surrogate pairs and treat UTF-16 as a fixed-length encoding. Lots of .NET programmers think that System.String is just an array of 16-bit values, one per character.
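To make the surrogate-pair problem concrete, here is a small sketch in Python rather than C#. The helper name `utf16_units` is made up for this illustration. It lists the 16-bit code units UTF-16 needs for a string, which is what a .NET `String.Length` counts, whereas Python's `len` counts code points.

```python
def utf16_units(text: str) -> list[int]:
    """Return the 16-bit code units of text when encoded as UTF-16.

    Hypothetical helper for illustration; big-endian is used only so
    the values read naturally, and no byte-order mark is emitted.
    """
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# 'a' followed by U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside
# the range a single 16-bit unit can represent.
sample = "a\U0001D11E"

print(len(sample))                               # 2 code points
print([hex(u) for u in utf16_units(sample)])     # ['0x61', '0xd834', '0xdd1e']
print(len(utf16_units(sample)))                  # 3 UTF-16 code units
print(len(sample.encode("utf-8")))               # 5 bytes in UTF-8
```

Code that indexes or truncates such a string per 16-bit unit can cut the pair `0xd834 0xdd1e` in half, which is exactly the mistake of treating UTF-16 as fixed length.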
