C and C++: forum for discussing all forms of C except for C#. These languages are powerful low-level languages used for creating operating systems, device drivers, compilers and much more.
Can someone explain what Unicode is and why I should optimize my code for it? I have read in several places that your C++ code should be optimized for it, but I'm not even sure what it is or how to optimize for it.
__________________
I Need Help |
Unicode is a standard for encoding text. It's mainly needed for languages that aren't written in the Roman alphabet, such as Japanese or Mandarin.
__________________
CodeCall Blog | CodeCall Wiki | Shareware | Linux Forum Chat with other CodeCall members on IRC; connect to irc.codecall.net and join #codecall |
Quote:
Only about 2 billion people, since Mandarin and Cantonese are major languages of China, and kanji and kana (hiragana and katakana) are Japan's national scripts. They're written as symbols instead of the Roman alphabet.
ASCII was the standard character code when C++ was first written. You could rely on 7 bits, which meant 128 characters. 8 bits gave you 256 characters. The extra 128 are nice, but not reliable, since different systems assign them differently. Unicode uses (I think) 2 bytes, or 16 bits, to store characters, so you suddenly have 65,536 characters available, which is enough to cover most languages.
Unfortunately, as with all things that touch the human world, text encoding is a royal PITA. Here's a basic primer:

ASCII: 7-bit encoding for the standard 128 English characters.
Extended ASCII: 8-bit encoding, with the lower 128 values being standard ASCII, and various incompatible definitions of the upper 128 values for international characters, drawing symbols, etc.

The great thing about ASCII is that 1 byte = 1 character. Fairly simple. The bad thing about ASCII is that it turns out there are a lot more than 256 characters once you consider other languages. So, now we have:

Unicode: a standard that assigns a number (a code point) to lots and lots of characters.

Unfortunately, Unicode by itself doesn't pin down a single byte encoding, which leads us to:

UTF-7: Nobody uses it, AFAIK...
UTF-8: variable-length encoding for Unicode, from 1 byte to 4 bytes per character. The bottom 128 values are the same as ASCII, so all of us English-centric programmers can still assume 1 byte = 1 character. Everything above that, for other languages, takes 2 to 4 bytes.
UTF-16: variable-length encoding for Unicode. The bottom 65,536 code points (the Basic Multilingual Plane) are 2 bytes each, the rest are 4 bytes. UTF-16 should have a byte order mark to show whether it's big endian or little endian. UTF-16LE is little endian, and UTF-16BE is big endian.
UCS-2: a fixed 2-byte subset of UTF-16 that only supports the Basic Multilingual Plane.
UTF-32: fixed 4-byte encoding for all of Unicode.

There are some other encoding standards I'm glossing over (like EBCDIC/UTF-EBCDIC, etc.) that aren't widely used.

IIRC, here's how the OS/environment wars played out:

Windows NT: UCS-2
Windows 2000+: UTF-16
.NET Runtime: UTF-16
Java Runtime: UTF-16
*nix: UTF-8 or ASCII
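To make the byte counts in the primer concrete, here is a rough C++ sketch that turns a single code point into UTF-8 bytes and into UTF-16 code units. The function names encode_utf8 and encode_utf16 are made up for this illustration and are not part of any standard library; the code also assumes a platform where casting values above 127 to char keeps the bit pattern, which is true on mainstream compilers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for surrogates (U+D800..U+DFFF) and for values
// above U+10FFFF, which are not valid code points to encode.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: identical to ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate range
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16 code units.
// Code points in the Basic Multilingual Plane take one 16-bit unit (2 bytes);
// everything above takes a surrogate pair (4 bytes).
std::u16string encode_utf16(std::uint32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate range
        out += static_cast<char16_t>(cp);
    } else if (cp <= 0x10FFFF) {
        cp -= 0x10000;  // leaves a 20-bit value, split into two 10-bit halves
        out += static_cast<char16_t>(0xD800 | (cp >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));  // low surrogate
    }
    return out;
}
```

For example, encode_utf8(0x41) gives the single byte 'A', encode_utf8(0x20AC) (the euro sign) gives three bytes, and encode_utf16(0x1F600) gives two code units, which is the "rest are 4 bytes" case from the primer.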
Good job. Does that answer your question, NeedHelp? The most widely used encoding is UTF-8, which you'll see most programs support.
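One practical consequence for C++ code that holds UTF-8 in a std::string: size() reports bytes, not characters. A minimal sketch of counting code points instead is below. The name count_code_points is hypothetical, and the function assumes the input is already well-formed UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 string.
// Every code point starts with exactly one byte that is NOT a continuation
// byte (continuation bytes look like 10xxxxxx), so counting the
// non-continuation bytes counts the code points.
// Assumes well-formed UTF-8; malformed input gives a meaningless count.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For plain ASCII text the result matches size(), which is why English-only code often gets away with treating 1 byte as 1 character. Note that this counts code points, which is still not the same as what a user sees as one character when combining marks are involved.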