#1
07-25-2006, 10:14 AM
NeedHelp
Explain Unicode

Can someone explain what UNICODE is and why I should optimize my code for it? I have read in several places that your C++ code should be optimized for it, but I'm not even sure what it is or how to optimize for it.
__________________
I Need Help
#2
07-25-2006, 12:42 PM
TkTech

Unicode is a standard for encoding text. It's most needed for languages that aren't written in the Roman alphabet, such as Japanese (kanji) or Mandarin Chinese.
__________________
CodeCall Blog | CodeCall Wiki | Shareware | Linux Forum
Chat with other CodeCall members on IRC; connect to irc.codecall.net and join #codecall
#3
07-25-2006, 12:52 PM
Sionofdarkness

I really have no understanding of anything you just said. What are Kanji and Mandarin used for? I've never heard of those languages before; I'm guessing not many people know them.
#4
07-25-2006, 12:59 PM
TkTech

Quote:
I'm guessing not many people know them.
Eh.....eh.........he.....
Only well over a billion people: Mandarin is China's official language (Cantonese is widely spoken too), and kanji, hiragana, and katakana are the scripts used to write Japanese. They're written as symbols instead of the Roman alphabet.
#5
07-25-2006, 06:35 PM
WingedPanther

ASCII was the code used to transmit information when C++ was first written. You could rely on 7 bits, which meant 128 characters. 8 bits gave you 256 characters, but the upper 128 were never standardized, so you couldn't rely on them. Unicode originally used 2 bytes, or 16 bits, to store characters, so you suddenly have 65,536 characters available, which is enough to cover most languages in everyday use. It has since grown past that limit (code points now run up to U+10FFFF), so 16 bits no longer cover everything.
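
The character counts above are just powers of two. A tiny C++ sketch of that arithmetic (the function name `values_for_bits` is made up for this illustration):

```cpp
#include <cstdint>

// Number of distinct values that fit in `bits` bits (valid for bits < 64).
constexpr std::uint64_t values_for_bits(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Checked at compile time: 7-bit ASCII, 8-bit "extended" sets, 16-bit units.
static_assert(values_for_bits(7) == 128, "7 bits give 128 values");
static_assert(values_for_bits(8) == 256, "8 bits give 256 values");
static_assert(values_for_bits(16) == 65536, "16 bits give 65,536 values");
```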
__________________
CodeCall Blog | CodeCall Wiki | Shareware | Linux Forum
Programming is a branch of mathematics.
#6
07-25-2006, 08:01 PM
Lop

What would make people in America want to use Unicode then? Isn't ASCII good enough if your target area is America?
__________________
Lop
#7
07-26-2006, 02:09 AM
brackett

Unfortunately, as with all things that touch the human world, text encoding is a royal PITA. Here's a basic primer:

ASCII: 7 bit encoding for standard 128 English characters.

Extended ASCII: 8-bit encoding, with the lower 128 being standard ASCII, and various incompatible code pages using the upper 128 for international characters, drawing symbols, etc.

The great thing about ASCII is that 1 byte = 1 character. Fairly simple. The bad thing about ASCII is that it turns out there are a lot more than 256 characters once you consider other languages. So, now we have:

Unicode: standard for encoding lots and lots of characters

Unfortunately, Unicode by itself only assigns a number (a code point) to each character; it takes an encoding to turn those numbers into bytes....which leads us to:

UTF-7: Nobody uses it, AFAIK...

UTF-8: variable-length encoding for Unicode, using 1 to 4 bytes per character. The bottom 128 are the same as ASCII, so all of us English-centric programmers can still assume 1 byte = 1 character. Everything above that, for other languages, takes 2 to 4 bytes.

UTF-16: variable-length encoding for Unicode. The bottom 65,536 (the basic level, officially the Basic Multilingual Plane) are 2 bytes; the rest are 4 bytes, stored as a surrogate pair. UTF-16 should have a byte order mark to show whether it's big endian or little endian. UTF-16LE is little endian, and UTF-16BE is big endian.

UCS-2: A fixed 2-byte subset of UTF-16 that only supports the basic level (the Basic Multilingual Plane).

UTF-32: 4 byte encoding for all of Unicode.
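
To make the byte counts in the list above concrete, here is a rough C++ sketch of how a single code point becomes UTF-8 bytes or UTF-16 code units, plus a check for the UTF-16 byte order mark. The names `encode_utf8`, `encode_utf16`, `detect_utf16_bom`, and `ByteOrder` are made up for this illustration, and it only handles one code point at a time; real projects would normally lean on a library such as ICU instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// True for values that are not encodable: above U+10FFFF or a surrogate.
static bool is_invalid_code_point(std::uint32_t cp) {
    return cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF);
}

// Encodes one code point as 1 to 4 UTF-8 bytes (empty string if invalid).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (is_invalid_code_point(cp)) return out;
    if (cp < 0x80) {                       // ASCII range: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encodes one code point as 1 or 2 UTF-16 code units (empty if invalid).
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    std::vector<std::uint16_t> units;
    if (is_invalid_code_point(cp)) return units;
    if (cp < 0x10000) {                    // basic level: one 16-bit unit
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {                               // everything else: surrogate pair
        std::uint32_t v = cp - 0x10000;    // 20 bits, split 10 + 10
        units.push_back(static_cast<std::uint16_t>(0xD800 | (v >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 | (v & 0x3FF)));
    }
    return units;
}

enum class ByteOrder { Unknown, Little, Big };

// Inspects the first two bytes for the byte order mark U+FEFF.
ByteOrder detect_utf16_bom(const unsigned char* data, std::size_t size) {
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return ByteOrder::Little;
        if (data[0] == 0xFE && data[1] == 0xFF) return ByteOrder::Big;
    }
    return ByteOrder::Unknown;
}
```

For example, `encode_utf8(0x41)` gives the single byte for "A", `encode_utf8(0x20AC)` (the euro sign) gives three bytes, and `encode_utf16(0x1F600)` gives two code units because that code point sits outside the basic level.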

There are some other encoding standards I'm glossing over (like EBCDIC/UTF-EBCDIC, etc.) that aren't widely used.

IIRC, here's how the OS/Environment wars played out:

Windows NT: UCS-2
Windows 2000+: UTF-16
.NET Runtime: UTF-16
Java Runtime: UTF-16
*nix: UTF-8 or ASCII
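
For C++ specifically, one place this split shows up is `wchar_t`: it is 16 bits with Microsoft's compiler on Windows and 32 bits with GCC and Clang on most Unix-like systems. A minimal sketch that reports what the current compiler uses (the helper name `wide_char_bits` is made up for this illustration):

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits for the compiler and platform building this code.
constexpr std::size_t wide_char_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

Code that needs a fixed width regardless of platform can use `char16_t` or `char32_t` (C++11 and later) instead of `wchar_t`.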
#8
07-26-2006, 09:14 AM
TkTech

Good job. Does that answer your question, NeedHelp? The most widely used encoding is UTF-8, which you'll see most programs support.