Thread: Utilizing the Unicode character set

  1. #1
    DarkLordoftheMonkeys
    Join Date
    Oct 2009
    Location
    Massachusetts
    Posts
    255
    Blog Entries
    56
    Rep Power
    11

    Utilizing the Unicode character set

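    Unicode is a character set that covers Asian, Arabic, Greek, and many other scripts, as well as a large number of special symbols. It is not limited to 16 bits: code points run from U+0000 to U+10FFFF, which takes up to 21 bits, although the most commonly used characters fall in the first 65,536 (the Basic Multilingual Plane). The first 128 code points are identical to ASCII.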
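
    As a rough illustration of how code points relate to the 16-bit units that JavaScript strings are built from, here is a minimal sketch. The variable names are made up for this example, and U+1D11E (a musical clef) is only picked as a sample character outside the first 65,536 code points.

```javascript
// Code points versus 16-bit code units in a JavaScript string.
const turnedA = "\u0250";   // U+0250, fits in a single 16-bit unit
const clef = "\u{1D11E}";   // U+1D11E, does not fit in 16 bits

// .length counts 16-bit code units, not characters.
const turnedALength = turnedA.length;
const clefLength = clef.length; // two units (a surrogate pair)

// codePointAt(0) recovers the full code point in both cases.
const turnedAHex = turnedA.codePointAt(0).toString(16);
const clefHex = clef.codePointAt(0).toString(16);

// The first 128 code points coincide with ASCII, so "A" is still 65.
const asciiA = "A".codePointAt(0);

console.log(turnedALength, clefLength, turnedAHex, clefHex, asciiA);
// prints: 1 2 250 1d11e 65
```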

    Unicode text is stored using an encoding, most commonly UTF-8 and sometimes UTF-16. To tell the browser that an HTML document is encoded as UTF-8 (the file itself must also be saved as UTF-8), use the tag:

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />

    To include Unicode characters in an XML document, name the encoding in the XML declaration, like this:

    <?xml version="1.0" encoding="utf-8" ?>
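
    Both declarations above name UTF-8, which stores each character as one to four bytes. The sketch below shows what that looks like for a few characters, assuming an environment that provides TextEncoder (current browsers and recent Node.js versions do). The helper name utf8Bytes is made up for this example.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: returns the UTF-8 bytes of `text`
// as an array of two-digit hex strings.
function utf8Bytes(text) {
  return Array.from(encoder.encode(text), (b) => b.toString(16).padStart(2, "0"));
}

const asciiBytes = utf8Bytes("a");         // one byte, same as ASCII
const turnedABytes = utf8Bytes("\u0250");  // two bytes
const euroBytes = utf8Bytes("\u20ac");     // three bytes

console.log(asciiBytes.join(" "));   // prints: 61
console.log(turnedABytes.join(" ")); // prints: c9 90
console.log(euroBytes.join(" "));    // prints: e2 82 ac
```

    The single byte for "a" is the same value ASCII uses, which is why plain ASCII files are already valid UTF-8.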

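    Several programming languages, such as JavaScript and C99, let you write a Unicode character with the escape sequence \uXXXX, where XXXX is the four-digit hexadecimal code for the character. For code points above FFFF, C99 uses \UXXXXXXXX (eight digits) and newer JavaScript uses \u{XXXXX}.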

    Example:

    printf("\u0250\n");

    prints an upside-down a (ɐ), assuming the compiler and terminal are set up to output it. Using this, I wrote a JavaScript page that flips the entered text upside down:

    Code:
    <html>
    <head>
    <title>lksag</title>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <script type="text/javascript">
    function fliptext(){
    	var inputText = document.getElementById('textin').value;
    	var textArray = new Array();
    	var flipped = document.getElementById('flip');
    	for( var i = 0; i < inputText.length; i++ ){
    		textArray[i] = inputText.charAt(i);
    	}
    	var reverseText = textArray.reverse();
    	flipped.innerHTML = "";
    	for( var i = 0; i < reverseText.length; i++ ){
    		flipped.innerHTML += upsideDown(reverseText[i]);
    	}
    }
    function upsideDown( ch ){
    	switch (ch){
    		case 'a': return "\u0250";
    		case 'A': return "\u0250";
    		case 'b': return "q";
    		case 'B': return "q";
    		case 'c': return "\u0254";
    		case 'C': return "\u0254";
    		case 'd': return "p";
    		case 'D': return "p";
    		case 'e': return "\u01dd";
    		case 'E': return "\u01dd";
    		case 'f': return "\u025f";
    		case 'F': return "\u025f";
    		case 'g': return "\u0183";
    		case 'G': return "\u0183";
    		case 'h': return "\u0265";
    		case 'H': return "\u0265";
    		case 'i': return "\u0131";
    		case 'I': return "\u0131";
    		case 'j': return "\u027e";
    		case 'J': return "\u027e";
    		case 'k': return "\u029e";
    		case 'K': return "\u029e";
    		case 'l': return "l";
    		case 'L': return "l";
    		case 'm': return "\u026f";
    		case 'M': return "\u026f";
    		case 'n': return "u";
    		case 'N': return "u";
    		case 'o': return "o";
    		case 'O': return "o";
    		case 'p': return "d";
    		case 'P': return "d";
    		case 'q': return "b";
    		case 'Q': return "b";
    		case 'r': return "\u0279";
    		case 'R': return "\u0279";
    		case 's': return "s";
    		case 'S': return "s";
    		case 't': return "\u0287";
    		case 'T': return "\u0287";
    		case 'u': return "n";
    		case 'U': return "n";
    		case 'v': return "\u028c";
    		case 'V': return "\u028c";
    		case 'w': return "\u028d";
    		case 'W': return "\u028d";
    		case 'x': return "x";
    		case 'X': return "x";
    		case 'y': return "\u028e";
    		case 'Y': return "\u028e";
    		case 'z': return "z";
    		case 'Z': return "z";
    		case '0': return "0";
    		case '1': return "\u21c2";
    		case '2': return "\u1105";
    		case '3': return "\u1110";
    		case '4': return "\u3123";
    		case '5': return "S";
    		case '6': return "9";
    		case '7': return "L";
    		case '8': return "8";
    		case '9': return "6";
    		case ' ': return " ";
    		case '\n': return "<br />";
    		case '.': return "\u02d9";
    		case ',': return "\'";
    		case '\'': return ",";
    		case '\"': return ",,";
    		case '!': return "¡";
    		case '?': return "\u00bf";
    		case '@': return "@";
    		case '#': return "#";
    		case '$': return "$";
    		case '%': return "%";
    		case '^': return "v";
    		case '/': return "/";
    		case '\\': return "\\";
    		// The result is appended to innerHTML, so angle brackets are returned as entities.
    		case '<': return "&gt;";
    		case '>': return "&lt;";
    		case '(': return ")";
    		case ')': return "(";
    		case '[': return "]";
    		case ']': return "[";
    		case '{': return "}";
    		case '}': return "{";
    		case ':': return ":";
    		case '*': return "*";
    		case '-': return "-";
    		case '+': return "+";
    		case '=': return "=";
    		case '&': return "+";
    		default: return "";
    	}
    }
    </script>
    <style type="text/css">
    div{
    	font-family: Courier;
    	width: 400px;
    	text-align: right;
    }
    </style>
    </head>
    <body>
    <form>
    <textarea id="textin" cols="35" rows="5" onkeypress="fliptext();" onkeyup="fliptext();">
    </textarea>
    <br />
    <div id="flip"></div>
    </form>
    
    </body>
    </html>
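
    The long switch statement can also be written as a lookup table. This is only a sketch, not a drop-in replacement: FLIP_MAP and flipString are names made up for this example, the table holds just a handful of the mappings from the page above, and unmapped characters are passed through unchanged (the original drops them).

```javascript
// A few of the mappings from the switch statement, as a lookup table.
const FLIP_MAP = new Map([
  ["a", "\u0250"], ["b", "q"], ["c", "\u0254"], ["d", "p"], ["e", "\u01dd"],
  ["h", "\u0265"], ["l", "l"], ["o", "o"], ["n", "u"], ["u", "n"],
  ["!", "\u00a1"], ["?", "\u00bf"], ["(", ")"], [")", "("],
]);

// Hypothetical helper: reverses the text and swaps each character
// for its upside-down counterpart when the table has one.
function flipString(text) {
  // The original page maps upper and lower case to the same glyph,
  // so lower-casing first keeps the table half the size.
  return Array.from(text.toLowerCase())
    .reverse()
    .map((ch) => FLIP_MAP.get(ch) ?? ch)
    .join("");
}

const flipped = flipString("Hello!");
console.log(flipped); // prints: ¡ollǝɥ
```

    Because this returns a plain string, it could be assigned to textContent instead of innerHTML, which would avoid having to escape characters such as < and >.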
    There are a few ways to find the hexadecimal code for a Unicode character. The method I use is to copy the character into Vim, then type ga in normal mode with the cursor over the character. This shows the decimal, hexadecimal, and octal values.
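
    If a JavaScript console is closer to hand than Vim, the same numbers can be read off there. A minimal sketch follows; describeChar is a name made up for this example, and it only looks at the first character of the string it is given.

```javascript
// Hypothetical helper: reports the code point of the first character
// in several bases, plus the escape sequence that would produce it.
function describeChar(ch) {
  const codePoint = ch.codePointAt(0);
  const hex = codePoint.toString(16);
  // \uXXXX only has room for four hex digits; larger values need \u{...}.
  const escape = codePoint <= 0xffff
    ? "\\u" + hex.padStart(4, "0")
    : "\\u{" + hex + "}";
  return { decimal: codePoint, hex: hex, octal: codePoint.toString(8), escape: escape };
}

const info = describeChar("\u0250");
console.log(info.decimal, info.hex, info.octal, info.escape);
// prints: 592 250 1120 \u0250
```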
    Life's too short to be cool. Be a nerd.

  3. #2
    Join Date
    Jul 2006
    Posts
    16,491
    Blog Entries
    75
    Rep Power
    143

    Re: Utilizing the Unicode character set

    I thought some Unicode characters were 24 bits wide.
    Programming is a branch of mathematics.

  4. #3
    whitey6993
    Join Date
    Dec 2008
    Posts
    435
    Rep Power
    15

    Re: Utilizing the Unicode character set

    Good introduction to Unicode. +rep
