I was going to generate an ASCII table with the chr() function and was really wondering what it was doing:
Someone said in the php.net comments that chr() wraps around at 255, so I assume it can only produce one byte of data, but when I test it, it only seems to display up to 127 (signed, I guess). That's not an issue, but it'd be neat to display extended ASCII and maybe Unicode, which would be a great feature to add!
Do you know how I can implement Unicode with chr()?
PHP: chr() function
Started by FireGator, Oct 08 2010 08:46 PM
2 replies to this topic
#1
Posted 08 October 2010 - 08:46 PM
>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.[-]
>++++++++[<++++>-] <.>+++++++++++
>++++++++[<++++>-] <.>+++++++++++
#2
Posted 08 October 2010 - 09:33 PM
Unicode can't really be implemented in a function like this for technical reasons: it would break legacy code in trunk/, and changes to our ZE framework can cause problems elsewhere, so you may wish to wait for PHP 6 for this. To answer your question about the byte wrapping, take the generic implementation that is used (taken from PHP itself, actually):
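long c;                       /* the integer argument passed to chr() */
char temp[2];                 /* one byte for the result, one for the terminator */
temp[0] = (char)c;            /* the cast keeps only the low 8 bits */
temp[1] = '\0';
RETURN_STRINGL(temp, 1, 1);   /* return a 1-byte string, copying temp */

As you can see, it uses nothing but a single char plus one element reserved for the null terminator, so the value is truncated to one byte and wraps around naturally. Bytes 128-255 do come back from chr(), but a lone byte in that range is not a valid character on a UTF-8 page, so it will not display as one with this function alone.

In your last question you asked whether there was a way to interface the two. PHP provides no direct method for working with byte encodings, and you cannot view them directly, but you can certainly write a helper function that packs bytes 2-4 into an (un)signed integer from a set of bits.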
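If you want to see the byte wrapping for yourself, here is a minimal sketch. The sample values are my own choice, and bin2hex() is only there so you can inspect the raw byte even when the browser refuses to render it:

```php
<?php
// chr() keeps only the low 8 bits of its argument, so values wrap modulo 256.
$samples = [65, 127, 128, 200, 255, 256 + 65];

foreach ($samples as $n) {
    $byte = chr($n);
    // bin2hex() shows the raw byte; ord() reads it back as an integer 0-255.
    printf("%3d -> 0x%s (ord: %d)\n", $n, bin2hex($byte), ord($byte));
}
```

The last sample (321) comes out as 0x41, the same byte as 65, which is the wrap-around mentioned in the php.net comments.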
Consider a Unicode chr() function written with mb_convert_encoding(), which relies on the mbstring library:
//See php.net/function.pack: we pack $szChr (the code point) into a four-byte big-endian unsigned long to fit the UCS-4/UTF-32 specification
$ulong = pack("N", $szChr);
//See php.net/function.mb_: assume the default internal encoding is ISO-8859-1; change it to UTF-8 first so the result is encoded as UTF-8 for storing
//The call returns an ordinary PHP string
$str = mb_convert_encoding($ulong, mb_internal_encoding(), 'UCS-4BE');

Speed is of course an issue if you are parsing through files (for example transliterating between encodings, or storing the results statically, as in your Unicode/ASCII table), so I will also list a lower-level version, similar to the C implementations of Unicode conversion that were used previously. Make sure you set your web page to display Unicode before you test as well.

function u_chr ( $chr = null , $ret = null ) {
    if($chr === null)
        return;
    if($chr < 0x80) {
        //fits in 1 byte (plain ASCII)
        $ret = chr($chr);
    } else if ($chr < 0x800) {
        //2 bytes: lead byte 110xxxxx, then one continuation byte 10xxxxxx
        $ret = chr(0xC0 + (($chr - ($chr % 0x40)) >> 6));
        $ret .= chr(0x80 + ($chr % 0x40));
    } else if ($chr < 0x10000) {
        //3 bytes for CJK, wide characters, etc.: lead byte 1110xxxx
        $ret = chr(0xE0 + (($chr - ($chr % 0x1000)) >> 12));
        $ret .= chr(0x80 + ((($chr % 0x1000) - ($chr % 0x40)) >> 6));
        $ret .= chr(0x80 + ($chr % 0x40));
    } else {
        //4 bytes for code points above U+FFFF: lead byte 11110xxx
        $ret = chr(0xF0 + ($chr >> 18));
        $ret .= chr(0x80 + (($chr >> 12) % 0x40));
        $ret .= chr(0x80 + (($chr >> 6) % 0x40));
        $ret .= chr(0x80 + ($chr % 0x40));
    }
    return $ret;
}

The function is fairly straightforward, although I wrote it from memory, so you may wish to test it against known results (it shouldn't be too intensive).
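One way to check a function like this is to encode a few code points whose UTF-8 bytes are well known, then decode them back. The sketch below is self-contained: utf8_encode_sketch() and utf8_decode_sketch() are hypothetical names of my own, not PHP built-ins, and the decoder assumes well-formed input (it does no validation). The decoder is also a rough take on the helper mentioned earlier for turning bytes 2-4 back into an integer.

```php
<?php
// Hypothetical helper: encode a code point as UTF-8 using chr() and bit shifts.
function utf8_encode_sketch($cp) {
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decode one well-formed UTF-8 sequence to its code point.
function utf8_decode_sketch($bytes) {
    $lead = ord($bytes[0]);
    if ($lead < 0x80) {
        return $lead;
    }
    // The high bits of the lead byte give the sequence length.
    if ($lead >= 0xF0) {
        $len = 4; $cp = $lead & 0x07;
    } elseif ($lead >= 0xE0) {
        $len = 3; $cp = $lead & 0x0F;
    } else {
        $len = 2; $cp = $lead & 0x1F;
    }
    for ($i = 1; $i < $len; $i++) {
        // Each continuation byte contributes its low 6 bits.
        $cp = ($cp << 6) | (ord($bytes[$i]) & 0x3F);
    }
    return $cp;
}

// Known code point => expected UTF-8 bytes in hex.
$known = [0x41 => '41', 0xE9 => 'c3a9', 0x20AC => 'e282ac', 0x1F600 => 'f09f9880'];
foreach ($known as $cp => $hex) {
    $bytes = utf8_encode_sketch($cp);
    printf("U+%04X -> %s -> U+%04X %s\n", $cp, bin2hex($bytes),
        utf8_decode_sketch($bytes), bin2hex($bytes) === $hex ? 'ok' : 'MISMATCH');
}
```

If the mbstring extension is available, comparing the same code points against the mb_convert_encoding() approach would be another reasonable cross-check.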
Edited by Alexander, 08 October 2010 - 10:12 PM.
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.
#3
Posted 10 October 2010 - 04:24 AM
Thank you again so much, Nullw0rm! You really are so perfect at answering what I want to know and more.
I was talking with a friend (a programmer buddy, I can say) and he said the multibyte library functions were perfect for his job, but I am definitely going to tinker with the second one and see how it really works. This stuff is always so interesting to me.
From what I can see in what I am using it for, the code works PERFECT and it lets me see the binary with decbin and shifting without too much trouble.
You helped both of us at the same time. Thank youu ! :) I'll +rep definitely











