
Issues with non-ASCII characters for my application.

2 replies to this topic

#1
Fighter

I have started writing an application for a client where articles are to be submitted. The client caters mostly to readers of other languages (Croatian, and other languages from that area).

I am not sure how PHP handles characters above 127 (non-ASCII) in this application. A quick example of what I am having trouble with:

echo strlen("ĀāĂăĄąĆćĈĉĊċ");

I hope they appear on here. Those are twelve characters, but strlen makes it seem like 24! This is a bit troublesome, as some statistics and ordering rely on the length of the article.

How would I go about fixing this issue? What would I need to know?

#2
Alexander

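PHP has no concept of character encodings in its native strings; they are plain byte sequences, and strlen counts bytes. Assuming your string is UTF-8 encoded, each of those twelve characters takes two bytes, so internally it looks like the following (shown in hex):
c4 80 c4 81 c4 82 c4 83 c4 84 c4 85 c4 86 c4 87 c4 88 c4 89 c4 8a c4 8b
As you can see, strlen will report a length of 24. Originally you would have had to account for this manually, but later versions of PHP include a multibyte string library (mbstring, the mb_ functions). It is not enabled in every PHP build, so check that the extension is available on your server.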

mb_internal_encoding("UTF-8"); //system dependent, so be explicit
echo mb_strlen("ĀāĂăĄąĆćĈĉĊċ"); //use the mb_ function for strlen
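As it uses a library (much as PCRE is used) and has to walk through the encoded bytes rather than just read the stored byte length, it will naturally be slower than the standard functions. That is only likely to matter if you call it in bulk, for mass automation of some sort.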
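To see the difference between bytes and characters side by side, here is a minimal sketch. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration. The encoding is passed explicitly to each mb_ call, which is an alternative to setting mb_internal_encoding once.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$text = "ĀāĂăĄąĆćĈĉĊċ";

// Assumption: mbstring may be missing on some builds, so check first.
if (!function_exists('mb_strlen')) {
    exit("mbstring is not available in this PHP build\n");
}

$bytes = strlen($text);             // counts bytes
$chars = mb_strlen($text, 'UTF-8'); // counts characters

// Hex dump of the first character only, to show its two bytes.
$firstCharHex = bin2hex(mb_substr($text, 0, 1, 'UTF-8'));

echo $bytes, "\n";        // 24
echo $chars, "\n";        // 12
echo $firstCharHex, "\n"; // c480
```

If the file were saved in a different encoding, the byte count would differ, so the numbers above hold only under the UTF-8 assumption.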

Some useful documentation:
PHP: Multibyte String - Manual
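
For the ordering by length mentioned in the first post, a rough sketch might look like the following. The sample strings are made up for illustration, and the file is again assumed to be UTF-8. It sorts the same list twice, once by character count and once by byte count, to show that the two orders can differ.

```php
<?php
// Hypothetical sample data standing in for article text (UTF-8 assumed).
$articles = ["Zagreb", "Čačić"];

// Order by number of characters (what a reader would expect).
$byChars = $articles;
usort($byChars, function ($a, $b) {
    return mb_strlen($a, 'UTF-8') - mb_strlen($b, 'UTF-8');
});

// Order by number of bytes (what plain strlen gives you).
$byBytes = $articles;
usort($byBytes, function ($a, $b) {
    return strlen($a) - strlen($b);
});

print_r($byChars); // Čačić (5 characters) comes before Zagreb (6 characters)
print_r($byBytes); // Zagreb (6 bytes) comes before Čačić (8 bytes)
```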

#3
Fighter

It returns 12, which is great! A great answer, thank you for taking the time to explain this to me in depth.



