Lithuanian language encoding

14 replies to this topic

#1
thatsme

    Programmer

  • Members
  • 176 posts
Hi. I want PHP to read text from an XML file. The text contains Lithuanian letters, but no matter what encoding I set, PHP never reads them. I tried several encodings in the XML file that support Lithuanian letters: utf-8, utf-16, windows-1257, and a few others. I have also set the encoding in the PHP file with the header function, but it does not help. Does anyone know what I am doing wrong?

#2
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,722 posts
  • Programming Language: C, Java, C++, PHP, Python, Perl, Assembly, Bash, Others
  • Learning: JavaScript
What do you mean by "it doesn't read" them? Where are you outputting them to?
sudo rm -rf /

#3
Alexander

    It's Science!

  • Moderators
  • 4,124 posts
  • Location: Vancouver, Eh! Cleverness: 200
You would also need to tell us where the problem lies. PHP strings are plain byte sequences with no built-in notion of encoding, so PHP itself will not read anything wrong. However, the XML could be parsed wrongly, or the browser could display it wrongly.

Edited by Alexander, 22 July 2011 - 03:26 AM.

Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#4
dargueta

Also, make sure you have this as the first line of your XML file:

<?xml version="1.0" encoding="utf-8"?>
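
A minimal sketch of reading such a file with the DOM extension, assuming a made-up document with <word> elements (the function name firstWordFromXml and the element name are hypothetical, not from this thread). Note that the XML declaration names its encoding with the encoding pseudo-attribute. The sample letters are written as byte escapes so the result does not depend on how the PHP file itself is saved.

```php
<?php
// Hypothetical helper: return the text of the first <word> element.
// The DOM extension hands node text back as UTF-8.
function firstWordFromXml(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse XML');
    }
    $nodes = $doc->getElementsByTagName('word');
    if ($nodes->length === 0) {
        return '';
    }
    return $nodes->item(0)->textContent;
}

// Sample input: "\xC4\x85\xC5\xBE" is the UTF-8 byte sequence for "ąž".
$sample = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        . "<words><word>\xC4\x85\xC5\xBEuolas</word></words>";

// On a real page you would send the matching header before any output:
//   header('Content-Type: text/html; charset=UTF-8');
//   echo htmlspecialchars(firstWordFromXml($sample), ENT_QUOTES, 'UTF-8');
```

If the bytes that come back are already missing the accented letters, the problem is in the file or the parsing step rather than in the browser.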

#5
thatsme


dargueta said:

What do you mean by "it doesn't read" them? Where are you outputting them to?
I mean that instead of ę the browser displays e, instead of ą it displays a, and so on. The PHP program reads text from the XML and outputs it to HTML. The encoding in the HTML is set correctly.

#6
Alexander


thatsme said:

I mean that instead of ę the browser displays e, instead of ą it displays a, and so on. The PHP program reads text from the XML and outputs it to HTML. The encoding in the HTML is set correctly.

Can you tell us how you are reading this XML file? We are shooting in the dark; you could be using software that explicitly converts your text to a Western encoding or does not respect the declared encoding.

#7
thatsme

I use WAMP 2.1 and the PHP DOM parser. I use this line to create the DOMDocument: $xmlDoc = new DOMDocument('1.0', 'windows-1257'); and then do all the other work of collecting the node values.
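
As far as I know, the encoding argument of the DOMDocument constructor only describes a document you build and save yourself. Text read back from DOM nodes comes out as UTF-8 (the PHP manual states that the DOM extension uses UTF-8). So if the page really is served as windows-1257, the node values would need converting before output. A minimal sketch, assuming the iconv extension is available and recognises the name WINDOWS-1257; the helper name domTextToWindows1257 is made up for illustration.

```php
<?php
// Hypothetical helper: convert UTF-8 text taken from a DOM node into
// windows-1257 bytes for a page that is served in that encoding.
function domTextToWindows1257(string $utf8Text): string
{
    // @ silences the notice iconv raises on unconvertible input;
    // the false return value is checked instead.
    $converted = @iconv('UTF-8', 'WINDOWS-1257', $utf8Text);
    if ($converted === false) {
        throw new RuntimeException('Text cannot be represented in windows-1257');
    }
    return $converted;
}

// "\xC4\x99" is "ę" in UTF-8; in windows-1257 it is the single byte 0xE6.
$legacyBytes = domTextToWindows1257("\xC4\x99");
```

The simpler route is usually the opposite one: keep the XML, the PHP source files, and the page all in UTF-8, so no conversion is needed at all.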

#8
dargueta

Instead of windows-1257, use UTF-8.

#9
thatsme

That didn't help :( Now it's even worse: before switching the encoding to utf-8, strings in the PHP code containing Lithuanian letters were displayed correctly, but now both the PHP strings and the strings read from the XML have lost their Lithuanian letters.

Edited by thatsme, 25 July 2011 - 07:55 AM.


#10
dargueta

1) Did you re-encode the file?
2) Do you have the correct encoding specified in the <?xml ?> declaration?
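
Both points matter because changing the declared encoding does not change the bytes already stored in the file. A rough sketch of re-encoding, assuming the old content was windows-1257 and that iconv is available; the function name ensureUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8. Input that is already valid
// UTF-8 is returned unchanged; anything else is assumed to be in $legacy.
function ensureUtf8(string $bytes, string $legacy = 'WINDOWS-1257'): string
{
    // With the /u modifier preg_match rejects subjects that are not valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    $converted = @iconv($legacy, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $legacy to UTF-8");
    }
    return $converted;
}

// 0xE6 is "ę" in windows-1257; as UTF-8 it becomes the two bytes C4 99.
$fixed = ensureUtf8("\xE6");
```

After converting the bytes, the encoding named in the <?xml ?> declaration has to be updated to match, and the same applies to any PHP source file that contains Lithuanian string literals.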

#11
thatsme

Yes, the XML is set to utf-8: <?xml version="1.0" encoding="utf-8"?>. The HTML is set to utf-8 also: <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>. And here's how I set it in PHP: header('Content-Type: text/html; charset=UTF-8'); It looks like the XML file fails to save the Lithuanian characters.
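
One way to check whether the letters were actually saved is to look at the raw bytes instead of the rendered page. A small sketch; the function name hexDumpText is made up for illustration.

```php
<?php
// Hypothetical helper: show the bytes of a string as space-separated hex pairs.
function hexDumpText(string $text): string
{
    return implode(' ', str_split(bin2hex($text), 2));
}

// "ę" stored as UTF-8 shows up as two bytes; a plain "e" is a single byte.
$utf8Letter  = hexDumpText("\xC4\x99"); // "c4 99"
$plainLetter = hexDumpText("e");        // "65"
```

Running this on a node value read from the XML (or on file_get_contents of the file itself) shows whether ę arrives as c4 99 (UTF-8), e6 (windows-1257), or 65 (the letter was already replaced by e before PHP saw it).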

#12
dargueta

I'd say your text editor fails to save them properly. What text editor are you using? Some Microsoft text editors put a useless byte-order mark at the beginning of all UTF-8 files so they can easily detect the encoding. A few applications choke on this and fail to render text properly.

See, for example, this famous Notepad bug.
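
If a byte-order mark is suspected, it can be detected and removed before the text goes any further. A minimal sketch; the function name stripUtf8Bom is hypothetical.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte-order mark (EF BB BF), if any.
function stripUtf8Bom(string $bytes): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($bytes, $bom, 3) === 0) {
        return (string) substr($bytes, 3);
    }
    return $bytes;
}

$clean = stripUtf8Bom("\xEF\xBB\xBF<?xml version=\"1.0\" encoding=\"utf-8\"?>");
```

A BOM at the top of a PHP source file is a separate problem: it is sent as output before the script runs, which can make later header() calls fail, so it is worth saving PHP files as UTF-8 without a BOM.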