File reading help

#1
lintwurm

    Learning Programmer

  • Members
  • 77 posts
So here is a weird little question,

I am writing a text editor for a class I'm taking. For predictive text, I am using a trie that is filled with words read from a file containing pretty much every English word.

My question is: has anyone ever run into trouble reading in apostrophes? When I pass a word containing one to my function, the program just crashes. When I take out the words with apostrophes, it runs 100% fine.

Just wanted to know if this is a problem with the Java language, or with my function (which is most likely the case).

If someone wants to peek at the code, please let me know.

Thanks for your time ^_^

#2
ZipOnTrousers

    Learning Programmer

  • Validating
  • 94 posts
Could it be a problem with how the file is encoded?

#3
lintwurm

Don't think so.
The file is just a bunch of words and some of the words have apostrophes.

Looks like this:
aaa
aaas
aardvark
aardvarks
aardwolf
aardwolves
aaron
aaronic
aba
ababise
abac
abaca
abaci
aback
abacterial
abacus
abacuses
abaft
abalienate
abalienated
abalienates
I'm

As you can see, there's not much to it. When I read in the last word, the program crashes for no apparent reason.

Not really sure why.
Would it be better if I show the code as well?
And thanks for the response ^_^

#4
ZipOnTrousers

Yeah, post the code.

Is it only apostrophes? How do other non-letter characters work?

Also I'm just thinking, do apostrophes need to be escaped like quote marks in a Java string? I don't think so...
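
On the escaping question, a minimal sketch: inside a double-quoted Java string an apostrophe needs no escape; the backslash is only required in a `char` literal. The class and method names here (`ApostropheDemo`, `countApostrophes`) are made up for illustration and are not from lintwurm's program.

```java
public class ApostropheDemo {
    // Counts the apostrophes in a word (hypothetical helper for this demo).
    static int countApostrophes(String word) {
        int count = 0;
        for (char c : word.toCharArray()) {
            if (c == '\'') { // the escape is needed only in a char literal
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String word = "I'm"; // no escape needed inside a double-quoted string
        System.out.println(word.length());          // prints 3
        System.out.println(word.indexOf('\''));     // prints 1
        System.out.println(countApostrophes(word)); // prints 1
    }
}
```

So a word like I'm read from a file is an ordinary three-character string as far as Java is concerned.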

#5
lintwurm

Apparently it is all non-letter characters >_<
<rant>
This is really starting to annoy me. You would think that with all of Java's libraries, they would make their strings work with non-letter characters as well.
</rant>
just wanted to get that off my chest...
^_^
Btw, I never even thought of testing other non-letter characters. Thanks for the advice.

#6
ZipOnTrousers

OK, try this. Open the file in Firefox (yes, Firefox). Go to View > Character Encoding. Which one is ticked? This definitely sounds like a problem with the file encoding.

Also, did you copy and paste the file's contents from something like Word? I'm guessing not, but it never hurts to check...
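
A quick way to see when encoding could matter at all, as a rough sketch: the plain ASCII apostrophe (byte 0x27) is encoded identically in ISO-8859-1 and UTF-8, whereas the curly apostrophe U+2019 that word processors often substitute is not. The class and method names (`EncodingCheck`, `sameBytesInBothEncodings`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingCheck {
    // True if the string encodes to the same bytes in ISO-8859-1 and UTF-8.
    static boolean sameBytesInBothEncodings(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // Plain ASCII apostrophe: identical in both encodings.
        System.out.println(sameBytesInBothEncodings("I'm"));      // prints true
        // Curly apostrophe (U+2019): three bytes in UTF-8, unmappable in ISO-8859-1.
        System.out.println(sameBytesInBothEncodings("I\u2019m")); // prints false
    }
}
```

If the file only contains the plain apostrophe, the choice between these two encodings should not change what the program reads.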

#7
lintwurm

Hey again.

This is what I tried...
String bla = dis.readLine();
byte bytes[] = bla.getBytes("ISO-8859-1");
String s = new String(bytes, "UTF-8");
trie.addWord(s);

But it still crashes when it gets a non-letter character.
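
The thread never shows the trie code, so the actual cause stays unconfirmed. One common way a trie breaks on non-letter characters, offered here purely as a guess, is a fixed 26-slot child array indexed with `c - 'a'`: an apostrophe (code 39) gives index -58 and throws an ArrayIndexOutOfBoundsException, and so would the capital I in I'm. The sketch below sidesteps that by keeping children in a map keyed by character. `MapTrie`, `Node`, `contains` and `load` are hypothetical names; only `addWord` echoes the call in the post above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class MapTrie {
    // One trie node; children are keyed by character, so any char is a valid edge.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Inserts a word, creating nodes along the path as needed.
    public void addWord(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        node.isWord = true;
    }

    // True only if the exact word was added (a bare prefix does not count).
    public boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    // Reads one word per line, skipping blank lines.
    public static MapTrie load(BufferedReader reader) throws IOException {
        MapTrie trie = new MapTrie();
        String line;
        while ((line = reader.readLine()) != null) {
            String word = line.trim();
            if (!word.isEmpty()) {
                trie.addWord(word);
            }
        }
        return trie;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stand-in for the word file.
        BufferedReader reader =
                new BufferedReader(new StringReader("aardvark\n\nabacus\nI'm\n"));
        MapTrie trie = load(reader);
        System.out.println(trie.contains("I'm"));  // prints true
        System.out.println(trie.contains("aard")); // prints false
    }
}
```

For a real file, `Files.newBufferedReader(Paths.get("words.txt"), StandardCharsets.UTF_8)` (with `words.txt` standing in for the actual file name) would supply the reader with an explicit charset. If the program still crashes after a change like this, the exception's stack trace should point at the line responsible.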