I'm writing a program that analyzes a given text file, counting occurrences of each unique word. I've got it working except for some nagging weird characters in a text file I downloaded to test my program with. I've got the following function that determines if an "end-of-word" character has been found:
Code:
/* Checks character and determines if it is a word-ending character */
int isWordEnding (char inChar) {
    char endingChars[] = ",.!?()\":; \t[]{}_/\\\n*><#";
    int x, y;
    y = 0;
    for (x = 0; x < 23; x++) {
        if (inChar == endingChars[x]) {
            y = 1;
            break;
        }
    }
    return y;
}
It works very well, but for the following text I somehow still get new line characters, but only at the end of the second and third lines.
PRINCESS OF FRANCE. We arrest your word.
Boyet, you can produce acquittances
For such a sum from special officers
Of Charles his father.
KING. Satisfy me so.
I've retyped the same characters by hand, and the program parses that version without any glitch. I've also pasted the above text into MS Word, and it just shows a new line and 4 spaces. Are there some other weird characters I don't know about?
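One way to find out would be to look at the raw byte values instead of how an editor renders them. The sketch below is only a diagnostic aid, not part of the original program: `reportOddBytes` is a hypothetical helper name, and it assumes the text has already been read into a buffer. It prints the offset and hex value of every byte that is neither printable ASCII nor `'\n'`. A plausible suspect is the carriage return `'\r'` (0x0D), which Windows-style line endings place before `'\n'`, but that is a guess until the bytes are inspected.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical diagnostic helper (not in the original program).
   Scans len bytes of buf and reports every byte that is neither
   printable ASCII nor '\n'. Tabs and carriage returns are reported
   too, since isprint() is false for them.
   Returns the number of bytes reported. */
size_t reportOddBytes(const char *buf, size_t len, FILE *out) {
    size_t i, odd = 0;
    for (i = 0; i < len; i++) {
        /* Cast to unsigned char: passing a negative char to isprint()
           is undefined behavior. */
        unsigned char c = (unsigned char)buf[i];
        if (c != '\n' && !isprint(c)) {
            fprintf(out, "offset %lu: 0x%02X\n", (unsigned long)i, (unsigned)c);
            odd++;
        }
    }
    return odd;
}
```

Calling `reportOddBytes(line, strlen(line), stdout)` on each line read from the file (with `<string.h>` included) would list any hidden characters. If 0x0D shows up, adding `\r` to `endingChars` and updating the loop bound to match would be one possible fix.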