C, text file and its encoding


This topic has been archived. This means that you cannot reply to this topic.
11 replies to this topic

#1
denarced

    Programmer

  • Members
  • 182 posts
Can I even read a file without knowing how it is encoded?
As I understand it, a text file is just filled with numbers, and one has to know how it is encoded to read it properly.
How does one find out how it is encoded? And how can that be done in C?

Thanks in advance.

#2
MeTh0Dz

    Writes binary right handed and hex left handed

  • Members
  • 2,119 posts
Lol no... If you try to read a text file, you are just going to read whatever bytes are stored in that file. So if it is plain text, you'll read the plain text, and if it is encrypted, you'll read the encrypted text.

To read and write files, look at CreateFile(), ReadFile(), and WriteFile().

#3
denarced

    Programmer

  • Members
  • 182 posts
Well, the actual problem is that when reading the text, I'm trying to find certain strings in it, and those strings, like the whole text, include certain letters such as ä and ö. This is where I ran into a problem: I don't know how to search for strings with these letters. If I write
char line[] = "näkyvä";
and try to search for that, it won't find it.
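
A likely cause, sketched here under assumptions: the bytes the compiler stores for "näkyvä" depend on how the source file is saved, and they may not match the bytes in the text file. The sketch below spells the word out byte by byte in two common encodings (ISO-8859-1 and UTF-8) and tries both. The names find_nakyva, NAKYVA_LATIN1 and NAKYVA_UTF8 are made up for illustration, and the text is assumed to be NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* "näkyvä" written as explicit bytes, so the result does not depend on
   the encoding this source file happens to be saved in. */
static const char NAKYVA_LATIN1[] = "n\xE4kyv\xE4";          /* ISO-8859-1: ä is E4    */
static const char NAKYVA_UTF8[]   = "n\xC3\xA4kyv\xC3\xA4";  /* UTF-8:      ä is C3 A4 */

/* Returns the byte offset of the word in text under either encoding,
   or -1 if neither byte sequence occurs. text must be NUL-terminated. */
long find_nakyva(const char *text)
{
    const char *hit = strstr(text, NAKYVA_UTF8);
    if (hit == NULL)
        hit = strstr(text, NAKYVA_LATIN1);
    return hit != NULL ? (long)(hit - text) : -1L;
}
```

If the file turns out to use some third encoding, neither needle will match, so it is worth looking at the actual bytes in the file first.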

#4
MeTh0Dz

    Writes binary right handed and hex left handed

  • Members
  • 2,119 posts
I don't have the time right now to look at the ASCII chart to see if they are all valid characters. But if they are on the ASCII chart, then they are valid and there is no reason you wouldn't be able to find them in a given text.

If that is the case, there is probably just an error in your code.

#5
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
Generally, you will read a text file in text mode. The environment will handle the basics of the extended ASCII encoding. If you are working with something that is NOT plain-text, you have an issue :)
Programming is a branch of mathematics.

#6
Aereshaa

    Programming God

  • Members
  • 790 posts

MeTh0Dz|Reb0rn said:

Lol no... If you try to read a text file, you are just going to read whatever bytes are stored in that file. So if it is plain text, you'll read the plain text, and if it is encrypted, you'll read the encrypted text.

To read and write files, look at CreateFile(), ReadFile(), and WriteFile().

Uh, there are no functions by those names in C. It's fopen() to open files (whether they already exist or not), then fprintf() or fwrite() to write to them, and fscanf() or fread() to read. I don't know where you got that idea. Probably from C# or something. Remember to fclose() files when done!
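
For reference, a minimal sketch of the standard-library route: open with fopen() in binary mode ("rb") so the bytes arrive untranslated, pull them in with fread(), and fclose() on every path. The helper name read_all is hypothetical, and the growth strategy (doubling a heap buffer) is just one reasonable choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'd buffer and stores its size in *len_out.
   Returns NULL on any failure; the caller frees the buffer on success. */
unsigned char *read_all(const char *path, size_t *len_out)
{
    size_t cap = 4096, len = 0, n;
    unsigned char *buf, *tmp;
    FILE *fp = fopen(path, "rb");   /* "rb": no newline or code page translation */

    if (fp == NULL)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {           /* buffer full: double it before reading on */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {               /* fread stopped because of an error, not EOF */
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *len_out = len;
    return buf;
}
```

The buffer is not NUL-terminated, so treat it as raw bytes with an explicit length rather than passing it straight to the str* functions.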

#7
MeTh0Dz

    Writes binary right handed and hex left handed

  • Members
  • 2,119 posts
What? Yeah, there are. Do you want me to show you the documentation?

There is more than one way to read from and write to a file.

Here is the page for ReadFile; I didn't feel like linking all of them.

ReadFile Function (Windows)

#8
Aereshaa

    Programming God

  • Members
  • 790 posts
Ah, but that's nonstandard, locks your code into Windows, and, given that it's Microsoft we're talking about, probably slower. And given that I use Linux, my statement was perfectly true on my machine. I prefer to use the standard input/output functions, which work on all operating systems.

#9
MeTh0Dz

    Writes binary right handed and hex left handed

  • Members
  • 2,119 posts
Well, that's the difference: I code pretty much strictly for Windows, so I prefer to just use WinAPIs.

#10
telboon

    Newbie

  • Members
  • 26 posts
Short Answer:
Get a hex editor, find out the byte values of those non-standard characters, and put them into the search string as integer values.

Long Answer:
The file could even be in a different encoding, such that reading it as 8-bit characters makes no sense. It could be 7-bit, 9-bit, or some other crap just to purposely screw you up. So find the encoding; you will probably need to read the file with binary I/O.
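
If no hex editor is at hand, a few lines of C can stand in for one. This is only a sketch: bytes_to_hex is a made-up helper that formats a byte buffer as hex pairs, and NEEDLE_BYTES shows the "string built from integers" idea, assuming the file turned out to be ISO-8859-1 (where ä is 228).

```c
#include <stdio.h>
#include <stddef.h>

/* "näkyvä" given as integer byte values, assuming ISO-8859-1 (ä = 228). */
static const unsigned char NEEDLE_BYTES[] = { 110, 228, 107, 121, 118, 228 };

/* Formats data as space-separated hex pairs, e.g. "6E E4 6B", into out.
   Returns how many bytes were formatted; stops early if out is too small. */
size_t bytes_to_hex(const unsigned char *data, size_t len,
                    char *out, size_t out_size)
{
    size_t i, used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* each byte needs at most 3 characters plus the terminating NUL */
        if (out_size - used < 4)
            break;
        used += (size_t)snprintf(out + used, out_size - used,
                                 i ? " %02X" : "%02X", (unsigned)data[i]);
    }
    return i;
}
```

Running the first few dozen bytes of the file through this and comparing them with the needle shows quickly whether the two agree on how ä and ö are stored.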

#11
dargueta


    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
Well, if you're running Windows, you can test to see if a file is encrypted by doing the following test:


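#include <windows.h>

// other code

DWORD dwFileAttribs = GetFileAttributes(szMyFileNameString);

if (dwFileAttribs == INVALID_FILE_ATTRIBUTES) {
    // the call failed (e.g. file not found); see GetLastError()
} else if (dwFileAttribs & FILE_ATTRIBUTE_ENCRYPTED) {
    // file is encrypted
} else {
    // file isn't encrypted
}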



A cheap way to test whether you're dealing with plain text or something like UTF-16 is to check for control characters (chars with ASCII value 0-31) other than the usual tab, line feed, and carriage return. If there are any, NUL bytes in particular, and you know it's a text file, then it's not plain ASCII text.
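
A rough sketch of that kind of check, extended with a look at the byte order mark (BOM) some editors write at the start of a file. It is a heuristic, not real detection: guess_encoding and its labels are invented for this example, and a file without a BOM that contains bytes above 127 could be UTF-8 or any legacy 8-bit code page.

```c
#include <stddef.h>

/* Returns a rough label for the first len bytes of a file.
   Heuristic only: BOM first, then NUL bytes, then bytes above 127. */
const char *guess_encoding(const unsigned char *buf, size_t len)
{
    size_t i;
    int has_nul = 0, has_high = 0;

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "utf-8-bom";
    /* note: FF FE 00 00 would be a UTF-32LE BOM; not distinguished here */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "utf-16le-bom";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "utf-16be-bom";

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            has_nul = 1;    /* typical of UTF-16/32 without a BOM, or binary data */
        else if (buf[i] >= 0x80)
            has_high = 1;   /* outside 7-bit ASCII */
    }
    if (has_nul)
        return "nul-bytes";
    if (has_high)
        return "non-ascii";
    return "ascii";
}
```

A "non-ascii" result still leaves the question of which 8-bit encoding or UTF-8 variant is in use, so it narrows things down rather than settling them.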

#12
denarced

    Programmer

  • Members
  • 182 posts
Lots of responses :)
Thanks to all.