Jump to content

Problem with reading text from files into a string

- - - - -

This topic has been archived. This means that you cannot reply to this topic.
7 replies to this topic

#1
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
Okay, I have another very simple program here. It reads the contents of one line of a file into a string. It uses two functions that I've written, both of which work perfectly.

This works:


char *append( char str[], char c ){

	char *astr = (char *) malloc( sizeof *str + 1 );  // I needed to create a character

	for( int i = 0; i < strlength( str ); i++ ){      // pointer rather than a regular

		astr[i] = str[i];                         // array because otherwise the array

	}                                                 // is destroyed when the function

	astr[strlength( str )] = c;                       // exits, making the return value NULL.

	return astr;

}


This also works:


// calculates the length of a string using pointers

int strlength( char arr[] ){

	int length = 0;

	char *arrptr;

	arrptr = &arr[0];

	for( int i = 0;; i++ ){

		if( *(arrptr + i) ){    // if the end of the string

			length++;       // has not been reached

		}

		else{

			break;

		}

	}

	return length;

}


You can ignore how the two functions are implemented. I just put them there for error checking purposes. The first appends a single character to the end of a string and returns a pointer to that string. The second calculates the length of a string.

Both work perfectly.

This also works perfectly:


char *lineread = "";

char c;

lineread = append( lineread, c );


So does this:


while( i < 5 ){

	lineread = append( lineread, c );

	i++;

}


This doesn't:


#include <stdio.h>

#include <stdlib.h>

#include "mystrings.h"

#include "files.h"


int main( int argc, char *argv[] ){

	FILE *fp;

	fp = fopen( argv[1], "r" );

	char *lineread = "";

	char c;

	while( (c = getc( fp )) != '\n' ){          // This is the important line.

		lineread = append( lineread, c );    // And this.

	}

	printf( "%s", lineread );

	fclose( fp );

	printf( "\n" );

	return 0;

}


The program compiles, but when I run it, it just does nothing. It waits for me to type something, and when I type something and enter it, it just does that again and again and again until I type Ctrl+C.

What the hell is going on here?
Life's too short to be cool. Be a nerd.

#2
dcs

dcs

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 775 posts

DarkLordoftheMonkeys said:

Okay, I have another very simple program here. It reads the contents of one line of a file into a string. It uses two functions that I've written, both of which work perfectly.

This works:

char *append( char str[], char c ){
	char *astr = (char *) malloc( sizeof *str + 1 );  // I needed to create a character
	for( int i = 0; i < strlength( str ); i++ ){      // pointer rather than a regular
		astr[i] = str[i];                         // array because otherwise the array
	}                                                 // is destroyed when the function
	astr[strlength( str )] = c;                       // exits, making the return value NULL.
	return astr;
}

This also works:

// calculates the length of a string using pointers
int strlength( char arr[] ){
	int length = 0;
	char *arrptr;
	arrptr = &arr[0];
	for( int i = 0;; i++ ){
		if( *(arrptr + i) ){    // if the end of the string
			length++;       // has not been reached
		}
		else{
			break;
		}
	}
	return length;
}

You can ignore how the two functions are implemented. I just put them there for error checking purposes. The first appends a single character to the end of a string and returns a pointer to that string. The second calculates the length of a string.

Both work perfectly.

This also works perfectly:

char *lineread = "";
char c;
lineread = append( lineread, c );

So does this:

while( i < 5 ){
	lineread = append( lineread, c );
	i++;
}

This doesn't:

#include <stdio.h>
#include <stdlib.h>
#include "mystrings.h"
#include "files.h"

int main( int argc, char *argv[] ){
	FILE *fp;
	fp = fopen( argv[1], "r" );
	char *lineread = "";
	char c;
	while( (c = getc( fp )) != '\n' ){          // This is the important line.
		lineread = append( lineread, c );    // And this.
	}
	printf( "%s", lineread );
	fclose( fp );
	printf( "\n" );
	return 0;
}

The program compiles, but when I run it, it just does nothing. It waits for me to type something, and when I type something and enter it, it just does that again and again and again until I type Ctrl+C.
What the hell is going on here?
Actually, your code segfaulted on me right off the bat.

First, you shouldn't be using
char c;
c should be an int because that is what getc returns, and that is how EOF is detected. If you were looking for it. [edit]And with regard to your code looping forever until you kill it -- well, that's what you coded it to do. D'oh! Ignore that. I didn't see the input file part. Can you post the input file?[/edit]


#include "mystrings.h"
#include "files.h"
It would appear that you are putting active code in your header files. That is a no-no. Don't put variable definitions or function bodies into header files.

Writing your own version of strlen might be an interesting exercise, but you don't really need to do so. The standard version of it and strcpy are going to be better choices.

"This works perfectly". :P As a private aside, just about every bug I've seen new coders go after has been prefixed with these words. I generally shy away from using those terms when I'm chasing a bug, 'cuz it generally tends to bite me in the backside. [/aside]

for( int i = 0; [COLOR="red"]i < strlength( str )[/COLOR]; i++ )
This is an idiom to avoid. You may not notice anything much for large strings, but with long strings it certainly can be. The intended O(n) complexity becomes O(n^2). That is, each loop iteration you once again check the string length. Not once, but each and every loop iteration.

And as a nice little segue, this is the cause of the segfault I find. Fortunately for me, and unfortunately for you, my system has not zeroed out stuff for me. This made the crash happen, which indicated to me that there is a bug.

You see, you don't null terminate your string. Since a strlen or strcpy operation depends on the null terminator, its absence is going to be an issue.

Your append function is quite buggy as well. Let me ask you this: how many times to you malloc? And then how many times do you free? You can ignore the first question, but should have an equal number. In other words, you are leakin' memory like a sieve.

I think you confuse strlen and sizeof:
char *append( char str[], char c ){
	char *astr = (char *) malloc( sizeof *str + 1 );  // I needed to create a character
sizeof *str is always 1. You are always allocating your string to be a maximum of 2 characters, one of which must be the null terminator in order to be a string. So you can only have one element in your string. There's not much room for concatenation there. ;)

The loop version of strcpy that you have to follow this fails to null terminate the string.

Are you using a C++ compiler for what is ostensibly C code? I would advise against doing so -- otherwise, especially for language newcomers, you might end up writing a polyglot.

#3
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
Wow, that's a lot of stuff I did wrong. I'm going to take your advice and experiment with all this stuff to see what happens. Thanks for the help. :) I guess I still have a long road to travel before I fully understand how C works.
Life's too short to be cool. Be a nerd.

#4
ZekeDragon

ZekeDragon

    Writes binary right handed and hex left handed

  • Moderators
  • 2,103 posts

DarkLordoftheMonkeys said:

This works:
char *append( char str[], char c ){
	char *astr = (char *) malloc( sizeof *str + 1 );  // I needed to create a character
	for( int i = 0; i < strlength( str ); i++ ){      // pointer rather than a regular
		astr[i] = str[i];                         // array because otherwise the array
	}                                                 // is destroyed when the function
	astr[strlength( str )] = c;                       // exits, making the return value NULL.
	return astr;
}
No, no it doesn't. Every time you run this function you're throwing away the old memory that was previously housing char str[]. Further, it doesn't even allocate memory properly since it'll only allocate 2 bytes each time (sizeof(char) is one byte, plus one). You can't sizeof() a passed array and expect it to work, the compiler degrades it to a pointer, and you couldn't sizeof() dynamically allocated arrays even if you didn't pass it. You have to use strlength() for that. Each time you call append you'll have to figure out the strlength() of the initial string. Also, keeping strlength() in a for loop is just bad practice since it would need to be called for each iteration of the loop, place that value in a temporary and then compare it to that.
int aval = strlength(str);
for (int i = 0; i < aval; ++i)
    astr[i] = str[i];
You're also completely ignoring the null-terminating character in all of these string interactions ('\0'), which means strlength won't even work properly after the second iteration on your main loop. Try this:
char *append(char *str, char c)
{
    int len = strlength(str);
    char *temp = realloc(str, sizeof(char) * (len + 2));
    if (temp)
    {
        temp[len] = c;
        temp[len + 1] = '\0';
    }
    else
    {
        fprintf(stderr, "Failed to perform reallocation in append()!");
        return str;
    }

    return temp;
}
Also change this
char *lineread = "";
to this
char *lineread = malloc(sizeof(char));
*lineread = '\0';
However, I highly recommend instead using fgets and a buffer of some kind for your user input. I made a rather decent function to take care of user input automagically once, I'll see if I can't unbury it and post it.


EDIT: D*MN YOU DCS! :P

Edited by ZekeDragon, 23 January 2010 - 08:05 PM.
Decided it was best to return more than NULL in case of failure.

Wow I changed my sig!

#5
dcs

dcs

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 775 posts
It can be a bumpy path learning C, and I hope I'm getting these bits correct. Doing some tinkering...

#6
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
@dcs: Well, I fixed the append() function by putting int len = strlength( str ); at the beginning of the function and replacing sizedof str and every other instance of strlength( str ) with len. It works now, and it doesn't segfault or go into an infinite loop. I'm confused by what you said about header files, that you shouldn't include function definitions in them. What do you include in header files, then?
Life's too short to be cool. Be a nerd.

#7
dcs

dcs

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 775 posts

DarkLordoftheMonkeys said:

@dcs: Well, I fixed the append() function by putting int len = strlength( str ); at the beginning of the function and replacing sizedof str and every other instance of strlength( str ) with len.
Are you making sure to null terminate the strings? Are you avoiding the memory leaks? Can you post the latest code snippet(s)?

DarkLordoftheMonkeys said:

I'm confused by what you said about header files, that you shouldn't include function definitions in them. What do you include in header files, then?
Data type definitions, #define macros (which you should try to avoid or minimize), function prototypes, perhaps a global variable declaration (not a definition) -- but avoid global variables.

If you look in any standard header, you will notice this same pattern.

#8
dcs

dcs

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 775 posts

dcs said:

It can be a bumpy path learning C, and I hope I'm getting these bits correct. Doing some tinkering...
After some tinkering:
#include <stdio.h>

#include <stdlib.h>

#include <string.h>


char *append( char str[], char c )

{

   size_t len = 0;

   char *astr;

   if ( str )

   {

      len = strlen(str);

   }

   astr = realloc(str, len + 2);

   if ( astr )

   {

      str = astr;

   }

   else

   {

      free(str);

      return NULL;

   }

   str[len] = c;

   str[len + 1] = '\0';

   return str;

}


int main( int argc, char *argv[] )

{

   if ( argc > 1 )

   {

      FILE *fp = fopen( argv[1], "r" );

      if ( fp )

      {

         char *lineread = NULL;

         int c;

         while ( (c = getc( fp )) != '\n' && c != EOF )

         {

            lineread = append( lineread, c );

         }

         printf( "%s\n", lineread );

         fclose( fp );

         free(lineread);

      }

      else

      {

         perror(argv[1]);

      }

   }

   return 0;

}
There's quite a bit that I did differently, so I don't know how many questions that might bring to mind.

First, I'm checking to see whether I've got command-line input of a filename before attempting to open a file. It's better safe than sorry, so the check is a good thing even if you always are running the program with a filename specified on the command line.

I'm also checking whether or not the supplied filename was found. That is, whether the file actually opened. Again, safety checks are a good thing to get in the habit of.

Going with the realloc approach, I've initialized lineread to NULL. Note that this is different from and empty string "". I changed c's data type to be correct so I can check for EOF, which is another safety check to include. It seems that the loop you have is to read only the first line of a file. But as your coding grows more complex, maybe you'd like it to read until the end of the file. This would be a way to do so.

In the append function, I've used the standard function strlen. And I've used realloc. With realloc, you don't need to recopy the array contents, and you're not leaking memory.

There is more safety checking going on with the result of realloc, and then there is the actual appending of the new character. And of high importance is the null termination.

Let me know if there is anything I've missed or that you have found confusing. I can take another shot at explaining it -- sometimes I gloss over things.