string in C: confusion

12 replies to this topic

#1
elvira23

    Newbie

  • Members
  • 14 posts
Hi Guys,
I am resuming my study of the ANSI C book and have looked at the section on strings. According to the author, ANSI C does not have a built-in string type; instead, an array of characters needs to be used. According to the book, the compiler then ends the array of char with the null character '\0'. Hence, if you want to use a string of 4 chars, you need to declare an array of 5 chars.
I can't see in which circumstances this extra character is useful. Why was this decision made?

I wrote some simple code to verify the above:

char m[6]="elvira";
length=strlen(m);
for( i=0;i<length;i++)
    printf(" the chara at order %d os %c\n",i,m[i]);


To my surprise, the output was:

 the chara at order 0 os e
 the chara at order 1 os l
 the chara at order 2 os v
 the chara at order 3 os i
 the chara at order 4 os r
 the chara at order 5 os a
 the chara at order 6 os )
 the chara at order 7 os N
 the chara at order 8 os ├
 the chara at order 9 os w

I was expecting a compile error, since the array should have an extra byte to hold the terminating zero, or at least for the length to be reported as 6. The result of running the program was unexpected to me.

Any help, please?


Cheers,

#2
dcs

    Programming God

  • Members
  • 775 posts
The compiler will check that things are syntactically correct. But you still need to write correct code.
#include <stdio.h>
#include <string.h>

int main(void)
{
   char m[] = "elvira";   /* size deduced from the literal: 7 bytes, '\0' included */
   size_t i, length = strlen(m);
   for ( i = 0; i < length; i++ )
      printf(" the chara at order %zu os %c\n", i, m[i]);   /* %zu matches size_t */
   return 0;
}

/* my output
 the chara at order 0 os e
 the chara at order 1 os l
 the chara at order 2 os v
 the chara at order 3 os i
 the chara at order 4 os r
 the chara at order 5 os a
*/


#3
ZekeDragon

    Writes binary right handed and hex left handed

  • Moderators
  • 2,103 posts
@dcs: I think the OP wanted to know why this was happening, not that it was (they could plainly tell that).

Initializing a char array from a string literal that exactly fills it, leaving no room for the terminating null, won't cause a compile-time error in C (it will in C++); the null is simply dropped. The reason we use null-terminated strings is that there is otherwise no way for the program to know where the string ends, since there are no other physical markers. Char arrays are nothing more than a sequence of bytes, and there is no surrounding structure that tells the computer where that sequence ends. The cheapest way to mark the end was to append a null character, so you always have to allow for one extra character (that's why this decision was made).

If you notice, your computer went on treating the following bytes in memory as if they were part of the string too, since without a terminator strlen cannot tell where your array stops. Reading past the end of the array like this is undefined behavior, so you may get different results on another run, compiler, or machine.
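To make the role of the terminator concrete, here is a minimal sketch of how a length routine can find the end of a string. The name scan_length is made up for this example; it is not the library's strlen, only an illustration of the same idea.

```c
#include <stddef.h>

/* Hypothetical helper: counts bytes until the first '\0'.
   The caller must guarantee that a '\0' is present inside the array;
   otherwise the loop reads past the end (undefined behavior). */
size_t scan_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Called on a properly terminated string such as "elvira" this stops after six characters. Called on an array with no '\0' it just keeps walking through whatever bytes follow in memory, which matches the extra characters seen in the original output.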
Wow I changed my sig!

#4
dcs

    Programming God

  • Members
  • 775 posts
@Zeke: I was hoping the OP might be curious as to the difference between his declaration and the one that I used and perhaps investigate and learn the why.

#5
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
To hold the string "char" (four characters), you actually have to be able to store "char\0" (five characters).
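A small sketch of the same point in code, using sizeof for the storage and strlen for the visible length. The two function names are invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to store the literal "char", terminator included. */
size_t storage_needed(void)
{
    return sizeof "char";      /* 'c' 'h' 'a' 'r' '\0' */
}

/* Visible length, which is what strlen reports. */
size_t visible_length(void)
{
    return strlen("char");     /* stops at the '\0', does not count it */
}
```

The first function returns 5 and the second returns 4; the difference of one is the terminating null.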
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#6
elvira23

    Newbie

  • Members
  • 14 posts
Many thanks for your replies. I fully understand what you said, WingedPanther.
Going back to my code, I lengthened the constant string as follows:
char m[6]="elviraelvira" and I GOT a compile error, namely that the string is too long.
If instead I keep char m[6]="elvira" and then set m[6]='\0', the printf shows only "elvir" as expected. However, if I set m[i]='\0' for any i<6, only the characters up to (i-1) are displayed.

As I explained above, setting char m[6]="elvira" without ending the string afterwards with '\0' ALWAYS prints 10 characters for me. However, setting char m[]="elvira" displays only "elvira"!

I am surprised by this behavior. I was expecting the array of char to be handled via its subscript like any other type of array, but it seems that is not the case.

Why? :confused: It would have been better to define a string type.

#7
dcs

    Programming God

  • Members
  • 775 posts
The string literal initializer "elvira" contains 7 characters, including the null. By explicitly specifying a length of 6, you leave the compiler no room to store the null.
char m[6]="elvira"
So you don't have the null terminator. This is legal C, so it need not produce an error.
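One way to check this without invoking undefined behavior is to search for the null only inside the array, using memchr bounded by sizeof. This is a sketch; the helper names are invented here, and some compilers may warn about the exact-fit declaration even though it is valid C.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a '\0' occurs within the first n bytes of buf, else 0.
   Only looks inside the array, so it is safe for unterminated arrays. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

int exact_fit_is_terminated(void)
{
    char m[6] = "elvira";   /* legal C: 6 bytes, no room for '\0' */
    return has_terminator(m, sizeof m);
}

int deduced_size_is_terminated(void)
{
    char m[] = "elvira";    /* size deduced as 7, '\0' included */
    return has_terminator(m, sizeof m);
}
```

The exact-fit version reports no terminator, while the version with the deduced size reports one.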

elvira23 said:

If instead I keep char m[6]="elvira" and then set m[6]='\0',
That would be invalid code (but not necessarily a compiler error), since m[6] is not an element of the array; writing to it is undefined behavior.

elvira23 said:

As I explained above, setting char m[6]="elvira" without ending the string afterwards with '\0' ALWAYS prints 10 characters for me.
Not always, just happenstance.

Regarding your original question as well: without the null terminator, it's not correct to pass this array to string functions such as strlen.
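If you do have to work with an array that may lack a terminator, one option is to bound every operation by the array size. The following is a sketch with a made-up helper name, bounded_length, built on the standard memchr.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf, never looking at
   more than cap bytes. Returns cap if no '\0' is found. */
size_t bounded_length(const char *buf, size_t cap)
{
    const char *end = memchr(buf, '\0', cap);
    return end ? (size_t)(end - buf) : cap;
}
```

For the array in question, bounded_length(m, sizeof m) gives 6, and printf("%.*s\n", (int)sizeof m, m) prints at most six characters because the precision limits how many bytes printf reads.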

elvira23 said:

Why? :confused: It would have been better to define a string type.

As mentioned, you either end up with a more complex type that contains the data plus some other element that stores the length, or, in the case of C-style strings, you use a sentinel character (the null), which cannot otherwise appear in the string, to indicate the end. The null terminator is merely simpler.
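For comparison, here is a rough sketch of the other design, where the length is stored alongside the data. The type counted_str, its fixed capacity of 16, and the function counted_from are all assumptions made for this illustration, not anything from the C library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: the length is stored next to the
   data, so no terminator is needed. */
struct counted_str {
    size_t len;
    char   data[16];
};

/* Copies at most sizeof data bytes from src (which must hold at least
   n bytes) and records how many were kept. */
struct counted_str counted_from(const char *src, size_t n)
{
    struct counted_str s = {0};
    s.len = n < sizeof s.data ? n : sizeof s.data;
    memcpy(s.data, src, s.len);
    return s;
}
```

Every operation now has to carry and maintain len, and the capacity has to be decided somewhere, which is the extra complexity the sentinel approach avoids.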

#8
elvira23

    Newbie

  • Members
  • 14 posts
It was a typo; I meant setting m[5]='\0' in the above.

#9
elvira23

    Newbie

  • Members
  • 14 posts
:thumbup: Thanks for the explanation. strlen counts the chars until it reaches '\0'. That's why only part of the array is displayed.

It is low-level details like this that put me off C, sorry :crying:
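The truncation effect described above can be reproduced with a short sketch. The function name length_after_cut is invented for the example, and it uses the terminated form char m[] so that strlen is always safe to call.

```c
#include <stddef.h>
#include <string.h>

/* Writes '\0' at index i of a 7-byte copy of "elvira" and returns the
   length strlen then reports. Requires i <= 6. */
size_t length_after_cut(size_t i)
{
    char m[] = "elvira";    /* 7 bytes, terminator at m[6] */
    m[i] = '\0';            /* everything from index i on is now hidden */
    return strlen(m);
}
```

With i set to 5 the reported length is 5, so only "elvir" would be printed; with i set to 3 only "elv" remains visible.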

#10
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
C++ has a string type, std::string, in its standard library, which is one of many reasons I prefer C++ to C.

#11
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,717 posts
C gives you much more control at the expense of making your life difficult if you don't know exactly what you're doing. Which is why I code in a mix of both at the same time.
sudo rm -rf /

#12
Aereshaa

    Programming God

  • Members
  • 790 posts
I use C because if you understand the underlying machine, then it's really easy to understand what's happening when there's an error. C++ makes things too complicated with all its supposed 'simplifications'. Also, C++ doesn't really have a string type in the language itself. A literal "Hello\n" is still an array of char that decays to a pointer into (usually read-only) static storage, much as in C. Because of this, "Hello " + "World\n" doesn't work.
Watches: Nanoha, Haruhi, AzuDai. Listens to: E-Type, Dj Melodie, Nightcore.
"When people are wrong they need to be corrected. And then when they can't accept it, an argument ensues." - MeTh0Dz