Strings in c and string.h (Part 1)


#1 fkl


Posted 23 December 2012 - 12:53 PM

Motivation
Strings in C are one of the most commonly used data structures, and yet there are many subtleties in their usage and understanding which are problematic even for experienced programmers. In this 2 part tutorial, I will attempt to describe how strings are supported in the C language as a data type, as well as go through each of the commonly used APIs such as strlen(), strcmp(), strtok(), strstr(), strcat() etc.

We will not only discuss API usage, but also common failure scenarios and some behind-the-scenes implementation details that help protect you from unexplained errors or crashes in your programs.

Strings in C: Data structure
The first and foremost concept is that C has no fundamental data type to represent strings. The native character array is used instead. Please bear in mind that we are strictly talking about C and not C++, so the header file <string.h> is what we are confined to (C++ exposes the same functions through <cstring>). Don't confuse this with the C++ string, std::string, which is a class declared in <string> in namespace std and is treated as a separate data type. C++ strings are objects of that class and therefore have properties and methods.

In contrast, a C string is an array of characters which is infamously terminated by a null ('\0') character. This effectively means there is no length parameter stored separately. It also means that apart from the actual string, you need at least one extra byte for storing the terminating null.

[Image: cstring.png]

E.g.

char s[7] = "string"; // this requires at least 7 bytes in total: 6 for the string and one for the terminating null.
Or
char s[] = "string";

If the size is not specified, the compiler chooses the minimum number of bytes required (including one for the null), i.e. 7 in the above case.

char s[3] = "hello"; // this is a compilation error in C++; C compilers typically only issue a warning and keep the first 3 characters, which leaves s without a terminating null, so string functions would read past the end of the array.
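The "one extra byte" rule can be checked by comparing sizeof with strlen on the same array. The following is a minimal sketch; the function names array_bytes and visible_length are hypothetical helpers introduced only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Storage the compiler reserves for `char s[] = "string";`:
   six characters plus the terminating '\0'. */
size_t array_bytes(void)
{
    char s[] = "string";
    return sizeof s;        /* size of the whole array in bytes */
}

/* Length of the same string as the string functions see it:
   the terminating '\0' is not counted. */
size_t visible_length(void)
{
    char s[] = "string";
    return strlen(s);
}
```

Here array_bytes() gives 7 while visible_length() gives 6; the difference of one is the terminating null.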

String literal vs character array
A very significant concept to grasp is that

char * s = "str"; // pointer to a string literal: must not be modified. An attempt compiles in C, but it is undefined behaviour and typically crashes at run time.

and

char s[] = "str"; // character array initialised from the literal: can be modified (though not increased in size)

are NOT the same.

Generally, you would say the first is a char pointer pointing to str, whereas the latter is a character array containing the string str. However, the real difference is that the first refers to storage that must be treated as read-only, i.e. you cannot safely modify the original string as in

s[0] = 'm';

Whereas the same can be done in the second case, because the array holds its own writable copy of the characters.

Of course, one more difference is that an array name (char s[]) is not a modifiable lvalue: in most expressions it converts to a pointer to its first element, but s itself cannot be reassigned to point to another memory location. That can be done for a char *.

e.g.

char s[] = "array";
char * t = s; // perfectly legal

char c = 'a';
t = &c; 		 // still legal

but

s++;	 // compilation error
or
s = &c; // compilation error
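The two legal operations discussed above (writing into an array, and re-pointing a pointer) can be sketched as follows. The function names are hypothetical and exist only for this illustration; the pointer is declared const char * here, which is a common way to let the compiler reject accidental writes through it.

```c
#include <string.h>

/* Builds an array from a literal, changes its first character and
   copies the result into `out`, which must hold at least 4 bytes. */
void modify_array_copy(char *out)
{
    char s[] = "str";   /* array: a private, writable copy of the literal */
    s[0] = 'm';         /* fine, we are writing into our own array */
    strcpy(out, s);     /* out now holds "mtr" */
}

/* A pointer is an ordinary variable and can be pointed elsewhere. */
char first_char_after_reassign(void)
{
    char a[] = "array";
    char b[] = "other";
    const char *t = a;  /* t points at a[0] */
    t = b;              /* legal: t can be re-pointed, the array name a cannot */
    return *t;          /* first character of b */
}
```

After modify_array_copy(buf), buf contains "mtr"; first_char_after_reassign() returns 'o'.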


strlen()
The first and most common function from the string library is strlen(), which returns the length of a string. It is used as follows.

[Image: strlen.jpg]

Note that it prints the length of the string excluding the terminating null or '\0' character. As mentioned earlier, strings in C are just character arrays terminated by nulls.
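A minimal sketch of typical strlen() usage is given below. print_length is a hypothetical helper written for this illustration, not part of <string.h>. Since strlen returns a size_t, the matching printf format is %zu.

```c
#include <stdio.h>
#include <string.h>

/* Prints the length of `s` and also returns it. */
size_t print_length(const char *s)
{
    size_t len = strlen(s);   /* number of characters before the '\0' */
    printf("Length of \"%s\" is %zu\n", s, len);
    return len;
}
```

For example, print_length("codecall") reports 8, and print_length("") reports 0 because the very first byte is already the terminator.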

How strlen is implemented
So string length internally just works by running a loop from start until it encounters a null while incrementing length in each cycle.

Something like

size_t my_strlen(const char *s) // size_t comes from <stddef.h>; const because the string is only read
{
	 size_t len = 0;

	 for(; s[len] != '\0'; len++); // note the semicolon: the loop body is empty, it does nothing except increment the counter

	 return len;
}
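Because the length is found by a linear scan, calling strlen in a loop condition repeats that scan on every iteration unless the compiler happens to optimise it away. A minimal sketch of computing the length once and reusing it is shown here; count_char is a hypothetical function used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many times `target` occurs in `s`. */
size_t count_char(const char *s, char target)
{
    size_t len = strlen(s);   /* one O(n) scan, done once before the loop */
    size_t count = 0;

    /* Writing `i < strlen(s)` as the condition could rescan the
       string on every pass, making the whole loop O(n^2). */
    for (size_t i = 0; i < len; i++) {
        if (s[i] == target)
            count++;
    }
    return count;
}
```

For instance, count_char("hello world", 'o') returns 2.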

Lessons learned from implementation
You don't have to implement library functions yourself, but it is important to understand how they work in order to know the tradeoffs. Following are some important conclusions drawn from the above.

1. strlen runs in O(n), i.e. it walks the entire length of the string. So if you need the length multiple times, call strlen once and save the result in a variable which is accessed repeatedly. Wherever possible, move the strlen call outside of a loop (unless the loop operates on a different string each time). Otherwise you are just wasting CPU cycles without any reason.
2. strlen should only be run on strings which are null terminated. Otherwise it keeps reading past the end of the array until it happens to find a zero byte, which is undefined behaviour and may produce a garbage length or a crash.
3. You should always use an array of size n+1, where n is the max length of your string, with the additional space for the null.
4. If you are overwriting some large string at several places in the program, be careful that with each new insertion you properly overwrite the last null terminator. strlen returns as soon as it encounters the first '\0' character, even if there is a long string of characters present after it. Consider the following scenario.

	 char s[30] = "hi";
	 printf("Length of s is : %zu\n", strlen(s)); // strlen returns size_t, so use %zu rather than %d

	 s[3] = 't';
	 s[4] = 'o';
	 s[5] = 'o';
	 s[6] = '\0';

	 printf("Length of s is : %zu\n", strlen(s));

Since s is a large enough array that we can store up to 29 characters in it, you can write / append to it. It is an array, so it is mutable. However, the above code would still print 2 in both cases.

Why? Because strlen terminates at the first null encountered, which exists at s[2] after "hi" (when we assign a string in double quotes, a null is automatically inserted after it). Whatever we inserted beyond s[2] would be ignored by strlen as well as any other string processing function.

To make this work correctly, one needs to overwrite the null so that the text is continuous, e.g. like this

s[2] = ' '; // a space character

Now strlen would return 6 as the total length of s, i.e. 2 characters for "hi", 1 for the space and finally 3 for "too".
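The whole scenario can be put into one small function to compare the two outcomes. This is only a sketch; length_after_append and its patch_gap parameter are hypothetical names introduced for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns strlen(s) after appending "too" at index 3.
   If patch_gap is non-zero, the old terminator at s[2] is
   replaced by a space first. */
size_t length_after_append(int patch_gap)
{
    char s[30] = "hi";   /* s[2] is '\0'; the remaining bytes are zero-filled too */

    s[3] = 't';
    s[4] = 'o';
    s[5] = 'o';
    s[6] = '\0';

    if (patch_gap)
        s[2] = ' ';      /* overwrite the early terminator so the text is contiguous */

    return strlen(s);
}
```

length_after_append(0) returns 2, because the scan still stops at s[2], while length_after_append(1) returns 6 for "hi too".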

I will follow up with more functions from string library along with discussing implementation of each and how that impacts our usage in part 2.

#2 fkl


Posted 20 January 2013 - 01:10 PM

Posted part 2

http://forum.codecal...stringh-part-2/