Strings

We have touched on integers and arrays of integers, and we'll now extend this principle. A single character ('c') can be stored in a 'char' type, and if we take several of these chars and put them one after another, we get a string: a string is thus nothing more than an array of chars.

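char aChar='c';
char * aString="Hello";
/* sizeof and strlen() yield a size_t, which is printed with %zu */
printf("Address: %p Value: %c Size: %zu\n", (void *)&aChar, aChar, sizeof(aChar));
/* &aString is the address of the pointer variable itself, not of the characters */
printf("Address of string: %p\n", (void *)&aString);
printf("Size of string: %zu\n", strlen(aString));
printf("Value: %s\n", aString);
/* <= on purpose: the last iteration shows the byte right after the 'o' */
for(i=0;i<=strlen(aString);i++){
	printf("Index:%x Address:%p Value:%c\n", i, (void *)&aString[i], aString[i]);
}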

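Here we create a char and a string. The string is declared as a pointer to its first char; this is close to writing 'char aString[6]="Hello";', although the two are not identical: the pointer form points at a string literal that must not be modified, while the array form gives you your own writable copy of the six bytes. Do mind the difference between the char 'c' and the string "c". This generates the following output: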

Address: 0xbf8b560f Value: c Size: 1
Address of string: 0xbf8b5600
Size of string: 5
Value: Hello
Index:0 Address:0x8048780 Value:H
Index:1 Address:0x8048781 Value:e
Index:2 Address:0x8048782 Value:l
Index:3 Address:0x8048783 Value:l
Index:4 Address:0x8048784 Value:o
Index:5 Address:0x8048785 Value: 

There is actually nothing new here. We handle the string the same way as we handled integers, with three differences: we now use 'strlen()', a function declared in string.h (see man 3 strlen for details), to get the length of the string; a char is only one byte large; and we use %s to print the string. There is only one magical thing here: how do we know where the string ends? Well, the array is not {'H','e','l','l','o'}, it is {'H','e','l','l','o',0}. The ASCII null character is added after the string, which is why index 5 in the output above shows an empty value. So how does strlen() work? It is just a while loop which keeps increasing the index until the value becomes 0.

Structures

Everything's going well, so let's add another thing to the pile: structures. We know arrays: an array is a collection of items of the same type. A structure is a collection of items of different types.

struct aStruct {
	int intMember;
	int * intPointer;
	char charMember;
	char ** stringPointer;
};

This defines a structure called 'aStruct', which combines an integer, a pointer to an integer, a char, and a 'double' pointer (a pointer to a string, that is, a pointer to a pointer to a char). Put this declaration outside your functions; typically, such declarations are placed in header files. Next we can use this struct, populating it with the previously defined variables:

struct aStruct aStruct;
struct aStruct * aStructPointer;
/* sizeof yields a size_t (%zu); %p expects a void pointer */
printf("Address: %p Size: %zu\n", (void *)&aStruct, sizeof(struct aStruct));
printf("%p %p %p %p\n", (void *)&aStruct.intMember, (void *)&aStruct.intPointer,
	(void *)&aStruct.charMember, (void *)&aStruct.stringPointer);
aStruct.intMember=6;
aStruct.intPointer=&anInt;
aStruct.charMember='k';
aStruct.stringPointer=&aString;
aStructPointer=&aStruct;
/* dereference the struct pointer explicitly, then use '.' */
printf("Member of struct: %d\n", (*aStructPointer).intMember);
printf("Member of struct: %d\n", *(*aStructPointer).intPointer);
/* the same members again, this time through '->' */
printf("Member of struct: %d\n", aStructPointer->intMember);
printf("Member of struct: %d\n", *aStructPointer->intPointer);
printf("Member of struct: %s\n", *aStructPointer->stringPointer);

And the output:

Address: 0xbf8b55e8 Size: 16
0xbf8b55e8 0xbf8b55ec 0xbf8b55f0 0xbf8b55f4
Member of struct: 6
Member of struct: 5
Member of struct: 6
Member of struct: 5
Member of struct: Hello

And what does this teach us? Well, we can declare structures, and we can have pointers to structures. It goes further: we can have arrays of structures, structures can contain arrays, structures can contain other structures, and structures can even contain pointers to structures of the same type, which is the building block of a linked list. With the '.' operator we access the members of a struct. When we have a pointer to a struct, we have to dereference it first, as in (*aStructPointer).intMember; since this is so common, there is the '->' operator as a shorthand, as in aStructPointer->intMember. Also, using the double pointer is peanuts.

There is, however, one odd thing in the output: it says the size of this struct is 16, while we added one int (4 bytes), one int pointer (4 bytes), one char (1 byte) and one pointer to a char pointer (4 bytes), which adds up to 13. Who stole those three bytes of memory? That is called alignment. The compiler placed every member at an address that is a multiple of four, because it is much more efficient for the processor to fetch a value which starts at such an address. You can see it in the addresses above: charMember sits at 0xbf8b55f0 and stringPointer at 0xbf8b55f4, so three unused padding bytes follow the char. If you really want to change this, you can.

A word of caution

For all the brave who managed to bear with me this far, my congratulations. I know that the first time people hear about pointers it results in a lot of frowning and thinking 'why would somebody want to use this', but don't panic: you just need a little practice to get up to speed with pointers, and you'll soon see the advantages they bring. One word of caution is in order, though: pointers point to 'a' memory location, and they can point to any memory location. If you forget to initialize them, or forget to dereference them, you can end up in strange situations. I lost a day this week because I incremented a pointer (which was zeroed afterwards) instead of incrementing the value the pointer pointed to. C will not prevent you from doing these things, but they can get your application terminated. It's the same with arrays: if you write int array[5]; int b; array[5]=0;, you write past the end of the array (the valid indices are 0 to 4) and, depending on how the compiler lays out the variables, you may well set the value of b to zero. This leads to memory corruption and, in extremis, to stack corruption. So pointers are very powerful, but you need to use them right.

Exercises

  • Collect all the code snippets on this page and turn them into a working program.
  • Try to run this program on a 32-bit and a 64-bit system (use a live CD, for example), and compare the results.
  • Implement strlen yourself using a while loop.
  • Take a look at some manpages, such as those of memcpy, strcpy, strcat and memset, and see that all these functions operate on pointers.
  • A C application typically has 'int main(int argc, char **argv)' as its main prototype. Here argc contains the number of strings passed to the application, and argv is an array of argc strings. Write a small application which prints all arguments given to the application. What is stored in argv[0]?
