How-to: program in C. Part 3

So far so good. This is already the third part in this series, and we have already made a lot of progress. The past articles explained how to think in code - by introducing functions. This article will probably be one of the most difficult in this series; here we'll touch something which is almost C-specific: the notion of pointers. C is a programming language which lives close to the assembly language, and in assembly languages you spend most of your time moving data around, but in order to do that you need to know where the data is located, hence its address in memory. And this is just what a pointer is. A pointer is an address in your computer's memory, nothing more, nothing less. But when you're working with C, you'll end up with the conclusion that pointers are everywhere. So, after conquering functions and pointers, we should be able to handle almost anything.

In this article, I will not present a 'fully functional program'. I will present small snippets between the text, but you are encouraged to fire up your editor and start experimenting. You will also see that I introduce some 'extras' which are not mentioned in the main title. I will, for example, also introduce structures, arrays, strings, … , because I want to see this series evolve into a practical tutorial, and not into a C textbook.

Two operators * and &

When handling pointers, you will encounter two 'extra' operators: * and &. When you read code, it helps to read * as 'the value stored at this address', and & as 'the address of this variable'.

int anInt=5;
int * anIntPointer=&anInt;
/* %p expects a void pointer, hence the casts; sizeof yields a size_t, printed with %zu */
printf("Address: %p Value: %d \n",(void *)&anInt, anInt);
printf("Address of pointer: %p Address: %p Value: %d \n",(void *)&anIntPointer, (void *)anIntPointer, *anIntPointer);
printf("Size of pointer: %zu size of int: %zu\n", sizeof(anIntPointer), sizeof(anInt));

Thus, we declare an integer and assign this integer the value 5, we declare a pointer (mind the extra *), and we let it point to the address of the previously declared integer. Next, we print the address of the integer, and the value of the integer. Then we print the address of the pointer, the value of the pointer (which is an address, the address of anInt), and the value the pointer points to. And to end, we print the size of the pointers and the size of the integer. This produces the following output:

Address: 0xbfc819d8 Value: 5 
Address of pointer: 0xbfc819d4 Address: 0xbfc819d8 Value: 5 
Size of pointer: 4 size of int: 4

Here we can see that both pointers and integers are 4 bytes large (which makes sense, since I'm on a 32-bit computer; if you run this on a 64-bit or a 16-bit computer, these values may vary). The addresses will be different on your system. That the address of the pointer and the address of the integer are only 4 bytes apart is not a coincidence: both are local variables, and here the compiler has placed them next to each other in memory (although it is not obliged to do so). In printf, we use %p to print a pointer (in hexadecimal), %d to print an integer, and %s to print a string (for more information see man 3 printf). The sizeof operator used in the printf statement returns the size of its operand in bytes; its result has the type size_t, for which %zu is the matching printf conversion.
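To see why it is useful to pass an address around, here is a minimal sketch of a function that swaps two integers through pointers. The names swap_ints and demo_swap are my own choice for this illustration; they are not part of the C library.

```c
#include <stdio.h>

/* swap_ints is a hypothetical helper for illustration, not a library function.
   It receives two addresses and exchanges the values stored there. */
void swap_ints(int *first, int *second)
{
    int temp = *first;   /* read the value stored at the first address */
    *first = *second;    /* overwrite it with the value stored at the second address */
    *second = temp;      /* store the saved value at the second address */
}

/* Small demonstration; call it from main() to see the effect. */
void demo_swap(void)
{
    int a = 1;
    int b = 2;

    swap_ints(&a, &b);            /* pass the addresses, not the values */
    printf("a=%d b=%d\n", a, b);  /* prints a=2 b=1 */
}
```

Without pointers, the function would only receive copies of a and b, and the variables of the caller would stay untouched.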

Handling arrays

What is an array? An array is simply a list of variables of the same type. In this example, we declare an array in which we can store 5 integers; the number of elements is fixed at the declaration. Here we also initialize the array at declaration, but we could just as well assign values to the individual elements later in the program.

int i;
int anIntArray[5]={10,20,30,40,50};
printf("Address of array: %p\n", (void *)&anIntArray);
printf("Size of array: %zu\n",sizeof(anIntArray));
/* number of elements = total size divided by the size of one element */
for(i=0;i<(int)(sizeof(anIntArray)/sizeof(int));i++){
	printf("Index:%d Address:%p Value:%d Value: %d\n", i, (void *)&anIntArray[i], anIntArray[i], *(anIntArray+i));
}

This code produces the following output:

Address of array: 0xbf8b55d4
Size of array: 20
Index:0 Address:0xbf8b55d4 Value:10 Value: 10
Index:1 Address:0xbf8b55d8 Value:20 Value: 20
Index:2 Address:0xbf8b55dc Value:30 Value: 30
Index:3 Address:0xbf8b55e0 Value:40 Value: 40
Index:4 Address:0xbf8b55e4 Value:50 Value: 50

Now, what does this show us? The size of the array equals the number of elements times the size of each element (there is nothing extra stored). All elements are placed next to each other in memory (look at the memory addresses: each differs from the previous one by 4). By adding [i] after the array name, we can address the element of the array at index i. But, and here's some magic called 'pointer arithmetic', if we add 1 to an int pointer, the address it holds is increased by 4 (the size of the integer), not by one. So, we can address the array by using the subscript method ([i]), but also with some pointer arithmetic: anIntArray[i] means exactly the same as *(anIntArray+i). In essence, when we use the name of the array in an expression like this, it behaves as a pointer to its first element, the start of a block of memory where several values of the same type are stored.
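As a small sketch of pointer arithmetic at work, the function below adds up a number of integers by moving a pointer from element to element instead of using an index. The name sum_ints and its parameters are hypothetical, chosen only for this example.

```c
#include <stddef.h>

/* sum_ints is a hypothetical helper for illustration.
   It adds up count integers, starting at the address in values. */
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    const int *end = values + count;   /* address just past the last element */
    const int *current;

    /* current++ moves the pointer by sizeof(int) bytes, not by one byte */
    for (current = values; current < end; current++) {
        total += *current;             /* the value stored at the current address */
    }
    return total;
}
```

With the anIntArray from the example above, sum_ints(anIntArray, sizeof(anIntArray)/sizeof(int)) should return 150. Note that the function only receives an address, so the number of elements has to be passed along separately.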

Strings

We have touched on integers and arrays of integers, and we'll extend this principle. A single character ('c') can be stored in a 'char' type, and if we take several of these chars and put them one after another, we get a string. A string is thus nothing more than an array of chars.

char aChar='c';
char * aString="Hello";
printf("Address: %p Value: %c Size: %zu\n",(void *)&aChar, aChar, sizeof(aChar));
printf("Address of string: %p\n", (void *)&aString);
printf("Size of string: %zu\n",strlen(aString));
printf("Value: %s\n", aString);
/* <= on purpose: the last iteration shows the terminating 0 character */
for(i=0;i<=(int)strlen(aString);i++){
	printf("Index:%d Address:%p Value:%c\n", i, (void *)&aString[i], aString[i]);
}

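Here, we create a char and a string. aString is a pointer to the first char of "Hello", which the compiler stores somewhere together with the program. This is close to writing 'char aString[6]="Hello";', except that the array version reserves its own 6 bytes and copies the characters into them, while the pointer version must not be used to modify the string. Do mind the difference between the char 'c' and the string "c". Also note in the output below that &aString is the address of the pointer variable itself, which is why it lies far away from the addresses of the individual characters. This generates the following output: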

Address: 0xbf8b560f Value: c Size: 1
Address of string: 0xbf8b5600
Size of string: 5
Value: Hello
Index:0 Address:0x8048780 Value:H
Index:1 Address:0x8048781 Value:e
Index:2 Address:0x8048782 Value:l
Index:3 Address:0x8048783 Value:l
Index:4 Address:0x8048784 Value:o
Index:5 Address:0x8048785 Value:

There is actually nothing new here. We handle it the same way as the array of integers, except that we now use 'strlen()', a function declared in string.h (see man 3 strlen for details), to get the length of the string; a char is only one byte large, and we use %s to print the whole string. There is only one magical thing here: how do we know where the string ends? Well, the array is not {'H','e','l','l','o'}, it is {'H','e','l','l','o',0}. A character with the value 0 (the ASCII NUL character) is added after the string. So how does strlen() work? It is just a while loop which keeps increasing the index until the value at that index is 0.
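The same 'walk until you meet the 0' idea works for any function that processes a string. As a sketch, the function below counts how often a given character occurs in a string; count_char is a hypothetical name made up for this example, not a function from string.h.

```c
#include <stddef.h>

/* count_char is a hypothetical helper for illustration.
   It counts how often the char wanted occurs in the string text. */
size_t count_char(const char *text, char wanted)
{
    size_t count = 0;

    while (*text != '\0') {    /* stop at the terminating 0 character */
        if (*text == wanted) {
            count++;
        }
        text++;                /* move on to the next char in memory */
    }
    return count;
}
```

For example, count_char("Hello", 'l') returns 2. If the terminating 0 were missing, the loop would simply keep reading whatever happens to follow the string in memory.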

Structures

Everything's going well. Let's add another thing to the pile: structures. We know arrays by now. Arrays are a collection of items of the same type; structures are a collection of things with different types.

struct aStruct{
	int intMember;
	int * intPointer;
	char charMember;
	char ** stringPointer;
};

This defines a structure called 'aStruct', which combines an integer, a pointer to an integer, a char, and a 'double' pointer (a pointer to a string or a pointer to a pointer to a char). Put this declaration outside your functions. Typically, these are placed in header files. Next we can use this struct; we use the previously defined variables to populate this struct:

struct aStruct aStruct;
struct aStruct * aStructPointer;
printf("Address: %p Size: %zu\n",(void *)&aStruct, sizeof(struct aStruct));
printf("%p %p %p %p\n",(void *)&aStruct.intMember, (void *)&aStruct.intPointer,
(void *)&aStruct.charMember, (void *)&aStruct.stringPointer);
aStruct.intMember=6;
aStruct.intPointer=&anInt;
aStruct.charMember='k';
aStruct.stringPointer=&aString;
aStructPointer=&aStruct;
printf("Member of struct: %d\n", (*aStructPointer).intMember);
printf("Member of struct: %d\n", *(*aStructPointer).intPointer);
printf("Member of struct: %d\n", aStructPointer->intMember);
printf("Member of struct: %d\n", *aStructPointer->intPointer);
printf("Member of struct: %s\n", *aStructPointer->stringPointer);

And the output:

Address: 0xbf8b55e8 Size: 16
0xbf8b55e8 0xbf8b55ec 0xbf8b55f0 0xbf8b55f4
Member of struct: 6
Member of struct: 5
Member of struct: 6
Member of struct: 5
Member of struct: Hello

And what does this teach us? Well, we can declare structures, and we can have pointers to structures. It goes further: we can have arrays of structures, structures can contain arrays, structures can contain other structures, and structures can even contain pointers to structures of the same type (this is how a linked list is built). By using the '.' operator we can access the members of a struct. When we have a pointer to a struct, we would have to dereference it first, as in (*aStructPointer).intMember; since this is so common, we can use the '->' operator instead, as in aStructPointer->intMember. Also, using the double pointer is peanuts. There is, however, one odd thing in the output: it says the size of this struct is 16, while we added one int (4 bytes), one int pointer (4 bytes), one char (1 byte) and one pointer to a char pointer (4 bytes), which adds up to 13. Who stole those three bytes of memory? Well, that is called alignment. The compiler places each member at an address which is a multiple of that member's alignment (4 bytes for the ints and pointers here), since it is much more efficient for the processor to fetch a value which starts at such an address. So it inserted three unused padding bytes after charMember, as the member addresses in the output show: stringPointer starts 4 bytes after charMember, not 1. If you really want to change this, you can: many compilers offer an extension to pack structures (GCC, for example, has __attribute__((packed))).
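To make the linked list idea a bit more concrete, here is a minimal sketch. The type struct node and the functions list_length and list_sum are hypothetical names invented for this example; a real program would usually also allocate and free its nodes dynamically, which is left out here.

```c
#include <stddef.h>

/* struct node is a hypothetical example type: one element of a linked list. */
struct node {
    int value;
    struct node *next;   /* pointer to a struct of the same type; NULL marks the end */
};

/* Counts the nodes by following the next pointers until NULL is reached. */
size_t list_length(const struct node *head)
{
    size_t length = 0;

    while (head != NULL) {
        length++;
        head = head->next;    /* same as (*head).next */
    }
    return length;
}

/* Adds up the value member of every node in the list. */
int list_sum(const struct node *head)
{
    int total = 0;

    while (head != NULL) {
        total += head->value;
        head = head->next;
    }
    return total;
}
```

A list of three nodes can be built from ordinary variables, for example struct node third={30,NULL}; struct node second={20,&third}; struct node first={10,&second};, after which list_length(&first) returns 3 and list_sum(&first) returns 60.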

A word of caution

For all the brave who managed to bear with me this far, my congratulations. I know that the first time people talk about pointers it results in a lot of frowning and thinking 'why would somebody want to use this', but don't panic: you just need a little practice to get up to speed with pointers, and you'll soon see the advantages they bring. But one word of caution is in order: pointers point to 'a' memory location. They can point to any memory location. If you forget to initialize them, or forget to dereference them, you can end up in strange situations. I lost a day this week because I incremented a pointer (which was zeroed afterwards) instead of incrementing the value the pointer pointed to. C will not prevent you from doing these things, but they will often result in your application being terminated, or worse, in it quietly continuing with wrong data. It's the same with arrays: if you write int array[5]; int b; array[6]=0;, you write outside the array (the valid indexes are 0 to 4), and you may well overwrite the value of b or of something else stored nearby. This leads to memory corruption and, in extremis, to stack corruption. So, pointers are very powerful, but you need to use them right.
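As a sketch of how to guard against the two mistakes described above, the functions below check their arguments before touching memory. Both increment_value and store_checked are hypothetical helpers made up for this example, and returning 1 for success and 0 for failure is just one possible convention.

```c
#include <stddef.h>

/* Both functions are hypothetical helpers for illustration. */

/* Increments the int that counter points to.
   Returns 1 on success, 0 when counter is NULL. */
int increment_value(int *counter)
{
    if (counter == NULL) {
        return 0;
    }
    (*counter)++;   /* the parentheses matter: *counter++ would advance the pointer instead */
    return 1;
}

/* Stores value at array[index], but only when index lies inside the array.
   Returns 1 on success, 0 when the store was refused. */
int store_checked(int *array, size_t length, size_t index, int value)
{
    if (array == NULL || index >= length) {
        return 0;
    }
    array[index] = value;
    return 1;
}
```

With int array[5];, the call store_checked(array, 5, 6, 0) returns 0 and leaves the memory around the array alone. C itself does not perform such checks, so the length has to be passed in by the caller.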

Exercises

• Collect all the code snippets on this page and turn them into a working program.
• Try to run this program on a 32-bit and a 64-bit system (use a live CD, for example), and compare the differences.
• Implement strlen yourself using a while loop.
• Take a look at some manpages, for example those of memcpy, strcpy, strcat and memset, and see that all these functions operate on pointers.
• A C application typically has 'int main(int argc, char **argv)' as its main prototype. Here argc contains the number of strings passed to the application, and argv is an array of argc strings. Write a small application which prints all arguments given to the application. What is stored in argv[0]?
