/* Professor Liang's Quick Guide to Unix Programming in C

PART I

The C programming language predates C++. Its main difference from C++ is
the lack of classes and objects. C is today the dominant language for
systems programming on both Microsoft Windows and (especially) Unix
operating systems. I assume you have a working knowledge of basic C++.

Systems programming in C requires an understanding of the structure of
data at a closer level. We will begin, therefore, with an introduction (or
review) of basic data types.

Types

C integer types:

    char, short, int, long
    unsigned char, unsigned short, unsigned int, unsigned long.

On typical machines a char is 8 bits or one byte, a short 2 bytes and an
int 4 bytes. A long is 4 bytes on typical 32 bit machines and 8 bytes on
64 bit Unix/Linux machines. When in doubt, sizeof(long) will tell you.

Note that a "char" is not necessarily an ASCII character like 'A'. A char
is just another name for an 8 bit value. In fact, C (and C++/Java for that
matter) doesn't even distinguish 'A' from 65, which is the ASCII value for
'A'. As 1's and 0's they're one and the same!

An integer value can be interpreted as being signed or unsigned. Modern
computers use the "two's complement" representation of negative numbers.
Basically, this means that if the first (most significant) bit of the
number is 1, the number is negative, and if it's 0, the number is
non-negative. If you declare a variable x to be a signed char, for example
(and on most PCs a plain char is signed), its range of values is -128 to
+127. But if you declare it to be an unsigned char, then its range of
values becomes 0 to 255.

This is important. Let's say you inadvertently did the following:

    char x = 255;

What would the value of x be? Where char is signed, it'll actually be -1!
Because if the bit pattern for 255 (11111111) is interpreted as a signed
number, it will be -1. Both signed and unsigned numbers are but
interpretations of the same ones and zeros.

For systems programming, I advise using unsigned types in general.
Consider what could happen, for example, when you type cast a char (byte)
y to an int: x = (int)y.
With a positive number, this just means that 24 bits of zeros will be
attached to the front of the byte. However, if the char represents a
negative number, then the resulting 32 bit binary sequence must have a one
in the most significant (first) bit. In fact, -1 as a char looks like
11111111. But when type cast to an int, it will become
11111111111111111111111111111111. Similarly, on typical machines type
casting an int to a char keeps only the lowest byte, so the value can
change sign if that byte's first bit is 1 (the int 255 becomes the signed
char -1). You need to think more carefully about the representation of
data than you do in general purpose, high level programming.

Pointers

It is helpful to briefly introduce (or review) pointers when one embarks
on C programming. If a variable is declared as a pointer to a value, such
as in "int * x;" then the value of x is not an integer but the memory
address where the intended integer is stored. There are three basic
pointer-related expressions in C, illustrated in the following example:

    int y = 4;
    int * x;
    x = &y;
    *x = 5;
    // at this point, the value of y and *x are the same (5) and
    // the value of x and &y are the same (same memory address).

The second line declares x as the address of an integer (4 byte value).
The third line contains the expression "&y", which is the address where
the value of y (still 4 at that point) is stored. This address is assigned
to x. In the fourth line, "*x" means the value stored at address x, so the
assignment changes y to 5.

Note that when used in a declaration, as on the second line, the *
indicates that something is a pointer. However, when used in a regular
program expression, the * has another use! *x means "x is an address, but
I don't want the address, I want the value stored at the address".
Similarly, &y means "y is an integer but don't give me the integer, give
me the address where the integer resides". Got it?

Now consider the following function:

    void f(int x)
    { x = x+2; }

What does this function do? You should know that it does nothing.
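Both the four-line pointer example and the claim about f can be checked
with a short sketch like the one below. The functions pointer_demo and
try_f are hypothetical wrappers added here for illustration; only f is
taken from the text.

```c
#include <stdio.h>

// f is copied from the text: it changes only its own local copy of x.
void f(int x) { x = x+2; }

// pointer_demo repeats the four-line pointer example and returns y.
int pointer_demo(void)
{
  int y = 4;
  int *x;     // x holds the address of an int
  x = &y;     // x now points at y
  *x = 5;     // stores 5 at that address, so y itself changes
  printf("y=%d, value at x=%d\n", y, *x);
  return y;   // 5
}

// try_f passes n to f and returns n afterwards, to show f had no effect.
int try_f(int n)
{
  f(n);       // f receives a copy of n
  return n;   // unchanged
}
```

Calling pointer_demo() returns 5, and try_f(3) returns 3: the caller's
value is untouched by f.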
Changing the local variable x has no effect on anything outside the
function. However, say we wrote

    void g(int *x)
    { *x = *x + 2; }

Then if we have:

    int y = 3;
    g(&y);

Then the value of y would be changed to 5 after the function call. This is
because the function g did not modify the local variable x, which is a
memory address. Rather, it modified the value stored at the address, which
is the value of y in the scope from which the function is called.
Therefore, what g does remains in effect even after it returns. This is
how C achieves call by reference (strictly speaking, the address itself is
still passed by value). When we want to write a function that modifies a
variable in C, we must pass in an address. You'll see this used in the
scanf function later.

Arrays and Strings:

C++ and Java have elaborate library routines to deal with strings. In pure
C strings are dealt with at a lower level. A string in C should not be
thought of as something like "abcd". Rather, a string in C is an array of
one byte values - that is to say, an array of chars.

    char buffer[128];

declares an array of 128 chars, which can hold a string of up to 127
characters plus the terminating zero byte described below. Now, it's
important to realize that when you use the name "buffer" in an expression,
its value is in fact the address of the first cell of the array. That is,
buffer acts as a pointer to char. In fact, the following code is
type-consistent:

    char * c;
    char buffer[128];
    c = buffer;

because both c and buffer can be used as pointers to chars. Got it? In
fact, the expression *(buffer+2) is equivalent to buffer[2]: they both
return the value stored at the address 2 units offset from buffer.

In C, strings terminate in a 0 byte (not the character '0' but a byte with
all 0's as its bits). So in the string "hello", there are actually six
bytes, the last being the terminating zero byte. You must take this byte
into account when allocating memory for your string.

I will talk about how to assign values to strings when I address printf
and scanf below.

Structs

Pure C does not have classes.
It does have a primitive way to group data into a single structure called
a struct, or record. Just imagine a class with no methods, no constructor
or destructor, and where everything is public: that's a struct. Here's the
syntax of a structure designed to represent an (x,y) coordinate:

    struct coordinate
    {
      int x;
      int y;
    };

You use structs as in:

    struct coordinate A;  // declares a coordinate struct
    A.x = 3;
    A.y = 4;  // etc ...

Heap Allocation (malloc)

In C++ (or Java), when you need dynamic heap (as opposed to stack) storage
you use the directive "new". This automatically allocates enough memory
for the structure. In pure C, the library function "malloc" (declared in
stdlib.h) allocates memory. But unlike "new", you will have to
specifically tell it how many bytes to allocate. malloc returns a generic
pointer (void *) to the start of the memory block it's given you. C will
convert this pointer automatically on assignment, but it is common (and
required in C++) to type cast it to the specific type of your structure.
Here's an example of a dynamically-allocated coordinate structure:

    struct coordinate *B;  // defines pointer to coordinate struct
    B = (struct coordinate*) malloc(sizeof(struct coordinate));
    B->x = 2;
    B->y = 4;  // etc...

The "sizeof" operator computes how many bytes are needed to represent a
coordinate struct (you can also calculate this value yourself: in the case
of struct coordinate it's typically 8, for two 32 bit integers). To
deallocate memory, you call the "free" function:

    free(B);

Everything you allocate with malloc should be "freed" when no longer
needed.

Input/Output: (printf and scanf)

At first it may appear that C's basic IO functions look strange compared
to "cin" and "cout". But in fact "scanf" and "printf" are very powerful
tools. The "f" in "printf" and "scanf" stands for "formatted". The first
argument to printf is always a string which defines a format. Here's an
example:

    int x = 3;
    int y = 4;
    printf("x is %d and y is %d",x,y);

These statements will print "x is 3 and y is 4".
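Putting the struct, malloc, free and printf pieces above together, here is
a minimal sketch. The function names make_coordinate and show_coordinate
are hypothetical, chosen for illustration, and the NULL check is an
addition: malloc returns NULL when it cannot allocate memory.

```c
#include <stdio.h>
#include <stdlib.h>

struct coordinate
{
  int x;
  int y;
};

// Allocate a coordinate on the heap and fill it in.
// Returns NULL if malloc cannot provide the memory.
struct coordinate *make_coordinate(int x, int y)
{
  struct coordinate *B;
  B = (struct coordinate*) malloc(sizeof(struct coordinate));
  if (B == NULL) return NULL;
  B->x = x;
  B->y = y;
  return B;
}

// Print a coordinate with printf.
// Returns whatever printf returns (the number of characters printed).
int show_coordinate(const struct coordinate *B)
{
  return printf("x is %d and y is %d\n", B->x, B->y);
}
```

A caller would use these roughly as follows, remembering to free what was
allocated:

    struct coordinate *B = make_coordinate(2,4);
    if (B != NULL) { show_coordinate(B); free(B); }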
The '%d' inside the format string tells printf to print in decimal format
the respective parameter after the format string. You can also specify the
printing of single characters (%c), strings (%s) and other numerical
formats.

Scanf works as follows:

    int x;
    scanf("%d",&x);

The first argument to scanf is also a format string, telling it what to
look for (a decimal value). The second and subsequent arguments must be
*pointers*. This is, you should recall, because scanf must set the values
of these variables.

One common mistake beginners make with respect to scanf is to write the
following kind of code:

    char buffer[64];
    scanf("%s",&buffer);  // reads a string in to the buffer

This code is based on a misunderstanding of pointers - a beginner who
wrote this may have been thinking "in order to have a pointer, I have to
use & in front". But in this case buffer already acts as a pointer, and
the correct call is just

    scanf("%s",buffer);

For an array like this one, &buffer happens to be the same address as
buffer, so the code usually appears to work, but its type is wrong (a
pointer to an array of 64 chars rather than a pointer to char) and
compilers will warn about it. If buffer had instead been declared as a
pointer variable (char *buffer), the extra & would cause the string to be
read into memory starting at the address where the pointer variable itself
is stored - which probably will mean that it will overwrite other values
in the current stack frame and cause disaster.

If you do scanf("%s",buffer), scanf will read characters into the string
pointed to by buffer until a white space (space, tab, etc...) is
encountered. Nothing stops it from writing past the end of the buffer, so
it is safer to give a maximum width, as in scanf("%63s",buffer).

printf and scanf have many variants. fprintf and fscanf will write/read to
an IO stream (which can be, say, a disk file or a network socket). sprintf
and sscanf can print to a string and read from a string. These two
functions are useful for string manipulation. For example:

    int x = 3;
    char A[64];  // string of up to 63 chars plus the zero byte
    sprintf(A,"x is %d",x);

will store the value "x is 3" into the string A. Similarly, you can
concatenate two strings with:

    sprintf(A,"%s%s",B,C);

This would make A into the string B followed by C (A must be big enough to
hold both, and must not be B or C itself).

sscanf can be used to convert strings into other formats.
For example, to convert a string into an integer, you do:

    int x;
    sscanf("12","%d",&x);  // x will get 12.

Similarly,

    int a, b, c, d;
    sscanf("147.4.150.248","%d.%d.%d.%d",&a,&b,&c,&d);

will store each byte of the IP address respectively into a, b, c, and d.
Note that the arguments are the addresses of ordinary int variables.
Passing uninitialized int pointers instead would make sscanf write to
unpredictable memory locations.

HINT HINT HINT

On Unix/Linux machines, all built-in C functions have descriptions via man
pages. For example, type "man printf" in a unix shell to see a technical
description of the printf function and its variants.

There are many other IO functions in C. Use the "man pages" to learn about
getc and fgets. (You will also see gets in older code. It cannot limit how
much it reads, so it can overflow the buffer, and it was removed from the
language in the C11 standard.)

Now let's look at a couple of C programs. The first program simply inputs
a string (using the fgets function you're supposed to have looked up) and
prints it backwards character by character. To compile and run a C
program, such as this one, on a Unix machine, type
"gcc cguide.c -o cguide" to compile, then "./cguide" to run, assuming that
the file with the source code is "cguide.c".
*/

#include <stdio.h>   /* note not used in C++ */
#include <stdlib.h>
#include <string.h>

int main()
{
  int i;               // array index and loop counter.
  char strbuffer[64];  // string of up to 63 characters plus the zero byte
  printf("Enter a string: ");  // like cout in C++
  if (fgets(strbuffer, sizeof(strbuffer), stdin) == NULL)  // read a line into buffer
    exit(1);                                  // no input available
  strbuffer[strcspn(strbuffer, "\n")] = 0;    // drop the newline that fgets keeps
  i = (int)strlen(strbuffer) - 1;  // i now index of last char in string
  for(;i>=0;i--) printf("%c",strbuffer[i]);
  printf("\n");
  exit(0);
}

/* The second set of examples declare structures and functions for linked
   lists.  The "typedef" directive is just a way to assign shorter names
   to types. */

typedef struct listcell * cell;  // now writing "cell" is the same as
                                 // writing "struct listcell *"

struct listcell  // a node in a linked list
{
  char head;  // information stored in cell.
  cell tail;  // pointer to next cell/rest of list
};

// The following function encapsulates the process of constructing
// a cell node with head h and tail t.
cell cons(char h, cell t)  // make a list with head h and tail t
{
  cell newcell;
  newcell = (cell) malloc(sizeof(struct listcell));
  newcell->head = h;
  newcell->tail = t;
  return newcell;
}
// for example, to create a list with 'a', 'b' and 'c', use the expression
// cons('a', cons('b', cons('c',NULL)))

void freecells(cell L)  // deallocate memory for each cell
{
  if (L != NULL)
  {
    freecells(L->tail);  // recursively free the cells in the tail first
    free(L);
  }
}

// prints each element of list:
// (%d prints each char as its numeric code; use %c to print the character)
void printlist(cell L)
{
  cell ptr;
  for(ptr=L;ptr != NULL;ptr=ptr->tail)
  {
    printf("%d ",ptr->head);
  }
  printf("\n");
}

// function to convert an array of chars into a list of chars:
cell arraytolist(char *A, int size)  // size=length of array
{
  cell L = NULL;
  int i;
  for(i=size-1;i>=0;i--)  // it has to go backwards, since cons
    L = cons(A[i],L);     // will add cells to the front of the list
  return L;
}

/* To be Continued ... */
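/*
A sketch of how the list functions above fit together. The definitions of
cell, cons, freecells and arraytolist are repeated so that the block
stands alone if saved in its own file; listlength and roundtrip are
hypothetical helpers added here for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct listcell * cell;
struct listcell { char head; cell tail; };

cell cons(char h, cell t)   // same as cons in the guide
{
  cell newcell = (cell) malloc(sizeof(struct listcell));
  newcell->head = h;
  newcell->tail = t;
  return newcell;
}

void freecells(cell L)      // same as freecells in the guide
{
  if (L != NULL) { freecells(L->tail); free(L); }
}

cell arraytolist(char *A, int size)   // same as arraytolist in the guide
{
  cell L = NULL;
  int i;
  for(i=size-1;i>=0;i--) L = cons(A[i],L);
  return L;
}

// hypothetical helper: count the cells in a list
int listlength(cell L)
{
  int n = 0;
  for(;L != NULL;L=L->tail) n++;
  return n;
}

// hypothetical helper: build a list from the first size chars of A,
// print each element with %c, free the list, and return its length
int roundtrip(char *A, int size)
{
  cell L = arraytolist(A,size);
  cell ptr;
  int n = listlength(L);
  for(ptr=L;ptr != NULL;ptr=ptr->tail) printf("%c ",ptr->head);
  printf("\n");
  freecells(L);
  return n;
}
```

For instance, with char A[] = "abc"; the call roundtrip(A,3) builds a
three-cell list, prints its elements in order, frees the cells and
returns 3.
*/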