statements (you did make that quick reference sheet we  recommended in Chapter 2, right?), but you should  quickly become familiar with them. If you find yourself  struggling with this translation, it likely either means that  your description of your algorithm is too vague (in which  case, you need to go back to it, think about what  precisely you meant, and refine it), or that the pieces of  your algorithm are complex, and you are getting hung up  on them, rather than calling a function (as described  above) to do that piece, which you will write afterwards.          The process of taking large, complex pieces and  separating them out into their own functions—known as  top-down design—is crucial as you write larger and larger  programs. Initially, we will write individual functions to  serve a small, simple purpose—we might write one or  two additional functions to implement a complex step.  However, as your programming skill expands, you will  write larger, more complex programs. Here, you may end  up writing dozens of functions—solving progressively  smaller problems until you reach a piece small enough  that you do not need to break it down any further. While it  may seem advantageous to just write everything in one  giant function, such an approach not only makes the  programming more difficult, but also tends to result in a  complex mess that is difficult to test and debug.  Whenever you have a chance to pull a well-defined  logical piece of your program out into its own function,  you should consider this an opportunity, not a burden. We  will talk much more about writing larger programs in  Chapter 13 after you master the basic programming  concepts and are ready to write significant pieces of  code.    4.5.1 Composability
When you are translating your code from your algorithmic  description to C (or whatever other language you want),  you can translate an instruction into code in the same  way, no matter what other steps it is near or what  conditions or repetitions it is inside of. That is, you do not  have to do anything special to write a loop inside of  another loop, nor to write a conditional statement inside  of a loop—you can just put the pieces together and they  work as expected.          The ability to put things together and have them work  as expected is called composability and is important to  building not only programs, but other complex systems. If  you put a for loop inside of an if statement, you do not  need to worry about any special rules or odd behaviors:  you only need to know how a for loop and an if  statement work, and you can reason about the behavior  of their combination.          In general, modern programming languages are  designed so that features and language constructs can  be composed and work as expected. C (and later C++)  follows this principle pretty well, so you can compose  pretty much anything you learn from this book with pretty  much anything else (with one notable exception that we  will discuss in Section 18.7).    4.5.2 Finishing Our Earlier Examples    Now that we have seen how to translate our generalized  steps into code, we can finish our earlier examples. We  will start with the simpler one, the triangle of stars. Here,  we see the code as the translation of our generalized  steps from earlier (which are placed in the code as  comments—each line corresponds to the directions from  the algorithm on the line right before it):
1 //prints i stars            2 void printIStars(int i) {              3 //Count (call it j) from 1 to i (inclusive)            4 for (int j = 1; j <= i; j++) {              5 // Print a star            6 printf(\"*\");          7}          8}              9 //prints a triangle of n stars          10 void printStarTriangle(int n) {            11 //Count (call it i) from 1 to n (inclusive)          12 for (int i = 1; i <= n; i++) {            13 //Print i stars          14 printIStars(i);            15 //Print a newline          16 printf(\"\\n\");        17 }        18 }          Note how we abstracted out printIStars. The  resulting function is small enough and simple enough that  we would have been justified in writing it inline if we saw  exactly how to do it right away. However, there is nothing  wrong with pulling it out into its own function and solving  it separately. In fact, if it is not immediately obvious what  to write to translate it, abstracting it out is exactly what  you should do.          The other example we saw earlier was our rectangle  intersection problem. We can translate these steps into  code as well, using the same approach. Again, we  include the generalized steps as comments and abstract  out the maximum and minimum functions—here, doing so  allows us to avoid writing the code for this functionality  twice, as we need each in two places. Here is the  resulting code:              1 //a rectangle with left, bottom, top, and right            2 struct rect_tag {          3 float left;          4 float bottom;          5 float top;          6 float right;
7 };          8 typedef struct rect_tag rect_t;              9          10 float minimum(float f1, float f2) {            11 //compare f1 to f2          12 if (f1 < f2) {            13 //if f1 is smaller than f2, then f1 is your          14 return f1;        15 }        16 else {            17 //otherwise, f2 is your answer          18 return f2;        19 }        20 }        21 float maximum(float f1, float f2) {            22 //compare f1 to f2          23 if (f1 > f2) {            24 //if f1 is larger than f2, then f1 is your a          25 return f1;        26 }        27 else {            28 //otherwise, f2 is your answer          29 return f2;        30 }        31 }            32 //To find the intersection of two rectangles, r1          33 rect_t intersection(rect_t r1, rect_t r2)            34 //Make a rectangle (called ans) with          35 rect_t ans;            36 //left: maximum of r1’s left and r2’s left          37 ans.left = maximum(r1.left, r2.left);            38 //bottom: maximum of r1’s bottom and r2’s botto          39 ans.bottom = maximum(r1.bottom, r2.bott            40 //right: minimum of r1’s right and r2’s right          41 ans.right = minimum(r1.right, r2.right)            42 //top: minimum of r1’s top and r2’s top          43 ans.top = minimum(r1.top, r2.top);            44 //The rectangle called ans is your answer          45 return ans;        46 }    4.6 A Complete Example                Video 4.1: Writing the isPrime function.
Video 4.1 walks through a complete example—    writing the isPrime function, which takes one integer (N)  and determines if N is prime (in which case, it returns 1)  or not (in which case it returns 0).    4.7 Next Steps: Compiling, Running,  Testing and Debugging    Steps 6 (testing your program) and 7 (debugging it)  require you to be able to run your code. The computer’s  processor does not actually understand the source code  directly, so the source code must first be translated to the  numerically encoded (Everything Is a Number)  instructions that the processor can understand. In the  next chapter, we will discuss this process, as well as  some of the details of what you need to put in your  source file to make a complete program. We will then  discuss testing and debugging in Chapter 6.    4.8 Practice Exercises    Selected questions have links to answers in the back of  the book.    • Question 4.1 : Write a function myAbs, which takes    an integer and returns an integer that is the    absolute value of .    • Question 4.2 : Write a function avg3, which takes    three integers ( , , and ) and returns the    floating point number that is their average. Be    careful—you may need to think back to what you    learned in Section 3.4.2 to be sure you get the    right answer (in particular, if , , and    ), you should get       , not ).
• Question 4.3 : Write a function myRound, which   takes a double and returns an integer. This   function should round to the nearest integer,   and return the rounded result. (Hint: to get the     fractional portion of (the part after the decimal),   think about how you can get the integral portion   (the part before the decimal) using what you   learned in Chapter 3—then think about what   mathematical operation you can use to compute   the fractional portion from the information you   have).  • Question 4.4 : Write a function factorial, which   takes an integer and returns an int that is the   factorial of ( in math notation).  • Question 4.5 : Write a function isPow2, which   takes an integer and returns an int that is 1   (“true”) if is a power of 2 and 0 (“false”) if it is     not. Note that 1 is a power of 2 ( ), and 0 is not a   power of 2. Note: some approaches to this     problem involve computing . In C, if you write     2^i it will NOT compute —instead, it will   compute the bitwise exclusive-or (XOR) of 2 and     . If you want to compute easily, you can write   1 << i (where << is the binary left shift operator—   so it takes the number 1 and puts 0s after it in   the binary representation.  • Question 4.6 : Write a function printBinary, which   takes an integer and returns void. This function   should print the binary representation of the   number . You may assume that the number   has at most 20 bits.  • Question 4.7 : Write a function printFactors,   which takes an integer , and returns void. This   function should print the prime factorization of —   that is, it should print a multiplication expression   composed of prime numbers, which if evaluated
would result in . For example, given the number           132, your function should print 2 * 2 * 3 * 11. If           your function is given a number that is 1 or less,           your function should print nothing. Hint: you should           end up with at least two steps that are too complex           to translate directly into a single line—these should           result in additional functions you write.          • Question 4.8 : Write a function isPerfect, which           takes an integer and determines if is perfect           (hint: the definition of “perfect number” is domain           knowledge in math—if you do not know it, look it           up before you attempt Step 1). If is perfect, your           function should return 1, otherwise it should return           0.          • Question 4.9 : Write a function power, which takes           two unsigned integers, and , and returns an             unsigned integer. This function should compute           and return (that is, to the power).            • Question 4.10 : For any of the previous questions,           take your code, and trade it with a friend. First, you           should examine the syntax of each other’s code—           does it follow the rules we specified in Chapter 2?           If not, explain to each other what you think is not           correct. Next, have her execute your code by hand           (for some reasonable parameter values) while you           execute her code by hand. Determine if each           other’s code is correct. If not, help each other           understand what might be wrong, and figure out           how to fix it.    3 Types5 Compiling and Running                                                   Generated on Thu Jun 27 15:08:37 2019 by L T XML
I Introduction to Programming in C4 Writing Code6 Fixing Your Code:  Testing and Debugging
Chapter 5    Compiling and Running    Once you have written your code, you need to compile it in  order to be able to run it. Compiling a program is the act of  translating the human-readable code that a programmer  wrote (called “source code”) into a machine-executable  format. The compiler is the program that performs this  process for you: it takes your source code as input and  writes out the actual executable file, which you can then run.          At its simplest, compiling your program is a matter of  running the compiler and giving it a command line argument  specifying the .c file with the source code for your program.  There are many different C compilers, but we will be using  GCC, which stands for “GNU Compiler Collection.” If you  wrote your program in a file called myProgram.c, you could  execute the command gcc myProgram.c to compile your  code. Assuming there are no errors in your code, the  compiler would produce an executable program called a.out,  which you could then execute by typing the command  ./a.out. However, if you try this on any of the C code you  wrote in the previous chapters, you will encounter errors—  we need to add a bit more to our code (which we will see  shortly) to make it compile.          If you are not familiar with using a UNIX/Linux  command-line interface, you might want to read Appendix B  for more information about using the command line and  command-line arguments. Typically, you will invoke the  compiler with more arguments than just the name of one  source file. We will discuss those options after we cover  some more about the compilation process.    5.1 The Compilation Process
When you are programming, you will use the compiler all the  time, so it is useful to have a bit of an understanding of what  it does at a high level. The inner workings and details of the  compiler are quite complex and require an in-depth  understanding of many areas of computer science and  engineering, so we will not go into those. Instead, we will just  cover the parts relevant to day-to-day program writing.     Figure 5.1: A high-level view of the process that GCC takes to compile     a program. Light blue boxes represent code you have written; light   green boxes represent built-in parts of C.          Figure 5.1 shows a high-level overview of the process  that GCC goes through to compile the code. In this picture,  the light blue boxes represent code you have written, while  the light green boxes represent built-in parts of C. The  orange clouds indicate steps of this process (each is a  separate program, but GCC invokes these programs for  you), and the white boxes represent intermediate files that  GCC generates to pass information from one stage to the  next. The dark blue box in the upper right represents the  final executable—the program that you can run to make your  computer do whatever the program tells it to do.    5.1.1 The Preprocessor: Header Files and  Defines    The first step (in the upper left) is the preprocessor, which  takes your C source file and combines it with any header  files that it includes, as well as expanding any macros that  you might have used. To help understand this process, we
will look at our first complete C program, which is  algorithmically simple (all it does is print “Hello World”), but  useful for explaining the compilation process.            1 #include <stdio.h>          2 #include <stdlib.h>              3            4 int main(void) {          5 printf(\"Hello World\\n\");          6 return EXIT_SUCCESS;          7}          While the main function’s contents should be mostly  familiar—a call to printf and a return statement—there are  several new concepts in this program. The first two lines are  #include directives (pronounced “pound include” or “hash  include”). These lines of code are not actually statements  that are executed when the program is run, but rather  directives to the preprocessor portion of the compiler. In  particular, these directives tell the preprocessor to literally  include the contents of the named file at that point in the  program source, before passing it on to the later steps of the  compilation process. These #include directives name the file  to be included in angle brackets (<>) because that file is one  of the standard C header files. If you wrote your own header  file, you would include it by placing its name in quotation  marks (e.g., #include \"myHeader.h\"). (This is not a formal  rule but a very common convention.) Preprocessor directives  begin with a pound sign (#) and have their own syntax.          In this particular program, there are two #include  directives. The first of these directs the preprocessor to  include the file stdio.h and the second directs it to include  stdlib.h. These header files—and header files in general—  primarily contain three things: function prototypes, macro  definitions, and type declarations.          A function prototype looks much like a function  definition, except that it has a semicolon in place of the body.  The prototype tells the compiler that the function exists  somewhere in the program, as well as the return and
parameter types of the function. Providing the prototype    allows the compiler to check that the correct number and    type of arguments are passed to the function and that the    return value is used correctly, without having the entire    function definition available. In the case of printf, stdio.h    provides the prototype. The actual implementation of printf    is inside the C standard library, which we will discuss further    when we learn about the linker in Section 5.1.4            Header files may also contain macro definitions. The    simplest use of a macro definition is to define a constant,    such as    1 #define EXIT_SUCCESS      This directive (from stdlib.h) tells the preprocessor to define    the symbol EXIT_SUCCESS to be . Whenever the preprocessor    encounters the symbol EXIT_SUCCESS, it sees that it is defined    as a macro and expands the macro to its definition. In this    case, the definition is just , so the preprocessor replaces    EXIT_SUCCESS with in the source it passes on to the later    stages of compilation. Note that the preprocessor splits the    input into identifiers, and checks each identifier to see if it is    a defined macro, so EXIT_SUCCESS_42 will not expand to 0_42,    since it is not defined as a macro, and the preprocessor will    leave it alone (unless it is defined elsewhere).            Using macro definitions for constants provides a variety    of advantages to the programmer over writing the numerical    constant directly. For one, if the programmer ever needs to    change the constant, only the macro definition must be    changed, rather than all of the places where the constant is    used. Another advantage is that naming the constant makes    the code more readable. The naming of the constant in    return EXIT_SUCCESS gives you a clue that the return value    here indicates that the program succeeded. In fact, this is    exactly what this statement does. The return value from main    indicates the success or failure of your program to whatever    program ran it.
A third advantage of using macro-defined constants is    portability. While 0 may indicate success on your platform—    the combination of the type of hardware and the operating    system you have—it may mean failure on some other    platform. If you hardcode the constant 0 into your code, then    it may be correct on your platform, but it may need to be    rewritten to work correctly on another platform. By contrast, if    you use the constants defined in the standard header files,    then when you recompile your program on the new platform,    it will just work correctly—the header files on that platform    will have those constants defined to the correct values.            The standard header file limits.h contains constants    specifically for portability. These constants describe the    maximum and minimum value of various types on the    current platform. For example, if a program needs to know    the maximum and minimum values for an int, it can use    INT_MAX and INT_MIN, respectively. On platforms where an    int is 32 bits, these would be defined like this:    1 #define INT_MAX 2147483647  2 #define INT_MIN -2147483648            On a platform with a different size for the int type, these    would be defined to whatever value is appropriate for the    size of the int on that platform.            Macros can also take arguments; however, these    arguments behave differently from function arguments.    Recall that function calls are evaluated when the program    runs, and the values of the arguments are copied into the    function’s newly created frame. Macros are expanded by the    preprocessor (while the program is being compiled, before it    has even started running), and the arguments are just    expanded textually. In fact, the macro arguments do not    have any declared types, and do not even need to be valid C    expressions—only the text resulting from the expansion    needs to be valid (and well typed).
We could (though as we shall see shortly, we shouldn’t)    define a SQUARE macro as follows:    1 #define SQUARE(x) x * x            The preprocessor would then expand SQUARE(3) to    3 * 3, or SQUARE(45.9) to 45.9 * 45.9. Note that here, we    are using the fact that macro arguments do not have types to    pass it an int in the first case and a double in the second    case. However, what happens if we attempt SQUARE(z - y)?    If this were a function, we would evaluate z - y to a value    and copy it into the stack frame for a call to SQUARE; however,    this is a macro expansion, so the preprocessor works only    with text. It expands the macro by replacing x in the macro    definition with the text z - y, resulting in z - y * z - y. Note    that this will compute z - (y * z) - y, which is not z - y    squared.            We could improve on this macro by defining it like this:    1 #define SQUARE(x) ((x) * (x))      Now, SQUARE(z - y) will expand to ((z - y) * (z - y)),    giving the correct result irrespective of the arguments and    location of the macro. We can still run into problems if the    macro argument has side effects. For example, if we type    SQUARE(f(i)), then the function f will be called twice. If f    prints something, it will be printed twice, once for each call to    f in the macro expansion.            The SQUARE macro is a bit contrived—we would be better    to write the multiplication down where we need it—but    highlights the nature (and dangers) of textual expansion of    macros. Section E.3 contains much more information about    the C preprocessor and macros. They are quite powerful and    can be used for some rather complex things, but you will not    need them for most things you write in this book.            Header files may also contain type declarations. For    example, stdio.h contains a type declaration for a FILE type,
which is used by a variety of functions that manipulate files.  The functions also have their prototypes in stdio.h, and we  will discuss them in Chapter 11 when we learn about  accessing files and reading input.          Another example of type declarations in standard  header files is the integer types in stdint.h. As mentioned in  Chapter 3, integers come in different sizes, and the size of  an int varies from platform to platform. Often programmers  do not care too much about the size of an int, but  sometimes using a specifically-sized int is important.  stdint.h defines types such as int32_t (which is guaranteed  to be a 32-bit signed int on any platform) or uint64_t (which  is always a 64-bit unsigned int).          With all of that in mind, we can revisit our Hello World  program from earlier:            1 #include <stdio.h>          2 #include <stdlib.h>              3            4 int main(void) {          5 printf(\"Hello World\\n\");          6 return EXIT_SUCCESS;          7}          The preprocessor would take this code and basically  transform it into this:            1 int printf(const char * , ...);              2            3 int main(void) {          4 printf(\"Hello World\\n\");          5 return 0;          6}          In actuality, the result of the preprocessor is a few  thousand lines long, since it includes the entire contents of  stdio.h and stdlib.h, which have many other function  prototypes and type declarations. However, these do not  affect our code (the compiler will know those types and  functions exist, but since we do not use them, they do not  matter). We will note that the prototype for printf contains a
few features that we have not seen yet. You can think of  const char * as being the type for a literal string, and the ...  means that printf takes a variable number of arguments—a  feature of C that we will not delve into beyond using it to call  printf and similar functions.    5.1.2 The Actual Compiler    The output of the preprocessor is stored in a temporary file  and passed to the actual compiler. At first, it may seem a bit  confusing that one of the steps of the compilation process is  the actual compiler. However, we often refer to complex  processes by their eponymous step. For example, if you say  “I am going to bake a cake,” even though only part of the  process involves actually baking the cake (cooking the cake  batter in the oven), we still understand that you will go  through the entire process: finding a recipe, getting the  ingredients, mixing the batter, baking the cake, letting it cool,  and icing it.          The compiler reads the pre-processed source code—  which has all the specified files included and all macro  definitions expanded—and translates it into assembly.  Assembly is the lowest-level type of human-readable code.  In assembly, each statement corresponds to one machine  instruction. Machine instructions correspond to very simple  operations, such as adding two numbers or moving a value  to or from memory. While humans can program in assembly  (sometimes with good reason), that is not our focus here—  we just want to understand enough of the compilation  process to know what is going on when we compile our  programs.          The compiler is a rather complex program. Its first task  is to read in your program source and “understand” your  program according to the rules of C—a task called parsing  the program. In order for the compiler to parse your program,  the source code must have the correct syntax (i.e., as we
described in Chapter 2). If your code is not syntactically  correct, the compiler will print an error message and attempt  to continue parsing but may be confused by the earlier  errors. For example, the following code has one error—a  missing semicolon after int x = 3 on line 5:            1 #include <stdio.h>          2 #include <stdlib.h>              3            4 int main(void) {          5 int x = 3          6 int y = 4;          7 printf (\"%d\\n\", x + y);          8 return EXIT_SUCCESS;          9}          However, compiling the code results in two errors:       err1.c: In function ’main’:     err1.c:6:5: error: expected ’,’ or ’;’ before ’int’                int y = 4;              ^     err1.c:7:25: error: ’y’ undeclared (first use in     this function)              printf (\"%d\\n\", x + y);                                                     ^     err1.c:7:25: note: each undeclared identifier is     reported only once     for each function it appears in    The first error (expected ’,’ or ’;’ before ’int’) correctly  identifies the problem with our code—we are missing a  semicolon before the compiler sees int at the start of line 6  (it should go at the end of line 5, but the compiler does not  know anything is wrong until it sees the next word). The next  error (’y’ undeclared) is not actually another problem with  the code, but rather a result of the compiler being confused  by the first error—the missing semicolon caused it to  misunderstand the declaration of y as a part of the previous  statement, so it does not recognize y later in the code.          Compiler errors can be a source of confusion and  frustration for novice programmers. One key rule in dealing
with them is to try to fix them in order, from first to last. If you  find an error message (other than the first one) confusing,  you should fix the first error, then retry compiling the code to  see if the error goes away—it may just be a result of the  compiler being confused by earlier errors.          Dealing with Compilation               Errors: Tip 1        Remember that the compiler can get    confused by earlier errors. If later errors   are confusing, fix the first error, then try to   recompile before you attempt to fix them.          Another source of confusion in dealing with compiler  errors is that the error message may not do a good job of  describing the problem. The compiler tries its best to  describe the problem but may guess incorrectly about what  you were trying to do or refer to unfamiliar possibilities. For  example, if we have a slightly different variation on the  missing semicolon problem from the previous example:            1 #include <stdio.h>          2 #include <stdlib.h>              3            4 int main(void) {          5 int x          6 int y;          7 x = 3;          8 y = 4;          9 printf(\"%d\\n\", x + y);        10 return EXIT_SUCCESS;        11 }          The error here is still a missing semicolon after the  declaration of x on line 5; however, the compiler gives the  following error messages:       err2.c: In function ’main’:     err2.c:5:5: error: ISO C forbids nested functions [-
Werror=pedantic]              int x              ^       err2.c:6:5: error: expected ’=’, ’,’, ’;’, ’asm’ or     ’__attribute__’     before ’int’                int y;              ^     err2.c:7:5: error: ’x’ undeclared (first use in this     function)              x = 3;              ^     err2.c:7:5: note: each undeclared identifier is     reported only once for     each function it appears in     cc1: all warnings being treated as errors    These error messages may appear quite intimidating, since  they refer to several language features we have not seen  (and which are all irrelevant to the problem). The first error  message here is actually completely irrelevant, other than  that it is on the right line number. The compiler first guesses  that we might be trying to use a non-standard language  extension (extra features that one particular compiler allows  in a language) that allows us to write one function inside of  another. This extension is forbidden by this version of the  language standard. The compiler is trying to be helpful, even  though it is wrong. In general, if an error message is  referencing something completely unfamiliar, it is likely that  the compiler is confused or offering suggestions that are not  relevant to what you are trying to do.          Dealing with Compilation               Errors: Tip 2    If parts of an error message are completely  unfamiliar, try to ignore them and see if the  rest of the error message(s) make sense. If  so, try to use the part that makes sense to  understand and fix your error. If not, search
for the confusing parts on Google and see             if any of them are relevant.          Looking at the next error message, we see that it is on  the same line number (6), so the compiler may be trying to  give us more or different information about the same  problem. Here, while there are some unfamiliar things (such  as asm or __attribute__), the rest of the error message  makes sense if we ignore these. That is, we could  understand the error message if it were:       err2.c:6:5: error: expected ’=’, ’,’, or ’;’ before     ’int’    The compiler has told us it expects an equals sign, comma,  or semicolon before we get to the word int on line 6. In this  case, we are missing a semicolon, but the compiler is  suggesting other possibilities (e.g., we could have had an =  had we been writing int x = 3;). The asm and __attribute__  suggestions it provided could also have been options, but  these are very advanced language features, which we will  not cover.          Another source of confusion in dealing with error  messages can arise from the fact that the compiler may not  notice the error immediately. As long as the source code  seems to obey the rules of C’s syntax, the compiler assumes  everything is correct, even if its parse does not agree with  what you meant. In the following code, the actual error is  omission of a close curly brace on line 9:            1 #include <stdio.h>          2 #include <stdlib.h>              3            4 int main(void) {          5 for (int i = 0; i < 10; i++) {          6 for (int j = 0; j < 10; j++) {          7 printf(\"%d\\n\", (i + 3) * (j + 4));          8}              9          10 return EXIT_SUCCESS;        11 }
However, the compiler does not recognize the error  immediately. Having a return statement inside the for loop is  syntactically valid. The closing brace after it on line 11—  which the programmer intended to be the end of main is also  legal, but the compiler parses it as the end of the first for  loop. The compiler then discovers that something is not right  when it reaches the end of the file and the body of main has  not ended, reporting the error as:       err3.c: In function ’main’:     err3.c:11:1: error: expected declaration or     statement at end of input         }       ^          Dealing with Compilation               Errors: Tip 3       Programmer’s editors are very good at    helping you find mismatched braces and     parentheses. Most will indent the code  according to the nesting level of the braces   it is inside and will show you the matching   brace/parenthesis when your cursor is on         one of the pair. Using these sorts of     features can be the easiest way to find         errors from mismatched braces and                      parentheses.          Once the compiler has parsed your program, it type-  checks the program. Type-checking the program involves  determining the type of every expression (including the sub-  expressions inside of it) and making sure that they are  compatible with the ways they are being used. It also  involves checking that functions have the right number and  type of arguments passed to them. The compiler will  produce error messages for any problems it discovers during  this process.
Another important note about fixing compiler errors is  that sometimes fixing one error will let the compiler progress  further along, resulting in more errors. Novice programmers  who are not confident in their correction of their first error  may undo the fix, thinking they made things worse, even  though the fix may have been correct—the extra errors just  came from progressing further through the process.          Dealing with Compilation               Errors: Tip 4     Be confident in your fix for an error. If you   do not understand what is wrong and how      to fix it, find out and be sure rather than             randomly changing things.          Both Appendix F and the web page  http://aop.cs.cornell.edu/errors/index.html provide more  details on a variety of error messages and will likely prove  useful in helping you diagnose errors you may encounter.          Once the compiler finishes checking your code for  errors, it will translate it into assembly—the individual  machine instructions required to do what your program says.  You can ask the compiler to optimize your code—transform it  to make the resulting assembly run faster—by specifying the  -O option followed by the level of optimization that you want.  Programs are usually compiled with no optimizations for  testing and debugging (the code transformation makes the  program very difficult to debug) and then re-compiled with  optimizations once they are ready for release/use. Typically,  real programs compiled with GCC are compiled with -O3  when they are optimized. GCC provides a variety of other  options to control optimization at a finer level of detail;  however, they are beyond the scope of this book. We are  strictly concerned with writing code that works, so you do not  need to bother with the -O flag for any of the exercises you  do in this book.
5.1.3 Assembling    The next step is to take the assembly that the compiler  generated and assemble it into an object file. GCC invokes  the assembler to translate the assembly instructions from the  textual/human-readable format into their numerical  encodings that the processor can understand and execute.  This translation is another example of “Everything Is a  Number”—even the instructions that the computer executes  are numbers.          While it is possible to get error messages at this stage, it  should not happen for anything you will try to do in this book.  Generally errors here are limited to cases in which you  explicitly write the specific assembly-level instructions that  you want into your program (which is possible in C, but  limited to very advanced situations) and make errors in  those.          The important thing to understand about this step is that  it results in an object file. The object file contains the  machine-executable instructions for the source file that you  compiled but is not yet a complete program. The object file  may reference functions that it does not define (such as  those in the C library or those written in other files). You can  request that GCC stop after it assembles an object file by  specifying the -c option. By default, the name of the object  file will be the name of the .c file with the .c replaced by .o.  For example, gcc -c xyz.c will compile xyz.c into xyz.o. If  you wish to provide a different name for the object file, use  the -o option followed by the name you want. For example,  gcc -c xyz.c -o awesomeName.o will produce an object file  called awesomeName.o.          This ability to stop is important for large programs where  you will split the code into multiple different C files. Splitting  the program into large files is primarily useful to help you  keep the code split into manageable pieces. However, each  source file can be individually compiled to an object file, and
those object files can then be linked together (as we will  discuss in the next section). If you change code in one file,  you can recompile only that file (to generate a new object file  for it) and re-link the program without recompiling any of the  other source files. For programs you write in this class, this  will not make a difference. However, when you write real  programs, you may have tens or hundreds of thousands of  lines of code split across dozens or hundreds of files. In this  case, the difference between recompiling one file and  recompiling all files (especially if optimizations are enabled)  may be the difference between tens of seconds and tens of  minutes.    5.1.4 Linking    The final step of the process is to link the program. Linking  the program takes one or more object files and combines  them together with various libraries, as well as some startup  code, and produces the actual executable binary. The object  files refer to functions by name, and the linker’s job is to  resolve these references—finding the matching definition. If  the linker cannot find the definition for a name (called a  “symbol”) that is used, or if the same symbol is defined  multiple times, it will report an error.          Linker errors—indicated by the fact that they are  reported by ld (the linker’s name)—are typically less  common than other compiler errors. If you encounter an  unresolved symbol error, it means that you either did not  define the symbol, did not include the object file that defines  the symbol in the link, or that the symbol was specified as  only visible inside that object file (which can be done by  using the static keyword—a feature we will not delve into).  If you encounter errors from duplicate definitions of a  symbol, first make sure you did not try to name two different  functions with the same name. Next, make sure you did not  include any files twice on the compilation command line.  Finally, make sure you do not #include a .c file—only header
files—and that you only include function prototypes in the  header file, not the function’s definition (there are some  advanced exceptions to these rules, but if you are at that  stage, you should understand the linking process and static  keyword well enough to fix any problems).          Sometimes you may wish to use an external library—a  collection of functions that are already written and packaged  up to use for some purpose (e.g., drawing graphics, playing  sound, advanced math, or many other purposes). The C  library is one such example, which is linked in by default. For  other libraries, you must specifically request that the linker  link with the library with the -l command line option,  specifying the name of the library right after the l. If all goes  well, the linker will resolve the all the symbol references and  combine the various object files and libraries together into an  executable binary.    5.1.5 Building Large Programs: make    The programs you will write as exercises in this book will be  relatively small—at most a hundred lines and a couple of  files. For programs of this size, recompiling the entire  program takes a few seconds. Real programs tend to be  quite a bit larger. As the program size increases, so does the  compilation time. As an example, one program changed,  compiled, and run frequently by one of the authors has about  40,000 lines of code, 90 files, and takes about 75 seconds to  compile completely.          While 75 seconds may not sound long, it is practically  an eternity in the typical development cycle. Programmers  recompile their code dozens to hundreds of times per day,  meaning that all of those minutes add up—waiting 75  seconds 50 times in a day adds up to a bit more than an  hour, or an eighth of a standard work day. This large of an  overhead in the development cycle would be prohibitively
large (despite its ability to provide convenient excuses for  slacking off: e.g., http://xkcd.com/303/).          Fortunately, most of the time, one does not need to  recompile all of the source code if the object (.o) files from  previous compilations are kept. Instead, only the file (or files)  that were changed need to be recompiled, then the program  needs to be linked again. As mentioned in the previous  sections, compiling each source file to an object file is  accomplished with the -c option, and the various object files  can then be listed on the command line to GCC in order to  pass them to the linker. For comparison, recompiling one  source file then relinking the same program described above  takes about one second—much faster than recompiling the  whole thing.          However, orchestrating this process manually is tedious  and error prone. If the programmer forgets to recompile a  file, then the program will not work correctly, possibly in  strange ways. Doing this manually not only requires the  programmer to remember all files that were changed, but  also which files included with #include were changed. For  example, if the programmer changes myHeader.h, then any  file that #includes myHeader.h must also be recompiled, as  the source files’ contents have effectively changed.          Long ago, programmers developed the Make utility to  not only automate this process, but also to simplify compiling  programs in general. The make command reads a file called  Makefile (though you can ask it to read an input file by a  different name), which specifies how to compile your  program. Specifically, it names the targets that can be made,  their dependencies, and the rules to make the target.          When make is run, it starts from a particular target—  something that it can build out of other things. A common  starting target might be your entire program. Make first  checks if the target is up-to-date. Checking that a target is  up-to-date requires checking that all of the files the target
depends on are themselves up-to-date and that none of  them are newer than the target. For example, the target for  the whole program may depended on many object files. If  any such file is not itself up-to-date (for example, the object  file may depend on a .c file that just got changed), it is  rebuilt first. Make then follows whatever rule was specified  for the target to build it from its dependencies. If the target is  already up to date, then it does nothing.          Using Make to compile the program is also useful when  the command line to compile the program becomes  complex, especially when other people may need to do the  compilation. Large complicated programs may require linking  with several libraries, as well as a variety of other command-  line options. Trying to remember all of these and type them  in every time you compile is painful. Even worse, many real  programs need to be compiled by other people—whether it  is multiple members of a large development team or people  who you distribute the program to. Expecting these other  people to figure out the commands required to make your  program leads to much frustration for them. Providing a  Makefile allows anyone to build the program simply by  typing make.          Appendix D explains Make in more detail: how to write a  Makefile and a few advanced features.    5.2 Running Your Program    Now that you have compiled your program, you are ready to  run it. You can run your program much like any other  program—by typing its name at the command prompt.  However, unlike many programs you have used so far, the  directory in which your program is located is not part of the  PATH—the environment variable that specifies which  directories to look in to run a program. Therefore, you have  to specify which directory to find the program in as part of  the name. Since the program is in the current directory, you
can just put ./ before the program’s name (for  example./myProgram)—telling the command shell to run the  myProgram program in the current directory. If you are  unfamiliar with the command shell, directories, path names,  or environment variables, you can read more about them in  Appendix B.          You can also run your program from inside various tools  that are intended to help you test and debug the program.  Two incredibly useful tools are GDB (the GNU Debugger)  and Valgrind (pronounced “val-grinned” with a short ‘i’). For  more information about GDB, we will discuss debugging in  Chapter 6 and GDB in particular in Section D.2. Valgrind  emulates the computer but tracks more information about  what your program is doing to report errors that may  otherwise go undetected (see Section D.3).          Valgrind is particularly good at finding errors in your  program that did not manifest simply because you got lucky  when you ran it. For example, recall from Chapter 2 that a  variable does not have a value until you assign it one—we  just draw a ? in its box to indicate the value is unknown. If  your program has a variable that is used before it is  assigned any value—called an uninitialized variable—the  variable will evaluate to whatever value was previously  stored into the location that the variable resides in. You may  get “lucky” on what value the uninitialized variable ends up  holding. In fact, you may get “lucky” thousands of times and  not see any problem but then get unlucky when it actually  matters. This may seem highly improbable—after all, with  billions of possible values that the variable could end up  being, what are your chances of getting lucky? However, the  value that you end up with is not random, it is whatever was  in that storage location previously—for example, the value  that some other variable had when it went out of scope.          When you run your program directly on the computer, it  does not explicitly track whether a variable has been  initialized—the compiler generates instructions that do
exactly what your program specifies, and the computer does  exactly what those instructions tell it to. When you run your  program in Valgrind, however, Valgrind explicitly tracks which  variables are initialized and which are not. If your program  uses the uninitialized value in certain ways, Valgrind will  report an error to you (it does not do this in all cases for a  variety of reasons beyond the scope of this discussion—  however, suffice it to say that it will catch the cases where it  really matters).          Valgrind is useful for detecting many other problems  with your program that you might not otherwise discover  easily. We highly recommend running your program in  Valgrind whenever you are testing your program. We will talk  about testing in much more detail in Chapter 6, but for now,  you will want to run your program to see if it works. When we  learn how to make programs that work with various forms of  input, you will want to test your program on a variety of  inputs to become more and more confident that it works.              Video 5.1: Writing, compiling, and running the                           “Hello World” program.          Video 5.1 ties everything we have talked about so far in  this chapter with the “Hello World” example. The video  shows writing the code, compiling it, and then running it  directly as well as in Valgrind. In the video, we use some  compiler options, which we recommend you use in general,  and will discuss momentarily.    5.3 Compiler Options    Usually, you will want to name the resulting program  something meaningful, rather than the default a.out. To  change the output file name, use the -o option, and specify  the output file name after it. For example, gcc -o myProgram  myProgram.c would compile myProgram.c (as above), but
instead of producing a.out, it would name the program  myProgram.          Another option you will commonly want to use is --  std=gnu99. This option specifies that the compiler should use  the C99 standard with GNU extensions. There are actually a  few different versions of C (referred to as “standards”  because they reflect a standardization of features). C99 will  match what we describe in this book, and is generally a  reasonable standard to program in.          Another useful pair of options are -Wall and -Werror.  The first of these requests that the compiler issue warnings  for a wide range of questionable behavior. As we discussed  in Section 5.1.2, the compiler checks your program for  certain kinds of errors. If it detects errors, it reports them to  you and requires you to fix them before proceeding.  Warnings are like errors in that the compiler will report a  problem that it has detected; however, unlike errors, the  compiler will continue and produce a program even if it  warned you about something. The -Werror option tells the  compiler to treat all warnings as errors—making it refuse to  compile the program until the programmer fixes all the  warnings.          Novice programmers often wonder why one would want  to enable warnings to begin with, and especially to ask the  compiler to convert warnings to errors. Errors prevent your  program from compiling, so it seems like any way to avoid  them and get to the compiled binary is a good thing.  However, a programmer’s goal is not just to produce any  executable program, but one that works correctly at hand.  Any time the compiler can alert you to potential problems,  you can fix them much more easily than if they lead to  problems you have to debug. A good analogy is purchasing  a house. Think of the compiler as your inspector for the  house. While you are eager to purchase and move into your  new house, you would much rather have the inspector
discover a potential problem before you buy the house and  move in.          We strongly recommend compiling with at least the  following warning options:       -Wall -Wsign-compare -Wwrite-strings -Wtype-limits -     Werror    These options will help catch a lot of mistakes, and should  not pose an undue burden on correctly written code.          Recent versions of GCC also support an option -  fsanitize=address that will generate code that includes extra  checking to help detect a variety of problems at runtime.  Using this option is also strongly recommended during your  development cycle. However, we will note that you typically  cannot run a program built with this option in Valgrind (see  Section D.3 for information about Valgrind—we strongly  encourage you use it). The two tools detect different, but  overlapping sets of problems, so use of both is a good idea  —they just have to be used separately.    5.4 Other Possibilities    The process of compiling and running a program that we  have described here is not the only way to execute a  program, although it is fairly typical for C and C++ programs.  An alternative to compilation is to interpret the program.  Instead of translating the program into machine instructions,  the source code is given to an interpreter—a program that  reads the source of another program and performs the  actions specified in it. Interpreters and compilers have much  in common—they must both read and analyze the source  code of a program—the key difference is what they do with  the program. A compiler translates it into an executable the  runs directly on the computer. An interpreter executes the  effects of the program. Some interpreters allow for
interaction via a read-eval-print-loop (REPL)—that is, you  can type a bit of source code, which the interpreter reads as  input (read); it then evaluates it (eval), prints the result  (print), and repeats the process (loop).          Many people will say that a language is compiled or  interpreted (e.g., C is compiled, but Python is interpreted),  which is rather imprecise. Any programming language could  have a compiler and/or interpreter written for it. A more  precise version of this statement would be that a particular  language is typically compiled, or another particular  language is typically interpreted. C is typically compiled;  however, there exist interpreters for it. Likewise, as we  discuss in Chapter 32, Python is usually interpreted,  although there are also compilers for it.          In some cases, the two can be mixed. Typically Java is  compiled into instructions for the Java Virtual Machine (JVM)  —an instruction set typically exectued by an interpreter, not  directly by a computer. Most JVMs perform just-in-time (JIT)  compilation, meaning that as they interpret the program, they  also compile the program into directly executable code in  their own memory (remember: “Everything Is a Number,”  including computer instructions), which it can then execute  the next time it encounters the same piece of code. JIT  provides speed advantages over purely interpreting the  code.    5.5 Practice Exercises    Selected questions have links to answers in the back of the  book.            • Question 5.1 : Write, compile, and run a program that           prints out your name (followed by a newline).          • Question 5.2 : Write, compile, and run a program that           prints out all of the numbers from 1 to 1000 (inclusive,           each on their own line).
• Question 5.3 : Write, compile, and run a program     that, for each number from 1 to 1000 (inclusive),     prints out that number, a colon, a space, then the     square of that number, one per line. The first line of     your output should be 1: 1, the second line should be   2: 4, and so on through 1000: 100000.  • Question 5.4 : Take your isPow2 function from   Question 4.8 in Chapter 4, and write it in a file called     powers.c. Write a main function that prints out all of the   powers of 2 less than 20,000,000 (20 million).     Compile and run your program.    • Question 5.5 : Take your printBinary function to print   a number in binary from Question 4.8 in Chapter 4,     and write it in a file called printBinary.c. Write a main   function that uses the printBinary function to print the   binary representation of all of the numbers from 0 to     1000 (inclusive). Compile and run your program.    • Question 5.6 : Take your printFactors function to   print the prime factors from Question 4.8 in Chapter 4,     and write it in a file called printFactors.c. Write a   main function that uses the printFactors function to   print the prime factors of all of the numbers from 2 to     1000 (inclusive). Compile and run your program.    • Question 5.7 : Take your power function to compute          from Question 4.8 in Chapter 4, and write it in a     file called power.c. Write a main function that uses the     power function to print a table of for all from   between 0 and 5 (inclusive) and all between 0 and     16 (inclusive). Why do the results for and   look strange (hint: think back to Chapter 3)?  • Question 5.8 : Find a type that would allow your    power function to work correctly on  and , and    change your code for the previous problem to use this    type. You will probably need to look at the man page    for printf to find the correct format specifier to print    the result.    • Question 5.9 : Run Valgrind on all of the programs    you have written for the other questions in this
chapter. If it identifies any problems with your code, fix           them.    4 Writing Code6 Fixing Your Code: Testing and Debugging                                                          Generated on Thu Jun 27 15:08:37 2019 by L T XML
I Introduction to Programming in C5 Compiling and Running7  Recursion
Chapter 6    Fixing Your Code: Testing  and Debugging    Once you have written and compiled your code, you are  ready to test and debug it. These two steps are closely  related, but distinct. Testing is the process of trying to find  problems (“bugs”) in your code, and debugging is the  process of fixing those bugs.          Both of these skills are crucial to good programming  —what good is writing code if it does not work correctly?  Testing your code identifies the problems you need to fix  and then gives you increasingly more confidence that the  code works as you expect it to. Once you have found a  problem, debugging is the key skill to let you fix the  problem in a disciplined manner, rather than just being  frustrated by randomly changing things to no avail.          Many novice programmers (and even many  experienced programmers) underestimate the time  required for testing. Proper testing requires not only  testing on a variety of cases, but also analyzing your  code and thinking carefully about what test cases you  need to write. Unfortunately, testing is easy to shirk—  when you are under time pressure, it may seem  worthwhile to just dash together a couple test cases, then  move on. However, in large software projects (or even  modest-sized ones), “moving on” will typically involve  writing code that uses the functions you just wrote. If the  functions you wrote are broken, then your next work
starts from a poor foundation, and you will just spend  longer trying to make it work.          Debugging is another area where many  programmers underestimate the time required. It is often  hard to say how long it will take to find one particular bug,  as you often do not know how complex the problem is  until you have solved it. To make matters worse,  inexperienced programmers will often rush ahead—trying  to debug quickly rather than in a disciplined manner.  Such activities follow the old saying “haste makes waste.”  A programmer may try to make changes in the hopes that  it will fix a problem, without investing the time to fully  understand the problem and devise a correct solution.    6.1 Step 6: Test Your Code    Testing is all about finding bugs in your code. A good test  case is one that identifies a problem. While this may  seem unintuitive, consider the analogy of going to the  doctor: while it may be great to be told you are healthy, if  there is something wrong with you, the test that identifies  that problem (so that the doctor can fix it) is the most  important test. Similarly, with your program, you want to  identify and fix the problem rather than think it works  when it does not. In the case of a class assignment, you  can think of this as preferring to find and fix a problem  before you turn the assignment in, rather than think it  works and receive a poor grade.          In “the real world,” the consequences become more  serious: bad code kills people. As our world becomes  more and more automated—software controls critical  flight systems on aircraft, cars are beginning to drive  themselves, etc.—it is easy to imagine how software  bugs can cause fatal catastrophes, but you do not even  need to imagine these possibilities, as there have already
been people killed by software bugs. Examples include  the Therac-25 (a radiation therapy machine whose  software bugs overdosed and killed patients), the Patriot  Missile defense system (a software bug caused the  system to fail intercept a Scud missile in the Gulf War: 28  people died, and many more were injured), and the crash  of a Spanish military transport plane (the software to  control the engines malfunctioned and prevented three of  the four engines from delivering enough power to keep  the plane in the air; four people died, and two were  injured).          Even when software bugs are not fatal to humans,  they can cause other types of catastrophes: a $400  million trading error, exploding rockets, a phone system  crash, a blackout of most of the east coast, and the Mars  Climate Orbiter crash were all the result of bugs in  software. The Heartbleed bug presented the possibility of  hackers gaining information from and access to affected  systems (https://xkcd.com/1354/ provides a nice high-  level explanation of the problem. Once you finish the next  few chapters, you could understand a more technical  explanation).          One of the keys to testing your code well—both  thoroughly and in a manageable fashion—is to test  incrementally. Write one function, then test it well before  moving on. Ideally, you should be completely confident  that a function you wrote works before you use it.          Note that while we discussed a top-down design  methodology in Chapter 4, incremental testing is naturally  a bottom-up approach. These two techniques do not  conflict with each other—if you write code for function A,  and end up needing to write functions B and C as part of  it, you should test B and C before testing A.
6.1.1 Black Box Testing    The testing methodology that most people think of first is  black box testing. In black box testing, the tester  considers only the expected behavior of the function—not  any implementation details—to devise test cases. The  lack of access to the implementation details is where this  testing methodology gets its name—the function’s  implementation is treated as a “black box,” which the  tester cannot look inside.          Black box testing does not mean that the tester  thinks of a few cases in an ad hoc manner and calls it a  day. Instead, the tester—whose goal is to craft test cases  that are likely to expose errors—should contemplate what  cases are likely to be error-prone from the way the  function behaves. For example, suppose you needed to  test a function to sum a list of integers. Without seeing  the code, what test cases might you come up with?          We might start with a simple test case to test the  basic functionality of the code—for example, seeing that  the function gives an answer of 15 when given an input of  {1, 2, 3, 4, 5}. However, we should also devise more  difficult test cases, particularly those that test corner  cases—inputs that require specific behavior unlike that of  other cases. In this example, we might test with the  empty list (that is, the list with no numbers in it) and see  that we get 0 as our answer. We might make sure to test  with a list that has negative numbers in it or one with  many very large numbers in it. Note that Practice  Exercise 6.3 asks you to contemplate what sorts of errors  these test cases might expose.          Observe that we were able to contemplate good test  cases for our hypothetical problem without going through  Steps 1–5. You can actually come up with a set of black
box tests for a problem before you start on it. Some  programmers advocate a test-first development  approach. One advantage is that if you have a  comprehensive test suite written before you start, you are  unlikely to skimp on testing after you implement your  code. Another advantage is that by thinking about your  corner cases in advance, you are less likely to make  mistakes in developing and implementing your algorithm.          Writing good tests cases requires a lot of thought:  you need to think about a wide variety of things that can  go wrong. Here are some suggestions to guide your  thinking, as well as important lessons. First, some  thoughts for testing error cases:            • Make sure your tests cover every error case.           Think about all the inputs the program cannot           handle, i.e., ones where you would expect to           receive an error message. If the program requires           a number, give it xyz, which is not a valid number.           An even better test case might be 123xyz, which           starts out as a number but isn’t entirely a number.           How the program should handle this depends on           the requirements of the program, but you want to           make sure it handles it correctly.          • Be sure to test “too many” as well as “too few.” If a           program requires exactly things, test it with at           least one case greater than and at least one           case with fewer than . For example, if a           program requires a line of input with exactly 10           characters, test it with 9 and with 11.          • Any given test case can only test one “error           message and exit” condition. This means that if           you want to test two different error conditions, you           need two different test cases: one for each error           condition. Suppose that our program requires a           three-letter string of lowercase letters. We might
be tempted to test with aBcD to test two things at           once, but this will only test one of them (and           believing you tested both is problematic!) To see           why this rule exists, think about the algorithm to           check for these errors:                   Check if the input string has exactly                 three letters.                 If it does not, then:                      print \"Error: the input needs to be                 three letters\"                      exit the program                 Check if the input string is made up                 entirely of lowercase letters.                 If it is not then:                      print \"Error: the input needs to be all                 lowercase letters\"                      exit the program             Violating one input condition will cause the           program to exit! This is also true of other situations           where the program rejects the input, even if does           not exit.          • Test exactly at the boundary of validity. If a           program requires between 7 and 18 things, you           should have test cases with 6, 7, 8, 17, 18, and 19           things. You need to make sure that 6 and 19 are           rejected while 7, 8, 17, and 18 are accepted.           Testing exactly at the boundaries is important           because of the common “off by one” mistake—           maybe the programmer wrote < when she should           have written <=, or >= when she should have           written >, or something similar. If you test with           values that are right at the boundary, you will find           these mistakes.          However, testing is not just about checking error  handling. You want to make sure the algorithm correctly  handles valid inputs too. Here are some suggestions:
• Think carefully about whether or not there are any   special cases where one particular input value (or   set of values) has to be treated unusually. For   example, in poker, an Ace is usually ranked the   highest; however, it can have the lowest ranking in   an “Ace low straight” (5 4 3 2 A). If you are testing   code related to poker hands, you would want to   explicitly test this case, since it requires treating an   input value differently from normal.  • Think carefully about the requirements, and   consider whether something could be   misinterpreted, easily mis-implemented, or have   variations that could seem correct. Suppose your   algorithm works with sequences of decreasing   numbers. You should test with a sequence like 7 6   6 5 4, which has two equal numbers in it. Checking   equal numbers is a good idea here, since people   might have misunderstood whether the sequence   is strictly decreasing (equal numbers don’t count   as continuing to decrease) or non-increasing   (equal numbers do count).  • Think about types. What would happen if the   programmer used the wrong type in a particular   place? This could mean that the programmer used   a type that was too small to hold the required   answer (such as a 32-bit integer when a 64-bit   integer is required), used an integer type when a   floating point type is required, or used a type of the   wrong signedness (signed when unsigned is   required or vice versa).  • Consider any kind of off-by-one error that the   programmer might have been able to make. Does   the algorithm seem like it could involve counting?   What if the programmer was off by one at either   end of the range she counted over? Does it   involve numerical comparison? What if < and <=   (or > and >=) were mixed up?
• Whenever you have a particular type of problem   in mind, think about how that mistake would affect   the answer relative to the correct behavior, and   make sure they are different. For example,   suppose you are writing a program that takes two   sequences of integers and determine which one   has more even numbers in it. You have considered   that the programmer might have an off-by-one   error where he accidentally misses the last   element of the sequence. Would this be a good   test case?         Sequence 1: 2 3 5 6 9 8       Sequence 2: 1 4 2 8 7 6     This would not be a good test case for this   particular problem. If the program is correct, it will   answer “Sequence 2” (which has four, compared   to three). However, if the algorithm mistakenly   skips the last element, it will still answer   “Sequence 2” (because it will count three elements   in Sequence 2, and two elements in Sequence 1).   A good test case to cover this off-by-one-error   would be:         Sequence 1: 2 3 5 6 9 8       Sequence 2: 1 4 2 8 7 3     Now a correct program will report a tie (three vs.   three) and a program with this particular bug will   report Sequence 2 as having more even numbers.  • Consider all major categories of inputs, and be   sure you cover them.           For numerical inputs, these would generally   be negative, zero, and positive. One is also usually   a good number to be sure you cover.
For sequences of data, your tests should   cover an empty sequence, a single element   sequence, and a sequence with many elements.           For characters, use lowercase letters,   uppercase letters, digits, punctuation, spaces,   non-printable characters in your test cases.           For many algorithms, there are problem-   specific categories you should consider. For   example, if you are testing a function related to   prime numbers (e.g., isPrime), then you should   consider prime and composite (not prime)   numbers as input categories to cover.           When you combine two ways to categorize   data, cover all the combinations. For example, if   you have a sequence of numbers, you should test   with an empty list, a one-element sequence with   zero, a one-element sequence with a negative   number, a one-element sequence with a positive   number, and have each of negative, zero, and   positive numbers appearing in your many-element   sequences.    • An important corollary of the previous rules is that   if your algorithm gives a set of answers where you   can list all possible ones (true/false, values from   an enum, a string from a particular set, etc.), then   your test cases should ensure that you get every   answer at least once. Furthermore, if there are   other conditions you think are important, you   should be sure that you get all possible answers   for each of these conditions. For example, if you   are getting a yes/no answer, for a numeric input,   you should test with a negative number that gives   yes, a negative number that gives no, a positive   number that gives yes, a positive number that
gives no, and zero (zero, being only one input, will              have one answer).             All of this advice is a great starting point, but the     most important thing for testing is to think carefully—     imagine all the things that could go wrong, think carefully     about how to test them, and make sure your test cases     are actually testing what you think they are testing.      6.1.2 White Box Testing       Another testing methodology is white box testing. Unlike     black box testing, white box testing involves examining     the code to devise test cases. One consideration in white     box tests is test coverage—a description of how well your     test cases cover the different behaviors of your code.             Note that while white box and black box testing are     different, they are not mutually exclusive, but rather     complimentary. One might start by forming a set of black     box test cases, implement the algorithm, then create     more test cases to reach a desired level of test coverage.             We will discuss three levels of test coverage:     statement coverage, decision coverage, and path     coverage. To illustrate each of these types of coverage,     we will use the (contrived) code example:     1 int aFunction(int a, int b, int c) {   2 int answer = 0;   3 if (a < b) {   4 answer = b - a;   5}   6 if (b < c) {   7 answer = answer * (b - c);   8}   9 else {  10 answer = answer + 42;  11 }
12 return answer;  13 }             Statement coverage means that every statement in     the function is executed. In our example, if we were to     use only one test with a = 1, b = 2, and c = 3, we would     not have statement coverage. Specifically, this test case     (by itself) does not execute line 10 of the code (the body     of the else). Our lack of coverage indicates we have not     adequately tested our function—it has behaviors our test     case has not exercised.             We can then figure out more test cases to add to our     test suite to improve our testing until the desired     coverage level is met. For example, we might add the     test case a = 1, b = 3, and c = 2 to our test suite to test     the else case. The two test cases together provide     statement coverage of this function.             Statement coverage is a minimal starting point, so if     we want to test our code well, we likely want a stronger     coverage metric. To see how statement coverage falls     short, notice that we have not tested any cases where the     a < b test evaluates to false—that is, a case where we do     not enter the body of the if statement (line 4).             A stronger level of coverage is decision coverage—     in which all possible outcomes of decisions are     exercised. For boolean tests, this means we construct a     test case where the expression evaluates to true and     another where it evaluates to false. If we have a     switch/case statement, we must construct at least one     test case per choice in the case statement, plus one for     the default case, even if it is implicit—that is, if we do not     write it down, it behaves as if we wrote default: break;.             To understand decision coverage more fully, it helps     to visualize it. In order to visualize decision coverage, we
need to understand the concept of a control flow graph  (CFG). A control flow graph is a directed graph (in the  mathematical sense) whose nodes are basic blocks  (boxes) and whose edges (arrows) represent the possible  ways the control flow arrow can move from one basic  block to the next. A basic block is a contiguous sequence  of statements that must be executed in an “all-or-nothing”  manner; the execution arrow cannot jump into or out of  the middle of a basic block. (We discuss graphs in much  more detail in Chapter 25.)           Figure 6.1: The control flow graph for our example code.          Figure 6.1 shows the control flow graph for our  example function (aFunction) from above. The very first  basic block has a lone arrow coming into it, indicating that  it is the start—the execution arrow can only reach it by  calling the function. This first basic block has two  statements: the declaration/initialization of answer and the  if statement, which tests if a < b. The basic block ends  here because the execution arrow could go two places—  indicated by the two edges leaving this block. The final  block has no arrows going out of it; when the execution  arrow moves to the end of this block, there is nowhere  else to go within this function. At this point, the function is  done and returns its answer. Note that we draw the CFG
for one function at a time. We could also draw how the  execution arrow moves between functions, which is  called a call graph.          Decision coverage corresponds to having a suite of  test cases that covers every edge in the graph. In our  graph, if a < b is true, we will take the left edge coming  out of the first block (corresponding to going into the  “then” clause of the if/else). If it is false, we will take the  right edge, which skips over this clause, and as there is  no else clause, goes to the next if statement. This  second edge—representing the path of our execution  arrow when a < b is false—is highlighted in red in  Figure 6.1, since it represents the shortcomings of our  test cases so far with respect to decision coverage.          We can obtain decision coverage on this function by  adding a test case where the execution arrow traverses  this edge of the control flow graph—or put a different way,  where the a < b decision evaluates to false. An example  of this test case would be a = 5, b = 2, and c = 1.     Figure 6.2: The four possible paths through our example CFG.     Our tests cover three out of the four, but the red path has not been   tested.
An even stronger type of test coverage is path  coverage. To achieve path coverage, our test cases must  span all possible valid paths through the control flow  graph (following the direction of the arrows). Figure 6.2  shows the four paths through our example control flow  graph and color codes them based on which of our test  cases cover them. Note that there is one path (shown in  red) that we have not yet covered. This path corresponds  to the case where a < b is false, but b < c is true. Our  test cases that gave us decision coverage tested each of  these conditions separately, but none of the tests tested  both of these together at the same time—where our  execution would follow the red path. We can add another  test case (e.g., a = 5, b = 2, and c = 3) to gain path  coverage.          At this point you may be wondering “Why do we  have all of these levels of test coverage?” and “How do I  pick the right level of coverage for what I am doing?”  Software engineers define and discuss multiple levels of  test coverage because the higher levels of coverage give  us more confidence in the correctness of our code but at  the cost of more test cases. While our example only  requires two, three, and four test cases for the three  levels of coverage we discussed, it is a simple (and  contrived) piece of code. The number of paths through  the control flow graph is exponential in the number of  conditional choices—if we have six if/else statements  one after the other, then there are 64 possible paths  through the control flow graph. If we increase the number  of if/else statements from six to 16, then the number of  paths grows to 65536—quite a lot!          So how do you pick the right level of test coverage?  As with many things, the answer is “it depends.”1 The first  aspect of this decision is “how confident do you need to  be in your code?” If you are doing preliminary testing of
your algorithm by hand in Step 4 (and will test the  implemented algorithm once you have translated it to  code), then statement coverage is a reasonable choice.  Here, testing more cases is quite time consuming (you  are doing them by hand), and you will do more test cases  later (once it is translated to code and compiled, where  you can have the computer run the tests).          By contrast, if you are testing a piece of critical  software that will be deployed when you finish testing,  only achieving statement coverage would be woefully  insufficient. Here, you would likely want to aim for more  than just the minimum to achieve path coverage.  Unfortunately, no amount of testing can ensure the code  is correct—it can only increase our confidence that it is. If  you require absolute certainty that the code is correct,  you must formally prove it (which is beyond the scope of  this book).    6.1.3 Generating Tests    One difficulty with testing arises from the fact that you  want to test the cases you did not think of—but if you do  not think of these cases, then how do you know to write  tests for them? One approach to such a problem is to  generate the test cases according to some algorithm. If  the function we are testing takes a single integer as input,  we might iterate over some large range (say -1,000,000  to 1,000,000) and use each integer in that range as a test  case.          Another possibility is to generate the test cases  (pseudo-)randomly (called, unsurprisingly, random  testing). Note that pseudo-random2 means the numbers  look random (no “obvious” pattern) to a human but are  generated by an algorithm that will produce the same
                                
                                
                                Search
                            
                            Read the Text Version
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 99
- 100
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
- 111
- 112
- 113
- 114
- 115
- 116
- 117
- 118
- 119
- 120
- 121
- 122
- 123
- 124
- 125
- 126
- 127
- 128
- 129
- 130
- 131
- 132
- 133
- 134
- 135
- 136
- 137
- 138
- 139
- 140
- 141
- 142
- 143
- 144
- 145
- 146
- 147
- 148
- 149
- 150
- 151
- 152
- 153
- 154
- 155
- 156
- 157
- 158
- 159
- 160
- 161
- 162
- 163
- 164
- 165
- 166
- 167
- 168
- 169
- 170
- 171
- 172
- 173
- 174
- 175
- 176
- 177
- 178
- 179
- 180
- 181
- 182
- 183
- 184
- 185
- 186
- 187
- 188
- 189
- 190
- 191
- 192
- 193
- 194
- 195
- 196
- 197
- 198
- 199
- 200
- 201
- 202
- 203
- 204
- 205
- 206
- 207
- 208
- 209
- 210
- 211
- 212
- 213
- 214
- 215
- 216
- 217
- 218
- 219
- 220
- 221
- 222
- 223
- 224
- 225
- 226
- 227
- 228
- 229
- 230
- 231
- 232
- 233
- 234
- 235
- 236
- 237
- 238
- 239
- 240
- 241
- 242
- 243
- 244
- 245
- 246
- 247
- 248
- 249
- 250
- 251
- 252
- 253
- 254
- 255
- 256
- 257
- 258
- 259
- 260
- 261
- 262
- 263
- 264
- 265
- 266
- 267
- 268
- 269
- 270
- 271
- 272
- 273
- 274
- 275
- 276
- 277
- 278
- 279
- 280
- 281
- 282
- 283
- 284
- 285
- 286
- 287
- 288
- 289
- 290
- 291
- 292
- 293
- 294
- 295
- 296
- 297
- 298
- 299
- 300
- 301
- 302
- 303
- 304
- 305
- 306
- 307
- 308
- 309
- 310
- 311
- 312
- 313
- 314
- 315
- 316
- 317
- 318
- 319
- 320
- 321
- 322
- 323
- 324
- 325
- 326
- 327
- 328
- 329
- 330
- 331
- 332
- 333
- 334
- 335
- 336
- 337
- 338
- 339
- 340
- 341
- 342
- 343
- 344
- 345
- 346
- 347
- 348
- 349
- 350
- 351
- 352
- 353
- 354
- 355
- 356
- 357
- 358
- 359
- 360
- 361
- 362
- 363
- 364
- 365
- 366
- 367
- 368
- 369
- 370
- 371
- 372
- 373
- 374
- 375
- 376
- 377
- 378
- 379
- 380
- 381
- 382
- 383
- 384
- 385
- 386
- 387
- 388
- 389
- 390
- 391
- 392
- 393
- 394
- 395
- 396
- 397
- 398
- 399
- 400
- 401
- 402
- 403
- 404
- 405
- 406
- 407
- 408
- 409
- 410
- 411
- 412
- 413
- 414
- 415
- 416
- 417
- 418
- 419
- 420
- 421
- 422
- 423
- 424
- 425
- 426
- 427
- 428
- 429
- 430
- 431
- 432
- 433
- 434
- 435
- 436
- 437
- 438
- 439
- 440
- 441
- 442
- 443
- 444
- 445
- 446
- 447
- 448
- 449
- 450
- 451
- 452
- 453
- 454
- 455
- 456
- 457
- 458
- 459
- 460
- 461
- 462
- 463
- 464
- 465
- 466
- 467
- 468
- 469
- 470
- 471
- 472
- 473
- 474
- 475
- 476
- 477
- 478
- 479
- 480
- 481
- 482
- 483
- 484
- 485
- 486
- 487
- 488
- 489
- 490
- 491
- 492
- 493
- 494
- 495
- 496
- 497
- 498
- 499
- 500
- 501
- 502
- 503
- 504
- 505
- 506
- 507
- 508
- 509
- 510
- 511
- 512
- 513
- 514
- 515
- 516
- 517
- 518
- 519
- 520
- 521
- 522
- 523
- 524
- 525
- 526
- 527
- 528
- 529
- 530
- 531
- 532
- 533
- 534
- 535
- 536
- 537
- 538
- 539
- 540
- 541
- 542
- 543
- 544
- 545
- 546
- 547
- 548
- 549
- 550
- 551
- 552
- 553
- 554
- 555
- 556
- 557
- 558
- 559
- 560
- 561
- 562
- 563
- 564
- 565
- 566
- 567
- 568
- 569
- 570
- 571
- 572
- 573
- 574
- 575
- 576
- 577
- 578
- 579
- 580
- 581
- 582
- 583
- 584
- 585
- 586
- 587
- 588
- 589
- 590
- 1 - 50
- 51 - 100
- 101 - 150
- 151 - 200
- 201 - 250
- 251 - 300
- 301 - 350
- 351 - 400
- 401 - 450
- 451 - 500
- 501 - 550
- 551 - 590
Pages:
                                             
                    