

All of Programming [ PART I ]

Published by Willington Island, 2021-09-02 03:28:09

Description: All of Programming provides a platform for instructors to design courses which properly place their focus on the core fundamentals of programming, or to let a motivated student learn these skills independently. A student who masters the material in this book will not just be a competent C programmer, but also a competent programmer. We teach students how to solve programming problems with a 7-step approach centered on thinking about how to develop an algorithm. We also teach students to deeply understand how the code works by teaching students how to execute the code by hand.



statements (you did make that quick reference sheet we recommended in Chapter 2, right?), but you should quickly become familiar with them. If you find yourself struggling with this translation, it likely either means that your description of your algorithm is too vague (in which case, you need to go back to it, think about what precisely you meant, and refine it), or that the pieces of your algorithm are complex, and you are getting hung up on them, rather than calling a function (as described above) to do that piece, which you will write afterwards.

The process of taking large, complex pieces and separating them out into their own functions—known as top-down design—is crucial as you write larger and larger programs. Initially, we will write individual functions to serve a small, simple purpose—we might write one or two additional functions to implement a complex step. However, as your programming skill expands, you will write larger, more complex programs. Here, you may end up writing dozens of functions—solving progressively smaller problems until you reach a piece small enough that you do not need to break it down any further.

While it may seem advantageous to just write everything in one giant function, such an approach not only makes the programming more difficult, but also tends to result in a complex mess that is difficult to test and debug. Whenever you have a chance to pull a well-defined logical piece of your program out into its own function, you should consider this an opportunity, not a burden. We will talk much more about writing larger programs in Chapter 13 after you master the basic programming concepts and are ready to write significant pieces of code.

4.5.1 Composability

When you are translating your code from your algorithmic description to C (or whatever other language you want), you can translate an instruction into code in the same way, no matter what other steps it is near or what conditions or repetitions it is inside of. That is, you do not have to do anything special to write a loop inside of another loop, nor to write a conditional statement inside of a loop—you can just put the pieces together and they work as expected. The ability to put things together and have them work as expected is called composability and is important to building not only programs, but other complex systems. If you put a for loop inside of an if statement, you do not need to worry about any special rules or odd behaviors: you only need to know how a for loop and an if statement work, and you can reason about the behavior of their combination. In general, modern programming languages are designed so that features and language constructs can be composed and work as expected. C (and later C++) follows this principle pretty well, so you can compose pretty much anything you learn from this book with pretty much anything else (with one notable exception that we will discuss in Section 18.7).

4.5.2 Finishing Our Earlier Examples

Now that we have seen how to translate our generalized steps into code, we can finish our earlier examples. We will start with the simpler one, the triangle of stars. Here, we see the code as the translation of our generalized steps from earlier (which are placed in the code as comments—each line corresponds to the directions from the algorithm on the line right before it):

//prints i stars
void printIStars(int i) {
  //Count (call it j) from 1 to i (inclusive)
  for (int j = 1; j <= i; j++) {
    //Print a star
    printf("*");
  }
}

//prints a triangle of n stars
void printStarTriangle(int n) {
  //Count (call it i) from 1 to n (inclusive)
  for (int i = 1; i <= n; i++) {
    //Print i stars
    printIStars(i);
    //Print a newline
    printf("\n");
  }
}

Note how we abstracted out printIStars. The resulting function is small enough and simple enough that we would have been justified in writing it inline if we saw exactly how to do it right away. However, there is nothing wrong with pulling it out into its own function and solving it separately. In fact, if it is not immediately obvious what to write to translate it, abstracting it out is exactly what you should do.

The other example we saw earlier was our rectangle intersection problem. We can translate these steps into code as well, using the same approach. Again, we include the generalized steps as comments and abstract out the maximum and minimum functions—here, doing so allows us to avoid writing the code for this functionality twice, as we need each in two places. Here is the resulting code:

//a rectangle with left, bottom, top, and right
struct rect_tag {
  float left;
  float bottom;
  float top;
  float right;

};
typedef struct rect_tag rect_t;

float minimum(float f1, float f2) {
  //compare f1 to f2
  if (f1 < f2) {
    //if f1 is smaller than f2, then f1 is your answer
    return f1;
  }
  else {
    //otherwise, f2 is your answer
    return f2;
  }
}

float maximum(float f1, float f2) {
  //compare f1 to f2
  if (f1 > f2) {
    //if f1 is larger than f2, then f1 is your answer
    return f1;
  }
  else {
    //otherwise, f2 is your answer
    return f2;
  }
}

//To find the intersection of two rectangles, r1 and r2
rect_t intersection(rect_t r1, rect_t r2) {
  //Make a rectangle (called ans) with
  rect_t ans;
  //left: maximum of r1's left and r2's left
  ans.left = maximum(r1.left, r2.left);
  //bottom: maximum of r1's bottom and r2's bottom
  ans.bottom = maximum(r1.bottom, r2.bottom);
  //right: minimum of r1's right and r2's right
  ans.right = minimum(r1.right, r2.right);
  //top: minimum of r1's top and r2's top
  ans.top = minimum(r1.top, r2.top);
  //The rectangle called ans is your answer
  return ans;
}

4.6 A Complete Example

Video 4.1: Writing the isPrime function.

Video 4.1 walks through a complete example—writing the isPrime function, which takes one integer (N) and determines if N is prime (in which case, it returns 1) or not (in which case it returns 0).

4.7 Next Steps: Compiling, Running, Testing and Debugging

Steps 6 (testing your program) and 7 (debugging it) require you to be able to run your code. The computer’s processor does not actually understand the source code directly, so the source code must first be translated to the numerically encoded (Everything Is a Number) instructions that the processor can understand. In the next chapter, we will discuss this process, as well as some of the details of what you need to put in your source file to make a complete program. We will then discuss testing and debugging in Chapter 6.

4.8 Practice Exercises

Selected questions have links to answers in the back of the book.

• Question 4.1 : Write a function myAbs, which takes an integer and returns an integer that is its absolute value.

• Question 4.2 : Write a function avg3, which takes three integers and returns the floating point number that is their average. Be careful—you may need to think back to what you learned in Section 3.4.2 to be sure you get the right answer (in particular, if you sum the three ints and divide by 3 using integer division, the fractional part of the average will be lost, which is not the right answer).

• Question 4.3 : Write a function myRound, which takes a double and returns an integer. This function should round its argument to the nearest integer, and return the rounded result. (Hint: to get the fractional portion of the number (the part after the decimal), think about how you can get the integral portion (the part before the decimal) using what you learned in Chapter 3—then think about what mathematical operation you can use to compute the fractional portion from the information you have.)

• Question 4.4 : Write a function factorial, which takes an integer and returns an int that is its factorial (written with a postfix ! in math notation).

• Question 4.5 : Write a function isPow2, which takes an integer and returns an int that is 1 (“true”) if that integer is a power of 2 and 0 (“false”) if it is not. Note that 1 is a power of 2 (it is 2 to the 0th power), and 0 is not a power of 2. Note: some approaches to this problem involve computing 2 raised to some power i. In C, if you write 2^i it will NOT compute 2 to the i-th power—instead, it will compute the bitwise exclusive-or (XOR) of 2 and i. If you want to compute 2 to the i-th power easily, you can write 1 << i (where << is the binary left shift operator—it takes the number 1 and puts i 0s after it in the binary representation).

• Question 4.6 : Write a function printBinary, which takes an integer and returns void. This function should print the binary representation of its argument. You may assume that the number has at most 20 bits.

• Question 4.7 : Write a function printFactors, which takes an integer, and returns void. This function should print the prime factorization of that integer—that is, it should print a multiplication expression composed of prime numbers, which if evaluated

would result in the original number. For example, given the number 132, your function should print 2 * 2 * 3 * 11. If your function is given a number that is 1 or less, your function should print nothing. Hint: you should end up with at least two steps that are too complex to translate directly into a single line—these should result in additional functions you write.

• Question 4.8 : Write a function isPerfect, which takes an integer and determines if that integer is perfect (hint: the definition of “perfect number” is domain knowledge in math—if you do not know it, look it up before you attempt Step 1). If the number is perfect, your function should return 1, otherwise it should return 0.

• Question 4.9 : Write a function power, which takes two unsigned integers (a base and an exponent) and returns an unsigned integer. This function should compute and return the base raised to the exponent.

• Question 4.10 : For any of the previous questions, take your code, and trade it with a friend. First, you should examine the syntax of each other’s code—does it follow the rules we specified in Chapter 2? If not, explain to each other what you think is not correct. Next, have her execute your code by hand (for some reasonable parameter values) while you execute her code by hand. Determine if each other’s code is correct. If not, help each other understand what might be wrong, and figure out how to fix it.


Chapter 5 Compiling and Running

Once you have written your code, you need to compile it in order to be able to run it. Compiling a program is the act of translating the human-readable code that a programmer wrote (called “source code”) into a machine-executable format. The compiler is the program that performs this process for you: it takes your source code as input and writes out the actual executable file, which you can then run.

At its simplest, compiling your program is a matter of running the compiler and giving it a command line argument specifying the .c file with the source code for your program. There are many different C compilers, but we will be using GCC, which stands for “GNU Compiler Collection.” If you wrote your program in a file called myProgram.c, you could execute the command gcc myProgram.c to compile your code. Assuming there are no errors in your code, the compiler would produce an executable program called a.out, which you could then execute by typing the command ./a.out. However, if you try this on any of the C code you wrote in the previous chapters, you will encounter errors—we need to add a bit more to our code (which we will see shortly) to make it compile.

If you are not familiar with using a UNIX/Linux command-line interface, you might want to read Appendix B for more information about using the command line and command-line arguments. Typically, you will invoke the compiler with more arguments than just the name of one source file. We will discuss those options after we cover some more about the compilation process.

5.1 The Compilation Process

When you are programming, you will use the compiler all the time, so it is useful to have a bit of an understanding of what it does at a high level. The inner workings and details of the compiler are quite complex and require an in-depth understanding of many areas of computer science and engineering, so we will not go into those. Instead, we will just cover the parts relevant to day-to-day program writing.

Figure 5.1: A high-level view of the process that GCC takes to compile a program. Light blue boxes represent code you have written; light green boxes represent built-in parts of C.

Figure 5.1 shows a high-level overview of the process that GCC goes through to compile the code. In this picture, the light blue boxes represent code you have written, while the light green boxes represent built-in parts of C. The orange clouds indicate steps of this process (each is a separate program, but GCC invokes these programs for you), and the white boxes represent intermediate files that GCC generates to pass information from one stage to the next. The dark blue box in the upper right represents the final executable—the program that you can run to make your computer do whatever the program tells it to do.

5.1.1 The Preprocessor: Header Files and Defines

The first step (in the upper left) is the preprocessor, which takes your C source file and combines it with any header files that it includes, as well as expanding any macros that you might have used. To help understand this process, we

will look at our first complete C program, which is algorithmically simple (all it does is print “Hello World”), but useful for explaining the compilation process.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("Hello World\n");
  return EXIT_SUCCESS;
}

While the main function’s contents should be mostly familiar—a call to printf and a return statement—there are several new concepts in this program. The first two lines are #include directives (pronounced “pound include” or “hash include”). These lines of code are not actually statements that are executed when the program is run, but rather directives to the preprocessor portion of the compiler. In particular, these directives tell the preprocessor to literally include the contents of the named file at that point in the program source, before passing it on to the later steps of the compilation process.

These #include directives name the file to be included in angle brackets (<>) because that file is one of the standard C header files. If you wrote your own header file, you would include it by placing its name in quotation marks (e.g., #include "myHeader.h"). (This is not a formal rule but a very common convention.) Preprocessor directives begin with a pound sign (#) and have their own syntax. In this particular program, there are two #include directives. The first of these directs the preprocessor to include the file stdio.h and the second directs it to include stdlib.h.

These header files—and header files in general—primarily contain three things: function prototypes, macro definitions, and type declarations. A function prototype looks much like a function definition, except that it has a semicolon in place of the body. The prototype tells the compiler that the function exists somewhere in the program, as well as the return and

parameter types of the function. Providing the prototype allows the compiler to check that the correct number and type of arguments are passed to the function and that the return value is used correctly, without having the entire function definition available. In the case of printf, stdio.h provides the prototype. The actual implementation of printf is inside the C standard library, which we will discuss further when we learn about the linker in Section 5.1.4.

Header files may also contain macro definitions. The simplest use of a macro definition is to define a constant, such as

#define EXIT_SUCCESS 0

This directive (from stdlib.h) tells the preprocessor to define the symbol EXIT_SUCCESS to be 0. Whenever the preprocessor encounters the symbol EXIT_SUCCESS, it sees that it is defined as a macro and expands the macro to its definition. In this case, the definition is just 0, so the preprocessor replaces EXIT_SUCCESS with 0 in the source it passes on to the later stages of compilation. Note that the preprocessor splits the input into identifiers and checks each identifier to see if it is a defined macro, so EXIT_SUCCESS_42 will not expand to 0_42, since it is not defined as a macro, and the preprocessor will leave it alone (unless it is defined elsewhere).

Using macro definitions for constants provides a variety of advantages to the programmer over writing the numerical constant directly. For one, if the programmer ever needs to change the constant, only the macro definition must be changed, rather than all of the places where the constant is used. Another advantage is that naming the constant makes the code more readable. The naming of the constant in return EXIT_SUCCESS gives you a clue that the return value here indicates that the program succeeded. In fact, this is exactly what this statement does. The return value from main indicates the success or failure of your program to whatever program ran it.

A third advantage of using macro-defined constants is portability. While 0 may indicate success on your platform—the combination of the type of hardware and the operating system you have—it may mean failure on some other platform. If you hardcode the constant 0 into your code, then it may be correct on your platform, but it may need to be rewritten to work correctly on another platform. By contrast, if you use the constants defined in the standard header files, then when you recompile your program on the new platform, it will just work correctly—the header files on that platform will have those constants defined to the correct values.

The standard header file limits.h contains constants specifically for portability. These constants describe the maximum and minimum value of various types on the current platform. For example, if a program needs to know the maximum and minimum values for an int, it can use INT_MAX and INT_MIN, respectively. On platforms where an int is 32 bits, these would be defined like this:

#define INT_MAX 2147483647
#define INT_MIN -2147483648

On a platform with a different size for the int type, these would be defined to whatever value is appropriate for the size of the int on that platform.

Macros can also take arguments; however, these arguments behave differently from function arguments. Recall that function calls are evaluated when the program runs, and the values of the arguments are copied into the function’s newly created frame. Macros are expanded by the preprocessor (while the program is being compiled, before it has even started running), and the arguments are just expanded textually. In fact, the macro arguments do not have any declared types, and do not even need to be valid C expressions—only the text resulting from the expansion needs to be valid (and well typed).

We could (though as we shall see shortly, we shouldn’t) define a SQUARE macro as follows:

#define SQUARE(x) x * x

The preprocessor would then expand SQUARE(3) to 3 * 3, or SQUARE(45.9) to 45.9 * 45.9. Note that here, we are using the fact that macro arguments do not have types to pass it an int in the first case and a double in the second case. However, what happens if we attempt SQUARE(z - y)? If this were a function, we would evaluate z - y to a value and copy it into the stack frame for a call to SQUARE; however, this is a macro expansion, so the preprocessor works only with text. It expands the macro by replacing x in the macro definition with the text z - y, resulting in z - y * z - y. Note that this will compute z - (y * z) - y, which is not z - y squared. We could improve on this macro by defining it like this:

#define SQUARE(x) ((x) * (x))

Now, SQUARE(z - y) will expand to ((z - y) * (z - y)), giving the correct result irrespective of the arguments and location of the macro. We can still run into problems if the macro argument has side effects. For example, if we write SQUARE(f(i)), then the function f will be called twice. If f prints something, it will be printed twice, once for each call to f in the macro expansion. The SQUARE macro is a bit contrived—we would do better to just write the multiplication down where we need it—but it highlights the nature (and dangers) of textual expansion of macros. Section E.3 contains much more information about the C preprocessor and macros. They are quite powerful and can be used for some rather complex things, but you will not need them for most things you write in this book.

Header files may also contain type declarations. For example, stdio.h contains a type declaration for a FILE type,

which is used by a variety of functions that manipulate files. The functions also have their prototypes in stdio.h, and we will discuss them in Chapter 11 when we learn about accessing files and reading input. Another example of type declarations in standard header files is the integer types in stdint.h. As mentioned in Chapter 3, integers come in different sizes, and the size of an int varies from platform to platform. Often programmers do not care too much about the size of an int, but sometimes using a specifically-sized int is important. stdint.h defines types such as int32_t (which is guaranteed to be a 32-bit signed int on any platform) or uint64_t (which is always a 64-bit unsigned int).

With all of that in mind, we can revisit our Hello World program from earlier:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("Hello World\n");
  return EXIT_SUCCESS;
}

The preprocessor would take this code and basically transform it into this:

int printf(const char *, ...);

int main(void) {
  printf("Hello World\n");
  return 0;
}

In actuality, the result of the preprocessor is a few thousand lines long, since it includes the entire contents of stdio.h and stdlib.h, which have many other function prototypes and type declarations. However, these do not affect our code (the compiler will know those types and functions exist, but since we do not use them, they do not matter). We will note that the prototype for printf contains a

few features that we have not seen yet. You can think of const char * as being the type for a literal string, and the ... means that printf takes a variable number of arguments—a feature of C that we will not delve into beyond using it to call printf and similar functions.

5.1.2 The Actual Compiler

The output of the preprocessor is stored in a temporary file and passed to the actual compiler. At first, it may seem a bit confusing that one of the steps of the compilation process is the actual compiler. However, we often refer to complex processes by their eponymous step. For example, if you say “I am going to bake a cake,” even though only part of the process involves actually baking the cake (cooking the cake batter in the oven), we still understand that you will go through the entire process: finding a recipe, getting the ingredients, mixing the batter, baking the cake, letting it cool, and icing it.

The compiler reads the pre-processed source code—which has all the specified files included and all macro definitions expanded—and translates it into assembly. Assembly is the lowest-level type of human-readable code. In assembly, each statement corresponds to one machine instruction. Machine instructions correspond to very simple operations, such as adding two numbers or moving a value to or from memory. While humans can program in assembly (sometimes with good reason), that is not our focus here—we just want to understand enough of the compilation process to know what is going on when we compile our programs.

The compiler is a rather complex program. Its first task is to read in your program source and “understand” your program according to the rules of C—a task called parsing the program. In order for the compiler to parse your program, the source code must have the correct syntax (i.e., as we

described in Chapter 2). If your code is not syntactically correct, the compiler will print an error message and attempt to continue parsing but may be confused by the earlier errors. For example, the following code has one error—a missing semicolon after int x = 3 on line 5:

1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5   int x = 3
6   int y = 4;
7   printf("%d\n", x + y);
8   return EXIT_SUCCESS;
9 }

However, compiling the code results in two errors:

err1.c: In function ’main’:
err1.c:6:5: error: expected ’,’ or ’;’ before ’int’
     int y = 4;
     ^
err1.c:7:25: error: ’y’ undeclared (first use in this function)
     printf ("%d\n", x + y);
                         ^
err1.c:7:25: note: each undeclared identifier is reported only once for each function it appears in

The first error (expected ’,’ or ’;’ before ’int’) correctly identifies the problem with our code—we are missing a semicolon before the compiler sees int at the start of line 6 (it should go at the end of line 5, but the compiler does not know anything is wrong until it sees the next word). The next error (’y’ undeclared) is not actually another problem with the code, but rather a result of the compiler being confused by the first error—the missing semicolon caused it to misunderstand the declaration of y as a part of the previous statement, so it does not recognize y later in the code.

Compiler errors can be a source of confusion and frustration for novice programmers. One key rule in dealing

with them is to try to fix them in order, from first to last. If you find an error message (other than the first one) confusing, you should fix the first error, then retry compiling the code to see if the error goes away—it may just be a result of the compiler being confused by earlier errors.

Dealing with Compilation Errors: Tip 1
Remember that the compiler can get confused by earlier errors. If later errors are confusing, fix the first error, then try to recompile before you attempt to fix them.

Another source of confusion in dealing with compiler errors is that the error message may not do a good job of describing the problem. The compiler tries its best to describe the problem but may guess incorrectly about what you were trying to do or refer to unfamiliar possibilities. For example, if we have a slightly different variation on the missing semicolon problem from the previous example:

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 int main(void) {
 5   int x
 6   int y;
 7   x = 3;
 8   y = 4;
 9   printf("%d\n", x + y);
10   return EXIT_SUCCESS;
11 }

The error here is still a missing semicolon after the declaration of x on line 5; however, the compiler gives the following error messages:

err2.c: In function ’main’:
err2.c:5:5: error: ISO C forbids nested functions [-Werror=pedantic]
     int x
     ^
err2.c:6:5: error: expected ’=’, ’,’, ’;’, ’asm’ or ’__attribute__’ before ’int’
     int y;
     ^
err2.c:7:5: error: ’x’ undeclared (first use in this function)
     x = 3;
     ^
err2.c:7:5: note: each undeclared identifier is reported only once for each function it appears in
cc1: all warnings being treated as errors

These error messages may appear quite intimidating, since they refer to several language features we have not seen (and which are all irrelevant to the problem). The first error message here is actually completely irrelevant, other than that it is on the right line number. The compiler first guesses that we might be trying to use a non-standard language extension (extra features that one particular compiler allows in a language) that allows us to write one function inside of another. This extension is forbidden by this version of the language standard. The compiler is trying to be helpful, even though it is wrong. In general, if an error message is referencing something completely unfamiliar, it is likely that the compiler is confused or offering suggestions that are not relevant to what you are trying to do.

Dealing with Compilation Errors: Tip 2
If parts of an error message are completely unfamiliar, try to ignore them and see if the rest of the error message(s) make sense. If so, try to use the part that makes sense to understand and fix your error. If not, search

for the confusing parts on Google and see if any of them are relevant.

Looking at the next error message, we see that it is on the same line number (6), so the compiler may be trying to give us more or different information about the same problem. Here, while there are some unfamiliar things (such as asm or __attribute__), the rest of the error message makes sense if we ignore these. That is, we could understand the error message if it were:

err2.c:6:5: error: expected ’=’, ’,’, or ’;’ before ’int’

The compiler has told us it expects an equals sign, comma, or semicolon before we get to the word int on line 6. In this case, we are missing a semicolon, but the compiler is suggesting other possibilities (e.g., we could have had an = had we been writing int x = 3;). The asm and __attribute__ suggestions it provided could also have been options, but these are very advanced language features, which we will not cover.

Another source of confusion in dealing with error messages can arise from the fact that the compiler may not notice the error immediately. As long as the source code seems to obey the rules of C’s syntax, the compiler assumes everything is correct, even if its parse does not agree with what you meant. In the following code, the actual error is the omission of a close curly brace on line 9:

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 int main(void) {
 5   for (int i = 0; i < 10; i++) {
 6     for (int j = 0; j < 10; j++) {
 7       printf("%d\n", (i + 3) * (j + 4));
 8     }
 9
10   return EXIT_SUCCESS;
11 }

However, the compiler does not recognize the error immediately. Having a return statement inside the for loop is syntactically valid. The closing brace after it on line 11 (which the programmer intended to be the end of main) is also legal, but the compiler parses it as the end of the first for loop. The compiler then discovers that something is not right when it reaches the end of the file and the body of main has not ended, reporting the error as:

err3.c: In function ’main’:
err3.c:11:1: error: expected declaration or statement at end of input
 }
 ^

Dealing with Compilation Errors: Tip 3
Programmer’s editors are very good at helping you find mismatched braces and parentheses. Most will indent the code according to the nesting level of the braces it is inside and will show you the matching brace/parenthesis when your cursor is on one of the pair. Using these sorts of features can be the easiest way to find errors from mismatched braces and parentheses.

Once the compiler has parsed your program, it type-checks the program. Type-checking the program involves determining the type of every expression (including the sub-expressions inside of it) and making sure that they are compatible with the ways they are being used. It also involves checking that functions have the right number and type of arguments passed to them. The compiler will produce error messages for any problems it discovers during this process.

Another important note about fixing compiler errors is that sometimes fixing one error will let the compiler progress further along, resulting in more errors. Novice programmers who are not confident in their correction of their first error may undo the fix, thinking they made things worse, even though the fix may have been correct—the extra errors just came from progressing further through the process. Dealing with Compilation Errors: Tip 4 Be confident in your fix for an error. If you do not understand what is wrong and how to fix it, find out and be sure rather than randomly changing things. Both Appendix F and the web page http://aop.cs.cornell.edu/errors/index.html provide more details on a variety of error messages and will likely prove useful in helping you diagnose errors you may encounter. Once the compiler finishes checking your code for errors, it will translate it into assembly—the individual machine instructions required to do what your program says. You can ask the compiler to optimize your code—transform it to make the resulting assembly run faster—by specifying the -O option followed by the level of optimization that you want. Programs are usually compiled with no optimizations for testing and debugging (the code transformation makes the program very difficult to debug) and then re-compiled with optimizations once they are ready for release/use. Typically, real programs compiled with GCC are compiled with -O3 when they are optimized. GCC provides a variety of other options to control optimization at a finer level of detail; however, they are beyond the scope of this book. We are strictly concerned with writing code that works, so you do not need to bother with the -O flag for any of the exercises you do in this book.

5.1.3 Assembling The next step is to take the assembly that the compiler generated and assemble it into an object file. GCC invokes the assembler to translate the assembly instructions from the textual/human-readable format into their numerical encodings that the processor can understand and execute. This translation is another example of “Everything Is a Number”—even the instructions that the computer executes are numbers. While it is possible to get error messages at this stage, it should not happen for anything you will try to do in this book. Generally, errors here are limited to cases in which you explicitly write the specific assembly-level instructions that you want into your program (which is possible in C, but limited to very advanced situations) and make errors in those. The important thing to understand about this step is that it results in an object file. The object file contains the machine-executable instructions for the source file that you compiled but is not yet a complete program. The object file may reference functions that it does not define (such as those in the C library or those written in other files). You can request that GCC stop after it assembles an object file by specifying the -c option. By default, the name of the object file will be the name of the .c file with the .c replaced by .o. For example, gcc -c xyz.c will compile xyz.c into xyz.o. If you wish to provide a different name for the object file, use the -o option followed by the name you want. For example, gcc -c xyz.c -o awesomeName.o will produce an object file called awesomeName.o. This ability to stop is important for large programs where you will split the code into multiple different C files. Splitting the program into multiple files is primarily useful to help you keep the code split into manageable pieces. However, each source file can be individually compiled to an object file, and

those object files can then be linked together (as we will discuss in the next section). If you change code in one file, you can recompile only that file (to generate a new object file for it) and re-link the program without recompiling any of the other source files. For programs you write in this class, this will not make a difference. However, when you write real programs, you may have tens or hundreds of thousands of lines of code split across dozens or hundreds of files. In this case, the difference between recompiling one file and recompiling all files (especially if optimizations are enabled) may be the difference between tens of seconds and tens of minutes. 5.1.4 Linking The final step of the process is to link the program. Linking the program takes one or more object files and combines them together with various libraries, as well as some startup code, and produces the actual executable binary. The object files refer to functions by name, and the linker’s job is to resolve these references—finding the matching definition. If the linker cannot find the definition for a name (called a “symbol”) that is used, or if the same symbol is defined multiple times, it will report an error. Linker errors—indicated by the fact that they are reported by ld (the linker’s name)—are typically less common than other compiler errors. If you encounter an unresolved symbol error, it means that you either did not define the symbol, did not include the object file that defines the symbol in the link, or that the symbol was specified as only visible inside that object file (which can be done by using the static keyword—a feature we will not delve into). If you encounter errors from duplicate definitions of a symbol, first make sure you did not try to name two different functions with the same name. Next, make sure you did not include any files twice on the compilation command line. Finally, make sure you do not #include a .c file—only header

files—and that you only include function prototypes in the header file, not the function’s definition (there are some advanced exceptions to these rules, but if you are at that stage, you should understand the linking process and static keyword well enough to fix any problems). Sometimes you may wish to use an external library—a collection of functions that are already written and packaged up to use for some purpose (e.g., drawing graphics, playing sound, advanced math, or many other purposes). The C library is one such example, which is linked in by default. For other libraries, you must specifically request that the linker link with the library with the -l command line option, specifying the name of the library right after the l. If all goes well, the linker will resolve all of the symbol references and combine the various object files and libraries together into an executable binary. 5.1.5 Building Large Programs: make The programs you will write as exercises in this book will be relatively small—at most a hundred lines and a couple of files. For programs of this size, recompiling the entire program takes a few seconds. Real programs tend to be quite a bit larger. As the program size increases, so does the compilation time. As an example, one program changed, compiled, and run frequently by one of the authors has about 40,000 lines of code, 90 files, and takes about 75 seconds to compile completely. While 75 seconds may not sound long, it is practically an eternity in the typical development cycle. Programmers recompile their code dozens to hundreds of times per day, meaning that all of those minutes add up—waiting 75 seconds 50 times in a day adds up to a bit more than an hour, or an eighth of a standard work day. This large of an overhead in the development cycle would be prohibitively

large (despite its ability to provide convenient excuses for slacking off: e.g., http://xkcd.com/303/). Fortunately, most of the time, one does not need to recompile all of the source code if the object (.o) files from previous compilations are kept. Instead, only the file (or files) that were changed need to be recompiled, then the program needs to be linked again. As mentioned in the previous sections, compiling each source file to an object file is accomplished with the -c option, and the various object files can then be listed on the command line to GCC in order to pass them to the linker. For comparison, recompiling one source file then relinking the same program described above takes about one second—much faster than recompiling the whole thing. However, orchestrating this process manually is tedious and error-prone. If the programmer forgets to recompile a file, then the program will not work correctly, possibly in strange ways. Doing this manually not only requires the programmer to remember all files that were changed, but also which files included with #include were changed. For example, if the programmer changes myHeader.h, then any file that #includes myHeader.h must also be recompiled, as the source files’ contents have effectively changed. Long ago, programmers developed the Make utility to not only automate this process, but also to simplify compiling programs in general. The make command reads a file called Makefile (though you can ask it to read an input file by a different name), which specifies how to compile your program. Specifically, it names the targets that can be made, their dependencies, and the rules to make the target. When make is run, it starts from a particular target—something that it can build out of other things. A common starting target might be your entire program. Make first checks if the target is up-to-date. Checking that a target is up-to-date requires checking that all of the files the target

depends on are themselves up-to-date and that none of them are newer than the target. For example, the target for the whole program may depend on many object files. If any such file is not itself up-to-date (for example, the object file may depend on a .c file that just got changed), it is rebuilt first. Make then follows whatever rule was specified for the target to build it from its dependencies. If the target is already up-to-date, then it does nothing. Using Make to compile the program is also useful when the command line to compile the program becomes complex, especially when other people may need to do the compilation. Large complicated programs may require linking with several libraries, as well as a variety of other command-line options. Trying to remember all of these and type them in every time you compile is painful. Even worse, many real programs need to be compiled by other people—whether it is multiple members of a large development team or people who you distribute the program to. Expecting these other people to figure out the commands required to make your program leads to much frustration for them. Providing a Makefile allows anyone to build the program simply by typing make. Appendix D explains Make in more detail: how to write a Makefile and a few advanced features. 5.2 Running Your Program Now that you have compiled your program, you are ready to run it. You can run your program much like any other program—by typing its name at the command prompt. However, unlike many programs you have used so far, the directory in which your program is located is not part of the PATH—the environment variable that specifies which directories to look in to run a program. Therefore, you have to specify which directory to find the program in as part of the name. Since the program is in the current directory, you

can just put ./ before the program’s name (for example, ./myProgram)—telling the command shell to run the myProgram program in the current directory. If you are unfamiliar with the command shell, directories, path names, or environment variables, you can read more about them in Appendix B. You can also run your program from inside various tools that are intended to help you test and debug the program. Two incredibly useful tools are GDB (the GNU Debugger) and Valgrind (pronounced “val-grinned” with a short ‘i’). For more information about GDB, we will discuss debugging in Chapter 6 and GDB in particular in Section D.2. Valgrind emulates the computer but tracks more information about what your program is doing to report errors that may otherwise go undetected (see Section D.3). Valgrind is particularly good at finding errors in your program that did not manifest simply because you got lucky when you ran it. For example, recall from Chapter 2 that a variable does not have a value until you assign it one—we just draw a ? in its box to indicate the value is unknown. If your program has a variable that is used before it is assigned any value—called an uninitialized variable—the variable will evaluate to whatever value was previously stored into the location that the variable resides in. You may get “lucky” on what value the uninitialized variable ends up holding. In fact, you may get “lucky” thousands of times and not see any problem but then get unlucky when it actually matters. This may seem highly improbable—after all, with billions of possible values that the variable could end up being, what are your chances of getting lucky? However, the value that you end up with is not random; it is whatever was in that storage location previously—for example, the value that some other variable had when it went out of scope.
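A minimal sketch of this kind of bug (the function name is invented for illustration): if the marked initialization were omitted, the function would quietly return garbage whenever the leftover value at that location happened to be nonzero, and running the program under Valgrind would flag the use of the uninitialized value:

```c
/* sumToN: add the integers from 1 to n. */
int sumToN(int n) {
  int total = 0;  /* BUG if written as just 'int total;' -- total would start
                     with whatever value was previously at that location */
  for (int i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}
```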
When you run your program directly on the computer, it does not explicitly track whether a variable has been initialized—the compiler generates instructions that do

exactly what your program specifies, and the computer does exactly what those instructions tell it to. When you run your program in Valgrind, however, Valgrind explicitly tracks which variables are initialized and which are not. If your program uses the uninitialized value in certain ways, Valgrind will report an error to you (it does not do this in all cases for a variety of reasons beyond the scope of this discussion— however, suffice it to say that it will catch the cases where it really matters). Valgrind is useful for detecting many other problems with your program that you might not otherwise discover easily. We highly recommend running your program in Valgrind whenever you are testing your program. We will talk about testing in much more detail in Chapter 6, but for now, you will want to run your program to see if it works. When we learn how to make programs that work with various forms of input, you will want to test your program on a variety of inputs to become more and more confident that it works. Video 5.1: Writing, compiling, and running the “Hello World” program. Video 5.1 ties everything we have talked about so far in this chapter with the “Hello World” example. The video shows writing the code, compiling it, and then running it directly as well as in Valgrind. In the video, we use some compiler options, which we recommend you use in general, and will discuss momentarily. 5.3 Compiler Options Usually, you will want to name the resulting program something meaningful, rather than the default a.out. To change the output file name, use the -o option, and specify the output file name after it. For example, gcc -o myProgram myProgram.c would compile myProgram.c (as above), but

instead of producing a.out, it would name the program myProgram. Another option you will commonly want to use is -std=gnu99. This option specifies that the compiler should use the C99 standard with GNU extensions. There are actually a few different versions of C (referred to as “standards” because they reflect a standardization of features). C99 will match what we describe in this book and is generally a reasonable standard to program in. Another useful pair of options is -Wall and -Werror. The first of these requests that the compiler issue warnings for a wide range of questionable behavior. As we discussed in Section 5.1.2, the compiler checks your program for certain kinds of errors. If it detects errors, it reports them to you and requires you to fix them before proceeding. Warnings are like errors in that the compiler will report a problem that it has detected; however, unlike errors, the compiler will continue and produce a program even if it warned you about something. The -Werror option tells the compiler to treat all warnings as errors—making it refuse to compile the program until the programmer fixes all the warnings. Novice programmers often wonder why one would want to enable warnings to begin with, and especially to ask the compiler to convert warnings to errors. Errors prevent your program from compiling, so it seems like any way to avoid them and get to the compiled binary is a good thing. However, a programmer’s goal is not just to produce any executable program, but one that correctly solves the problem at hand. Any time the compiler can alert you to potential problems, you can fix them much more easily than if they lead to problems you have to debug. A good analogy is purchasing a house. Think of the compiler as your inspector for the house. While you are eager to purchase and move into your new house, you would much rather have the inspector

discover a potential problem before you buy the house and move in. We strongly recommend compiling with at least the following warning options:

-Wall -Wsign-compare -Wwrite-strings -Wtype-limits -Werror

These options will help catch a lot of mistakes and should not pose an undue burden on correctly written code. Recent versions of GCC also support an option -fsanitize=address that will generate code that includes extra checking to help detect a variety of problems at runtime. Using this option is also strongly recommended during your development cycle. However, we will note that you typically cannot run a program built with this option in Valgrind (see Section D.3 for information about Valgrind—we strongly encourage you to use it). The two tools detect different, but overlapping, sets of problems, so use of both is a good idea—they just have to be used separately. 5.4 Other Possibilities The process of compiling and running a program that we have described here is not the only way to execute a program, although it is fairly typical for C and C++ programs. An alternative to compilation is to interpret the program. Instead of translating the program into machine instructions, the source code is given to an interpreter—a program that reads the source of another program and performs the actions specified in it. Interpreters and compilers have much in common—they must both read and analyze the source code of a program—the key difference is what they do with the program. A compiler translates it into an executable that runs directly on the computer. An interpreter executes the effects of the program. Some interpreters allow for

interaction via a read-eval-print-loop (REPL)—that is, you can type a bit of source code, which the interpreter reads as input (read); it then evaluates it (eval), prints the result (print), and repeats the process (loop). Many people will say that a language is compiled or interpreted (e.g., C is compiled, but Python is interpreted), which is rather imprecise. Any programming language could have a compiler and/or interpreter written for it. A more precise version of this statement would be that a particular language is typically compiled, or another particular language is typically interpreted. C is typically compiled; however, there exist interpreters for it. Likewise, as we discuss in Chapter 32, Python is usually interpreted, although there are also compilers for it. In some cases, the two can be mixed. Typically, Java is compiled into instructions for the Java Virtual Machine (JVM)—an instruction set typically executed by an interpreter, not directly by a computer. Most JVMs perform just-in-time (JIT) compilation, meaning that as they interpret the program, they also compile the program into directly executable code in their own memory (remember: “Everything Is a Number,” including computer instructions), which they can then execute the next time they encounter the same piece of code. JIT provides speed advantages over purely interpreting the code. 5.5 Practice Exercises Selected questions have links to answers in the back of the book. • Question 5.1 : Write, compile, and run a program that prints out your name (followed by a newline). • Question 5.2 : Write, compile, and run a program that prints out all of the numbers from 1 to 1000 (inclusive, each on their own line).

• Question 5.3 : Write, compile, and run a program that, for each number from 1 to 1000 (inclusive), prints out that number, a colon, a space, then the square of that number, one per line. The first line of your output should be 1: 1, the second line should be 2: 4, and so on through 1000: 1000000. • Question 5.4 : Take your isPow2 function from Question 4.8 in Chapter 4, and write it in a file called powers.c. Write a main function that prints out all of the powers of 2 less than 20,000,000 (20 million). Compile and run your program. • Question 5.5 : Take your printBinary function to print a number in binary from Question 4.8 in Chapter 4, and write it in a file called printBinary.c. Write a main function that uses the printBinary function to print the binary representation of all of the numbers from 0 to 1000 (inclusive). Compile and run your program. • Question 5.6 : Take your printFactors function to print the prime factors from Question 4.8 in Chapter 4, and write it in a file called printFactors.c. Write a main function that uses the printFactors function to print the prime factors of all of the numbers from 2 to 1000 (inclusive). Compile and run your program. • Question 5.7 : Take your power function to compute x^y from Question 4.8 in Chapter 4, and write it in a file called power.c. Write a main function that uses the power function to print a table of x^y for all x between 0 and 5 (inclusive) and all y between 0 and 16 (inclusive). Why do some of the results for x = 4 and x = 5 look strange (hint: think back to Chapter 3)? • Question 5.8 : Find a type that would allow your power function to work correctly on those cases, and change your code for the previous problem to use this type. You will probably need to look at the man page for printf to find the correct format specifier to print the result. • Question 5.9 : Run Valgrind on all of the programs you have written for the other questions in this

chapter. If it identifies any problems with your code, fix them.


Chapter 6 Fixing Your Code: Testing and Debugging Once you have written and compiled your code, you are ready to test and debug it. These two steps are closely related, but distinct. Testing is the process of trying to find problems (“bugs”) in your code, and debugging is the process of fixing those bugs. Both of these skills are crucial to good programming—what good is writing code if it does not work correctly? Testing your code identifies the problems you need to fix and then gives you increasingly more confidence that the code works as you expect it to. Once you have found a problem, debugging is the key skill to let you fix the problem in a disciplined manner, rather than just being frustrated by randomly changing things to no avail. Many novice programmers (and even many experienced programmers) underestimate the time required for testing. Proper testing requires not only testing on a variety of cases, but also analyzing your code and thinking carefully about what test cases you need to write. Unfortunately, testing is easy to shirk—when you are under time pressure, it may seem worthwhile to just dash together a couple test cases, then move on. However, in large software projects (or even modest-sized ones), “moving on” will typically involve writing code that uses the functions you just wrote. If the functions you wrote are broken, then your next work

starts from a poor foundation, and you will just spend longer trying to make it work. Debugging is another area where many programmers underestimate the time required. It is often hard to say how long it will take to find one particular bug, as you often do not know how complex the problem is until you have solved it. To make matters worse, inexperienced programmers will often rush ahead—trying to debug quickly rather than in a disciplined manner. Such activities follow the old saying “haste makes waste.” A programmer may try to make changes in the hopes that they will fix a problem, without investing the time to fully understand the problem and devise a correct solution. 6.1 Step 6: Test Your Code Testing is all about finding bugs in your code. A good test case is one that identifies a problem. While this may seem unintuitive, consider the analogy of going to the doctor: while it may be great to be told you are healthy, if there is something wrong with you, the test that identifies that problem (so that the doctor can fix it) is the most important test. Similarly, with your program, you want to identify and fix the problem rather than think it works when it does not. In the case of a class assignment, you can think of this as preferring to find and fix a problem before you turn the assignment in, rather than think it works and receive a poor grade. In “the real world,” the consequences become more serious: bad code kills people. As our world becomes more and more automated—software controls critical flight systems on aircraft, cars are beginning to drive themselves, etc.—it is easy to imagine how software bugs can cause fatal catastrophes, but you do not even need to imagine these possibilities, as there have already

been people killed by software bugs. Examples include the Therac-25 (a radiation therapy machine whose software bugs overdosed and killed patients), the Patriot Missile defense system (a software bug caused the system to fail to intercept a Scud missile in the Gulf War: 28 people died, and many more were injured), and the crash of a Spanish military transport plane (the software to control the engines malfunctioned and prevented three of the four engines from delivering enough power to keep the plane in the air; four people died, and two were injured). Even when software bugs are not fatal to humans, they can cause other types of catastrophes: a $400 million trading error, exploding rockets, a phone system crash, a blackout of most of the east coast, and the Mars Climate Orbiter crash were all the result of bugs in software. The Heartbleed bug presented the possibility of hackers gaining information from and access to affected systems (https://xkcd.com/1354/ provides a nice high-level explanation of the problem. Once you finish the next few chapters, you could understand a more technical explanation). One of the keys to testing your code well—both thoroughly and in a manageable fashion—is to test incrementally. Write one function, then test it well before moving on. Ideally, you should be completely confident that a function you wrote works before you use it. Note that while we discussed a top-down design methodology in Chapter 4, incremental testing is naturally a bottom-up approach. These two techniques do not conflict with each other—if you write code for function A, and end up needing to write functions B and C as part of it, you should test B and C before testing A.
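As a sketch of what this bottom-up ordering looks like in practice (the functions here are hypothetical, not from the book's exercises), suppose function A is isLeapYear and it is built on a smaller helper B, divisibleBy. Incremental testing means convincing yourself that divisibleBy behaves correctly before testing isLeapYear at all:

```c
/* Helper B: does d evenly divide n? Test this thoroughly first. */
int divisibleBy(int n, int d) {
  return n % d == 0;
}

/* Function A: built on top of B. Once B is trusted, any failure in
 * a test of this function points at the leap-year logic itself. */
int isLeapYear(int year) {
  if (divisibleBy(year, 400)) { return 1; }
  if (divisibleBy(year, 100)) { return 0; }
  return divisibleBy(year, 4);
}
```

Tested in this order, a failing isLeapYear case cannot be blamed on divisibleBy, which has already earned your confidence.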

6.1.1 Black Box Testing The testing methodology that most people think of first is black box testing. In black box testing, the tester considers only the expected behavior of the function—not any implementation details—to devise test cases. The lack of access to the implementation details is where this testing methodology gets its name—the function’s implementation is treated as a “black box,” which the tester cannot look inside. Black box testing does not mean that the tester thinks of a few cases in an ad hoc manner and calls it a day. Instead, the tester—whose goal is to craft test cases that are likely to expose errors—should contemplate what cases are likely to be error-prone from the way the function behaves. For example, suppose you needed to test a function to sum a list of integers. Without seeing the code, what test cases might you come up with? We might start with a simple test case to test the basic functionality of the code—for example, seeing that the function gives an answer of 15 when given an input of {1, 2, 3, 4, 5}. However, we should also devise more difficult test cases, particularly those that test corner cases—inputs that require specific behavior unlike that of other cases. In this example, we might test with the empty list (that is, the list with no numbers in it) and see that we get 0 as our answer. We might make sure to test with a list that has negative numbers in it or one with many very large numbers in it. Note that Practice Exercise 6.3 asks you to contemplate what sorts of errors these test cases might expose. Observe that we were able to contemplate good test cases for our hypothetical problem without going through Steps 1–5. You can actually come up with a set of black

box tests for a problem before you start on it. Some programmers advocate a test-first development approach. One advantage is that if you have a comprehensive test suite written before you start, you are unlikely to skimp on testing after you implement your code. Another advantage is that by thinking about your corner cases in advance, you are less likely to make mistakes in developing and implementing your algorithm. Writing good test cases requires a lot of thought: you need to think about a wide variety of things that can go wrong. Here are some suggestions to guide your thinking, as well as important lessons. First, some thoughts for testing error cases: • Make sure your tests cover every error case. Think about all the inputs the program cannot handle, i.e., ones where you would expect to receive an error message. If the program requires a number, give it xyz, which is not a valid number. An even better test case might be 123xyz, which starts out as a number but isn’t entirely a number. How the program should handle this depends on the requirements of the program, but you want to make sure it handles it correctly. • Be sure to test “too many” as well as “too few.” If a program requires exactly n things, test it with at least one case with more than n things and at least one case with fewer than n. For example, if a program requires a line of input with exactly 10 characters, test it with 9 and with 11. • Any given test case can only test one “error message and exit” condition. This means that if you want to test two different error conditions, you need two different test cases: one for each error condition. Suppose that our program requires a three-letter string of lowercase letters. We might

be tempted to test with aBcD to test two things at once, but this will only test one of them (and believing you tested both is problematic!) To see why this rule exists, think about the algorithm to check for these errors:

Check if the input string has exactly three letters.
If it does not, then:
  print "Error: the input needs to be three letters"
  exit the program
Check if the input string is made up entirely of lowercase letters.
If it is not, then:
  print "Error: the input needs to be all lowercase letters"
  exit the program

Violating one input condition will cause the program to exit! This is also true of other situations where the program rejects the input, even if it does not exit. • Test exactly at the boundary of validity. If a program requires between 7 and 18 things, you should have test cases with 6, 7, 8, 17, 18, and 19 things. You need to make sure that 6 and 19 are rejected while 7, 8, 17, and 18 are accepted. Testing exactly at the boundaries is important because of the common “off by one” mistake—maybe the programmer wrote < when she should have written <=, or >= when she should have written >, or something similar. If you test with values that are right at the boundary, you will find these mistakes. However, testing is not just about checking error handling. You want to make sure the algorithm correctly handles valid inputs too. Here are some suggestions:

• Think carefully about whether or not there are any special cases where one particular input value (or set of values) has to be treated unusually. For example, in poker, an Ace is usually ranked the highest; however, it can have the lowest ranking in an “Ace low straight” (5 4 3 2 A). If you are testing code related to poker hands, you would want to explicitly test this case, since it requires treating an input value differently from normal.

• Think carefully about the requirements, and consider whether something could be misinterpreted, easily mis-implemented, or have variations that could seem correct. Suppose your algorithm works with sequences of decreasing numbers. You should test with a sequence like 7 6 6 5 4, which has two equal numbers in it. Checking equal numbers is a good idea here, since people might have misunderstood whether the sequence is strictly decreasing (equal numbers don’t count as continuing to decrease) or non-increasing (equal numbers do count).

• Think about types. What would happen if the programmer used the wrong type in a particular place? This could mean that the programmer used a type that was too small to hold the required answer (such as a 32-bit integer when a 64-bit integer is required), used an integer type when a floating point type is required, or used a type of the wrong signedness (signed when unsigned is required or vice versa).

• Consider any kind of off-by-one error that the programmer might have been able to make. Does the algorithm seem like it could involve counting? What if the programmer was off by one at either end of the range she counted over? Does it involve numerical comparison? What if < and <= (or > and >=) were mixed up?

• Whenever you have a particular type of problem in mind, think about how that mistake would affect the answer relative to the correct behavior, and make sure they are different. For example, suppose you are writing a program that takes two sequences of integers and determines which one has more even numbers in it. You have considered that the programmer might have an off-by-one error where he accidentally misses the last element of the sequence. Would this be a good test case?

    Sequence 1: 2 3 5 6 9 8
    Sequence 2: 1 4 2 8 7 6

This would not be a good test case for this particular problem. If the program is correct, it will answer “Sequence 2” (which has four even numbers, compared to three). However, if the algorithm mistakenly skips the last element, it will still answer “Sequence 2” (because it will count three even numbers in Sequence 2, and two in Sequence 1). A good test case to cover this off-by-one error would be:

    Sequence 1: 2 3 5 6 9 8
    Sequence 2: 1 4 2 8 7 3

Now a correct program will report a tie (three vs. three) and a program with this particular bug will report Sequence 2 as having more even numbers.

• Consider all major categories of inputs, and be sure you cover them. For numerical inputs, these would generally be negative, zero, and positive. One is also usually a good number to be sure you cover. For sequences of data, your tests should cover an empty sequence, a single-element sequence, and a sequence with many elements. For characters, use lowercase letters, uppercase letters, digits, punctuation, spaces, and non-printable characters in your test cases. For many algorithms, there are problem-specific categories you should consider. For example, if you are testing a function related to prime numbers (e.g., isPrime), then you should consider prime and composite (not prime) numbers as input categories to cover. When you combine two ways to categorize data, cover all the combinations. For example, if you have a sequence of numbers, you should test with an empty sequence, a one-element sequence with zero, a one-element sequence with a negative number, a one-element sequence with a positive number, and have each of negative, zero, and positive numbers appear in your many-element sequences.

• An important corollary of the previous rules is that if your algorithm gives a set of answers where you can list all possible ones (true/false, values from an enum, a string from a particular set, etc.), then your test cases should ensure that you get every answer at least once. Furthermore, if there are other conditions you think are important, you should be sure that you get all possible answers for each of these conditions. For example, if you are getting a yes/no answer for a numeric input, you should test with a negative number that gives yes, a negative number that gives no, a positive number that gives yes, a positive number that gives no, and zero (zero, being only one input, will have one answer).

All of this advice is a great starting point, but the most important thing for testing is to think carefully—imagine all the things that could go wrong, think carefully about how to test them, and make sure your test cases are actually testing what you think they are testing.

6.1.2 White Box Testing

Another testing methodology is white box testing. Unlike black box testing, white box testing involves examining the code to devise test cases. One consideration in white box tests is test coverage—a description of how well your test cases cover the different behaviors of your code. Note that while white box and black box testing are different, they are not mutually exclusive, but rather complementary. One might start by forming a set of black box test cases, implement the algorithm, then create more test cases to reach a desired level of test coverage. We will discuss three levels of test coverage: statement coverage, decision coverage, and path coverage. To illustrate each of these types of coverage, we will use the (contrived) code example:

 1  int aFunction(int a, int b, int c) {
 2    int answer = 0;
 3    if (a < b) {
 4      answer = b - a;
 5    }
 6    if (b < c) {
 7      answer = answer * (b - c);
 8    }
 9    else {
10      answer = answer + 42;
11    }
12    return answer;
13  }

Statement coverage means that every statement in the function is executed. In our example, if we were to use only one test with a = 1, b = 2, and c = 3, we would not have statement coverage. Specifically, this test case (by itself) does not execute line 10 of the code (the body of the else). Our lack of coverage indicates we have not adequately tested our function—it has behaviors our test case has not exercised. We can then figure out more test cases to add to our test suite to improve our testing until the desired coverage level is met. For example, we might add the test case a = 1, b = 3, and c = 2 to our test suite to test the else case. The two test cases together provide statement coverage of this function.

Statement coverage is a minimal starting point, so if we want to test our code well, we likely want a stronger coverage metric. To see how statement coverage falls short, notice that we have not tested any cases where the a < b test evaluates to false—that is, a case where we do not enter the body of the if statement (line 4). A stronger level of coverage is decision coverage—in which all possible outcomes of decisions are exercised. For boolean tests, this means we construct a test case where the expression evaluates to true and another where it evaluates to false. If we have a switch/case statement, we must construct at least one test case per choice in the case statement, plus one for the default case, even if it is implicit—that is, if we do not write it down, it behaves as if we wrote default: break;.

To understand decision coverage more fully, it helps to visualize it. In order to visualize decision coverage, we
need to understand the concept of a control flow graph (CFG). A control flow graph is a directed graph (in the mathematical sense) whose nodes are basic blocks (boxes) and whose edges (arrows) represent the possible ways the execution arrow can move from one basic block to the next. A basic block is a contiguous sequence of statements that must be executed in an “all-or-nothing” manner; the execution arrow cannot jump into or out of the middle of a basic block. (We discuss graphs in much more detail in Chapter 25.)

Figure 6.1: The control flow graph for our example code.

Figure 6.1 shows the control flow graph for our example function (aFunction) from above. The very first basic block has a lone arrow coming into it, indicating that it is the start—the execution arrow can only reach it by calling the function. This first basic block has two statements: the declaration/initialization of answer and the if statement, which tests if a < b. The basic block ends here because the execution arrow could go two places—indicated by the two edges leaving this block. The final block has no arrows going out of it; when the execution arrow moves to the end of this block, there is nowhere else to go within this function. At this point, the function is done and returns its answer. Note that we draw the CFG for one function at a time. We could also draw how the execution arrow moves between functions, which is called a call graph.

Decision coverage corresponds to having a suite of test cases that covers every edge in the graph. In our graph, if a < b is true, we will take the left edge coming out of the first block (corresponding to going into the “then” clause of the if statement). If it is false, we will take the right edge, which skips over this clause, and as there is no else clause, goes to the next if statement. This second edge—representing the path of our execution arrow when a < b is false—is highlighted in red in Figure 6.1, since it represents the shortcomings of our test cases so far with respect to decision coverage. We can obtain decision coverage on this function by adding a test case where the execution arrow traverses this edge of the control flow graph—or, put a different way, where the a < b decision evaluates to false. An example of this test case would be a = 5, b = 2, and c = 1.

Figure 6.2: The four possible paths through our example CFG. Our tests cover three out of the four, but the red path has not been tested.

An even stronger type of test coverage is path coverage. To achieve path coverage, our test cases must span all possible valid paths through the control flow graph (following the direction of the arrows). Figure 6.2 shows the four paths through our example control flow graph and color codes them based on which of our test cases cover them. Note that there is one path (shown in red) that we have not yet covered. This path corresponds to the case where a < b is false, but b < c is true. Our test cases that gave us decision coverage tested each of these conditions separately, but none of the tests tested both of these together at the same time—where our execution would follow the red path. We can add another test case (e.g., a = 5, b = 2, and c = 3) to gain path coverage. At this point you may be wondering “Why do we have all of these levels of test coverage?” and “How do I pick the right level of coverage for what I am doing?” Software engineers define and discuss multiple levels of test coverage because the higher levels of coverage give us more confidence in the correctness of our code but at the cost of more test cases. While our example only requires two, three, and four test cases for the three levels of coverage we discussed, it is a simple (and contrived) piece of code. The number of paths through the control flow graph is exponential in the number of conditional choices—if we have six if/else statements one after the other, then there are 64 possible paths through the control flow graph. If we increase the number of if/else statements from six to 16, then the number of paths grows to 65536—quite a lot! So how do you pick the right level of test coverage? As with many things, the answer is “it depends.”1 The first aspect of this decision is “how confident do you need to be in your code?” If you are doing preliminary testing of
your algorithm by hand in Step 4 (and will test the implemented algorithm once you have translated it to code), then statement coverage is a reasonable choice. Here, testing more cases is quite time consuming (you are doing them by hand), and you will do more test cases later (once it is translated to code and compiled, where you can have the computer run the tests). By contrast, if you are testing a piece of critical software that will be deployed when you finish testing, only achieving statement coverage would be woefully insufficient. Here, you would likely want to aim for more than just the minimum to achieve path coverage. Unfortunately, no amount of testing can ensure the code is correct—it can only increase our confidence that it is. If you require absolute certainty that the code is correct, you must formally prove it (which is beyond the scope of this book).

6.1.3 Generating Tests

One difficulty with testing arises from the fact that you want to test the cases you did not think of—but if you do not think of these cases, then how do you know to write tests for them? One approach to such a problem is to generate the test cases according to some algorithm. If the function we are testing takes a single integer as input, we might iterate over some large range (say -1,000,000 to 1,000,000) and use each integer in that range as a test case. Another possibility is to generate the test cases (pseudo-)randomly (called, unsurprisingly, random testing). Note that pseudo-random2 means the numbers look random (no “obvious” pattern) to a human but are generated by an algorithm that will produce the same

