Supercharged Python: Take Your Code to the Next Level [ PART II ]

Published by Willington Island, 2021-08-29 03:20:50

Description: [ PART II ]

If you’re ready to write better Python code and use more advanced features, Advanced Python Programming was written for you. Brian Overland and John Bennett distill advanced topics down to their essentials, illustrating them with simple examples and practical exercises.

Building on Overland’s widely-praised approach in Python Without Fear, the authors start with short, simple examples designed for easy entry, and quickly ramp you up to creating useful utilities and games, and using Python to solve interesting puzzles. Everything you’ll need to know is patiently explained and clearly illustrated, and the authors illuminate the design decisions and tricks behind each language feature they cover. You’ll gain the in-depth understanding to successfully apply all these advanced features and techniques:

Coding for runtime efficiency
Lambda functions (and when to use them)
Managing versioning
Localization and Unicode
Regular expressions
Binary operators


    import numpy as np
    np.info(np.sin)
    np.info(np.cos)
    np.info(np.power)

Most of the functions listed here are designed to operate on a single numpy array. A few functions have variations. The numpy power function takes at least two arguments: X and Y. Either or both can be an array; but if they are both arrays, they must have the same shape (or shapes that numpy can broadcast together). The effect of the function is to raise each X value to a power specified by the corresponding element of Y.

For example, the following statements raise each of the elements in array A to the power of 2 (that is, they square each element).

    >>> import numpy as np
    >>> A = np.arange(6)
    >>> print(A)
    [0 1 2 3 4 5]
    >>> print(np.power(A, 2))
    [ 0  1  4  9 16 25]

Other functions are often used in conjunction with the numpy linspace function, which in turn is heavily used in plotting equations, as you’ll see in Section 13.3, “Plotting Lines with ‘numpy’ and ‘matplotlib.’”

For example, the following statements combine the linspace function with the sin function and the constant pi to get a series of 10 values of the sine function as its inputs increase from 0 to pi. The values rise toward 1 and then fall back to 0.

    >>> A = np.linspace(0, np.pi, 10)
    >>> B = np.sin(A, dtype='float16')
    >>> print(B)
    [0.000e+00 3.420e-01 6.431e-01 8.657e-01 9.849e-01 9.849e-01
     8.662e-01 6.431e-01 3.416e-01 9.675e-04]

In this example, the data type float16 was chosen so as to make the numbers easier to print. But a still better way to do that is to use some of the formatting techniques from Chapter 5, which make the results easier to interpret.

    >>> B = np.sin(A)
    >>> print(' '.join(format(x, '5.3f') for x in B))
    0.000 0.342 0.643 0.866 0.985 0.985 0.866 0.643 0.342 0.000

This small data sample demonstrates the behavior of the sine function. The sine of 0 is 0, but as the inputs increase toward pi/2, the results approach 1.0; then they approach 0 again as the inputs increase toward pi.

13.2 DOWNLOADING “MATPLOTLIB”

We can use all this data to plot graphs in Python with the help of numpy and another package: matplotlib, which has to be downloaded and imported.

The first step in downloading a package is to bring up the DOS Box (Windows) or Terminal application (Macintosh). As explained in Chapter 12, every Python download should come with the pip utility or with pip3.

Use the install command of the pip utility. Assuming you’re connected to the Internet, the following command directs the utility to download the matplotlib software.

    pip install matplotlib

If you use pip at the command line and it is not recognized, try using pip3 (which is the name of the utility when Python 3 is installed on a Macintosh system).

    pip3 install matplotlib

Note
If neither the pip nor pip3 command worked, check how you spelled matplotlib. The spelling is tricky. To see the range of commands possible, type

    pip help

13.3 PLOTTING LINES WITH “NUMPY” AND “MATPLOTLIB”

Now you’re ready to have some fun. After matplotlib is downloaded, import it.

    import numpy as np
    import matplotlib.pyplot as plt

You don’t have to give matplotlib.pyplot the short name plt, but plt is widely used and recognized. The full name, matplotlib.pyplot, is cumbersome, and most programmers use plt by convention.

Two of the major plotting functions are plt.plot and plt.show. The syntax shown here is a simplified view; we’ll show more of it later.

    plt.plot( [X,] Y [,format_str] )
    plt.show()

In this syntax, square brackets are not intended literally but indicate optional items.

The simplest calls to plot usually involve two one-dimensional array arguments: X and Y. Where X is omitted, it’s assumed to be the array [0, 1, 2... N-1], where N is the length of Y. But most often, you’ll want to include both X and Y arguments.

The action is to take pairs of points from X and Y, in which X and Y are the same length, and to plot them on a graph. Each value in X is matched to the corresponding value in Y to get an (x, y) point. All the points are then plotted and connected.

Here’s a simple example, using the np.linspace and np.sin functions.

    import numpy as np
    import matplotlib.pyplot as plt

    A = np.linspace(0, 2 * np.pi)
    plt.plot(A, np.sin(A))
    plt.show()

If you’ve downloaded both numpy and matplotlib, and if you enter this code as shown, your computer should display the window shown in Figure 13.1. The window remains visible until you close it.

Figure 13.1. A sine graph

This is simple code, but let’s step through each part of it. The first thing the example does is import the needed packages.

    import numpy as np
    import matplotlib.pyplot as plt

Next, the example calls the numpy linspace function. This function, remember, generates a set of values, including the two specified endpoints, to get a total of N evenly spaced values. By default, N is 50.

    A = np.linspace(0, 2 * np.pi)

Therefore, array A contains floating-point numbers beginning with 0, ending in 2 * pi, and 48 other values evenly spaced between these two values.

The call to the plot function specifies two arrays: A, which contains 50 values along the X axis, and a second array, which contains the sine of each element in A.

    plt.plot(A, np.sin(A))

The function looks at each element in A and matches it with a corresponding value in the second array, to get 50 (x, y) pairs. Finally, the show function tells the software to display the resulting graph onscreen.

We can, of course, just as easily graph the cosine function instead, by replacing the call to np.sin with a call to np.cos.

    import numpy as np
    import matplotlib.pyplot as plt

    A = np.linspace(0, 2 * np.pi)
    plt.plot(A, np.cos(A))
    plt.show()

In this version, each value in A is matched with its cosine value to create an (x, y) pair. Figure 13.2 shows the resulting graph.
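The description of linspace above (50 values by default, both endpoints included, even spacing) can be checked directly. The following is a minimal sketch; the variable name step is introduced here only for illustration.

```python
import numpy as np

# Default call: 50 evenly spaced values, both endpoints included.
A = np.linspace(0, 2 * np.pi)

print(len(A))        # number of values generated
print(A[0], A[-1])   # first and last values (the two endpoints)

# The spacing between neighbors is constant: (stop - start) / (N - 1).
step = A[1] - A[0]
print(np.allclose(np.diff(A), step))
```

Passing a third argument, as in np.linspace(0, 2 * np.pi, 1000), changes only the number of values; the endpoints stay the same.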

Figure 13.2. A cosine graph

But you aren’t limited to trigonometric functions. The flexibility of numpy arrays (particularly how they can be operated on as a unit) makes the plotting power of matplotlib both simple and versatile.

For example, what if you wanted to plot a graph of the reciprocal function, that is, 1/N? If N is 5, its reciprocal is 1/5, and vice versa.

We start by creating a range of values for X. This is why the np.linspace function is so useful; it creates a source of values (as many as you want) from the desired domain. Often these values will be monotonically increasing along the X axis.

Starting with the value 0 would be a problem, because then 1/N would cause division by 0. Instead, let’s start with the value 0.1 and run to values as high as 10. By default, 50 values are generated.

    A = np.linspace(0.1, 10)

Now it’s easy to plot and show the results by using A and 1/A to provide values for the (x, y) pairs. Each value in A gets matched with its reciprocal.

    plt.plot(A, 1/A)
    plt.show()

Figure 13.3 shows the results.

Figure 13.3. Plotting the reciprocal function

The function creates points by combining values from A and 1/A. So, for example, the first (x, y) pair is

    (0.1, 10.0)

The second point is formed in the same way, combining the next value in A with its reciprocal in the second set. Here are some points that could be plotted.

    (0.1, 10.0), (0.2, 5.0), (0.3, 3.3)...

(These round values are illustrative. With the default of 50 values between 0.1 and 10, consecutive elements of A are about 0.2 apart, so the second element is roughly 0.302.)

A less interesting, but illustrative, example is to plot a handful of points and connect them. Let’s specify five such values:

    (0, 1)
    (1, 2)
    (2, 4)
    (3, 5)
    (4, 3)

The plot for this graph would be created by the following statement:

    plt.plot([0, 1, 2, 3, 4], [1, 2, 4, 5, 3])

If the X argument is omitted, the default is [0, 1, 2, ... N-1], where N is the length of the Y array. Therefore, this example could be written as

    plt.plot([1, 2, 4, 5, 3])

In either case, calling the show function puts the graph on screen, as illustrated by Figure 13.4.
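The claim that omitting X is the same as passing [0, 1, 2, ... N-1] can be confirmed by inspecting the line objects that plt.plot returns. This is a sketch under one assumption not in the text: it selects the non-interactive "Agg" backend so that it runs without opening a window, and it therefore skips plt.show. The names line_explicit, line_default, and same_x are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

Y = [1, 2, 4, 5, 3]

# plot returns a list of line objects; each call here creates exactly one.
line_explicit = plt.plot([0, 1, 2, 3, 4], Y)[0]
line_default = plt.plot(Y)[0]

# Both lines should carry the same X coordinates.
same_x = np.array_equal(line_explicit.get_xdata(), line_default.get_xdata())
print(same_x)
```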

Figure 13.4. A primitive graph of five points

Note that you don’t necessarily have to use ascending values. You can use any points to create arbitrary lines. Here’s an example:

    plt.plot([3, 4, 1, 5, 2, 3], [4, 1, 3, 3, 1, 4])

The points to be plotted would be

    (3, 4), (4, 1), (1, 3), (5, 3), (2, 1), (3, 4)

Those points, in turn, form a pentagram (more or less), as shown in Figure 13.5. All the points are plotted, and then line segments are drawn between one point and the next.

Figure 13.5. Plot of a pentagram

The final example in this section shows that you can graph formulas as complex as you like. This is the beauty of being able to operate directly on numpy arrays. It’s easy to graph complex polynomials. Here’s an example:

    import numpy as np
    import matplotlib.pyplot as plt

    A = np.linspace(-15, 20)
    plt.plot(A, A ** 3 - (15 * A ** 2) + 25)
    plt.show()

These statements, when run, graph a polynomial as shown in Figure 13.6.
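To see what the array expression in that plot call produces, it can be compared with numpy’s polyval function, which evaluates a polynomial from its coefficients. This is only a cross-check, not the method used in the chapter; the names Y_direct and Y_polyval are illustrative.

```python
import numpy as np

A = np.linspace(-15, 20)

# The expression used in the plot call, evaluated element by element.
Y_direct = A ** 3 - (15 * A ** 2) + 25

# The same polynomial given as coefficients, highest power first:
# 1*x**3 - 15*x**2 + 0*x + 25
Y_polyval = np.polyval([1, -15, 0, 25], A)

print(np.allclose(Y_direct, Y_polyval))
```

Either array could be passed as the Y argument to plt.plot; the direct expression has the advantage of reading like the formula itself.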

Figure 13.6. Plotting a polynomial

13.4 PLOTTING MORE THAN ONE LINE

What if you’d like to show more complex graphs, such as a graph that contains both a sine function and a cosine function so you can compare them? This is easy. You can, for example, make two calls to the plot function before showing the results.

    import numpy as np
    import matplotlib.pyplot as plt

    A = np.linspace(0, 2 * np.pi)
    plt.plot(A, np.sin(A))
    plt.plot(A, np.cos(A))
    plt.show()

Alternatively, the two plot statements could have been combined into one by placing four arguments in a single statement.

    plt.plot(A, np.sin(A), A, np.cos(A))

In either case, Python responds by displaying the graph shown in Figure 13.7.

Figure 13.7. Sine and cosine plotted together

The matplotlib package automatically plots curves of two different colors: orange and blue. These show up nicely on a computer screen, but in a black-and-white printout, you may not be able to see the difference. Fortunately, the plotting software provides other ways of differentiating curves through the formatting arguments, which we look at next.

The more complete syntax for the plot function is shown here.

    plt.plot( X1, Y1, [fmt1,] X2, Y2, [fmt2,] ... )

What this syntax indicates is that you can include any number of X, Y pairs of arrays. From each such pair, a set of (x, y) points is created by combining pairs of corresponding elements. You can then add optional format arguments.

The fmt arguments are useful for differentiating lines by giving them contrasting colors and styles. For example, the following plot statement causes the plot for the cosine function to consist of small circles.

    plt.plot(A, np.sin(A), A, np.cos(A), 'o')
    plt.show()

The sine curve has no format specifier, so it takes on a default appearance. The resulting graph differentiates between the two curves nicely, as you see in Figure 13.8. By default, different colors are assigned to the two lines.
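How a format string attaches to only one X, Y pair can be seen by inspecting the line objects that plt.plot returns. The sketch below assumes the non-interactive "Agg" backend (so no window opens and plt.show is skipped); sine_line and cosine_line are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

A = np.linspace(0, 2 * np.pi)

# One call, two X, Y pairs; only the second pair is followed by a format string.
sine_line, cosine_line = plt.plot(A, np.sin(A), A, np.cos(A), 'o')

# The sine line keeps the default solid style; the cosine line uses circle markers.
print(sine_line.get_linestyle(), sine_line.get_marker())
print(cosine_line.get_linestyle(), cosine_line.get_marker())
```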

Figure 13.8. Sine and cosine differentiated

This formatting creates a dramatic contrast. But we can create even more of a contrast by specifying a style for the sine curve. The ^ format symbol specifies that the curve will be made up of tiny triangles.

While we’re at it, let’s bring in another plotting function, title. This simple function is used to specify a title for the graph before calling the show function. The xlabel and ylabel functions specify labels for the axes.

    plt.plot(A, np.sin(A), '^', A, np.cos(A), 'o')
    plt.title('Sine and Cosine')
    plt.xlabel('X Axis')
    plt.ylabel('Y Axis')
    plt.show()

Figure 13.9 shows the resulting graph, complete with title and axis labels.

Figure 13.9. Sine and cosine with axis labels

If you get help on the plt.plot function, it provides you a reference to all the formatting characters. These characters can be combined in strings such as 'og', meaning “Use small green circles for this line.”

The characters used to specify colors are listed in Table 13.2.

Table 13.2. Color Characters in Plot Format Strings

    Character   Color
    b           Blue
    g           Green
    r           Red
    c           Cyan
    m           Magenta
    y           Yellow
    k           Black
    w           White

The characters that specify shapes are shown in Table 13.3.

Table 13.3. Shape Characters in Plotting Format Strings

    Character   Shape
    .           Points
    ,           Pixels
    o           Circles
    v           Down-pointing triangles
    ^           Up-pointing triangles
    <           Left-pointing triangles
    >           Right-pointing triangles
    s           Squares
    p           Pentagons
    *           Stars
    h, H        Hexagons (two orientations)
    +           Plus signs
    d, D        Diamonds (thin and regular, respectively)

Remember that you can specify any combination of color and shape. Here’s an example:

    'b^'    # Use blue triangles.

Note

Yet another technique for differentiating between lines is to use labels, along with a legend, to show how the information corresponds to lines of particular color and/or style. This technique is explained in Chapter 15, “Getting Financial Data off the Internet.”

13.5 PLOTTING COMPOUND INTEREST

Imagine the following scenario. You’re going to be given a trust fund in which you have your choice of plans.

Plan A would add two dollars every year. But you can’t touch the money until you cash out. At that point, you get all the money, and the fund terminates. The two dollars is the only source of increase.

Alternatively, under plan B, the same conditions apply, except that instead of two dollars being added to the fund every year, plan B starts with one dollar and then increases by 10 percent a year.

It would seem the choice is easy. One fund increases by two dollars a year, while the other increases (at least in the beginning) by only 10 cents. Obviously, plan A is better, isn’t it? It’s 20 times as good!

But plan A grows at a constant rate, while plan B compounds. Every good mathematician, as well as accountant, should know the following.

Exponential growth (such as compound interest), however slow, must eventually overtake linear growth, however fast.

This is an amazing fact, especially when you consider that it implies that compound growth of .001% on a single dollar must eventually overtake a steady income of a million dollars a year! This is quite true, by the way, but it would take lifetimes for the compounding fund to overtake the million-dollar fund.
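The “.001% versus a million a year” claim can be estimated numerically. The sketch below is an illustration under stated assumptions, not a calculation from the book: it assumes the compound fund starts at one dollar, the linear fund starts at zero and gains $1,000,000 a year, and it compares logarithms so that no huge numbers are formed. The function name compound_lead is hypothetical.

```python
import math

rate = 0.00001        # .001 percent a year, written as a fraction
income = 1_000_000    # linear plan: dollars added each year

def compound_lead(years):
    """Positive once $1 compounding at `rate` exceeds income * years.

    The two totals are compared through their logarithms.
    """
    return years * math.log1p(rate) - math.log(income * years)

# Bisection between a year where the linear plan leads (lo)
# and a year where the compound plan leads (hi).
lo, hi = 1.0, 1e8
while hi - lo > 1.0:
    mid = (lo + hi) / 2
    if compound_lead(mid) > 0:
        hi = mid
    else:
        lo = mid

print(f"Compound plan takes the lead after roughly {hi:,.0f} years")
```

Under these assumptions the answer comes out in the millions of years, which is consistent with the remark that it would take lifetimes.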

This dynamic is easy to show with a graph. Start by creating a numpy array, A, representing values along an axis of time. Set it to measure the passing of 60 years.

    A = np.linspace(0, 60, 60)

(With 60 values running from 0 to 60 inclusive, the points fall slightly more than a year apart; np.linspace(0, 60, 61) would land exactly on whole years.)

Then we plot a linear-growth function of $2 a year versus a compound-growth function using 10 percent a year, which is mathematically equivalent to raising the number 1.1 to a power, N, where N is the number of years.

    2 * A       # Formula for increase of $2 a year
    1.1 ** A    # Formula for growth of 10% a year

We’ll use a format string to specify that the first curve is made of little circles, for the sake of contrast (“o”).

    plt.plot(A, 2 * A, 'o', A, 1.1 ** A)

Alternatively, the two curves could be created by separate statements, with additional spaces inserted here for clarity’s sake.

    plt.plot(A, 2 * A, 'o')    # +$2 a year (with circles)
    plt.plot(A, 1.1 ** A)      # Compound 10% a year

Next, let’s specify some useful labels and finally show the graph.

    plt.title('Compounded Interest v. Linear')
    plt.xlabel('Years')
    plt.ylabel('Value of Funds')
    plt.show()

Figure 13.10 displays the results. For the first 30 years, the linear fund ($2 a year) outpaces the compound fund. However, between years 30 and 50, the accelerating growth of plan B becomes noticeable. Plan B finally equals and surpasses the linear fund shortly before year 50.

Figure 13.10. Compound growth plotted against linear

So if you have 50 years to wait, plan B is the better choice. Eventually, plan B greatly outperforms plan A if you can wait long enough. The compound growth will reach thousands, even millions, of dollars long before the other plan does.
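The crossover point read off the graph can also be computed. This is a small sketch using the two formulas from this section (2 * A and 1.1 ** A) on whole years; it starts at year 1 on the assumption that plan B’s initial dollar should not count as “overtaking” at year 0. The names years, plan_a, plan_b, and first_year_b_ahead are illustrative.

```python
import numpy as np

years = np.arange(1, 61)     # whole years 1 through 60
plan_a = 2 * years           # +$2 a year
plan_b = 1.1 ** years        # $1 growing 10% a year

# np.argmax on a Boolean array returns the index of the first True value.
first_year_b_ahead = years[np.argmax(plan_b > plan_a)]
print(first_year_b_ahead)
```

The result should agree with the graph: plan B moves ahead shortly before year 50.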

Note
The labels along the X axis start in year 0 and run to year 60. Section 13.12, “Adjusting Axes with ‘xticks’ and ‘yticks’,” shows how these years could be relabeled (for example, by starting in the year 2020 and running to the year 2080) without changing any of the underlying data or calculations.

13.6 CREATING HISTOGRAMS WITH “MATPLOTLIB”

Histograms provide an alternative way of looking at data. Instead of presenting individual data points and connecting them, a histogram shows how frequently the results fall into subranges.

The data is collected into buckets, or bins, each bin representing a range. We did that manually for some of the problems in Chapter 11, “The Random and Math Packages,” but the numpy and matplotlib packages do this work automatically.

Let’s start with a simple example. Suppose you have a list containing the IQ of each person on your software development team. You’d like to see which scores tend to be the most common.

    IQ_list = [91, 110, 105, 107, 135, 127, 92, 111, 105,
               106, 130, 145, 145, 128, 109, 108, 98, 129,
               100, 108, 114, 119, 99, 137, 142, 145, 112,
               113]

It’s easy to convert this Python list into a numpy array. First, however, we’ll make sure we’ve imported the necessary packages.

    import numpy as np
    import matplotlib.pyplot as plt

    IQ_A = np.array(IQ_list)

Graphing this data as a histogram is the easiest step of all, because it requires only one argument. The hist function produces this chart. Then, as usual, the show function must be called to actually put the results onscreen.

    plt.hist(IQ_A)
    plt.show()

Wow, that was easy! There are some additional arguments, but they’re optional.

One of the main reasons for providing the show function, by the way, is so that the graph can be tweaked in various ways before being displayed onscreen. The following example creates the histogram, gives it a title, and finally shows it.

    plt.hist(IQ_A)
    plt.title('IQ Distribution of Development Team.')
    plt.show()

Figure 13.11 displays the resulting graph.

Figure 13.11. IQ scores shown in a histogram

This graph reveals a good deal of information. It shows that the frequency of IQ scores increases until 110, at which point it drops off. There’s a bulge again around 140.

A more complete syntax is shown here.

    plt.hist(A [, bins=10] [, keyword_args] )

The first argument, A, refers to a numpy array and is the only required argument.

The bins argument determines the number of subranges, or bins. The default setting is 10, which means that the function determines the difference between the highest and lowest values and divides by 10 to get the size of each subrange. But you can specify settings other than 10.

    plt.hist(A, bins=50)    # Place results into 50 bins.

Other keyword arguments accepted by this function include color, which takes a string containing one of the characters shown in Table 13.2; align, which takes one of the values 'left', 'right', or 'mid'; and cumulative, which is a Boolean that indicates cumulative results are to be graphed. For more information, use help.

    >>> help(plt.hist)

Another use of histograms is to graph a normal distribution curve. We showed how to do this in Chapter 11, but the approach here gives even better results.

We start by using the numpy randomization package to generate 200,000 data points in a normal distribution. This distribution has a mean of 0.0 and a standard deviation of 1.0. But by adding and multiplying, we can convert to an array of values with a mean of 100 and a standard deviation of 10.

    A = np.random.standard_normal(200000)
    A = A * 10 + 100

In graphing these results, we can rely on the default setting of 10 bins, but results are more satisfying if graphed into even smaller subranges. So we specify 80 bins. Let’s also set the color to green while we’re at it.

    plt.hist(A, bins=80, color='g')
    plt.title('Normal Distribution in 80 Bins')
    plt.show()

The result is an appealing-looking normal-distribution graph, shown in Figure 13.12. If you can see it in green (as it is displayed on a computer screen), then it looks a little bit like a Christmas tree. Happy holidays!

Figure 13.12. Histogram of a normal distribution
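The “adding and multiplying” step can be verified numerically: multiplying by 10 stretches the standard deviation from 1 to 10, and adding 100 shifts the mean from 0 to 100. The sketch below uses a seeded generator from np.random.default_rng for repeatability; that choice is an assumption made here, whereas the chapter calls np.random.standard_normal directly.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded generator (assumption, for repeatability)
A = rng.standard_normal(200000)    # mean 0.0, standard deviation 1.0
A = A * 10 + 100                   # scale, then shift

# Sample mean and standard deviation should land close to 100 and 10.
print(round(A.mean(), 2), round(A.std(), 2))
```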

You might wonder: Can you present this data as a completely smooth line rather than as a series of bars?

You can. numpy provides a histogram function that enables you to generate frequency numbers for a series of subranges (bins). The general syntax, shown here, displays the two most important arguments.

    np.histogram(A [, bins=10] [, keyword_args] )

The action of the function is to produce a new array; this array contains the result of the histogram. Each element in this new array is a frequency number corresponding to one of the bins. By default, the number of bins is 10. So for the resulting array:

- The first element contains the number of values from A that occur in the first bin (the first subrange).
- The second element contains the number of values from A that occur in the second bin (the second subrange).
- And so on.

The value returned is actually a tuple. The first element of this tuple contains the frequency numbers we want to plot. The second element contains the exact edges of the bins. So, to get the data we need, use the following syntax:

    np.histogram(A, bins)[0]

We can now generate a smooth normal-distribution curve. Generate a large set of random numbers, and place the results in 50 bins. (You can pick another number of bins, but using between 50 and 100 tends to produce good results.) Finally, plot the frequency numbers of those bins. This example employs 2 million trials, but there still should be almost no noticeable delay, which is amazing.

    import numpy as np
    import matplotlib.pyplot as plt

    A = np.random.standard_normal(2000000)
    A = A * 10 + 100
    B = np.histogram(A, 50)[0]
    plt.plot(B)
    plt.show()

This code specifies no argument for the “X-axis” array; instead, it’s handled by default. The plotting software uses 0, 1, 2 . . . N-1 for the X coordinates, where N is the length of the B array, which contains the result of the histogram function.

Figure 13.13 shows the results produced by this example.

Figure 13.13. Smooth drawing of a normal distribution curve

The resulting figure is a smooth, pleasing curve. The X axis, however, may not be what you expect. The numbers along the X axis show the bin numbers.

The range consists of the lowest random number generated to the highest; this range is then divided into 50 parts. The Y axis shows the frequency of hits in each bin.

But instead of placing the bin numbers along the X axis, it’s more useful to display values from the distribution itself. The simplest correct way to do that is to use “X-axis” values that represent the midpoint of each bin.

The next example does that by using the second array returned by np.histogram. That array contains the edges of the bins: the lowest value of the subrange represented by each bin, followed by one final entry for the top edge of the last bin. It therefore holds one more element than there are bins.

That may sound complicated, but it’s only a matter of adding a couple of lines of code. In this case, X represents the edges of the bins, and then X is modified to contain the midpoints of those bins. In this way, the frequencies get plotted against values in the distribution (centered at 100) rather than bin numbers.

    import numpy as np
    import matplotlib.pyplot as plt

    A = np.random.standard_normal(2000000)
    A = A * 10 + 100
    B, X = np.histogram(A, 50)
    X = (X[1:]+X[:-1])/2    # Use bin centers rather than edges.
    plt.plot(X, B)          # Plot against values rather than
    plt.show()              # bin numbers.

The X values are calculated by getting the midpoint of each subrange: the bottom and top edges (which are one off in position) are averaged. The expression X[1:] shifts one position, because it starts with the second element. The expression X[:-1] excludes the last element to make the lengths equal.

    X = (X[1:]+X[:-1])/2    # Use bin centers rather than edges

If you look at the revised plot of the histogram (Figure 13.14), you can see it plots values centered at 100 with a standard deviation of 10. A standard deviation of 10 means that roughly 95 percent of the area of the curve should fall within two standard deviations (80 to 120), and more than 99 percent of the area should fall within three standard deviations (70 to 130).
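Both the edge-to-center conversion and the 95 percent and 99 percent figures can be checked on the generated data. This sketch assumes a seeded generator from np.random.default_rng (not used in the chapter) so the run is repeatable; centers, within_2, and within_3 are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(1)     # seeded generator (assumption, for repeatability)
A = rng.standard_normal(2000000) * 10 + 100

B, X = np.histogram(A, 50)
centers = (X[1:] + X[:-1]) / 2     # average each bin's bottom and top edge

# 50 counts, 51 edges, and 50 centers after the conversion.
print(len(B), len(X), len(centers))

# Fraction of samples within two and three standard deviations of the mean.
within_2 = np.mean((A >= 80) & (A <= 120))
within_3 = np.mean((A >= 70) & (A <= 130))
print(within_2, within_3)
```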

Figure 13.14. Normal distribution curve centered at 100

The histogram function has other uses. For example, you can use it to replace some of the code in Chapter 11, demonstrating the Law of Large Numbers. The example in that section collected data in a series of bins. The histogram function does the same thing as the code in that section but does it many times as fast.

Performance Tip
You should observe through this chapter and Chapter 12 that many numpy functions echo actions that can be performed in Chapter 11, “The Random and Math Packages,” by importing random and math. But the numpy versions, especially with large data sets (such as the 2,000,000 random numbers generated for the most recent examples), will be many times as fast. So when you have a choice, prefer to use numpy, including its random subpackage, for large numeric operations.
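To illustrate how np.histogram can stand in for hand-written binning, the sketch below tallies simulated die rolls both ways. The die-roll scenario is a hypothetical stand-in (the Chapter 11 code is not shown in this section), and the seeded default_rng generator is likewise an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)             # seeded generator (assumption)
rolls = rng.integers(1, 7, size=100000)    # simulated die rolls, 1 through 6

# Manual binning with a plain Python loop.
manual = [0] * 6
for r in rolls:
    manual[r - 1] += 1

# The same tally from np.histogram: six bins, each centered on a face value.
counts = np.histogram(rolls, bins=6, range=(0.5, 6.5))[0]

print(manual)
print(counts)
```

The two tallies should match exactly; the difference is that the numpy call avoids the Python-level loop.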

Some of the other arguments are occasionally useful. To learn about all of them, you can get help from within IDLE.

    >>> np.info(np.histogram)

13.7 CIRCLES AND THE ASPECT RATIO

Sometimes you’ll want to adjust the relative size of the X and Y axes; this is especially true when you draw a geometric shape. In this section, we’ll show how to draw a circle. Usually, you don’t want it to look like an ellipse, so we’ll adjust the aspect ratio between X and Y to be equal before showing the graph.

There’s more than one way to draw a circle; the other way is left as an exercise for the end of the chapter. In this section, we’ll use an approach utilizing trig functions.

Imagine a little bug traveling around the outside of a circle, starting at 0 degrees at the coordinate (1,0), as shown in Figure 13.15. The bug continues until it’s made a complete trip.

Figure 13.15. “Bug on a circle” about 42 degrees

Each point on the circle corresponds to an angle, which we call theta. For example, the point on the circle that is 90 degrees counterclockwise from the starting point has a value of 90 degrees, or rather, the equivalent in radians (pi / 2). Figure 13.15 shows the bug having traveled about 42 degrees (roughly equal to 0.733 radians).

At each point on the circle, the X coordinate of the bug’s position is given by

    cosine(theta)

Likewise, the Y coordinate of the bug’s position is given by

    sine(theta)
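As a quick numeric check of the bug’s position at 42 degrees, the following sketch converts the angle to radians and applies the two formulas. It relies only on np.radians, np.cos, and np.sin; the variable names are illustrative.

```python
import numpy as np

theta = np.radians(42)                 # 42 degrees expressed in radians
x, y = np.cos(theta), np.sin(theta)    # the bug's (x, y) position

print(round(float(theta), 3))          # angle in radians, rounded
print(x, y)

# Any point produced this way lies on the unit circle: x**2 + y**2 == 1.
print(np.isclose(x ** 2 + y ** 2, 1.0))
```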

By tracking the bug’s journey, we get a set of points corresponding to a trip around the circle. Each point on this journey corresponds to the following (x, y) coordinates:

    (cosine(theta), sine(theta))

Therefore, to graph a complete circle, we get a set of points corresponding to many angles on this imaginary trip, from 0 to 2 * pi (equal to 360 degrees). Then we graph the resulting (x, y) pairs. And we’ll get 1,000 data points to get a nice, smooth curve.

    import numpy as np
    import matplotlib.pyplot as plt

    theta = np.linspace(0, 2 * np.pi, 1000)
    plt.plot(np.cos(theta), np.sin(theta))
    plt.show()

If you run these statements, they should draw a circle, but the result looks more like an ellipse. The solution is to specify a plt.axis setting that forces X and Y units to be equally spaced on the screen.

    plt.axis('equal')

Now, let’s plug this statement into the complete application, including the import statements.

    import numpy as np
    import matplotlib.pyplot as plt

    theta = np.linspace(0, 2 * np.pi, 1000)
    plt.plot(np.cos(theta), np.sin(theta))
    plt.axis('equal')
    plt.show()

Now the code produces the perfect circle onscreen, as shown in Figure 13.16.

Figure 13.16. A perfect circle

13.8 CREATING PIE CHARTS

The versatility of the numpy and matplotlib packages extends even to pie charts. This is an effective way of illustrating the relative size of several pieces of data.

The syntax for the plt.pie function is shown here. As with other plotting functions, there are other arguments, but only the most important are shown here.

    plt.pie(array_data, labels=None, colors=None)

The first argument, array_data, is a collection containing a relative size for each category. The labels argument is a collection of strings that label the corresponding groups referred to in the first argument. The colors argument is a collection of strings specifying color, using the values listed earlier in Table 13.2. And all the collections must have the same length.

This is a simple function to use once you see an example. Suppose you have data on the off-hours activities of your development team, and you want to see a chart. Table 13.4 summarizes the data to be charted in this example.

Table 13.4. Weekly Activity of Development Team

    Activity      Hours per week (average)   Color
    Poker         3.7                        black ('k')
    Chess         2.5                        green ('g')
    Comic books   1.9                        red ('r')
    Exercise      0.5                        cyan ('c')

It’s an easy matter to place each column of data into its own list. Each list has exactly four members in this case.

    A_data = [3.7, 2.5, 1.9, 0.5]
    A_labels = ['Poker', 'Chess', 'Comic Books', 'Exercise']
    A_colors = ['k', 'g', 'r', 'c']

Now we plug these figures into a pie-chart plot, add a title, and display. The aspect ratio of the pie chart can be fixed using a plt.axis('equal') statement, just as we did for the circle; otherwise, in versions of matplotlib that do not already do this by default, the pie will appear as an ellipse rather than a circle.

    import numpy as np
    import matplotlib.pyplot as plt

    plt.pie(A_data, labels=A_labels, colors=A_colors)
    plt.title('Relative Hobbies of Dev. Team')
    plt.axis('equal')
    plt.show()

Figure 13.17 shows the resulting pie chart.
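Because plt.pie works with relative sizes, the hours do not need to add up to any particular total. The sketch below shows the share of the pie each activity receives, computed with numpy from the data in Table 13.4; the name shares is illustrative.

```python
import numpy as np

A_data = np.array([3.7, 2.5, 1.9, 0.5])
A_labels = ['Poker', 'Chess', 'Comic Books', 'Exercise']

# Each wedge's share is its value divided by the total, as a percentage.
shares = A_data / A_data.sum() * 100

for label, share in zip(A_labels, shares):
    print(f'{label:12s} {share:5.1f}%')
```

To have matplotlib print such percentages on the wedges themselves, plt.pie accepts an autopct argument, for example autopct='%1.1f%%'.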

Figure 13.17. A pie chart

13.9 DOING LINEAR ALGEBRA WITH “NUMPY”

Before wrapping up these two chapters on the numpy package, we’ll take a look at one of the most useful areas, at least to a mathematician or engineer: linear algebra, which often involves vectors (arrays) and matrixes (multidimensional arrays).

It isn’t necessary to use separate “vector” or “matrix” collection types with numpy. You don’t even need to download or import new subpackages. You just apply the appropriate functions.

13.9.1 The Dot Product

As these last two chapters have shown, you can multiply arrays with scalars, and arrays with arrays. The requirement in the second case is that two arrays multiplied together must have the same shape. We can summarize this relationship as follows:

    (A, B) * (A, B) => (A, B)

You can multiply an array of shape A, B by another array of shape A, B and get a third array of the same shape. But with numpy, you can also multiply arrays together by using the dot-product function, dot, which has slightly more complex rules.

    numpy.dot(A, B, out=None)

A and B are two arrays to be combined to form a dot product; the out argument, if specified, is an array of the correct shape in which to store the results. The “correct shape” depends on the size of A and B, as explained here.

The dot product of two one-dimensional arrays is simple. The two arrays must have the same length. The action is to multiply each element in A by its corresponding element in B, and then sum those products, producing a single scalar value.

    D. P. = A[0]*B[0] + A[1]*B[1] + ... + A[N-1]*B[N-1]

Here’s an example:

    >>> import numpy as np
    >>> A = np.ones(5)
    >>> B = np.arange(5)
    >>> print(A, B)
    [1. 1. 1. 1. 1.] [0 1 2 3 4]
    >>> np.dot(A, A)
    5.0
    >>> np.dot(A, B)
    10.0
    >>> np.dot(B, B)
    30

You should be able to see that the dot product of B with B is equal to 30, because that product is equal to the sum of the squares of its members:

    D. P. = 0*0 + 1*1 + 2*2 + 3*3 + 4*4
          = 0 + 1 + 4 + 9 + 16
          = 30

We can generalize this:

    D. P.(A, A) = sum(A * A)

The dot product between two two-dimensional arrays is more complex. As with ordinary multiplication between arrays, the shapes must be compatible. However, they need only match in one dimension: the number of columns in the first array must equal the number of rows in the second. Here is the general pattern that describes how a dot product works with two-dimensional arrays:

    (A, B) * (B, C) => (A, C)
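The shape pattern can be checked directly. The following sketch uses arbitrarily chosen sizes (4, 3, and 5) and the illustrative names left and right; it shows the resulting shape and what happens when the inner dimensions do not match.

```python
import numpy as np

left = np.ones((4, 3))    # shape (4, 3)
right = np.ones((3, 5))   # shape (3, 5)

# Inner dimensions match (3 and 3), so the result has shape (4, 5).
product = np.dot(left, right)
print(product.shape)

# Two (4, 3) arrays do not fit the pattern: 3 columns versus 4 rows.
try:
    np.dot(left, left)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```

Note that two arrays of the same shape, which is what ordinary element-by-element multiplication calls for, generally fail the dot-product rule unless they are square.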

Consider the following 2 × 3 array, combined with a 3 × 2 array, whose dot product is a 2 × 2 array.

    A = np.arange(6).reshape(2,3)
    B = np.arange(6).reshape(3,2)
    C = np.dot(A, B)
    print(A, B, sep='\n\n')
    print('\nDot product:\n', C)

    [[0 1 2]
     [3 4 5]]

    [[0 1]
     [2 3]
     [4 5]]

    Dot product:
     [[10 13]
     [28 40]]

Here’s the procedure.

Multiply each item in the first row of A by each corresponding item in the first column of B. Get the sum (10). This becomes C[0,0].

Multiply each item in the first row of A by each corresponding item in the second column of B. Get the sum (13). This becomes C[0,1].

Multiply each item in the second row of A by each corresponding item in the first column of B. Get the sum (28). This becomes C[1,0].

Multiply each item in the second row of A by each corresponding item in the second column of B. Get the sum (40). This becomes C[1,1].

You can also take the dot product of a one-dimensional array combined with a two-dimensional array. In that case, the array shapes are evaluated as if they had the following shapes:

    (1, X) * (X, Y) => (1, Y)
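The row-by-column procedure described above can be spelled out with explicit loops. This is a sketch for illustration only, not how numpy implements dot internally; the function name dot_2d is hypothetical.

```python
import numpy as np

def dot_2d(a, b):
    """Dot product of a (rows, inner) array and an (inner, cols) array.

    Hypothetical helper that uses explicit loops to mirror the procedure.
    """
    rows, inner = a.shape
    inner_b, cols = b.shape
    if inner != inner_b:
        raise ValueError("columns of a must equal rows of b")
    result = np.zeros((rows, cols), dtype=np.result_type(a, b))
    for i in range(rows):
        for j in range(cols):
            # Row i of a times column j of b, element by element, then summed.
            result[i, j] = sum(a[i, k] * b[k, j] for k in range(inner))
    return result

A = np.arange(6).reshape(2, 3)
B = np.arange(6).reshape(3, 2)
print(dot_2d(A, B))
print(np.array_equal(dot_2d(A, B), np.dot(A, B)))
```

A one-dimensional array can be passed through the same loops by first reshaping it into a single row, for instance with reshape(1, -1); numpy's own dot function handles that case directly.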

For example, you could take the dot product of [10, 15, 30] and the following array, which we’ll call B:

    [[0 1]
     [2 3]
     [4 5]]

This next statement shows the dot product between a one-dimensional array and the two-dimensional array, B. The first array is treated as if it had a shape of (1, 3), but the result that numpy returns is itself one-dimensional, with two elements:

    >>> print(np.dot([10, 15, 30], B))
    [150 205]

Can we come up with intuitive, real-world examples that show the usefulness of a dot product? They abound in certain kinds of math and physics, such as three-dimensional geometry. But there are simpler applications. Let’s say you own a pet shop that sells three kinds of exotic birds. Table 13.5 shows the prices.

Table 13.5. Prices for Birds in Pet Shop

    Parrots   Macaws   Peacocks
    $10       $15      $30

Let’s further suppose that you have tracked sales figures for two months, as shown in Table 13.6.

Table 13.6. Monthly Sales Figures for Birds

    Birds      October sales   November sales
    Parrots    0               1
    Macaws     2               3
    Peacocks   4               5

What you’d like to do is get the total bird sales for each of these two months. Although it’s not difficult to pick out the data, it’s easier to take the dot product and let the function np.dot do all the math for you.

Figure 13.18 shows how to obtain the first element in the result: 150. Multiply each of the prices by the corresponding sales figure for the first month, and then total.

Figure 13.18. How a dot product is calculated, part I

Following this procedure, you get 0 + 30 + 120, totaling 150.

You can obtain the other figure, 205, in the same way (see Figure 13.19). Multiply each of the prices by the corresponding sales figure in the second month, and then total. Following that procedure, you get 10 + 45 + 150, totaling 205.

Figure 13.19. How a dot product is calculated, part II

The full dot product is therefore

    [150, 205]

In this case, the dot product gives the total sales figures for all birds, in each of the two months tracked (October and November).

13.9.2 The Outer-Product Function

Another way of multiplying arrays is to use the outer function to calculate the outer product. This is most often used between two one-dimensional arrays to produce a two-dimensional array. If this function is used on higher-dimensional arrays, the input in each of those arrays is flattened into one dimension.

    numpy.outer(A, B, out=None)

The action of the function is to calculate the outer product of arrays A and B and return it. The out argument, if included, specifies a destination array in which to place the results. It must already exist and be of the proper size.

To obtain the outer product, multiply each element of A by each element of B, in turn, to produce a two-dimensional array. In terms of shape, here’s how we’d express the relationship:

    (A) * (B) => (A, B)

Simply put, the outer product contains every combination of A * B, so that if C is the result, then C[x, y] contains A[x] multiplied by B[y].

Here’s a relatively simple example:

    >>> import numpy as np
    >>> A = np.array([0, 1, 2])
    >>> B = np.array([100, 200, 300, 400])
    >>> print(np.outer(A, B))
    [[  0   0   0   0]
     [100 200 300 400]
     [200 400 600 800]]

In this example, the first element of A is multiplied by each element of B to produce the first row of the result; that’s why every number in that row is 0, because 0 multiplied by any value is 0. The second element of A (which is 1) is multiplied by each element of B to produce the second row, and so on for the third row.

One obvious use for the outer product is a problem we solved in Chapter 11, “The Random and Math Packages”: how to create a multiplication table. The numpy package supports an even simpler solution, and one that is faster in any case.

    >>> A = np.arange(1,10)
    >>> print(np.outer(A, A))

Wow, that’s pretty simple code! The result is

    [[ 1  2  3  4  5  6  7  8  9]
     [ 2  4  6  8 10 12 14 16 18]
     [ 3  6  9 12 15 18 21 24 27]
     [ 4  8 12 16 20 24 28 32 36]
     [ 5 10 15 20 25 30 35 40 45]
     [ 6 12 18 24 30 36 42 48 54]
     [ 7 14 21 28 35 42 49 56 63]
     [ 8 16 24 32 40 48 56 64 72]
     [ 9 18 27 36 45 54 63 72 81]]

As in Chapter 11, we can use some string operations to clean up the result, eliminating the square brackets.

    s = str(np.outer(A, A))
    s = s.replace('[', '')
    s = s.replace(']', '')
    print(' ' + s)

We can, if we choose, combine these four statements into two, more compact, statements.

    s = str(np.outer(A, A))
    print(' ' + s.replace('[', '').replace(']', ''))

Finally, this produces

      1  2  3  4  5  6  7  8  9
      2  4  6  8 10 12 14 16 18
      3  6  9 12 15 18 21 24 27
      4  8 12 16 20 24 28 32 36
      5 10 15 20 25 30 35 40 45
      6 12 18 24 30 36 42 48 54
      7 14 21 28 35 42 49 56 63
      8 16 24 32 40 48 56 64 72
      9 18 27 36 45 54 63 72 81

13.9.3 Other Linear Algebra Functions

In addition to dot product and outer product, numpy provides other linear-algebra functions. Remember, they require no separate “matrix” type. The standard numpy array type, ndarray, can be used with any of these functions.

But the list of the functions related to linear algebra is a long one and requires an entire book of its own. For more explanation, see the official online documentation at

    https://docs.scipy.org/doc/numpy/reference/routines.linalg.html

Table 13.7 summarizes some of the more common linear-algebra and higher-math functions supported by numpy.

Table 13.7. Common Linear-Algebra Functions

    Syntax                       Description
    np.dot(A, B [,out])          Compute the dot product between A and B.
    np.vdot(A, B)                Compute the vector dot product.
    np.outer(A, B [,out])        Compute the outer product formed by multiplying each
                                 element in A by each element in B. Flatten A and B
                                 into one-dimensional inputs as needed.
    np.inner(A, B)               Compute the inner product of A and B.
    np.tensordot(A, B [,axes])   Compute the tensor dot product of A and B.
    np.kron(A, B)                Compute the Kronecker product of A and B.
    np.linalg.det(A)             Compute the linear-algebra determinant of A.

13.10 THREE-DIMENSIONAL PLOTTING

The subject of three-dimensional plotting is advanced, and to fully present the subject here would take a long book by itself! However, by looking at the plotting of the surface of a sphere, you can get an idea of how numpy functions you’ve already seen help you create three-dimensional surfaces.

The following example requires the importing of packages you’re already familiar with, but it also includes the mpl_toolkits package. Fortunately, if you’ve downloaded the other packages, this package has been downloaded for you, so all you have to do is import it.

    from mpl_toolkits.mplot3d import Axes3D
    import matplotlib.pyplot as plt
    import numpy as np

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    # Make data
    ua = np.linspace(0, 2 * np.pi, 100)
    va = np.linspace(0, np.pi, 100)
    X = 10 * np.outer(np.cos(ua), np.sin(va))
    Y = 10 * np.outer(np.sin(ua), np.sin(va))
    Z = 10 * np.outer(np.ones(np.size(ua)), np.cos(va))

    # Plot the surface
    ax.plot_surface(X, Y, Z, color='w')
    plt.show()

Most of the calculation here involves getting the sine and cosine of angles as they run from 0 to 2 * np.pi (for ua) and from 0 to np.pi (for va), and then multiplying the results by taking outer products. Finally, a set of three-dimensional points is described by the three arrays X, Y, and Z, and the software graphs the surface of the sphere from that. Figure 13.20 shows the resulting graph.
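One way to convince yourself that the three outer products really describe a sphere is to check that every generated point lies at the same distance from the origin. The following sketch repeats the data-generation step without plotting; the names radius and distance are introduced here for illustration only.

```python
import numpy as np

radius = 10
ua = np.linspace(0, 2 * np.pi, 100)   # angle around the vertical axis
va = np.linspace(0, np.pi, 100)       # angle measured down from the top

X = radius * np.outer(np.cos(ua), np.sin(va))
Y = radius * np.outer(np.sin(ua), np.sin(va))
Z = radius * np.outer(np.ones(np.size(ua)), np.cos(va))

# Each array holds one coordinate for a 100 x 100 grid of points.
print(X.shape)

# Every point should sit at (approximately) the same distance from the origin.
distance = np.sqrt(X**2 + Y**2 + Z**2)
print(np.allclose(distance, radius))
```

The check works because the squared sine and cosine terms sum to 1 for every pair of angles, leaving only the radius.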

Figure 13.20. Three-dimensional projection of a sphere

13.11 “NUMPY” FINANCIAL APPLICATIONS

The numpy package’s powerful range of functions extends to the area of finance. For example, given data about interest rates and payment schedules, you can use the pmt function to determine your monthly payment for a house or car.
