
CU-MCA-Python Programming- Second Draft-converted

Published by Teamlease Edtech Ltd (Amita Chitroda), 2021-05-02 15:59:34


5. A random variable that assumes a finite or a countably infinite number of values is called ___________
a. Continuous random variable
b. Discrete random variable
c. Irregular random variable
d. Uncertain random variable

6. In a discrete probability distribution, the sum of all probabilities is always?
a. 0
b. Infinite
c. 1
d. Undefined

7. The covariance of two independent random variables is ___________
a. 1
b. 0
c. -1
d. Undefined

8. What would be the probability of an event ‘G’ if H denotes its complement, according to the axioms of probability?
a. P(G) = 1 / P(H)
b. P(G) = 1 - P(H)
c. P(G) = 1 + P(H)
d. P(G) = P(H)

9. The expected value of a discrete random variable ‘x’ is given by ___________
a. P(x)
b. ∑ P(x)
c. ∑ x P(x)
d. 1

Answers
1-a, 2-a, 3-b, 4-a, 5-b, 6-c, 7-b, 8-b, 9-c

10.10 REFERENCES
Text Books:
• Allen B. Downey, “Think Python: How to Think Like a Computer Scientist”, 2nd edition, Updated for Python 3, Shroff/O'Reilly Publishers, 2016.
• Michael Urban, Joel Murach, Mike Murach, “Murach's Python Programming”, Dec. 2016.
Reference Books:
• Guido van Rossum and Fred L. Drake Jr, “An Introduction to Python”, Revised and updated for Python 3.2.
• Jake VanderPlas, “Python Data Science Handbook”, O'Reilly Publishers, 2016.

CU IDOL SELF LEARNING MATERIAL (SLM)

UNIT - 11: QUANTITATIVE EXPLORATORY DATA ANALYSIS (EDA)

Structure
11.0. Learning Objectives
11.1. Introduction
11.2. Summary of Categorical Data
11.3. Summary of Continuous Data
11.4. Summary
11.5. Keywords
11.6. Learning Activity
11.7. Unit End Questions
11.8. References

11.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
• Describe summarization in EDA
• Summarize continuous and categorical data

11.1 INTRODUCTION
Exploratory data analysis is one of the best practices used in data science today. People starting a career in data science often do not know the difference between data analysis and exploratory data analysis. The difference between the two is not very big, but they serve different purposes.

A summary analysis is simply a numeric reduction of a historical data set. It is quite passive, and its focus is on the past. Quite commonly, its purpose is simply to arrive at a few key statistics (for example, mean and standard deviation), which may then either replace the data set or be added to it in the form of a summary table.

11.2 SUMMARY OF CATEGORICAL DATA
Categorical variables do not admit any mathematical operations. We cannot sum them, or even sort them. We can only count them. As such, summaries of categorical variables always start with counting the frequency of each category. The examples in this section are written in R.

Summary of Univariate Categorical Data

    # Make some data
    gender <- c(rep('Boy', 10), rep('Girl', 12))
    drink <- c(rep('Coke', 5), rep('Sprite', 3), rep('Coffee', 6), rep('Tea', 7), rep('Water', 1))
    # age is sampled at random, so the age-dependent counts below vary between runs
    age <- sample(c('Young', 'Old'), size = length(gender), replace = TRUE)

    # Count frequencies
    table(gender)
    ## gender
    ##  Boy Girl
    ##   10   12
    table(drink)
    ## drink
    ## Coffee   Coke Sprite    Tea  Water
    ##      6      5      3      7      1

If instead of the level counts you want the proportions, you can use prop.table:

    prop.table(table(gender))
    ## gender
    ##       Boy      Girl
    ## 0.4545455 0.5454545

Summary of Bivariate Categorical Data

    library(magrittr)
    cbind(gender, drink) %>% head # bind vectors into matrix and inspect
    ##      gender drink
    ## [1,] "Boy"  "Coke"
    ## [2,] "Boy"  "Coke"
    ## [3,] "Boy"  "Coke"
    ## [4,] "Boy"  "Coke"
    ## [5,] "Boy"  "Coke"
    ## [6,] "Boy"  "Sprite"

    table1 <- table(gender, drink) # count frequencies of bivariate combinations
    table1
    ##       drink
    ## gender Coffee Coke Sprite Tea Water
    ##   Boy       2    5      3   0     0
    ##   Girl      4    0      0   7     1

Summary of Multivariate Categorical Data

    table2.1 <- table(gender, drink, age) # A machine readable table.
    table2.1
    ## , , age = Old
    ##
    ##       drink
    ## gender Coffee Coke Sprite Tea Water
    ##   Boy       1    2      1   0     0
    ##   Girl      2    0      0   3     1
    ##
    ## , , age = Young
    ##
    ##       drink
    ## gender Coffee Coke Sprite Tea Water
    ##   Boy       1    3      2   0     0
    ##   Girl      2    0      0   4     0

    table.2.2 <- ftable(gender, drink, age) # A human readable table.
    table.2.2
    ##               age Old Young
    ## gender drink
    ## Boy    Coffee       1     1
    ##        Coke         2     3
    ##        Sprite       1     2
    ##        Tea          0     0
    ##        Water        0     0
    ## Girl   Coffee       2     2
    ##        Coke         0     0
    ##        Sprite       0     0
    ##        Tea          3     4
    ##        Water        1     0

If you want proportions instead of counts, you need to specify the denominator, i.e., the margins. Think: what is the margin in each of the following outputs?

    prop.table(table1, margin = 1)
    ##       drink
    ## gender     Coffee       Coke     Sprite        Tea      Water
    ##   Boy  0.20000000 0.50000000 0.30000000 0.00000000 0.00000000
    ##   Girl 0.33333333 0.00000000 0.00000000 0.58333333 0.08333333

    prop.table(table1, margin = 2)
    ##       drink
    ## gender    Coffee      Coke    Sprite       Tea     Water
    ##   Boy  0.3333333 1.0000000 1.0000000 0.0000000 0.0000000
    ##   Girl 0.6666667 0.0000000 0.0000000 1.0000000 1.0000000

11.3 SUMMARY OF CONTINUOUS DATA
Continuous variables admit many more operations than categorical ones. We can compute sums, means, quantiles, and more.

Summary of Univariate Continuous Data

We distinguish between several types of summaries, each capturing a different property of the data.

Summary of Location
The mean, or average, of a sample $x := (x_1, \ldots, x_n)$, denoted $\bar{x}$, is defined as
$\bar{x} := n^{-1} \sum_{i=1}^{n} x_i$.
The sample mean is non-robust. A single large observation may inflate the mean indefinitely. For this reason, we define several other summaries of location, which are more robust, i.e., less affected by “contaminations” of the data.

The $\alpha$ quantile of a sample $x$, denoted $x_\alpha$, is (non-uniquely) defined as a value above $100\alpha\%$ of the sample, and below $100(1-\alpha)\%$. We emphasize that sample quantiles are non-uniquely defined. See ?quantile for the 9(!) different definitions that R provides.

The $\alpha$ trimmed mean of a sample $x$, denoted $\bar{x}_\alpha$, is the average of the sample after removing the $\alpha$ proportion of largest and the $\alpha$ proportion of smallest observations. The simple mean and median are instances of the $\alpha$ trimmed mean: $\bar{x}_0$ and $\bar{x}_{0.5}$ respectively.

Summary of Scale
The scale of the data, sometimes known as spread, can be thought of as its variability.

The standard deviation of a sample $x$, denoted $S(x)$, is defined as
$S(x) := \sqrt{(n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$.
For reasons of robustness, we define other, more robust, measures of scale.

The Median Absolute Deviation from the median, denoted $MAD(x)$, is defined as
$MAD(x) := c \, |x - x_{0.5}|_{0.5}$,
that is, $c$ times the median of the absolute deviations from the median, where $c$ is some constant, typically set to $c = 1.4826$ so that $MAD(x)$ and $S(x)$ have the same large sample limit.

The Interquartile Range of a sample $x$, denoted $IQR(x)$, is defined as
$IQR(x) := x_{0.75} - x_{0.25}$.
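The summaries of location and scale above can be sketched in a few lines of Python (the unit's own listings are in R; Python is used here only for illustration). The function names and the sample values are hypothetical, and `quantile` implements just one of the many possible sample-quantile definitions (linear interpolation between order statistics), so other tools may return slightly different values.

```python
import statistics


def quantile(sample, alpha):
    """Alpha quantile by linear interpolation between order statistics.

    Assumption: this is only one of several valid quantile definitions.
    """
    ordered = sorted(sample)
    position = alpha * (len(ordered) - 1)   # 0-based fractional position
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def trimmed_mean(sample, alpha):
    """Mean after dropping the alpha proportion of smallest and largest values.

    Restricted to 0 <= alpha < 0.5 so that at least one value is kept.
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    ordered = sorted(sample)
    k = int(alpha * len(ordered))           # number of values cut at each end
    return statistics.mean(ordered[k:len(ordered) - k])


def mad(sample, c=1.4826):
    """c times the median of absolute deviations from the median."""
    center = statistics.median(sample)
    return c * statistics.median([abs(x - center) for x in sample])


def iqr(sample):
    """Difference between the 0.75 and 0.25 quantiles."""
    return quantile(sample, 0.75) - quantile(sample, 0.25)


# Hypothetical sample with one very large observation.
sample = [2, 4, 4, 5, 6, 7, 9, 100]

print("mean         ", statistics.mean(sample))    # pulled up by the value 100
print("median       ", statistics.median(sample))
print("trimmed mean ", trimmed_mean(sample, 0.125))
print("std deviation", statistics.stdev(sample))   # uses the n - 1 denominator
print("MAD          ", mad(sample))
print("IQR          ", iqr(sample))
```

Comparing the mean with the median and trimmed mean, and the standard deviation with the MAD and IQR, shows how strongly a single extreme value affects the non-robust summaries.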

Summary of Asymmetry
Summaries of asymmetry, also known as skewness, quantify the departure of $x$ from a symmetric sample.

The Yule measure of asymmetry, denoted $Yule(x)$, is defined as
$Yule(x) := \frac{\frac{1}{2}(x_{0.75} + x_{0.25}) - x_{0.5}}{\frac{1}{2} IQR(x)}$.

Summary of Bivariate Continuous Data
When dealing with bivariate, or multivariate, data, we can obviously compute univariate summaries for each variable separately. This is not the topic of this section, in which we want to summarize the association between the variables, and not within them.

The covariance between two samples, $x$ and $y$, of the same length $n$, is defined as
$Cov(x, y) := (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.
We emphasize this is not the covariance you learned about in probability classes, since it is not the covariance between two random variables but rather between two samples. For this reason, some authors call it the empirical covariance, or sample covariance.

Pearson’s correlation coefficient, a.k.a. Pearson’s product-moment correlation, or simply the correlation, denoted $r(x, y)$, is defined as
$r(x, y) := \frac{Cov(x, y)}{S(x) S(y)}$.
If you find this definition enigmatic, just think of the correlation as the covariance between $x$ and $y$ after transforming each to the unitless scale of z-scores.

The z-scores of a sample $x$ are defined as the mean-centered, scale-normalized observations:
$z_i(x) := \frac{x_i - \bar{x}}{S(x)}$.
We thus have that $r(x, y) = Cov(z(x), z(y))$.

11.4 SUMMARY
• A summary analysis is simply a numeric reduction of a historical data set.
• Summaries of categorical variables always start with counting the frequency of each category.
• Continuous variables admit many more operations than categorical ones.

• The sample mean is non-robust. A single large observation may inflate the mean indefinitely.
• The scale of the data, sometimes known as spread, can be thought of as its variability.
• Summaries of asymmetry, also known as skewness, quantify the departure of x from a symmetric sample.
• The z-scores of a sample x are defined as the mean-centered, scale-normalized observations.

11.5 KEYWORDS
• EDA - Exploratory Data Analysis
• Categorical Data - data that take on only a limited, and usually fixed, number of possible values
• Univariate - type of data that contains only one attribute or characteristic
• Continuous Data - numeric values that can be meaningfully subdivided into finer and finer increments
• Bivariate analysis - finding the relationship between two variables

11.6 LEARNING ACTIVITY
1. Mean and median cannot be used to summarize all kinds of data. Comment.
2. Suppose you want to measure how strongly two variables are related. Can Pearson's correlation be used? Comment.

11.7 UNIT END QUESTIONS
A. Descriptive Questions
Short Questions
1. What is the need for summary in EDA?
2. Discuss mean and median.
3. What is harmonic mean?

4. How is a summary of univariate continuous data performed?
5. Compare univariate and bivariate data.

Long Questions
1. Illustrate the concepts of summary in EDA.
2. Describe how a summary of continuous data is done.
3. Illustrate various strategies of summarizing categorical data.
4. Discuss the parameters used in summarizing categorical data.

B. Multiple Choice Questions
1. What is exploratory data analysis?
a. A rigid framework by which we analyze data
b. An initial way by which we can get a feel for data
c. A type of purely quantitative method of data analysis
d. A set of scientific principles for analyzing data in a categorical manner

2. Most often, EDA relies on _____.
a. visual techniques
b. assumptions
c. fixed models
d. testing for statistical significance

3. Which of the following is a principle of analytic graphics?
a. Don't plot more than two variables at a time
b. Only do what your tools allow you to do
c. Show box plots (univariate summaries)
d. Integrate multiple modes of evidence

4. What is the role of exploratory graphs in data analysis?
a. They are typically made very quickly.
b. They are made for formal presentations.
c. Only a few are constructed.
d. Axes, legends, and other details are clean and exactly detailed.

5. Which of the following is true about the base plotting system?
a. The system is most useful for conditioning plots.
b. Plots are created and annotated with separate functions.
c. Plots are typically created with a single function call.
d. Margins and spacings are adjusted automatically depending on the type of plot and the data.

Answers
1-b, 2-a, 3-d, 4-a, 5-b

11.8 REFERENCES
Text Books:
• Allen B. Downey, “Think Python: How to Think Like a Computer Scientist”, 2nd edition, Updated for Python 3, Shroff/O'Reilly Publishers, 2016.
• Michael Urban, Joel Murach, Mike Murach, “Murach's Python Programming”, Dec. 2016.
Reference Books:
• Guido van Rossum and Fred L. Drake Jr, “An Introduction to Python”, Revised and updated for Python 3.2.
• Jake VanderPlas, “Python Data Science Handbook”, O'Reilly Publishers, 2016.

UNIT - 12: STATISTICS 1

Structure
12.0. Learning Objectives
12.1. Introduction
12.2. Mean
12.3. Median
12.4. Percentile
12.5. Quartiles
12.6. Outliers
12.7. Box Plot
12.8. Summary
12.9. Keywords
12.10. Learning Activity
12.11. Unit End Questions
12.12. References

12.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
• Learn the basics of statistics
• Use percentiles and quartiles for data visualization
• Perform outlier detection for real-time scenarios

12.1 INTRODUCTION
Probability is the study of random events. Most people have an intuitive understanding of degrees of probability, which is why you can use words like “probably” and “unlikely” without special training, but we will talk about how to make quantitative claims about those degrees.

Statistics is the discipline of using data samples to support claims about populations. Most statistical analysis is based on probability, which is why these pieces are usually presented together.

Computation is a tool that is well-suited to quantitative analysis, and computers are commonly used to process statistics. Also, computational experiments are useful for exploring concepts in probability and statistics.

12.2 MEAN
The mean of a set of numbers, sometimes simply called the average, is the sum of the data divided by the number of data points.

The most popular and widely used way of representing an entire data set by one value is what most laymen call an 'average' and what statisticians call the arithmetic mean. Its value is obtained by adding together all the items and dividing this total by the number of items. The arithmetic mean may be either:
• Simple arithmetic mean, or
• Weighted arithmetic mean

Merits and Limitations of Arithmetic Mean:
Merits: The arithmetic mean is the most widely used average in practice for the following reasons:
• It is the simplest average to understand and the easiest to compute. Neither the arraying of data required for calculating the median nor the grouping of data required for calculating the mode is needed while calculating the mean.
• It is affected by the value of every item in the series.
• It is defined by a rigid mathematical formula, with the result that everyone who computes the average gets the same answer.
• Being determined by a rigid formula, it lends itself to subsequent algebraic treatment better than the median or mode.
• It is relatively reliable in the sense that it does not vary too much when repeated samples are taken from one and the same population, at least not as much as some other kinds of statistical description. The mean is typical in the sense that it is the center of gravity, balancing the values on either side of it.
• It is a calculated value, and not based on position in the series.

Limitations: Since the value of the mean depends upon each and every item of the series, extreme items, i.e., very small and very large items, unduly affect the value of the average. For example, if in a tutorial group there are 4 students and their marks in a test are 60, 70, 10 and 80, the average marks would be
$\frac{60 + 70 + 10 + 80}{4} = \frac{220}{4} = 55$.
One single item, i.e., 10, has reduced the average marks considerably. The smaller the number of observations, the greater the impact of an extreme value is likely to be.

It is important to understand the following:
• In a distribution with open-end classes, the value of the mean cannot be computed without making assumptions regarding the size of the class interval of the open-end classes. If such classes contain a large proportion of the values, then the mean may be subject to substantial error. However, the values of the median and mode can be computed where there are open-end classes without making any assumptions about the size of the class interval.
• The arithmetic mean is not always a good measure of central tendency. The mean provides a "characteristic" value, in the sense of indicating where most of the values lie, only when the distribution of the variable is reasonably normal (bell-shaped). In the case of a V-shaped distribution the mean is not likely to serve a useful purpose.

12.3 MEDIAN
The median by definition refers to the middle value in a distribution. One-half of the items in the distribution have a value the size of the median value or smaller, and one-half have a value the size of the median value or larger. The median is just the 50th percentile value, below which 50 per cent of the values in the sample fall. It splits the observations into two halves.

As distinct from the arithmetic mean, which is calculated from the value of every item in the series, the median is what is called a positional average. The term 'position' refers to the place of a value in a series. The place of the median in a series is such that an equal number of items lie on either side of it.

Example: If the incomes of five employees are Rs. 5,900, 6,950, 7,020, 7,200 and 8,280, the median would be 7,020.

5,900
6,950
7,020 ← value at middle position of the array
7,200
8,280

For the above example the calculation of the median was simple because of the odd number of observations. When an even number of observations are listed, there is no single middle position value and the median is taken to be the arithmetic mean of the two middlemost items. For example, if in the above case we are given the incomes of six employees as 5,900, 6,950, 7,020, 7,200, 8,280 and 9,300, there are two middle position values, 7,020 and 7,200, and the median income would be:
$\text{Median} = \frac{7{,}020 + 7{,}200}{2} = \frac{14{,}220}{2} = \text{Rs. } 7{,}110$.
Hence, in the case of an even number of observations the median may be found by averaging the two middle position values. Thus, when N is odd the median is an actual value, with the remainder of the series in two equal parts on either side of it. If N is even the median is a derived figure, i.e., half the sum of the two middle values.

Calculation of Median - Individual Observations:
The steps involved are:
• Arrange the data in ascending or descending order of magnitude. (Both arrangements give the same answer.)
• In a group composed of an odd number of values, such as 7, add 1 to the total number of values and divide by 2. Thus, 7 + 1 would be 8, which divided by 2 gives 4; the 4th value counting from either end of the numerically arranged group is the median value. In a large group the same method may be followed. In a group of 199 items the middle value would be the 100th value, determined by (199 + 1)/2 = 100. In the form of a formula:
$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th item}$.

Example 1
From the following data of the wages of 7 workers, compute the median wage:
Wages (in Rs.): 4100, 4150, 6080, 7120, 5200, 6160, 7400

Solution: CALCULATION OF MEDIAN

S. No.   Wages arranged in ascending order
1        4100
2        4150
3        5200
4        6080
5        6160
6        7120
7        7400

$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th item} = \frac{7+1}{2} = \text{4th item}$.
Size of 4th item = 6080. Hence the median wage = Rs. 6080.

We thus find that the median is the middle-most item: 3 persons get a wage less than Rs. 6080 and an equal number, i.e., 3, get more than Rs. 6080.

The procedure for determining the median of an even-numbered group of items is not as obvious as above. If there were, for instance, ten different values in a group, the median is really not determinable since both the 5th and 6th values are in the centre. In practice the median value for a group composed of an even number of items is estimated by finding the arithmetic mean of the two middle values, that is, adding the two values in the middle and dividing by two. Expressed in the form of a formula, it amounts to:
$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th item}$.
For N = 10 this gives the 5.5th item, i.e., the mean of the 5th and 6th values. Thus, whether N is odd or even, 1 (one) has to be added to N to determine the position of the median.

12.4 PERCENTILE
A percentile is a comparison score between a particular score and the scores of the rest of a group. It shows the percentage of scores that a particular score surpassed. For example, if you score 75 points on a test, and are ranked in the 85th percentile, it means that the score 75 is higher than 85% of the scores.

The percentile rank is calculated using the formula
$R = \frac{P}{100}(N)$,
where P is the desired percentile and N is the number of data points.

Example 1: If the scores of a set of students in a math test are 20, 30, 15 and 75, what is the percentile rank of the score 30?
Arrange the numbers in ascending order and assign ranks from 1 for the lowest to 4 for the highest.

Number   Rank
15       1
20       2
30       3
75       4

Use the formula:
$3 = \frac{P}{100}(4)$, so $3 = \frac{P}{25}$ and $P = 75$.
Therefore, the score 30 is at the 75th percentile.

Note that, if the percentile rank R is an integer, the Pth percentile is the score with rank R when the data points are arranged in ascending order. If R is not an integer, then the Pth percentile is calculated as follows. Let I be the integer part and D be the decimal part of R. Find the scores with the ranks I and I+1. Multiply the difference of these scores by D. The Pth percentile is the sum of this product and the score with rank I.

Example 2: Determine the 35th percentile of the scores 7, 3, 12, 15, 14, 4 and 20.
Arrange the numbers in ascending order and assign ranks from 1 for the lowest to 7 for the highest.

Number   Rank
3        1
4        2
7        3
12       4
14       5
15       6
20       7

Use the formula:
$R = \frac{35}{100}(7) = 2.45$.
The integer part of R is 2, so find the scores corresponding to the ranks 2 and 3. They are 4 and 7. The product of their difference and the decimal part is 0.45(7 − 4) = 1.35. Therefore, the 35th percentile is the score with rank 2 plus this product: 4 + 1.35 = 5.35.

12.5 QUARTILES
A quartile is a percentile measure that divides the total of 100% into four equal parts: 25%, 50%, 75% and 100%. A particular quartile is the border between two neighboring quarters of the distribution.
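As a recap of Sections 12.2 to 12.4, the mean, the median and the rank-based percentile rule can be sketched in plain Python. The function names are illustrative choices, not part of the text, and `percentile` follows the rank formula R = (P/100)N described above rather than the definition used by any particular library; clamping ranks below 1 or above N to the extreme scores is an added assumption.

```python
def mean(values):
    """Arithmetic mean: sum of the items divided by the number of items."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def percentile(values, p):
    """Score at the p-th percentile using the rank R = (p / 100) * N."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p * n / 100
    whole = int(rank)            # integer part I of R
    fraction = rank - whole      # decimal part D of R
    if whole < 1:                # assumption: clamp to the lowest score
        return ordered[0]
    if whole >= n:               # assumption: clamp to the highest score
        return ordered[-1]
    low = ordered[whole - 1]     # score with rank I (ranks start at 1)
    high = ordered[whole]        # score with rank I + 1
    return low + fraction * (high - low)


marks = [60, 70, 10, 80]
print(mean(marks))                                    # the single low mark pulls the mean down
print(median([5900, 6950, 7020, 7200, 8280]))         # odd N: an actual value
print(median([5900, 6950, 7020, 7200, 8280, 9300]))   # even N: a derived value
print(percentile([20, 30, 15, 75], 75))
print(percentile([7, 3, 12, 15, 14, 4, 20], 35))
```

Running the sketch on the worked examples is a quick way to check hand calculations such as the 35th percentile in Example 2.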

Figure 12.1 Quartile

Q1 (quartile 1) separates the bottom 25% of the ranked data from the top 75%. (Data is ranked when it is arranged in order.) Q2 (quartile 2) is the median. Q3 (quartile 3) separates the top 25% of the ranked data from the bottom 75%. More precisely, at least 25% of the data will be less than or equal to Q1 and at least 75% will be greater than or equal to Q1. At least 75% of the data will be less than or equal to Q3, while at least 25% of the data will be greater than or equal to Q3.

Example 1: Find the 1st quartile, median, and 3rd quartile of the following set of data.
24, 26, 29, 35, 48, 72, 150, 161, 181, 183, 183
There are 11 numbers in the data set, already arranged from least to greatest. The 6th number, 72, is the middle value. So, 72 is the median. Once we remove 72, the lower half of the data set is
24, 26, 29, 35, 48
Here, the middle number is 29. So, Q1 = 29. The top half of the data set is
150, 161, 181, 183, 183
Here, the middle number is 181. So, Q3 = 181.

The inter-quartile range, or IQR, is the distance between the first and third quartiles. It is sometimes called the H-spread and is a stable measure of dispersion. It is obtained by evaluating Q3 − Q1.

12.6 OUTLIER
An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations.
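The quartile calculation of Section 12.5 (take the median, then the medians of the lower and upper halves) can be sketched as follows. The function name `quartiles` is a hypothetical choice, and leaving the middle value out of both halves when N is odd mirrors Example 1; other conventions exist and can give slightly different quartiles.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd N the middle value belongs to neither half, as in Example 1.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


data = [24, 26, 29, 35, 48, 72, 150, 161, 181, 183, 183]
q1, q2, q3 = quartiles(data)
print("Q1 =", q1, "median =", q2, "Q3 =", q3)
print("IQR =", q3 - q1)
```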

Two activities are essential for characterizing a set of data:
• Examination of the overall shape of the graphed data for important features, including symmetry and departures from assumptions.
• Examination of the data for unusual observations that are far removed from the mass of data. These points are often referred to as outliers.

Two graphical techniques for identifying outliers are scatter plots and box plots.

Example
The data set of N = 90 ordered observations shown below is examined for outliers:
30, 171, 184, 201, 212, 250, 265, 270, 272, 289, 305, 306, 322, 322, 336, 346, 351, 370, 390, 404, 409, 411, 436, 437, 439, 441, 444, 448, 451, 453, 470, 480, 482, 487, 494, 495, 499, 503, 514, 521, 522, 527, 548, 550, 559, 560, 570, 572, 574, 578, 585, 592, 592, 607, 616, 618, 621, 629, 637, 638, 640, 656, 668, 707, 709, 719, 737, 739, 752, 758, 766, 792, 792, 794, 802, 818, 830, 832, 843, 858, 860, 869, 918, 925, 953, 991, 1000, 1005, 1068, 1441

The computations are as follows:
Median = (N+1)/2-th ordered point = 45.5th ordered point = the average of the 45th and 46th ordered points = (559 + 560)/2 = 559.5
Lower quartile = .25(N+1)th ordered point = 22.75th ordered point = 411 + .75(436 − 411) = 429.75
Upper quartile = .75(N+1)th ordered point = 68.25th ordered point = 739 + .25(752 − 739) = 742.25
Interquartile range = 742.25 − 429.75 = 312.5
Lower inner fence = 429.75 − 1.5(312.5) = −39.0
Upper inner fence = 742.25 + 1.5(312.5) = 1211.0
Lower outer fence = 429.75 − 3.0(312.5) = −507.75
Upper outer fence = 742.25 + 3.0(312.5) = 1679.75

From an examination of the fence points and the data, one point (1441) exceeds the upper inner fence and stands out as a mild outlier; there are no extreme outliers.
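The fence computation above can be reproduced with a short Python sketch. The helper names `ordered_point` and `fences` are hypothetical; the quartiles use the p(N+1)-th ordered point rule from the worked example, and the sketch assumes at least three observations so that both quartile positions fall inside the data.

```python
def ordered_point(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    fraction = position - whole
    low = ordered[whole - 1]
    if fraction == 0:
        return low
    return low + fraction * (ordered[whole] - low)


def fences(values):
    """Quartiles, inner/outer fences, and mild/extreme outliers."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered_point(ordered, 0.25 * (n + 1))
    q3 = ordered_point(ordered, 0.75 * (n + 1))
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    # Extreme outliers lie beyond the outer fences; mild outliers lie
    # beyond the inner fences but within the outer ones.
    extreme = [v for v in ordered if v < outer[0] or v > outer[1]]
    mild = [v for v in ordered
            if (v < inner[0] or v > inner[1]) and v not in extreme]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "inner": inner, "outer": outer,
            "mild": mild, "extreme": extreme}


data = [30, 171, 184, 201, 212, 250, 265, 270, 272, 289, 305, 306, 322, 322,
        336, 346, 351, 370, 390, 404, 409, 411, 436, 437, 439, 441, 444, 448,
        451, 453, 470, 480, 482, 487, 494, 495, 499, 503, 514, 521, 522, 527,
        548, 550, 559, 560, 570, 572, 574, 578, 585, 592, 592, 607, 616, 618,
        621, 629, 637, 638, 640, 656, 668, 707, 709, 719, 737, 739, 752, 758,
        766, 792, 792, 794, 802, 818, 830, 832, 843, 858, 860, 869, 918, 925,
        953, 991, 1000, 1005, 1068, 1441]

result = fences(data)
print("quartiles:", result["q1"], result["q3"])
print("inner fences:", result["inner"])
print("outer fences:", result["outer"])
print("mild outliers:", result["mild"])
print("extreme outliers:", result["extreme"])
```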

A histogram with an overlaid box plot is shown below.

Figure 12.2 Histogram with Overlaid Box Plot

The outlier is identified as the largest value in the data set, 1441, and appears as the circle to the right of the box plot.

Outliers should be investigated carefully. Often, they contain valuable information about the process under investigation or the data gathering and recording process. Before considering the possible elimination of these points from the data, one should try to understand why they appeared and whether it is likely that similar values will continue to appear. Of course, outliers are often bad data points.

12.7 BOX PLOT
In descriptive statistics, a box plot or boxplot (also known as a box and whisker plot) is a type of chart often used in exploratory data analysis. Box plots visually show the distribution of numerical data and its skewness by displaying the data quartiles (or percentiles) and averages.

Box plots show the five-number summary of a set of data: the minimum score, first (lower) quartile, median, third (upper) quartile, and maximum score.
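A five-number summary can be computed with Python's standard library. This is a minimal sketch: `five_number_summary` is a hypothetical helper name, and it relies on `statistics.quantiles` (available from Python 3.8) with `method="exclusive"`, which places the quartiles at the p(N+1)-th ordered point. Other quartile conventions may give slightly different values.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of the data."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return (min(values), q1, q2, q3, max(values))


# Data set from Example 1 of Section 12.5.
data = [24, 26, 29, 35, 48, 72, 150, 161, 181, 183, 183]
print(five_number_summary(data))
```

For this data set the library quartiles agree with the median-of-halves values found by hand in Section 12.5.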

Figure 12.3 Box Plot

Minimum Score
The lowest score, excluding outliers (shown at the end of the left whisker).

Lower Quartile
Twenty-five percent of scores fall below the lower quartile value (also known as the first quartile).

Median
The median marks the mid-point of the data and is shown by the line that divides the box into two parts (sometimes known as the second quartile). Half the scores are greater than or equal to this value and half are less.

Upper Quartile
Seventy-five percent of the scores fall below the upper quartile value (also known as the third quartile). Thus, 25% of the data are above this value.

Maximum Score
The highest score, excluding outliers (shown at the end of the right whisker).

Whiskers
The upper and lower whiskers represent scores outside the middle 50% (i.e., the lower 25% of scores and the upper 25% of scores).

The Interquartile Range (or IQR)
This is the box of the box plot, showing the middle 50% of scores (i.e., the range between the 25th and 75th percentiles).

Why are box plots useful?
Box plots divide the data into sections that each contain approximately 25% of the data in that set.

Figure 12.4: Sample Box Plot

Box plots are useful as they provide a visual summary of the data, enabling researchers to quickly identify median values, the dispersion of the data set, and signs of skewness.

Note that the image above represents data with a perfectly symmetric distribution; most box plots will not conform to this symmetry (where each quartile is the same length).

Box plots are useful as they show the central value of a data set
The median is the middle value of a set of data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value and half are less.

Box plots are useful as they show the skewness of a data set
The shape of the box plot shows whether a statistical data set is roughly symmetric or skewed.

Figure 12.5: Distribution in Box Plot

When the median is in the middle of the box, and the whiskers are about the same on both sides of the box, then the distribution is symmetric.

When the median is closer to the bottom of the box, and the whisker is shorter on the lower end of the box, then the distribution is positively skewed (skewed right).

When the median is closer to the top of the box, and the whisker is shorter on the upper end of the box, then the distribution is negatively skewed (skewed left).

Box plots are useful as they show the dispersion of a data set
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. The smallest value and largest value are found at the ends of the ‘whiskers’ and are useful for providing a visual indicator of the spread of scores (e.g., the range).

Figure 12.6: IQR in Box Plot

The interquartile range (IQR) is the box of the box plot, showing the middle 50% of scores. It is calculated by subtracting the lower quartile from the upper quartile (i.e., Q3 − Q1).

Box plots are useful as they show outliers within a data set
An outlier is an observation that is numerically distant from the rest of the data. When reviewing a box plot, an outlier is defined as a data point that is located outside the whiskers of the box plot.

How to compare box plots
Box plots are a useful way to visualize differences among different samples or groups. They manage to provide a lot of statistical information, including medians, ranges, and outliers.

Note that although box plots have been presented horizontally in this article, it is more common to view them vertically in research papers.

Step 1: Compare the medians of box plots
Compare the respective medians of each box plot. If the median line of a box plot lies outside of the box of a comparison box plot, then there is likely to be a difference between the two groups.

Figure 12.7: Comparing Box Plot

Step 2: Compare the interquartile ranges and whiskers of box plots
Compare the interquartile ranges (that is, the box lengths) to examine how the data is dispersed in each sample. The longer the box, the more dispersed the data. The shorter the box, the less dispersed the data.

Figure 12.8: Comparing interquartile range

Next, look at the overall spread as shown by the extreme values at the ends of the two whiskers. This shows the range of scores (another type of dispersion). Larger ranges indicate a wider distribution, that is, more scattered data.

Step 3: Look for potential outliers (see above image)
When reviewing a box plot, an outlier is defined as a data point that is located outside the whiskers of the box plot.

Step 4: Look for signs of skewness
If the data do not appear to be symmetric, does each sample show the same kind of asymmetry?
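The four comparison steps can also be carried out numerically. The sketch below is illustrative only: the two groups are made-up data, the helper names are hypothetical, the whiskers are assumed to end at the most extreme points within 1.5 × IQR of the box (the inner fences of Section 12.6), and skewness is measured with the quartile-based Yule measure from Section 11.3.

```python
import statistics


def box_stats(values):
    """Quantities a box plot displays, plus a quartile-based skewness."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
        # Yule measure: positive for right skew, negative for left skew.
        "yule_skew": (q3 + q1 - 2 * med) / iqr if iqr else 0.0,
    }


def median_outside_box(sample, reference):
    """Step 1: does the median of `sample` fall outside the box of `reference`?"""
    s, r = box_stats(sample), box_stats(reference)
    return not (r["q1"] <= s["median"] <= r["q3"])


# Hypothetical groups, for illustration only.
group_a = [12, 14, 15, 15, 16, 17, 18, 18, 19, 21, 22]
group_b = [8, 9, 9, 10, 10, 11, 13, 16, 20, 25, 60]

stats_a, stats_b = box_stats(group_a), box_stats(group_b)
print("Step 1, median of B outside box of A:", median_outside_box(group_b, group_a))
print("Step 2, IQRs:", stats_a["iqr"], stats_b["iqr"])
print("Step 3, outliers:", stats_a["outliers"], stats_b["outliers"])
print("Step 4, skewness:", stats_a["yule_skew"], stats_b["yule_skew"])
```

A numeric check like this complements the plots; it does not replace looking at them.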

Figure 12.9: Symmetry in Box Plot

12.8 SUMMARY
• The mean is the sum of the data divided by the number of data points.
• The median by definition refers to the middle value in a distribution. One-half of the items in the distribution have a value the size of the median value or smaller, and one-half have a value the size of the median value or larger.
• A percentile is a comparison score between a particular score and the scores of the rest of a group. It shows the percentage of scores that a particular score surpassed.
• A quartile is a percentile measure that divides the total of 100% into four equal parts: 25%, 50%, 75% and 100%. A particular quartile is the border between two neighboring quarters of the distribution.
• An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
• A box plot or boxplot (also known as a box and whisker plot) is a type of chart often used in exploratory data analysis. Box plots visually show the distribution of numerical data and its skewness by displaying the data quartiles (or percentiles) and averages.
• Box plots show the five-number summary of a set of data: the minimum score, first (lower) quartile, median, third (upper) quartile, and maximum score.

12.9 KEYWORDS

• Mean: average of the numbers
• Median: middle number in a sorted list
• Percentile: how a score compares to other scores in the same set
• Quartile: divides the number of data points into four parts
• Outlier: data point that is far from other data points
• Whisker Plot: displays the data distribution through its quartiles

12.10 LEARNING ACTIVITY

1. Find the outliers for the following data set: 3, 10, 14, 19, 22, 29, 32, 36, 49, 70
2. From the following data, find the value of the median:

Income (Rs.)    No. of persons
4,000           24
4,500           26
5,800           16
5,060           20
6,600           6
5,380           30

12.11 UNIT END QUESTIONS

A. Descriptive Questions

Short Questions
1. Define mean and median.
2. Compare percentile and quartile.
3. What is an outlier?
4. How do you construct a whisker plot?
5. List the categories of data represented in a box plot.

Long Questions
1. Illustrate the role of mean and median in data analysis.
2. How is visualization done using percentiles and quartiles?
3. Describe the concept of the whisker plot.
4. Discuss the role of outlier detection.
5. Outliers sometimes contain important information. Comment.

B. Multiple Choice Questions
1. Any measure indicating the centre of a set of data, arranged in an increasing or decreasing order of magnitude, is called a measure of:
   a. Skewness
   b. Symmetry
   c. Central tendency
   d. Dispersion
2. Scores that differ greatly from the measures of central tendency are called:
   a. Raw scores
   b. The best scores
   c. Extreme scores
   d. Z-scores
3. The measure of central tendency listed below is:
   a. The raw score
   b. The mean
   c. The range

   d. Standard deviation
4. The total of all the observations divided by the number of observations is called:
   a. Arithmetic mean
   b. Geometric mean
   c. Median
   d. Harmonic mean
5. While computing the arithmetic mean of a frequency distribution, each value of a class is considered equal to:
   a. Class mark
   b. Lower limit
   c. Upper limit
   d. Lower class boundary
6. Change of origin and scale is used for calculation of the:
   a. Arithmetic mean
   b. Geometric mean
   c. Weighted mean
   d. Lower and upper quartiles
7. The arithmetic mean is highly affected by:
   a. Moderate values
   b. Extremely small values
   c. Odd values
   d. Extremely large values
8. Which of the following statements is always true?
   a. The mean has an effect on extreme scores
   b. The median has an effect on extreme scores
   c. Extreme scores have an effect on the mean
   d. Extreme scores have an effect on the median

9. The midpoint of the values after they have been ordered from the smallest to the largest or the largest to the smallest is called:
   a. Mean
   b. Median
   c. Lower quartile
   d. Upper quartile
10. If a set of data has one mode and its value is less than the mean, then the distribution is called:
   a. Positively skewed
   b. Negatively skewed
   c. Symmetrical
   d. Normal

Answers
1-c, 2-c, 3-b, 4-a, 5-a, 6-a, 7-d, 8-c, 9-b, 10-a

12.12 REFERENCES

Text Books:
• Allen B. Downey, "Think Python: How to Think Like a Computer Scientist", 2nd edition, Updated for Python 3, Shroff/O'Reilly Publishers, 2016.
• Michael Urban, Joel Murach, Mike Murach, "Murach's Python Programming", Dec. 2016.

Reference Books:
• Guido van Rossum and Fred L. Drake Jr., "An Introduction to Python", Revised and updated for Python 3.2.
• Jake VanderPlas, "Python Data Science Handbook", O'Reilly Publishers, 2016.

UNIT - 13: STATISTICS II

Structure
13.0. Learning Objectives
13.1. Introduction to Variance
13.2. Standard Deviation
13.3. Covariance
13.4. Scatter Plot
13.5. Pearson Correlation Coefficient
13.6. Summary
13.7. Keywords
13.8. Learning Activity
13.9. Unit End Questions
13.10. References

13.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
• Compare variance and covariance
• Describe standard deviation
• Analyze positive and negative correlation among the data

13.1 INTRODUCTION TO VARIANCE

Variance is a measure of statistical dispersion, that is, the variation among the different samples in a data set. It is the average of the squared differences from the mean. Variance is a numerical value that shows how widely the individual figures in a set of data are distributed about the mean, and hence describes how far each value in the data set lies from the mean value.

Variance is the square of the standard deviation. If you do not know the standard deviation, you can use the following procedure to determine the variance.

Procedure for Finding the Variance:
1. Find the mean of the scores ($\bar{x}$).

2. Subtract the mean from each individual score ($x - \bar{x}$).
3. Square each of the differences obtained above ($(x - \bar{x})^2$).
4. Add all of the squares obtained in step 3 ($\sum (x - \bar{x})^2$).
5. Divide the total from step 4 by the number ($n - 1$), where n is the total number of scores used.

13.2 STANDARD DEVIATION

Standard deviation is a measure of how spread out your data is. It is a statistic that tells you how closely all of the examples are gathered around the mean (average) in a data set. The steeper the bell curve, the smaller the standard deviation. If the examples are spread far apart, the bell curve will be much flatter, meaning the standard deviation is large. In business, the smaller the standard deviation, the better.

Procedure for Finding the Standard Deviation:
1. Find the mean of the scores ($\bar{x}$).
2. Subtract the mean from each individual score ($x - \bar{x}$).
3. Square each of the differences obtained above ($(x - \bar{x})^2$).
4. Add all of the squares obtained in step 3 ($\sum (x - \bar{x})^2$).
5. Divide the total from step 4 by the number ($n - 1$), where n is the total number of scores used.
6. Find the square root of the result of step 5.

Be careful not to round the mean too much, as the resulting standard deviation can be in error. Try not to round any intermediate results. Round only at the end.

13.3 COVARIANCE

Covariance is a measure of how much two random variables vary together. It is similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.
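Since covariance builds on the same deviations-from-the-mean idea, it may help to see the variance and standard deviation procedures of Sections 13.1 and 13.2 as code first. The sketch below follows the numbered steps one by one; the function names and the list of scores are hypothetical and chosen only for illustration.

```python
import math


def sample_variance(scores):
    """Sample variance, following steps 1 to 5 of the procedure."""
    n = len(scores)
    mean = sum(scores) / n                     # step 1: mean of the scores
    deviations = [x - mean for x in scores]    # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]     # step 3: square each difference
    total = sum(squared)                       # step 4: add the squares
    return total / (n - 1)                     # step 5: divide by n - 1


def sample_std_dev(scores):
    """Sample standard deviation: step 6 is the square root of the variance."""
    return math.sqrt(sample_variance(scores))


# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = sample_variance(scores)
std_dev = sample_std_dev(scores)

# Round only at the end, as the text advises.
print(round(variance, 3), round(std_dev, 3))
```

For these scores the mean is 5 and the squared differences add up to 32, so the variance is 32/7. Python's standard library offers statistics.variance and statistics.stdev, which use the same n - 1 divisor.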

Figure 13.1 Covariance

The Covariance Formula

The formula is:

$$\operatorname{Cov}(X, Y) = \frac{\sum (X - \mu)(Y - \nu)}{n - 1}$$

where:
X and Y are random variables
E(X) = μ is the expected value (the mean) of the random variable X
E(Y) = ν is the expected value (the mean) of the random variable Y
n = the number of items in the data set

Example

Calculate the covariance for the following data set:
x: 2.1, 2.5, 3.6, 4.0 (mean = 3.05, rounded to 3.1 below)
y: 8, 10, 12, 14 (mean = 11)

Substitute the values into the formula and solve:

$$\operatorname{Cov}(X, Y) = \frac{(2.1 - 3.1)(8 - 11) + (2.5 - 3.1)(10 - 11) + (3.6 - 3.1)(12 - 11) + (4.0 - 3.1)(14 - 11)}{4 - 1}$$

$$= \frac{(-1)(-3) + (-0.6)(-1) + (0.5)(1) + (0.9)(3)}{3} = \frac{3 + 0.6 + 0.5 + 2.7}{3} = \frac{6.8}{3} = 2.267$$

Rounding the mean of x does not change the total in this example, because the deviations of y from their mean sum to zero. The result is positive, meaning that the variables are positively related.
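The worked example can be checked with a few lines of Python. This is a minimal sketch of the sample covariance formula with the n - 1 divisor used above; the function name sample_covariance is hypothetical, and the data are the x and y values from the example.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equally long sequences (divisor n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Data from the worked example.
x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]

cov_xy = sample_covariance(x, y)
print(round(cov_xy, 3))
```

The code uses the unrounded means and still arrives at 6.8/3, which rounds to 2.267. A positive value means that x and y tend to rise together.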

13.4 SCATTER PLOT

A scatter plot can be used for data in the form of ordered pairs of numbers. The result will be a set of points "scattered" around the plane.

If the general tendency is for the points to rise from the left to the right of the graph, then we say there is a positive correlation between the two variables measured. If the points tend to fall from the left to the right of the graph, we say there is a negative correlation. If there is no general tendency, then there is no correlation.

If the tendency is not very pronounced, that is, the points are scattered widely, then we say the variables are weakly correlated. If the correlation is more pronounced, we say the variables are strongly correlated.

Figure 13.2 Weak Positive Correlations

Figure 13.3 Strong Negative Correlations

Figure 13.4 No Correlations

Examples:

If you graphed a person's height on one axis and their weight on the other, you would probably get a strong positive correlation (because taller people generally weigh more).

If you graphed a man's age and the number of hairs on his head, you would probably get a weak negative correlation (because some men have a tendency for baldness as they get older).

If you graphed a woman's shoe size and the length of her hair, you would probably get no correlation. (These variables are unrelated.)

13.5 PEARSON CORRELATION COEFFICIENT

Correlation between sets of data is a measure of how well they are related. The most common measure of correlation in statistics is the Pearson correlation. Its full name is the Pearson Product Moment Correlation (PPMC). In layman's terms, it is a number between +1 and -1 that represents how strongly two variables are associated. More precisely, it measures the strength of the linear association between two variables.

A Pearson Product Moment Correlation (PPMC) attempts to draw a line of best fit through the data of the two variables, and the Pearson correlation coefficient r indicates how far away the data points are from this line.

The value of r ranges from +1 to -1, where:

• r = +1 or r = -1 means that all the data points lie exactly on the line of best fit, that is, no data point shows any variation from the line.

Figure 13.5 Data points with r=1

• Hence, the stronger the association between the two variables, the closer r will be to +1 or -1.
• r = 0 means that there is no linear correlation between the two variables.
• Values of r between +1 and -1 indicate that the data points vary around the line.

• The closer the value of r is to 0, the greater the variation of the data points around the line of best fit.

Formula of Pearson Correlation coefficient:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Example

Find the value of the correlation coefficient from the following table:

Table: Age and Glucose levels of 6 subjects

We will calculate the value of r using the formula mentioned above. To use that formula, we need to compute Σ(X*Y), Σ(X), Σ(Y), Σ(X²), Σ(Y²). The table below shows the computed values of all the summations mentioned above.
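Once the five summations and the sample size are known, the rest is arithmetic, which can be sketched in Python as follows. The function name pearson_r_from_sums and its parameter names are hypothetical; the numbers passed in are the sums this worked example reports for its six subjects.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r computed from the sample size and the five summations."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Sums reported in the worked example (n = 6 subjects).
r = pearson_r_from_sums(
    n=6,
    sum_x=247,
    sum_y=486,
    sum_xy=20485,
    sum_x2=11409,
    sum_y2=40022,
)
print(round(r, 4))
```

Working from the summations mirrors the hand calculation. When the raw pairs are available, the same value can be obtained from them directly, for example with statistics.correlation in Python 3.10 or later.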

From our table we get:
Σ(X) = 247
Σ(Y) = 486
Σ(X*Y) = 20,485
Σ(X²) = 11,409
Σ(Y²) = 40,022
n is the sample size, in our case n = 6.

$$r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right]\left[6(40{,}022) - 486^2\right]}} = 0.5298$$

The range of the correlation coefficient is from -1 to +1. Our result is 0.5298, which means the variables have a moderate positive correlation.

Problems with Pearson correlation

The PPMC is not able to tell the difference between dependent variables and independent variables. For example, if you are trying to find the correlation between a high calorie diet and diabetes, you might find a high correlation of 0.8. However, you would get the same result with the variables switched around. In other words, you could say that diabetes causes a high calorie diet. That obviously makes no sense. Therefore, as a researcher you have to be aware of the data you are plugging in. In addition, the PPMC will not give you any information about the slope of the line; it only tells you whether there is a relationship.

13.6 SUMMARY

• Variance is a measure of statistical dispersion, that is, the variation among the different samples in a data set. It is the average of the squared differences from the mean.
• Standard deviation is a measure of how spread out your data is. It is a statistic that tells you how closely all of the examples are gathered around the mean (average) in a data set. The steeper the bell curve, the smaller the standard deviation.
• Covariance is a measure of how much two random variables vary together. It is similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.
• A scatter plot can be used for data in the form of ordered pairs of numbers. The result will be a set of points "scattered" around the plane.
• Correlation between sets of data is a measure of how well they are related. The most common measure of correlation in statistics is the Pearson correlation.
• Pearson's correlation coefficient is a linear correlation coefficient that returns a value between -1 and +1.

13.7 KEYWORDS

• Variance: measure of variability
• SD: standard deviation
• Covariance: measure of the directional relationship between two random variables
• Scatter Plot: visual display of the relationship between variables
• Pearson Correlation: correlation between sets of data

13.8 LEARNING ACTIVITY

1. Find the standard deviation of 4, 9, 11, 12, 17, 5, 8, 12, 14

2. Calculate the covariance of the daily returns of two stocks using their closing prices.

13.9 UNIT END QUESTIONS

A. Descriptive Questions

Short Questions
1. Define variance.
2. Compare variance and covariance.
3. What is the use of a scatter plot?
4. Differentiate between a box plot and a scatter plot.
5. List the disadvantages of the Pearson correlation coefficient.

Long Questions
1. Illustrate the role of variance and standard deviation in data analysis.
2. How is visualization done using a scatter plot?
3. Compare the performance of the box plot and the scatter plot.
4. Discuss the role of the Pearson correlation coefficient.
5. Describe positive and negative correlation.

B. Multiple Choice Questions
1. The mean and variance of a Poisson distribution are the same.

   a. True
   b. False
2. What is the mean and variance for standard normal distribution?
   a. Mean is 0 and variance is 1
   b. Mean is 1 and variance is 0
   c. Mean is 0 and variance is ∞
   d. Mean is ∞ and variance is 0
3. Variance of a random variable X is given by _________
   a. E(X)
   b. E(X²)
   c. E(X²) - (E(X))²
   d. (E(X))²
4. Mean of a constant 'a' is ___________
   a. 0
   b. a
   c. a/2
   d. 1
5. Variance of a constant 'a' is _________
   a. 0
   b. a
   c. a/2
   d. 1
6. The covariance is:
   a. A measure of the strength of relationship between two variables.
   b. Dependent on the units of measurement of the variables.
   c. An unstandardized version of the correlation coefficient.
   d. All the above.

7. If Pearson's correlation coefficient between stress level and workload is 0.8, how much variance in stress level is not accounted for by workload?
   a. 20%
   b. 2%
   c. 8%
   d. 36%
8. How much variance has been explained by a correlation of 0.9?
   a. 18%
   b. 9%
   c. 81%
   d. None of these
9. Correlation analysis is a _____________
   a. Univariate analysis
   b. Bivariate analysis
   c. Multivariate analysis
   d. Both b and c
10. When the amount of change in one variable leads to a constant ratio of change in the other variable, then correlation is said to be _____________
   a. Linear
   b. Non-linear
   c. Positive
   d. Negative

Answers
1-a, 2-a, 3-c, 4-b, 5-a, 6-d, 7-d, 8-c, 9-d, 10-a

13.10 REFERENCES

Text Books:

• Allen B. Downey, "Think Python: How to Think Like a Computer Scientist", 2nd edition, Updated for Python 3, Shroff/O'Reilly Publishers, 2016.
• Michael Urban, Joel Murach, Mike Murach, "Murach's Python Programming", Dec. 2016.

Reference Books:
• Guido van Rossum and Fred L. Drake Jr., "An Introduction to Python", Revised and updated for Python 3.2.
• Jake VanderPlas, "Python Data Science Handbook", O'Reilly Publishers, 2016.

UNIT - 14: WEB APPLICATION 1

Structure
14.0. Learning Objectives
14.1. Introduction
14.2. Virtual Environment
14.3. Creating a Django Project
14.4. Summary
14.5. Keywords
14.6. Learning Activity
14.7. Unit End Questions
14.8. References

14.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
• Identify the benefits of Django
• Design a web application using Django
• Create a project using Django

14.1 INTRODUCTION

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel.

Django is a widely used free, open-source, and high-level web development framework. It provides a lot of features to developers "out of the box," so development can be rapid. At the same time, websites built with it are secure, scalable, and maintainable.

Required Setup
1. Git Bash (or any Unix-style terminal): All the Django-related commands and Unix commands are run through it.
2. Text Editor: Any text editor like Sublime Text or Visual Studio Code can be used. For the following project, Sublime Text is used.

3. Python 3: The latest version of Python can be downloaded from the internet.

14.2 VIRTUAL ENVIRONMENT

A virtual environment holds the dependencies of a Python project. It works as a self-contained container or an isolated environment where all the Python packages, in the versions required by a specific project, are installed. Since newer versions of Python, Django, and other packages keep rolling out, a virtual environment lets you keep working with the older versions that are specific to your project. In summary, you can run one project on Django 2.0 while another, independent project on the same computer uses Django 3.0.

Creating a Virtual Environment

To work with Django, we will first set up a virtual environment to work in. A virtual environment is a place on your system where you can install packages and isolate them from all other Python packages. Separating one project's libraries from other projects is beneficial and will be necessary when we deploy Learning Log to a server.

Create a new directory for your project called learning_log, switch to that directory in a terminal, and create a virtual environment. If you are using Python 3, you should be able to create a virtual environment with the following command:

    learning_log$ python -m venv ll_env
    learning_log$

Here we are running the venv module and using it to create a virtual environment named ll_env. If this works, move on to "Activating the Virtual Environment".

Installing virtualenv

If you are using an earlier version of Python or if your system is not set up to use the venv module correctly, you can install the virtualenv package. To install virtualenv, enter the following:

    $ pip install --user virtualenv

If you are using Linux and this still does not work, you can install virtualenv through your system's package manager. On Ubuntu, for example, the command sudo apt-get install python-virtualenv will install virtualenv.

Change to the learning_log directory in a terminal, and create a virtual environment like this:

    learning_log$ virtualenv ll_env
    New python executable in ll_env/bin/python
    Installing setuptools, pip...done.
    learning_log$

If you have more than one version of Python installed on your system, you should specify the version for virtualenv to use. For example, the command virtualenv ll_env --python=python3 will create a virtual environment that uses Python 3.

Activating the Virtual Environment

Now that we have a virtual environment set up, we need to activate it with the following command:

    learning_log$ source ll_env/bin/activate
    (ll_env)learning_log$

This command runs the script activate in ll_env/bin. When the environment is active, you will see the name of the environment in parentheses; then you can install packages to the environment and use packages that have already been installed. Packages you install in ll_env will be available only while the environment is active.

To stop using a virtual environment, enter deactivate:

    (ll_env)learning_log$ deactivate
    learning_log$

Installing Django

Once you have created your virtual environment and activated it, install Django:

    (ll_env)learning_log$ pip install Django
    Installing collected packages: Django
    Successfully installed Django
    Cleaning up...
    (ll_env)learning_log$

Because we are working in a virtual environment, this command is the same on all systems. There is no need to use the --user flag, and there is no need to use longer commands like python -m pip install package_name.

Keep in mind that Django will be available only when the environment is active.
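The python -m venv command used earlier in this section is a thin wrapper around Python's standard venv module, so the same environment can be created from a script. The sketch below is illustrative only: the helper names create_env and in_virtual_env are hypothetical, the environment is built in a temporary directory rather than in learning_log, and pip is left out (with_pip=False) to keep the example quick, whereas python -m venv installs pip by default.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(project_dir, env_name="ll_env"):
    """Create a virtual environment inside project_dir and return its path."""
    env_path = Path(project_dir) / env_name
    # with_pip=False keeps the sketch fast and offline.
    venv.create(env_path, with_pip=False)
    return env_path


def in_virtual_env():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


# Demonstration in a throwaway directory (a stand-in for learning_log).
with tempfile.TemporaryDirectory() as project_dir:
    env_path = create_env(project_dir)
    # Every environment created by venv contains a pyvenv.cfg file.
    config_exists = (env_path / "pyvenv.cfg").is_file()
    print("environment created:", config_exists)

print("running inside a virtual environment:", in_virtual_env())
```

Creating an environment does not activate it. Activation is still done in the shell with the activate script, and in_virtual_env only reports whether the interpreter running the script was started from an environment.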

14.3 CREATING A DJANGO PROJECT

1. The first step is creating your project by using the 'django-admin startproject project_name' command, where 'project_name' is 'django_blog' in this case. The command generates a number of files inside the newly created project, which you can research further in the Django documentation if needed.
2. Change to the newly created project directory using the 'cd' command, and view the created files using the 'ls' command.
3. You can run your project by using 'python manage.py runserver'.
4. The project can be viewed in your favorite browser (Google Chrome, Mozilla Firefox, etc.). Open the browser and type 'localhost:8000' or '127.0.0.1:8000'.

14.4 SUMMARY

• Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.
• Django is a widely used free, open-source, and high-level web development framework. It provides a lot of features to developers "out of the box," so development can be rapid.
• A virtual environment holds the dependencies of a Python project. It works as a self-contained container or an isolated environment where all the Python packages, in the versions required by a specific project, are installed.
• A virtual environment is a place on your system where you can install packages and isolate them from all other Python packages.
• Before installing Django, it is recommended to install virtualenv, which creates new isolated environments that isolate your Python files on a per-project basis. This ensures that any changes made to your website will not affect other websites you are developing.
• Activating a virtual environment puts the environment-specific python and pip executables into your shell's PATH.

14.5 KEYWORDS

• Django: web framework that enables rapid development of secure and maintainable websites
• Web Application: application that uses a web browser to perform a particular function
• Virtual Environment: isolated environment that holds the packages a project needs
• Django Project: the entire application and all its parts

14.6 LEARNING ACTIVITY

1. Build a couple of empty projects and look at what Django creates. Make a new folder with a simple name, like InstaBook or FaceGram (outside of your learning_log directory), navigate to that folder in a terminal, and create a virtual environment. Install Django, and run the command django-admin startproject instabook . (make sure you include the dot at the end of the command).
2. Look at the files and folders this command creates, and compare them to Learning Log.

Do this a few times until you are familiar with what Django creates when starting a new project. Then delete the project directories if you wish.

14.7 UNIT END QUESTIONS

A. Descriptive Questions

Short Questions
1. What is the difference between a web application and a desktop application?
2. Define client-server architecture.
3. What is the need for a virtual environment?
4. List the benefits of Django.
5. Is it necessary to activate a virtual environment?

Long Questions
1. Illustrate the process of creating a web application with Django.
2. Describe the benefits of using Django for application creation.
3. Discuss how a virtual environment is used with Django.
4. Illustrate the steps in creating a Django project.
5. Describe activating and deactivating virtual environments.

B. Multiple Choice Questions
1. What is a Django App?
   a. A Django app is an extended package whose base package is Django.
   b. A Django app is a Python package with its own components.
   c. Both a and b
   d. All of these
2. What are Migrations in Django?
   a. They are files saved in the migrations directory.
   b. They are created when you run the makemigrations command.
   c. Migrations are files where Django stores changes to your models.
   d. All of these

