Soil Nutrient Deficiencies as Reflected in Crops/Plants

A plant's sufficiency range for a nutrient is the amount necessary to meet the plant's nutritional needs and maximize growth. This range depends on the plant species and on the nutrient. Nutrient deficiency occurs when an essential nutrient is not available in sufficient quantity to meet growth requirements. Toxicity occurs when a nutrient is present in excess of the plant's needs and decreases plant growth and quality (Figure 12.1).

Nutrient deficiency and toxicity are diagnosed with the help of three basic methods:
1. Soil testing
2. Plant analysis
3. Visual observation in the field

Both soil testing and plant analysis are quantitative tests that compare soil and plant nutrient concentrations to the sufficiency range for a crop or plant. Visual observation, on the other hand, is a qualitative assessment based on external symptoms such as stunted growth or yellowing of leaves due to nutrient stress.

Common Deficiency Symptoms

It is important to learn the deficiency symptoms. Each deficiency is related to some function of the nutrient in the plant (Havlin et al., 1999). Symptoms are generally grouped into five categories:
1. Stunted growth
2. Chlorosis
3. Interveinal chlorosis
4. Purplish-red colouring
5. Necrosis

Figure 12.1 : Relationship between crop yield and nutrient concentration (Havlin et al., 1999)
1. Stunting occurs when plant functions such as stem elongation, photosynthesis and protein production are impaired; plant growth is typically slow and plants are small in stature.
2. Chlorosis and interveinal chlorosis are found in plants deficient in nutrients necessary for photosynthesis and/or chlorophyll production. Chlorosis can cause either the whole plant or a leaf to turn light green to yellow, or it can appear more localized as white or yellow spotting. Interveinal chlorosis occurs when certain nutrients (B, Fe, Mg, Mn, Ni and Zn) are deficient.

Purplish-red discolouration in plant stems and leaves is due to above-normal levels of anthocyanin, which can accumulate when plant functions are disrupted or stressed. However, this symptom is difficult to diagnose because a purplish tinge can also be caused by other factors: cool temperatures, disease, drought and even the maturation of some plants can cause anthocyanin to accumulate (Bennett, 1993). Certain plant cultivars also exhibit purplish colouring.

Nutrients are categorized as mobile or immobile based on where symptoms appear in the plant. Mobile nutrients can move out of older leaves to younger plant parts when supplies are inadequate. Mobile nutrients include N, P, K, Cl, Mg and Mo. In a mobile nutrient deficiency, visual symptoms first appear in the lower, older leaves. The immobile nutrients (B, Ca, Cu, Fe, Mn, Ni, S and Zn) cannot move from one plant part to another, so deficiency symptoms initially occur in the younger plant parts and upper leaves and are localized. Generally, deficiency symptoms differ among crop types, and plant-specific symptoms cannot be listed as such.

Important Mobile Nutrients

Nitrogen (N): Nitrogen is needed by plants to produce proteins, nucleic acids and chlorophyll. Symptoms of nitrogen deficiency include chlorosis in the lower leaves, stunted and slow growth and, in severe cases, necrosis of older leaves.
N-deficient plants mature early, and crop quality and yield are often reduced (Jones, 1998). "In cereals yellow discoloration from the leaf tip backward in the form of a 'V' is common" (Jacobsen and Jasper, 1991). Photo plate 12.1 shows general chlorosis due to nitrogen deficiency.

Photo plate 12.1 : Deficiency symptoms of nitrogen (Source: http://plantsci.sdstate.edu/woodardh/soilfert/Nutrient_Deciency_Pages/soy_def/SOY-N1.JPG)
Phosphorus (P): Plants require P for the development of ATP, sugars and nucleic acids. P deficiency is seen more often in young plants, which have a greater relative demand for P than more mature plants (Grundon, 1987). Leaves and stems turn dark green, and older leaves show purplish discolouration. The colour change is due to sugar accumulation. Leaf tips turn brown and die. Photo plate 12.2 shows leaves with P deficiency, in which the leaves are dark green to purple-red in colour.

Potassium (K): Potassium plays a major role in osmotic regulation and is utilised by plants in the activation of enzymes, photosynthesis, protein formation and sugar transport. Initially there is only a reduction in growth rate, with chlorosis and necrosis occurring at a later stage (Mengel and Kirkby, 2001). Potassium deficiency results in a reduced growth rate, burnt leaf tips, reduced straw and stalk, and low protein levels. Photo plate 12.3 shows leaves with necrotic spots that are symptoms of K deficiency.

Photo plate 12.2 : Deficiency symptoms of phosphorus (Source: http://www.ext.vt.edu/news/periodicals/viticulture/04octobernovember/photo3.jpg)

Photo plate 12.3 : Deficiency symptoms of potassium (Source: http://www.ipm.iastate.edu/ipm/icm/les/images/antonio004f.jpg)
Chloride (Cl): Chloride is required by the plant for leaf turgor and photosynthesis. Chloride deficiencies mostly occur in winter wheat (Engel et al., 2001). Plants with chloride deficiency show chlorotic and necrotic spotting along the leaves, with abrupt boundaries between dead and live tissue. Chloride deficiency symptoms are highly cultivar specific and can easily be mistaken for leaf diseases.

Magnesium (Mg): Magnesium is the central atom of the chlorophyll molecule and an important co-factor in the production of ATP. Symptoms include interveinal chlorosis and leaf margins becoming yellow or reddish-purple while the midrib remains green. In wheat, distinct yellowish-green patches occur. Photo plate 12.4 shows Mg deficiency, in which leaves show yellow, chlorotic interveinal tissue.

Photo plate 12.4 : Deficiency symptoms of magnesium (Source: http://quorumsensing.ifas.u.edu/HCS200/images/deciencies/-Mgcq.jpg)

Molybdenum (Mo): Molybdenum is needed for enzyme activity in the plant and for nitrogen fixation in legumes. Molybdenum deficiency resembles nitrogen deficiency, with stunted plant growth and chlorosis occurring in legumes. Other symptoms are pale leaves that may be scorched, cupped or rolled. Leaves may appear thick or brittle and will eventually wither, leaving only the midrib.

Important Immobile Nutrients

Sulfur (S): Sulfur is an essential constituent of certain amino acids and proteins. Deficiency results in inhibition of protein and chlorophyll synthesis. In contrast to N and Mo deficiency symptoms, sulfur deficiency symptoms occur in the younger leaves, causing them to turn light green to yellow. Sulfur-deficient plants have thin stems and are small and spindle shaped. Photo plate 12.5 (a) shows sulfur deficiency, in which leaves are light green and yellowing.

Boron (B): Boron is required for cell wall formation and reproductive tissue. Deficiency leads to chlorotic young leaves and death of the main growing point (terminal bud). Boron-deficient leaves have dark brown, irregular lesions that progress to leaf necrosis in severe cases. Photo plate 12.5 (b) shows boron deficiency with necrotic, short internodes and a cracked, rough stem.

Photo plate 12.5 : Deficiency symptoms of (a) sulfur and (b) boron (Source: http://www.ag.ndsu.nodak.edu/aginfo/entomology/ndsucpr/Years/2007/june/7/soils.jpg; http://www.canr.msu.edu/vanburen/ c12.jpg)

Iron (Fe): Iron plays an important role in plant respiratory and photosynthetic reactions. Chlorophyll production in leaves is reduced in plants suffering from iron deficiency, which causes interveinal chlorosis. In young leaves there is a sharp distinction between the veins and the chlorotic areas. Iron-deficient fields exhibit irregularly shaped yellow areas, especially where subsoil is exposed at the surface (Follett and Westfall, 1992). Photo plate 12.6 (a) shows iron deficiency with yellow and white patches and chlorotic veins.

Zinc (Zn): Zinc is needed for growth hormone production and internode elongation. The symptoms show up in the middle leaves because zinc has intermediate mobility. Photo plate 12.6 (b) shows zinc deficiency, with short internodes and small, chlorotic leaves.

Photo plate 12.6 : Deficiency symptoms of (a) iron and (b) zinc (Source: http://bexar-tx.tamu.edu/HomeHort/F1Column/2003A rticles/Graphics/iron%20chlorosis.jpg; http://agri.atu.edu/people/Hodgson/FieldCrops/Mirror/Nutrient%20Def_les/slide24.jpg)
Calcium (Ca): Calcium is a component of plant cell walls and regulates cell wall construction. Insufficient calcium can cause young leaves to become distorted and turn abnormally dark green. Leaf tips often become dry or brittle and eventually wither and die. Stems become weak and germination is poor. Photo plate 12.7 shows calcium deficiency, in which the growing points of leaves or fruits are damaged (die back).

Photo plate 12.7 : Deficiency symptoms of calcium (Source: http://hubcap.clemson.edu/~blpprt/acid_photos/BlossomEndRot.JPG)

Copper (Cu): Copper is needed for chlorophyll production, respiration and protein synthesis. Cu-deficient plants display chlorosis of the younger leaves, stunted growth and delayed maturity. Cu-deficient plants are prone to increased disease, specifically ergot, a fungal disease that reduces yield and grain quality (Solberg et al., 1999). Winter and spring wheat are the crops most sensitive to copper deficiency.

The visual identification process has many limitations, which are discussed below:
1. Many symptoms appear similar: For example, N and S deficiency symptoms can be very similar, depending on plant growth stage and the severity of the deficiencies. In general, N deficiency appears on older leaves while S deficiency appears on new leaves.
2. Multiple deficiencies and/or toxicities can occur at the same time: An abundance of one nutrient can lead to a deficiency of another nutrient. Very often more than one deficiency or toxicity can produce severe deficiency symptoms.
3. Crop species, and even some cultivars of the same species, differ in their ability to adapt to nutrient deficiencies and toxicities: For example, field observations suggest that corn is more sensitive to Zn deficiency than barley.
4. Pseudo (false) deficiency symptoms: Many factors, such as soil compaction, disease, drought, excess water, genetic abnormalities, insects, and herbicide and pesticide residues, can lead to visual symptoms that look like nutrient deficiency symptoms.
5. Hidden hunger: Nutrient deficiency without visual symptoms is called "hidden hunger". Crop health and productivity slow down due to hidden hunger. By the time visual symptoms appear, it is often too late for corrective measures.
6. Field symptoms appear different from "ideal" symptoms: Therefore, experience and field history are important aids in the diagnosis of nutrient stress.

Nutrient Antagonism

Antagonism refers to the competition between nutrients for uptake by plants. Two nutrients, often ions with the same charge, are said to be antagonistic when an excess of one reduces the uptake of the other. Table 12.4 depicts the antagonistic effects of nutrients in plants.

Table 12.4 : Nutrient Antagonisms

Element in excess    Nutrients usually affected
Nitrogen             Potassium, Calcium
Potassium            Nitrogen, Calcium, Magnesium
Phosphorus           Zinc, Copper, Iron
Magnesium            Calcium, Potassium
Iron                 Manganese
Manganese            Iron, Molybdenum, Magnesium
Copper               Molybdenum, Iron, Manganese, Zinc
Zinc                 Iron, Manganese
Molybdenum           Copper, Iron
Sodium               Potassium, Calcium, Magnesium
Aluminium            Phosphorus
Ammonium ion         Calcium, Copper
Sulfur               Molybdenum

(Source: International Plant Nutrition Institute, IPNI)
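When interpreting a suspected deficiency, it can help to look the table up in reverse, that is, to ask which elements in excess might be suppressing a given nutrient. The following is a minimal sketch in Python, not part of the original chapter; the dictionary simply transcribes Table 12.4, and the names ANTAGONISMS and suppressed_by are hypothetical.

```python
# Element in excess -> nutrients usually affected, transcribed from Table 12.4.
# The names used here are illustrative assumptions, not an established API.
ANTAGONISMS = {
    "Nitrogen": ["Potassium", "Calcium"],
    "Potassium": ["Nitrogen", "Calcium", "Magnesium"],
    "Phosphorus": ["Zinc", "Copper", "Iron"],
    "Magnesium": ["Calcium", "Potassium"],
    "Iron": ["Manganese"],
    "Manganese": ["Iron", "Molybdenum", "Magnesium"],
    "Copper": ["Molybdenum", "Iron", "Manganese", "Zinc"],
    "Zinc": ["Iron", "Manganese"],
    "Molybdenum": ["Copper", "Iron"],
    "Sodium": ["Potassium", "Calcium", "Magnesium"],
    "Aluminium": ["Phosphorus"],
    "Ammonium ion": ["Calcium", "Copper"],
    "Sulfur": ["Molybdenum"],
}


def suppressed_by(nutrient):
    """Return, in alphabetical order, the elements whose excess may reduce
    uptake of `nutrient` according to the table above."""
    return sorted(
        element
        for element, affected in ANTAGONISMS.items()
        if nutrient in affected
    )


if __name__ == "__main__":
    # Which excesses could lie behind an apparent molybdenum deficiency?
    print(suppressed_by("Molybdenum"))
```

A lookup like this only suggests candidates to check with soil testing or plant analysis; it does not replace either.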
Nutrient Management

Soil and plant nutrient management cannot be dealt with in isolation but should be promoted as an integral part of productive farming. In general, nutrient-pathogen interactions are not well understood. However, deficiency of a plant nutrient may increase disease susceptibility by changing the concentration of plant metabolites, creating an environment favourable for disease development. For example, calcium deficiency can lead to membrane leakage of sugars, amino acids and other compounds that then become available to pathogens. Soil testing complements plant analysis; however, soil tests are not sufficient for predicting nutrients that leach, e.g. N and S. Plant analysis assesses nutrient uptake and is a better tool for measuring micronutrients like B, Fe and Mo. A typical plant testing programme has four phases: collection of plant samples; analysis of plant samples; calibration and interpretation of the results of chemical analysis; and recommendation. Before plant samples are given to a testing laboratory for chemical analysis, they should be collected and prepared with great care. Plant analysis results should always be interpreted by a trained scientist. Plant analysis data must be compared with established sufficiency ranges.

Conclusion

There is a strong need to train farmers and youths, under the green skill building development programme, in plant nutrition management for better farmer livelihoods, food security and environment. Plant analysis helps to manage crop nutrient status. It confirms a diagnosis made from visual symptoms, identifies "hidden hunger" when no symptoms appear, provides site-specific estimates of nutrient removal and pinpoints potential soil problem areas. Soil health cards are tools through which farmers can be made aware of the nutritional status of their farms, enabling them to take informed decisions on nutrient application and management.
The soil health cards can be made more useful through pictorial depiction of nutrient deficiencies, as reflected in plant parts, for farm-specific crops in hotspot areas of Uttarakhand with significant micronutrient deficiency. The micronutrient deficiency maps are available with the state agriculture department.

References

Bennett WF (1993). Nutrient deficiencies and toxicities in crop plants. St. Paul, Minn. APS Press, 202.
Brady NC and Weil RR (2002). The nature and properties of soil. Upper Saddle River, N.J. Prentice Hall, Inc. 690.
Engel R, Bruebaker IJ and Ornberg TJ (2001). A chloride deficient leaf spot of WB881 Durum. Soil Sci. Soc. Am. J. 65, 1448-1454.
Follett RH and Westfall DG (1992). Identifying and correcting zinc and iron deficiency in field crops. Colorado State University Cooperative Extension. Service in action. No. 545. http://cospl.coalliance.org/fez/eserv/co:6978/ucsu2062205451992internet.pdf
Grundon NJ (1987). Hungry crops: A guide to nutrient deficiencies in field crops. Brisbane, Australia: Queensland Government, 246.
Havlin JL, Beaton JD, Tisdale SL and Nelson WL (1999). Soil fertility and fertilizers, 6th Edition. Upper Saddle River, N.J. Prentice-Hall, Inc. 499.
Jacobsen JS and Jasper CD (1991). Diagnosis of nutrient deficiencies in Alfalfa and Wheat. EB43, February 1991. Bozeman, Mont. Montana State University Extension.
Jones Jr JB (1998). Plant nutrition manual. Boca Raton, Fla. CRC Press, 149.
Mengel K and Kirkby EA (2001). Principles of plant nutrition. Netherlands. Kluwer Academic Publishers, 849.
Solberg E, Evans I and Penny D (1999). Copper Deficiency: diagnosis and correction. Government of Alberta, Agriculture and Rural Development. Agdex, 532-3.
13 MULTIVARIATE TECHNIQUES FOR DATA ANALYSIS FOR ENVIRONMENTAL MONITORING

Raman Nautiyal

Abstract

Monitoring of environmental indices is an important exercise to ensure that the parameters follow the prescribed norms. In the course of this exercise, a lot of data are generated that need to be analyzed for drawing patterns, estimating parameters, understanding underlying relationships, establishing cause-and-effect relations, etc. Several methods in basic statistics, such as measures of central tendency and dispersion, help us to achieve this aim. However, multivariate analysis, both exploratory and inferential, gives a strong insight into the datasets and helps in formulating hypotheses, discovering patterns, exploring data and drawing meaningful and applicable inferences. Three important multivariate techniques are discussed in this chapter, namely, cluster analysis, multiple regression analysis and factor analysis. A case study with detailed analysis has been included in each case. The methods have been treated with the aim of making them easy for a researcher with a non-mathematical background. Some of the output, mainly of interest to an analyst, has been omitted from the chapter to keep the treatment clear and applicable from the point of view of the stakeholder.

Keywords: cluster analysis, factor analysis, multiple regression analysis

Introduction

Environmental monitoring is concerned with ensuring that the processes involved perform in compliance with the set standards. Almost all of these processes generate a large amount of data that requires statistical tools to analyze and to draw inferences about the processes: whether they are environmentally compliant or not, where the gaps are, and what is required to fill those gaps to make them compliant. The information generated also provides knowledge for policy formulation and implementation. Raw data alone cannot provide all that is required.
[Raman Nautiyal, Division of Forestry Statistics, ICFRE, Dehradun, [email protected]. In: Monitoring and Assessment of Environmental Parameters, Eds. V. Agnihotri, S. Rai, A. Tiwari, S. Mukherjee, K. Kumar, R. Joshi, GBPNIHE, Almora, Uttarakhand, India. ©GBPNIHE 2020]

There are several tools and techniques in the statistical literature that are useful for this purpose, but covering all of them in this chapter is not possible. Some tools are basic, such as measures of central tendency like the mean (arithmetic, geometric and harmonic), median and mode, or measures of dispersion around the mean like the variance, standard deviation and range. Although these measures are helpful in describing the outcome of a process, higher order statistical tools are required to extract information that is useful for taking decisions. After the introduction of soft computing, these higher order tools have come to the fore, as computations have become relatively easy to do and output that was earlier difficult to obtain or prepare has come within the reach of analysts. The design of experiments for optimizing factors and multivariate analysis have both seen quick growth with the advancement of soft computing and statistical software.

The main foundations required to build a strong analytical framework are how the data are collected and the tools used to analyse them. Data collection largely depends upon two processes, namely, conducting a sample survey or designing an experiment. The former focuses on collecting data from a population to estimate the population characteristics and is usually done on the existing sample points without any interference from the enumerator. The latter gives the flexibility of manipulating factors (or treatments, as they are generally known in biological parlance) to see how different levels or kinds of the chosen factors affect the response variables. Using these, we can test certain hypotheses that we have developed in relation to the processes we wish to monitor.
For example, we can hypothesize that the level of air pollution decreases as we move perpendicularly away from a certain stretch of highway, or that a category of plants is helpful in combating air pollution as recorded from index values (known as the Air Pollution Tolerance Index in the environmental context). Certain hypotheses can be developed to see how changes in factor levels bring about a change in the response. As an example, one can hypothesize that a certain combination of techniques can help in combating effluents from industrial plants. Based upon these hypotheses, studies are then planned to test the corresponding statistical hypotheses. These studies may take the form of a sample survey or a statistically designed experiment. The inference is then drawn after proper data analysis, with a probability value to account for uncertainty in the results. These inferences help us to take decisions with regard to the process under investigation. In this chapter, three important situations, requiring Cluster Analysis, Multiple Regression Analysis and Factor Analysis, are explained with appropriate examples.

Identifying Similar Cases

In many situations, the aim is to find similarities among locations or individuals. For example, one might be interested in grouping places that have similar levels of pollution load. This will help in taking decisions or formulating hypotheses about possible treatments. It may, however, be noted that the pollution load may be defined by several parameters and not a single one. Once these are observed, an index may be formulated to classify the locations. As an example, consider the following eight places (A, B, C, D, E, F, G and H) that have various levels of pollutants in air, water and soil (Table 13.1). To keep the output simple, a sample dummy dataset has been taken for the purpose of demonstration. The abbreviations have the usual meanings and expansions as used in the environmental literature.

Table 13.1 : Observed dummy value of pollutants

Location CO SO2 NO2 PM10 PM2.5
A 52 B 72 0.05 125 22 63 C 124 D 75 0.01 123 19 109 E 98 F 100 0.69 98 26 83 G 125 H 68 0.25 162 31 143 52 1.33 138 18 95 0.23 175 29 71 0.14 119 35 86 1.2 106 27

Based on the observed values, it is desired to group the places into homogeneous clusters, i.e., places that can be classified as being near to each other based on the levels of pollutants. It is pertinent to note that the different pollutants have varying values, and a place may be badly polluted by one pollutant and not so by another. A handy multivariate technique known as Cluster Analysis is often used in such cases. Using a proper distance measure and an algorithm, we can form groups of places, or 'clusters', that are like each other with respect to the pollutant load when all five pollutants are taken together. The application of the technique results in a hierarchical tree plot that helps us to understand the patterns and to decide the number of groups or clusters present in the dataset. It is an exploratory technique, meaning that it does not have a probabilistic statement attached to it. Cluster analysis is mostly of three kinds, Hierarchical Tree, k-means and 2-way joining, but for the purpose of this example Hierarchical Tree is used as it is the appropriate method. Upon running the algorithm, a distance matrix is obtained (Table 13.2).
Table 13.2 : Distance matrix after running the algorithm

       A     B     C     D     E     F     G     H
A    0.0  12.0  81.9  68.7  52.0  63.6  74.4  94.2
B   12.0   0.0  70.9  61.9  44.5  60.0  64.3  82.9
C   81.9  70.9   0.0  73.3  68.1  87.4  36.9  24.9
D   68.7  61.9  73.3   0.0  33.5  39.7  46.2  68.1
E   52.0  44.5  68.1  33.5   0.0  59.7  41.7  65.5
F   63.6  60.0  87.4  39.7  59.7   0.0  74.2  91.9
G   74.4  64.3  36.9  46.2  41.7  74.2   0.0  28.0
H   94.2  82.9  24.9  68.1  65.5  91.9  28.0   0.0

Careful examination of the above matrix reveals that it is a symmetric matrix, with the values in the lower triangle equal to the values in the upper triangle. This matrix gives the distance between each pair of locations using the Euclidean distance measure; clusters are then joined using the single linkage amalgamation (or linkage) rule. There exist several distance measures and amalgamation rules, and the selection of a pair of them depends upon the dataset, the situation being investigated and the knowledge of the analyst. However, Euclidean and Squared Euclidean measures are most commonly used with interval data. The researcher may fix a distance between two places that can be considered as the minimum distance for the two to belong to different clusters, or have a look at the tree diagram, which gives an idea of the groupings in the given dataset. A typical tree diagram for the above matrix is given in Figure 13.1.

Figure 13.1 : Tree diagram for 8 cases, single linkage, squared Euclidean distances (cases in plot order: A, B, C, H, G, D, E, F; axis: linkage distance, 0 to 2000).
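The single linkage rule can be illustrated with a minimal sketch in Python. This is not the software used for the chapter's output (the chapter does not name it); the function name single_linkage and the stopping rule based on a target number of clusters are assumptions made for illustration. The code works directly on the distances of Table 13.2: at each step it joins the two clusters whose closest members are nearest to each other.

```python
# Distances copied from Table 13.2; labels follow the same row/column order.
LABELS = ["A", "B", "C", "D", "E", "F", "G", "H"]
DIST = [
    [0.0, 12.0, 81.9, 68.7, 52.0, 63.6, 74.4, 94.2],
    [12.0, 0.0, 70.9, 61.9, 44.5, 60.0, 64.3, 82.9],
    [81.9, 70.9, 0.0, 73.3, 68.1, 87.4, 36.9, 24.9],
    [68.7, 61.9, 73.3, 0.0, 33.5, 39.7, 46.2, 68.1],
    [52.0, 44.5, 68.1, 33.5, 0.0, 59.7, 41.7, 65.5],
    [63.6, 60.0, 87.4, 39.7, 59.7, 0.0, 74.2, 91.9],
    [74.4, 64.3, 36.9, 46.2, 41.7, 74.2, 0.0, 28.0],
    [94.2, 82.9, 24.9, 68.1, 65.5, 91.9, 28.0, 0.0],
]


def single_linkage(dist, names, n_clusters):
    """Agglomerate until `n_clusters` remain (hypothetical helper).

    Returns the clusters (as sorted lists of names) and the amalgamation
    schedule as a list of (distance, cluster_1, cluster_2) tuples.
    """
    clusters = [[i] for i in range(len(names))]
    schedule = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        schedule.append(
            (d, [names[i] for i in clusters[a]], [names[i] for i in clusters[b]])
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters], schedule


if __name__ == "__main__":
    groups, steps = single_linkage(DIST, LABELS, n_clusters=4)
    for distance, left, right in steps:
        print(distance, left, right)
    print(groups)
```

Stopping at four clusters, the merges occur at distances 12.0, 24.9, 28.0 and 33.5, leaving the groups {A, B}, {C, G, H}, {D, E} and {F}. The next merge would join F to {D, E} at 39.7, so a cut-off between 33.5 and 39.7 gives the same four groups.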
A cursory glance at the diagram reveals the presence of four clusters, with membership as given in Table 13.3.

Table 13.3 : Representation of clusters with their respective membership

Cluster    Membership
1          A, B
2          C, H, G
3          D, E
4          F

From Table 13.3, it is seen that A and B are close to each other in pollutant load; C, H and G are close; D and E are close; and F is the isolated place. This output may help us in taking decisions about management of the pollutant load, or even simple decisions like the purchase of property, where to live, etc. In some cases the technique may also help in formulating hypotheses for studies. As an example, we may formulate a hypothesis about the effect of the distance of these places from some source of pollution and then test it with appropriate data. Though there are other outputs as well, such as the amalgamation schedule and icicle plots, the above two outputs are sufficient for taking decisions or measures to manage the situation. As the technique is exploratory, it is worth mentioning that the user learns to use it through repeated practice, by studying the tree plots (also known as dendrograms) and mapping the output to the situation. It may also be necessary to transform or rescale the variables if their scales of measurement are vastly different.

Predictive Modelling and Picking Important Predictors

Predictive modelling is a technique that helps in predicting a situation using some explanatory variables (predictors or independent variables). For example, one might be interested in knowing the incidence of disease (in terms of infection) based on one or more explanatory variables like the ones described in the section above. It is common knowledge that environmental degradation is one of the causes of the prevalence of diseases like respiratory problems, stomach-related issues, etc. The formulation of similar problems can be understood in the same way.
One may be interested in estimating the photosynthesis rate based on air pollutants, hypothesizing that photosynthesis is affected by particulate matter (dust, etc.) and other predictors. Thus, the problem here is one of estimating a dependent variable by the use of independent variables.
The concept of dependent and independent variables must be clear. In many situations, however, the dependent variable may seem to be an independent one. As an illustration, consider predicting forest cover using satellite imagery. Here forest cover will be taken as the dependent variable and the colour reflected by the remotely sensed image as the independent one. In common knowledge the situation is just the reverse: colour depends upon the forest cover. Why this anomaly then? The problem statement clarifies that it is easier to predict forest cover using the colour if the relationship between the two is known. Thus, the procedure has progressed from the nomenclature of dependent and independent variables to criterion and predictor, respectively.

Some common techniques for building such a model are regression analysis (linear and non-linear), Discriminant Function Analysis (DFA), logistic regression, etc. The choice of technique is decided by the scale of the variables. If the predictor and criterion are both metric, regression analysis is used. If the criterion variable is classificatory and the predictor variables are metric, DFA is the right technique to use. If the criterion is ordinal and the predictors are metric, multinomial regression provides the solution. Thus, before the analysis is attempted, it is important to note the scale of measurement of the variables.

Consider the problem of studying the effect of pollution on the rate of photosynthesis of a certain species. This study can also help in deciding whether measuring photosynthesis can provide a clue to the pollution levels in an area. The technique is applicable in similar situations. It should also be noted that the number of cases should be large enough to allow the dataset to be split into two parts, the model building set (or training set) and the validation set.
Around 60 to 70% of the cases are used to build the model and the remaining cases to validate it. The dataset taken here is a dummy dataset used to explain the procedure and may not be large enough by these standards. The actual procedure is to build the model using the training set, then predict the value of the criterion variable from the values of the predictors in the validation set, and finally compare the two sets of values (observed and expected) using some appropriate measure, like the chi-square test, to see whether the difference between the observed and the expected can be attributed to chance alone.

As an example, consider a dependent variable 'Y' and five independents X1, X2, X3, X4 and X5. Here it is important to understand that Xi and Y are general nomenclature, which can be adjusted to the phenomenon being studied. For example, the Xi can be the five pollutants included in the above section. The dependent variable 'Y' can be the photosynthesis rate, disease incidence or any other variable of interest. The dataset comprises 100 points and is too large to be included in the chapter. Since Y depends, or is assumed to depend, upon the five independents, the problem of building a predictive model is approached by fitting a multiple regression to the given dataset. The functional form of the model is

Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5

Here, b0 is the constant (also known as the intercept) and bi (the coefficients of Xi), i = 1, 2, ..., 5,
are the partial regression coefficients. It is also important that the independent variables are independent of each other, a condition that is rarely met in biological systems. However, there are methods that can be used to address this problem of collinearity. When the algorithm for a multiple regression is run, we get the output provided in Table 13.4.

Table 13.4 : Performance of coefficients and S.E. in multivariate regression

| | Beta | Std.Err. of Beta | B | Std.Err. of B | t(94) | p-level |
|---|---|---|---|---|---|---|
| Intercept | | | -0.06 | 0.09 | -0.74 | 0.46 |
| X1 | -0.01 | 0.02 | -0.01 | 0.01 | -0.85 | 0.40 |
| X2 | 0.01 | 0.01 | 0.08 | 0.20 | 0.42 | 0.68 |
| X3 | 0.08 | 0.02 | 0.09 | 0.02 | 3.98 | 0.00 |
| X4 | 0.94 | 0.01 | 0.01 | 0.00 | 110.64 | 0.00 |
| X5 | -0.20 | 0.02 | -0.65 | 0.06 | -10.19 | 0.00 |

R2 = 0.92, Adjusted R2 = 0.89

The Beta coefficients help us to decide whether a predictor is strong or not, while the B coefficients help us to formulate the model. To determine how strong a predictor is, we look at the p-levels of the t-values corresponding to the predictors. Careful examination of the above table reveals that the p-levels of the t-values corresponding to X1 and X2 are not significant (both are greater than 0.05, the level of significance; this level can be set by the researcher but is usually fixed at 0.05), indicating that these two predictors are not strong predictors of Y, the dependent variable. The three strong predictors are X3, X4 and X5, as all the associated p-values are < 0.05. The model can be written as:

Y = -0.06 - 0.01X1 + 0.08X2 + 0.09X3 + 0.01X4 - 0.65X5

Since some predictors are not strong, it is better to recalibrate the model using a stepwise procedure, which gives better predictions by including only those variables that are good predictors. A backward procedure is used. The output is given in Table 13.5.

Table 13.5 : Performance of coefficients and S.E. in post multivariate regression

| | Beta | Std.Err. of Beta | B | Std.Err. of B | t(96) | p-level |
|---|---|---|---|---|---|---|
| Intercept | | | -0.019822 | 0.037773 | -0.5248 | 0.600952 |
| X3 | 0.085241 | 0.018840 | 0.100257 | 0.022159 | 4.5244 | 0.000017 |
| X4 | 0.945134 | 0.008256 | 0.010191 | 0.000089 | 114.4743 | 0.000000 |
| X5 | -0.218263 | 0.013356 | -0.699857 | 0.042824 | -16.3425 | 0.000000 |
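The fit-then-prune workflow described above can be sketched in Python with NumPy. This is only an illustration on synthetic data, not the analysis behind Tables 13.4 and 13.5. The data, the true coefficients, the 70/30 split, the helper names `fit_ols` and `backward_eliminate`, and the cut-off |t| >= 1.985 (roughly the two-sided 5% critical value for about 94 degrees of freedom, used here in place of exact p-levels) are all assumptions. Validation uses the root-mean-square error as a simple stand-in for the chi-square comparison mentioned earlier.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns B, Std.Err. of B and t-values."""
    A = np.column_stack([np.ones(len(y)), X])        # first column is the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]                        # n - (number of predictors + 1)
    sigma2 = resid @ resid / dof                     # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, se, coef / se

def backward_eliminate(X, y, names, t_crit=1.985):
    """Repeatedly drop the weakest predictor until every remaining |t| >= t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        _, _, t = fit_ols(X[:, keep], y)
        t_pred = np.abs(t[1:])                       # ignore the intercept
        worst = int(np.argmin(t_pred))
        if t_pred[worst] >= t_crit:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data (assumption): 100 cases, Y depends on X3, X4 and X5 only.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 2] + 1.5 * X[:, 3] - 3.0 * X[:, 4] + rng.normal(scale=0.5, size=n)
names = ["X1", "X2", "X3", "X4", "X5"]

# 70% of the cases build the model, the remaining 30% validate it.
train, valid = slice(0, 70), slice(70, None)

selected = backward_eliminate(X[train], y[train], names)
cols = [names.index(s) for s in selected]
coef, se, t = fit_ols(X[train][:, cols], y[train])

# Predict the held-out cases and compare observed with expected values.
pred = coef[0] + X[valid][:, cols] @ coef[1:]
rmse = float(np.sqrt(np.mean((y[valid] - pred) ** 2)))
print(selected, np.round(coef, 2), round(rmse, 3))
```

Dropping one predictor at a time and refitting matters because the coefficients of the remaining predictors change after each removal, which is also why the B values in Table 13.5 differ from those in Table 13.4.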
This model includes only the good predictors, and the coefficients have changed. The stepwise model (with figures rounded off to two decimal places) is

Y = -0.02 + 0.10X3 + 0.01X4 - 0.70X5

Residual analysis throws more light on the goodness of the fitted model.

Finding Interrelationships

In several situations the aim of the study is to find the underlying interrelationships among the variables that are observed for environmental monitoring. As a case study, let us consider a dataset prepared by observing 16 variables related to environmental monitoring, X1, ..., X16. Some of these may relate to air pollution, some to water and some to soil. The idea is to group these variables and form factors (better known as latent factors). These are different from factors as used in techniques like Analysis of Variance or the t-test. In this case a factor is a combination or a group of some of these variables. A few of these may combine to form factor 1, a few others to form the second factor, and so on. Going through the analysis will make the procedure easier to understand.

Factor analysis is of two kinds: exploratory and confirmatory. In the former we try to find factors by understanding the interrelationships present in a dataset, while in the latter we confirm factors that we have already thought of. The subject matter in this section is limited to exploratory factor analysis.

A dataset of 200 cases has been taken as an example. Each case has been observed for sixteen variables. Let us understand this as observing each of the sixteen environmental indicators for each case (locality, city, coordinate or any unit in which we are interested). As an initial step we obtain an unrotated solution for the problem. The factor loadings are given in Table 13.6.
Table 13.6 : Factor loadings in Factor Analysis with 3 factors

| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| X1 | 0.94 | -0.26 | 0.09 |
| X2 | 0.01 | 0.99 | 0.01 |
| X3 | 0.02 | 0.88 | 0.11 |
| X4 | -0.56 | 0.80 | -0.08 |
| X5 | 0.96 | -0.13 | 0.09 |
| X6 | -0.02 | 0.97 | -0.03 |
| X7 | 0.95 | -0.19 | 0.10 |
| X8 | 0.96 | -0.18 | 0.08 |
| X9 | 0.98 | -0.03 | -0.02 |
| X10 | 0.97 | 0.00 | -0.19 |
| X11 | -0.82 | 0.17 | 0.46 |
| X12 | 0.93 | 0.05 | -0.32 |
| X13 | 0.97 | 0.02 | -0.17 |
| X14 | 0.10 | -0.03 | 0.96 |
| X15 | 0.87 | 0.21 | -0.16 |
| X16 | -0.63 | 0.15 | 0.75 |
| Expl.Var | 9.49 | 3.60 | 1.93 |
| Prp.Totl | 0.59 | 0.23 | 0.12 |
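A common way to obtain an unrotated solution is principal-component extraction from the correlation matrix. The sketch below assumes that method and runs on synthetic data (200 cases, six variables driven by two latent factors), so its numbers are unrelated to Table 13.6. The helper name `pc_loadings` is hypothetical.

```python
import numpy as np

def pc_loadings(data, n_factors):
    """Unrotated loadings from the eigen-decomposition of the correlation matrix."""
    R = np.corrcoef(data, rowvar=False)              # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(R)             # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scaling each eigenvector by sqrt(eigenvalue) gives the correlation
    # of every variable with the corresponding component.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals

# Synthetic data (assumption): three variables follow each latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.3, size=(200, 6))
data = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3) + noise

loadings, eigvals = pc_loadings(data, n_factors=2)
explained = (loadings ** 2).sum(axis=0)              # the "Expl.Var" row
proportion = explained / data.shape[1]               # the "Prp.Totl" row
print(np.round(loadings, 2))
print(np.round(explained, 2), np.round(proportion, 2))
```

In an unrotated principal-component solution the sum of squared loadings in each column equals the eigenvalue of that factor, and dividing by the number of variables gives the proportion of total variance it explains.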
The values given in the table above are the loadings of a variable on a factor. A loading can be understood as the correlation of a variable with a factor. Thus, the first value, 0.94, is the loading of X1 on Factor 1, -0.26 is the loading of X1 on Factor 2 and 0.09 is its loading on Factor 3. All other values are interpreted in the same way. Now, a decision has to be taken on what value of the loading is to be taken as significant. By default a loading of 0.70 is taken as significant, but in certain situations the threshold can be reduced to as low as 0.60. A variable is said to belong to the factor on which it has its highest and significant loading (in this case, a loading of more than 0.60).

An unrotated solution is not always the desired solution. For example, some of the variables pertaining to air pollution may have a higher loading on a factor made up of water pollution variables. In such cases the researcher may not be convinced by the solution. It then becomes important to rotate the axes of the factor coordinates to seek a better solution. There are several methods of rotating the axes, such as Varimax, Varimax Normalized and Quartimax. Using Varimax Normalized rotation we get the following output (Table 13.7).

Table 13.7 : Output using Varimax normalized rotation

| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| X1 | 0.94 | -0.26 | 0.09 |
| X2 | 0.01 | 0.99 | 0.01 |
| X3 | 0.02 | 0.88 | 0.11 |
| X4 | -0.56 | 0.80 | -0.08 |
| X5 | 0.96 | -0.13 | 0.09 |
| X6 | -0.02 | 0.97 | -0.03 |
| X7 | 0.95 | -0.19 | 0.10 |
| X8 | 0.96 | -0.18 | 0.08 |
| X9 | 0.98 | -0.03 | -0.02 |
| X10 | 0.97 | 0.00 | -0.19 |
| X11 | -0.82 | 0.17 | 0.46 |
| X12 | 0.93 | 0.05 | -0.32 |
| X13 | 0.97 | 0.02 | -0.17 |
| X14 | 0.10 | -0.03 | 0.96 |
| X15 | 0.87 | 0.21 | -0.16 |
| X16 | -0.63 | 0.15 | 0.75 |
| Expl.Var | 9.49 | 3.60 | 1.93 |
| Prp.Totl | 0.59 | 0.23 | 0.12 |
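The two steps described above, rotating the loadings and then assigning each variable to the factor on which it loads highest, can be sketched as follows. The `varimax` function is a standard SVD-based formulation with optional Kaiser row normalization; it is an assumption that this matches what the statistical package behind Table 13.7 does. The function names and the small `demo` matrix are hypothetical. The assignment rule is applied to the loadings exactly as printed in Table 13.7, using the absolute value of each loading and the 0.60 threshold.

```python
import numpy as np

def varimax(loadings, normalize=True, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; normalize=True rescales rows by communality."""
    L = np.asarray(loadings, dtype=float)
    # Kaiser normalization: give every variable unit communality before rotating.
    h = np.sqrt((L ** 2).sum(axis=1)) if normalize else np.ones(L.shape[0])
    L = L / h[:, None]
    p, k = L.shape
    R = np.eye(k)                                    # accumulated rotation matrix
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                                   # nearest orthogonal matrix
        if s.sum() < crit * (1 + tol):               # criterion stopped improving
            break
        crit = s.sum()
    return (L @ R) * h[:, None]                      # undo the row normalization

def assign_factors(loadings, names, threshold=0.60):
    """Group variables by the factor with the highest absolute loading."""
    groups = {}
    for name, row in zip(names, np.abs(loadings)):
        j = int(np.argmax(row))
        if row[j] >= threshold:                      # otherwise left out of the solution
            groups.setdefault(j + 1, []).append(name)
    return groups

# Rotation shown on a small hypothetical matrix of unrotated loadings.
demo = np.array([[0.8, 0.4], [0.7, 0.5], [0.6, -0.5], [0.5, -0.6]])
rotated = varimax(demo)

# Assignment applied to the loadings as printed in Table 13.7.
names = [f"X{i}" for i in range(1, 17)]
table_13_7 = np.array([
    [0.94, -0.26, 0.09], [0.01, 0.99, 0.01], [0.02, 0.88, 0.11],
    [-0.56, 0.80, -0.08], [0.96, -0.13, 0.09], [-0.02, 0.97, -0.03],
    [0.95, -0.19, 0.10], [0.96, -0.18, 0.08], [0.98, -0.03, -0.02],
    [0.97, 0.00, -0.19], [-0.82, 0.17, 0.46], [0.93, 0.05, -0.32],
    [0.97, 0.02, -0.17], [0.10, -0.03, 0.96], [0.87, 0.21, -0.16],
    [-0.63, 0.15, 0.75],
])
groups = assign_factors(table_13_7, names)
for factor in sorted(groups):
    print(f"Factor {factor}:", ", ".join(groups[factor]))
```

An orthogonal rotation redistributes the loadings among the factors but leaves the communality of each variable (the row sum of squared loadings) unchanged, which is a useful check on any rotation routine.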
Going through the output above we can suggest the following factors:

Factor 1: X1, X5, X7, X8, X9, X10, X11, X12, X13, X15
Factor 2: X2, X3, X4, X6
Factor 3: X14, X16

If a variable does not load significantly on any factor, it is kept out of the final solution. This can provide useful information for future research, and the variable can be omitted from further analysis. Also note that 16 variables have been reduced to 3 factors. Factor scores can also be used as predictors in a multiple regression analysis, since the scores are independent of each other and the correlations between them are 0.

One question that a researcher faces is how many factors to extract. There are two ways to arrive at a decision. One is to observe the scree plot and select the number of factors at the point where the curve levels off and becomes parallel to the x-axis. However, this depends upon the judgement of the researcher. An objective alternative is the Kaiser Criterion, according to which the factors that have an eigenvalue of at least 1 should be extracted. The eigenvalues of this dataset are given in Table 13.8.

Table 13.8 : The eigenvalues and % total variance explained by the factors

| Factor | Eigenvalue | % Total variance | Cumulative Eigenvalue | Cumulative % |
|---|---|---|---|---|
| 1 | 9.89 | 61.81 | 9.89 | 61.81 |
| 2 | 3.39 | 21.20 | 13.28 | 83.01 |
| 3 | 1.74 | 10.88 | 15.02 | 93.90 |
| 4 | 0.56 | 3.51 | 15.59 | 97.41 |
| 5 | 0.30 | 1.85 | 15.88 | 99.26 |
| 6 | 0.06 | 0.38 | 15.94 | 99.64 |
| 7 | 0.03 | 0.21 | 15.98 | 99.85 |
| 8 | 0.01 | 0.07 | 15.99 | 99.92 |
| 9 | 0.01 | 0.05 | 15.99 | 99.97 |
| 10 | 0.00 | 0.01 | 16.00 | 99.98 |
| 11 | 0.00 | 0.01 | 16.00 | 99.99 |
| 12 | 0.00 | 0.01 | 16.00 | 100.00 |
| 13 | 0.00 | 0.00 | 16.00 | 100.00 |
| 14 | 0.00 | 0.00 | 16.00 | 100.00 |
| 15 | 0.00 | 0.00 | 16.00 | 100.00 |
| 16 | 0.00 | 0.00 | 16.00 | 100.00 |
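The arithmetic behind Table 13.8 and the Kaiser Criterion can be checked with a few lines of Python. The eigenvalues below are the rounded values printed in the table, so the percentages may differ from the published ones in the second decimal place. The variable names are chosen for illustration only.

```python
from itertools import accumulate

# Eigenvalues as printed in Table 13.8 (rounded to two decimals).
eigenvalues = [9.89, 3.39, 1.74, 0.56, 0.30, 0.06, 0.03, 0.01, 0.01] + [0.00] * 7
n_variables = len(eigenvalues)            # 16 variables, so total variance is 16

# Each factor's share of the total variance, and the running totals.
pct_variance = [100 * ev / n_variables for ev in eigenvalues]
cum_eigenvalue = list(accumulate(eigenvalues))
cum_pct = list(accumulate(pct_variance))

# Kaiser Criterion: keep the factors whose eigenvalue is at least 1.
n_factors = sum(ev >= 1 for ev in eigenvalues)

for i in range(4):
    print(i + 1, eigenvalues[i], round(pct_variance[i], 2),
          round(cum_eigenvalue[i], 2), round(cum_pct[i], 2))
print("Factors retained by the Kaiser Criterion:", n_factors)
```

Because the analysis is based on the correlation matrix, every variable contributes one unit of variance, which is why the eigenvalues sum to the number of variables and each percentage is simply the eigenvalue divided by 16.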
The first three factors explain 93.9% of the variance and hence are enough to find the interrelationships. The rest of the factors can be overlooked. The eigenvalue of the fourth factor is 0.56, below the Kaiser cut-off of 1, and it explains less than 10% of the variance. Thus, we opted to extract only three factors. This choice can also be changed to suit the situation. The extracted factors can then be used to plot the dataset in three dimensions. The plot is given below:

[Figure: Factor Loadings, Factor 1 vs. Factor 2 vs. Factor 3. Rotation: Varimax normalized. Extraction: Principal components.]

Figure 13.2 Three-dimensional plot of factors expressing the dataset in three dimensions

References

Johnson RA and Wichern DW (2017). Applied Multivariate Statistical Analysis. Pearson India Education Services, Noida.
Morrison DF (1978). Multivariate Statistical Methods. McGraw Hill, Tokyo.
G.B. Pant National Institute of Himalayan Environment. ISBN: 978-93-5396-711-6. Ptd. by: Almora Book Depot, www.almorabookdepot.com