LAB MANUAL
STM3117 Food Science Research Methodology
FACULTY OF FISHERIES AND FOOD SCIENCE
UNIVERSITI MALAYSIA TERENGGANU

TABLE OF CONTENTS

Lab 1: Developing elements in research proposal (Part A) – Generating Title, identifying research problem, justification and objectives
Lab 2: Developing elements in research proposal (Part B) – Frame Out Literature Review and Reference Management
Lab 3: Developing elements in research proposal (Part C) – Identifying Experimental Design Concepts in Experiment
Lab 4: Introduction to SPSS, Variable and data entry – SPSS
- Data Entry / Input of data
- Data Cleaning
- Data Transformation
Lab 5: Descriptive Statistical Analyses – SPSS
- Frequency/percentage table, charts, etc., cross-tabulation
- Mean, median, mode, maximum, minimum
- Range, variance, standard deviation, coefficient of variation
Lab 6: Inferential Statistical Analyses (Part A) – SPSS
- Parametric Tests: Test of hypothesis of differences between means – Z-test, t-test, ANOVA
- Non-parametric Tests: Chi-square, Mann-Whitney, Kruskal-Wallis
Lab 7: Inferential Statistical Analyses (Part B) – Minitab
- Introduction to Minitab & t-Tests
- One-way and Two-way ANOVA
Lab 8: Inferential Statistical Analyses (Part C) – SPSS
- Reliability and validity of the measurement scale – Pearson correlation, Spearman rho, Cronbach’s α (alpha)

Lab 1
Developing Elements in Research Proposal (Part A)
Generating Title, identifying research problem, justification and objectives

You are provided with five essays from the Introduction sections of five different research articles. In your group discussion, please provide the following information, which you should extract from each article (Articles 2 to 4):

Article No : ______
1. What is the research problem that has led to this study?
2. What is the justification for the research problem? In other words, why does the researcher (or why do the researchers) think that this is an important matter to study?
3. Who would benefit from reading this study? In other words, what are the impacts of the study findings, and on whom?
4. What is/are the objective/s of the study?
5. Suggest a good title for the article.
6. Do you think this is basic or applied research? Give your reason.

Note: The information for Article 1 has been discussed. Please submit the information for the remaining articles by the end of the lab session.

Appendices for Lab 1

Article 1 – From Yinliang Zhang & Wenshui Xia (2008)
Salting fish is one of the oldest treatments in food preservation and is still popular even in developed countries for its low processing costs and satisfying consumer’s habits (Zugarramurdi & Lupin, 1980). A high level of salt decreases water activity and thus hinders germ development and proliferation of fungi and yeast. It also contributes to developing desirable characteristic flavour to the products (Ismail & Wootton, 1982). A high level of salt results in consumer rejection whereas a low level of salt brings about shelflife problem. Therefore, simple, sensitive and reliable assay of salt content becomes more important for salted foods. There are several methods to detect sodium chloride. First is the volumetric method (AOAC, 1997; Yu, (1991)). This method is based on Mohr method and Volhard method, but they are time-consuming and difficult for routine analysis of a large number of samples. This method consumes much of silver reagent and there may be serious interference from other sample components. Second is the potentiometry method (Lapa et al., 1995; Pérez-Olmos et al., 1997). It is simple, economical and has higher sampling rates. The most serious problem with this method is its lack of accuracy. An error of ±0.5 mV in the electromotive force can cause a 2% error in the concentration calculated for the electrode sensitive to single charge species. The third is the ion chromatography method (Zhou & Guo, 2000; Alcazar et al., 2003), which needs high investment on equipment and consumes more reagent. Another disadvantage in this method is that it is less selective. There are other methods such as flow injection analysis (Taylor & Grate, 1995; Chalk & Tyson, 1998; Couto et al., 1998; Silva et al., 1999) and robotic method (Velasco-Arjona et al., 1998). These methods need high investment on equipment.
Until now, no published literature is available on the determination of sodium chloride in food by measuring turbidity with spectrophotometer (Xian & Chen, 2003). The aim of this study was to set up a new method with turbidity measurement and to compare this method with Volhard method.

Article 2 – From Mian-hao Hu & Yansong Ao (2007)
The melon fruit belongs to the family of Cucurbitaceae and is cultivated in all the tropical regions of the world. Melon (Cucumis melo) seeds, besides possessing medicinal qualities (Lal & Lata, 1980; Woo et al., 1981; Bellakhdar et al., 1991), are also a rich source of protein (53.90%) and oil (37.67%) (Badifu, 1993; Rashwan et al., 1993). In spite of being a rich source of oil and protein, the seeds are normally treated as waste products. The proximate composition of the seeds or seed kernels of C. melo from different origins and varieties has been reported (Madaan & Lal, 1984; Teotia & Ramakrishna, 1984; Kaur et al., 1988; Tekin & Velioglu, 1993). It has been reported that the seeds of pumpkin and watermelon can be utilised successfully as sources of good quality edible oil and protein for human consumption (Chowdhury et al., 1955; Sawaya et al., 1983; Kamel et al., 1985; Lazos, 1986; Akoh & Nwosu, 1992). Reports are also available on the amino acids composition of the proteins of the melon seeds grown in Egypt (Lasztity et al., 1986; Rashwan et al., 1993) and India (Kaur et al., 1988) as well as on the fatty acids composition of the seed oil from Egypt (El-Magoli et al., 1979), India (Hemavatahy, 1992) and Vietnam (Imbs & Pham, 1995). One of the prominent melon varieties which originated in China and has gained large acceptance in the south-east region of China is C. melo var. reticulates Naud, Hybrid F1 commercially classified as ‘ChunLi’. The fruits of the hybrid ‘ChunLi’ developed for its resistance against Fusarium weigh, on an average, about 1.5 kg and possess pulp of over 16 °Brix. However, no report is available on the nutritional constituents of the seeds obtained from the fruits of this hybrid. The present work was therefore undertaken with the objective of analysing the proximate and amino acids compositions of the seed kernels and fatty acids composition of the seed oil obtained from the melon hybrid ‘ChunLi’.

Article 3 – From Massoud Nejati-Yazdinejad (2007)
Vitamin C, a water soluble vitamin, is an important micronutrient and plays many physiological roles (Machlin, 1984; Toloen, 1990). Fruits and vegetables constitute the principle source of vitamin in most human diets. Vitamin C, better known as L-ascorbic acid, is classified as a carbohydrate and has the chemical structure of 1-keto-1-threo-hexono-γ-lactone-2,3-enediol. It is the enediol group which is responsible for the molecule’s acidic and reducing properties (Seib & Tolbert, 1982). Ascorbic acid concentrations are frequently determined by 2,6-dichlorophenolindophenol (DCPIP) titrations (Verma et al., 1996) or the official pharmacopoeial methods (British Pharmacopeia, 1981). Results are obtained rapidly but the methods are not particularly specific for ascorbic acid or very sensitive (Davies & Masten, 1991). If used with beverages or fruits, the colouring matter may interfere with determination of end-point. Many methods have been reported for the determination of ascorbic acid in pharmaceutical preparations, food products and biological samples and reviews have appeared on the results recently (Washko et al., 1992; Arya et al., 2000).
These include indirect spectrophotometric methods based on the reduction of compounds such as DCPIP (Arya et al., 2000), iron (III) (Gu et al., 1996), bromate (Ensafi et al., 2002), the ketone derivatisation method with 2,4-dinitrophenylhydrazine (Shindoh et al., 1995) or o-phenylenediamine (Wu et al., 2003). Electrochemical (Wu et al., 2000; Zen et al., 2002), fluorimetric (Dilgin & Nisli, 2005; Wang et al., 2005), kinetic (Leon & Catapano, 1993; Safavi & Fotouhi, 1994), enzymatic (Daily et al., 1991; Zhu et al., 1996), chromatographic (Leubolt & Klein, 1993; Daood et al., 1994; Iwase & Ono, 1994), spectrophotometric (Sultan et al., 1994; Fujita et al., 2001; Guclu et al., 2005) and chemiluminescence (Kim et al., 1990; Perezruiz et al., 1995; Kato et al., 2005) methods have also been proposed. These methods have been used to increase the analytical sensitivity for ascorbic acid and some of them have been automated, but specialised equipments are required for these procedures (Arya et al., 1998). A sensitive analytical method was required that was simple, capable of use with a large number of samples (and possibly automation) and that would provide results quickly. The above methods did not completely satisfy these criteria especially when the level of ascorbic acid is very low. So a procedure was developed based on the reducing effect of ascorbic acid on Cu (II) ion and the chelating ability of alizarin red s (ARS) with excess of Cu (II) ion to provide a coloured solution.

Article 4 – From Kuley et al. (2008)
Seafood products, including crustacean shellfish, are recommended for human diet due to their health-promoting characteristics (Skonberg and Perkins 2002). In terms of the amount of fat and the proportions of saturated, monounsaturated, and polyunsaturated fat, shellfish provide a healthful diet for humans (Dong 2001). The n3 and n6 essential fatty acids (EFAs) cannot be synthesized in humans, and therefore must be obtained through foods (Burr 2000). n3 polyunsaturated fatty acids (PUFAs) have been reported to be preventive effects in cancers, rheumatoid arthritis, multiple sclerosis, autoimmune disorders, coronary heart and inflammatory bowel diseases, inflammation and arrhythmias (Belluzzi 2001; Connor 2001; Leaf et al. 2003; Jahan et al. 2004; Mnari et al. 2007) and decrease the risk of sudden death among men without evidence of prior cardiovascular disease (Albert et al. 2002; Hall et al. 2007) in human and skin symptoms in infant (Minihane and Lovegrove 2006). Blue crabs, Callinectes sapidus, are important members of the estuarine food chain (Jop et al. 1997), because of their feeding on fish, aquatic vegetation, molluscs, crustaceans, and annelids while they serve as prey to mammals, birds, and fishes (FWRI 2005). The crab is usually consumed by European and Far East countries and is often recommended for pregnant women (Adeyeye 2002). Gokoglu and Yerlikaya (2003) indicated that blue crabs had high protein and low fat contents. Skonberg and Perkins (2002) reported that n3 PUFAs, such as eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), corresponded to approximately 60% of the total lipid in green crab. Therefore, investigating the fatty acid compositions of crab species has great importance due to the good effects on human health. The fat and fatty acid compositions of fish can vary depending on fish species, diet, gender, location and season of capture (Gruger 1967; Özoğul et al. 2007; Schmidt et al. 2006), and the section and type of muscle (Bell et al. 1998; Katikou et al. 2001; Thakur et al. 2002, 2003; Toussaint et al. 2005; Palmeri et al. 2006; Testi et al. 2006). Tsai et al. (1984) reported that the total lipid concentrations of blue crab were significantly correlated in gender. Akbar et al. (1988) found that the protein content of body meat and claw meat in swim crabs vary, and also protein is slightly higher in the edible portion of male crabs than female crabs. Although many studies were carried out on the proximate composition and nutritive value of blue crab, there is hardly any available information about proximate and fatty acid composition changes in both male and female crabs in lagoons and straits. Thus, the objective of this study was to determine and compare fatty acid and proximate compositions of the body and claw of male and female blue crabs from the different regions.

Lab 2
Developing Elements in Research Proposal (Part B)
Frame Out Literature Review and Reference Management

1. Read the article (an excerpt) given on the following page, entitled “Physical Properties and Stability of Oil-in-Water Emulsions as Affected by Tamarind Seed Gum and Whey Protein Interaction”.
2. List the related keywords.
3. Frame out your Literature Review Subchapters (LRS).
4. Using Google Scholar, ScienceDirect, SpringerLink or the Wiley database, search for the most recent related articles (2015 – 2020), read the abstracts and modify your LRS if necessary.
5. List the five most referred-to articles, providing: Authors. Year. Title. Journal Name. Volume. Pages. See an example:
Le, X.T. and Turgeon, S.L. 2017. Formation and functional properties of protein-polysaccharide electrostatic hydrogels in comparison to protein or polysaccharide hydrogels. Advances in Colloid and Interface Science. 239:127-135
6. With those five selected references, follow the steps below to manage and prepare a database of your references:
a. Open a new folder with a suitable name. For each reference, save the PDF version using a suitable ID (considering the sequence, keywords, year of publication or first author).
b. Prepare an Excel spreadsheet as follows* and fill in the related information extracted from each journal article. Use the column ‘Reference ID’ to link to the file saved in (a) and the column ‘DOI’ to link to the article URL (online). Save your work with a suitable file name.
* Please refer to the briefing and demonstration on how to prepare the spreadsheet. Also, see the example provided.
7. With regard to steps 1 to 5 above, please submit your work through ‘MyLearn’ by the end of the lab session.
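The reference table in step 6(b) can also be sketched in code. The example below is only an illustration: the column names follow the fields listed in step 5 plus the ‘Reference ID’ and ‘DOI’ columns from step 6(b), while the ID scheme in `make_reference_id` is a hypothetical one (sequence, first author, year, keyword). The layout demonstrated in the briefing takes precedence.

```python
import csv
import io

# Column names are assumptions based on steps 5 and 6(b) of this lab.
COLUMNS = ["Reference ID", "Authors", "Year", "Title",
           "Journal Name", "Volume", "Pages", "DOI"]


def make_reference_id(seq, first_author, year, keyword):
    """Build a file-friendly ID (hypothetical scheme), e.g. '01_Le_2017_hydrogels'."""
    return f"{seq:02d}_{first_author}_{year}_{keyword}"


references = [
    {
        "Reference ID": make_reference_id(1, "Le", 2017, "hydrogels"),
        "Authors": "Le, X.T. and Turgeon, S.L.",
        "Year": 2017,
        "Title": ("Formation and functional properties of protein-polysaccharide "
                  "electrostatic hydrogels in comparison to protein or "
                  "polysaccharide hydrogels"),
        "Journal Name": "Advances in Colloid and Interface Science",
        "Volume": 239,
        "Pages": "127-135",
        "DOI": "",  # left blank here: copy it from the article page
    },
]


def write_reference_table(rows, handle):
    """Write the reference rows as CSV, which Excel can open directly."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_reference_table(references, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To save a real file instead of printing, pass a file opened with `open("references.csv", "w", newline="")` (the file name is an example) in place of `buffer`.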

Lab 3
Developing Elements in Research Proposal (Part C)
Identifying Experimental Design Concepts in Experiments

You are provided with descriptions of five different experiments (Case Studies 1, 2, 3, 4 and 5). In your group discussion, please provide the following information, which you should extract from each description:

Case Study No : ______
1. What is the Experimental Objective?
2. How many factors are there in the experiment and what are they?
3. What are the Treatments and their Arrangement?
4. What type of Assignment Rule is used?
5. Name the Block (if any) used.
6. How many Replications?
7. What is the Experimental Unit and how many experimental units are required?
8. Name the Response.
9. Provide a Sample Layout and Diagram of the Experimental Design.

Note: The information for Case Studies 1, 2 and 3 has been discussed. Please submit the information for the remaining case studies by the end of the lab session.

Appendices for Lab 3

Case Study 1
The extensibility of noodles may depend on the type of flour used (three types: A, B and C) and the level of each flour (60 and 80%) in the overall formulation. Three batches of noodles have been prepared for each type of flour at both levels.

Case Study 2
An experiment was conducted to determine the effect of stevia extract and honey at four different ratios (50:50, 40:60, 30:70 and 20:80), used to substitute 50% of the sucrose content in energy drinks. The experiment used a total of 4 batches of energy drink for each treatment and the effect was studied in terms of viscosity.

Case Study 3
A chemical engineer is designing the production process for a new product. The chemical reaction that produces the product may have a higher or lower yield, depending on the temperature and the stirring rate in the vessel in which the reaction takes place. The engineer decided to investigate the interaction effects of two temperatures (50°C and 60°C) and three stirring rates (60, 90 and 120 rpm) on the yield of the process. Two batches of the food stock will be processed at each combination of temperature and stirring rate.

Case Study 4
It is expected that the total phenolic compound (TPC) and ascorbic acid (AA) contents differ between different parts of melon fruits. A researcher wants to investigate the effect of different extraction solvents, i.e. methanol, petroleum ether and acetone, on the TPC and AA content of melon fruits. Separate experiments for the peels and pulps of the fruits have been conducted and involved three batches of fruits.

Case Study 5
The interaction of whey protein isolate (WPI) and carboxymethyl cellulose (CMC) will affect the consistency index of salad dressing. Three batches of salad dressing will be prepared in order to determine the consistency of the salad dressing at combinations of two levels of WPI (0.4 and 0.6%) and two levels of CMC (1.5 and 2.5%) in the overall formulation.
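As an illustration of a sample layout, the sketch below lists the treatment combinations of Case Study 3 (2 temperatures × 3 stirring rates, 2 batches each) and puts the resulting experimental units into a random run order. It assumes a completely randomised assignment rule, which the case description does not state; the seed value is arbitrary and only makes the layout reproducible.

```python
import itertools
import random

temperatures_c = [50, 60]       # factor 1: temperature (°C)
stirring_rpm = [60, 90, 120]    # factor 2: stirring rate (rpm)
replications = 2                # batches per treatment combination

# Factorial arrangement: every temperature paired with every stirring rate.
treatments = list(itertools.product(temperatures_c, stirring_rpm))

# One experimental unit (batch) per treatment per replication.
units = [(temp, rpm, rep)
         for (temp, rpm) in treatments
         for rep in range(1, replications + 1)]

# Assumed assignment rule: complete randomisation of the run order.
rng = random.Random(42)         # arbitrary seed, for a reproducible layout
run_order = units[:]
rng.shuffle(run_order)

print(f"{len(treatments)} treatments x {replications} replications "
      f"= {len(units)} experimental units")
for run, (temp, rpm, rep) in enumerate(run_order, start=1):
    print(f"Run {run:2d}: {temp} °C, {rpm} rpm, replicate {rep}")
```

The same pattern applies to the other case studies by changing the factor lists and the number of replications.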

Lab 4
Introduction to SPSS, Variable and Data Entry

INTRODUCTION TO SPSS
SPSS is a Windows-based program that can be used to perform data entry and analysis and to create tables and graphs. SPSS is capable of handling large amounts of data and can perform all of the analyses covered in the text and much more. Originally the name was an acronym for Statistical Package for the Social Sciences, but it now stands for Statistical Product and Service Solutions (Chaleunvong 2009). It is one of the most popular statistical packages and can perform highly complex data manipulation and analysis with simple instructions.
There are a few SPSS modules:
- SPSS Base for Windows and Mac
- SPSS Regression Models
- SPSS Tables
- SPSS Trends
- SPSS Advanced Models

GETTING STARTED
Starting SPSS for Windows
Launch SPSS either by double-clicking the SPSS icon on the desktop, or from the Start menu, where SPSS will have a group under Programs. The opening screen should appear as:
[Figure: SPSS opening screen]
You can immediately start putting data into the datasheet, or open a previously saved file.

Layout of SPSS
[Figure: Layout of SPSS]

Data View – is where you see the data you are using.
Variable View – is where you can specify the format of your data when you are creating a file, or where you can check the format of a pre-existing file.

SPSS Menus and Icons
File includes all of the options you typically use in other programs, such as open, save and exit. Notice that you can open or create new files of multiple types, as illustrated to the right.
Edit includes the typical cut, copy and paste commands, and allows you to specify various options for displaying data and output. Click on Options, and you will see the dialog box to the left. You can use this to format the data, output, charts, etc.

View allows you to select which toolbars you want to show, select the font size, add or remove the gridlines that separate each piece of data, and select whether to display your raw data or the data labels.
Data allows you to select several options, ranging from displaying data sorted by a specific variable to selecting certain cases for subsequent analysis.
Transform includes several options to change current variables. For example, you can change continuous variables to categorical variables, change scores into rank scores, etc.
Analyze includes all of the commands to carry out statistical analyses and to calculate descriptive statistics.
Graphs includes the commands to create various types of graphs, including box plots, histograms, line graphs and bar charts.

Opening a Saved Data File
Go to “File” at the top left-hand corner of your screen. Select “Open”, then select “Data”. Then select the correct drive (e.g. A – from floppy; C – from hard drive), and either select the file from the list provided or type in the file name. The saved data will open in the datasheet.

Entering Data into SPSS
How to type in data: Open SPSS, click on “Type in data” and then click OK. Alternatively, if SPSS is already open, go to File and click on New to start a new data file.

You should have a blank file that looks like the one below:
[Figure: blank data file]
The Data View window is where you can type in your data. However, you must first tell SPSS certain things about your data, and you will do this in the Variable View window. The Variable View window has 10 columns, and they tell the program different things about the measurement values, such as whether the values are qualitative or quantitative.
1. Name – the name of your variable. In older versions of SPSS it can only be 8 characters long; newer versions accept longer names. Before typing in data, you must enter a name for your variable (e.g. Gender, race).
2. Type – what type of variable it is (qualitative or quantitative). The default is quantitative. If you are going to be typing in numbers, you don’t need to do anything.
* How to differentiate between quantitative and qualitative data?
a. Quantitative: examples of quantitative data are height, weight, number of ice creams purchased, etc.
b. Qualitative: examples of qualitative data are blood type, eye colour, gender, etc.
3. Width and Decimals – modify the format of the number if you like.
4. Label – can be used to give a variable a more meaningful name. Just click on the appropriate cell and type in the desired name.
5. Values – used with qualitative data. For example, for female and male, you should enter these as codes 1 and 2 respectively. Type your first number code in the Value box and its word name in the Value Label box.

6. Missing – you can use this to tell SPSS any numbers you are using to code for missing data.
7. Columns – use the up and down arrows to the right of this box to set the width of the column in Data View.
8. Align – the alignment of data in the columns in Data View.
9. Measure – the default is nominal. Types of data in Measure:
- Nominal: categories only; data cannot be arranged in an ordering scheme (examples: East, West, North, South)
- Ordinal: categorical data where there is a logical ordering (example: Likert scale, 1 = strongly agree, 2 = agree, 3 = neutral)
- Interval: differences between values can be determined, but there is no inherent starting point. Data are generally obtained from the measurement of quantities such as temperature.
- Ratio: data contain an inherent starting point. Examples: height, weight

Saving Data
To save data, click one time on “File” in the top left-hand corner of the screen. Select “Save”. Select the drive the data will be saved to and type in a file name.

Examples of Data View and Variable View:
[Figure: Data View]

[Figure: Variable View]

Data Cleaning
Why clean your data? It is a screening process to:
1. Detect errors:
- Missing data
- Outliers
2. Make sure the data meet the assumptions of the analysis, by checking normality

Steps to clean your data:
1. Check for missing data: Go to “Analyze” in SPSS, then click “Descriptive statistics” > “Explore” > “Plots”, and select Histogram and Normality plots with tests.
Example:
2. Check for normality: Still using the same output from “Explore” in SPSS, look at the Descriptive table, Test of normality table, Histogram and Box plot.
Example of Descriptive Table:

Example of Test of Normality Table:
If the table shows a significant value (Sig. less than .05), the data are not normally distributed.
Example of Histogram:
[Figure: two histograms, one not normally distributed and one normally distributed]
Example of Box Plot:
If the median is located exactly in the middle of the box, the sample distribution is symmetric, which is consistent with a normal distribution. The closer the median is to the upper boundary of the box, the more negatively skewed the distribution.
Another way to look at the normality of the data is to use the Normal Q-Q Plot:

The dots (small circles) in the Normal Probability plot represent pairs of observed values and the values expected from the normal distribution. The closer all the dots are to the straight line, the closer the variable is to normality.
Detrended Normal Q-Q Plot: If the dots in the plot show no pattern of clustering and all of them gather around a horizontal line through zero, the sample can reasonably be concluded to be normal.
Another way to test normality is the Kolmogorov-Smirnov test. Click on ‘Analyze’, choose ‘Nonparametric Tests’, and then click on ‘1-Sample K-S’.
Example:
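The ideas behind the Normal Q-Q plot and the Kolmogorov-Smirnov test can be sketched outside SPSS. The code below is an illustration under stated assumptions, not a reproduction of the SPSS output: the sample values are hypothetical, the plotting position (i − 0.5)/n is one common choice and may differ from the one SPSS uses, and only the K-S distance is computed, not its significance value.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (expected normal value, observed value) pairs for a Normal Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Assumed plotting position: (i - 0.5) / n
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Check the step of the empirical CDF just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


sample = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4, 5.1]  # hypothetical data

for expected, observed in qq_points(sample):
    # Points close to the line expected == observed suggest normality.
    print(f"expected {expected:5.2f}   observed {observed:5.2f}")

print(f"K-S distance D = {ks_statistic_normal(sample):.3f}")
```

A small D and Q-Q points lying near the diagonal both point towards normality; the decision itself should still be read from the Sig. column of the SPSS table.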

Data Transformation
There are two basic ways to transform variables: compute and recode.

Recoding into Different Variables:
- Recoding to create a new variable, so that the original data are preserved
- Instruction (example):
- If the procedure is followed, a fourth column of data will appear in the Data Editor window.
- This is the new variable we have just created through recoding.
- It is also very important to define the missing values for the new variable; just because a value was specified as missing for the original variable does not mean that it will also be counted as missing for the new recoded variable.

Recoding into the Same Variable:
- Allows you to reassign the values of existing variables or collapse ranges of existing values into new values. For example, you could collapse salaries into salary range categories.
- You can recode numeric and string variables. If you select multiple variables, they must all be of the same type. You cannot recode numeric and string variables together.

- Instructions: Transform > Recode > Into Same Variables
- Example:

Compute Variable:
- Allows you to arithmetically combine or alter variables and place the resulting value under a new variable name
- Instruction: Transform > Compute variable.
- Example:
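The logic of recode and compute can be sketched as follows. This is an illustration only: the salary cut-offs, category codes and case values are hypothetical, and BMI (weight divided by height squared) is used as the computed variable simply because BMI scores appear in later examples.

```python
def recode_salary(salary):
    """Collapse a salary into a range category (hypothetical cut-offs and codes)."""
    if salary is None:
        return None      # keep missing values missing in the new variable
    if salary < 2000:
        return 1         # 1 = low
    if salary < 5000:
        return 2         # 2 = middle
    return 3             # 3 = high


def compute_bmi(weight_kg, height_m):
    """Compute a new variable arithmetically from two existing ones."""
    return weight_kg / height_m ** 2


cases = [  # hypothetical data
    {"salary": 1500, "weight_kg": 60.0, "height_m": 1.60},
    {"salary": 3200, "weight_kg": 72.0, "height_m": 1.75},
    {"salary": None, "weight_kg": 55.0, "height_m": 1.58},
    {"salary": 6100, "weight_kg": 80.0, "height_m": 1.82},
]

for case in cases:
    # Recode into a DIFFERENT variable: the original 'salary' is preserved.
    case["salary_group"] = recode_salary(case["salary"])
    # Compute: the result is stored under a new variable name.
    case["bmi"] = round(compute_bmi(case["weight_kg"], case["height_m"]), 1)
    print(case)
```

Recoding into the same variable would overwrite `salary` itself instead of creating `salary_group`, which is why the original values are then lost.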

Lab 5
Descriptive Statistical Analyses – SPSS

Descriptives
How to get there: Analyze > Descriptive statistics > Descriptives. The following dialog box will appear:
To select variables, first click on a variable name in the box on the left side of the dialog box, then click on the arrow button, which will move the variable to the Variable(s) box.
To view the available descriptive statistics, click on the button labelled Options.
Output can be generated by clicking on the Continue button, then clicking on the OK button.

Frequency
To obtain a frequency distribution, click one time on “Analyze”, go to “Descriptive Statistics”, then “Frequencies”.

From the “Frequencies” menu, place the variable of interest into the “Variables” box by clicking on the variable one time and then clicking on the arrow key. You will receive the statistics you select for each variable in the “Variables” box. Click on the “Display frequency tables” box to get a frequency table. This table will give you the variable values and their frequency, percentage, valid percentage and cumulative percentage for each variable placed in the “Variables” box.

Central Tendency and Variance
From the “Frequencies” menu, place the variables of interest into the “Variables” box. Click one time on the “Statistics” box at the bottom of the screen. Select the desired statistics by clicking on the box to the left of each one. Then click “Continue”, which will bring you back to the “Frequencies” menu. When done, click “OK”. The selected statistics will be generated in an output file.

Graphs
Click one time on the “Charts” button at the bottom of the screen.

Select “Bar chart” or “Histogram”. Clicking on “Continue” will return you to the “Frequencies” menu.

Stem and Leaf Plots
Click on “Analyze”, then “Descriptive statistics”, then “Explore”.
Place the variables of interest into the “Dependent List”. In the “Display” box in the lower left-hand corner, select “Plots”. Then click one time on the “Plots” button. When the “Plots” menu opens, select “Stem-and-Leaf” from the choices. Then click “Continue”, followed by “OK”.
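The statistics that these menus produce can be checked by hand on a small data set. The sketch below uses hypothetical scores and computes a frequency table (with percentage and cumulative percentage) together with the measures of central tendency and spread named in this lab; the variance and standard deviation use the sample (n − 1) formulas.

```python
from collections import Counter
from statistics import mean, median, mode, stdev, variance

scores = [2, 4, 5, 5, 6, 7, 7, 7, 8, 9]   # hypothetical data

# Frequency table: value, frequency, percentage, cumulative percentage
freq = Counter(scores)
n = len(scores)
cumulative = 0.0
print("Value  Freq  Percent  Cumulative")
for value in sorted(freq):
    percent = 100 * freq[value] / n
    cumulative += percent
    print(f"{value:5d}  {freq[value]:4d}  {percent:7.1f}  {cumulative:10.1f}")

summary = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    "minimum": min(scores),
    "maximum": max(scores),
    "range": max(scores) - min(scores),
    "variance": variance(scores),   # sample variance (divides by n - 1)
    "std_dev": stdev(scores),       # sample standard deviation
}
# Coefficient of variation: standard deviation as a percentage of the mean
summary["cv_percent"] = 100 * summary["std_dev"] / summary["mean"]

for name, value in summary.items():
    print(f"{name:>10}: {value:.3f}")
```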

Lab 6
Inferential Statistical Analyses (Part A) – SPSS

Introduction
Statistics: Descriptive and Inferential

Descriptive
- Concerned with collection, organization, enumeration of the frequency of characteristics, summarization and presentation of data.
- Methods of data presentation:
i. Tabulation: data presented in tables
ii. Graphical: data presented in graphs, e.g. bar chart, pie chart, line graph etc.

Inferential
- Used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation.
- Provide more detailed information than descriptive statistics
- Reveal causes and effects and make predictions
- Methods:
i. testing of statistical hypotheses
ii. estimation of parameters

Inferential Statistics
- Compare groups / Difference between groups. Examples of analyses: Z-test, t-test, ANOVA, Mann-Whitney test, Kruskal-Wallis test
- Exploring relationships among variables / Association among factors. Examples of analyses: correlation, regression

Hypothesis formulation & testing
Hypothesis: an educated guess based on published results or preliminary observations
Null hypothesis (H0): hypothesis of no difference
Alternative hypothesis (H1): hypothesis that postulates that there is a treatment effect or a difference between groups
The p value – the probability of obtaining a value as extreme as or more extreme than that observed in the sample, given that the null hypothesis is true
If p is less than α, reject H0
If p is greater than α, do not reject H0

Example of hypothesis testing (α = 0.05):
i. One-tailed test: the whole of α lies in one tail
- p < 0.05: reject the null hypothesis (there is a difference)
- p > 0.05: do not reject the null hypothesis (there is no difference)
ii. Two-tailed test: α is split into 0.025 in each tail
- probability in one tail < 0.025 (that is, two-tailed p < 0.05): reject the null hypothesis (there is a difference)
- probability in one tail > 0.025 (that is, two-tailed p > 0.05): do not reject the null hypothesis (there is no difference)
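The decision rule can be illustrated with a one-sample Z-test. In the sketch below all numbers are hypothetical and the population standard deviation is assumed to be known. It shows that the two-tailed p value is twice the area in one tail, so comparing the one-tail area with 0.025 and comparing the two-tailed p value with 0.05 lead to the same decision.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """One-sample Z-test with known population standard deviation sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_one_tailed = 1 - NormalDist().cdf(abs(z))   # area in one tail
    p_two_tailed = 2 * p_one_tailed               # area in both tails
    return z, p_one_tailed, p_two_tailed


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when p is less than alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical example: H0 says the population mean is 50.
z, p_one, p_two = z_test(sample_mean=52, mu0=50, sigma=5, n=25)

print(f"z = {z:.2f}")
print(f"one-tail area  = {p_one:.4f} -> {decide(p_one, alpha=0.025)} (compared with 0.025)")
print(f"two-tailed p   = {p_two:.4f} -> {decide(p_two, alpha=0.05)} (compared with 0.05)")
```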

Statistical techniques to compare groups

Parametric Tests
- t-test:
i. independent samples t-test
ii. paired samples t-test
- ANOVA
- Z-test

Assumptions to check BEFORE conducting analyses:
- random sampling
- level of measurement
- normal distribution (to differentiate between parametric and non-parametric tests)
- power of a test (factors that influence the power of a test, such as sample size, effect size and the alpha level set by the researcher)
- planned comparisons / post-hoc comparisons
- missing data

Independent-Samples t-test
Used to compare the mean scores of two different groups of people or conditions.
Example: difference in mean BMI scores between males and females
What do you need: two variables
- one categorical, independent variable (e.g. males/females)
- one continuous, dependent variable (e.g. BMI scores)
Procedure: Analyze > Compare means > Independent samples t-test
Move the dependent (continuous) variable (e.g. BMI scores) into the area labelled Test variable.
Move the independent (categorical) variable (e.g. gender) into the section labelled Grouping variable.
Click on Define groups and type in the numbers used in the data set to code each group. In the current data file, 1 = males and 2 = females; therefore, in the Group 1 box, type 1, and in the Group 2 box, type 2.

Example of output from independent samples t-test:

Checking information from the output
Checking assumptions:
- If the Sig. value (in the Levene’s Test column) is larger than .05 (e.g. .06, .07), use the first line in the table, which refers to Equal variances assumed.
- If the Sig. value is .05 or less, use the second line of the table, which refers to Equal variances not assumed.
Checking differences between groups:
- Use the line that the Levene’s test result says you should use.
- If the value in the Sig. (2-tailed) column is equal to or less than .05, then there is a significant difference in the mean scores between the two groups.
- If the value is above .05, there is no significant difference between the two groups.
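The two lines of the output table correspond to two ways of computing the t statistic. The sketch below, using hypothetical BMI scores, computes the pooled-variance version (Equal variances assumed) and the Welch version (Equal variances not assumed) with their degrees of freedom. It does not compute Levene’s test or the Sig. values, which should be read from the SPSS output.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t_equal, df_equal, t_welch, df_welch) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)   # sample variances

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))
    df_equal = n1 + n2 - 2

    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df
    se_sq = v1 / n1 + v2 / n2
    t_welch = (m1 - m2) / sqrt(se_sq)
    df_welch = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return t_equal, df_equal, t_welch, df_welch


males = [22, 24, 26, 28]      # hypothetical BMI scores, group coded 1
females = [20, 21, 22, 23]    # hypothetical BMI scores, group coded 2

t_equal, df_equal, t_welch, df_welch = independent_t(males, females)
print(f"Equal variances assumed:     t = {t_equal:.3f}, df = {df_equal}")
print(f"Equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.2f}")
```

With equal group sizes the two t values coincide and only the degrees of freedom differ, which is why the choice of line matters most when the groups are unequal in size or spread.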

Paired-Samples t-test
Also referred to as repeated measures.
Used when you have only one group of people and you collect data from them on two different occasions or under two different conditions.
Usually used in pre-test/post-test experimental designs or for interventions.
Example: level of blood glucose among adults before a meal and after a meal
What do you need: one set of subjects (or matched pairs). Each person (or pair) must provide both sets of scores.
- one categorical independent variable (e.g. Time: Time 1 = before meal and Time 2 = after meal)
- one continuous dependent variable (e.g. level of blood glucose)
Procedure: Analyze > Compare means > Paired samples t-test
Click on the two variables that you are interested in comparing for each subject.
With both of the variables highlighted, move them into the box labelled Paired Variables by clicking on the arrow button.
Click on OK.
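The paired-samples t statistic is computed from the differences within each subject. The sketch below uses hypothetical blood glucose readings before and after a meal; it gives the t value and degrees of freedom only, not the Sig. (2-tailed) value reported by SPSS.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for paired scores taken from the same subjects."""
    if len(before) != len(after):
        raise ValueError("each subject must provide both scores")
    diffs = [a - b for b, a in zip(before, after)]   # after minus before
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical blood glucose (mg/dL) for five adults
before_meal = [90, 95, 88, 100, 92]     # Time 1
after_meal = [110, 118, 105, 124, 113]  # Time 2

t_value, df = paired_t(before_meal, after_meal)
print(f"mean before = {mean(before_meal):.1f}, mean after = {mean(after_meal):.1f}")
print(f"t = {t_value:.3f}, df = {df}")
```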

Example of output from paired-samples t-test:

How to interpret the output:
a. Determining overall significance
- the final column, labelled Sig. (2-tailed), gives the probability value (p)
- if it is less than .05, there is a significant difference between the two scores
- if it is more than .05, there is no significant difference between the two scores
- take note that the t-value (in the example) is 5.394; you will need this when you report the result
b. Comparing mean values
- to find out which set of scores is higher (e.g. Time 1 or Time 2)
- look in the box labelled Paired Samples Statistics, which gives the mean score for each of the two sets of scores.

One-way Analysis of Variance (ANOVA)
Used when you have one independent (grouping) variable with three or more levels (groups) and one dependent continuous variable.
Example: level of blood glucose among Malays, Chinese and Indians
What do you need: two variables
- one categorical independent variable with three or more distinct categories (e.g. ethnicity, age group)
- one continuous dependent variable (e.g. level of blood glucose)
Procedure: Analyze > Compare means > One-way ANOVA
Click on the dependent (continuous) variable. Move this into the box marked Dependent list by clicking on the arrow button.
Click on the independent, categorical variable and move it to the box labelled Factor.
Click the Options button, and click on Descriptive, Homogeneity-of-Variance and Means Plot. Click on Continue.

Click on the button marked Post Hoc.
Click on Tukey.
Click on Continue and then OK.

Example of output from one-way ANOVA:

Descriptives gives information about each group (in this example, age group).

In the column marked Sig., if the value is less than or equal to .05, there is a significant difference somewhere among the mean scores on the dependent variable for the three groups. The statistical significance of the differences between each pair of groups is provided in the table labelled Multiple Comparisons, which gives the results of the post-hoc tests.

Post-hoc tests
- used when you want to conduct a whole set of comparisons, exploring the differences between each of the groups in your study.
- Tukey: assumes equal variances for the groups being compared
- Dunnett's C: does not assume equal variances
- The two most commonly used are Tukey's Honestly Significant Difference (HSD) test and the Scheffe test.

Two-way between groups ANOVA

Allows you to test the effect of each independent variable on the dependent variable and also identifies any interaction effect.

Example of research question: What is the impact of age and gender on blood glucose level? Does gender moderate the relationship between age and blood glucose level?

What do you need
Three variables:
- two categorical independent variables (e.g. Gender: males/females; Age group: young, middle, old)
- one continuous dependent variable (e.g. blood glucose level)

Procedure:
Analyze > General Linear Model > Univariate
Click on the dependent continuous variable and move it to the box labelled Dependent variable.
Click on the two independent categorical variables and move these to the box labelled Fixed factors.
Click the Options button.
Click on Descriptive statistics, Estimates of effect size, Observed power and Homogeneity tests.
Click on Continue.
Click on the Post Hoc button.
Choose the test you wish to use (in this case, Tukey).
Click on Continue and then OK.

Non-Parametric Tests
- Chi-square test
- Mann-Whitney test
- Kruskal-Wallis test

Chi-square

Used to compare frequencies (counts) between two or more groups.
Both the independent and dependent variables are categorical variables (two or more groups).

Procedure:
Analyze > Descriptive statistics > Crosstabs
Click your two categorical variables into the Row and Column boxes.
Click on the Statistics button.
Check the Chi-square box.
Click on Continue and OK.

Example of chi-square output:

Mann-Whitney U test

Non-parametric alternative to the two-sample (independent-samples) t-test.
The independent (categorical) variable has two groups, and there is one dependent (continuous) variable.
Example: test scores between two groups of professors

Procedure:
Analyze > Nonparametric test > 2 Independent samples

Example of output:

Kruskal-Wallis test

One-way ANOVA by ranks.
Used to compare three or more independent groups, based on ranks rather than on the raw means.
A generalization of the Wilcoxon test for two independent samples.
It tests the null hypothesis that multiple independent samples come from the same population.
It does not tell how the groups differ, only that some difference is present.

Example: three teaching methods among a group of students

Procedure:
Analyze > Nonparametric test > K Independent samples

Example of output:

Lab 7
Inferential Statistical Analyses (Part B) – Minitab
Introduction to Minitab and t-Tests

MINITAB’s StatGuide provides statistical guidance for interpreting statistical tables and graphs in a practical, easy-to-understand way. You can access statistical guidance for the following commands in the Stat menu:
- Basic statistics
- Regression
- Analysis of variance
- DOE (factorial, response surface, mixture, and Taguchi designs)
- Control charts
- Quality tools (including planning tools, process capability, and gage study)
- Reliability/survival (including distribution analysis, regression with life data, and accelerated life testing)
- Multivariate
- Time series
- Tables
- Nonparametrics
- Power and sample size
To view statistical guidance and interpretation of results, click Help Topics and choose one of the topics displayed.
Copyright © 2000-2003 Minitab Inc.

Independent t-Test

An experiment was conducted to compare the effects of the microwave oven baking method and the conventional oven baking method on biscuit texture. The experiment used a total of 8 batches of biscuits for each baking method. The researcher wants to know which method results in higher biscuit hardness. Data were recorded as follows:

Microwave Baking (A)    Oven Baking (B)
Hardness (g)            Hardness (g)
46                      52
47                      50
43                      45
39                      49
49                      43
44                      46
48                      47
42                      43

Example

Data Entry
Open Minitab.
Fill in the worksheet according to the table.
Save the data: File > Save Project As > New Folder > STM3104 > filename

Hypotheses (One-tailed)
H0: μA – μB = 0
There is no significant difference between the two population means in terms of hardness as affected by the different baking methods.
Ha: μA – μB > 0; μA – μB < 0
There is a significant difference between the two population means in terms of hardness, and baking method A/B gives the harder sample.

Evaluation Assumptions
Normality Test – Anderson-Darling (default)
Hypotheses:
H0: Data are from a normally distributed population
Ha: Data are not from a normally distributed population
Rules:
If p-value > 0.05, fail to reject H0. There is not enough evidence to suggest that the data are not from a normal population, so the normality assumption is treated as satisfied.

Stat > Basic Statistics > Normality Test
In dialog box: Variable: A
OK

Use the normal probability plot to verify that the data do not deviate substantially from what is expected when sampling from a normal distribution.
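The Anderson-Darling check described above can also be sketched outside Minitab. The following is a minimal Python sketch using only the standard library; it is not Minitab's own code. The function name `anderson_darling_normal` is hypothetical, and the small-sample adjustment and piecewise p-value formula are a commonly cited approximation that is assumed here, so the output is best compared against the AD and P-Value figures Minitab prints on the probability plot rather than taken as authoritative.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (A2, approximate p-value) for an Anderson-Darling normality check."""
    x = sorted(data)
    n = len(x)
    m, s = mean(x), stdev(x)  # sample mean and sample SD (n - 1 denominator)
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - m) / s) for v in x]

    # A2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    # (i is zero-based below, so the weight is 2i + 1)
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n

    # Assumed small-sample adjustment and piecewise p-value approximation
    a_star = a2 * (1 + 0.75 / n + 2.25 / n**2)
    if a_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a_star + 0.0186 * a_star**2)
    elif a_star > 0.34:
        p = math.exp(0.9177 - 4.279 * a_star - 1.38 * a_star**2)
    elif a_star > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a_star - 59.938 * a_star**2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a_star - 223.73 * a_star**2)
    return a2, p


# Hardness (g) from the table above
hardness_a = [46, 47, 43, 39, 49, 44, 48, 42]  # microwave baking
hardness_b = [52, 50, 45, 49, 43, 46, 47, 43]  # conventional oven baking

for label, sample in (("A", hardness_a), ("B", hardness_b)):
    a2, p = anderson_darling_normal(sample)
    decision = "fail to reject H0" if p > 0.05 else "reject H0"
    print(f"{label}: AD = {a2:.3f}, p = {p:.3f} -> {decision}")
```

The decision line applies the same rule as the manual: a p-value above 0.05 means there is not enough evidence against normality.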

[Figure: Probability Plot of A (Normal). x-axis: A; y-axis: Percent. Mean 44.75, StDev 3.370, N 8, AD 0.175, P-Value 0.888]

1. If the data come from a normal distribution, the points should roughly follow the fitted line.
2. If the data do not come from a normal distribution, the points will likely not follow the line.

The plot indicates that the distribution is reasonably normal; all the points fall fairly close to the line.

Anderson-Darling normality test
The p-value (0.888) is greater than the significance level of α = 0.05. We fail to reject H0 and thus, it is reasonable to assume that the data do not deviate substantially from a normal distribution.

DO THE SAME FOR VARIABLE B (You will get p-value = 0.805)

The assumption of normality is reasonably satisfied, so you can proceed with the t-test.

Test Statistic – Two-sample t-test
1. Stat > Basic Statistics > 2-Sample t
2. Complete the dialog box:
   Check Samples in different columns
   First: A, Second: B
   OK
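The two-sample t-test set up in the steps above can be sketched in Python as a cross-check. This is a minimal standard-library sketch, not Minitab's implementation: it assumes the unequal-variance (Welch) form of the test, truncates the degrees of freedom to a whole number (an assumption about how Minitab reports DF), and obtains the two-tailed p-value by numerically integrating the t density with Simpson's rule. The helper names `t_pdf`, `two_tailed_p` and `welch_t_test` are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area *= h / 3
    return 1 - 2 * area


def welch_t_test(a, b):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


hardness_a = [46, 47, 43, 39, 49, 44, 48, 42]  # microwave baking (A)
hardness_b = [52, 50, 45, 49, 43, 46, 47, 43]  # conventional oven baking (B)

t_value, df_exact = welch_t_test(hardness_a, hardness_b)
df_used = math.floor(df_exact)  # assumption: DF truncated to an integer
p_value = two_tailed_p(t_value, df_used)

print(f"Estimate for difference: {mean(hardness_a) - mean(hardness_b):.3f}")
print(f"T-Value = {t_value:.2f}, DF = {df_used}, P-Value = {p_value:.3f}")
print("fail to reject H0" if p_value > 0.05 else "reject H0")
```

Computing the test by hand like this makes the ingredients visible: the two sample variances, the standard error of the difference, and the degrees of freedom that determine which t distribution the p-value is read from.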

The result was:

Two-Sample T-Test and CI: A, B

Two-sample T for A vs B

     N   Mean  StDev  SE Mean
A    8  44.75   3.37      1.2
B    8  46.88   3.27      1.2

Difference = mu (A) - mu (B)
Estimate for difference: -2.12500
95% CI for difference: (-5.71193, 1.46193)
T-Test of difference = 0 (vs not =): T-Value = -1.28  P-Value = 0.223  DF = 13

Rule
Since p-value (0.223) > 0.05, we fail to reject H0.

Conclusion
There is no significant difference between the hardness of biscuits baked in the microwave oven and in the conventional oven.

Paired t-Test

A researcher needs to know whether a moisture analyzer gives similar measurements of the moisture content of the same corn flour before and after calibration of the instrument. The flour was divided into two portions and prepared in 10 replications. The moisture content of the first portion was analyzed by the oven method. The analysis of the second portion was carried out with an automatic moisture analyzer. Data were recorded as follows:

Corn Flours   Before Calibration (A)   After Calibration (B)
              Moisture Content (%)     Moisture Content (%)
Batch 1       11.6                     8.7
Batch 2       10.8                     10.5
Batch 3       13.2                     10.0
Batch 4       13.1                     11.2
Batch 5       10.5                     9.2
Batch 6       11.6                     9.0
Batch 7       12.3                     9.8
Batch 8       10.3                     9.7
Batch 9       10.1                     11.0
Batch 10      11.6                     8.7

Example

Data Entry
Open Minitab.
Fill in the worksheet according to the table.
Save the data: File > Save Project As > New Folder > STM3104 > filename

Hypotheses
H0: μD = 0
The mean difference between paired observations in the population is zero.
Ha: μD ≠ 0
The mean difference between paired observations in the population is not zero.

Evaluation Assumptions
Before running the test, you need to treat the data as a one-sample t-test and meet the normality assumption for the one-sample t-test. Therefore, it is necessary to first store the pairwise differences in the worksheet:
1. Calc > Calculator
2. In the dialog box:
   Store result in variable: type ‘Diff’
   Expression: select A – B
   OK

You will get the differences (Diff). Then run the normality test on the differences as discussed before. You will get the probability plot of Diff:

[Figure: Probability Plot of Diff (Normal). x-axis: Diff; y-axis: Percent. Mean 1.73, StDev 1.362, N 10, AD 0.426, P-Value 0.250]

Use the same rules.

Anderson-Darling normality test
The p-value (0.250) is greater than the significance level of α = 0.05. There is not sufficient evidence to reject the null hypothesis of normality. Therefore, the one-sample t-test assumption of normality is appropriate.

The assumption of normality is reasonably satisfied, so you can proceed with the paired t-test.

Test Statistic – Paired t-Test
1. Stat > Basic Statistics > Paired t
2. Complete the dialog box:
   Check Samples in columns
   First sample: A, Second sample: B
   OK

The result was:

Paired T-Test and CI: A, B

Paired T for A - B

             N     Mean    StDev  SE Mean
A           10  11.5100   1.1060   0.3497
B           10   9.7800   0.9041   0.2859
Difference  10  1.73000  1.36223  0.43077

95% CI for mean difference: (0.75552, 2.70448)
T-Test of mean difference = 0 (vs not = 0): T-Value = 4.02  P-Value = 0.003

Rule
Since p-value (0.003) < 0.05, we can reject H0.

Conclusion
There is a significant difference between the moisture content of corn flour when analyzed before and after calibration of the moisture analyzer.
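The paired t-test above reduces to a one-sample t-test on the pairwise differences, which can be sketched in a few lines of Python. This is a minimal standard-library sketch under stated assumptions: the critical value `T_CRIT` is taken from a printed t table for 9 degrees of freedom rather than computed, and the variable names are hypothetical.

```python
import math
from statistics import mean, stdev

# Moisture content (%) for the 10 corn flour batches
before_cal = [11.6, 10.8, 13.2, 13.1, 10.5, 11.6, 12.3, 10.3, 10.1, 11.6]  # A
after_cal = [8.7, 10.5, 10.0, 11.2, 9.2, 9.0, 9.8, 9.7, 11.0, 8.7]         # B

# Step 1: pairwise differences (the 'Diff' column stored with Calc > Calculator)
diff = [a - b for a, b in zip(before_cal, after_cal)]

# Step 2: one-sample t statistic on the differences, H0: mean difference = 0
n = len(diff)
mean_d = mean(diff)
se_d = stdev(diff) / math.sqrt(n)  # standard error of the mean difference
t_value = mean_d / se_d

# Step 3: 95% confidence interval and decision.
# Assumption: two-tailed critical value t(0.025, df = 9) read from a t table.
T_CRIT = 2.262
ci_low, ci_high = mean_d - T_CRIT * se_d, mean_d + T_CRIT * se_d
reject_h0 = abs(t_value) > T_CRIT

print(f"Mean difference = {mean_d:.3f}, SE = {se_d:.5f}")
print(f"T-Value = {t_value:.2f} with df = {n - 1}")
print(f"95% CI for mean difference: ({ci_low:.3f}, {ci_high:.3f})")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Comparing |t| with the table critical value leads to the same decision as comparing the p-value with 0.05; the confidence interval excluding zero is a third way of reading the same result.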

Exercise

Using a viscometer, the following viscosity data (in mPa.s) were obtained for mango drinks with 20% and 25% sugar content in their formulations. The experimenter wants to know if increasing the sugar content of the formulation by 5% will significantly affect the viscosity of the mango drink.

20%: 120 128 121 113 109 125 120 118 130 106 118
25%: 131 125 126 138 120 135 133 126 130 129

Lab 7
Inferential Statistical Analyses (Part B) – Minitab
ANOVA

Two-way ANOVA (Interaction)

The interaction of fat and thickener will affect the viscosity of salad dressing. Three batches of salad dressing were prepared in order to determine the viscosity of salad dressing at combinations of two levels of fat (100 and 200 g) and three levels of thickener (4, 5 and 6 g) in the overall formulation. Data were recorded as follows:

Rep. 1
Thickener (g)   Fat 100 g   Fat 200 g
4               29          45
5               29.6        49
6               30.1        62.1

Rep. 2
Thickener (g)   Fat 100 g   Fat 200 g
4               33.1        47.1
5               32.6        56
6               37.5        61

Rep. 3
Thickener (g)   Fat 100 g   Fat 200 g
4               31          47.1
5               29.1        52.9
6               33.2        61.4

Working Example

Data Entry
Open Minitab.
Fill in the worksheet according to the following columns: Fat, Thickener, Viscosity
Save the data: File > Save Project As > New Folder > STM3104 > filename

Hypotheses
For interaction effect
Ho: There is no interaction effect between fat and thickener on the viscosity
    All (αβ)ij = 0
Ha: There is an interaction effect between fat and thickener on the viscosity
    Not all (αβ)ij = 0

For main effect of A
Ho: There is no difference in the means of viscosity due to fat content (at different levels of thickener)
    α1 = α2 = ... = αi = 0
Ha: There is a difference in the means of viscosity due to fat content (at different levels of thickener)
    Not all αi = 0

For main effect of B
Ho: There is no difference in the means of viscosity due to thickener content (at different levels of fat)
    β1 = β2 = ... = βj = 0
Ha: There is a difference in the means of viscosity due to thickener content (at different levels of fat)
    Not all βj = 0

Evaluation Assumptions

1. Normality Test – Anderson-Darling (default)
Hypotheses:
H0: Data are from a normally distributed population
Ha: Data are not from a normally distributed population
Rules:
If p-value > 0.05, fail to reject H0. There is not enough evidence to suggest that the data are not from a normal population, so the normality assumption is treated as satisfied.

Stat > Basic Statistics > Normality Test
In dialog box: Variable: Viscosity
OK

Use the normal probability plot to verify that the data do not deviate substantially from what is expected when sampling from a normal distribution.

[Figure: Probability Plot of Viscosity (Normal). x-axis: Viscosity; y-axis: Percent. Mean 42.6, StDev 12.31, N 18, AD 0.831, P-Value 0.025]

- If the data come from a normal distribution, the points should roughly follow the fitted line.
- If the data do not come from a normal distribution, the points will likely not follow the line.

The plot indicates that the distribution is reasonably normal; the points fall fairly close to the line.

Decision
Anderson-Darling normality test
The p-value (0.025) is lower than the significance level of α = 0.05. We need to reject H0 and thus, it is not reasonable to assume that the data do not deviate substantially from a normal distribution.

2. Test for equal variances
Hypotheses:
H0: Data are from populations with equal variances
Ha: Data are not from populations with equal variances

Stat > ANOVA > Test for equal variances
In dialog box: Response: Viscosity; Factors: Fat, Thickener
OK

[Figure: Test for Equal Variances for Viscosity. 95% Bonferroni confidence intervals for StDevs, by Fat (100, 200) and Thickener (4, 5, 6)]

The result was:
Bartlett's Test (normal distribution)
Test statistic = 6.09, p-value = 0.298
Levene's Test (any continuous distribution)
Test statistic = 0.88, p-value = 0.521

Decision
The p-values for both Bartlett’s and Levene’s tests are greater than the significance level of α = 0.05. We fail to reject H0 and thus, it is reasonable to assume that the data come from populations with equal variances.

The assumption of equal variances is reasonably satisfied, so we can proceed with the two-way ANOVA test.

Test Statistic – Two-way ANOVA
1. Stat > ANOVA > Two-way
   In dialog box: Response: Viscosity; Row factor: Fat; Column factor: Thickener
2. Complete the dialog box
   OK

The result was:

Two-way ANOVA: Viscosity versus Fat, Thickener

Source       DF       SS       MS       F      P
Fat           1  2142.94  2142.94  360.36  0.000
Thickener     2   244.32   122.16   20.54  0.000
Interaction   2   118.13    59.07    9.93  0.003
Error        12    71.36     5.95
Total        17  2576.76

S = 2.439   R-Sq = 97.23%   R-Sq(adj) = 96.08%

Main Effects and Interaction Effect Plots
1. Stat > ANOVA > Main Effects Plot
2. Stat > ANOVA > Interactions Plot: Display full interaction plot matrix

[Figure: Main Effects Plot (data means) for Viscosity. Panels: Fat (100, 200) and Thickener (4, 5, 6); y-axis: Mean of Viscosity]
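The sums of squares behind the two-way ANOVA table above can be traced by hand for this balanced design. The sketch below is a minimal standard-library Python illustration, not Minitab's procedure: it computes SS, DF, MS and F for fat, thickener and their interaction from the three replicates, and leaves out the p-values, which would need the F distribution. The dictionary layout and the helper name `level_mean` are hypothetical.

```python
from statistics import mean

# Viscosity readings keyed by (fat in g, thickener in g); Rep. 1, Rep. 2, Rep. 3
viscosity = {
    (100, 4): [29.0, 33.1, 31.0],
    (100, 5): [29.6, 32.6, 29.1],
    (100, 6): [30.1, 37.5, 33.2],
    (200, 4): [45.0, 47.1, 47.1],
    (200, 5): [49.0, 56.0, 52.9],
    (200, 6): [62.1, 61.0, 61.4],
}

fats = sorted({fat for fat, _ in viscosity})
thickeners = sorted({thk for _, thk in viscosity})
reps = 3  # replicates per fat x thickener cell (balanced design)
grand_mean = mean(v for cell in viscosity.values() for v in cell)


def level_mean(position, level):
    """Mean of all readings at one level of a factor (0 = fat, 1 = thickener)."""
    return mean(v for key, cell in viscosity.items()
                if key[position] == level for v in cell)


# Sums of squares for a balanced two-factor design with replication
ss_fat = reps * len(thickeners) * sum(
    (level_mean(0, f) - grand_mean) ** 2 for f in fats)
ss_thick = reps * len(fats) * sum(
    (level_mean(1, t) - grand_mean) ** 2 for t in thickeners)
ss_cells = reps * sum((mean(cell) - grand_mean) ** 2 for cell in viscosity.values())
ss_inter = ss_cells - ss_fat - ss_thick
ss_error = sum((v - mean(cell)) ** 2 for cell in viscosity.values() for v in cell)

df_fat = len(fats) - 1
df_thick = len(thickeners) - 1
df_inter = df_fat * df_thick
df_error = len(fats) * len(thickeners) * (reps - 1)
ms_error = ss_error / df_error

f_fat = (ss_fat / df_fat) / ms_error
f_thick = (ss_thick / df_thick) / ms_error
f_inter = (ss_inter / df_inter) / ms_error

print(f"{'Source':<12}{'DF':>4}{'SS':>10}{'MS':>10}{'F':>9}")
for name, df, ss, f in [("Fat", df_fat, ss_fat, f_fat),
                        ("Thickener", df_thick, ss_thick, f_thick),
                        ("Interaction", df_inter, ss_inter, f_inter)]:
    print(f"{name:<12}{df:>4}{ss:>10.2f}{ss / df:>10.2f}{f:>9.2f}")
print(f"{'Error':<12}{df_error:>4}{ss_error:>10.2f}{ms_error:>10.2f}")
```

The interaction sum of squares is what remains of the between-cell variation after the two main effects are removed, which is why a large interaction F signals that the effect of fat depends on the thickener level.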

[Figure: Interaction Plot (data means) for Viscosity. Full interaction plot matrix for Fat (100, 200) and Thickener (4, 5, 6)]

Decision Rule
For interaction effect
Since the p-value < 0.05, we need to reject H0: the effects of each treatment are different at different levels of the other treatment. For this reason, it does not make sense to try to interpret the individual effects of treatments that are involved in significant higher-order interactions.

Conclusion
There is a significant interaction effect between fat and thickener on the viscosity of salad dressing. The effects of fat were generally found to be different at different levels of thickener and vice versa. The interaction plots exhibit a lack of parallelism, indicating that the effect of fat content was not the same at each level of thickener.

Multiple Comparisons following ANOVA
Two-way ANOVA does not determine which pairs of means differ significantly. Therefore, we need to run a one-way ANOVA followed by multiple comparisons analyses.

One-way ANOVA (Combination Factor – For Simple Effect)
Since the interaction is significant, we can run a one-way ANOVA and multiple comparisons, treating each combination of the two factors at all levels as a treatment.

Data Entry
Open Minitab.
Use the same worksheet.
Make an additional column for a new factor (combination factor), e.g. 2004 for 200 g of fat and 4 g of thickener.
Save the data.

Hypotheses
Ho: μ1 = μ2 = ... = μk, where k = the number of groups to be compared
There is no difference between the means for all salad dressings in terms of viscosity.
Ha: Not all μk are equal
Two or more means differ from each other in terms of viscosity.

Evaluation Assumptions
Since the data have already been evaluated for normality and equal variances, further evaluation of the assumptions is not compulsory.

Test Statistic – One-way ANOVA with Multiple Comparisons
1. Stat > ANOVA > One-way
   In dialog box: Response: Viscosity; Factor: New Factor
   Comparisons: check any of the post-hoc tests, e.g. Tukey’s (family error rate)
2. Complete the dialog box
   OK

The result was:

One-way ANOVA: Viscosity versus Factor

Source  DF       SS      MS      F      P
Factor   5  2505.40  501.08  84.26  0.000
Error   12    71.36    5.95
Total   17  2576.76

S = 2.439   R-Sq = 97.23%   R-Sq(adj) = 96.08%

Decision Rule
From one-way ANOVA: since the p-value < 0.05, we need to reject H0; two or more means are different. There is a significant combination effect of fat and thickener on the viscosity of salad dressing.

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev
1004   3  31.033  2.050
1005   3  30.433  1.893
1006   3  33.600  3.716
2004   3  46.400  1.212
2005   3  52.633  3.508
2006   3  61.500  0.557

Pooled StDev = 2.439

CAUTION!!! It is WRONG to judge significant differences among samples based on the individual confidence interval lines plotted beside this table.

USE TUKEY’S HSD OUTPUT TO JUDGE ANY SIGNIFICANT DIFFERENCES AMONG SAMPLES
RULE: There is no significant difference between two samples if the interval includes 0; otherwise the difference is significant. In this example, samples 1004, 1005 and 1006 are not significantly different.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Factor
Individual confidence level = 99.43%

Factor = 1004 subtracted from:

Factor   Lower  Center   Upper
1005    -7.288  -0.600   6.088
1006    -4.121   2.567   9.254
2004     8.679  15.367  22.054
2005    14.912  21.600  28.288
2006    23.779  30.467  37.154

Factor = 1005 subtracted from:

Factor   Lower  Center   Upper
1006    -3.521   3.167   9.854
2004     9.279  15.967  22.654
2005    15.512  22.200  28.888
2006    24.379  31.067  37.754

Factor = 1006 subtracted from:

Factor   Lower  Center   Upper
2004     6.112  12.800  19.488
2005    12.346  19.033  25.721
2006    21.212  27.900  34.588

Factor = 2004 subtracted from:

Factor   Lower  Center   Upper
2005    -0.454   6.233  12.921
2006     8.412  15.100  21.788

Factor = 2005 subtracted from:

Factor   Lower  Center   Upper
2006     2.179   8.867  15.554

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of New Factor (Simplified)

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev  Letter
1004   3  31.033  2.050  c
1005   3  30.433  1.893  c
1006   3  33.600  3.716  c
2004   3  46.400  1.212  b
2005   3  52.633  3.508  b
2006   3  61.500  0.557  a

CAUTION!!! The superscript letters assigned to express significant differences among samples are based on Tukey's HSD output only!

Conclusion
There is a significant interaction effect between fat and thickener on the viscosity of salad dressing. The effects of fat were generally found to be different at different levels of thickener and vice versa. The effect of incorporating 200 g fat at 6 g thickener on the viscosity of salad dressing was significantly different from the effects of the other fat and thickener combinations. Combinations of 200 g fat with 4 or 5 g thickener did not result in any significant difference in viscosity. The same observation applies to combinations of 100 g fat at all thickener levels. However, there was a significant difference between the effects of incorporating 100 g and 200 g fat on the viscosity of salad dressing regardless of thickener level.
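The one-way ANOVA on the combination factor and the "does the interval include 0" rule for Tukey's comparisons can be sketched together in Python. This is a minimal standard-library sketch under stated assumptions: the studentized range critical value `Q_CRIT` is taken from a printed table for 6 groups and 12 error degrees of freedom rather than computed, all groups are assumed to have equal size, and the variable names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

# Viscosity by combination factor (e.g. "2004" = 200 g fat, 4 g thickener)
groups = {
    "1004": [29.0, 33.1, 31.0],
    "1005": [29.6, 32.6, 29.1],
    "1006": [30.1, 37.5, 33.2],
    "2004": [45.0, 47.1, 47.1],
    "2005": [49.0, 56.0, 52.9],
    "2006": [62.1, 61.0, 61.4],
}

k = len(groups)
n_total = sum(len(g) for g in groups.values())
grand_mean = mean(v for g in groups.values() for v in g)

# One-way ANOVA: between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
df_between, df_within = k - 1, n_total - k
ms_within = ss_within / df_within
f_value = (ss_between / df_between) / ms_within
print(f"F = {f_value:.2f} on ({df_between}, {df_within}) degrees of freedom")

# Tukey simultaneous intervals for every pair of group means.
# Assumption: q(0.05; 6 groups, 12 error df) read from a studentized range table.
Q_CRIT = 4.75
n_per_group = 3
half_width = Q_CRIT * math.sqrt(ms_within / n_per_group)

not_significant = []
for first, second in combinations(groups, 2):
    centre = mean(groups[second]) - mean(groups[first])  # "first subtracted from second"
    lower, upper = centre - half_width, centre + half_width
    includes_zero = lower <= 0 <= upper
    if includes_zero:
        not_significant.append((first, second))
    flag = "not significant" if includes_zero else "significant"
    print(f"{second} - {first}: ({lower:7.3f}, {upper:7.3f})  {flag}")

print("Pairs without a significant difference:", not_significant)
```

Pairs whose interval includes 0 share a letter in the simplified table, which is how the a, b and c groupings are assigned.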

Two-way ANOVA (Interaction) - Exercise

Polysaccharide gels have been prepared with gellan gum (GG), sodium caseinate (SC) and whey protein concentrate (WPC). The levels of both GG and WPC were varied whilst the level of SC was kept constant. The fracture stress (Pa) of all gel samples was measured as follows:

WPC (%) 0.1 GG (%) 0.5 1 54.81 60.03 0.3 48.21 50.23 2.5 47.10 50.11 51.21 49.27 4 60.23 57.24 60.15 59.12 45.10 53.14 66.29 64.50 65.70 63.20
