

Development of a Psychophysiological Artificial Neural Network to Measure Science Literacy

Published by amandakavnerphd, 2020-07-02 14:58:31

Description: Dissertation



area hierarchy is that as one increases in the hierarchy, the sizes get progressively smaller (Arbib & Bonaiuto, 2016). A hierarchy is a prominent feature of these visual regions, which applies to the low, mid, and high levels. Broadly speaking, V1 and V2 appear to be involved in low-level processing, V4 and MT in mid-level vision, and MST and IT in high-level vision (Arbib & Bonaiuto, 2016). The prefrontal cortex assists executive regulation mechanisms in the corresponding regions by biasing memory (Racz et al., 2017). These top-down biasing activities serve as guidelines for perceptive and reflective attention. Prefrontal biasing of perceptual attention during encoding allows for the selection of goal-relevant stimulus characteristics. During retrieval, prefrontal biasing of reflective attention enables the selective generation or collection of internally produced information. Neuroimaging studies have demonstrated the specific mechanisms that lead to these processes and the corresponding prefrontal subregions (Wixted, 2018). Thus, relationships between neural encoding responses and subsequent recall are likely to reflect contributions from the mechanisms of attention. In addition, variation in neural activity just before an occurrence is encountered predicts future memory. Therefore, the relationship between attention and subsequent memory does not rely on intentional acts of remembering; rather, memory is a by-product of how attention was previously focused, whether or not those prior acts of attention were linked to a specific desire to create long-term memories. This means attention is necessary both to encode relevant information about our environment and to avoid the encoding of irrelevant data (Wixted, 2018).

Neuroscience of Mental Imagery. Visual perception is a dynamic affair, involving both feed-forward and feedback signals that function closely together and interactively (Pearson, 2019). It is therefore very difficult to distinguish the effects of bottom-up feed-forward signals from those of feedback signals during perception. Since perception is interactive and thus cannot be understood without studying feedback mechanisms, it can be argued that imagery is the key to understanding normal visual perception. Mental visual imagery, though not always produced this way, is perhaps the only pure form of sensory representation that can occur solely because of feedback signals and in complete sensory isolation. Therefore, it represents a unique window for studying the dynamics of feedback signals in the brain and understanding the constructive nature of visual perception, an important reductionist step in understanding the mechanisms behind sensory perception (Pearson, 2019). This link between visual imagery and visual perception can be seen in neuroimaging studies, where EEG and fMRI have shown that the mental rehearsal of an action can be as effective as physical practice in facilitating the acquisition of an implicit perceptual-motor skill, in this case a series of button presses with the non-dominant hand (Kraeutner, 2016). In fact, healthy individuals and minimally conscious subjects show significantly similar fMRI data when asked to imagine playing tennis (Monti et al., 2010). Blindsight occurs when this link between visual imagery and visual perception is disordered, specifically when there is damage to the visual cortex or the ventral stream of vision. Individuals experiencing blindsight or visual form agnosia show a disruption of this link in that they cannot mentally visualize, and therefore cannot recognize, objects that they can physically see (Ganel & Goodale, 2019; Goodale, 2013).

Mental Imagery and Visual Science Literacy. Successful reading comprehension involves the construction of a coherent mental representation of the text; this representation consists of a network of semantic relationships between text elements and between text elements and the background knowledge of the reader (Schnotz et al., 2002). Mental manipulations and transformations of images are a recurrent theme in scientists' reports of imagery. Psychological research indicates that visual-spatial images are easily susceptible to transformations: in the mind, externally via concrete models, or on paper. Further, images can hold powerful metaphorical connotations that suggest relations and concepts extending beyond their concrete physical form (Ramadas, 2009). Whereas text comprehension has been studied rather intensively, picture comprehension has always been a secondary field of research (Schnotz et al., 2002). Text and image comprehension studies focused primarily on the mnemonic function of text-illustrating images. The main assumption of these studies was that text information is better remembered when it is illustrated by images than when there is no illustration. However, text and picture comprehension do not usually occur in isolation. Instead, text comprehension occurs in the context of images, and picture comprehension occurs in the context of text (Schnotz, 2002). The facilitation of learning from text-based pictures was originally explained by Paivio's dual coding theory (Paivio, 1986). According to this theory, verbal and pictorial information are processed in different cognitive subsystems: the verbal system and the imagery system. Words and phrases are usually processed and encoded only in the verbal system, while images are processed and encoded both in the imagery system and in the verbal system. As a result, the superior memory for pictorial information and the memory-enhancing effect of images in text are

attributed to the advantage of dual coding over single coding in memory. As an update to the dual coding principle, Kulhavy, Stock, and Kealy (1993) explained the mnemonic role of images by the simultaneous presence of text and pictorial content in working memory, which is intended to facilitate the retrieval and processing of information (Schnotz, 2002). In fact, students who read a text explanation accompanied by corresponding illustrations generated a median of 79% more solutions on problem-solving transfer tests than students who read the text alone. Pictures support comprehension when both the text and the images are explanatory, when verbal and pictorial content is related, when verbal and pictorial information is presented closely together in space or time (i.e., in contiguity), and when individuals have low prior knowledge of the subject area but high cognitive spatial abilities. To explain these results, Mayer built a multimedia learning paradigm that incorporates the principles of Paivio's (1986) dual coding theory with the notion of perception as a form of multi-level mental representation (Mayer, 2002; Schnotz, 2002). Perceptual images created during visual perception are modality-specific because they are tied to the visual modality. The similarity of these images to perceptions can be attributed to the fact that visual representations and perceptual experiences are based on the same cognitive processes (Pearson & Kosslyn, 2015). Mental imagery or mental representations, whether created during visual comprehension or during text comprehension, are internal depictive representations, as they have inherent structural features in common with the object depicted. This means that they represent an object based on a structural or functional analogy (Johnson-Laird, 1983; Johnson-Laird & Byrne, 1991; Schnotz, 2002). As humans view the world, perception provides a mental model of what objects are in front of them.
Likewise, as they understand a description of the world, they will create a

comparable, if not less complex, representation: a mental model of the world based on the interpretation of the description and on their experience, where the form of the representation correlates to the nature of what it represents (Johnson-Laird, 1983).

History of Mental Imagery Study

The history of mental imagery begins with the beginning of written records; early writings throughout the world reference mental imagery. Greek philosophers like Aristotle and Plato, for example, addressed the essence of mental representation in their philosophical doctrines. Aristotle argued that visualization was fundamental to cognition (Birondo, 2001), in contrast to doctrines of an imageless conception of the mind (Emilsson, 1988). The historical record indicates that imagery has been discussed across cultures since humans started writing, as demonstrated by early writings in China (Waley, 2005) and Greece (Beare, 1906). Although debate about the existence of mental imagery appears in ancient writings and was a common theme at the beginnings of Eastern and Western philosophy (Beare, 1906; Waley, 2005), the truly scientific study of imagery began in the 19th century. The debate on the nature of imagery goes beyond the existence of different types of mental imagery; some authors question the very existence of mental imagery. A key issue for the scientific study of mental imagery is its basic representation: is visual imagery visual, or is it propositional? On one side of the debate, scholars such as Fodor and Pylyshyn (1988) assume that mental processes are essentially symbolic, propositional, and verbal, not visual/spatial. These theories propose that brain activity is an abstract simulation, and that the perception of visualization (i.e., seeing with the "mind's eye") is epiphenomenal, meaning imagery would be only a non-causal byproduct of the brain's awareness and algorithmic processing.
For example, if one is asked to think about a cat, the idea of a cat would be “thought”, the syntactic features would come to one's mind, and the sensation of

“pseudo-seeing” the cat would be a pure epiphenomenon. However, some theorists, including Finke et al. (1989) and Kosslyn (2005), claim that imagery is a visual process and is, therefore, a source of visual information. Although the word "mental" in mental imagery may lack parsimony and imply a dualistic ontology (Watson, 1913), the concept is nevertheless used in cognitive psychology to refer to the experience of cognition in the absence of stimuli, without actually committing to a dualistic ontology (Pearson & Kosslyn, 2015). Galton (1880) took the first step toward scientific evidence of the existence of mental imagery. In his writings, he drew on conversations with several academics and scientists about the vividness of their mental imagery. Galton asked his participants about several aspects of their experience of imagery, such as vividness (how close the perception is to the visual experience of seeing), lighting (bright or dim), definition (sharp and specified or blurred), coloring (vivid or not), and several other specific questions. Galton's conclusion was that intellectual (scientific) people had less vivid imagery (Galton, 1880). The study was subsequently re-examined, and the findings, validity, and reliability of the work were challenged (Brewer & Schommer-Aikins, 2006). Galton's attempt to study the subject, however, sparked research in mental imagery, beginning with what is today referred to as "vividness." The contextual aspect of his work showed that the perception of imagery is emotional, unique, and varies from person to person. Each individual in Galton's study identified different imagery interactions and used different imagery terms. This contextual variation indicates that individual differences are a key feature of imagery analysis.
Almost three decades later, Betts (1909) produced a Questionnaire on Mental Imagery (QMI) that examines mental imagery in multiple aspects rather than just visual qualities. The questionnaire also inquired about imagery in other senses, such as

hearing, touch, smell, motor imagery, taste, and emotions. Later, the QMI was simplified by Sheehan (1967). Since then, several other questionnaires have been created, including the Gordon Test of Visual Imagery Control (TVIC; Reisberg, Pearson, & Kosslyn, 2003), the Vividness of Visual Imagery Questionnaire (VVIQ; Marks, 1973), the Vividness of Visual Imagery Questionnaire 2 (Marks, 1995), the Spontaneous Use of Imagery Scale (SUIS; Pearson, Deeprose, Wallace-Hadrill, Heyes, & Holmes, 2013), and the Susse Imagery Scale. While part of the scientific community acknowledged anecdotal observations as a credible measure of imagery, another part did not agree that mental imagery was, or could be, an established phenomenon. The debate about whether there is an "image in the mind's eye" is ongoing. The classical behaviorist Watson (1913) held that imagery did not exist; Fodor and Pylyshyn (1988) proposed that imagery was only epiphenomenal and that people's perception was influenced by words alone. The main criticism of these early hypotheses was the lack of empirical evidence to support them. Shepard and Metzler (1971) carried out the first non-verbal study of mental imagery. The Mental Rotation Test (MRT) was developed to provide an objective analysis of mental imagery with limited language interference. The MRT consists of a matching process in which the target object is viewed along with at least one other object; the other object is either a flipped and/or replicated copy of the target object or a similar-looking misleading object. The participant is instructed to mentally rotate the target object and to determine which of the possible answers is the rotated/mirror variant of the target. Response times of the participants are then recorded and evaluated.
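The response-time analysis performed in MRT studies can be illustrated with a short sketch: fit a straight line to mean response time as a function of rotation angle. The slope, intercept, and noise values below are invented placeholders, not figures from Shepard and Metzler (1971) or any other cited study; only the shape of the analysis is the point.

```python
import numpy as np

# Hypothetical mean response times (ms) at each rotation angle.
# Intercept (500 ms) and slope (2.5 ms/degree) are illustrative assumptions.
rng = np.random.default_rng(0)
angles = np.arange(15, 181, 15)                        # 15, 30, ..., 180 degrees
rts = 500 + 2.5 * angles + rng.normal(0, 10, angles.size)

# A roughly constant fitted slope is what "participants rotate mental
# images at a fixed rate" predicts.
slope, intercept = np.polyfit(angles, rts, 1)
print(f"slope = {slope:.2f} ms/degree, intercept = {intercept:.1f} ms")
```

A near-linear fit of this kind, replicated across stimulus types, is the behavioral signature discussed in the following paragraphs.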
Work using the MRT has shown that response times increase, on average, with greater angular disparity between the target object and the comparison object, varying from 15° to 180° in incremental steps of 15° (Shepard & Metzler, 1971). This association between rotation degrees and response times was replicated for 2-D and 3-D objects (Cooper, 1975;

Wright, Thompson, Ganis, Newcombe, & Kosslyn, 2008), rotated letters, rotated faces, rotated hands, and rotated bodies (Pearson, Marsh, Hamilton, & Ropar, 2014). The linearity of the relationship between rotational degrees and response time suggested that participants rotate mental images in a manner similar to actual images, indicating that the underlying representation promoting mental imagery is pictorial, not epiphenomenal, as previously suggested. Several neuroimaging studies have also shown that visualization is physiologically similar to vision. For example, research has found that visual imagery stimulates neural structures within the visual system similar to those stimulated by visual perception. Specifically, the visual system is split into two neural pathways, one for object vision projecting from the occipital cortex to the ventral temporal cortex and the other for spatial vision projecting from the occipital cortex to the dorsal parietal cortex. Neuroimaging results suggest that visual imagery is also split into two neuronal streams: spatial imagery uses the dorsal stream and object imagery uses the ventral stream (Courtney, Ungerleider, Keil, & Haxby, 1996; Farah, Hammond, Levine, & Calvanio, 1988).

Spatial Computing for Immersive Education. These relationships between experiences, neural encoding responses, and the resulting recall are commonly used in spatial computing to create immersive environments. By using these pedagogical tools, the recall of visual scientific concepts can potentially be increased by encouraging focused attention on the relevant data, so that mental imagery is encouraged and long-term memories of the science concepts can be formed. Glasersfeld (1992), a constructivist theorist, described teaching as a social activity but learning as a private activity, and further described learning as happening on the basis of the failures and successes of the learner's own behavior.
Virtual reality tools thus provide the opportunity to learn as a

private activity as well as the opportunity to succeed and fail in a soft-failure environment. Participants in a traditional learning environment, for example, succeed and fail in a group setting that may be overwhelming to some participants. Conversely, participants' success or failure in a digital learning environment can be based solely on input from the digital learning environment itself. Participants get the opportunity to move, test, and explore the proprioceptive characteristics of the instruments in a virtual reality environment and begin to understand how they can independently measure or classify an object effectively. In the same way, participants receive immediate feedback on whether they have completed the specified task correctly or incorrectly. The socio-constructivist view of learning and teaching is the hardest to achieve in online learning. Socio-constructivism suggests that meaningful knowledge building occurs when one learner interacts with other learners (Ertmer & Newby, 2013). While interactive technologies have the potential to remove or substitute for social interaction in learning, the optimal implementation of these devices provides flexibility for learning institutions, with a portion of the curriculum having a social component alongside the more private and customized immersive learning tools. In a social setting, as opposed to the privacy immersive education can provide, there is an increased ability to communicate with the teacher and other learners. The model most often referred to is the flipped classroom, a pedagogic model that reverses lectures and homework.
In the context of this possible use of immersive learning environments to promote visual science literacy, students would complete online exercises and lectures using the preferred virtual reality tool, be assessed by cognitive evaluations, and use class time to conduct hands-on experiments in a laboratory setting, along with discussion of the content covered by the digital instruments.

Augmented Reality (AR). Augmented reality (AR), a technology which overlays abstract or virtual objects (augmented components) onto the physical world (Akçayır & Akçayır, 2017), has been discussed since the late 1960s but was previously considered too costly and inefficient to be useful (Pausch et al., 1997). Only now, especially in combination with the smartphone, has it emerged as an important application for communication, research, industry, and art. Although a widely accepted definition of augmented reality does not seem to exist (Bimber & Raskar, 2005), Milgram and Kishino (1994) defined what has become known as Milgram's Virtuality Continuum, ranging from face-to-face interaction to digitally improved environments, to digitally enhanced physical worlds (such as digital maps), and finally to virtual worlds (such as virtual reality). Perhaps the simplest definition is that AR is the combination of digital and real-world information, since it acts as a computer-generated feature added to the real environment, whereas in true virtual reality the whole experience is based on computers. While virtual reality (VR) and gamified virtual social networks have become more popular in eLearning environments, augmented reality could affect education in the future as much as or even more than virtual worlds. Virtual worlds have the ability to provide a much more realistic environment than augmented reality, but for those overwhelmed by or concerned with the realism of VR, AR can offer the best of both worlds, as well as increased flexibility for use in a classroom setting. The two main types of cell-phone augmented reality currently in use are markerless, which uses phone position data or image recognition to identify a location and then overlays digital information, and marker-based, which requires a specific label, such as a bar code, to identify a location. The modern cell phone makes these kinds of

applications inexpensive and easy to use. An application called Google Goggles can be downloaded on iPhone, BlackBerry, and Android phones. It is essentially a familiar search engine that uses an image from a cell phone to provide that phone with information about a landmark, tourist spot, bar, or even a bottle of wine. Several companies have created brandless augmented reality applets for smartphones, including Layar, Wikitude, and Junaio. An application developed by Layar, for example, allows a software developer to overlay images on the phone's video, combining real-life views with digital data. Therefore, if someone takes a video of a popular spot with their cell phone, the Layar software adds additional information to the live camera stream. This is creating a new kind of guided tourism. Presently, Layar offers thousands of layers of location-related content, including the location of nearby schools, museums, restaurants, transportation, and health care (Layar, http://www.layar.com).

Virtual Reality (VR) and Serious Educational Games (SEG). In the 1990s, virtual reality (a type of computer simulation generating three-dimensional models in which people immerse themselves using head-mounted displays, projected video, audio feedback, and haptic devices allowing real-time human-computer interaction) was being used for military training (Witmer, Bailey, & Knerr, 1996), training firefighters (Bliss, Tidwell, & Guest, 1997), viewing chemical interactions (Illman, 1994), and education (Briggs, 1996). After a decade of research, virtual reality (VR), as described above, remains uncommon in educational settings. Yet digital learning can provide a means for students to achieve a level of mastery at a distance and be better prepared when they participate in a laboratory environment. When educational video games were used, Din and Caleo (2000) and Rosas et al. (2003) found a positive increase in student motivation and classroom dynamics.
San Chee (2001) describes how words and concepts at the language level are rooted in experience. This concrete

experience provides the needed base for reflective observation and abstract conceptualization, as described by Kolb's learning theory. San Chee (2001) successfully incorporated experiential learning and dialogue between learners into the classroom using battleship and vacuum chamber simulations. Incorporating simulations and VR applications into educational settings achieves two goals. First, VR applications allow learners to intuitively understand how objects function and behave (Salzman, Loftin, & Dede, 1996; San Chee, 2001). Second, VR applications and simulations may lay the foundation for deeper learning and conceptual change for both the teacher and the student. This is accomplished by providing common experiences that students and teachers reference within a conversational framework of dialog, reflection, and instructional activities (San Chee, 2001). Due to the realism and visually engaging elements present in VR and SEGs, the study's authors hypothesize that utilizing VR and SEG elements would be preferable to the use of AR elements as a pedagogical tool for the promotion of visual science literacy. To this point, a mentored hands-on condition may be more beneficial for student learning efficiency than a poorly designed VR experience, but further exploration should occur. However, SEGs and VR could provide a better use of resources, since students can repeatedly engage in these learning environments within and outside of the classroom without consuming classroom resources, making the technology helpful for resource-constrained environments, for example schools with smaller budgets. The second advantage of VR and SEGs may be the opportunity for students to engage with soft-failure environments (Lamb, 2014; Lamb et al., 2018). In the VR and SEG environments, the student has the opportunity to fail without fear of serious repercussions or danger, as the electronic environments can simply be restarted.
In addition, the students can start, stop, and replay activities as often as they like, providing a mechanism for self-paced learning. The student is also afforded the opportunity to engage in exploratory and play-type behaviors, which may also aid in learning. Essentially, with both VR and SEGs, students can attempt to develop questions and experiment with combinations in virtual environments on their own. In addition, with the use of instructional technology becoming ubiquitous inside and outside of the classroom, combined with the onset of the Next Generation Science Standards (NGSS), there exists an ever-ready platform for SEGs and VR in the educational environment in the form of mobile telephones (Lamb et al., 2018).

CHAPTER THREE

Methods

This study was the second phase of a two-phase study. The first phase involved the data collection; this study presents the development of the cognitive model. This research utilizes secondary data, that is, data previously collected to answer a different research question. The data used for this study were initially collected as part of the National Science Foundation (NSF) funded project Leveraging Neuroscience for the Education of Science (LENS) (Award #1540888)¹, and that physiological data is being used to standardize the sample. Therefore, for this study, the sample of interest is the 131,390 data points derived from the cognitive task, not the 40 individual participants conducting the task. It is for this reason that the focus on the following pages will be on the collected data points and data analysis, as opposed to how those data were collected from the study participants.

Research Design

This study's research design is secondary data analysis of neurocognitive data, wherein the data is derived from the ratio of deoxygenated to oxygenated hemoglobin indicating the level

¹ Project LENS is a Science of Learning Collaborative Network (SL-CN) of scholars from STEM education, neuroscience, cognitive psychology, computer science, and educational technology and measurement at multiple universities, led by Pasha Antonenko as head Principal Investigator, working to advance research about learning through integrative and empirical research (NSF DUE 1540888).

of cognitive processing (fNIRS), collected via neurocognitive sensors. Cognitive measures of visuospatial processing accuracy (fNIRS hemodynamic responses when the mental rotation task was successfully completed) and processing speed (response time) serve as the dependent variables, with the 160 mental rotation tasks acting as independent variables.

Participants

A diverse group of 40 postsecondary students was previously recruited by each node of the Project LENS collaborative network, largely from local community colleges (for a total of 120 participants across the nodes). Study candidates were pre-screened for visual attention span, visual inhibitory control, visual episodic memory capacity, and reading ability to ensure that the sample represented a spectrum of attentional and cognitive differences as seen in the population. Participants were recruited through the placement of informative posters pertaining to the study near chemistry classrooms at three community colleges during the months of July and August. Participants were guided to contact the lab directly and then wait to receive a follow-up email or phone call to schedule a time to visit the laboratory. At the laboratory, study protocols were reviewed and participants were afforded an opportunity to ask questions.

Procedure

Participants were randomly selected from the sample to conduct organic chemistry-based mental rotation tasks. Two-dimensional (2D) molecular structures commonly known as Wedge and Dash models (dash/wd) and three-dimensional (3D) molecular structures commonly known as Ball and Stick models (BL/b) were shown to participants while they wore an fNIRS headband to measure brain blood deoxygenation (hemodynamic) levels. There were 80 individual mental rotation problems for each task type (n=160), where the participants had to decide whether or not the two molecules shown were rotated versions of the same molecule.
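The task structure just described (80 Wedge and Dash problems and 80 Ball and Stick problems, with a same/different judgment on each of the 160 items) can be sketched as follows. The responses below are simulated placeholders rather than participant data, and the 0/1 model coding is used here simply for convenience; only the trial layout and per-task aggregation are illustrated.

```python
import random

random.seed(1)

# Build the 160-problem task: 80 2-D Wedge and Dash items (coded 0)
# and 80 3-D Ball and Stick items (coded 1).
trials = ([{"problem": i, "model": 0} for i in range(80)]
          + [{"problem": i, "model": 1} for i in range(80)])
random.shuffle(trials)

# Simulated same/different judgments scored 1 (correct) or 0 (incorrect).
# Real scores came from participant responses; these are placeholders.
for t in trials:
    t["correct"] = random.randint(0, 1)

# Accuracy aggregated per model type (0 = wd, 1 = b).
accuracy = {m: sum(t["correct"] for t in trials if t["model"] == m) / 80
            for m in (0, 1)}
print(accuracy)
```

Aggregating correctness and response time per task type in this way is what later allows the 2D and 3D conditions to be compared as separate independent variables.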

Participants were seated in a comfortable chair approximately 50 cm from the computer screen in a laboratory with controlled lighting and acoustics. Using a computerized molecular model test, participants determined whether the molecules presented were the same as or different from each other (see Figure 4). Each stimulus was presented for 4 seconds, and both presentation and responses were controlled using SuperLab software and a response pad. If the participant did not respond, the item was scored as incorrect and the program advanced to the next item after a 750 ms intertrial interval. During this portion of the study, participants had two non-invasive sensors, EEG and fNIRS, placed on their head and an eye-tracking camera focused on their eyes. The EEG and fNIRS make use of disposable sensors, and the three systems monitor physiological responses and record those responses using SuperLab and MATLAB without any skin or scalp preparation required, making them ideal neurocognitive measurements to record in a classroom.

Figure 4. Sample Wedge and Dash Mental Rotation Question

Measurements

Data Collection Procedure. Data collection took place at the University at Buffalo Neurocognition Science Laboratory (UBNCSL). UBNCSL is a fully equipped laboratory with a single wireless Mobita EEG system, an fNIR unit, an eye-tracking system, and a computer running Biopac's AcqKnowledge, MATLAB, and SuperLab software for the collection and analysis of

psychophysiological data. This Neurocognition Lab is a low-noise, controlled-lighting environment, suitable for performing fNIR and EEG studies. Responses were collected using MATLAB and SuperLab software through keyboard strokes. Participants selected "Y" for yes and "N" for no, and graduate students scored responses for correctness using 1 for a successful attempt and 0 for an unsuccessful attempt. The dependent variables collected were cognitive measurements of hemodynamic responses (visuospatial processing) and processing speed, or response time.

Neurological Data from Science Learning Tasks as Inputs to the ANN. Psychophysiology is the study of the relationship between psychological manipulations and the resulting physiological responses, measured in the living organism, to promote understanding of the relation between mental and bodily processes (Andreassi, 2007). Psychophysiological responses are therefore automatic, non-voluntary responses from the autonomic nervous system. These autonomic responses are useful because, being involuntary, they are not subject to participant bias as self-report measures are. Although behavior is the result of ongoing mental processes, observed behavior is not the equivalent of mental activities, because these activities are not always translated into motor acts. However, these mental activities themselves, although not directly observable, are behaviors. Therefore, utilizing autonomic psychophysiological measurements makes it possible to bypass observed behavior, providing a more direct mode of learning about the underlying cognitive processes (Andreassi, 2007). These psychophysiological measurements are numerical, since they measure the magnitude or frequency of a physiological state (i.e., blood oxygenation level), and, as in the case of fNIRS, robust, since there are many sensors being used concurrently, all recording data about blood oxygenation level. Such comprehensive, quantitative data is perfect for an ANN

to analyze and forecast.

Cleaning the Data

Hemodynamic data from fNIRSoft was exported as an Excel file (N = 131,390) and separated by participant and by task. Problem complexity data, also organized by participant and task, was kept in a separate Excel workbook. Mental rotation tasks were coded as 0 for the 2D Wedge and Dash models (wd) and 1 for the 3D Stick and Ball models (b). Strings (i.e., words) from both spreadsheets were changed into number codes (i.e., low complexity = 1, medium complexity = 2, advanced complexity = 3) for analysis before merging the spreadsheets using the ID number and problem number as join keys (outer join).

To handle missing data, several data imputation methods were evaluated with each machine learning algorithm to optimize model performance. This step was taken for all algorithms, particularly because the step regression algorithm could not be used on the RapidMiner platform with missing data. Each algorithm was therefore tested with data imputed using zero, the average value, or no imputation at all. With average imputation, missing values within the dataset were replaced with the average of all observed values; with zero imputation, missing values were replaced with '0'. The data was then separated into four sections using stratified sampling into two training and two testing sets (N = 1,533,642), i.e., a k-fold holdback approach (Singhi & Liu, 2006), saved as CSV files, and then imported into RapidMiner's repository for the development of the Artificial Neural Network.
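The two imputation strategies can be sketched as follows (a minimal illustration in Python; the actual preprocessing used Excel and RapidMiner, and the example readings are synthetic):

```python
# Minimal sketch of zero vs. average (mean) imputation for one
# hemodynamic feature column; None marks a missing reading.
def impute_zero(values):
    """Replace missing values with 0."""
    return [0.0 if v is None else v for v in values]

def impute_mean(values):
    """Replace missing values with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

readings = [0.8, None, 1.2, 1.0, None]
print(impute_zero(readings))   # [0.8, 0.0, 1.2, 1.0, 0.0]
print(impute_mean(readings))   # [0.8, 1.0, 1.2, 1.0, 1.0]
```

Mean imputation preserves the column average, while zero imputation pulls it toward zero, which is one reason both were compared against each algorithm.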

Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is used when a researcher wants to discover the number of factors influencing variables and to identify correlated variables; because it is easier to focus on a few key variables than on many trivial ones, EFA is a useful technique for placing variables into meaningful categories (Lloret et al., 2017). The EFA was conducted using RStudio version 1.2.5033 (www.rstudio.com). The first training dataset was imported into RStudio as a CSV file. The "psych" and "GPArotation" packages were loaded, and parallel analysis was conducted to determine the number of factors for factor analysis. This analysis produced the scree chart, and the results indicated that there were 5 factors in the instrument. To conduct an exploratory factor analysis, both the extraction method and the rotation need to be predetermined (Lloret et al., 2017). For this data, the Ordinary Least Squares/minres factoring method was chosen, as it is known to provide results similar to Maximum Likelihood without assuming a normal distribution and derives solutions like principal axis factoring through iterative eigendecomposition. Varimax was used as the rotation method, as it assumes that the factors are uncorrelated.

Mental Rotation for Visual Literacy Assessment

Visual literacy is defined as the ability to read, interpret, and understand information presented in pictorial or graphic form (Elkins, 2009). Spatial thinking includes a variety of skills, including recognizing spatial relationships that are relevant to STEM discipline phenomena, predicting the effects of spatial transformations, and creating representations that encode spatial information (Ramadas, 2009).

Visual literacy tasks with an average level of difficulty (Mnguni et al., 2016), such as mental rotation, assess our ability to determine whether objects have the same shape despite
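Parallel analysis retains a factor only while the observed eigenvalue exceeds the corresponding eigenvalue obtained from random data of the same dimensions. A minimal sketch of that retention rule (Python rather than the R "psych" package actually used; the eigenvalues below are hypothetical illustration values):

```python
# Parallel analysis retention rule: keep factor k while the k-th observed
# eigenvalue exceeds the k-th eigenvalue expected from random data.
def n_factors_parallel(observed_eigs, random_eigs):
    n = 0
    for obs, rnd in zip(observed_eigs, random_eigs):
        if obs <= rnd:
            break
        n += 1
    return n

# Hypothetical eigenvalues for illustration only.
observed = [3.1, 1.9, 1.4, 1.1, 1.05, 0.7, 0.4]
random_mean = [1.6, 1.4, 1.2, 1.05, 1.0, 0.9, 0.8]
print(n_factors_parallel(observed, random_mean))  # 5
```

With these illustrative values the rule retains five factors, mirroring the 5-factor solution reported above.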

differences in orientation or size. This form of assessment is a classical visual perception problem in which participants must imagine one of the figures rotated into the same orientation as the other, completing the mental rotation within the given time (Larsen, 2014). Mental rotation calls for visual review and spatial thinking to visualize the rotation of an object in space (Hegarty & Waller, 2005) and has been found to be a core component of scientific reasoning (Xu & Franconeri, 2015). These results are considered predictive of achievement in STEM disciplines (Lamb, Firestone, & Mcmanus, 2017; Stieff et al., 2018). This is why mental rotation has been studied extensively for over five decades, establishing that the cognitive processes involved in mentally and physically rotating an actual object are analogous, showing a positive linear relationship between angular disparity and response time: the farther objects are rotated from one another, the longer the rotation takes to simulate (Stieff et al., 2018). This linear relationship enables its use in assessing the visual component of science literacy, or visual science literacy.

Chemistry has been identified as one STEM domain where mental rotation has been observed to correlate with achievement at multiple levels of instruction and on a variety of chemistry assessments (Stieff et al., 2018). Therefore, in this study, molecular structures were used for the mental rotation tasks: when making identity judgments comparing molecular structures, mental rotation has been found to be a primary process engaged by novice students, requiring them to reason about given spatial information and perform mental spatial operations in order to make a successful comparative identity judgment (Stieff, 2011; Stieff et al., 2018), all visual and spatial skills necessary for being considered scientifically literate (Mnguni et al., 2016).
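The classic angular-disparity effect can be sketched as a simple linear model of response time (Python; the intercept and slope below are hypothetical illustration values, not estimates from this study's data):

```python
# Classic mental rotation finding: response time grows linearly with
# angular disparity, RT = intercept + slope * angle.
def predicted_rt(angle_deg, intercept_ms=1000.0, slope_ms_per_deg=10.0):
    """Predicted response time (ms) for a given angular disparity."""
    return intercept_ms + slope_ms_per_deg * angle_deg

print(predicted_rt(0))    # 1000.0
print(predicted_rt(90))   # 1900.0
print(predicted_rt(180))  # 2800.0
```

The positive slope is the signature of the effect: doubling the angular disparity adds a proportional amount of simulated rotation time.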
Mental manipulations and transformations of images are a recurrent theme in the reports of imagery by scientists. Psychological research indicates that visual-spatial images are easily

susceptible to transformations: in the mind, externally via concrete models, or on paper. Further, images can hold powerful metaphorical connotations that suggest relations and concepts extending beyond their concrete physical form (Ramadas, 2009).

Machine Learning with Artificial Neural Networks

An important consideration is how the different disciplinary languages of research and classroom practice interpret and understand each other, requiring transdisciplinary educators to provide links between the fields. This divide arises because behavior is measured in terms of choice and response time, whereas neural activity is measured by spiking activity or, in the case of fNIRS, the blood-oxygen-level-dependent (BOLD) signal (Dempsey, Harris, & Shumway, 2015). One possible bridge between behavior and brain systems is the use of cognitive computational models such as the STAC-M (Student Task and Cognition Model) (Huys, Maia, & Frank, 2016; Lamb, 2014; Lamb, Annetta, Vallett, & Sadler, 2014). Cognitive computational models are mathematical formalisms that embody psychological principles, often evaluated by their ability to account for behavioral data. The mechanisms in these models can be related both to behavior and to neural measures, thus providing a possible bridge between abstract cognitive data and practical outcomes the teacher can make use of in the classroom and in the development of supplemental resources. Cognitive computational models are often composed of artificial neural networks, a machine learning method very loosely based on biological motivations, with a natural proclivity for storing and using experiential knowledge to solve problems through the interaction of various processing elements (Lamb et al., 2016).
A fundamental feature of an ANN is its hidden nodes, which connect the input and output elements; these nodes operate at a level of abstraction, a cognitive function unique to higher-level organisms such as the neural systems the

technology is based on (Lamb et al., 2016). Therefore, ANNs serve as models for biological systems, including intelligence, as adaptive processors in real time and, as shown in this study, as methods for data analysis. The authors also propose the future use of the ANN for assessment as a real-time signal processor.

The modeling of an ANN is a function of the numerical values assigned to the nodes within the model. Using an algorithm, those numerical values are transmitted along the network. However, while there are considerable similarities between ANN algorithms and ANN models, there are considerable differences in terminology and in the manner in which the data is used. ANN models are far more useful for the repetitive analyses necessitated by statistical algorithms, as opposed to the more transient data for which ANN algorithms are appropriate; a robust ANN model may consist of multiple ANN algorithms. In this study, patterns in neurocognitive data were analyzed among participants completing mental rotation tasks to develop a predictive ANN assessing spatial aspects of a subject's visual literacy: an innovative assessment of science literacy using psychophysiological data.

Developing the Artificial Neural Network in RapidMiner

RapidMiner (www.rapidminer.com), formerly YALE (Yet Another Learning Environment), is a platform for machine learning, data mining, text mining, predictive analytics, and business analytics. Used in science, education, training, rapid prototyping, application development, and industrial applications, RapidMiner has an AGPL open source license and is a data analytics tool used in real projects. Ranking second in 2009 and first in 2010 according to KDnuggets, a data mining news site, RapidMiner offers a GUI for the creation of an analytical process (reading data from the source, transformations, applying algorithms). All GUI modifications are stored in an XML (eXtensible Markup Language) file, and this file is read by

RapidMiner to run the analyses. RapidMiner provides tools for data preprocessing, data modeling, and visualization of results. RapidMiner also allows connections to a wide range of data sources, such as Oracle, Microsoft SQL Server, MySQL, Excel, and Access, as well as many other data formats, enabling easy deployment of models at scale.

Correlation-based filtering was used to remove features that had a very low correlation with the predicted affect and behavior constructs (correlation coefficient < 0.04) from the initial feature set. This method involved calculating Pearson's correlation coefficient between the frequency data of each of the optodes and the target 'task success' output. Features with correlation coefficients < 0.04 for all five affective states were then removed from the overall feature set of 26 variables. A total of six features were removed from the initial hemodynamic output dataset, leaving 20 features that were ultimately used in the development of the prediction models.

Feature selection for each detector was then done using forward selection within the RapidMiner platform, where each feature is evaluated individually. The feature that results in the best performing model is chosen first, and then all possible combinations of the selected feature and each remaining feature are evaluated. Subsequent features are selected in the same way, and selection stops when the predefined number of features is reached or when the model does not improve further with the addition of another feature.

RapidMiner model comparisons. Once the mental rotation task data (i.e., Task, Complexity, Score) was merged with the hemodynamic response data from the fNIRS outputs, models for each construct were built in the RapidMiner 5.3 data mining software, using common classification algorithms that
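The correlation-based filter described above can be sketched as follows (Python; the feature values are synthetic, and the optode names are hypothetical, but the |r| threshold mirrors the filtering rule):

```python
# Keep only features whose Pearson correlation with the target
# meets the threshold (|r| >= 0.04, per the filtering rule above).
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def filter_features(features, target, threshold=0.04):
    """Return names of features correlated with the target above threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold]

target = [1, 0, 1, 0]                        # synthetic task-success labels
features = {
    "optode_16": [0.9, 0.1, 0.8, 0.2],       # rises and falls with the target
    "optode_03": [0.3, 0.3, 0.7, 0.7],       # uncorrelated with the target
}
print(filter_features(features, target))     # ['optode_16']
```

Features that survive this filter then enter the forward selection stage, which adds one feature at a time as long as model performance keeps improving.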

have been previously shown to be successful in building affect models: Naïve Bayes, Generalized Linear Model, Logistic Regression, Deep Learning, Decision Tree, Random Forest, and Gradient Boosted Trees. Using the first set of training data, each algorithm was systematically compared in order to identify the relative weights of the predictors and the most accurate predictive model for the data.

The models were validated using 10-fold student-level batch cross-validation. In this cross-validation process, trials in the training dataset are randomly divided into ten groups of approximately equal size. A detector is built using data from each possible combination of 9 of the 10 groups and tested on the remaining group. Cross-validation at this level increases confidence that the affect and behavior detectors will be accurate for new students (Table 1). The output was evaluated by repeating the selection process on each fold of the participant-level cross-validation to determine how well the models generated using this selection method perform on new and unseen test data. The final models were obtained by applying the feature selection to the entire dataset.

A precision-recall curve was developed to define the tradeoffs between precision and recall at the different confidence thresholds of the model. The use of a precision-recall curve makes it easier to understand how well the model performs across its predictive spectrum rather than at only one point, and it can also be used to pick the optimum threshold for interventions with different costs or benefits (Davis & Goadrich, 2006). Precision represents the proportion of instances identified as correct that are true instances of correct answers, while recall represents the proportion of instances of real correct answers identified as correct.
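These definitions, together with the F1 measure reported below, can be sketched directly (Python; the labels are synthetic):

```python
# Precision, recall, and F1 from predicted vs. true binary labels.
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)   # how well the model avoids false positives
    recall = tp / (tp + fn)      # how well the model avoids false negatives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.75 0.75 0.75
```

Sweeping the model's confidence threshold and recomputing these quantities at each point traces out the precision-recall curve.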
To put it another way, precision indicates how well the model avoids false positives, while recall indicates how well it avoids false negatives. Together, precision and

recall provide an indicator of the model's balance between these two forms of error (Davis & Goadrich, 2006).

Table 1. Model Comparisons

Algorithm                  Accuracy   Classification Error   F1     AUC
Gradient Boosted Trees     96.6       3.4                    97.5   0.989
Naïve Bayes                71.4       28.6                   78.9   0.751
Logistic Regression        74.8       25.2                   83.1   0.797
Generalized Linear Model   79.0       21.0                   86.3   0.789
Decision Tree              78.2       21.8                   85.2   0.822
Random Forest              77.3       22.7                   85.6   0.840
Deep Learning              84.9       15.1                   89.2   0.888

Classification and Error. The Gradient Boosted Trees (GBT) model had a higher overall classification accuracy (96.6%) than each of the other predictive algorithms, with corresponding accuracies of 71.4% for Naïve Bayes, 79.0% for the Generalized Linear Model (GLM), 74.8% for Logistic Regression, 84.9% for Deep Learning, 78.2% for Decision Tree, and 77.3% for Random Forest. GBT also had a lower overall classification error (3.4%) than each of the other predictive algorithms: Naïve Bayes had a 28.6% classification error, GLM 21.0%, Logistic Regression 25.2%, Deep Learning 15.1%, Decision Tree 21.8%, and Random Forest 22.7%.

F1 Measure and AUC. The F1 Measure is also commonly used to evaluate the accuracy of a model by finding the balance between precision and recall, with a higher percentage indicating a consistently accurate model.

The Naïve Bayes model had the lowest F1 Measure at 78.9%, followed by Logistic Regression at 83.1%, Decision Tree at 85.2%, Random Forest at 85.6%, Deep Learning at 89.2%, and then Gradient Boosted Trees at 97.5%. Evaluation of the Area Under Curve (AUC) compares the specificity of the model to its sensitivity. With a range of 0 to 1 and a greater value indicating a better performing model, the GBT's AUC of 0.989 is further evidence that it is the superior model compared to Naïve Bayes, GLM, Logistic Regression, Deep Learning, Decision Tree, or Random Forest, whose AUCs range from 0.751 for Naïve Bayes up to 0.888 for Deep Learning.

The AUC performance metric was computed on the original, non-replicated datasets. In our model performance measurements, the AUC was used as the primary measure of model goodness, as this metric is recommended as particularly suitable for skewed data (Jeni, Cohn & de la Torre, 2013). A model with an AUC of 0.5 performs at chance, and a model with an AUC-ROC of 1.0 performs perfectly. It is worth noting that the AUC takes model confidence into account. A combination of features was also selected from the forward selection process in each of the affect and behavior models, which provides some insight into the type of student interactions that predict each particular affective state.

Therefore, the algorithm used for the study's ANN model was the Gradient Boosted Trees (GBT) algorithm, due to its markedly more precise predictions: GBT builds successively more accurate trees by using gradient-based weights for predictors derived from the preceding decision tree (Natekin & Knoll, 2013). The extra training set was then used to train the model after the initial training set determined the weights of predictors, limiting feature set selection bias, a common result of using the same training dataset in both selection and learning (Singhi &

Liu, 2006). After training the model with the second set of training data, the model was able to predict mental rotation success with 99.9% accuracy on the first set of testing data.

Developing the Confirmatory Model in SPSS

IBM's Statistical Package for the Social Sciences v.25 (SPSS; https://www.ibm.com/products/spss-statistics) offers fewer algorithms for complex neural network modeling than RapidMiner, since SPSS is a simpler statistical system. However, as SPSS is more commonly used by educational researchers, developing an analogous ANN in that software would not only confirm the model developed in RapidMiner but also increase the potential for use by educational researchers. The two algorithms available in SPSS for classification in a neural network are the multilayer perceptron (MLP) and the radial basis function (RBF) network. Both map predictors (also known as inputs or independent variables) to target variables (also known as outputs) while minimizing predictive error; these algorithms generate a predictive model for one or more target-dependent variables based on the values of the available predictor variables. Most prior implementations have used MLP-style ANN models coupled with the error back propagation (BP) algorithm (Jayawardena & Fernando, 1997); the parameters of the MLP, however, are strongly nonlinear. The BP algorithm, which uses the steepest descent approach, does not guarantee convergence to the globally optimal set of parameters. Therefore, selecting the "best" from a set of locally optimal parameters requires many trial and error attempts. An alternative to the MLP is the Radial Basis Function (RBF) network (Jayawardena & Fernando, 1997; Chen et al., 1991), which has linear parameters and finds applications in other

fields such as electrical and electronic engineering. Theoretically, Park & Sandberg (1991) proved that RBF-type ANNs are capable of universal approximation and of learning without local minima, thus ensuring convergence to globally optimal parameters. Moody & Darken (1989) showed that RBF-type networks learn faster than MLP networks for hypothetical situations, which is why an RBF was used in the development of the ANN in SPSS. However, this robustness comes at a cost, since RBF-type networks take longer to learn from the data and are not as easily deployed in a classroom setting.

Rasch Analysis of Visuals

The Rasch rating scale model (Rasch model) was used to evaluate the underlying psychometric properties of the instrument, and a Rasch analysis was performed to rank the mental rotation problems in order of difficulty, creating a hierarchy of visual attributes. A Rasch analysis is a confirmatory model for forming a valid measurement scale, in which the incorporated data must meet the Rasch model's requirements. As a quantitative, probabilistic measurement system developed by the Danish psychometrician and mathematician Georg Rasch (1901–1980), a Rasch analysis can be used to 'transform raw human science data into abstract, equal-interval scales' (Bond & Fox, 2015, p. 7). Originally, the Rasch model was developed for dichotomously scored items, for example true/false or right/wrong, and is based on the early work of Thurstone and Guttman (van Zile-Tamsen, 2017); however, the Andrich Rating Scale Model (RSM; Andrich, 1978) extended the model into a variation that can be used for polytomous data, as is the case in this study. Information is provided about item difficulty, person ability, and reliability, as with all Rasch models, which assume that task performance depends on both the ability of the person and the difficulty of the item (Bond & Fox, 2015).

A key feature of the Rasch model is a table of predicted answer probabilities designed to address the key question: when a person with this ability (number of test items correct) encounters an item of this difficulty (number of individuals who succeeded on the item), what is the probability that this person will answer this item correctly? (Bond & Fox, 2015). This is done by ordering persons, in this case students, according to their ability and ordering assessment items according to their difficulty; interval-level measurement can be derived when the levels of some attribute increase along with increases in the values of two other attributes (Bond & Fox, 2015). For example, if I try to quantify a child's math ability, all I have is the number of items he scored correctly on an assessment, which is not what I want. I have to go from what I have and don't want, in this case a score, to what I want and don't have, in this case his math ability. That inference is the theory behind our attempts to measure latent traits, and the Rasch model is the tool we use to infer interval measurements of latent traits from raw test item responses (Bond & Fox, 2015).

Rasch analysis, which is based on Item Response Theory (IRT; Bond & Fox, 2015), provides a very effective alternative for exploring the psychometric properties of measures and accounting for response bias (Bradley et al., 2015), thereby providing a better alternative for examining the psychometric quality of rating scales and informing scale improvements (van Zile-Tamsen, 2017). In the physical sciences, the properties subject to scientific or basic measurement, such as weight and length, can be physically concatenated along the measurement scale. Most measurement scales in the physical sciences are derived so that although the measurement units can be iterated, the value itself (e.g., temperature and density) cannot be concatenated physically.
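In the dichotomous Rasch model, that predicted probability is a logistic function of the difference between person ability (theta) and item difficulty (b), both expressed in logits. A minimal sketch (Python):

```python
import math

# Dichotomous Rasch model:
# P(correct) = exp(theta - b) / (1 + exp(theta - b)),
# where theta is person ability and b is item difficulty (in logits).
def rasch_probability(theta, b):
    return math.exp(theta - b) / (1 + math.exp(theta - b))

print(rasch_probability(1.0, 1.0))            # 0.5 (ability equals difficulty)
print(round(rasch_probability(2.0, 0.0), 3))  # 0.881 (able person, easy item)
```

When ability matches difficulty the predicted probability is exactly 0.5, which is why items and persons can be placed on the same logit scale, as in the person-item map reported later.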
Rasch measurement models are actually the nearest widely available approximation of these fundamental measurement principles for the human sciences

(Bond & Fox, 2015).

RStudio for Rasch. While Winsteps is commonly used for Rasch analysis, R is an open source programming language with several packages for Rasch analysis, including the eRm, ltm, and mirt packages. R is an integrated suite of facilities for data manipulation, measurement, and graphical display (R, 2012). Essentially, R is a repository of software packages that perform specific analyses such as ANOVA, IRT, multilevel modeling, and latent class analysis, facilitate data manipulation, and produce data plots. Usually, these packages are written by professional statisticians who do not receive financial compensation, submitted to and reviewed by the R Core Team, and then made available for use on the R website. R became publicly available as open-source software in 1995, meaning users are allowed to alter and distribute the software without limitation as long as they meet the licensing requirements. R seems to be commonly used in academic settings but less so in business environments, where commercial software still seems to dominate (Muenchen, 2014). RStudio version 1.2.5033 (https://rstudio.com/) is a user-friendly application that runs the R statistical environment; it was used for the descriptive and Rasch analyses in this study due to the varied and custom analyses available. For this study, the eRm, ltm, and mirt packages were utilized for item analysis, the ggpubr package and the base R cor.test() function were used for correlational analysis, and the base function sapply() was used for descriptive statistics.

CHAPTER FOUR

Results

This section is organized by research question and then by analysis type. The research questions answered are: 1) What aspects of science literacy can be measured using cognitive computational modeling, and how can a science literacy assessment artificial neural network (ANN) be developed? 2) How can visuals be ranked in order to determine a visual literacy level? Answering each of these research questions required multiple analyses, and in some cases, such as the Rasch analysis and the artificial neural network, an analysis was used in part to answer multiple questions.

The main results of this study include the theoretical connection between an individual's performance on a series of mental rotation tasks and their visual science literacy level. A second result is a predictive artificial neural network developed as a neurocognitive visual science literacy assessment. The third result is the identification of a hierarchy in visual literacy tasks that can be used with the developed artificial neural network for the creation of immersive pedagogical tools.

Research Question 1

Exploratory factor analysis findings from SPSS inform research question 1: What aspects of science literacy can be measured using cognitive computational modeling, and how can a science literacy assessment artificial neural network (ANN) be developed?

Exploratory Factor Analysis

Results suggest that the null hypothesis (H0: Ai = 0) is rejected. As chi-square is very sensitive to sample size, given the large sample we have, we expect a high chi-square that is

significant. As seen in Table 2, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.674 and Bartlett's test of sphericity was significant (χ2 = 172802370.187, df = 231, p < .001; Table 2), indicating that the data is appropriate for factor analysis, and the initial scree plot inspection of eigenvalues (Figure 5) shows a deviation from linearity that coincides with a 5-factor solution.

Table 2. KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .674
Bartlett's Test of Sphericity   Approx. Chi-Square       172802370.187
                                df                       231
                                Sig.                     .000

Figure 5. Scree Plot

Ideally, the EFA loading coefficients would indicate a simple factor structure, with a large loading coefficient on a single factor and loadings on the other factors as close to zero as possible. Removing items that do not illustrate a simple structure helps to develop a clean construct. Since 'large' is a relative term, a cutting score of 0.30 was used as sufficient loading, as per Portney & Watkins (2000). Consideration of these criteria resulted in the removal of 47 items, and the five factors account for 28% of the total observed variance in the measure.
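The simple-structure screen can be sketched as follows (Python; the loading values are hypothetical, but the 0.30 cutoff is the one used above):

```python
# Keep an item only if it loads >= 0.30 on exactly one factor
# (a "simple structure" screen).
def keep_item(loadings, cutoff=0.30):
    """loadings: this item's loading on each factor."""
    return sum(1 for l in loadings if abs(l) >= cutoff) == 1

print(keep_item([0.72, 0.05, 0.11]))  # True  (loads on one factor)
print(keep_item([0.45, 0.38, 0.02]))  # False (cross-loads on two factors)
print(keep_item([0.21, 0.18, 0.10]))  # False (loads on no factor)
```

Items failing this screen, whether by cross-loading or by loading weakly everywhere, are the ones removed to obtain a cleaner construct.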

Table 3. Task-Factor Breakdown

Artificial Neural Network

A collection of interconnected nodes (neurons) forms the artificial neural network built to explain the interdependence between cognitive abilities and the successful completion of tasks. The neurons establish the ANN's three distinct layers: input, hidden, and output. No computational function was performed by the input layer; instead, stimulation was distributed into the neural network. The tasks serve as the input for the purposes of this model. The hidden layer represents the cognitive attributes assigned to the tasks, and the output layer consists of the probabilities

of success and failure. Two artificial neural networks were developed within this study: an ANN using gradient boosted trees in RapidMiner and a multilayer perceptron in SPSS. Training of the artificial neural network in RapidMiner, as well as of the confirmatory model developed in SPSS, used a random stratified 1/4n split data approach with two training sets and two testing sets. The first training set (trainingset1) was used for feature selection and for determining the relative weights of possible predictors, while the second training set (trainingset2) was used to train the model.

For each of the proposed attributes, the Artificial Neural Network derives propagation weights, initially assigned at random, against the test set data (1/4n). As the signal travels from node to node within the network, the weights represent the strength of signal propagation. This GBT model found that the biggest determining factors, or predictors, of a successful mental rotation are the individual problem number of the 160 mental rotation items performed by each participant (80 Ball and 80 Dash), the response time, and fNIR optode #16, located along the right prefrontal cortex, which plays a vital role in processing visuospatial working memory (VSWM) (Suzuki, Kita, Oi, Okumura, Okuzumi & Inagaki, 2018) and episodic memory retrieval (Henson, Shallice, & Dolan, 1999). Table 4 displays the relative predictor weights for the artificial neural network.

Once the network was configured using the random weighting technique, it was trained by providing the ANN with a number of examples from the 1/4n data set (1/4n = 1,533,643) to illustrate how the ANN should behave. Analysis of the results of the trained ANN with the calibration set indicates a precise behavioral indicator of the performance outcomes of the subject based on the given cognitive attributes. Table 5 offers key model fit statistics.
The model shows an accuracy of 93.9% and an F1 score of 95.7% after running trainingset2; since the F1 measure considers both model precision and recall, it

provides a more accurate view of the true model fit.

Table 4. Relative Weights of ANN Predictors

Variable        Relative Importance   Scaled Importance   Percentage
Problems        33131.33              1                   0.692763
ResponseTime    8217.051              0.248015            0.171815
Hemoglobin16    1109.193              0.033479            0.023193
Hemoglobin12    701.6171              0.021177            0.014671
Hemoglobin1     616.2896              0.018601            0.012886
Hemoglobin8     491.6981              0.014841            0.010281
Hemoglobin4     464.4937              0.01402             0.009712
Hemoglobin9     407.9705              0.012314            0.008531
Hemoglobin5     375.0407              0.01132             0.007842
Hemoglobin11    330.0015              0.00996             0.0069
Hemoglobin3     318.8501              0.009624            0.006667
Hemoglobin2     318.4391              0.009611            0.006658
Hemoglobin13    283.7497              0.008564            0.005933
Hemoglobin6     260.8252              0.007872            0.005454
Hemoglobin15    234.7478              0.007085            0.004908
Hemoglobin7     219.2803              0.006619            0.004585
Hemoglobin14    183.9301              0.005552            0.003846
Complexity      160.3956              0.004841            0.003354

The developed Gradient Boosted Trees artificial neural network consisted of 140 trees with a maximum depth of 7 branches (Table 5). Each branch on a decision tree represents a choice or variable (i.e., the type of task, task complexity, signal frequency reaching threshold), and the model uses dynamic boosting, in which the trees get progressively more accurate because each learns from the preceding trees. As indicated by the high performance level, the data fit the model; and since the individual visual task is the model's biggest determining factor, a hierarchical nature to visual literacy is revealed, to be further examined with the Rasch analysis.
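The 'Scaled Importance' and 'Percentage' columns in Table 4 follow directly from the raw importances: each raw value is divided by the maximum and by the sum, respectively. A minimal sketch (Python, using the first two rows of Table 4; note that Table 4's Percentage column divides by the sum over all 18 predictors, not just these two):

```python
# Scaled importance = raw / max(raw); percentage = raw / sum(raw).
def scale_importances(raw):
    top, total = max(raw.values()), sum(raw.values())
    return {name: (v / top, v / total) for name, v in raw.items()}

raw = {"Problems": 33131.33, "ResponseTime": 8217.051}
scaled = scale_importances(raw)
print(round(scaled["ResponseTime"][0], 6))  # 0.248015, matching Table 4
```

The most important predictor always scales to exactly 1, which is why 'Problems' anchors the top of the table.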

Table 5. Model Fit Statistics

Neural Network Fit Statistic   Value
Mean Squared Error (MSE)       0.992783
R-square                       0.7720772
Area Under Curve (AUC)         0.992783
logloss                        0.19147338

Research Question 2

In order to investigate the hierarchical nature of the visual tasks and to address research question 2 by ranking the problems based on difficulty, we analyzed the measure with RStudio, using the sapply function for descriptive statistics; the eRm, ltm, and mirt packages for item analysis; and ggpubr along with the base R cor.test() function for correlational analysis.

Descriptive Statistics

Two types of representations of molecular structures were used for the mental rotation tasks: 3D Ball and Stick models (BL) and 2D Wedge and Dash models (dash). Ball tasks were completed 3440 times; participants were correct 2275 (66.1%) of those times. Of the 3360 times Dash tasks were completed, participants were correct 2321 times (69.1%) (Table 6). See Appendix A for full descriptive statistics.

Table 6. Mental Rotation Task Successful Completions

Mental Rotation Task   Tasks Completed   Tasks Completed Correctly   Percent
Ball and Stick (b)     3440              2275                        66.1
Wedge and Dash (wd)    3360              2321                        69.1

Rasch Analysis

In order to address research question 2 and rank the problems based on difficulty, we analyzed the measure with RStudio, using the eRm, ltm, and mirt packages for item analysis and ggpubr along with the base R cor.test() function for correlational analysis. Estimation of reliability for the measured constructs used Separation Reliability, with Pearson's chi-square for Rasch goodness of fit. With a separation reliability of 0.8679, measurement of the construct is estimated to be adequate for this type of measure.

Figure 6. Person-Item Map
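Under the dichotomous Rasch model used here, the probability of a correct response depends only on the difference between person ability and item difficulty, both expressed in logits. A one-line sketch (the `theta` and `b` values are illustrative; the actual estimation was done with the eRm package in R):

```python
from math import exp

def rasch_p(theta, b):
    """P(correct) for person ability theta and item difficulty b, in logits."""
    return 1.0 / (1.0 + exp(-(theta - b)))
```

When ability equals difficulty the probability is exactly 0.5, and each additional logit of ability above an item's difficulty multiplies the odds of success by a factor of e, which is what places persons and items on the same interval scale in the person-item map.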

Convergent Evidence

Once the Rasch analysis provided a hierarchy of difficulty for the molecular visuals, questions were split into three sections based on difficulty ranking. The questions indicated as simplest (lowest on the difficulty list) were encoded with a 1 for low difficulty, the middle third of the questions were encoded with a 2 for medium difficulty, and the top third of the questions were encoded with a 3 for advanced difficulty. A correlation analysis was then performed using RStudio, comparing the difficulty-encoded results from the Rasch analysis with the difficulties indicated by the chemistry expert analysis. The association between the analyses is significant, with a Pearson correlation coefficient of 0.3770166 (p = 8.919e-07) and a Kendall rank correlation coefficient of 0.3479463 (p = 1.196e-06). Since the p-values for both the Pearson correlation and the Kendall rank correlation are less than the significance level alpha = 0.05, we can conclude that expert rankings (expertdiff; Figure 7) and item response data (raschdiff; Figure 8) of visuals are directly related and significantly correlated. Therefore, the specific components of the visuals found to be more challenging can be identified, which is beneficial to the development of educational resources that target the more challenging aspects of interpreting science visuals.
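The two coefficients reported above can be computed from paired difficulty rankings as follows. This is a pure-Python sketch for illustration; the study itself used R's cor.test(), and this version computes Kendall's tau-a, which ignores ties, whereas cor.test() handles ties via tau-b.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    c = d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (n * (n - 1) / 2)
```

Both coefficients range from -1 to 1; the moderate positive values reported here (0.377 and 0.348) indicate that expert rankings and Rasch difficulties tend to order the visuals the same way without agreeing perfectly.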

CHAPTER FIVE

Discussion

This research has provided a rationale for integrating and expanding multiple distinct research areas relating to subject learning. The first is the use of cognitive diagnostic methods to evaluate the learning of subjects as it applies to the cognitive attributes used during science processing. The second is the analysis and modeling of the connection between processing factors that contribute to visual science literacy. The third is the use of an artificial neural network to predict cognitive diagnostics. The literature discussed in this dissertation incorporates work from multiple subject areas, such as science education, the psychology of education, measurement, and computational psychology.

Research Question 1

To address research question 1, "What aspects of science literacy can be measured using cognitive computational modeling and how can a science literacy assessment artificial neural network (ANN) be developed?", we used the developed GBT ANN to predict the effectiveness of mental rotation in molecular models. We developed a GBT ANN using fNIRS (neurocognitive) data which predicted mental rotation performance accurately. The findings show the creation of a successful ANN model of the student's understanding of visual literacy in science, which provides useful data to educational researchers. The theoretical model gives an overview of the dynamics of the science classroom-based learning processes associated with visual texts. This Rapidminer model was validated with confirmatory results when designing an MLP in SPSS using the same training

data (Appendix B). Mental rotation is a visual literacy task in science, and since the ANN had a high degree of accuracy in predicting mental rotation performance, with the largest predictor being the individual problem number and its specific visual elements, the results of this study indicate that visual science literacy can be tested using an artificial neural network, identifying the aspects of visualizations that are more challenging for students and for which more targeted interpretation resources are needed. Fewer objects and characteristics can be evaluated in neural network analysis than in traditional cognitive diagnostic tests. The use of an ANN as a statistical processor means that the probabilistic assessment of the data is included, especially at the input and output nodes. The input nodes can be recast as input patterns, and the output nodes reworked to produce higher-density samples and randomized probability estimates. This connection to inferential statistics enables researchers to connect ANNs to functional representations of real-world problems such as cognitive characteristics, probabilities of completion, and the development of hierarchical relationships. In this context, an ANN offers answers to a variety of problems through its complicated statistical modeling with a focus on versatility. The ANN model shows a good fit and approximates human learning related to the perception of visual texts composed of scientific material. Model fit determined how many iterations resulted in convergence. Good model fit suggests a computational-cognitive model which describes the cognitive attributes underlying the visual sciences (H0, R2=0). Upon adding tasks as input nodes and attributes as the hidden nodes, a sophisticated model of cognition relating to science processing becomes possible. Given the subtle, complex, and poorly understood cognitive evaluation of science

processes, an incremental approach like this one produces high-quality results. Factors developed through the conjunctive cognitive diagnostic model provide an overview of possible attributes underlying the processing of science. The network was initially trained on neurocognitive data sets to evaluate each factor's cognitive process. Subsequent runs of the model centered on using novel data sets to provide an ANN skill test for completing tasks. Tasks did not overlap due to the basically unidimensional nature of the factors, but individual attributes did overlap tasks, and that overlap helps to explain the non-linear outcomes associated with learning. The model completed tasks correctly a significant portion of the time, thus validating the model and creating a predictive model of subject learning. System outcomes can be controlled by increasing and decreasing attributes as a function of interventions, with probabilities of task performance as the outcome; by manipulating the distribution of cognitive attributes, one can experiment with the role each attribute plays in visual science literacy.

Research Question 2

Rasch analysis was used to answer the second research question: "How can visuals be ranked in order to determine a visual literacy level?" The standard dichotomous Rasch model, an IRT model, is a one-parameter logistic model which assumes that the only factors affecting test performance are the characteristics of items, such as their level of difficulty or their relationship to the test-measured construct, and the characteristics of participants, such as their level of ability. The equation for test performance when items have two or more possible response options can be found in Appendix E (Equation 2). Using the Rasch logistic model, the values of item complexity and the skill levels of the individual are measured and reported in logits. A logit scale translates task complexity

and skill levels into interval-level ratings and puts these two values on the same continuum. This continuum reflects the Rasch model's primary premise, the unidimensionality of the scale. In other words, there is a dominant dimension that underlies the response of an individual. Secondly, we used principal component analysis to analyze the loading of the components. The eigenvalues imply one underlying dimension, showing positive residuals in the contrast plot. The study of residual patterns reveals that the residuals loaded onto the original subscales in one direction. This unidirectional loading is reflective of the construct's unidimensional nature. The purpose of this analysis was to achieve an interval-level scale of visuals and, in doing so, use a Rasch model to determine the degree to which visual objects are organized along a hierarchy. The factors tend, however, to be basic components of the visual text. CFA also provides a means of determining that the five variables are locally independent and, consequently, that IRT analysis can be used to parameterize and model the components. PCA offers a secondary confirmation of the number of factors, where the study of eigenvalues shows five factors using the root > 1 criterion. Rotated solutions often show a simple structure with five factors which are linearly independent. EFA cross-validation using 1/4n CFA indicates that the suggested data structure and factor loading are representative of five latent traits or factors. As these attributes span multiple factors, the factors serve as an organizational structure for the underlying cognitive attributes. The factors, however, also help in conceptualizing relationships and organizing the input nodes within the ANN. The factor analysis also provided a means of eliminating tasks that would undermine the study's data structure. Poor factor loading resulted in the elimination of 47 items.
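The root > 1 (Kaiser) criterion referenced above can be illustrated with the simplest possible case: for two standardized variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r, so only the first component survives the criterion whenever the variables are positively correlated. This is a toy sketch of the retention rule, not the study's five-factor solution.

```python
def eigvals_2x2_corr(r):
    """Eigenvalues of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    return (1 + r, 1 - r)

def kaiser_retained(eigenvalues):
    """Number of components retained under the root > 1 criterion."""
    return sum(1 for ev in eigenvalues if ev > 1)
```

In the full analysis the same rule is applied to the eigenvalues of the complete correlation matrix, where five eigenvalues exceeding 1 indicated five retainable factors.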
These eliminations primarily concerned activities associated with aggregating work objects across multiple tasks. The aggregate nature of the task items confounds the factor analysis because of

their association across multiple items not necessarily related to each other. Overall results in terms of reliability suggest that the task functions as an internally consistent indicator of task completion linked to each of the five variables. This internal consistency thus ties the factor clustering to the tasks. The factors provide a systematic way of sorting out the task relationships that result from completing complex tasks. Analysis of the larger complex task is possible via the ANN thanks to the aggregation of the simple tasks.

Limitations. Despite these findings, some limitations to the study should be considered. The inherent flexibility of ANNs sometimes leads to overfitting, which increases as variables increase, resulting in the randomization of components. This can lead to lower performance on future data or new test data. To control for this, it is important that the data have a similar or higher level of complexity, and the application of backpropagation, as used in GBT, reduces the number and type of overfitting errors (Ajwani et al., 2013). Another limitation of this study is that fNIRS tends to overrepresent the blood flow in the brain, especially in high-stress environments such as testing (Racz, Mukli, Nagy, & Eke, 2017). In this study, a signal analysis was performed in order to correct for this. Other limitations of the study were due to the use of secondary data. Although the ability to extract new information from previously recorded data may provide many benefits related to its unobtrusive speed of analysis and its ability to triangulate and reveal serendipitous relationships among the variables, it is not without its negative aspects. One challenge presented by using secondary data in this study was due to the packaging of the stimulus images, the decentralized storage of the data, and the nonstandardization of dataset file types. As this data was from the NSF-funded Project Leveraging Neuroscience for the Education of Science (LENS), it was

recorded by researchers at multiple universities. Although the data was shared with the NSF repository, there was limited ease of access for new secondary data researchers. Therefore, although some images for the individual mental rotation tasks were available for inspection in this study, the 3D ball and stick images were not. However, these images will be evaluated in a future study. Secondary data science is an emerging field, and one that is not without its growing pains. As secondary data becomes more utilized in mainstream research, a global standardization of formats will be necessary. For example, accessing the EEG and eye-tracking data stored in the repository requires the secondary data scientist to have the software used to generate the data. Exporting all data in standardized formats such as CSV, or in multiple applicable formats, would help the data to be reevaluated by others to find meaning in new patterns in the data.

Future Implications

One of the new areas of neuroscience is the study of the physiological and cognitive systems involved in visual perception (Felten, 2008). My research follows that strand, and when working with autonomic systems, many dependent variables are obtained in real time. This lends itself to the second perspective on valid measurements where, as studied by Lamb et al. (2018), an individual's spatial ability is illustrated through the analysis of multiple dependent variables during a set of similar tasks. The value of critical thinking and working memory in education has been supported by research (Allen, Baddeley, & Hitch, 2017); yet, despite increased attention to understanding learning processes in education, the rapid development of the cognitive sciences has outpaced traditional educational research's ability to keep pace with the verification, development, and translation of new ideas from cognitive science to education. In

educational neuroscience and cognitive science, the main conceptual challenge is to link levels of analysis, connecting brain processes (functions) and behavior (actions) in a mutually coherent way (Goldwater & Schalk, 2016; R. Lamb, 2014).

Implications for Researchers. This research developed an ANN to assess a latent cognitive trait utilizing neurocognitive measurements as predictors. In this situation, the latent trait was visual science literacy; however, this study supports the development of ANNs capable of assessing other latent cognitive traits such as consciousness (Sengupta et al., 2020), which can be used in sleep research (Farooq & Jain, 2017) as well as for individuals with conditions like locked-in syndrome (Dash et al., 2019). For example, researchers comparing fMRI data from non-responsive individuals and from healthy volunteers saw the same region of the brain light up when asking the subjects to imagine supplementary motor tasks such as walking or playing tennis (Owen et al., 2006). Since this ANN used data produced by the autonomic nervous system, these involuntary measurements can potentially be used to investigate individuals in non-responsive states, such as perpetual comas, where self-reporting is an impossibility (Michel et al., 2019). The way this ANN is able to predict a latent trait lies in machine learning's inherent capacity to utilize Bayes' theorem in computations to develop Bayes networks. Bayes' theorem gives the probability of an event based on knowledge of conditions related to that event. A corresponding Bayes network uses those probabilities to model the relationships of the variables in order to answer probabilistic questions. This is the same process used by the decision trees, regressions, and cluster analyses used to train machines to make predictions (Belland et al., 2017).
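Bayes' theorem as used in such networks can be stated in one line of code. The numbers below are purely illustrative, not drawn from the study's model:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Illustrative: suppose a correct response (B) occurs in 60% of trials overall,
# a high-skill state (A) has prior probability 0.5, and high skill yields a
# correct response 90% of the time. The posterior P(high skill | correct)
# is then 0.9 * 0.5 / 0.6 = 0.75.
posterior = bayes(p_b_given_a=0.9, p_a=0.5, p_b=0.6)
```

A Bayes network chains many such conditional probabilities together so that observing one variable (e.g., a response) updates belief about latent variables (e.g., skill states), which is the inference pattern the ANN exploits when predicting a latent trait.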

Model Development Software. While Rapidminer was able to develop a useful model, it is preferred by data miners and not by the curriculum designers who may use this model in the development of pedagogical tools and resources. SPSS is usually used for statistical modeling in educational research (Connolly, 2007), which is why a confirmatory model was built using this ubiquitous program. While SPSS programming can be considered by some to be tedious, the advantage of using SPSS over other statistical environments is its considerable graphical simplification, which makes the point-and-click menus in SPSS among the easiest to learn of the statistics programs commonly used by educational researchers. Stata and SAS also have point-and-click capabilities but require text-based syntax for many procedures. For education research, Stata is favored by those who do not have complex data management needs but need access to state-of-the-art statistical procedures; SAS is preferred by researchers who need to handle complex data sets and work many hours a day with the package; and SPSS is typically favored by those who do not need to perform complex data management activities or cutting-edge statistical procedures. Since this analysis uses secondary neurocognitive data, data miners would be most likely to be setting up the data for analysis, and data miners often use specialty software (e.g., Rapidminer) with resources to increase usability across deployments. However, educational researchers, who typically use industry standards such as SPSS, are likely to use that information to drive educational pedagogy and theory, which is why ANNs were developed in both programs. The fact that the model could be built in both, with such accuracy and in multiple formats, shows how flexible this form of analysis can be for neurocognitive data.

Secondary Data in Educational Research. Because advances in computational technology and internet-based technologies allow vast

quantities of data to be collected, and developed archives make data accessible to researchers worldwide, the practicality of using existing data for analysis is becoming increasingly important (Smith et al., 2011). Using existing data provides a viable option for researchers who may have limited time and resources, as well as a great way to find new uses for data collected for alternative purposes, saving time and resources while extracting as much information as possible from a dataset. An underutilized empirical research exercise, secondary data analysis follows the same basic principles of research as primary data studies and has steps to be followed just as in any research method. This systematic research methodology classifies secondary data analysis as a viable method for engaging in scientific investigation (Johnston, 2014). The use of secondary data has social, methodological, and theoretical benefits. Secondary analyses are an unobtrusive, democratic type of research. Ethically, by not gathering additional data from individuals and by respecting the right of an individual to be left alone, privacy is preserved. This ethical benefit makes the approach well suited for research into sensitive topics and into hard-to-reach and disadvantaged populations (Johnston, 2014), and the growing availability of low-cost, high-quality datasets means that secondary analysis will ensure that all researchers have equal opportunities for work that was previously only open to wealthier research institutions (Smith, 2011). While a particular research question will determine how to interpret the secondary data, there is a systematic approach of procedural and evaluative measures that goes beyond the statistical process or algorithm used to analyze the data. First, the research question to be answered needs to be developed.
This is followed by the selection of an appropriate dataset to be used; finally, the validity, reliability, and generalizability of the dataset are tested before defining the research method (Johnston, 2014).

Secondary research provides countless possibilities for the reproduction, reanalysis, and reinterpretation of existing work. Secondary research can also allow data from other sources to be triangulated, for example by comparing sample survey results with census data or early study findings with more contemporary studies (Smith, 2011). The National Science Foundation (NSF) began to focus on using secondary, or big, data in the Harnessing the Data Revolution initiative of its '10 Big Ideas' as of 2017, as it provides researchers with the opportunity to undertake longitudinal analyses, research and understand past events, re-analyze primary studies with new perspectives, and engage in exploratory work to test new ideas, theories, and models for research design. This study, therefore, makes use of secondary data in exploring new information from previous NSF-funded research on multimedia learning in science education.

Implications for Educators. This neurocognitive computational model would be ideal for the personalization of immersive educational resources such as VR and AR, which can educate students on topics that are too abstract or impossible to create or manipulate in real life (Hand et al., 2016). In fact, the use of neurocognitive measurements to guide VR programming and assess cognitive processes while learning is an emerging field. A future direction would be to take corresponding eye-tracking and EEG data and incorporate these streams of neurocognitive data as additional inputs to the ANN in order to develop a more robust model of student visual science literacy, which would then be embedded within the backend processing of an immersive educative resource for a personalized learning experience. Studies using recent brain imaging techniques have shown that VR encounters activate the same brain areas as those activated in the corresponding real situation (Campbell et al., 2009; Clemente et al., 2010).
Although fMRI has been considered the gold standard in relation

to dynamic brain imaging, the huge dimensions of the required machinery, its disturbing noise, and its electromagnetic interference with other instrumentation, together with the horizontal and unnatural position of the participant during scan acquisition, constitute the most limiting factors in the use of the technology with VR paradigms. However, non-invasive fNIRS headband sensors can be worn with a virtual reality head-mounted projection, making fNIRS an ideal measurement to supply an immersive educative experience with real-time neurocognitive data for the personalization of individual student learning. This will, therefore, require a new form of professional development for educators and curriculum designers. The immersive educative experiences will ultimately be coded by engineers, and many resources are being created to help build a global immersive educative network to be shared with educators around the globe. Educators will now need to understand the data being recorded by the neurocognitive devices, or at least understand enough to provide interventions and support. Specifically, although the developed ANN is able to indicate a relative visual science literacy level, it is up to the educator at this point to indicate what the student should be doing to target those latent traits. However, if this ANN is used within an immersive learning environment, the data will guide the technology to the next steps that target the necessary skills and will automatically guide the student there; it is at that point the job of the educator to know the available immersive educative resources and how each could help the individual student succeed. This shifts the educator into more of a mentor role than a content resource.

Implications for Science Educators. The field of education, and specifically science education, would benefit from the expansion of field measurement capabilities by providing a new means of measuring subject

