EDULEARN_2016_16_after_submition

Published by Maria Zhekova, 2023-06-27 17:17:17


AN AUTOMATIZED QUALITY ASSESSMENT SYSTEM BASED ON A NATURAL LANGUAGE INTERFACE FOR EXTRACTION AND PROCESSING OF RAW DATA OF THE EDUCATIONAL SYSTEM

G. Totkov1, M. Zhekova2, HR. Kostadinova3, G. Pashev4

1,4 University of Plovdiv “Paisii Hilendarski” (BULGARIA)
2 University of Food Technology (BULGARIA)
3 New Bulgarian University (BULGARIA)

Abstract

This study covers techniques and tools suitable for creating a natural language interface (NLI) for a system for dynamic quality monitoring in education. The main objective is to automate the generation of values of quantitative indicators from raw data about the assessed objects. Implementing such an NLI requires the creation of a) a model of the domain area (in this case, the educational domain); b) a model of a quality assessment system (including a hierarchical system of standards, criteria, qualitative and quantitative indicators, etc.); c) a linguistic database and procedures for processing texts written in natural language; d) methods and tools for automated data extraction from various information systems in order to form the values of quantitative indicators, etc. Models, a methodology and tools for creating an NLI for a specific digital environment for the evaluation and accreditation of Bulgarian higher education are developed. Experiments on the evaluation of quantitative indicators (formulated in natural language) used in monitoring professional fields in Bulgarian higher education are presented.

Keywords: automated data extraction, natural language interface, quality monitoring system, text-to-query, database

1 INTRODUCTION

Quality assessment and accreditation of higher education institutions are areas in which the related processes need to be automated.
The research focuses on the processing of academic data collected by higher education institutions (HEIs) to assess the quality of higher education (HE), and in particular on the automatic extraction of the values of quantitative indicators in a quality assessment and accreditation system. The information systems of universities, faculties and departments continuously generate data of different kinds and structures, and their volume grows daily. This raises the question: can the data needed to form the values of quantitative indicators be extracted and filled in automatically from the raw data in the information systems of the relevant HEI? A positive answer opens up new possibilities for the designers of a quality monitoring system in education: by analysing the natural language text of a target quantitative indicator, the system can recognize the primary (raw) data that build it, generate a query to a database (DB) and extract the indicator's value.

The system is designed to have a natural language interface (NLI) and has the following functionalities:
- Creation and maintenance of a conceptual and linguistic model of the domain area (DA);
- Maintenance of a hierarchical system of standards, procedures, criteria, and qualitative and quantitative indicators for evaluating the quality of HE;
- Building a system of rules for evaluating quantitative indicators;
- Building a system of procedures for processing texts in natural language;
- Building a system of rules for transforming language parameters into a report request to a university database;
- Maintenance of a set of templates of grammatical constructions with which questions can be asked to the NLI of the dynamic monitoring system, in order to answer questions related to the level and quality of education in Bulgaria.

The study is based on a general NLI model [1] that answers user questions to a university information system (IS) in order to extract information and generate reports related to the students of a given university. The work presents a conceptual and computational model of the DA and a linguistic model of the DA, as a special case of the general model, together with a script that processes natural language text questions to extract values from raw data, and experiments.

2 MODELS, METHODS AND APPROACHES FOR QUALITY ASSESSMENT OF HIGHER EDUCATION

Digitization, the introduction and development of new technologies, and globalization are all processes that call for structural and functional changes in the educational process. Under modern conditions it is highly relevant to review the existing approaches to education management in order to ensure and improve the quality of educational services, which is impossible without introducing new methods and improving existing forms of management.

For years, researchers and organizations have analysed the characteristics of training that guarantee high results and have offered different definitions of the concept of "quality of HE" [2]. Some researchers [3] define quality in education as "a set of input, process and output elements of the educational information system, providing services that fully satisfy internal and external evaluators by meeting their explicit and implicit requirements". They offer 7 models for determining the essence of the concept of quality in HE: goal and specification, input resources, processes, satisfaction, legitimacy, absence of problems, and organization of learning [4]. In [5], quality is considered a complex system that includes learners, learning content, processes, learning environments and outcomes. Each quality management system performs its monitoring by extracting and post-processing results from various academic sources. A proven approach to ensuring the quality of HE is to use a mutually complementary system of internal and external forms of control and evaluation, carried out according to standardized rules, methodologies and procedures [4].
Another approach to evaluating the quality of various services, such as e-learning, e-courses, programs, educational e-resources and software, is a system of questions (questionnaires). The iQTool uses questionnaires to assess the quality of teaching and the quality of learning materials [6]. The integrated e-learning quality assessment framework [7] likewise uses a questionnaire and evaluates the quality of learning according to 5 factors: learning objects, design of learning objects, learner services, program presentation, and technological infrastructure. The quality system developed by the University of Graz for its own needs follows the logic of the quality management cycle: setting goals, planning how to achieve them, taking action, analysing the results and improving the process. The Quality assessment tool creates an assessment project with a tree structure of analysed characteristics, assigned weights, and questions with satisfaction levels for each characteristic. From the data entered, the tool generates a questionnaire and a subsequent report [8]. TESTA is a joint project of four universities that aims to improve the quality of student learning through curriculum-level assessment. The cross-university approach has been tested with more than 100 study programs in over 40 UK universities and works with teachers, students and managers [9]. All the systems and approaches considered collect, analyse and evaluate raw data (characteristics) of learning objects, but no tool or system has been found that understands the meaning of a target indicator and identifies the raw data that make it up. In Bulgaria, the creation of NLI systems that automatically extract values from a freely entered natural language query is a poorly researched area.
Attempts are being made to populate linguistic resources, train models, etc., but no procedures are known for analysing and processing natural language text and extracting the metadata that make it up.

3 GENERAL MODEL OF AN NLI TO A HIGHER EDUCATION ASSESSMENT AND ACCREDITATION SYSTEM

An NLI is the most frequently mentioned approach that allows users to enter their queries to an information system (IS) in natural language [1]. Natural language processing (NLP) of a user query starts with recognizing the explicit parameters in the grammatical construction, continues with assumptions and checks that identify implicit information, and ends with generating a query and retrieving the information target.
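The three stages named above (explicit parameters, implicit information, query generation) can be illustrated with a small pipeline. This is a sketch under stated assumptions, not the authors' implementation: the keyword table, the default academic year and the `teachers` table with its columns are all hypothetical stand-ins for the DA model and an HEI database, and an in-memory SQLite database is used only so that the example runs on its own.

```python
import re
import sqlite3

# Stage 1 vocabulary (hypothetical stand-in for the DA model):
# phrase in the question -> (database column, value)
KEYWORDS = {
    "primary contract": ("contract_type", "primary"),
    "habilitated": ("habilitated", 1),
}
# Hypothetical default, assumed when the question names no academic year
DEFAULT_YEAR = "2022/2023"


def explicit_parameters(question: str) -> dict:
    """Stage 1: recognize the parameters stated explicitly in the question."""
    q = question.lower()
    return {col: val for phrase, (col, val) in KEYWORDS.items() if phrase in q}


def implicit_parameters(question: str, params: dict) -> dict:
    """Stage 2: fill in information the user left implicit (here, the academic year)."""
    match = re.search(r"\b\d{4}/\d{4}\b", question)
    params["academic_year"] = match.group(0) if match else DEFAULT_YEAR
    return params


def generate_query(params: dict) -> tuple[str, list]:
    """Stage 3: build a parameterized COUNT query over the hypothetical `teachers` table."""
    columns = sorted(params)  # column names come only from our own vocabulary, never from user text
    where = " AND ".join(f"{c} = ?" for c in columns)
    return f"SELECT COUNT(*) FROM teachers WHERE {where}", [params[c] for c in columns]


def run_pipeline(question: str, conn: sqlite3.Connection) -> int:
    params = implicit_parameters(question, explicit_parameters(question))
    sql, args = generate_query(params)
    return conn.execute(sql, args).fetchone()[0]


# Toy HEI database, kept in memory so that the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE teachers (name TEXT, contract_type TEXT, habilitated INTEGER, academic_year TEXT)"
)
conn.executemany(
    "INSERT INTO teachers VALUES (?, ?, ?, ?)",
    [
        ("T1", "primary", 1, "2022/2023"),
        ("T2", "primary", 0, "2022/2023"),
        ("T3", "civil", 1, "2022/2023"),
        ("T4", "primary", 1, "2021/2022"),
    ],
)

print(run_pipeline("How many habilitated teachers are on a primary contract?", conn))  # 1
print(run_pipeline("Teachers on a primary contract in 2022/2023", conn))  # 2
```

The first question names no year, so stage 2 supplies the assumed default; the second states the year explicitly. Placeholders (`?`) keep user text out of the SQL string itself.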

Figure 1. A general model of a digital system with NLI for dynamic quality monitoring

The system for monitoring and evaluating the quality of higher education institutions (HEIs) stores aggregated data, according to established standards, for each HEI, professional field (PF) or doctoral program. It performs analysis, monitoring and evaluation based on accepted and agreed standards and procedures, and maintains criteria and indicators for forming evaluations. Its database consists of quantitative and qualitative indicators, grouped under the relevant criteria and assigned corresponding relative weights and numerical values. Comparing these indicators reveals the actual strengths and weaknesses of the HEIs in a given field and allows them to be ranked accordingly. The goal is to unify the data of the relevant universities and collect them periodically, once or twice a year, in a single IS that serves all interested parties. The objectivity of the evaluation of higher education achievements in each specific PF is ensured by unifying the inspection periods for all HEIs.

For a quality assessment system in education to extract and fill in the quantitative values of its assessment indicators automatically, it must be able to identify and process the raw data from which the query to the university database holding these data is formed. The process of ensuring quality monitoring of education in HEIs follows a certain logical sequence of actions:
- The raw data are formed by requesting data from the higher schools regarding the accredited professional field or doctoral program. They relate to a time period or moment (calendar or academic years, dates, etc.), and the evidence for their credibility is systematized in the relevant tables;
- The HE quality assessment system collects and summarizes these data;
- On their basis, the values of the quantitative indicators are calculated, which in most cases are relative parts.

Some of the data in the raw-data tables can be retrieved through requests to the information systems of the relevant HEI (if they offer such functionality), as well as from national data sources for higher education such as the National Centre for Information and Documentation (NACID), the Rating System (RS) of higher education, and others.

The task of the NLI is to extract the meaning of an indicator (quantitative or qualitative) and reach the raw data that make it up. For example, if the indicator below is posed to the NLI, it needs to identify the two variables that make up the "relative share in percent" relationship and then perform the requested operation on the retrieved values:

Indicator: n = Relative share (in percentages) of the number of disciplines compared to their total number in the PF conducted in a foreign language.
Raw data: A = disciplines conducted in a foreign language; B = all disciplines in the professional field.
Result: $\mathrm{Val}(n) = 100 \cdot \frac{\lvert A \rvert}{\lvert B \rvert}$
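The relative-share computation above is one of four calculation templates (Q1 to Q4, listed in Table 5). A minimal sketch of the four formulas over a set S of raw-data records, a property P and an attribute A is given below, applied to the foreign-language example. The `Discipline` record, its `language` field and the sample data are hypothetical; in the system the records would come from the HEI database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def q1_count(s: Iterable[T]) -> int:
    """Q1: Val(n) = |{x in S}|."""
    return sum(1 for _ in s)


def q2_count_with(s: Iterable[T], p: Callable[[T], bool]) -> int:
    """Q2: Val(n) = |{x in S : P(x) = 1}|."""
    return sum(1 for x in s if p(x))


def q3_relative_share(s: list[T], p: Callable[[T], bool]) -> float:
    """Q3: Val(n) = 100 * |{x in S : P(x) = 1}| / |S|, i.e. 100 * |A| / |B| with B = S."""
    if not s:
        raise ValueError("S is empty: the relative share is undefined")
    return 100.0 * q2_count_with(s, p) / len(s)


def q4_average(s: list[T], a: Callable[[T], float]) -> float:
    """Q4: Val(n) = (1/|S|) * sum of A(x) over all x in S."""
    if not s:
        raise ValueError("S is empty: the average is undefined")
    return sum(a(x) for x in s) / len(s)


@dataclass
class Discipline:
    name: str
    language: str  # hypothetical attribute: language of instruction


def in_foreign_language(d: Discipline) -> bool:
    """The property P of the example indicator."""
    return d.language != "Bulgarian"


# Hypothetical raw data: S (= B) is the set of all disciplines of the PF
pf = [
    Discipline("Databases", "Bulgarian"),
    Discipline("Machine Learning", "English"),
    Discipline("Algorithms", "Bulgarian"),
    Discipline("Compilers", "Bulgarian"),
]

print(q3_relative_share(pf, in_foreign_language))  # 25.0
```

Here |A| = 1 and |B| = 4, so Val(n) = 25.0. The empty-set checks reflect that a relative share or an average over no records is undefined rather than zero.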

The research methodology is based on a hybrid approach that facilitates natural language search and translates human intent into instructions for the system's database. The work proposes a methodology for software modelling and for creating the information resources needed to synthesize a report request from raw data in order to extract the target answer. The approach transforms indicator texts from an HE quality assessment system into natural language commands: the parameters are extracted from the text and replaced with constants that correspond to linguistic units. The time period of the request and any other specific conditions (such as professional field, university, program or course) for which information is sought are then identified.

Implementing such an NLI to a specific digital system for the evaluation and accreditation of HE, in order to generate reports, requires the creation of a) a model of the domain area (DA) (in this case, the education system); b) a model of a quality assessment system (including a hierarchical system of standards, criteria, qualitative and quantitative indicators, etc.); c) a linguistic database and procedures for processing natural language texts; d) methods and tools for automated extraction of data from various ISs in order to form the values of quantitative indicators, etc.

3.1 CONCEPTUAL NETWORK OF CONCEPTS FOR DA

To create a conceptual network of the DA concepts, a logical scheme of nodes and arcs is built, imitating the way the human mind works. The diagram visually describes the connections and relationships between the elements participating in the DA under consideration. The specific field (higher education) includes concepts describing each of the participating objects, entities, resources and systems and the relationships between them, including the language units that build the quantitative indicators in the quality assessment system.
Physically, the conceptual model can be implemented as a network or a relational database. In this case, the network model was chosen because it has a number of advantages for managing data with a complex structure.

Figure 2. Part of the conceptual map with the logical connections between objects and subjects in the DA
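As a rough illustration of the network model, the concept map can be held as a directed graph whose arcs carry linking words, with the literals for the DA model read off as the set of node names. The triples below are hypothetical and use literals that appear in Table 1; they show the structure only and are not copied from the map in Fig. 2.

```python
from collections import defaultdict

# Hypothetical (concept, linking word, concept) triples in the style of a CmapTools map
ARCS = [
    ("Student", "studies in", "Speciality"),
    ("Student", "has", "FacNumber"),
    ("Student", "follows", "uPlan"),
    ("uPlan", "contains", "Discipline"),
    ("Assistant", "is a", "Position"),
    ("Position", "is held under", "PrimaryContract"),
    ("Participant", "works on", "Project"),
]


class ConceptNetwork:
    """Directed labelled graph: nodes are concepts, arcs carry linking words."""

    def __init__(self, arcs=()):
        self.out = defaultdict(list)  # concept -> [(linking word, concept), ...]
        for src, link, dst in arcs:
            self.add(src, link, dst)

    def add(self, src: str, link: str, dst: str) -> None:
        """The network is not static: arcs and nodes can be added at any time."""
        self.out[src].append((link, dst))
        self.out.setdefault(dst, [])  # register sink nodes as concepts too

    def neighbours(self, concept: str) -> list:
        """Concepts one arc away from `concept`, with their linking words."""
        return list(self.out.get(concept, []))

    def literals(self) -> list:
        """All node names, i.e. the literals handed over to the DA model."""
        return sorted(self.out)


net = ConceptNetwork(ARCS)
print(net.neighbours("Student"))
print(net.literals())
```

Storing outgoing arcs per concept makes "what is linked to this concept" a single lookup, which is the neighbourhood query the DA model relies on when one parameter is known and the others must be found next to it.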

For visualization, the CmapTools tool was used (Fig. 2): concepts are organized into nodes, and the connections between them are arcs labelled with linking words. Concepts for the network can be extracted from text documents, from the quality assessment system itself (from its interface and database), from the ISs of HEIs, from educational frameworks, standards, etc. The network is not static; it can constantly develop and expand. The literals that make up the DA model are extracted from it automatically.

3.2 DOMAIN AREA MODEL

The DA model contains characteristics of, and relations between, the literals from the conceptual map and concepts in a given natural language (here Bulgarian, although any other natural language could be used). Each relation provides a means of transforming linguistic phrases with fixed elements (nodes and arcs) in order to discover the relationships between natural language concepts and the MapLiterals that correspond to them. Thus, when one or more language parameters are known (explicit), the others in their neighbourhood can be easily identified. The DA model is a kind of vocabulary (words and phrases) with linguistic characteristics for each of the concepts. The columns of the DA model have the following meaning:
- NLPhrase contains the language unit that characterizes the DA concept;
- Synsets is a list of synonyms, i.e. words that can be used interchangeably;
- MapLiteral is a constant (literal) that corresponds to a natural language word or phrase;
- MapRole is the role of each concept in the conceptual scheme of the DA.

Table 1.
Model of the Domain area

№ | NLPhrase | Synsets | MapLiteral | MapRole
---|---|---|---|---
1 | academic year | school term, term, college year, academic term, grade | AcademicYear | node
2 | assistant | associate, appointee | Assistant | node
3 | discipline | subject, practice | Discipline | node
4 | position | post, seat, stand, rank | Position | node
5 | primary contract | basic contract, first contract, main contract | PrimaryContract | node
… | … | … | … | …
37 | project | plan, idea, proposition | Project | node
38 | publication | periodical, paper, article, book, scientific work | Publication | node
39 | sanctioned | penalty, sanction, sanction mark, penalized | Sanctioned |
40 | speciality | area, field, area of specialization, branch of knowledge, field of study | Speciality | node
41 | student | learner, undergraduate, collegian, trainee | Student | node
42 | participant | participator, associate, contributor, colleague, member | Participant | node
43 | curriculum | a course of study, modules, schedule, studies, subjects | uPlan | node
44 | faculty number | number, id | FacNumber | node
45 | education form | teach type, educated type, train form | EduForm | node
46 | habilitated teacher | science degree | Habilitated | node

The list of concepts is filled with standard terminology from the dictionary of the relevant natural language; it allows data to be integrated, extended and updated, and avoids duplication.

3.3 LINGUISTIC MODEL OF THE DA

Through the DA model, language literals are mapped to objects in the quality assessment database. With these correspondences available, it becomes possible to form requests and retrieve reports from the system's database based on raw data about the assessed objects.

Table 2.
Correspondences between elements of the concept map and objects in the HEI databases

№ | MapLiteral | DbHEI_1 | DbHEI_2 | … | DbProperty | DB_type
---|---|---|---|---|---|---
1 | AcademicYear | University.AcadYear | AcadYear | … | TableN | Text
2 | Assistant | Teacher.PositionData.Position | Position.Type | … | Value | Text
3 | Discipline | Discipline | Subjects | … | TableN | Text
4 | Position | PositionData | Position | … | TableN | Text
5 | PrimaryContract | PositionData.ContractType | Contract.Type | … | ColumnN | Text
… | … | … | … | … | … | …
37 | Project | Project | Projects | … | TableN | Text
38 | Publication | Publications | ScienceWork | … | TableN | Text
39 | Sanctioned | PenaltyType | Penalty.Type | … | ColumnN | Text
40 | Speciality | Speciality | Speciality | … | TableN | Text
41 | Student | Student | Users | … | TableN | Text
42 | Participant | Project.Team | Users.Role | … | ColumnN | Text
43 | uPlan | uPlan | Curriculum | … | TableN | Text
44 | FacNumber | Student.FacNumber | Users.Number | … | ColumnN | Number
45 | EduForm | EduForm | Curriculum | … | TableN | Text
46 | Habilitated | PersonalData.IsHabilitated | Users.SciStatus | … | ColumnN | Text

Table 2 shows the correspondences between language literals and objects in the databases of several HEIs. In this way, raw data missing from the database of the criteria system can be retrieved from the HEI databases and filled in.

4 PROCEDURES ENSURING THE PROCESSING OF NATURAL LANGUAGE TEXTS

One of the steps in the methodology proposed in [1] consists of compiling structural templates of text questions that can be posed to the database of the HE evaluation system. Based on structural models of queries to the database of quantitative indicators, aimed at the automated extraction of their values (some of which require calculation), several basic query types are distinguished. The grammatical models of the indicators are grouped into patterns (language constructions) according to how their numerical value is calculated and obtained. The values of the raw data are decisive for calculating the target quantitative measures, such as a ratio, a proportion or a percentage. To distinguish the question patterns, let $N_{\mathrm{ind}}$ denote the set of all quantitative indicators. Four types of query templates, Q1, Q2, Q3 and Q4, are used. The calculation of the values of quantitative indicators of a given type, and of the corresponding evidence for them, follows a scheme (Question_pattern) characteristic of the corresponding template (QP).
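Before a template can be applied, the phrases of an indicator text have to be resolved through the correspondences in Tables 1 and 2: a phrase or one of its synonyms gives a MapLiteral, and the MapLiteral gives the object that stores it in a particular HEI database. A minimal sketch of this two-step lookup, using rows 1 to 5 of both tables, follows. The dictionary layout and the function names are assumptions for illustration, not the prototype's data structures.

```python
import re

# Rows 1-5 of Table 1: NLPhrase -> (Synsets, MapLiteral)
DA_MODEL = {
    "academic year": (["school term", "term", "college year", "academic term", "grade"], "AcademicYear"),
    "assistant": (["associate", "appointee"], "Assistant"),
    "discipline": (["subject", "practice"], "Discipline"),
    "position": (["post", "seat", "stand", "rank"], "Position"),
    "primary contract": (["basic contract", "first contract", "main contract"], "PrimaryContract"),
}

# Rows 1-5 of Table 2: MapLiteral -> object in each HEI database
DB_MAP = {
    "AcademicYear": {"DbHEI_1": "University.AcadYear", "DbHEI_2": "AcadYear"},
    "Assistant": {"DbHEI_1": "Teacher.PositionData.Position", "DbHEI_2": "Position.Type"},
    "Discipline": {"DbHEI_1": "Discipline", "DbHEI_2": "Subjects"},
    "Position": {"DbHEI_1": "PositionData", "DbHEI_2": "Position"},
    "PrimaryContract": {"DbHEI_1": "PositionData.ContractType", "DbHEI_2": "Contract.Type"},
}


def build_index(da_model: dict) -> dict:
    """Map every NLPhrase and every synonym to its MapLiteral."""
    index = {}
    for phrase, (synsets, literal) in da_model.items():
        for form in (phrase, *synsets):
            index[form.lower()] = literal
    return index


def find_literals(text: str, index: dict) -> list:
    """MapLiterals whose phrase or synonym occurs in `text` as whole words.

    Longer forms are tried first and each match is blanked out, so that
    'academic term' is not counted a second time as 'term'.
    """
    text = text.lower()
    found = []
    for form in sorted(index, key=len, reverse=True):
        pattern = rf"\b{re.escape(form)}\b"
        if re.search(pattern, text):
            if index[form] not in found:
                found.append(index[form])
            text = re.sub(pattern, " ", text)
    return found


def locate(literals: list, database: str) -> dict:
    """Resolve MapLiterals to the objects that hold them in one HEI database."""
    return {lit: DB_MAP[lit][database] for lit in literals if lit in DB_MAP}


index = build_index(DA_MODEL)
literals = find_literals(
    "Number of teachers in the position of assistant on a main contract in the academic term", index
)
print(literals)
print(locate(literals, "DbHEI_2"))
```

Because the second step is keyed by database name, the same indicator text resolves to `Contract.Type` in one HEI and to `PositionData.ContractType` in another, which is what lets missing raw data be fetched from whichever HEI system holds it.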
For each quantitative indicator, data can be requested for a specific point in time or for a specific period. The Accreditation_type table (see Table 4) systematizes the periods (moments) to which the raw data refer and for which they are collected. Its fields are:
- id: a unique identifier and the primary key of the table; the field is an integer and is required;
- period: the period or date to which the data refer; the field is of DateTime type and is mandatory.

Table 4. Table Accreditation_type: time periods for which higher education data are submitted for storage

id | period
---|---
1 | For the accreditation period
2 | On a specific date (of data collection)
3 | For months
4 | For academic years
5 | For calendar years

Table 5. Table Question_pattern: models of questions that can be asked to the HEI to fill in the values of the indicators automatically

QP | Pattern description | Formula type | Indicator (example) | Conditions
---|---|---|---|---
Q1 | Number of elements of S | $\mathrm{Val}(n) = \lvert \{x \in S\} \rvert$ | Number of analyses of the labour market and of the expectations of personnel users carried out during the period, in order to identify requirements for the training of trainees | S = {analyses of the labour market and of the expectations of personnel users in the period, with the aim of ……}
Q2 | Number of elements of S with property P | $\mathrm{Val}(n) = \lvert \{x \in S : P(x) = 1\} \rvert$ | Number of external audits of the PF conducted in the period by certification organizations other than the NAOA | S = {external audits of the PF by certification organizations}; P is satisfied by the elements of S carried out in the period by organizations other than the NAOA
Q3 | Relative part (in percent) of the number of elements of S with property P, compared to their total number in S | $\mathrm{Val}(n) = 100 \cdot \frac{\lvert \{x \in S : P(x) = 1\} \rvert}{\lvert S \rvert}$ | Relative part (in percentages) of the number of disciplines compared to their total number in the PF conducted in a foreign language | S = {disciplines in the PF at the moment}; P is true for the elements of S held in a foreign language
Q4 | Average value of the ratings of attribute A (according to a certain norm Norm, in measurement unit E) of the elements of S | $\mathrm{Val}(n) = \frac{1}{\lvert S \rvert} \sum_{x \in S} A(x)$ | Average value of the citation index of the teachers on a primary employment contract participating in the training in the PF, at the moment | S = {teachers on a primary contract currently participating in the PF}; A = <citation index>; the h-index is used as Norm(n, A)

5 EXPERIMENT

The implementation of the software prototype is based on the proposed general model and its main groups of components: the DA model; the linguistic model and the rules for transforming a search into a request to specific university databases; a model of the quality assessment system (a hierarchical system of textual standards, criteria, and qualitative and quantitative indicators); a natural language interface; tools for acquiring and managing language metadata; and a query generation module. The linguistic module is fed with text examples extracted from the hierarchical database of the system for quality assessment and accreditation of HE.

Table 3. Raw data that are part of texts of quantitative indicators

The parser of the linguistic module uses regular expressions in Python 3 to load the data into memory.
Functions are used to translate text into English, as well as to normalize text (Fig. 4).

Figure 4. English translation and text normalization functions

Normalization consists of removing non-meaningful words and punctuation, lemmatizing each word and removing duplicate words; the normalized text is the basis of the queryable data loaded into memory. From the sample data given above, an in-memory object with the following attributes is automatically extracted:

{'teacher_primary_employment_contract': 20,
 'teacher_primary_employment_contract_habilitated': 5,
 'teacher_primary_employment_contract_sanctioned_result_appraisal': 3,
 'teacher_primary_employment_contract_editorial_college': 7,
 'teacher_primary_employment_contract_occupied_new_academic_position': 4,
 'teacher_primary_employment_contract_participated_outgoing_mobility': 5,
 'national_scientific_project': 4,
 'international_scientific_project': 2}

The closest-match search applies the same normalization to the user query and looks for the entries with the largest number of base-form words in common with the query. The search is shown in Fig. 5.

Figure 5. Search for the highest match of words with those of the user query

A linear search for the entries closest to the query in base form has been implemented. Because the number of primary data items for a particular university is small, the speed of this algorithm is not critical, so indexing and searching with $\Theta(\log n)$ complexity were not implemented. As the number of primary data items for a given university grows, quick-search indexes could be built for each keyword in its normal English form, returning only the list of objects that contain it. If the query is of a special type, such as "relative part", the requested value is calculated with the formula corresponding to that query type. A test code is given in Fig. 6.
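The normalization, the linear closest-match search and the "relative part" calculation described above can be sketched as follows over the in-memory object. This is not the code of Figs. 4 to 6: the stop-word list is hypothetical, the plural-stripping `lemma` is a crude stand-in for a real lemmatizer, the English translation step is omitted, and the rule that a key is relative to the longest other key that prefixes it is an assumption about how the parser finds the whole automatically.

```python
import re

# The in-memory object extracted from the sample raw data
DATA = {
    'teacher_primary_employment_contract': 20,
    'teacher_primary_employment_contract_habilitated': 5,
    'teacher_primary_employment_contract_sanctioned_result_appraisal': 3,
    'teacher_primary_employment_contract_editorial_college': 7,
    'teacher_primary_employment_contract_occupied_new_academic_position': 4,
    'teacher_primary_employment_contract_participated_outgoing_mobility': 5,
    'national_scientific_project': 4,
    'international_scientific_project': 2,
}

# Hypothetical stop-word list; the prototype's own list is not given in the paper
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "at", "to", "for", "and", "or",
              "with", "during", "number", "how", "many", "is", "are"}


def lemma(word: str) -> str:
    """Crude stand-in for a real lemmatizer: strip a plural -s."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(text: str) -> list:
    """Lower-case, drop punctuation and stop words, lemmatize, remove duplicates."""
    result = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:
            continue
        base = lemma(word)
        if base not in result:
            result.append(base)
    return result


def closest_matches(query: str, data: dict) -> list:
    """Linear search: all keys sharing the largest number of base-form words with the query."""
    q = set(normalize(query))
    scores = {key: len(q & set(normalize(key))) for key in data}
    best = max(scores.values(), default=0)
    return [key for key, score in scores.items() if score == best and best > 0]


def base_key(key: str, data: dict):
    """Hypothetical rule: the whole is the longest other key that is a prefix of `key`."""
    parents = [k for k in data if k != key and key.startswith(k + "_")]
    return max(parents, key=len) if parents else None


def answer(query: str, data: dict = DATA) -> dict:
    """Every best match with its value; several matches signal an ambiguous query."""
    matches = closest_matches(query, data)
    if "relative part" in query.lower():
        result = {}
        for key in matches:
            base = base_key(key, data)
            if base is not None:
                result[key] = data[key] / data[base]
        return result
    return {key: data[key] for key in matches}


print(answer("projects"))
print(answer("Relative part sanctioned at certification"))
```

With these assumptions the sketch reproduces the responses reported in Tables 6 and 7: 0.15 (3 of 20) for the sanctioned teachers, 0.35 (7 of 20) for members of editorial boards, and both project counts (4 and 2) for the ambiguous query "projects". Returning every key that ties for the best score is what turns ambiguity into a list of options rather than an arbitrary pick.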

Figure 6. A fragment of code that calculates the value using the relative part formula

Note that for a "relative part" question, the user does not need to state what the part is relative to: in the dynamic space of objects, the parser that forms the answer can determine this automatically. Clarification is needed only in case of ambiguity. The table below shows some sample user queries and the responses returned by the linguistic module.

Table 6. Examples of identifying raw data from natural language texts of indicators

Request | Raw data | Response | QP
---|---|---|---
Relative part sanctioned at certification | Sanctioned as a result of attestation; Teachers on a primary employment contract | The relative part of teachers sanctioned as a result of attestation compared to teachers on a primary employment contract is: 0.15 | Q3
Relative part of teachers on editorial boards | Members of editorial boards; Teachers on a primary employment contract | The relative part of members of editorial boards compared to teachers on a primary employment contract is: 0.35 | Q3
Number of teachers sanctioned during attestation | Sanctioned as a result of attestation | Number of teachers sanctioned as a result of certification: 3 | Q1
Number of national projects | National scientific or scientific-applied projects | Number of national scientific or scientific-applied projects: 4 | Q1
International projects | International scientific or scientific-applied projects | Number of international scientific or scientific-applied projects: 2 | Q1

If the user request is ambiguous, the module returns all possible options (Table 7).

Table 7. An example of an ambiguous query

Request | Response to the query
---|---
projects | national scientific or scientific-applied projects (national_scientific_project), value: 4; international scientific or scientific-applied projects (international_scientific_project), value: 2

6 RESULTS

An experiment was carried out with 20 text questions. For 18 of them, the raw data were identified and a response was returned to the user. Several problems were noticed; to address them, we propose measures to speed up and improve the recognition of language literals. To improve pattern recognition, we recommend: a) using a dictionary of synonyms of terms in base form (the current version does not have one); b) using a morphological dictionary for the particular natural language instead of translating into English (as in the current version). To improve speed, we propose: a) using multidimensional indexing for faster search; b) automatically building (completing) the linguistic model of the software when new data appear (e.g. new indicators, words or phrases), rather than every time the script is run, etc. The information could be loaded into local indexes and WordNet files less frequently than on every run of the Python script. In addition, the Python code itself could be compiled into binary code, further improving performance. In the current implementation, speed was not a primary goal.

7 CONCLUSIONS

In conclusion, data modelling in an information system makes searching and retrieving information faster and easier. At the same time, it facilitates filling the database with new data, again by using pattern recognition to identify their constituent parts and extract the required information.

The study explores and proposes procedures and techniques for recognizing raw data as constituents of quantitative indicators in order to derive a target quantitative evaluation (value). The information target is a time-dependent value, obtained (by a formula) from raw data in the database of the quality assessment system. Models, a methodology and tools for creating an NLI for a system for the evaluation and accreditation of HE have been developed. The NLI was tested on the evaluation of quantitative indicators (formulated in natural language) used in monitoring professional fields in HEIs in Bulgaria.

ACKNOWLEDGEMENTS

The research is partially supported by grant No. BG05M2OP001-1.002-0002-С02 “Digitalization of the economy in an environment of Big data”.

REFERENCES

[1] Zhekova M., Natural language interface for an information system of 'e-university' type, PhD thesis, 144 pages, 2022. (in Bulgarian)
[2] Gaftandzhieva S., Doneva R., Quality of higher education - provision, assessment, automation, Scientific works of the Union of Scientists in Bulgaria-Plovdiv, Series B. Technics and technologies, vol. XII, ISSN 1311-9419, Plovdiv, pp. 230–233, 2014. (in Bulgarian)
[3] Liu S. S., Cheng Kaiming, Williams G., Tam Kam-chuen, Higher Education Quality in Hong Kong: The Stakeholders' Perspectives, A Working Paper of the Business Research Centre, School of Business, Hong Kong Baptist University, 1997.
[4] Gaftandzhieva S., Model and System for Dynamic Quality Evaluation in Higher Education, PhD thesis, 177 pages, 2017. (in Bulgarian)
[5] UNICEF, Defining Quality in Education, The International Working Group on Education, Florence, Italy, June 2000.
[6] Moumoutzis N., Christoulakis M., Arapi P., Mylonakis M., Christodoulakis S., The iQTool Project: Developing a quality assurance tool for e-learning, 2009.
[7] Zaman W., Ghosh P., Datta K., Basu P. N., A Framework to Incorporate Quality Aspects for e-Learning System in a Consortium Environment, International Journal of Information and Education Technology, Vol. 2, No. 2, April 2012.
[8] Gaftandzhieva S., Totkov G., Doneva R., Quality and evaluation of education with good university practices, "Paisii Hilendarski" University Edition, ISBN 978-619-202-538-0, first edition, Plovdiv, 2020. (in Bulgarian)
[9] TESTA, retrieved from http://www.testa.ac.uk/index.php/about, accessed 04.05.2023.
