

Published by, 2016-04-19 09:26:42


ELECTRONIC DETECTION OF PLAGIARISM IN FINNISH HIGHER EDUCATION INSTITUTIONS 2013

Totti Tuhkanen

Expert group: Dan Holm, Markku Ihonen, Anna Johansson, Ole Karlsson, Marja Kokko, Pauliina Kupila, Irma Mänty, Anne Nevgi, Elizabeth San Miguel, Kari Silpiö, Sanna Suoranta, Sari Tervonen, Arja Tuuliniemi, Kaie Veiler, Minna Vänskä

RAKETTI

[Graph 1, inside cover. Labels: Plagiarism; Plagiarism as academic misconduct; Student cheating; Plagiarism detectable by means of electronic detection]

CSC - IT CENTER FOR SCIENCE
RAKETTI Publications
ISBN 978-952-5520-51-4

SUMMARY

Finnish higher education institutions underwent a phase of acquiring and introducing electronic detection of plagiarism on a large scale in 2008–2013. A total of 34 higher education institutions are now using an electronic detection system. The two software tools chosen as support systems for education and research are Turnitin of American and Urkund of Swedish origin. International comparisons have evaluated these as providing the highest quality.

Several groups cooperated in preparing the acquisitions, but each organisation concluded licensing agreements individually. Most of the acquisitions did not involve competitive tendering.

In anticipation of future contract periods, it can be stated that there is room for development in the acquisition procedure. Not all operational requirements that may be related to high-volume use in production were identified in the pilot projects that preceded acquisitions. The licensing periods should be prolonged in order to facilitate the systematic utilisation of the benefits involved in accumulating reference databases. On the basis of experiences gained, it would be useful to create a shared set of criteria for higher education institutions and a licensing template to meet the needs of subsequent contract periods.

The benefits related to quality and risk management, sought through PD systems, cannot be achieved only by using electronic technology without the support of a strong control system. Mere control arrangements, without the support of technology, are not adequate either for achieving these benefits. The development of plagiarism detection procedures requires coordinated cooperation between student affairs and information administration and libraries.

PD system reference databases involve critical development needs. More information content of Finnish and international origin that would serve higher education in Finland should be included in the indexing of the PD systems. These reference resources should be developed through active customer guidance in line with priorities defined by educational fields. Without our initiative, the market-oriented reference strategy of international PD services will not recognise, and thus cannot take account of, the special needs of Finland.

In addition, the technical quality of reference data should be enhanced. Gaps in the coverage of indexing are created by the use of incompatible saving formats, among other things: PDF, the most common format for distributing electronic publications, can be produced in numerous variants, some of which manipulate the text identification algorithms to produce incorrect equivalence values.

Authentication and user role management can be developed so as to support the more flexible and simultaneously more data secure linking of the systems to the processes of education and research.

PD software can be used at the system level or service level: in the latter case, the system is integrated with the basic systems and control frames of the higher education institution in question. Solutions integrated into the e-learning platform and publishing systems have achieved the highest utilisation rate and influence, as these make the detection of plagiarism an operational element in the learning environment or publishing process. Particular attention must be paid to the functionality of data transfer interfaces and the overall data security level of the operating environment when systems are integrated.

When linked to the core processes of education and research, strong legitimacy and precise usage rules are required from the originality check of study attainments. In addition, the checking process is interwoven with the more extensive application framework of good scientific practice (GSP). There is room for development in procedural guidelines: they should be tied more closely and comprehensively to the regulations that steer operations. In the course of examination processes, clear authoritative relations, reasonable consequences in relation to the act committed, and privacy protection of the parties concerned should be ensured. The harmonisation of practices can improve the equality among students, the fluency of processes and overall data protection.

The use of plagiarism detection at all stages of the study path imposes new content-related and perspective-related requirements on the basic concepts of the control of research misconduct and the processes employed in processing it. The survey project involved the production of basic definitions for these working concepts and material to guide their application.

Tables:
Table 1: Integration of PD systems at higher education institutions. PD system survey 2-4/2013.
Table 2: Scale for grading the degree of intention in an act or attempted act.
Table 3: Proposals for action related to the survey and bodies responsible for their implementation, and estimated volumes of work.

Images:
Image 1: Plagiarism Reference Tariff, national guidelines for the sanctioning of plagiarism in the UK.
Image 2: Digitoday news item on the introduction of Nalkki, 29 October 2007.
Image 3: Thesis originality check feedback session 7 February 2014.
Image 4: A PD system check report view broken by a faulty PDF file.
Image 5: Side-by-side user interface image of Turnitin and Urkund assignments in Moodle.
Image 6: Ethical guidelines for learning, University of Turku.

Diagrams:
Diagram 1: Share of cautions issued for plagiarism and fixed-term suspensions of all sanctions in higher education institutions in Sweden in 2012.
Diagram 2: Distribution of PD systems in Finnish universities in 2013.
Diagram 3: Distribution of PD systems in Finnish universities of applied science in 2013.
Diagram 4: PD service owners in higher education institutions.
Diagram 5: Providers of end-user support in higher education institutions.
Diagram 6: Owners of PD services, providers of end-user support and actors primarily responsible for maintenance support in higher education institutions.
Diagram 7: Readiness of higher education institutions to cooperate in acquiring PD software.
Diagram 8: Schedule for the next PD related acquisition.

Graphs:
Graph 1 (inside cover): The relationship between plagiarism, plagiarism as academic misconduct and student cheating, and the electronic detection of plagiarism. Ref. Graph 9.
Graph 2: Relationship between a PD system and PD service, and the control framework of plagiarism management.
Graph 3: Links between a PD system and higher education institution infrastructure and key operating instructions.
Graph 4: General process for agreeing on indexing.
Graph 5: Turnitin process description for technical preparation of indexing.
Graph 6: PD check of a thesis as part of the assessment, approval and publishing process.
Graph 7: Procedures in Case of Academic Fraud at the University of Vaasa.
Graph 8: Three perspectives for examining the concept of plagiarism.
Graph 9: Plagiarism, plagiarism as academic misconduct and student cheating.
Graph 10: Continuum of plagiarism related to a study attainment.

Recommendations:
Recommendations for higher education institutions for training students to act ethically and for instructions for dealing with cases of student cheating.
Recommendation for training in good scientific practice at the various stages of study path.
Recommendation for the assessment of ethical guidelines, PD guidelines and guidelines for GSP training based on questions from several actor perspectives.

CONTENTS:

SUMMARY
FOREWORD

PART 1: TECHNOLOGY

1. PD SYSTEM, PD SERVICE AND METHODS FOR CONTROLLING PLAGIARISM
2. WHAT IS THE SCALE OF THE PROBLEM?
3. BRIEF HISTORY OF PLAGIARISM DETECTION SYSTEMS IN HIGHER EDUCATION INSTITUTIONS IN FINLAND
  3.1 The first wave of PD technology
  3.2 The use of PD technology becomes a quality system requirement
  3.3 Summary of the history of acquisitions
  3.4 Acquired systems, their owners and maintainers in 2013
4. DEVELOPMENT TARGETS IN THE ACQUISITION PROCESS
  4.1 Higher education institutions' preparedness for cooperation in licensing
  4.2 Joint preparation of selection criteria
  4.3 Ensuring good data protection and data security throughout the service environment
  4.4 Assessment of efficiency, usability and reliability
  4.5 Service provider's responsibility for the quality of reference databases
  4.6 Summary of development needs in the acquisition process
5. USE OF PD SYSTEMS IN PRODUCTION
  5.1 Visibility of Finnish publications in reference databases
    5.1.1 Charting of content-related requirements
    5.1.2 PD as part of scientific publishers' publishing processes
    5.1.3 A uniform indexing process for various service providers
    5.1.4 The use of theses and study attainments as reference data
    5.1.5 Summary of measures for enhancing the quality of reference data in Finland
  5.2 Factors undermining the reliability and usability of PD systems
    5.2.1 Partial indexing and exclusive licensing
    5.2.2 Invalid file formats
    5.2.3 Manipulated records
    5.2.4 Poor quality of language versions in user interfaces and instructions
    5.2.5 Operational differences between direct use and integrated use
    5.2.6 Timing of originality check influences the thesis verification and publishing process
    5.2.7 Summary of the management of problems undermining reliability and usability

PART 2: ACTIVITIES

6. COMMON BASIC CONCEPTS IN PLAGIARISM MANAGEMENT
  6.1 Background and basis for definition
  6.2 Ethical writing and permissible citation
  6.3 Plagiarism in a study attainment and plagiarism as misconduct
    6.3.1 Analysis of the study of the concept of plagiarism
    6.3.2 Definition of the concept of plagiarism in a study attainment
    6.3.3 Definition of the concept of student cheating
    6.3.4 Definition of the concept of plagiarism as misconduct
    6.3.5 Forms of plagiarism related to study attainment
  6.4 Recommendation for the harmonisation of basic concepts related to the management of plagiarism
7. GUIDELINES FOR PLAGIARISM MANAGEMENT AND THEIR USAGE IN HIGHER EDUCATION INSTITUTIONS IN FINLAND
  7.1 The use of PD systems creates the need for harmonising procedural guidelines
  7.2 Targets of development in the content and structure of guidelines related to PD activities
  7.3 Context of guidance
  7.4 Recommendations for higher education institutions for training students to act ethically and for instructions for dealing with cases of academic deceit
8. NEED FOR GUIDANCE AND ASSESSMENT OF THE GUIDANCE SYSTEM
  8.1 Why must good scientific practice be taught?
  8.2 Preventing plagiarism in teaching
  8.3 Good scientific practice in a curriculum and syllabus
  8.4 Recommendation for training in good scientific practices at the various stages of study path
  8.5 Protecting the teacher's work
  8.6 The use of electronic plagiarism detection software in teaching
  8.7 Introduction of electronic detection of plagiarism and communicating about it in higher education institutions
  8.8 Recommendation for the assessment of ethical guidelines, plagiarism detection guidelines and guidelines for good scientific practice based on questions from several actor perspectives
9. SUMMARY OF OBSERVATIONS AND PROPOSED MEASURES
  Recommendation for training in good scientific practices at the various stages of the study path
  Protecting the work of teachers

APPENDICES
APPENDIX 1: Problems in text identification of PDF files and recommendations for how to proceed

FOREWORD

The report on the introduction of electronic detection of plagiarism in higher education institutions in Finland examines the technology for plagiarism detection (PD technology) from the viewpoints of usability, reliability, impacts and acceptability. Simultaneously, the questions are posed of how this technology could support the core processes of education better than at present, and which special requirements the high quality use of PD systems sets on the operating environment and planning of guidance. Both the observations of teachers in Finnish higher education institutions and the results of a recent EU-level comparison reveal a number of development needs in PD systems [1] and particularly in guidance for their use.

In this report, we draw attention to problems that occur if the use of electronic PD technology is planned on an excessively narrow scale, ignoring the development of operational guidance structures. A perspective that emphasises technology may for instance distort the operational culture: students are guided to avoid plagiarism formally while the basic issues of the proper scientific use of information and the relation of discipline-specific description methods [2] to the general standards of ethical writing are largely ignored. In this report, we propose drafts for good practices in controlling the key problems involved in PD technology and PD processes.

In line with the policy of the Ministry of Education and Culture, Department for Higher Education and Science Policy, this report was prepared by working groups and as part of the RAKETTI (Information Management as a Structural Support) project for higher education institutions' teaching and research administration. The support measures that the extensive use of new technology would seem to require were jointly defined and prioritised. In addition to technological aspects and rules, the survey process has involved determined organisation, as those responsible for the maintenance, training and administration of PD systems have joined the networked ecosystem of RAKETTI actors. In fact, our report is primarily a working memo for expert networks responsible for the acquisition, maintenance and development of PD services.

The use of PD systems involves transaction processes between higher education institutions. The management of these should be jointly agreed on. A competent owner will also need to be appointed for some of the national development targets in the future. The chapters of this report include several proposals for resolving ownership issues, and for those responsible for resolving them.

[1] Impact of Policies for plagiarism in Higher Education Across Europe (
[2] Erika Löfström & Pauliina Kupila (2011): Plagiaatintunnistusjärjestelmä oppimisen ohjaamisen välineenä. Peda-forum 2/11, 17; A. R. Abasi & B. Graves (2008): Academic literacy and plagiarism: conversations with international graduate students and disciplinary professors. Journal of English for Academic Purposes, 221–233.

This report was completed with the support of a background process that involved many phases.

In spring 2012, the University of Turku proposed that the Ministry of Education and Culture should pay attention to national-level issues that had emerged due to the boom in acquiring PD systems: how to organise the extensive sourcing of Finnish reference materials, how to ensure the sufficient uniformity of investigation processes concerning suspected cases of plagiarism, how to produce legitimate concept and procedure definitions for these new operating processes, and how to improve the quality of the acquisition criteria for PD technology?

The Ministry of Education and Culture's Department for Higher Education and Science Policy took charge of these issues and delegated the further preparation of the matter to the steering group of RAKETTI projects. The steering group recorded the conclusion that to ensure the credibility of higher education institutions, it is vital to secure the high quality of theses with regard to research ethics. However, no sole owner for this quality aspect exists in Finland.

As a further measure, Senior Planning Officer Totti Tuhkanen was appointed to investigate the matter until the end of 2013. His duties included the objective of establishing a network to create the basis for cooperation between PD service users and instructors.

The investigation process was kicked off at the seminar titled Control of plagiarism and electronic detection of plagiarism in higher education institutions on 18 April 2013 at Hanken School of Economics in Helsinki. A small-scale survey in February-April 2013 on detection systems and the organisation of their maintenance preceded the seminar.
The introductions and panel comments, describing the expectations and needs of higher education institutions, the Ministry of Education and Culture, the Finnish Advisory Board on Research Integrity, the National Library of Finland, science publishing houses and PD service providers, highlighted the key problems and acute development needs in controlling plagiarism. Four working groups to continue the work were organised in connection with the kick-off seminar. The analyses, specifications and action proposals included in this report were prepared with their support.

The four working groups have implemented several data collections, supported by the student affairs administration's OHA network and the National Library's FinELib actors. Issues related to defining the key concepts of plagiarism detection technology, system integrations, assessment criteria, investigation process models, ethical guidelines, training in good scientific practice and control of plagiarism were handled at 14 workshop meetings. Simultaneously, higher education institutions have implemented several PD systems, published new procedural guidelines and revised former instructions. The working groups' benchmarking and information exchange forum has sought to support these processes. Project materials were maintained in CSC's RAKETTI wiki.

The report addresses problems related to the functionality or usability of PD systems as follows: problems are described and defined, possible paths for how to proceed in resolving them are presented alongside assessment models or piloted operating models and, if necessary, proposals for development target owners for future purposes are given. The first part of the report focuses on issues related to the acquisition and introduction of detection technology, while the second part presents issues examining the operational preconditions of end users.

The report includes sections prepared by the administrator and working group chairpersons, some of which are placed as supplementary information in CSC's service pages. Because the working groups have refined all contents, the report is signed by all members of the working groups.

Survey report results were assessed by the OPI Synergy group of the RAKETTI project, OPI steering group and RAKETTI steering group, resulting in recommendations that enable the further preparation of development objectives presented at the end of this report.

26 February 2014

Administrator
Totti Tuhkanen, University of Turku

Group chairpersons
Anna Johansson, Aalto University
Markku Ihonen, University of Tampere
Kari Silpiö, Haaga-Helia University of Applied Sciences
Minna Vänskä, Aalto University

Experts
Dan Holm, Åbo Akademi University
Ole Karlsson, Åbo Akademi University
Marja Kokko, University of Jyväskylä
Pauliina Kupila, University of Helsinki
Irma Mänty, Laurea University of Applied Sciences
Anne Nevgi, University of Helsinki
Elizabeth San Miguel, Haaga-Helia University of Applied Sciences
Sanna Suoranta, Aalto University
Sari Tervonen, University of Eastern Finland
Arja Tuuliniemi, FinELib
Kaie Veiler, Hanken School of Economics


PART 1: TECHNOLOGY

1. PD SYSTEM, PD SERVICE AND METHODS FOR CONTROLLING PLAGIARISM

The first part of this report examines PD system acquisitions and the possibilities for developing PD services used in production. The relationship of system and service perspectives is defined as follows in the review:

• A PD system is an entity comprising the detection software and the reference databases it uses, and the user management solution.
• A PD service refers to a PD system localised for the use of the customer: it is connected to the higher education institution's IT systems, instructions for use localised in accordance with the operating environment are produced for it, rules for use are confirmed and a training and information scheme, including maintenance resources, has been established.
• In context beyond the detection process, plagiarism management is implemented on the basis of an operational policy based on the values and objectives of good scientific practice, and high quality supervision of academic work.

As an investment related to the quality system of education, a PD system is a dual target. Technologically, it is an easy-to-introduce cloud service (the basic elements can be deployed in a matter of minutes) and licensing costs are relatively low.

However, for the student affairs and central administration, the introduction of a PD system is invariably a more large-scale investment: the roll-out stage typically takes one or two years, and preparatory work easily requires more than one work year. All users of the system need training and induction, and this type of support is required on a permanent basis due to the turnover of teachers and students. The localisation of a PD system to create a PD service requires resources from the general administration, a lawyer, a higher education specialist and experts in ICT support for education.
Moreover, the control of expectations and the organisation of a committing value discussion require the support of communications and decisions by the rector. The costs of this preparation process involving multiple variants, and on the other hand, the benefits related to the qualitative development of operations, are difficult to measure. The most concrete indicators are probably to be found in the risk management table of each higher education institution's quality manual. Every single thesis that is proven to be plagiarism tarnishes the higher education institution's quality image in a way that may involve both economic and image-related after-effects.

In a higher education institution, the user base of a PD service is broad. Students, teachers, thesis supervisors, faculty investigators, editors of publication series and informaticians in libraries all represent differently emphasised user needs and the related needs for user support and instructions. Since the service also involves close normative guidance, the challenges of a diversified, large volume support service are combined in PD service maintenance.

Graph 2: Relationship between a PD system and PD service to the control framework of plagiarism management. Since the detection of plagiarism is linked to several different types of control and assessment processes, the use of a detection tool is regulated by rules maintained by several different responsible bodies. The central administration of each higher education institution decides on the normative guidance of use. A network comprising representatives of the student affairs administration, research administration, IT administration and the library may be responsible for user training, application instructions specific to each discipline, and technical support. The service provider may produce user management for the service, or it may involve a direct HAKA registration, specific for each higher education institution, or one implemented through a learning platform integration.

Because a PD service is linked to so many different functions in a higher education institution, it usually takes two years to prepare its operating environment, and the implementation requires diverse expertise. During the introduction of a PD service, approximately 80 per cent of resources are tied to the operational architecture and 20 per cent to solving IT issues.

Graph 3: Links between a PD system and higher education institution infrastructure and key operating instructions. The use of a PD system is linked to the core processes of education and research. Therefore, its use and purpose of use is recorded in a number of regulations and instructions that steer operations. Several technical interfaces may also be involved. The benefits and requirements of a long life cycle are emphasised in the acquisition of an IT system this closely embedded in the organisation. The University of Turku provided the example case.

2. WHAT IS THE SCALE OF THE PROBLEM?

In this context, we are unable to provide an overall assessment of the extent and significance of plagiarism in Finland, because no statistics are available on the issue. Basic research is available that involves concept analysis to facilitate processing of the phenomenon, the results of which are discussed in Chapter 6. Several studies, based both on extensive survey materials and on a case-by-case approach, have examined the attitudes and motives related to plagiarism, the impacts of academic misconduct on the community, and the role of electronic detection of plagiarism in steering scientific writing.[3]

Despite the lack of an overall statistical picture, it is possible to conclude that the situation in Finnish higher education institutions does not significantly differ from that in their European and North American counterparts: the indications of local studies of student attitudes, observations and personal experiences correspond with the average results from international studies.

Indicative figures for reference are provided by Sweden, with its established practice of compiling annual statistics on cases in higher education institutions resulting in a disciplinary procedure. In the period under review, from 2001 to 2011, the category monitored for student cheating, including cases of plagiarism, increased more than fivefold. The trend stabilised in 2012, for the first time in a long while, as the number of plagiarism cases in Sweden even decreased from the previous year. That may be an indication of successful measures for controlling plagiarism. Sanctioning that is perhaps more severe than in Finland is a remarkable feature in the follow-up material from Sweden: of the 801 cases of student cheating recorded in 2012 (fewer than 500 of which involved plagiarism), 621 examined cases (81%) resulted in fixed-term suspension and 180 in a caution.[4]

Another statistical reference scenario is provided by the United Kingdom, where more than 10,000 cases of academic misconduct are examined annually in higher education institutions. The number of cases per year in one large university alone may correspond to the total number of cases of academic misconduct processed in all higher education institutions in Sweden.[5] These results give rise to the question: to what extent is the situation in different countries comparable if the criteria and practices of dealing with plagiarism are as different as they currently are?[6]

Image 1. The Plagiarism Reference Tariff provides national-level guidelines for the sanctioning of varying degrees of plagiarism.

In the United Kingdom, there is awareness of the challenge, and a lengthy process of development resulted in the introduction of national guidelines, the Plagiarism Reference Tariff,[7] for the purpose of harmonising procedures. The Tariff defines the penalties applicable to cases of plagiarism on the basis of the joint impact of the extent of plagiarism, the level of studies and possible previous history of deceit. In accordance with the UK Tariff, even a brief text with identical wording may in certain circumstances form the basis for a sanction, whereas in the Nordic countries, the focus lies on the evaluation of the overall significance of the plagiarised section in relation to the entire text.[8]

The low level of detail is the basic problem of statistical monitoring materials on plagiarism. As the annual summary figures in Sweden, for example, include both universities and universities of applied sciences, the results cannot be analysed by line of education or stage of studies on the basis of these nationwide statistics. A more specific comparison must be carried out on the basis of master data. The risk of deceit involved in different forms of study attainment cannot be assessed on the basis of the statistics either, nor can the number of plagiarised research plans be revealed in the recruitment of students.

Diagram 1: Warnings (Varn.) issued in Sweden for plagiarism and fixed-term suspensions (Avst.) constitute more than one-half of all sanctions included in statistics. Of the 460 cases of plagiarism dealt with in 2012, 81% resulted in the fixed-term suspension of a student. Source: Disciplinärenden 2012 vid universitet och högskolor, 11, Diagram 5.

In Finland, the Finnish Advisory Board on Research Integrity supervises the state of research ethics. It does not have available monitoring data comparable to that of Sweden. The monitoring of cases of cheating during studies is required alongside monitoring related to research. Statistical monitoring indicators should be defined on the basis of quality control objectives set for higher education, taking into account, for example, the quality audit model of the Finnish Higher Education Evaluation Council (FINHEEC). A party responsible for carrying out monitoring should also be specified.

The monitoring of thesis quality can also be enhanced and harmonised by way of data and business architectural background solutions. Slovakia provides an interesting example. The IPPHEAE comparison of countries indicates that Slovakian students have the best cognitive skills for avoiding plagiarism:

“Almost all Slovak students (99%!) become aware of plagiarism before or during their bachelor studies. The EU average shows that 20% of students become aware of plagiarism during their masters/PhD degree or are still not sure about it.”[9]

The national CRTD thesis database, established by the Ministry of Education, Science, Research and Sport of the Slovak Republic in 2010, is regarded as the underlying factor that explains the excellent results. It also constitutes European good practice. All higher education institutions in Slovakia are obliged to save all theses related to degrees and postgraduate degrees in this database.[10] In connection with the saving process, theses undergo a plagiarism check.[11]

In Slovakia, issues concerning the commensurability and transparency of the monitoring of plagiarism, technical comparability, parallel publications, comprehensiveness of reference materials and the up-to-date status of data collection are combined into one centrally coordinated entity by means of a national database solution.

The European Commission awarded the European Prize for Innovation in Public Administration to the Slovak Centre of Scientific and Technical Information, responsible for maintaining the joint database, for its widely influential service solution. Room for development remains in the Slovakian model in national legislation and the work supervision processes of theses, but the practical incentives of an ethical mode of operation and the basics of monitoring are better identified in Slovakia than in other European countries.[12]

[3] A total of 1,600 students in three higher education institutions participated in the surveys by Puusniekka and Eskola (2004) and Suokas (2010). In the course of their studies at a higher education institution, the respondents reported having committed acts that can be regarded as plagiarism as follows: 8% had returned an exercise completed by a fellow student in their own name, 13% had received ECTS credits for teamwork without having completed their jointly agreed share, 19% had used a text taken from the Internet under their own name, and 23% had used the same exercise several times on different courses. The survey by Löfström and Kupila (2013) targeted the observations of students (N=104) and teachers (N=30) in seven University of Helsinki faculties on the forms of and reasons for plagiarism and the significance of electronic detection of plagiarism in the teaching of scientific writing and in work supervision. Views of how common plagiarism is were remarkably different: 3% of students and 23% of teachers considered it fairly common. Plagiarism was not considered acceptable, but the capacity for identifying its different forms varied, and students expressed fears of unintentional plagiarism. Contextual plagiarism emerged as a significant phenomenon alongside intentional and non-intentional plagiarism. This unethical method was justified as a survival strategy under the pressure of lack of time and overlapping study attainments. The advantages that PD systems can provide for achieving good academic writing skills were not recognised by students. Löfström (2011) observed frustration and cynicism among students. In one of the interviews, a student summarises the situation as follows: “Teachers really lose their minds on the topic of plagiarism. Nobody can say or write anything without APA citation anymore. Damn you”. The pressures and emotional reactions in a teacher's role when encountering and dealing with cases of plagiarism are studied at the University of Helsinki on a case-by-case basis, based on a teacher's observation journal and thematic interviews. These processes may burden the teacher even more than the student, and the related stigmas are reflected in teaching work for a long time. Cf. Löfström (2013).
[4] Universitetskanslersämbetet, Rapport 2013:6: Disciplinärenden 2012 vid universitet och högskolor, 8.
[5] David Barrett: The cheating epidemic at Britain’s universities. University cheating league table. The Telegraph 05 Mar 2011, cf. e.g. Sheffield Hallam 2009/10.
[6] The summary of findings for the United Kingdom in the IPPHEAE project states that no broad-based consensus has been reached about the extent or trend of the plagiarism problem. Monitoring data is too inconsistent to facilitate comparison of higher education institutions, and recording and processing practices vary between units in higher education institutions. Cf. Plagiarism Policies in the United Kingdom. Executive Summary, 2.2, 2.3.
[7] Plagiarism Reference Tariff 2010.
[8] The IPPHEAE survey report for the UK states that one of the problems involved in the prevailing practice of sanctions is that the sanctions imposed on students may be more severe than disciplinary procedures in workplaces for similar offences. Cf. Plagiarism Policies in the United Kingdom. Executive Summary, 2.15. On the other hand, the application of this normative, strict system seems to vary by university. Irene Glendinning, the author of IPPHEAE reports, estimates that UK universities have acquired PD systems on a large scale, but do not use them systematically (Irene Glendinning in an e-mail message to Kari Silpiö, 16 December 2013).
[9] Tomáš Foltýnek (ed.): Plagiarism Policies in Slovakia. Full Report. IPPHEAE 2013, 7.
[10] Ministry of Education, Science, Research and Sport.
[11] The service description of the Central Registry of Theses and Dissertations CRTD states the following: “The primary goal of the registry is to manage collection and archiving of final examination works and second doctorate works (theses, dissertation and qualification works) from universities and colleges in Slovakia; The CRTD, together with selected Internet resources serve as a comparative corpus for the System for plagiarism detection, called also the Anti-plagiarism system (APS). Once the work is checked for plagiarism, an output protocol is produced for each work, which serves the exam commission as tool for deciding about the work originality. Both the Central registry of Theses and Dissertations and the Anti-plagiarism system are operated by the SCSTI. It is expected that every year around 70 000 of new final and qualification works is uploaded in the registry; The works are preserved in the system including the metadata, e.g. the author´s name, university/college name, etc. and will be stored for the period of 70 years. At present, all colleges and universities are obliged to send their final and qualification works to compare them for plagiarism in the CRTD/APS system.”
[12] The same, 7–8, cf. Peta Lee: “Student anti-plagiarism measures reap rewards.” University World News 26 February 2014. Issue 308.

3. BRIEF HISTORY OF PLAGIARISM DETECTION SYSTEMS IN HIGHER EDUCATION INSTITUTIONS IN FINLAND

3.1 The first wave of PD technology

Broad-based electronic detection of plagiarism is necessary and possible only in the operational context of an open Internet. The first web-based services for the detection of plagiarism were developed for production use in the United States soon after the World Wide Web was launched. John Barrie at the University of California, Berkeley was one of the pioneers in this field, establishing the service in 1995.[13] [...] is still operating as an international information service, but [...], developed on the basis of it, was to become more important.

Development of the electronic detection of plagiarism began in Finland in the 1990s. For instance, the Plakki software, developed in-house at the Department of Software Engineering of Tampere University of Technology (TUT), was deployed in 1998 for the checking of programming exercises. The application was found useful: initially, recycled plagiarised works accounted for dozens of per cent of electronically checked exercises, but as the use of Plakki became established, their share fell to a few cases per year.[14]

Plakki operated in a closed environment. When TUT needed a PD system that would recognise material more comprehensively in the open Internet, it introduced Turnitin for trial use at the turn of the millennium. At the time, the small amount of Finnish-language reference materials and deficiencies in managing Scandinavian characters made the results less reliable, even though the system's general usability was considered good. TUT continued the development of an in-house PD software tool. In 2008, Nalkki was published for production use and plans were in place to launch it in international markets in years to come.[15]

Several other universities introduced detector programmes of domestic and international origin at a unit level, with economic sciences and mathematical subjects being most active in this respect. Experiences of pilot use were benchmarked, for example at a working seminar of the Finnish Virtual University on 7 May 2009, and the first university-level deployments took place on 1 January 2010. At the time, the originality check of theses became obligatory at Åbo Akademi University and Hanken School of Economics, and was recorded as a recommendation in procedural guidelines at the University of Oulu. Oulu, Helsinki and Turku introduced the Swedish system Urkund, published by Uppsala University in 2000 and provided by Prio Infocenter Ab.[16] Hanken School of Economics also obtained a licence for the Turnitin system for parallel use, particularly in the checking of doctoral dissertations.

Increasing plagiarism in electronic operating environments was observed as a quality problem in Finnish higher education institutions as early as in the late 1990s. Solutions for controlling the problem were sought both in control technology and preventive guidance. In many cases, good instructive practices and clear operating instructions were trusted to achieve a better quality impact than mathematical detection algorithms. Research results have later validated this choice: the use of a plagiarism detection system alone, without other supporting steering structures, does not reduce the number of cases of plagiarism. On the other hand, even powerful steering structures have failed to eliminate the problem of plagiarism, because of the large number of underlying causes.[17]

The operational history of the University of Tampere, traditionally active in issues regarding the control of plagiarism, illustrates how hard the basic problems are to change. In 2002, the University of Tampere appointed an expert group to evaluate and compare the impacts of technology-based and guidance-based methods on the prevention of plagiarism and the implementation of good scientific practice in university studies. The preliminary report of this expert group in 2003[18] mentions most of the practical challenges that still, in 2013, at the time of conducting this survey, can be identified as key problems in the evaluation of study attainments and the use of PD systems. Based on the report's conclusions, the University of Tampere continued to rely on operational instruction tools alone for a decade before a PD system was acquired.

3.2 The use of PD technology becomes a quality system requirement

A major step forward in the general capacity for plagiarism detection in Finland was the joint PD system acquisition organised by the AMKIT consortium of libraries of universities of applied sciences in 2008–2009. This measure aimed at considerable synergy benefits in the development of user support and domestic reference databases. At present, 20 universities of applied sciences hold an Urkund licence.[19] The acquisition did not involve competitive tendering as such, as each organisation concluded a separate licensing agreement for a period of one or two years. Metropolia University of Applied Sciences chose Turnitin and Diakonia University of Applied Sciences opted for Ephorus.

[13] Stephanie Vie (2013): Turnitin’s History; Barrie, John M. (2008): Catching the cheats: How original. The Biochemist, 30(6), 16-19; iParadigms Board of Directors.
[14] Interview with Kirsi Ala-Mutka in the publication Veikko Surakka et al. (2003): Plagiointi opintosuorituksissa, p. 10.
[15] The same. In addition, “Nalkki awaits plagiarising students in Tampere”, Digitoday 29.10.2007; TTY Pilottihankkeet 2008.
[16] Prio Infocenter: Om Urkund.
[17] E.g. Walker 2010, Jocoy & DiBiase 2006.
[18] Surakka et al. 2003. A more extensive final version of the report was not published.
[19] SysQuery2013 statistics of Finnish higher education institutions, 11 December 2013.

In the early 2010s, the university sector did not find a joint owner for the development of PD infrastructure, even though public debate about plagiarism problems in an international multimedia environment was lively. New system acquisitions were still prepared individually, however, with the support of a peer network of several universities.

The University of Helsinki was the first to arrange a competitive tendering process for its PD system in 2012. It was also the first system acquisition based on open requirement specifications. Good references for the competitive tendering were provided by the requirement specification and competitive tendering implemented by 30 universities in Sweden the previous year, in which Turnitin and Urkund tied for first place.

The University of Jyväskylä was supported by the experiences of the University of Helsinki and strived to modify the selection criteria to make them more distinguishing. An even competition between Turnitin and Urkund was repeated. Both Helsinki and Jyväskylä finally selected Urkund.

The University of Turku and the University of Tampere prepared their acquisitions together. The competitive tendering processes by the Swedish university consortium and the Universities of Helsinki and Jyväskylä had reduced the market prices of PD systems so dramatically that annual costs became less important as a criterion for selection. The model of the University of Helsinki and the University of Jyväskylä, quite heavily modified, provided the reference for requirement specification. The selection criteria of the University of Turku and the University of Tampere emphasised requirements related to the data security of the process of use, the interactive nature of use in the supervision of work and the academic data content of reference databases.
In an attempt to ensure the comparability of tenders, background documentation and screenshots of the system's mode of operation were required in connection with each critical requirement. The aim of exceptionally detailed requirements was to ensure, in advance, the commensurability of information provided, because previous competitive tendering processes had met with challenges in interpreting the data provided. In August 2012, the University of Turku and the University of Tampere selected Turnitin. Urkund did not submit an offer.

The fourth competitive tendering process was implemented by a consortium of Aalto University, the University of Eastern Finland, Lappeenranta University of Technology, University of the Arts Helsinki, Tampere University of Technology, and the University of Vaasa. Preparation was based on the acquisition criteria of the University of Turku and the University of Tampere. The relative proportion of compulsory features was increased in an attempt to further streamline the use of the criteria. The importance of data security criteria and the criteria of integrated use was increased further, and price was also more important than in the previous competitive tendering process. The process resulted in the acquisition of Turnitin in autumn 2013.

In between these competitive tendering processes, the University of Lapland acquired Urkund directly in November 2012. The value of the acquisition did not exceed the threshold for competitive tendering. The National Defence University is still examining the suitability of PD systems for its use.

3.3 Summary of the history of acquisitions

As the detection of plagiarism became a legitimate part of educational quality control in the 2010s, PD system acquisitions began to be examined in the framework of the data architecture that supports the core processes of education. This perspective altered the nature of acquisitions.

Extensive advance testing and requirements in line with overall architectural design, including integrability, verifiability of the data security of usage processes, the functionalities of interactive control and the validity of reference materials, began to be emphasised as acquisition criteria. Because the systems were connected to higher education institutions' other information systems and operational processes in many ways, lifecycle aspects of the system had to be evaluated as well, the aim being longer-term contracts of use than before. Longer contract periods introduced the obligation to subject acquisitions to competitive tendering, and motivated the development of more comprehensive system requirement specifications.

The University of Helsinki carried out significant groundwork in 2011–2012 in preparing a set of acquisition criteria suitable for the Finnish operating environment. All subsequent acquisition processes have benefited from it. The acquisition criteria of the University of Turku were also used extensively later on.

The cooperation network has facilitated the free exchange of experiences and specification data. The information basis related to the prioritisation of operational requirements has become stronger in Finland. Division of duties has also developed within the cooperation network in the handling of issues concerning various areas of expertise.

3.4 Acquired systems, their owners and maintainers in 2013

Finnish higher education institutions use three different PD systems at the organisational level: the Swedish Urkund, the American Turnitin and the Dutch Ephorus (used by one institution only). Urkund has dominated the market since the 2010s. With the exception of two cases, Urkund has been acquired without competitive tendering. Turnitin has become more commonly used by universities in the 2010s: it has been selected in eight of the most recent competitive tendering processes. One university is still using the Nalkki system of domestic origin as an auxiliary system.

Diagrams 2 and 3: Distribution of PD systems in Finnish higher education institutions in 2013 (panels: universities; universities of applied sciences). Urkund is the most common system in universities of applied sciences, Turnitin in universities. Source: Higher education institutions' SysQuery2013 + questionnaires of the survey project 2-4/13 and 8-9/13. Shares of ½ in universities refer to Hanken's solution to use two systems. Nalkki (¼) is a supplementary tool in TUT.

Training, local support and enterprise resource planning services related to student affairs administration play a key role in PD system maintenance. According to a survey conducted in spring 2013, 75% of PD systems are owned by student affairs administration, with the same share being responsible for end-user support. Libraries and the teaching technology systems for which they are responsible own 25% of PD systems and provide 14% of end-user support. Technical system maintenance and administration is the responsibility of IT administration in 61% of cases, student affairs administration in 20% and the library in 15%. One higher education institution has outsourced technical maintenance and end-user support to a service provider.[20]

[20] Responses to questions about service ownership and maintenance responsibilities were received from 23 higher education institutions. The results are skewed by the lack of data from several universities of applied sciences: in them, libraries play a more crucial role in the provision of PD services than in universities. At the time of the survey, the roles related to responsibility for PD services remained undefined in many universities.

Diagrams 4, 5 and 6: Owners of PD services, providers of end-user support and actors primarily responsible for maintenance support in higher education institutions. SAM = Student affairs management.

The main responsibility roles of various units influence, to a certain degree, the contents of PD services and the ways PD systems are used. Some organisations support more guidance-focused (proactive) plagiarism management while others prefer a controlling (reactive) method. Key roles of PD system administration are one target of development: at present, the activity suffers from insufficient resourcing, so much so that the risk limit is close to being reached. Typically, one or a few experts are responsible for the maintenance and development of the PD system, without the support of substitutes:

“The following tasks are centralised with the service’s main user in student affairs administration: monitoring of use and reporting, handling of user feedback and development needs, monitoring of the service's life cycle plan, organising the provision of user training and serving as main trainer, 24/5 online support in problem situations, documentation about the service for the quality system and regulations steering education and research, informing about the service and promoting cooperation.”[21]

“The main responsibility and technical support are centralised with one person: the information service manager of our university of applied sciences, who issues user codes, responds to technical questions and forwards development ideas, contacting Urkund staff if necessary. Thesis coordinators in each unit bear the main responsibility for providing software user training for our personnel.”[22]

[21] PD technology survey 2-4/2013, open question about the service organisation model, university case.
[22] The same, university of applied sciences case.

The use of the PD service is closely linked to the degree verification process in many higher education institutions. In this case, the PD system is required to provide the same degree of stability of use and accessibility as the basic IT system in student affairs administration. The cost benefits of a one-person maintenance model should be assessed considering the operational responsibilities and basic requirements of risk management.

Image 3: In Finland, plagiarism checks are carried out on the basis of the confidence principle. The feedback session of the originality check of a thesis is often the last work supervision session. In this picture, Rector Kalervo Väänänen of the University of Turku has analysed a PD report and finds that in the manuscript by Mirkka Hirvonen (MA), source information has been used in line with good scientific practice. Simultaneously, he is providing last-minute instructions for the articulation of results. Even the slightest correspondence findings may indicate text sections that involve immature use of source data. The originality check is a guidance situation where the thesis supervisor conducts a comprehensive review of the entire thesis and may give the writer final proposals for revision before the thesis is submitted to the official pre-evaluation process. Providing induction for all teaching and research personnel in using the PD system is a major initial investment which involves indirect benefits as the requirements of the guidance process become more precise. Photo: Totti Tuhkanen, 7 February 2014.

4. DEVELOPMENT TARGETS IN THE ACQUISITION PROCESS

4.1 Higher education institutions' preparedness for cooperation in licensing

According to a survey conducted by the IT Directors' Lisenssi-SIG (special interest group for licences) network in January 2014, higher education institutions have a cautiously positive attitude towards expanding cooperation in acquiring PD technology in the next licensing periods. Of the 21 higher education institutions that responded, 13 (62%) are interested in cooperation. Five (24%) cannot say and three (14%) do not find cooperation topical in acquisitions. In 2014, two higher education institutions (10%) will prepare for the next acquisition/competitive tendering process, six (29%) in 2015 and one (5%) in 2016. One higher education institution is preparing for the acquisition without a planned schedule. Approximately one-half (52%) of respondents could not say when a new acquisition process would happen.[23]

Diagrams 7 and 8: Preparedness for cooperation in acquisitions and an assessment of when the preparation of the next PD acquisition would happen.

On the basis of survey results, the conclusion can be drawn that the majority of higher education institutions would seize the opportunities provided by a joint acquisition if a centralised acquisition process were implemented.

4.2 Joint preparation of selection criteria

The acquisition process of a PD system is still considered to be an exceptionally challenging IT project. The resources of one higher education institution alone have not necessarily been sufficient to manage criteria that cover the legal, technical, process-related, pedagogic and operational requirements of the acquisition.

The challenging nature of the requirement framework has come up both in the handling of tenders and in commissioning tests: the question remaining open may have been how to ensure that the features promised in the tender genuinely function when they are used at a scale of hundreds or thousands of transactions and users.

More extensive cooperation in acquisitions, and granting its ownership to a permanent cooperation body, is considered to be a good solution in terms of managing the quality of operations, and costs.

[23] SysQuery2013.
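The shares quoted for the Lisenssi-SIG survey in section 4.1 can be re-derived from the response counts. The following is a minimal Python sketch that uses only the counts stated in the text (21 respondents). The count of 11 institutions that could not name a schedule is not given in the source; it is inferred here as the remainder and should be read as an assumption. The function and variable names are illustrative only.

```python
def share(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


RESPONDENTS = 21  # institutions that answered the January 2014 survey

# Attitude towards cooperation in acquisitions (counts from section 4.1).
cooperation = {"interested": 13, "cannot say": 5, "not topical": 3}

# Timing of the next acquisition (counts from section 4.1).
schedule = {"2014": 2, "2015": 6, "2016": 1, "no planned schedule": 1}
# Assumption: everyone not listed above could not say when the next
# acquisition would happen; the source only gives the share (about 52%).
schedule["cannot say"] = RESPONDENTS - sum(schedule.values())

cooperation_pct = {k: share(v, RESPONDENTS) for k, v in cooperation.items()}
schedule_pct = {k: share(v, RESPONDENTS) for k, v in schedule.items()}

for label, pct in {**cooperation_pct, **schedule_pct}.items():
    print(f"{label}: {pct}%")
```

Under this assumption the rounded shares agree with the figures reported in the text.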

4.3 Ensuring good data protection and data security throughout the service environment

The key requirement for PD system acceptability is the certainty of strong data protection, resistance against security breaches and the use of saved reference materials only in the manner permitted by the customer. In addition to public materials, research and innovation material not yet published is saved in PD systems. The scientific and financial value of such material may be very high.

Turnitin and Urkund, the widely used PD systems in Finland, are globally operating cloud services. The methods they use for data transfers and customer data maintenance must meet the requirements defined in EU legislation and be audited by a third party as regards data security, security breaches, quality processes and business stability. With regard to database entries carried out outside the EU area, they must be committed to the US–EU Safe Harbor framework.

So far, data security requirements defined in competitive tendering have been regarded as met. A lot of attention has been focused on the specification of selection criteria related to general data security standards in PD acquisitions, but the rapid development of cloud technology requires that higher education institutions constantly monitor and develop the framing of questions in assessing data security.

Discussion is polarised on adequate data security standards of cloud services. According to one view, the only sensible option is to trust international rules and be committed to compliance with frameworks and the protection of sanctions included in contracts. Another view emphasises the risks involved in this service model and the almost limitless possibilities of signal intelligence, beyond the reach of legality control or in its grey area. Both cloud-based PD services used in Finland are domiciled in a signal intelligence superpower. Consideration of the different basic risks involved in cloud services operating in Europe and the United States is mainly ungrounded. When preparing the following PD technology acquisitions, consensus should be reached on the basics of data security ideas, principles and the way higher education institutions interpret the requirements set by EU legislation.

As the core system's data security standards have been verified in the competitive tendering of acquisitions and in deployments so far, in future the impact of other functionalities and operating processes integrated in the core system must be assessed. Service providers should be required to provide more detailed descriptions of the usage rules of their products and the plugins they support.

However, active data security and data protection risks are probably related to the customers' habit of using integrated PD systems too “flexibly” in the various training and guidance processes. The processing of originality reports and management of result listings, and the maintenance of various monitoring data, involve data protection issues: these may constitute personal data files. Strict rules apply to their usage.

Table 1: Integrations of PD systems at higher education institutions. PD system survey 2-4/2013.

PD integrations        Share of PD systems
IDM                    56%
E-learning platform    84%
Publishing system      15%

Finnish higher education institutions often use PD systems integrated with the Moodle or Optima e-learning platforms. Of the PD systems of higher education institutions that responded to the PD system survey in spring 2013, 84% were integrated or were being integrated with the higher education institution's e-learning platform.[24] This procedure is sensible with regard to operations and pedagogy. The problems related to the operational reliability of e-learning platforms in document management have been largely ignored in the course of integration processes.

When all written coursework of students and other checked research publications are saved in connection with the PD process in the databases of e-learning platforms (or other data warehouses specific to the higher education institution in question), their data security and data protection should at least meet the security standards of the PD system and general data security recommendations. It should also be ensured that the systems are actually used in line with these recommendations.

Each higher education institution's Archives Formation Plan (AFP) defines the normative retention periods of theses and exercises. Records are deleted from the databases of detection tools in compliance with a standardised process that should be applied in line with the annual schedule defined in the AFP. The corresponding procedure is probably not in use in any higher education institution for e-learning platforms: materials remain in the database for as long as the instance is in use or until the course module is reset.

E-learning platforms are fundamentally more vulnerable than PD systems in terms of data security.
Features of e-learning platforms that logically conflict with the data security concept of the PD system integrated with the platform in question may also involve data security risks. For instance, the metalinking connected to course module management tools in Moodle involves inherited user roles between courses in a way that breaks the role-based data security model in Turnitin.

As many as 56% of the PD systems included in the status review carried out in spring 2013 were integrated or were being integrated with the higher education institution's IDM. There is room for development in the implementation of user management as well.

[24] In most cases, PD systems used by the entire organisation are integrated with the e-learning platform, but those acquired for the use of departments are not. Two higher education institutions have integrated the system with two e-learning platforms used side by side.

Arrangements that allow the simultaneous use of a PD system directly via the system's own user management and via the higher

education institution's HAKA registration, or via the registration related to the e-learning platform integration, result in overlapping processes that burden administration. Such double usage is surprisingly common because it involves operational benefits insofar as the features of integrated user interfaces and direct user interfaces do not fully correspond with each other.

Data transfers function slightly faster via the native user interface than when using the system via the e-learning platform. In direct use, the PD system may also support more saving formats than in integrated use, where the internal restrictions of the e-learning platform also become restrictions for the PD system. On the other hand, integration with Moodle, for instance, provides useful additional features in comparison with direct use, both in Urkund and Turnitin. The most active development of additional features therefore seems to be targeted at integrated user interfaces.

The solution with two parallel registration paths may allow the use of the PD system even after the student or staff status has ended, because the pairing of the higher education institution's former e-mail address and password still works in the user management of the PD service even after the e-mail account has been closed.

The problems involved in direct HAKA registration are related to schema restrictions or the diverse usage of attributes in user role management. Complicated situations of use occur in particular for postgraduate students whose duties involve not only studies and research, but also a varying degree of teaching assignments. Updating of the FunetEduPerson schema (launched on 12 December 2013) will probably facilitate the resolution of problems encountered here.

PD integrations related to the electronic publishing (archive) system of theses (covering 15% of the systems) would seem to have been implemented without problems.
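The role management problem described above for postgraduate students can be illustrated with a minimal sketch. It assumes that the identity provider releases a multi-valued eduPersonAffiliation attribute; the PD-side role names ("instructor", "student") and the mapping table are hypothetical and are not taken from Turnitin, Urkund or any HAKA configuration.

```python
# Hypothetical mapping from eduPersonAffiliation values to PD system roles.
# The role names on the right are assumptions made for this illustration.
AFFILIATION_TO_ROLE = {
    "faculty": "instructor",
    "student": "student",
}


def pd_roles(affiliations):
    """Return every PD role implied by a user's affiliation values.

    A user may hold several affiliations at once, so the result is a
    sorted list of roles rather than a single value.
    """
    roles = {
        AFFILIATION_TO_ROLE[value]
        for value in affiliations
        if value in AFFILIATION_TO_ROLE
    }
    return sorted(roles)


# A postgraduate student who also teaches carries both affiliations.
postgraduate = ["member", "student", "faculty"]
print(pd_roles(postgraduate))
```

A mapping that keeps all matching roles, instead of picking the first one, avoids forcing a teaching postgraduate student into a single role; whether a given PD system can accept more than one role per user is a separate question that the sketch does not answer.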
4.4 Assessment of efficiency, usability and reliability

In the preliminary assessment and pilot projects related to PD system acquisitions, the main focus of attention has been on the user interface features of systems, the processability of result reports and the specification of guidance needs and usage rules to be prepared. In fact, the specification requirements of PD system process management can be considered advanced, whereas data security and vulnerability have only been examined on the basis of technical documentation, without empirical tests.

In all cases, the processing efficiency of result reports has been found to be “sufficient” without separate published testing. Moreover, detection results are assumed to improve as Finnish reference material is saved in reference databases. Because no published reference data about tests conducted in Finland is available, the results of tests conducted in neighbouring countries should be summarised here.

PD systems were put out to extensive framework tendering in Sweden by cooperating universities in spring 2011. Some of the universities completed the

process through their individual further competitive tendering processes, including specific tests to meet the needs of these universities.

The detection test included in the joint tendering process involved three multidisciplinary assembled texts based on the most used materials of university libraries. Wikipedia content was also included. Participating systems' accuracy of detection in 50 materials varied greatly: Turnitin found 37 sources (74%), Urkund 18 (36%) and GenuineText none. Total analysis times were as follows: Turnitin 8 min. (factor: 1), Urkund 37 min. (factor: 4.6) and GenuineText 67 min. (factor: 8.4).[25]

Stockholm University tested efficiency, accuracy and usability with more extensive sets of seven assemblies of test material.[26] They represented data contents from various disciplines, testing the ability to analyse uniform texts and fragmented ones compiled from many copied sources.

In the test by Stockholm University, the detection ability of compared systems varied by type of material. In legal texts and humanities, Urkund and Turnitin produced the same accuracy of detection, whereas in social sciences and natural sciences, the results were 10–25% in favour of Turnitin. Of the 167 references representing combined materials (database and publication materials, students' exercises and materials from the Internet),[27] Turnitin found 50.2% and Urkund 37.7%. The ratio of accuracy between the systems was 1.0/0.75.[28]

The highest measured difference between the systems was revealed in the rate of the identification process: Turnitin spent a total of 15 minutes processing seven test materials, whereas Urkund spent 1002 minutes and GenuineText 1026 minutes.[29] This difference in speed does not, however, indicate a dramatic difference in the usability of the systems, because in most actual usage situations, “by tomorrow” is adequate. Turnitin was considered to produce reports that were easier to interpret than those of Urkund.
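The percentages, time factors and accuracy ratio quoted above follow from simple arithmetic, which the sketch below reproduces. The input figures are those reported in the text; the function and variable names are chosen for this illustration only, and the speed factors for the Stockholm test are derived here rather than quoted from the test report.

```python
# Figures as reported in the text for the joint Swedish tendering test.
joint_sources_found = {"Turnitin": 37, "Urkund": 18, "GenuineText": 0}
joint_sources_total = 50
joint_minutes = {"Turnitin": 8, "Urkund": 37, "GenuineText": 67}

# Figures as reported in the text for the Stockholm University test.
stockholm_found_pct = {"Turnitin": 50.2, "Urkund": 37.7}
stockholm_minutes = {"Turnitin": 15, "Urkund": 1002, "GenuineText": 1026}


def detection_rate(found, total):
    """Share of test sources that a system identified, as a percentage."""
    return 100 * found / total


def relative_to(values, baseline):
    """Express every value as a multiple of the baseline system's value."""
    return {name: value / values[baseline] for name, value in values.items()}


for name, found in joint_sources_found.items():
    print(f"{name}: {detection_rate(found, joint_sources_total):.0f}% of sources")

for name, factor in relative_to(joint_minutes, "Turnitin").items():
    print(f"{name}: analysis time factor {factor:.3f}")

accuracy_ratio = relative_to(stockholm_found_pct, "Turnitin")
print(f"Urkund accuracy relative to Turnitin: {accuracy_ratio['Urkund']:.2f}")

for name, factor in relative_to(stockholm_minutes, "Turnitin").items():
    print(f"{name}: Stockholm processing time factor {factor:.1f}")
```

Expressing each system as a multiple of the fastest or most accurate one makes the two tests comparable even though they used different materials and different numbers of sources.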
The Stockholm test group reached the following conclusion: “Data suggests Turnitin as the best suited originality check system for the needs of university lecturers. This system was superior in terms of all three major areas evaluated: a) time for searching, b) identification of plagiarised text, and c) the design of the originality report.”[30]

Hochschule für Technik und Wirtschaft Berlin is acknowledged for testing plagiarism detection technology in Europe. It has published extensive comparative tests focusing on different areas of detection since 2004. The general comparison of 2010 covered 27 PD systems.

[25] Resultat av utvärdering, dnr SU 814-1861-11.
[26] Frey Appelgren Heyman, Mattias Olofsson, Henrik Hansson, Jan Moberg, Ulf Olsson: “Can we rely on text originality check systems? Evaluation of three systems used in higher education and suggestion of a new methodological test approach”. Stockholm University 2012.
[27] The same, Table 1.
[28] The same, Table 4.
[29] The same, Table 3.
[30] The same, Conclusion 1.

Both systems used in Finland were

placed in the best result category: Turnitin came second, and Urkund fifth.[31] The 2012 test focused on the ability to detect collusion (dishonest cooperation between students) in exercises. In the comparison of 18 systems, Turnitin was the best; Urkund did not participate.[32]

For the 2013 general test, 15 systems were selected. Critical realism characterised the summary of results: “Most troublesome is the continued presence of false negatives – the software misses plagiarism that is present – and above all false positives. … So-called plagiarism detection software does not detect plagiarism. In general, it can only demonstrate text parallels. The decision as to whether a text is plagiarism or not must solely rest with the educator using the software: It is only a tool, not an absolute test.”[33]

In this comparison by HTW Berlin, Urkund's detection efficiency was slightly better than Turnitin's, while the system usability value was higher for Turnitin than Urkund. The response to a specifying question regarding ranking between the systems, by Professor Debora Weber-Wulff, in charge of the tests, was as follows: “There is no ranking! The systems were grouped in three groups, Urkund and Turnitin are in the same group, both have major usability issues.”[34]

The test criticised the functionality of both Urkund and Turnitin. On the other hand, the general conclusion to be drawn on the basis of HTW Berlin's results is that higher education institutions in Finland have introduced two PD systems of high technical quality.

4.5 Service provider's responsibility for the quality of reference databases

Some competitive tendering processes conducted in Finland in 2012 requested service providers to provide a roadmap for the accumulation of domestic reference materials for the use of the system. Descriptions received were convincing.
However, the execution of roadmaps requires long-term development to be implemented jointly by the customer, service provider and scientific publishers. It has not been possible to verify the results to any significant extent within the timeframe defined by the acquisition procedure.

On the basis of preparatory work aiming at the development of productive use, presented in Chapter 5, it would be possible to define procedures for enhancing the transparency and assessability of reference resources maintained in PD systems, and the commitment of the service provider to their development with publishers and customers.

[31] Debora Weber-Wulff: Softwaretest 2010: Results of the Plagiarism Detection System Test 2010. HTW Berlin.
[32] Debora Weber-Wulff, Karin Köhler, Christopher Möller: Collusion Test 2012: Collusion Detection System Test Report 2012. HTW Berlin.
[33] Debora Weber-Wulff: Softwaretest 2013. Results of the Plagiarism Detection System Test 2013. HTW Berlin.
[34] Debora Weber-Wulff's e-mail message to Totti Tuhkanen on 22 October 2013.

4.6 Summary of development needs in the acquisition process

Cooperation in PD acquisitions between universities and universities of applied sciences at the national level
• One of the national-level actors, such as IT Directors' Lisenssi-SIG network, takes responsibility for the competitive tendering of next generation PD systems, and the joint preparation of licensing agreements.
• Experience-based information on acquisition processes is documented in the higher education institution project library maintained by CSC ([email protected]) in connection with selection criteria used in acquisitions. These experiences help in pinpointing further development needs in the definition of criteria and quality requirements.
• User support across the borders of higher education institutions is developed because inadequate resources are allocated to the administration of PD services. In many higher education institutions, the compulsory PD check of theses is connected to the degree verification process, and the usability and stability of PD systems is one of the quality factors in the core process of education.

Improving the data security and quality of PD systems and operational processes
• The data security and data protection requirements applicable to PD systems are extended to cover the processes managed across integrations more precisely; plugins are included in the quality audit.
• The regulations for the maintenance and deletion of reference materials in higher education institutions are harmonised by tying the retention periods of materials to the higher education institution's AFP, for example. The application of regulations both to PD system databases and learning platform databases is ensured.
• User management models are standardised and the use and development of the FunetEduPerson scheme is ensured in a way that meets the role management-related needs of PD systems.
• Propose that higher education institutions' PD procedures are included in the Finnish Higher Education Evaluation Council (FINHEEC) quality audit.

Development of PD system testing procedures
• A new broad-based testing model for PD systems is developed in Europe-wide cooperation. An international testing model might improve the usability of reference results and create the basis for stronger customer guidance in system development. Testing could be expanded to include the assessment of usage process data security and vulnerability, for example.

Development of Finnish reference data materials
• Define a target programme and procedures for the development of reference materials in higher education institutions.
• Higher education institutions' student affairs administration or CSC's OPI activities, with the support of the National Library's FinELib, could be responsible for the coordination of this development area.
• Obligations related to the development of reference databases are set on PD service providers.

5. USE OF PD SYSTEMS IN PRODUCTION

Higher education institutions in Finland have acquired an almost comprehensive PD infrastructure, but only indirect information is available about its usage and utilisation rate. The usage of PD systems in universities of applied sciences can be characterised as established, with the majority of deployments having taken place as early as in 2009.

However, the collection of information about instructions in connection with the preparation of Chapter 7 of this report showed that the usage policy of PD technology varies considerably from one university of applied sciences to another. The usage method can also be characterised in a polarised manner, as either system-based or service-based (cf. Graph 2 p. 12).

High variation in instructional practices and operating methods may explain the qualitative variation that Senior Lecturer Erja Moore pays attention to in her study of 91 theses saved in the Theseus archive of publications. Of the theses analysed by Moore, more than half met the general quality criteria of good academic text, but every eighth thesis showed serious defects in referencing or contained plagiarism. Moore summarised her results as follows at the IPPHEAE conference in Brno: “The study shows evidence that partly plagiarized theses are accepted and published. … This evidence is in sharp contrast with the educational discourse about high quality in higher education.”[35]

In universities, the deployment of PD systems is still at an introductory stage. No qualitative study about the usage of reference data that covers several universities, as Moore's Theseus analysis does, has been made, but the need is urgent. No estimates are available related to the coverage of PD checks and the degree of maturity of procedures. However, instructions regarding the checking procedure are on average more comprehensive in universities than in universities of applied sciences.
The need for national-level guidelines on good scientific practice in teaching, to supplement the Finnish Advisory Board on Research Integrity's guidelines of good scientific practice, is recognised in several different contexts.[36]

[35] Erja Moore: Sloppy Referencing and Plagiarism in Students’ Theses. Conclusions. IPPHEAE conference, Brno 12.-13.6.2013.
[36] The Acatiimi journal has upheld critical discussion about the problems in controlling plagiarism. Professor Helena Hurme (Åbo Akademi University) opened the debate about issues related to rules for the process of recruiting postgraduate students in her article “Plagiaattoreiden sisäänmarssi” (The entry of the plagiarists), Acatiimi 2/2012. Post-doctoral researcher Ari Tuononen draws attention to the poor functioning of detection processes in his Acatiimi article “Tohtoritehtailusta seuraa eettisiä ongelmia” (Churning out of doctors results in ethical problems), 1/2014. Chairperson of the Finnish Advisory Board on Research Integrity, Chancellor Krista Varantola continues the discussion opened here, under the heading “Whose integrity, whose responsibility?”, Acatiimi 2/2014. The article by Tuomo Lappalainen, “517–2”, Suomen Kuvalehti 50/2013, 35–39, seeks justification for the incommensurability of deceit detection data in higher education institutions in Finland and Sweden, and the current role of the Finnish Advisory Board on Research Integrity. The need for comprehensible ethical guidance standards and development of the operational architecture is emphasised in Maarit Niemelä's article “Vilpin entistä lyhyemmät jäljet” (The ever-stronger scent of deception), Tieteen tietotekniikka 4/2013, 15–17.

The productive use of PD systems cannot yet be characterised as having reached a mature stage. Both the reliability and comparability of detection processes and detection systems should be enhanced.

5.1 Visibility of Finnish publications in reference databases

Reference databases are a primary target of development as regards the reliability, usability and acceptability of the electronic detection of plagiarism. Urgent development needs are involved in the indexing of scientific publications published in Finland, and research and thesis materials produced in higher education institutions. In these content areas, the automated data collection and content acquisition model in search of volume benefits does not work.

From the viewpoint of global PD service providers, Finnish materials are marginal. In addition, the low number of students in higher education institutions in this country, particularly divided between several blocs of PD systems, undermines our importance as customers. Nationwide cooperation is called for in order to strengthen the role of Finnish higher education institutions as customers.

When preparing this status report, some pilot projects were organised in the autumn term of 2013 in order to examine the potential for active customer guidance. The structure and results of these pilot projects are described below.

5.1.1 Charting of content-related requirements

The first measure was to chart the source material entities prioritised as crucial in various disciplines. Higher education institutions’ libraries hold monitoring data concerning the use of electronic publications and the National Library's FinELib service is responsible for their licensing. In addition to this highly generic usage data, information about the reference needs of teachers actively using PD systems was collected by higher education institution.
Piloted service model: The preparation group opened a continuous data collection website for teachers in universities and universities of applied sciences and maintainers of PD services to register their preferences for materials to be indexed. If the content in question is subject to a FinELib licence, the request for indexing will be submitted at the latest when agreeing on the next licensing period with the publisher. PD service providers are also informed about the indexing requests. An established indexing request service could operate under the National Library's FinELib service.

Results:
For the time being, few teachers were able to point out materials whose absence from the PD system references hampered the use of the detection system. In the future, data collection should be organised so that an indexing request can be saved in a fixed address whenever necessary, as more experience is gained from using the systems.

5.1.2 PD as part of scientific publishers' publishing processes

When the survey was conducted, the publishing sector was informed in many ways about the new plagiarism detection procedure of Finnish higher education institutions and the reference requirements it created. The FinELib service sent a questionnaire to Finnish scientific publishers concerning possible cooperation agreements concluded with PD service providers. Individual publishers and publishing associations were contacted in connection with national book fairs during the autumn. Information about the key role of the new PD infrastructure in the quality work of Finnish higher education institutions was disseminated at partner meetings.

Piloted service model: The preparation group opened a briefing page about PD technology and its academic use on CSC's service platform. A text template, complete in several language versions, is available for download from the page for presenting PD operations whenever higher education institutions propose the indexing of published materials to publishers.

Results:
The Finnish publishing sector's attitude is positive towards introducing electronic originality checks. So far, systems have not been introduced to publishing processes, but the need for more specific information about usage processes, reliability and financial benefits is high. The significance of personal contacts was emphasised at the piloting stage as no experiences were gained about the benefits of the advisory website.

5.1.3 A uniform indexing process for various service providers

The short history of PD services in use proves that scientific material published in Finland, particularly material in the Finnish language, will not be indexed in reference databases without active customer guidance by higher education institutions.
Service providers favour international publication portals as contract partners, because they offer a framework which allows mass indexing to be organised for several publication series with one preparatory phase.

Published e-literature can only be included as reference material on a contract basis. Agreements require internationally valid contract models. The confirmation of a contract and the preparation of data transfers related to indexing involve a complex process in several stages. The fact that employees in the publishers' editing and publishing chain must be trained to use a PD system increases the amount of work required, unless indexing is implemented as subsequent batch runs from a database for interim storage. Publishers must inform their cooperation partners and customers about the indexing procedure, and establish it as a sub-process in the actual publishing process.

Urkund has not published information about the number of its contract partners and volumes of reference materials. The service interface of iParadigms

includes a search function enabling customers (not students) to find out whether a publication series is already included in the indexed materials.

Piloting service model: The third measure involved the piloting of two indexing processes launched by university customers in order to analyse the burden involved in the process as a whole. Publishing portals with a broad-based content offering and a dominant position in the Finnish publishing business in one branch of science were chosen as partners for the pilot.

Meetings with publishing editors and technical staff were prepared for the purpose of charting the special characteristics of publishing series. Representatives of customers, service providers and publishers attended the meetings. Afterwards, the publisher and service provider were prepared to confirm a bilateral contract and implement an indexing procedure in line with the service provider's process model.[37] Pilots used the iParadigms database but it was simultaneously agreed that in accordance with a schedule to be specified later, all material processed in the pilots could also be indexed in Urkund's database.

Figure 4: General process for agreeing on indexing. The piloted process is short but it requires 4–8 hours of work by each responsible actor. Work required for the intermediate storage and indexing of information is not included in this time estimate.

Conclude a partnership agreement: At first, a joint workshop is arranged for the service provider, publisher and customer to introduce the PD system and specify common goals with regard to benefits. In addition, the workflow is agreed on.
• The customer's representative identifies the publisher's materials whose indexing is considered to be of primary importance, and informs about them.
• The service provider handles the indexing agreement and procedure issues with the publisher.
• The publisher provides the service provider with information about the format of the material that requires indexing, and the addresses it is available at.

Agree on the indexing process: The service provider agrees with the publisher on the data transfer schedule required for indexing, and technical details.
• Mass indexing allows for the processing of the entire publishing history data in one go. For the purpose of mass indexing, a folder structure is set up for the publisher to save new material in, e.g. by each quarter.
• Data transfer secured for indexing takes place from this folder, not the basic system.
• Several service providers (e.g. Turnitin and Urkund) can download the material from the same folder. After indexing, the folder is emptied.

Establish operations: In future, the publisher will have access to its own detection tools as part of its publishing process. That will facilitate real-time indexing of materials.

Figure 5: Turnitin process description for the technical preparation of indexing.[38]

[37] Turnitin Process for adding content to Database.
[38] Turnitin Acquisition Process.

Results:
Pilot projects started as request initiatives were sent by the Faculty of Medicine and Faculty of Law at the University of Turku to Duodecim Medical Publications Ltd and Edita Publishing's Edilex portal. Duodecim's publishing series includes not only the Duodecim journal but a wide range of Finnish medical materials, medical textbooks, handbooks, practical guidebooks and journals.

Edilex, the most comprehensive digital legal information service in Finland, includes legislation and legal science publications from periodicals to high-quality theses. Both publishers' materials are utilised as study materials and source materials for theses. Both parties allocated human resources worth approximately two working days, which was adequate for creating the basis for cooperation in indexing on a continuous basis.

From the viewpoint of higher education institutions, it is important to secure equitable development preconditions for systems sharing the markets. In order to maintain genuine open competition for all PD system developers, higher education institutions and the publishing sector should agree on a procedure to ensure access to prioritised materials for all parties involved in indexing, on equal terms. On the other hand, it could be difficult to motivate publishers to adopt a model involving several parties in indexing, because recurring indexing processes would result in an extra burden on them.

5.1.4 The use of theses and study attainments as reference data

The key motive for first generation PD system acquisitions in Finland was to control the large-scale recycling of exercises on courses with a large number of participants. Even today, in the 2010s, other students' texts are a key source of plagiarism. Therefore, higher education institutions need to develop internal reference databases for exercises and unpublished manuscripts.

Not only intellectual property rights but also higher education institutions' archiving regulations that specify the statutory retention periods for each type of study attainment and document may constitute legal obstacles to the large-scale collection of study attainment based references. Material must be destroyed after the statutory retention period.

Theses related to degrees are retained permanently but in most higher education institutions, long-term archiving takes place in paper format only.
If there are no decisions concerning the possibility of a parallel electronic archiving system, theses should be deleted from the PD system reference database after a certain deadline. The retention periods specified for other study attainments are usually 1–2 years. Even if electronic retention were permitted, the general instructions specified in the AFP often conflict with the development logic of the PD service: plagiarism typically targets either the most recent or clearly older material. Many investigation processes in recent years have involved plagiarism related to materials dating back more than ten years.

Analysis of the status of long-term electronic retention of theses in higher education institutions and its practical implementation: As a basis for further measures, rules for the electronic publishing of bachelor's theses, master's theses, licentiate's theses and doctoral dissertations, saving formats and publication archives used, are analysed. In addition, address format practices (URL/DOI/other) and the saving formats used, and the validation of records will be charted.

Results:
The analysis of theses will be completed in the spring of 2014. With regard to other study attainments, it can be stated that the need to allow for longer retention periods than the ones specified in AFPs is urgent; otherwise we will not be able to start the development of reference databases comprising peer-to-peer materials. The value of these materials that are not freely accessible is crucial for PD systems. The needs of plagiarism detection should be taken into account when determining the rules for AFPs regarding retention periods. On the other hand, the retention of materials may involve copyright-related problems. In order to avoid these, study attainments are not directly recorded in the PD system database in the University of Helsinki. Instead, they remain as reference for later checks only if the author actively selects a “permanent” record status for the work.

A cost-efficient model for the extensive use of theses as reference data could be a national repository such as the Slovakian CRTD thesis database (see p. 16), in which originality checks and content indexing are integral to the publishing process. If necessary, several PD software tools can be chained in a centralised publishing process. The parallel publishing of scientific articles can be legitimised through statutory actions. Otherwise, restrictions of parallel publishing could undermine the quality of national reference data. In Slovakia, materials are retained in the thesis database for 70 years. Benchmarking of the Slovak model could support the development of reference databases in Finland.
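The interplay between AFP retention periods and an author-selected “permanent” record status, as discussed in this section, can be sketched in a few lines. This is an illustration under stated assumptions only: the record fields, the function names and the example retention periods are hypothetical and do not describe the data model of any PD system, e-learning platform or institutional AFP.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredWork:
    work_id: str
    stored_on: date
    retention_years: int     # assumed to come from the institution's AFP
    permanent: bool = False  # author has opted in to a "permanent" record status


def expiry_date(work):
    """Date on which the retention period of a stored work ends."""
    target_year = work.stored_on.year + work.retention_years
    try:
        return work.stored_on.replace(year=target_year)
    except ValueError:
        # Stored on 29 February and the target year is not a leap year.
        return work.stored_on.replace(year=target_year, day=28)


def due_for_deletion(works, today):
    """Identifiers of works whose retention has ended and that are not permanent."""
    return [
        work.work_id
        for work in works
        if not work.permanent and today >= expiry_date(work)
    ]


works = [
    StoredWork("essay-1", date(2011, 3, 1), retention_years=2),
    StoredWork("essay-2", date(2012, 9, 1), retention_years=2),
    StoredWork("essay-3", date(2010, 1, 15), retention_years=1, permanent=True),
]
print(due_for_deletion(works, today=date(2014, 2, 28)))
```

Running the same check against both the PD system database and the e-learning platform database would be one way to apply the retention rules consistently to both stores.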

5.1.5 Summary of measures for enhancing the quality of reference data in Finland

Active development of PD system customership
• A consortium of PD customers is established as part of the cooperation between IT, student affairs and library administration. A customer representing 227,424 FTE students (2012) is a strong coordinator when targeting the development of the service.

A uniform indexing model for service providers
• A uniform indexing process developed with publishers is applied to different service providers. The procedure enhances the efficiency in implementing indexing projects and promotes the equitable development of reference databases in higher education institutions' PD systems.

National PD information website
• A Finnish PD information page is launched. A consortium of PD system customers is responsible for its maintenance.
• The website serves as a source of information about the quality procedures of Finnish higher education institutions and the goals of PD activities, indexing requests concerning material entities (accumulated as an online service) and a joint indexing model shared by service providers with Finnish publishers.
• The website channels regular newsflow between different developing communities.

Development programme for open publication archives and parallel publishing
• The recording procedures of theses in open publication archives are harmonised.
• The parallel publishing of research reports in higher education institutions' publishing archives is supported.
• The possibilities for using electronic recordings of study attainments as reference materials for PD systems for longer than specified in AFPs today are examined. This practice would enhance the accuracy and impact of originality checks.
• As part of this project, Slovakia's centralised CRTD service model is benchmarked.

5.2 Factors undermining the reliability and usability of PD systems

PD systems offer good standards of general usability and usage stability.
However, the reliability and user path management of originality checks involve potential error factors that can be prevented by way of operating recommendations and the appropriate development policy.

5.2.1 Partial indexing and exclusive licensing

There is room for development in the way reference material content indexing is implemented and described. User experiences include cases in which only a part of the publication contents, indicated as part of an extensive reference resource, serves as active references. In database entities of published publications, this may have meant that only a text abstract was indexed while the actual monograph or article remained outside indexing, due to contracts regulating user rights and possibly a locked saving format.

The content of reference databases is a competitive asset with a more broad-based impact for PD service providers than the technological implementation of the service. The use of exclusive contracts in the indexing of materials that are most important in terms of competition can be regarded as a factor that distorts markets. Such procedures undermine the preconditions of open competition and active product development. Finnish higher education institutions' operating policy should only favour indexing and licensing solutions that are open to several actors.

5.2.2 Invalid file formats

The Finnish practice involves the checking of all parts of academic theses, including those that have previously passed a peer review, and published articles, using a PD system as part of the overall evaluation.

The PD check of electronic publications involves the potential of technical errors. The PDF file format, a highly common underlying feature for these, covers a number of different formats for saving files, all in compliance with the ISO standard. Some of the PDF printer drivers on the market produce files that are only partly saved in line with the standard. These can be opened with web browsers or readers for on-screen reading but do not support faultless text detection. Appendix 1 includes a summary of problems resulting from file formats and possible ways of resolving these.

Image 4: A PD system check report view broken by a faulty PDF file. In this case, the LaTeX editor has produced a PDF that fragments the report view in Turnitin and makes it unusable. The same document prevented reporting in Urkund. Instructions for LaTeX PDF management are provided in appendix 1.

5.2.3 Manipulated records

Deliberately manipulated files constitute another group of problems, comparable with detection errors caused by PDF files. The open Internet includes articles, videos and “hate sites” that offer different ways of misleading PD systems.
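One simple form of advance validation for manipulated records can be sketched in a few lines of Python. The report does not state which hidden codes circulate or which were used in testing, so the sketch assumes, purely for illustration, invisible Unicode format characters such as the zero-width space. The function names and the sample sentence are hypothetical and are not part of Turnitin or Urkund.

```python
import unicodedata


def find_hidden_characters(text):
    """Return (index, character name) for every invisible format character."""
    hits = []
    for index, char in enumerate(text):
        # Unicode category "Cf" (format) covers zero-width spaces, joiners,
        # soft hyphens and similar characters that are normally not displayed.
        if unicodedata.category(char) == "Cf":
            hits.append((index, unicodedata.name(char, "UNNAMED")))
    return hits


def strip_hidden_characters(text):
    """Return the text with all category "Cf" characters removed."""
    return "".join(c for c in text if unicodedata.category(c) != "Cf")


original = "Plagiarism detection compares submitted text with reference data."
# Hypothetical manipulation: a zero-width space after every letter "a".
manipulated = original.replace("a", "a\u200b")

# The word count stays the same while the character count grows slightly.
same_word_count = len(manipulated.split()) == len(original.split())
extra_characters = len(manipulated) - len(original)
```

A document in which `find_hidden_characters` reports hits could be returned to its author or cleaned with `strip_hidden_characters` before it is submitted for checking. This only covers the assumed class of characters; other manipulation methods would need their own checks.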
Many of these approaches are so arduous and easy to identify that they do not save time for anyone attempting to cheat. However, some hidden coding procedures are difficult to detect and are not visible as changes in the document's word count or file size, although they usually change the character count slightly. The ability to filter error code varies by PD software, depending on the coding method. Without a broad-based series of tests it is not possible to assess the

general vulnerability of Urkund and Turnitin detection algorithms. A small-scale test proved, however, that hidden codes can influence detection results by up to several dozen per cent39. More attention should be paid to the advance validation of texts checked, even in this respect.

5.2.4 Poor quality of language versions in user interfaces and instructions

PD system user interfaces, pop-up instructions of various features, form templates and corresponding parts of the user interface integrated with a learning platform include text in different language versions, which involves semantic problems. Users do not always know what the instructions refer to and what should be done on their basis. The user interfaces of PD systems linked to the electronic infrastructure of student affairs administration should use good standard language and established, relevant vocabulary. Finnish student affairs administration vocabulary was largely standardised in connection with the OKSA process coordinated by the Ministry of Education and Culture, and the development and maintenance of the National Research Data Initiative coordinated by CSC.

5.2.5 Operational differences between direct use and integrated use

Integration with the e-learning platform enhances the efficiency of the daily use of a PD system in the checking and guidance of various study attainments. Interactive coursework emphasises the benefits of a PD system in practising source criticism and even as a tool for searching new source material. For the guidance of exercises and theses, PD systems can provide annotation tools that are more advanced and agile than the original tools of the e-learning platforms.

However, the link with the e-learning platform involves challenges. Many teachers and researchers working in higher education institutions do not use e-learning platforms for distributing study materials and guiding students.
The introduction of a PD system involves a more dramatic change in operating culture for this target group than for teachers who have used e-learning platforms for a longer time. If the introduction of an integrated PD system involves having to learn how to use an e-learning platform and the PD system simultaneously, the change in work routines is significant. In such cases, active attention must be paid to sufficient local support.

39 The test was carried out on 21 November 2013 in cooperation between Haaga-Helia University of Applied Sciences and the University of Turku. The test material was an article published on Wikipedia. The text was copied into a Word document and the specification “Extract from Wikipedia” was added to the beginning. As expected, Turnitin gave the equivalence as 99% and Urkund 98%; the difference was not significant. After that, hidden code was included in the same text. The number of characters increased a little while the word count remained unchanged. A repeated equivalence check decreased Turnitin's result by 2% (the result was influenced by one word being hyphenated and not recognised as the original), and in Urkund, 30% of equivalence disappeared.

The threshold for introducing new systems can be lowered in a number of ways. For instance, the University of Turku offers users a start-up package: when you order a new Moodle course, you gain access to Turnitin exercises,

complete with the parameters recommended by the university. The introduction of this entity has proven painless and technical consulting has focused on producing customised solutions to meet special needs.

Image 5: Turnitin and Urkund are integrated side-by-side in Hanken School of Economics' e-learning platform Moodle. User interface paths are identical in the course view. Users can choose the tool they need.

The e-learning platform must be able to accommodate different solutions for distributing information. This flexibility may also facilitate operating methods that are problematic in terms of data security and privacy protection. The safe usage of PD systems is based on user paths that are strictly specified in accordance with user roles.

In an integrated environment, these two operating methods, the flexible and the regulated, may conflict. Section 4.3 describes an example of a role management problem due to the use of metalinks in Moodle. The difference between free and regulated operating methods is obvious in the storage practices of materials: PD systems apply high level protection to the management of checked documents, while duplicates saved in e-learning platforms remain in the databases “for the time being” and without equally strong protection against security breaches, for example.

Because an integrated operating environment not only involves more features for PD system user interfaces but also increases complexity and the number of data transfer interfaces that slow down operations, some teachers prefer using a PD system as a separate service without interim layers. In this case, the criteria are fast and uncomplicated use. However, an alternative is necessary. This is proven by Hanken School of Economics' plan to introduce iThenticate alongside Turnitin, for instance. It uses the same reference database but lacks Turnitin's e-learning platform features, making it a very quick and straightforward analysis tool.
The decision to choose either integrated or direct use is reflected in the role allocated to the PD system as one of the support tools of teaching and research. Integrated use is often linked to favouring a pedagogic approach to guidance, whereas direct use is linked to optimising speed in the checking of the final versions of study attainment texts.

5.2.6 Timing of originality check influences the thesis verification and publishing process

The question of the “best” time for an originality check of a thesis involves aspects related to pedagogics, process management and risk management. The procedure in which the originality of a thesis is checked only when a completed thesis that has passed the approval process is saved in the higher education institution's archive of publications involves high risks. If a thesis is found to be plagiarised at this stage, the maximum amount of work by the student, supervisor and student affairs administration is wasted. Sanctions are unavoidable and corrective measures must be taken through the management of change related to decisions.

The recommended approach is to have the student submit the manuscript to intermediate checks at different stages. In this case, there is little risk that a manuscript submitted to the faculty for evaluation includes erroneous or deceitful use of source material. When the check is carried out as part of the work supervision process, the provision and processing of guiding feedback is flexible and the threshold for responding can be low.

The guidance process can be woven into the phases of electronic publishing using the principle of a “conditionally completed” publication, in which the work proceeds to preliminary inspection for assessment at the same time as the work supervisor performs a PD check on it.

Graph 6: The originality check of a thesis as part of the evaluation, approval and publishing process. Process model by the University of Tampere. Description: S. Hautakangas & P. Kytöharju.

5.2.6 Summary of the management of problems undermining reliability and usability

Verifying the integrity and quality of indexing
• A recommended procedure is enclosed with manuscripts sent to international scientific publishers, stating that if the text is indexed in a reference database, then the entire publication must be indexed.
• This procedure can reduce the impact of source distortion created when the records of actual publication texts are locked and only abstract texts are indexed.
• Prepare a recommendation of a notification enclosed with manuscripts for further processing e.g. by the working committees of Arene and Unifi.

Joint saving of Finnish publication materials in PD systems
• Promote the indexing of published materials by discipline and prepare every round of mass indexing for inclusion in both Turnitin and Urkund.
• This procedure forms part of the national development programme of PD system accounts.

Harmonisation of user interface vocabulary
• Harmonise terminology used in PD system interfaces and instructions to reduce the workload of online support and facilitate the production of general procedural guidelines.
• Service providers implement updates of user interfaces in Finnish and Swedish.

Comparison of e-learning platform integrations
• Benchmark Turnitin and Urkund integrations with e-learning platforms to the extent necessary. Prioritise user interface development needs and inform service providers of them.

User management instructions
• Prepare joint instructions for usage rules of two registration paths in the higher education institutions that allow two parallel user management solutions.

PART 2: ACTIVITIES

Graph 7: Procedures in Case of Academic Fraud at the University of Turku.

6. COMMON BASIC CONCEPTS IN PLAGIARISM MANAGEMENT

Kari Silpiö

The use of electronic detection of plagiarism at all stages of higher education studies has created the need to expand and specify the vocabulary for deceitfully produced study attainments. Without specific terminology in work concepts, instructions cannot be provided for the processes of using PD systems so as to facilitate the equal assessment of study attainments in order to reach the correct conclusions.

In the comparison of higher education institutions' instructions for processing cases of cheating, carried out as a background study for this report, we detected variations in the way the basic concepts of ethical usage of information are applied. Variation is natural, because these basic concepts of research ethics are used for structuring different phenomena in different contexts. This chapter explains which application instructions and subsidiary concepts are required for the basic concepts of the Finnish Advisory Board on Research Integrity guidelines, Responsible conduct of research and procedures for handling allegations of misconduct in Finland, when these are applied (as all higher education institutions in Finland do) not only in contexts of research, but also in teaching and general administration.

This chapter describes the links, definitions and significance of key concepts in controlling plagiarism in a way that we believe supports the parties responsible for the preparation and updating of ethical guidelines in universities and universities of applied sciences.

6.1 Background and basis for definition

Literature emphasises the diversity of plagiarism as a concept and its dependence on different contexts of interpretation40. The content of the concept of plagiarism is defined through the chosen perspective and the related goals.
Even if the standard dictionary definition of plagiarism seems clear, no approved common definition of plagiarism and its various forms exists or is uniformly understood in the academic sector. Underlying factors include the writing traditions in different cultures and differences of emphasis specific to each discipline. Moreover, plagiarism is considered to appear in the form of various acts that characteristically differ from each other, and plagiarism may involve not only texts but other types of output as well.

This chapter analyses and defines the concept of plagiarism specifically in the higher education context. The basis for definition and the starting point for the text in this chapter are the master's thesis report of the chapter's author, published in 2012, and the presentation material of the seminar Control of Plagiarism and electronic detection of plagiarism in higher education institutions41,42,43.

40 For example, Carroll 2002; Whitley & Keith-Spiegel 2002; Mäkinen 2006; Surakka 2003; Sutherland-Smith 2008.
41 In some text sections herein, the author cites his aforementioned texts word-for-word.
42 Silpiö, K. 2012. Opiskeluvilppi ja plagiointi korkeakoulujen opintosuorituksissa. Kirjallisuuskatsaus ja käsiteanalyysi. Tampereen yliopisto. Pro gradu -tutkielma.

As background to defining the concept of plagiarism, the basic principles of ethical writing and permissible citation are analysed first. Thereafter, the definition of the concept of plagiarism is structured by analysing the following three perspectives for the purpose of studying the concept: the general perspective, the perspective of research and the perspective of teaching.

The actual definition of the concept focuses on the perspective of teaching in particular, to produce instructions for application and subsidiary concepts to complement the guidelines drawn up by the Finnish Advisory Board on Research Integrity. The main concepts in concept definition are plagiarism in a study attainment and plagiarism as misconduct. The concept of student cheating is also defined to the extent necessary to facilitate the definition of the main concepts.

6.2 Ethical writing and permissible citation

According to Roig44, an implicit contract underlying ethical writing allows the reader to assume the following: (a) that the author is the sole originator of the text, (b) that any ideas and texts borrowed from others are clearly identified as such by way of appropriate references, and (c) that the aim has been to include any indirect citations without distorting the factual content of the original.

References required by ethical writing are included in a systematic manner in accordance with the rules of a generally known reference system45. The basic elements of a reference system are a bibliography and references included in the text. The systematic use of a reference system requires the following:
• All direct and indirect quotes are indicated with references.
• References are marked with the required annotation.
• The bibliography is compiled in accordance with requirements.
• All sources mentioned in references are included in the bibliography46.
• The text includes references to all sources mentioned in the bibliography.
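The last two requirements in the list above are mechanical enough to check automatically. The following Python sketch is an illustration only: it assumes an author-year reference system with citations in round brackets, and the function names, the sample sentences and the sample bibliography are hypothetical. A numbered reference system or footnote-based references would need a different pattern.

```python
import re

# Assumed citation style: "(Roig 2006)" or "(Mäkinen 2006, 160–161; Carroll 2002)".
PARENTHESISED = re.compile(r"\(([^()]*)\)")
AUTHOR_YEAR = re.compile(r"^\s*(\D+?)\s+(\d{4})\b")


def cited_keys(text):
    """Collect (author, year) pairs from bracketed in-text references."""
    keys = set()
    for group in PARENTHESISED.findall(text):
        # Several sources inside one pair of brackets are separated by ";".
        for part in group.split(";"):
            match = AUTHOR_YEAR.match(part)
            if match:
                keys.add((match.group(1), match.group(2)))
    return keys


def check_reference_system(text, bibliography):
    """Compare in-text references with bibliography entries in both directions."""
    cited = cited_keys(text)
    listed = set(bibliography)
    return {
        "missing_from_bibliography": sorted(cited - listed),
        "never_cited": sorted(listed - cited),
    }


# Illustrative sample text and bibliography, not taken from a real manuscript.
text = (
    "Ethical writing rests on an implicit contract (Roig 2006). "
    "Short passages also need references (Mäkinen 2006, 160–161; Carroll 2002)."
)
bibliography = {("Roig", "2006"), ("Mäkinen", "2006"), ("Surakka", "2003")}
report = check_reference_system(text, bibliography)
```

In this sample, `report` lists Carroll 2002 as cited but missing from the bibliography and Surakka 2003 as listed but never cited. Such a check only covers the two bookkeeping rules; whether every quote actually carries a reference still requires reading the text.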
Permissible citation47 requires that
• all quotes are marked with the appropriate reference
• the reference indicates the extent of the quote so that it is distinguishable from the author's own thinking/knowledge
• in the case of a direct quote included without permission by the author of the source, the right to quote applies.

43 Silpiö, K. 2013. Plagiointivilppi Suomen korkeakouluissa. Miten laajasta tutkitusta ilmiöstä keskustelemme? Työkäsitteiden määrittelyä.
44 Roig 2006.
45 Depending on the branch of science and/or publisher, the reference system used is either one in which references are given as numbers or one in which the reference mentions the authors, year of publication and page numbers in brackets.
46 or, depending on the reference system used, in footnotes.
47 Permission given by the original author does not as such suffice to meet the criteria of the concept “permissible citation”.

In accordance with the right to quote, a direct quote may be included in the text without the author's permission if all of the following conditions are met:
• The work used as source is published (made available to the public by permission of the author).
• The quote is made in accordance with good practice, which requires that (a) the quote is clearly separated from the personal rendering of the person making the quote, (b) the source is mentioned in the appropriate manner, and (c) the contents of the quote are acceptably connected to the personal rendering of the person making the quote.
• The quote is only made to the extent required for the purpose.

Basic problems related to deficient references to sources include the following:
• poorer quality of the author's own presentation
• distortion of information (as one's own and quoted information become mixed)
• violation of copyright (depends on the case)
• plagiarism (when a quote is presented in one's own name)48.

6.3 Plagiarism in a study attainment and plagiarism as misconduct

6.3.1 Analysis of the study of the concept of plagiarism

In defining the concept of plagiarism for the purpose of this study report, the following three perspectives were specified for analysing the concept: the general perspective, the perspective of research and the perspective of teaching.

Graph 8: Three perspectives for examining the concept of plagiarism.

48 Specific harm caused by plagiarism is analysed in the source Silpiö (2012, 93).

The general perspective focuses on the goals, common for all perspectives, of respecting authorship and copyright. The following dictionary definitions by the Institute for the Languages of Finland, Kielitoimisto49, represent the general perspective:

plagiarise: to present another author's text, music or other such artistic or scientific production as one's own, to steal

plagiarism: a literary or artistic theft, a plagiarised piece of work; a work or part of it based on such

Note that the terms “plagiarism” and “a plagiarised piece of work” are not legal terms used in Finnish legislation. They are not present in the Criminal Code, Copyright Act, Universities Act or Polytechnics Act in Finland. Legal terminology defines “theft” as an act that involves movable property. From the viewpoint of copyright, plagiarism may expressly involve a violation of intellectual property rights.

In the narrowest sense of the word, a plagiarised piece of work may comprise one quote without quotation marks. In relation to this, Mäkinen50, for example, emphasises that plagiarism limited to one sentence is not permitted any more than plagiarism that comprises more extensive entities of text.

The perspective of research places particular emphasis on the production of scientific information and scientific writing. In the RCR guidelines51 drawn up by the Finnish Advisory Board on Research Integrity and prepared for the purposes of scientific research52, plagiarism is classified as one of the four subcategories of research misconduct. The others are misappropriation, fabrication and falsification (misrepresentation). The following definition in the RCR guidelines represents the perspective of research in defining plagiarism:

Plagiarism, or unacknowledged borrowing, refers to representing another person’s material as one’s own without appropriate references.
This includes research plans, manuscripts, articles, other texts or parts of them, visual materials, or translations. Plagiarism includes direct copying as well as adapted copying.

In the RCR guidelines, representing another person's unpublished material as one's own is included in the subcategory of misappropriation as follows:

Misappropriation refers to the unlawful presentation of another person’s result, idea, plan, observation or data as one’s own research.

49 Kielitoimiston sanakirja 2006.
50 Mäkinen 2006, 160–161.
51 Finnish Advisory Board on Research Integrity 2012. Responsible conduct of research and procedures for handling allegations of misconduct in Finland.
52 Suspected violations of RCR in scientific theses are also examined in line with the RCR guidelines from higher university degrees upwards. In addition, the RCR guidelines apply to the work of teachers and teaching materials. The guidelines oblige higher education institutions to teach good scientific practices and research ethics in both basic and further university education. However, the RCR guidelines were not prepared for the assessment of actual study attainments of basic university degrees.

In the RCR guidelines, self-plagiarism is classified as part of the concept of “disregard for the responsible conduct of research”53 and defined as follows: “publishing the same research results multiple times, ostensibly as new and novel results (redundant publication, also referred to as self-plagiarism).”

The concept of authorship in the scientific community and research work can be considered to have more stringent requirements than the legal concept of authorship, because in principle, legal copyright is limited only to the production of the text of an article54.

The perspective of teaching places particular emphasis on learning, the assessment of learning and competence, and the assessment of study attainments55. If a student has not performed the work he or she represents as his or her own, the evaluation does not in this respect target the personal learning and competence of the student in question. Theses constitute a special intersection for the perspectives of teaching and research.

The teaching perspective can be regarded as involving the strictest requirements of all of the three perspectives studied above, because it covers all types of output represented by the student for the purpose of assessing the study attainment56 and, in addition to reviewing the output, it emphasises the question of who performed the actual work resulting in the output presented for assessment.

6.3.2 Definition of the concept of plagiarism in a study attainment

The definition of the concept “plagiarism in a study attainment”57 is based on the needs of the process of teaching–studying–learning in particular. When a student represents work produced by another as his or her own, the assessor of a study attainment is misled as to whose learning and competence is being assessed and who produced the work on the basis of which he or she decides on whether to approve the study attainment and which grade to award.
The competence objectives of ethical writing and good scientific practice are influential in the background in addition to the goals that are specific to the study attainment in question.

53 “Disregard for the responsible conduct of research manifests itself as gross negligence and carelessness during the research process.”
54 Risteli 2001.
55 The term “assessment” as used in this chapter covers diagnostic, formative and summative assessment as well as the assessment of study attainment as referred to in the Polytechnics Act and the Universities Act. The aim of diagnostic assessment is to determine the initial level of a student. Formative assessment is the continuous assessment during the study attainment, the goals of which include motivating the student and steering the learning processes towards the learning objectives. The aim of summative assessment is to find out what a student has learned. This information is used in the assessment of study attainment when deciding on the approval of and grade awarded to the study attainment.
56 Even minor individual exercises such as mathematical problems, the answers to which require a schematic approach, are included.
57 “Student plagiarism” is the corresponding term generally used in English.

In this report, the concept of plagiarism in a study attainment is defined as follows:

plagiarism in a study attainment
1) the intentional or unintentional representation of work performed by another as one's own in a manner that misleads the assessor of the study attainment,
2) representing the same work for assessment as part of more than one study attainment in a manner that misleads the assessor of the study attainment.

The expression “work performed by another” expands the number of sources related to plagiarism in a study attainment beyond scientific and artistic production and it does not require the source to be of a standard that exceeds the threshold of work.

The expression “intentional or unintentional” is added to the definition to emphasise that the act considered as plagiarism can also take place unintentionally. Even unintentional plagiarism produces a plagiarised piece of work.

The expression “in a manner that misleads the assessor of the study attainment” is intended to emphasise the fact that even a careful assessor of a study attainment may end up with an incorrect notion of the authorship of a work or part of a work represented by a student for assessment. It also excludes situations in which an unfinished intermediate version of work is assessed or the assessor of a study attainment otherwise finds the presented work to be so incomplete that it is not subjected to more detailed review before the student has completed it to a sufficient degree, for example. In addition, the assessor of a study attainment may consider the context58.
The text of an exercise written by a novice (which includes errors) and that of a completed doctoral dissertation (must be precisely in order) are to be read and evaluated in different ways.

The definition of the concept “plagiarism in a study attainment” also includes so-called self-plagiarism59 as follows: “representing the same work for assessment as part of more than one study attainment in a manner that misleads the assessor of the study attainment.” The principle followed in higher education institutions allows students to use the same work in more than one study attainment only subject to a case-by-case agreement with the assessor of the study attainment on the possible way of utilising the work. Even when quoting one's own work with permission, the origin of the quote must be indicated.

58 Such as the initial skills of a student group and the requirements applicable to each study attainment and exercise.
59 The Finnish term “itsensä plagiointi” is the Finnish-language equivalent recommended by the Institute for the Languages of Finland for the term “self-plagiarism.”

In studying the scope of the concept of plagiarism, self-plagiarism is a border concept. As such, it is not compatible with the highly general definitions of the concept of plagiarism. However, it is often included in the scope of the concept of plagiarism in international reference literature, particularly in the context of higher education. It is a matter related to the utilisation of sources and referring to them. In the case of self-plagiarism, the original source is not used in

the appropriate manner. The inclusion of self-plagiarism in the scope of the concept of plagiarism in a study attainment emphasises the importance of understanding the issue and compliance with good scientific practice, even when utilising one's own previous work.

6.3.3 Definition of the concept of student cheating

The New Dictionary of Modern Finnish60 defines the Finnish word “vilppi” as follows:

vilppi (cheating): a dishonest means or act in particular to mislead someone, diversion, deceit. Cheat, engage in deceit in a test. Cheat someone.

In this report, the concept of student cheating61 related to study attainment is defined as follows:

student cheating: a dishonest method or act in order to mislead the assessor of a study attainment

In the case of student cheating, the aim of misleading the assessor of the study attainment is to gain benefit for oneself or another student related to the assessment of the study attainment. In this report, student cheating also includes attempts at student cheating.

Some acts regarded as student cheating62 can be unambiguously considered as intentional, including cheating in an exam, taking an exam on behalf of another student, falsifying attendance and representing an answer to an exercise problem or assignment performed by another student in one's own name.

6.3.4 Definition of the concept of plagiarism as misconduct

Regardless of the fact that plagiarism is not acceptable in any form, not all plagiarism in study attainments constitutes student cheating. In order to clarify the terminology related to discussing this issue, the concept of “plagiarism as misconduct” is introduced in this report. This concept is limited in scope to apply only to plagiarism regarded as student cheating.
Both intentional plagiarism and plagiarism resulting from aggravated disregard are classified as plagiarism as misconduct, excluding plagiarism resulting from minor negligence, plagiarism resulting from a justified misunderstanding and plagiarism resulting from the lack of skills and knowledge exceeding the minimum requirements specified for a study attainment, for example63.

The examination of an act meeting the characteristics of plagiarism involves the assessment of the author's intent and compliance with the obligation to act with due care. Some acts regarded as plagiarism can be unambiguously con-

60 New Dictionary of Modern Finnish - Kielitoimiston sanakirja 2006.
61 “Student cheating” is the corresponding term generally used in English.
62 Acts regarded as student cheating are listed in the source Silpiö (2012, 46–49).
63 In the case of a novice, the minimum required level may involve mentioning of the source, for example, while the use of the technical details of a reference system is being practised.
