
The use of electronic tools to support plagiarism detection

Introduction

Some students cheat. The aim of any plagiarism detection process is to ensure that copied material is detected, and that no student is unfairly accused of copying. Software exists to help achieve this aim, but (obviously) no system is perfect, and no fully automatic plagiarism detection system exists that will identify all and only cheaters.

Before we discuss the ways in which computers can be used to help identify students who have plagiarised, it is worth defining a few terms. Usually, when marking students' work, we start with a collection of submissions. These form a corpus, or body of texts, that may or may not contain work that we would consider to be plagiarised. Students can 'copy' work in two main ways: from another student, or from a source outside the corpus of submissions. Those who copy from each other are difficult to distinguish from those who collaborate with each other, and excessive collaboration is often considered to be a more minor offence than outright plagiarism. A case where one student has copied from another in the same corpus of submissions is known as intra-corpal plagiarism, and a case where a student has copied material from an external source (book, journal article, world wide web) is known as extra-corpal plagiarism.

It is also worth noting the types of technique that students use to conceal copying. These include...

A model of the plagiarism detection process

Fintan Culwin of South Bank University is one of the leaders in the field of electronic plagiarism detection. He has proposed a four-stage model of the plagiarism detection process, which provides a useful framework for considering the problem.

The first stage of the process is collection. Before we can start to consider whether or not work has been copied, it has to have been submitted: we have to have a copy of the coursework to inspect. In most places in Leeds, work is submitted on paper. However, before electronic plagiarism detection software can be used we clearly need to have an electronic copy of the work. With most schools requiring work to be word-processed, it is safe to assume that electronic copies exist somewhere. However, methods for acquiring these vary in their reliability, cost and efficiency.

The second stage of the process is analysis. Traditionally this is done by the marker - when someone marking coursework notices similarities between two or more papers, or notices changes in writing style or work uncharacteristic for the student, alarm bells ring. There now exist electronic systems that can automatically compare pieces of coursework both within a class (intra-corpal) and with documents on the web or stored in other systems (extra-corpal). This stage of the plagiarism detection process can now be carried out entirely automatically for many types of assignment.
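To make the idea of an automatic intra-corpal comparison concrete, here is a minimal sketch in Python. It is not how any of the systems mentioned on this page work: it assumes that overlap between three-word sequences is a reasonable signal, and the function names and the threshold value are invented purely for illustration.

```python
from itertools import combinations
import re


def word_ngrams(text, n=3):
    """Return the set of overlapping n-word sequences in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(corpus, n=3, threshold=0.3):
    """Yield (name_a, name_b, score) for every pair scoring at or above threshold.

    corpus maps a submission name to its text. The threshold is an
    arbitrary illustrative value, not a recommended cut-off.
    """
    grams = {name: word_ngrams(text, n) for name, text in corpus.items()}
    for a, b in combinations(sorted(grams), 2):
        score = jaccard(grams[a], grams[b])
        if score >= threshold:
            yield a, b, score
```

A high score only flags a pair for a human to look at; it cannot distinguish copying from collaboration or from shared quotation of the same source.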

The third stage is confirmation. Once a piece of work has been flagged as suspicious, teaching staff have to be involved, as what occurs here is confirmation of the offence - a decision as to whether or not plagiarism has taken place. Typically this will involve finding the "original" document (if the plagiarism is extra-corpal) and comparing the student's work with it. Electronic tools exist to assist with this process, but it is not possible to automate it fully.

Finally, there is the investigation stage. What happens once a case of plagiarism has been confirmed is determined by University rules - electronic tools cannot assist at this point except to provide evidence.

Electronic submission facilities: stage one

If computer-based systems are going to be used to detect plagiarism then electronic copies of the work are required. However, although such detection systems exist, none are available at Leeds, except a home-grown system in use within computing to automatically detect similarities between computer programming assignments. Thus, the use of systems to perform a blanket analysis on a corpus of student submissions is not really an option.

However, systems exist that can assist in the identification of extra-corpal plagiarism (once a piece of work has been detected as a suspicious submission) and intra-corpal plagiarism on a small scale (if there are significant similarities between two pieces of work then there are tools that can help determine the extent).

Given these considerations, the decision to require submission in electronic form is not a straightforward one. There are several possible ways of doing this here at Leeds:

The Nathan Bodington Pigeonholes
  Pros:
  • Quite easy to set up
  • If staff and students already use the Nathan Bod., it is a familiar setting
  • Files kept out-of-the-way on Nathan Bod. machine
  Cons:
  • If staff and students don't already use the Nathan Bod., may require time to teach the students what to do
  • Files not particularly easy to download in bulk

Electronic mail attachments
  Pros:
  • Very easy for students to deal with
  • With mail-filtering, work can end up in a folder for easy access
  Cons:
  • Unwieldy, especially for large numbers of students
  • Filters not easy to set up

The Lizard Learning Network
  Pros:
  • Easy to use web interface for staff and students
  • Students' work easy to access either individually or in bulk
  Cons:
  • New software
  • Requires a server to host the system

Even if you do not collect electronic copies of student work in bulk, it is possible to use some of the tools described here - for small sections of text that arouse suspicion, simply typing them in is a possibility. For larger sections of text, scanning the document can be a relatively fast way of obtaining an electronic copy.

Analysis: stage two

Although tools exist that can run on an entire submission corpus and identify suspicious cases, these are not (yet!) available in Leeds. There are commercial packages which offer this facility, but the cost of these is prohibitive. Anyone interested in the mass processing of documents might want to consider Plagiarism.org's http://www.turnitin.com. This comes quite highly recommended and allows the uploading of a class of texts, reporting back around 24 hours later on any similarities found. It also offers a free trial which allows you to process 5 documents.

Therefore, we are stuck with the traditional method for stage two: using a human being. There are a few things it is worth bearing in mind whilst performing this task - hints about whether or not the work has been copied. American spelling (especially in one section and not others), use of strange vocabulary, and in some cases even changes in font can all indicate that work has been copied. But the most reliable indicator seems to be change in style of writing, and this is something that computers are not yet very good at spotting.
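One of these hints, American spelling that appears in one section and not in others, is simple enough to sketch in Python. This is only an illustration of the idea: the word list is a tiny invented sample, the function names are made up for this example, and a hit is at most a prompt for a closer look, never evidence in itself.

```python
import re

# A tiny illustrative sample of American spellings and their British forms.
# A real check would need a far fuller list.
AMERICAN_TO_BRITISH = {
    "color": "colour",
    "behavior": "behaviour",
    "center": "centre",
    "analyze": "analyse",
    "defense": "defence",
    "favor": "favour",
}


def spelling_profile(text):
    """Return one (american_hits, british_hits) tuple per paragraph."""
    british = set(AMERICAN_TO_BRITISH.values())
    profile = []
    # Paragraphs are assumed to be separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        words = re.findall(r"[a-z]+", paragraph.lower())
        american_hits = sum(w in AMERICAN_TO_BRITISH for w in words)
        british_hits = sum(w in british for w in words)
        profile.append((american_hits, british_hits))
    return profile


def mixed_spelling(text):
    """True if one paragraph uses only American forms and another only British."""
    profile = spelling_profile(text)
    has_american = any(a > 0 and b == 0 for a, b in profile)
    has_british = any(b > 0 and a == 0 for a, b in profile)
    return has_american and has_british
```

Changes in writing style, which are the more reliable indicator, are much harder to capture in a few lines, and this sketch makes no attempt at them.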

Confirmation: stage three

Where computers can really be useful is for the evidence gathering that accompanies the confirmation stage, especially in cases where work has been copied from the world wide web. Various systems will tell you what percentage of the document can be found elsewhere, and also allow you to view the original document alongside the student's "work".
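As a rough picture of what a "percentage found elsewhere" figure might mean, the Python sketch below measures how many of the student's five-word fragments also occur in a suspected source. This is an assumption made for illustration, not a description of how any particular service computes its figure, and the function names are hypothetical.

```python
import re


def words_of(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def percent_found(student_text, source_text, n=5):
    """Percentage of the student's n-word fragments that also occur in the source."""
    student = words_of(student_text)
    source = words_of(source_text)
    source_grams = {tuple(source[i:i + n]) for i in range(len(source) - n + 1)}
    student_grams = [tuple(student[i:i + n]) for i in range(len(student) - n + 1)]
    if not student_grams:
        # Text shorter than n words has no fragments to compare.
        return 0.0
    matched = sum(g in source_grams for g in student_grams)
    return 100.0 * matched / len(student_grams)
```

The measure is deliberately one-sided: it asks how much of the student's text appears in the source, not the reverse, so a short essay lifted from a long article still scores highly.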

The easiest of these to use is Findsame, http://www.findsame.com. Most search engines search on keywords - you enter a word or two and they search their database for pages which contain those words. Findsame, however, searches on content, and it claims to match fragments of text longer than about a line. Where Findsame really stands out is in the design of its interface. The opening page lets you enter the text for which you are searching, which can be anything from a paragraph up to, well, I am not sure of the upper limit - it can definitely handle 30 pages. Once the student's work has been pasted in, you click on 'search' and are presented with a modified version of the document. The search takes very little time - around three minutes for thirty pages - and is instantaneous for small fragments of text like a paragraph.

The exceptionally useful feature of Findsame is at the very bottom of this page. Each document that it has found a "match" for appears as a hyperlink with the option of viewing "side-by-side". This allows you to look at the original and the student's work next to each other and compare.

Unfortunately Findsame does not find absolutely every match (I suspect its database is not as extensive as that of other search engines), so finding no matches with Findsame does not imply that no matches exist. If there is a short sentence or fragment of text that you think is distinctive, you can try Google. Google is a search engine with a very extensive database - and it can search on keywords, or for short phrases. Its advanced search page is at http://www.google.com/advanced_search. This can be used to search for exact phrase matches, but it cannot be used with long documents or whole paragraphs in the same way that Findsame can.
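If you have several suspicious fragments to check, the exact-phrase search can be prepared with a few lines of Python. The sketch below assumes that wrapping the phrase in double quotes in the q parameter of the ordinary search address requests an exact match; the function name is made up for this example, and nothing here is specific to the advanced search page itself.

```python
from urllib.parse import urlencode


def exact_phrase_url(phrase, base="https://www.google.com/search"):
    """Build a search URL that asks for the phrase as an exact match."""
    # Collapse runs of whitespace and drop any quotes already in the phrase,
    # so that the whole fragment sits inside one pair of quotation marks.
    cleaned = " ".join(phrase.split()).replace('"', "")
    return base + "?" + urlencode({"q": f'"{cleaned}"'})


print(exact_phrase_url("alarm bells  ring"))
# prints https://www.google.com/search?q=%22alarm+bells+ring%22
```

Short, distinctive fragments work best; a whole paragraph pasted in this way is unlikely to return anything useful.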

Networked databases available through the library web site, http://www.leeds.ac.uk/ROADS/database.htm, can help track down cases of plagiarism where the source is not the world wide web. These are subject-specific so I will not discuss them at length, but most allow for keyword searching and can return the abstracts of papers, which would then have to be looked at in the library itself (unless the library subscribes to an electronic version of the journal).

Resources: links to relevant websites

References, Bibliography and suggestions for further reading

"Plagiarism in natural and programming languages: an overview of current tools and technologies" Clough, P., (2000) Department of Computer Science, University of Sheffield

"Plagiarism prevention and detection", "Towards an error free plagiarism detection process", pre-publication papers, Culwin, F., and Lancaster, T., (2000) South Bank University.

"A review of electronic services for plagiarism detection in student submissions" Culwin, F., Lancaster, T., (2000) LTSN-ICS conference.

"Undergraduate Cheating: Who does what and why?" Franklin-Stokes, A., and Newstead, S. E., (1995) Studies in Higher Education Vol. 20, No. 2

"Individual Differences in Student Cheating" Newstead, S. E., Franklin-Stokes, A., Armstead, P., (1996) J. Educational Psychology Vol. 88, No. 2