Diversifying Assessment
Section 2: Setting standards
Reproduced with permission from Brown, S., Rust, C. and Gibbs, G. (1994) Strategies for Diversifying Assessment in Higher Education. Oxford: Oxford Centre for Staff Development.
If assessment is to achieve any of the functions listed in the previous section, it is vital that it has validity. Whether it is intended as feedback to students on their progress or as part of the grading process for a final qualification, common standards must be applied. However, this is easier said than done; assessment is by no means a pure science. Nevertheless, marking - of essays and exams in particular - is something virtually all academics have to do and most rapidly gain a good deal of experience in it. The marking exercise below draws on this experience.
2.1 An essay marking exercise
Assume that the two essays reproduced below were written by students on an introductory course in technology and its social implications. Mark each of them out of 10 (with 4 as a pass mark). In the spaces indicated at the end of each essay, comment on their strengths and weaknesses, and give advice to the students on how to improve their essays.
Assess the noise pollution problems caused by Concorde around airports.
| Answer 1
The sound limit at Kennedy airport, New York, is 112 PNdB*, and at Heathrow, London, 110 PNdB. The manufacturers of Concorde (Sud-Aviation and the British Aircraft Corporation) have promised that Concorde will range between 104 and 108 PNdB, depending on its weight at take-off.
At the start of Concorde operations at Heathrow, 21 of the first 35 departures exceeded 110 PNdB, and in the first eight months of operations 72% of the 97 departures exceeded 110 PNdB. Overall in 1976 there were 109 infringements of Heathrow's limit by Concorde. These measurements of Concorde were about 7 PNdB lower than during its early endurance trials. At the same time there were 1,941 infringements by subsonic jets. Concorde rarely features in the list of the ten noisiest take-offs each month at Heathrow, and subsonic aircraft at Kennedy have been recorded at 121 PNdB - twice the limit.
At Dulles airport, Washington, Concorde has averaged 119.9 PNdB at take-off and 117.8 PNdB on landing. This is 12-13 PNdB higher than the averages for subsonic aircraft. The noise levels have been going down, and with them, the number of complaints. In September 1976 the average level was 121.3 PNdB and there were 186 complaints (29 of these to one take-off). In October the average was 117.4 PNdB and there were 101 complaints. During this time polls of opinion concerning Concorde's trial period at Dulles showed an initial opposition of 36.9% drop to 26.2%. In New York, opposition to Concorde landing at Kennedy has dropped from 63% in January 1976 to 53% in April 1977.
While 500,000 people are affected by aircraft noise in Washington, 2,000,000 are affected at Kennedy. It has been estimated that 40,000 extra people will be affected by noise if 80 Concordes serve 12 US cities. This represents a 1% increase. Bumps in the runway at Kennedy force Concorde to take off closer to heavily populated areas, but due to advanced flight control characteristics Concorde can begin to bank at an altitude of 100 ft. compared with an average of 480 ft. for subsonic aircraft, and so can turn away from heavily populated areas sooner after take-off.
*PNdB means Perceived Noise Decibels - a logarithmic scale of noise |
Strengths:
Weaknesses:
How to improve your essay:
Mark out of 10: |
| Answer 2
Opposition to Concorde based on arguments concerning noise pollution takes two main themes. The first is concerned with the 'sonic boom' - a phenomenon of supersonic flight unique to Concorde amongst commercial aircraft. The second is concerned with noise levels around airports caused during take-off and landing. This second theme is common to all aircraft, and the issue at stake is whether Concorde is significantly noisier than subsonic aircraft.
Comparisons with other aircraft are complicated by the changing nature of jet fleets. Early jet aircraft (e.g. the DC8 and 707) used turbojet engines, and whilst these have been quietened, they are much noisier than second-generation fan-jet engine aircraft (e.g. DC10 and Jumbo 747). Eventually those older aircraft will be phased out, but at the moment Concorde is being compared with them.
There are also problems of measurement. Objective measures (meters giving a reading in decibels) cannot give any impression of 'shrillness' or subjectively experienced nuisance. An aircraft giving higher decibel readings may not be experienced as 'noisier' by someone hearing it take off.
Subjective measures also involve problems such as 'noise', which is a multi-faceted phenomenon, and different people use different criteria in assessing it. There are dangers, also, in questionnaire surveys of reactions of people living around airports. Average ratings of 'nuisance' change over time without any changes in objectively measured decibel levels or frequency of aircraft movements and so other factors must be involved. These factors can be political. Boeing took care to subcontract for parts for its SST at factories surrounding Kennedy airport, so that votes concerning whether SST's should be allowed to use the airport would be influenced by residents' concerns for their jobs! Workers at Filton and Toulouse would hardly try to ban Concorde landing near their homes, however noisy it is!
Finally, there is a variation in recorded noise levels dependent on the skill of the pilot, and load factors of the aircraft. Subsonic aircraft have been measured at twice the legal noise level, struggling to take off with heavy loads in adverse conditions. Concorde has been flying under-loaded, with skilled pilots, who have even been reported banking away from noise monitors.
Given this variety of problems, it would seem likely that Concorde causes even more noise pollution than data suggests, and that in comparison with subsonic jets will become comparatively worse as time goes on. |
Strengths:
Weaknesses:
How to improve your essay:
Mark out of 10:
|
This exercise was previously published in Gibbs, G. Module 3 Assessment, Certificate in Higher Education by Open Learning, OCSD (1989).
Commentary
Many hundreds of teachers from all subject disciplines have marked these two essays. There are broad similarities in their comments and their marks, but also important differences. While teachers are often doing something quite similar when they are assessing students' work, they can also disagree profoundly.
Essay 1 is usually given a mark in the range 3-7 with an average of about 5. It is generally considered to have no introduction, no main body, and no conclusion. The order in which the content of the essay is presented could be changed and it would make little difference, because there is no developing argument. The essay consists of facts in a list with no analysis. It is usually thought not to answer the question. The student who wrote it is often commended for having done some relevant background research but criticised for having made little appropriate use of this research.
In contrast, Essay 2 is usually given a mark in the range 5-10, with an average of 7. Tutors comment that it goes beyond the scope of the question and considers the nature of the question itself, and the difficulty of reaching simple conclusions. It has a clear argument, draws a conclusion of its own, has a beginning, a middle and an end, and raises and considers a number of issues along the way. It is often considered to be well written.
In short, the author of the first essay is regarded as a poor student and the author of the second a comparatively good student. Whatever our subject expertise, we seem capable of making such general assessments of students and their work.
However, there are complications. The general unanimity in the tutors' assessment of the two essays masks some sharply divergent views. The range of marks given for each essay is wider than the difference between the essays. Some people give Essay 1 7/10, almost a first, and some give Essay 2 a bare pass. Some even give higher marks for Essay 1 than Essay 2. This variation extends to the comments teachers write. Some think that Essay 1 does all you could reasonably ask: it provides a good range of facts upon which a decision can be made. Essay 2 is then described as being all over the place, providing little evidence to back up its irrelevant assertions. To some extent, those who feel this way tend to come from the 'hard' sciences. It would appear that sometimes, but not always, different sets of values and expectations operate in different subject areas.
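The claim that the spread of marks for each essay exceeds the gap between the essays can be checked against the figures quoted in the commentary. The short Python sketch below uses only the ranges and averages reported above; the variable names are illustrative, and the gap is taken to be the difference between the two average marks, which is one reasonable reading rather than a definition given in the source.

```python
# Mark ranges and averages as reported in the commentary (marks out of 10).
essay_1 = {"low": 3, "high": 7, "average": 5}
essay_2 = {"low": 5, "high": 10, "average": 7}

# Spread of marks that different teachers give to the same essay.
spread_1 = essay_1["high"] - essay_1["low"]
spread_2 = essay_2["high"] - essay_2["low"]

# Gap between the two essays, measured here by their average marks (an assumption).
gap = essay_2["average"] - essay_1["average"]

# Marks inside the overlap could have been awarded to either essay.
overlap = (max(essay_1["low"], essay_2["low"]), min(essay_1["high"], essay_2["high"]))

print(spread_1, spread_2, gap, overlap)  # prints: 4 5 2 (5, 7)
```

On these figures, any mark from 5 to 7 has been given to both essays, which is why a single mark says little about which essay a teacher was reading.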
These similarities and differences illustrate some important points about assessment:
- there is broad agreement as to what constitutes quality: about what we should be looking for when we assess students in higher education;
- there are wide differences between teachers in their standards and judgements: often the difference between teachers' appraisal is greater than the difference between students' performance;
- there are identifiably different values operating, to some extent related to subject disciplines;
- these similarities and differences have little to do with the knowledge the authors of the essays displayed: these general characteristics are exhibited independently of the specific content of the essay.
2.2 Making the criteria explicit
You were able to mark the essays in the previous exercise because you had certain criteria in your mind. The reason why teachers in higher education vary in the marks they award to the two essays is that they are using different criteria or, if they are using the same criteria, they are giving different weightings to them in terms of importance. This exercise underlines the importance of working to explicit criteria.
There are good reasons for making criteria explicit.
To be fair to the student. When you set a task or assignment, make clear the criteria on which it will be assessed, so the student can tailor her or his response to your requirements.
To avoid 'academic drift'. This is the term used to describe how students respond where the criteria are unclear: they assume the key criterion to be an emphasis on content.
To encourage staff to adopt common standards in marking. If a number of different staff members are involved in marking a piece of work, clearly stated criteria should help to bring closer the standards to be applied. It also gives a clearer basis for discussion of any disagreements either between staff or between students and staff.
Assessment criteria can be introduced to the students in a number of ways. Some of these are suggested below.
Simply specify them
Below is an example of a marking protocol where the assessment criteria have been standardised for a whole degree programme within a modular structure. This grid is reproduced in the front of each module's student handbook. While not every criterion would necessarily be relevant to every piece of work in every module, where the criterion is relevant the standards for the different grades are clearly defined. They can then be applied in exactly the same way in each module. The student's overall grade for the piece of work is then based on the column in which the statements are concentrated.
Marking criteria protocol for Health Care Studies degree in a modular programme
| Refer/Fail | Grade C | Grade B | Grade B+ | Grade A |
| --- | --- | --- | --- | --- |
| Failure to address the actual question asked/task set. | | | | Overall presentation shows a professional and innovative approach to the topic. |
| Purpose and meaning of assignment unclear. Language, grammar and spelling poor. | Meaning apparent, but... language not always fluent, grammar and spelling still poor. | Language mainly fluent. Grammar and spelling mainly accurate. | Thoughts and ideas clearly expressed. Grammar and spelling accurate and language fluent. | Clarity of expression excellent. Consistently accurate use of grammar and spelling with fluent professional/academic writing style. |
| Fails to demonstrate understanding of the subject/topic area. | Attempts a logical and coherent understanding of the subject area. | Demonstrates understanding in a style which is logical, coherent and flowing. | Consistent understanding demonstrated in a logical, coherent and lucid manner. | Work shows a well co-ordinated, grounded and reasoned understanding of topic and its relevance to practice. |
| Significantly under/over required length as specified in module guide. | | | | |
| Referencing inaccurate or absent. | Referencing present but had inconsistencies and inaccuracies. | Minor inconsistencies and inaccuracies in referencing using the Harvard System. | Referencing relevant and mostly accurate using the Harvard System. | Referencing clear, relevant and consistently accurate using the Harvard System. |
| Inaccurate or inappropriate content/theory. | Appropriate selection of content/theory but some key aspects missed/misconstrued. | Most key theories included in work in an appropriate manner. | Insightful and appropriate selection of content/theory in key areas. | Assignment demonstrates considerable innovation in the handling of content/theory. |
| Little or no evidence of reading around the subject. | Evidence of some limited reading around the subject. | Clear evidence and application of readings relevant to the subject within the text. | Ability to appraise critically the theory and literature from a variety of courses, developing own ideas in the process. | Has developed own ideas and justified using a wide range of sources of theories and literature which has been thoroughly analysed, applied and tested. |
| Makes no attempt to address module focus, aims or themes of the assignment. | Some of the writing is focussed on module aims and themes of assignment. | Mainly focussed on aims and themes of the assignment. | Clear focus on module aims and themes of the assignment. | Module's aims and themes are integral to the assignment. |
| No attempt at evaluation within assignment. | Some attempt at evaluation within assignment. | Evaluation reasonably well carried out. | Good clear evidence of evaluation carried out within assignment. | Evaluation within assignment rigorous and appropriate. |
| Unsubstantiated/invalid conclusion, based on anecdotes and generalisations only. | Limited evidence of findings and conclusions supported by the literature and theory. | Evidence of findings and conclusions grounded in theory/literature. | Good development shown in summary of arguments based on theory/literature and beginnings of synthesis. | Analytical and clear conclusions well grounded in theory and literature, showing development of new concepts. |
| Lack of critical thought/analysis/reference to theory. | Some evidence of critical thought and rationale for work. | Demonstrates applications of theory/critical analysis to the topic area. | Clear evidence of application of theory/critical analysis. | Assignment consistently demonstrates application of theory/critical analysis integrated. |
| Failure to apply topic to personal, societal and professional practice. | Superficial application to personal, societal and professional practice. | Begins to show application to personal, societal and professional practice. | Appropriate application to personal, societal and professional practice. | Application of topic to personal, societal and professional practice relevant and innovative. |
| (For oral presentations) Unsatisfactory speed of delivery and audibility in presentation. | (For oral presentations) Speed of delivery and audibility fluctuate during presentation. | (For oral presentations) Well paced delivery. | (For oral presentations) Well paced and clear and confident delivery. | (For oral presentations) Excellent clarity, pace and confident delivery. |
| Inability to stimulate/facilitate discussion. | Some ability to facilitate discussion but tendency to miss opportunities or be directive. | Some ability to stimulate and facilitate discussion. | Clear evidence of ability to stimulate, facilitate and summarise discussion. | Excellent enabling pacing and summarising of discussion. |
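The rule that the overall grade follows the column in which the statements are concentrated can be sketched in a few lines of Python. This is only an illustration: the function name, the criterion labels and the example judgements are hypothetical, and the tie-breaking rule (the lower grade wins) is an assumption, since the grid does not say how ties are settled.

```python
from collections import Counter

# Grade columns of the grid, ordered from lowest to highest.
GRADES = ["Refer/Fail", "Grade C", "Grade B", "Grade B+", "Grade A"]


def overall_grade(judgements):
    """Return the grade column in which most criterion judgements fall.

    `judgements` maps each relevant criterion to the column whose statement
    best describes the work. Criteria that do not apply are simply left out.
    Assumption: a tie is resolved in favour of the lower grade; the grid
    itself does not say how ties are settled.
    """
    if not judgements:
        raise ValueError("at least one criterion must be judged")
    counts = Counter(judgements.values())
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unknown grade column(s): {sorted(unknown)}")
    highest = max(counts.values())
    # GRADES is ordered low to high, so the first match is the lowest tied column.
    return next(grade for grade in GRADES if counts.get(grade, 0) == highest)


# Hypothetical judgements for one written assignment.
example = {
    "expression": "Grade B",
    "understanding": "Grade B+",
    "referencing": "Grade B",
    "content": "Grade B",
    "evaluation": "Grade C",
}
print(overall_grade(example))  # prints: Grade B
```

A team adopting the grid would need to agree its own rule for ties and for criteria that carry more weight than others; the sketch treats every criterion equally.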
| Consider whether a version of this grid might be useful on your courses. What changes might you need to make? What problems might there be? |
Specify the criteria, and get the students to use them in a 'marking' exercise
However clearly criteria are stated, students may not immediately understand them, or appreciate which are the most important, especially at the beginning of a course. A marking exercise can help to overcome this.
Get students to mark work. Take one or two examples of work, possibly from a previous year, and copy them for this year's cohort to mark individually using your criteria. They can then discuss each other's marking in groups and negotiate an appropriate grade. The experience of applying the criteria and discussing the outcomes is likely to reveal any lack of understanding and help students to see what is required of a 'good answer'.
Set a marking assignment. Give students the task of marking three pieces of work from a previous year's students. Grades for this can be based on how closely their marks for the three pieces correspond to those given by the tutor. This may sound harsh but the tutor on one engineering course where this has been done argues that as he will be marking much of their work for the next three years, it is important that students understand what he is looking for!
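One way to turn "how closely their marks correspond to those given by the tutor" into a grade is to measure the average distance between the two sets of marks. The sketch below is an assumption about how this might be done, not the method used on the engineering course described above: the function name, the 0 to 1 scale and the example marks are all hypothetical.

```python
def marking_agreement(student_marks, tutor_marks, max_mark=10):
    """Score from 0 to 1 for how closely a student's marks match the tutor's.

    Assumption: closeness is measured as the mean absolute difference per
    piece of work, scaled by the maximum mark. The source does not specify
    a formula.
    """
    if len(student_marks) != len(tutor_marks) or not tutor_marks:
        raise ValueError("both lists need the same non-zero number of marks")
    differences = [abs(s - t) for s, t in zip(student_marks, tutor_marks)]
    mean_difference = sum(differences) / len(differences)
    return 1 - mean_difference / max_mark


# Hypothetical marks out of 10 for three pieces of work.
tutor = [5, 7, 4]
student = [6, 7, 2]
print(marking_agreement(student, tutor))
```

Here the student is one mark out on the first piece, exact on the second and two marks out on the third, an average of one mark per piece, so the agreement score is about 0.9. A stricter scheme might penalise large disagreements more heavily than small ones.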
If you do not currently involve your students in any kind of marking exercise, consider how and where it might be possible to introduce it on your course.
Run an exercise for students to generate criteria
- For inexperienced students, this could be as simple as asking them to identify the characteristics of 'the perfect essay' individually or in pairs.
- Consider using the 'First class answer' exercise below.
This is an exercise you might use to help your students see the standards, values and criteria that are being applied in the assessment of their work. Try it for yourself.
| Consider the following essay question:
Compare and contrast the consequences of blindness and deafness on language development.
The marks students will achieve in responding to this question will be determined to a considerable extent by whether they answer the question. Or to put it another way, the standards students achieve will be judged by whether they undertake the task implied by the question or whether they respond as if to some other task. The accounts below describe the tasks students appear to have set themselves, and record how this was reflected in the grade awarded.
A first class answer: Identify the consequences of blindness and deafness for language development. Compare and contrast these consequences, drawing conclusions about the nature of language development. Comment on the adequacy of theories of language development in the light of your conclusions.
Upper second: Identify the consequences of blindness and deafness for language development. Compare and contrast these consequences.
Lower second: List some of the features of blindness and deafness. List some of the consequences for development, including a few for language development.
Third: Write down almost anything you can think about blindness, deafness, child development and language development. Do not draw any conclusions.
Take an essay or exam question, laboratory report, project or other assessed task which you set your students and rewrite it in a similar way: define the task that students are actually responding to when they achieve each grade.
Question or task as set:
First:
Upper second:
Lower second:
Third:
From your definition of the task you expect of a student producing a first class answer, it should now be possible to identify the assessment criteria. |
(This exercise was previously published in Habeshaw, T., Gibbs, G. and Habeshaw, S. (1987), 53 Interesting Ways of Helping your Students to Study. Bristol: TES.)
Once your students have generated their own criteria, you then have to decide whether you use them, or ignore them and use your own. A compromise may be to have certain criteria which are compulsory and non-negotiable, but to allow the students to add their own to these.
For example, in assessing an essay your three non-negotiable criteria might be:
- length of between two and three thousand words;
- use of at least three referenced texts;
- attention to appropriate syntax, punctuation and spelling.
Feedback sheets
Making the criteria explicit not only makes clear to students what is expected of them but also offers you a way of giving detailed feedback in a relatively undemanding way, e.g. by turning the listed criteria into a feedback sheet, as in the example [below].
| Additional comments
Feedback sheets can also form a useful starting point for any dialogue with the student in a subsequent tutorial.
A feedback sheet can help to make explicit the allocation of marks to particular criteria within the overall mark. This indicates to the student which are the most valued criteria and which therefore they need to concentrate on. The use of these sheets is considered in more detail later in the section on mechanising assessment (Section 8.3). |
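The idea of showing how the overall mark is allocated across criteria can be illustrated with a small Python sketch. The criteria, the mark allocations and the function name below are hypothetical examples chosen for illustration; they are not taken from the feedback sheet in the original publication.

```python
# Hypothetical criteria and mark allocations; together they make up the overall mark.
ALLOCATION = {
    "argument and structure": 40,
    "use of evidence": 30,
    "referencing": 15,
    "syntax, punctuation and spelling": 15,
}


def feedback_sheet(awarded):
    """Return the lines of a feedback sheet and the overall mark."""
    lines = []
    total = 0
    for criterion, available in ALLOCATION.items():
        mark = awarded[criterion]
        if not 0 <= mark <= available:
            raise ValueError(f"mark for {criterion!r} must be between 0 and {available}")
        lines.append(f"{criterion}: {mark}/{available}")
        total += mark
    lines.append(f"Overall: {total}/{sum(ALLOCATION.values())}")
    return lines, total


lines, total = feedback_sheet({
    "argument and structure": 28,
    "use of evidence": 21,
    "referencing": 9,
    "syntax, punctuation and spelling": 12,
})
print("\n".join(lines))
```

Because each line shows the mark awarded against the marks available, a student can see at a glance both where marks were lost and which criteria carry the most weight.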
The strategies outlined in this section are all designed to make explicit to students the criteria by which their work will be assessed. These strategies stem from the conviction that students will then be better placed to deliver work likely to fulfil these criteria. Detailed feedback on the extent to which students have been successful in meeting the criteria then becomes the point of departure for the next phase of the student's learning.