Assessment is always going to be an imperfect tool. Gaming the system will always be possible; students can cram for tests and then forget things that they really should remember; a test, necessarily, tells us only part of the bigger picture. Part of the problem with assessment, I think, is that we aim for the one perfect system: National Curriculum levels, at least in their deployment, are supposed to provide us with a ‘cradle-to-grave’ (well, not quite) system that we can use to assess pupils throughout their schooling.
Aristotle famously wrote on this problem, though in the context of politics and not education. The ideal form of government may well be that of monarchy in which a virtuous, benevolent individual rules in the best interests of his people. The problem, according to Aristotle, is that this model is easily corrupted into tyranny, where one individual exploits his power for his own gain. No form of government can be ideal and so our best option is to hedge our bets and adopt a mixed constitution. In politics, for Aristotle, this involved a mix of monarchy, aristocracy and constitutional government.
I would suggest the same is true in assessment. The problems of trying to have one, all-encompassing, ideal assessment system are well studied, though I think the key point to bring out is that assessment necessarily serves multiple purposes. We want, as teachers, to give helpful feedback to pupils that allows them to get better at the thing we are teaching them. We want, as parents, to know how well our children are getting on (particularly in comparison to other children!). We want, as schools, to identify pupils who are falling behind so that some kind of intervention can be made. We want, as senior managers, to use data to make judgements about teacher competence. We want, as inspectors or the government, to hold schools to account. We want, as a society, to be able to make decisions (Should I employ this person? Should I let them into university?) based on prior assessments. I simplify on all these fronts, but it is well recognised that assessment gets dragged in multiple directions, and this demands modes of assessment that are not always compatible with one another.
I think the only answer to this is to ditch completely the idea that we might have one, all-encompassing assessment system. National Curriculum levels, for example, worked tolerably well as end of Key Stage assessments that might help with school accountability measures, but they are hopeless for giving formative feedback or providing parents with a sense of how pupils are getting on (so they were a Level 5 at the end of Year 7, they were a Level 5 last term, and they’re Level 5 this term…) There is a great deal of discussion at the moment as to what will replace levels: to my mind, another version of levels would be completely inappropriate. We need something else.
What can this be? I have written a few posts on this now. In one, I argued that we need to decouple formative from summative assessment. In another, I argued that we need to use our subject expertise, and not a mark scheme, to give formative feedback. In another, I argued that we need to use task-specific mark schemes for marking individual pieces of work. How might this all look in a model? I don’t have the answer here, but I am, every day, getting a clearer sense of what this might look like, and I rather suspect the kinds of people who could drive this forwards are those reading this post. So here’s my first attempt: if you would like to help me make it better, then drop me an email or, even better, start a conversation on Twitter or in the comments so that everyone else can join in.
A mixed assessment constitution
This is a slight development of my model in my article for the Historical Association.
Mode 1: frequent, low-stakes testing of chronological knowledge
I speak here for history and I’m not sure how it works in other subjects, but I think we should have regular quizzes, timeline tests and so on as part and parcel of our teaching practice. Importantly, these tests should not just cover what was done in the previous lesson or week, but should test pieces of information learnt throughout schooling. Such tests are quick to do and easy to mark.
Purpose: diagnosing where pupils are ‘chronologically lost’.
Data produced: weekly scores out of ten.
Useful for: teacher (to diagnose ‘holes’ and possibly to plan interventions); pupils (what they need to revise).
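For anyone who keeps quiz results in a script or spreadsheet, the diagnostic use of Mode 1 might be sketched roughly as below. This is only an illustration under my own assumptions: the `find_holes` function, the idea of tagging each question with a topic, the 50% threshold and the sample answers are all invented for the example and are not part of the model itself.

```python
from collections import defaultdict

def find_holes(answers, threshold=0.5):
    """Return topics where the share of correct answers is below threshold.

    answers: list of (topic, correct) pairs gathered across several
    weekly quizzes, so that older material is included alongside new.
    """
    totals = defaultdict(lambda: [0, 0])  # topic -> [correct, asked]
    for topic, correct in answers:
        totals[topic][1] += 1
        if correct:
            totals[topic][0] += 1
    return sorted(
        topic
        for topic, (right, asked) in totals.items()
        if right / asked < threshold
    )

# Hypothetical record for one pupil (invented data).
answers = [
    ("Norman Conquest", True),
    ("Norman Conquest", True),
    ("Reformation", False),
    ("Reformation", True),
    ("Reformation", False),
    ("Industrial Revolution", False),
]
holes = find_holes(answers)
print(holes)
```

The point of pooling answers across weeks is that a 'hole' only shows up when old material keeps being retested, which is why the quizzes should not be limited to the previous lesson.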
Mode 2: milestone pieces of work at the end of a sequence of lessons
In history these are typically essays but can involve a number of other pieces of work as well. These should be marked using task-specific mark schemes. The piece can be given a summative mark (see this blog post on task-specific mark schemes) but it should be understood that a mark in this task is not connected to a mark in the previous half-term or next half-term. The marking can also be norm-referenced (i.e. how does a pupil’s work compare to others in this year and previous years).
Data produced: mark (e.g. Pass, Merit, Distinction) for a particular performance (e.g. an essay).
Useful for: parents (how is Jimmy getting on?); teacher (what was not understood in the previous scheme of work?); pupils (how well can I answer that particular question?).
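To make the norm-referencing option concrete, here is a minimal sketch of how a raw mark on one milestone task could be placed against a cohort and turned into a Pass, Merit or Distinction. Everything specific in it is an assumption for illustration: the function names, the cut-offs (top 15% for Distinction, top half for Merit) and the cohort marks are invented, and a real department would set its own boundaries.

```python
def percentile_rank(mark, cohort_marks):
    """Percentage of the cohort scoring strictly below the given mark."""
    below = sum(1 for m in cohort_marks if m < mark)
    return 100 * below / len(cohort_marks)

def norm_referenced_grade(mark, cohort_marks, merit_cut=50, distinction_cut=85):
    """Map a raw mark to a grade by its position in the cohort.

    The cut-offs are hypothetical; they are not taken from any mark scheme.
    """
    rank = percentile_rank(mark, cohort_marks)
    if rank >= distinction_cut:
        return "Distinction"
    if rank >= merit_cut:
        return "Merit"
    return "Pass"

# Invented raw marks out of 25 for the same essay question,
# pooled from this year and previous years.
cohort = [8, 11, 12, 14, 15, 15, 17, 18, 20, 23]

for mark in (23, 17, 11):
    print(mark, norm_referenced_grade(mark, cohort))
```

Because the grade depends only on this task and this cohort, it carries no implication about the grade on the previous or next half-term's task, which is the decoupling described above.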
Mode 3: end of year exams
I have been having a number of discussions recently about the model of music exams. The comparison is not perfect, but I think the model is good. There is, for example, no understanding that if a pupil gets a distinction at Grade 2 piano then they will necessarily get a distinction at Grade 4. Importantly, too, music exams test more than what was covered in one year: an examiner might well ask someone to perform a simple scale in a Grade 8 exam. Music exams are in themselves mixed constitutions: pupils perform scales, aural and oral tasks, sight-reading and practised performances. I think this is a model that could work well for history, though I (and those I have been talking to) have yet to work out the fine details.
Data produced: end of year mark (e.g. Pass, Merit, Distinction)
Useful for: schools (particularly in identifying the pupils who fail so that additional support can be provided); parents (a summary of performance at the end of a year). Teachers and pupils probably do not find out anything more from this exercise in addition to what they already knew from Modes 1 and 2.
That’s the model so far. You will immediately notice that I have not included public exams in this. In part this is because I think public exams have dominated what goes on in classrooms for too long: we have ridiculous situations where children begin studying GCSE History (and its narrow range of topics) at the age of 13 for three years before taking the exam, with those three years very heavily focused on exam performance. I do not think it would be too difficult to have the system above with no public exams up to the age of 14.
The problem with my model, I think, is that it does not provide an obvious means for accountability. My end of year exams are measures of attainment and not achievement: just because someone gets a Merit in Year 8 does not mean they should necessarily get a Merit in Year 9. If this were to be turned into an accountability measure (e.g. what percentage of pupils get a Distinction) then we would be back with a measure that favoured schools with socially and academically selective intakes. If someone can solve this problem for me, then do let me know.
I should add, too, that I am not an expert on the systems used to measure ‘two levels of progress’. I have never been a senior leader, or an inspector, or a data manager. I do have serious concerns about the idea that, because a pupil gets a ‘Level 5’ in Year 6, then they should be getting a ‘Level 7’ in Year 9 and an A at GCSE: at least for history, I cannot see how a meaningful linear progression model could be created which would make this possible. Again, if someone can enlighten me on this, then I’d be most grateful.