Can we find out ‘what works’?
Consider the following three statements:
- “Converting secondary schools to academies improves GCSE attainment.”
- “Peer coaching is an effective form of teacher CPD.”
- “Learning to learn strategies lead to improvements in pupil attainment.”
These are all claims that have been made about education. The question is, do we have any empirical basis on which to decide whether these claims are true or false?
One option is just to say “no, we cannot”. For some, the social world is so complicated that any attempt to compare an intervention in any unique environment is doomed to fail. Every school is different; every teacher is different; every pupil is different. Therefore, we have no way of deciding whether or not any given change we make is effective, and so it is just down to everyone in their own context to work out what works for them.
There are some serious philosophical defences of this position, but it is worth noting that some people try to have it both ways. If you are claiming that we cannot say that ‘direct instruction’ is best practice because of the complexity of the social world, then equally you cannot claim that making a school an academy does not improve results: either we can measure the efficacy of an intervention, or we cannot.
Alternatively, your view might be “it is possible to decide on the efficacy of an intervention, but this requires a clear statement regarding measurable outcomes and a very carefully designed experiment”. This is probably my position.
My concern with empirical work in educational research is not that it is impossible to compare the efficacy of interventions: I do think it is possible to compare the same intervention in different contexts and to reach an overall generalisation about its efficacy. My concern, rather, is that empirical educational research is often poorly conceived, built around weakly bounded and loosely defined interventions, with the result that no one is really any the wiser by the end of the study. Any study into the efficacy of ‘group work’ or ‘teacher-led instruction’ is doomed to fail, as these terms can mean such a wide range of things that replicating them is very difficult.
If we’re serious about empirical research on ‘what works’, we need to be far, far more specific in terms of what we are trying to find out.
I think the answer to your title is not even yes or no, but “boop”, as in “the question does not compute”. If you insert a “can” into each of your three statements, they each become true. Conversely, you only need one badly implemented or misplaced version of a given intervention to render any general statement as to its efficacy false. This does not undermine the intervention in itself – it could be a bad idea, or a good idea badly implemented, or a good idea implemented well in an ill-fitting context. I agree that we need better defined terms, but those terms have to include an account of context. As soon as you make a generic claim that “intervention x works”, you are automatically wrong.
Hmm, not convinced here, because all this does is put us in a position where anything *can* work, so we’re not in a position to make judgements about whether or not something is worth doing. We’re no better off than we were before. The whole point of experimental research is not to say “intervention x works, 100%, every time, every place”. The point is to say “if you do x, the chances of you seeing the same result are y”. When “y” is high, the intervention is worth considering.
If my tone appeared patronising, it was not intended – I think perhaps in recognising the importance of context and implementation I sometimes write with the zeal of a recent convert. However, I stand by my point. You said “The point is to say “if you do x, the chances of you seeing the same result are y”. When “y” is high, the intervention is worth considering”, and I don’t think this is sufficient – sound though the logic may appear.
I don’t agree either that the EEF toolkit is so generic as to be meaningless. For example, in the section on feedback it says “Before you implement this strategy in your learning environment, consider the following…” and lists some points for consideration, which include some really quite specific guidance as to what kinds of feedback are most effective. However, I think point 3 (“Have you considered the challenge of implementing feedback effectively and consistently?”) is perhaps the most important and overlooked factor. Implementing an apparently effective strategy in such a way that its benefits are replicated in your setting is no mean feat – there are so many factors to consider.
On this note, your Mevagissey example is a rather flimsy straw man. There are many perfectly reasonable factors that you could and should consider when implementing an initiative or comparing your school to a research context. To cite just a few: the students’ prior and current attainment; the school’s current results and trends; student demographics; staff turnover/morale; behaviour; what other initiatives are already in place; whether the leadership will consult with staff, or just announce it “my way or the highway”; whether it is trialled first – etc etc.
In a talk a couple of years ago, Dylan Wiliam gave an example of an initiative in Trent which resulted in a 40% increase in graduation rates. But these gains were only possible because of an existing deficit. The same initiative wouldn’t necessarily work elsewhere, where practices were already more efficient. Does this make sense?
We’re not really on the same page at all, which itself says an awful lot about the state of the research field! You see, I would look at Point 3 on the EEF guidance and say that it, too, is hopelessly vague: who knows what ‘effectively’ and ‘consistently’ mean here? It’s a very good example of question begging: “What do you have to do for this strategy to be effective? You have to do it effectively!” We don’t gain anything from this!
And, although we don’t want to be in Mevagissey, my argument is that research needs to be more towards that end of the spectrum, with more carefully defined interventions in more carefully controlled environments.
And the point of the Wiliam example is simply that it is easier to improve something which is already well below average. Heart transplants are more likely to increase the life expectancy of those with heart defects than those without. Wiliam’s point is kind of what I was getting at with your school’s special measures.
I can see where you’re coming from because I used to think the same way. But it doesn’t work like that. Recently people seem to have settled on the idea that one of the most significant factors is the quality and quantity of the teacher’s feedback. For example, feedback ranks highest in the EEF toolkit, where it is described as giving “high impact for very low cost”. And yet in one meta-analysis of more than 600 feedback-based interventions, 40% of them – 40%! – had an effect size of less than zero. In other words, if you base your decision making only on probability as you describe, regardless of context, then even with what is widely seen as an effective intervention you have only a 60% chance of it being better than business as usual. The numbers game is necessary, but not sufficient; you need to look at where an intervention worked and where it didn’t, and then compare these contexts to your own, and *then* calculate your odds of success.
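To put the arithmetic in that comment in concrete terms, here is a minimal, purely illustrative sketch in Python. Everything in it is invented: it assumes a hypothetical split of 600 studies into contexts ‘like ours’ and ‘unlike ours’, with parameters chosen only so that roughly 40% of all effect sizes fall below zero, matching the figure cited above. It is not a model of the actual meta-analysis; it simply shows how an unconditional 60% success rate can mask very different odds once context is taken into account.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600  # roughly the number of feedback interventions cited above

# Invented model: half the studies come from contexts 'like ours', half from
# contexts 'unlike ours', with different average effect sizes. The means and
# spreads are chosen only so that about 40% of all effect sizes fall below zero.
like_ours = rng.normal(loc=0.30, scale=0.35, size=n // 2)
unlike_ours = rng.normal(loc=-0.10, scale=0.35, size=n // 2)
all_studies = np.concatenate([like_ours, unlike_ours])

# Chance that a randomly chosen implementation beats 'business as usual'
print(f"Across all studies:        {np.mean(all_studies > 0):.0%}")
print(f"In similar contexts only:  {np.mean(like_ours > 0):.0%}")
```

On these invented numbers, the odds move from roughly 60% overall to roughly 80% once you restrict attention to studies whose context resembles your own – which is the commenter’s argument for looking at where an intervention worked, and where it didn’t, before playing the odds.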
I can also see where you’re coming from, for I used to think the same way, but it doesn’t work like that. Sorry if that last sentence sounded patronising! What you’ve constructed here is a good argument for exactly what I’m talking about. The problem with many of the EEF studies (and the same goes for Hattie) is that they are so broad and generic that they are virtually meaningless. Asking “Does feedback improve learning?” is like asking “Does giving pills improve health?” It’s far too broad an idea to produce meaningful results (I would put things like ‘learning to learn’ in this bracket, but I know you’ll disagree).
So what we have to do is construct interventions that are far more tightly defined.
But the other end of the spectrum is equally vacuous. “An intervention into the impact of feedback on the learning of children born in 1999 in bottom set Year 8 in the Cornish village of Mevagissey on a Thursday afternoon in mid-May when it was sunny outside and the pupils had just come from Maths” is a waste of everyone’s time (in the context of this kind of research) because it cannot be replicated.
The trick is to find tightly defined interventions that can be transferred from one location to another. Engelmann’s Direct Instruction study is a good example of this: it’s very clearly defined and very easy to replicate in its precise form in different contexts, allowing you to make broader judgements about its efficacy.
Does this make sense?
I agree with everything you said in your post Michael, but I think there are two more things that could be said:
The first – following-on from your comment to pedagoginthemachine above – is that whilst we might indeed be able to identify that if we do x, then the chances of it working are y, and (taking your point further) that this is better than the chances of ‘z’ working, there will nevertheless be times when z WILL work better in a situation than x, and (I would argue), the professional teacher is probably going to be in a better position to spot this than the necessarily blunt normalised research statistics. I wonder how often it is that teachers have so little guiding information that they are best just relying on statistical comparisons when picking a technique…?
Secondly, if we allow our gaze to escape from the cosy security blanket of seeing education as simply being about schooling for exam results, then the research evidence will leave us with very little to adhere to at all anyway – our choice of technique could have plenty of desirable and perceivable, but hard to quantify, long term positive effects.
None of this is to say that we should do away with valuing scientific research by the way – I simply think we can’t allow ourselves to be fooled into just looking one way. We should be ‘evidence informed’ – balancing up evidence from above, from below and from within.
Thanks for commenting Chris. I’ll take each point in turn.
(1) Yes, relative importance is crucial, and you can make the case that teachers should go in knowing what is most likely to work, but then switch to z if x is not working, provided z has also been shown to be effective. The only problem one encounters with this line of argument is if it’s a curricular rather than a pedagogical matter: the best example of this is the phonics debate, where phonics is not so much the technique, but rather the thing to be learnt itself. But other than in these cases, I take your point.
(2) I wouldn’t use public exam results for research: I don’t trust their reliability or validity, and there are so many factors affecting attainment that one is left without a real sense of whether it was intervention x, or other factors A, B and C that were responsible. I think any experimental design needs to construct its own outcome test, which goes through complex reliability and validity checks, and then uses these to measure effectiveness.
Although the above post might not convey this, I am generally sceptical about empirical work in education: I think most of it is a complete waste of time. A lot of questions in education are normative (i.e. about what ought to be the case) and many of the other questions are curricular (such as what needs to be learnt at Stage 3 to make it possible to understand Stage 4).
Thanks again for your comments.
Thanks Michael – I think your responses here are astute and balanced. Please keep up the excellent blogging 😊
I suspect that education still has lots to learn from medicine in terms of standards of evidence, even if it doesn’t give direct “do this” instructions.
A model which feels useful to me is that pedagogy might be more like nutrition than acute medicine. In other words, there are some essential factors, but multiple ways of delivering them in a meal. Taking that a bit further, my concern is that, as an educational community, we spend too long obsessing about the exact content of vitamin pills, rather than making sure that there is a range of vegetables in the meal.
I don’t think the fact that we have different values, opinions or methods says much at all about the state of the research field, other than that it is subject to lively debate. Perhaps it says more about the teaching profession itself – either way, rather that than a landscape in which everybody agreed with either of us. And anyway I don’t think we differ as much as you might think – for example, I agree to an extent that “research needs to be more towards [the Mevagissey] end of the spectrum with more carefully defined interventions in more carefully controlled environments”, as this seems to me to require a close consideration of both research and implementation contexts. Although with respect, I think there remains unexplored territory here since schools are not generally “carefully controlled environments”. Real-world research contexts might be messy, but they do have the advantage of being, well, real. If education research can’t tell us anything about what might be worth doing in real-world settings, then it’s probably not worth bothering with.
With regard to the EEF’s ‘point 3’, I think you are in danger of extrapolating from a short sentence to an institutional lack of clarity. The toolkit is a top-line summary designed to direct schools toward further reading which might be of use in pursuing evidence-informed practices. Elsewhere on the EEF site there is a whole page dedicated to the challenges of implementation, with links to further reading – see here https://educationendowmentfoundation.org.uk/campaigns/implementation/. And behind articles such as these lies the emerging field of implementation science, which I think promises huge advances in improving social policy and practice in the coming years.
With regard to the Wiliam example, you are right, of course – it is easier to make gains in contexts where outcomes are already below average. But in a school with many overlapping and interdependent factors, how does one decide where to focus an intervention? What factors are “below average” in your context? Underachieving AG&T boys in year 9? SEN girls in year 7 Science? Do you have an issue with cannabis use in year 10? Is it all three, perhaps? All I’m saying is that this should be the starting point – not the odds that intervention x might work in any given context, but some sort of SWOT analysis of the context in which you work. Then you would look at what has worked elsewhere – where intervention x has been effective, and where it has not. Only then would you decide whether it might be worth introducing in your context, and start grappling with the challenges of implementation. In this way you remain open to interventions that may not work well in the majority of cases, but which may be very well suited to your individual context. Which I think is preferable to only playing the odds game.