How unreliable are the judges on Strictly Come Dancing?


That very clean glass wall won’t hold itself up. Photo by Dogboy82 – Own work, CC BY-SA 4.0,

Strictly Come Dancing, one of the BBC’s most popular shows involving celebrities moving in specific ways with experts at moving in specific ways while other experts check if they’re moving specifically enough, contains certainties and uncertainties. We’re not sure who will be voted out in any particular week. We don’t know what the audience are going to complain about. An injured woman not dancing! I was furious with rage! We do know that Craig Revel Horwood will use the things he knows to make a decision about whether he likes a dance or not while saying something mean. We can be pretty sure what Len Goodman’s favourite river in Worcestershire, film starring Brad Pitt and Morgan Freeman and Star Trek: Voyager character is. But can we be sure that the scores awarded by the judges to the dancers are accurate and fair?

In science, a good scoring system has at least three qualities. These include validity (it measures what it’s supposed to measure), usability (it’s practical) and reliability (it’s consistent). It’s difficult to assess the extent to which the scoring system in Strictly Come Dancing possesses these qualities. We don’t really know the criteria (if any) that the judges use to assign their scores other than they occasionally involve knees not quite being at the right angle, shoulders not quite being at the right height, and shirts not quite being able to be done up. As such, deciding whether the scores are valid or not is tricky. The scoring system appears to be superficially usable in that people use it regularly in the time it takes for a person to walk up some stairs and talk to Claudia Winkleman about whether they enjoyed or really enjoyed the kinetic energy they just transferred. In some ways, checking reliability is easier. Especially if we have a way to access every score the judges have ever awarded. And we do. Thanks Ultimate Strictly!

For a test to be reliable, we need it to give the same score when it’s measuring the same thing under the same circumstances. If the same judge saw the same dance twice under consistent conditions, we’d expect a dance to get the same score. This sort of test-retest reliability is difficult to achieve with something like Strictly Come Dancing. The judges aren’t really expected to provide scores for EXACTLY the same dance more than once. Otherwise you’d end up getting the same comments all the time; which would be as difficult to watch as the rumba is for men to dance. Ahem. However, you can look at how consistently (reliably) different judges score the same dance. If all judges consistently award dances similar scores, then we can be more sure that the system for scoring dancing is reliable between raters. If judges consistently award wildly different scores for the same dances, we might be more convinced that they’re just making it up as they go along, or “Greenfielding it” as they say in neuroscience.

To test this, all scores from across all series (except the current series, Christmas specials and anything involving Donny Osmond as a guest judge) were collated and compared. Below, we can see that by and large the judges have fairly similar median scores (Arlene Phillips and Craig = 7, Len, Bruno Tonioli, Alesha Dixon and Darcey Bussell = 8). The main differences appear to be in the range of scores, with Craig and Arlene appearing to use a more complete range of possible scores.


Box plot (shows median scores, inter-quartile ranges, maximum and minimum scores for each judge)

A similar picture is seen if we use the mean score as an average, with Craig (mean score = 6.60) awarding lower scores than the other judges, whose mean scores awarded range from 7.05 (Arlene) to 7.65 (Len and Darcey). Strictly speaking (ironically) we shouldn’t be using the mean as an average for the dance scores. The dance scores can be classified as ordinal data (scores can be ordered, but there is no evidence that the difference between consecutive scores is equal) so many would argue that any mean value calculated is utter nonsense, or meaningless, or at least not an optimum method for observing central tendency. However, I think in this situation there are enough scores (9) for the mean to be useful, like the complete and utter measurement transgression that I am. At a first glance, these scores don’t look too different and we might consider getting out the glitter-themed cocktails and celebrating the reliability of our judges.


Bar chart showing mean scores and variance for each judge.
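For anyone wanting to try the same summaries at home, here is a minimal sketch of how the median, mean and variance for each judge could be calculated. The judge names and scores below are invented purely for illustration (they are not the Ultimate Strictly data), and `summarise` is just a hypothetical helper name.

```python
from statistics import mean, median, variance

# Invented scores for illustration only; NOT the real Strictly data.
scores_by_judge = {
    "Judge A": [4, 5, 6, 7, 7, 8, 9],
    "Judge B": [6, 7, 7, 8, 8, 8, 9],
}


def summarise(scores):
    """Return the median, mean and sample variance of a list of scores."""
    return {
        "median": median(scores),      # safest average for ordinal data
        "mean": mean(scores),          # treats the gaps between scores as equal
        "variance": variance(scores),  # sample variance (divides by n - 1)
    }


for judge, scores in scores_by_judge.items():
    print(judge, summarise(scores))
```

The median only needs the scores to be orderable, which is why it is the less controversial choice here. The mean and variance both quietly assume that the step from a 6 to a 7 is the same size as the step from a 9 to a 10.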

In order to test the hypothesis that there was no real effect of “judge” on dance scores, I did a statistics at the data. In this case a Kruskal-Wallis test because of the type of measures in use (one independent variable of ‘judge’ divided into different levels of ‘different judges’ and one dependent variable of ordinal data). And yes, it would be simpler if Kruskal-Wallis was what it sounded like, a MasterChef judge with a fungal infection. Perhaps surprisingly, the results from the test could be interpreted as showing that, if the judge really had no effect on the score, the probability of seeing differences this large was less than 1 in 10,000 (P < 0.0001). The table below shows between which judges the differences were likely to exist (P < 0.0001 for all comparisons shown as red).


Table showing potential differences between judges in terms of scores they give to dancers
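To give a flavour of what a Kruskal-Wallis test does under the glitter, here is a rough sketch of the H statistic written from the textbook formula. It is not the analysis actually run for this post: the scores are invented, the function names `average_ranks` and `kruskal_wallis_h` are my own, and it stops at H rather than producing a P value or the judge-by-judge comparisons in the table.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [score for group in groups for score in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) across the groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Dance scores tie a lot, so apply the standard tie correction.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical, so H is undefined")
    return h / correction


# Invented scores for three hypothetical judges; NOT the real data.
judge_a = [4, 5, 6, 6, 7, 8]
judge_b = [6, 7, 7, 8, 8, 9]
judge_c = [7, 7, 8, 8, 9, 9]
print(kruskal_wallis_h([judge_a, judge_b, judge_c]))
```

A large H suggests that at least one judge's scores tend to rank differently from the others. To turn it into a P value, H is compared against a chi-squared distribution with (number of judges minus 1) degrees of freedom. In practice a library routine such as `scipy.stats.kruskal` does all of this in one call.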

Thus it would seem that the probability that Craig isn’t having an effect on score is relatively small. In this instance, Craig appears to be awarding slightly lower scores compared to the other judges. The same could be said for Arlene, except if she is being compared to Craig, where she seems to award slightly higher scores.

So it transpires that the scores on Strictly Come Dancing are indeed unreliable. Arlene did and Craig is throwing the whole system out of alignment like a couple of Paso Doble doing a Jive at a Waltz. Tango!

Possibly not though, for a number of reasons. 4.) I am clearly not an expert in statistics, so I may have just performed the analysis incorrectly. 2.) If differences do exist, they are relatively subtle and are likely to be meaningless within individual shows, only coming to light (and bouncing off a glitter ball) when we look across large numbers of scores. That is to say, that a statistical difference may exist, but this difference likely makes no practical difference. A.) At least it’s not The X Factor.

Keep dancing. And doing maths.

Medicus Ex Machina: Is the sonic screwdriver in Doctor Who a deus ex machina?

Let’s hope they don’t slash the special effects budget too much.

I like Doctor Who. “I am getting a bit fed up of the sonic screwdriver being used as a deus ex machina.” Is what I said in a brief fit of being wrong after watching a recent episode. I wasn’t wrong about me being fed up. I am capable of identifying my emotional state at least 20% of the time. I was wrong about the use of one of fiction’s most popular Time Lord’s favourite sonic tools. Yet that the sonic screwdriver gets used as a deus ex machina is one of the most common arguments involving the noise-based lock pick. So much so in fact that you might think that the people using the phrase think that the small amount of incorrectly used Latin will act as a deus ex machina in their argument and automatically solve any logical problems their point has. Quod erat demonstrandum. However it is true that this literary device can be seen as lazy writing, leaving audiences unsatisfied. So what is a deus ex machina, is The Doctor’s sonic screwdriver a good example of one and, if it is, why is the use of a deus ex machina problematic?

Doctor Who is a British science fiction programme produced by the BBC about an alien known as The Doctor who can travel through time and space. It’s been going a little while and a couple of people watch it. The sonic screwdriver, first introduced to the programme in 1968, is a tool commonly used by The Doctor. It is multi-functional, with the most common use being as a lock pick (unless the lock is wooden or a deadlock seal because of rules). To this date the sonic screwdriver has been used to heal injuries, modify phones, scan and identify objects, probe another’s physiology, fix barbed wire, redirect the teleportation of the mayor of Cardiff, cut or burn substances, remotely control a time machine, summon a flying shark and generally put devices made by Apple to shame. This list is by no means exhaustive. Chances are if The Doctor comes across a problem, he’ll reach for his sonic screwdriver. Screwdrivers are cool.

Despite being so obviously useful (or because it was so obviously useful) the sonic screwdriver was briefly written out of the series in 1982. This was done on the instructions of the show producer John Nathan-Turner, arguing that such a device, which could help the main character out of almost any situation, was limiting to the script. It would become boring to the viewers if in response to any obstacle, the solution was always to produce this magic wand. Conversely if the screwdriver wasn’t used in response to a problem, pedantic viewers may be justified in asking why The Doctor didn’t just use one of the many known functions of this handyman’s dream tool. Luckily pedantic science-fiction fans are rare. Rare in the whole of the known universe that is. It is this omni-usefulness that has led fans of the show to complain that the screwdriver is used as a deus ex machina.

A deus ex machina, literally a “god from the machine”, is a plot device whereby an apparently unsolvable problem is suddenly or abruptly solved, with the contrived and unexpected intervention of some new event, character, ability, or object. The potential original use of the phrase is from Horace’s Ars Poetica. Horace argued poets should never resort to a god from the machine to solve their plots. This referred more literally to the crane or similar device used in Greek tragedies to lower actors playing gods onto the stage, or to lift them up through a trap door.

There are a number of requirements for a plot development to be categorised as a deus ex machina:

1.)    Deus ex machina are solutions. They shouldn’t make things worse. They can’t be twists that only change the understanding of a story.

2.)    The plot device must be sudden or unexpected. If the relevant item is featured or referenced earlier in the story, it will not change the course of the story at that point or even appear to be a likely solution to the problem it eventually solves.

3.)    The problem the deus ex machina solves must be otherwise unsolvable. If the problem could be solved with common sense or another simple intervention, the solution is not a deus ex machina no matter how unexpected it seems. It’s just a bit fancy and unnecessary.

Popular examples of deus ex machina in literature and film include the random rescue of hobbits by giant eagles in The Lord of the Rings and the sudden arrival of King Richard in Robin Hood: Prince of Thieves to shuddenly sholve all the heroe’sh problemsh.

A deus ex machina is usually criticised as undesirable in writing and often used to imply lack of imagination in the writer. Reasons given are that it acts as a sudden disregard for a story’s logic and can challenge the suspension of disbelief required for an audience to remain emotionally involved in a narrative. Elephants on unicycles. It is usually argued it is better for characters to have agency within a story. Characters should be responsible for events with identified skill-sets leading to a more likely and perhaps more palatable story conclusion. In turn this leads to possible acceptable uses of the deus ex machina as a device. The powerlessness of the characters in a large and mysterious universe may want to be highlighted. Or the use of a deus ex machina might be funny or used to make some other point. This point may or may not exist until after the use of a deus ex machina has been pointed out to the writer.

Perhaps surprisingly there has been little research investigating why deus ex machina are experienced as unacceptable. I could not find any apparent examples when searching PubMed, PsycINFO (search engines for a certain type of scientific research paper) or Google Scholar and nothing turned up at the last minute to unexpectedly deliver any to me. Experiments with babies show they pay more attention to unexpected events inconsistent with their rudimentary understanding of the world. For example if they are shown a doll, a screen covers that doll and they see another doll placed behind that screen, they look for longer at the rigged experimental outcome of there being only one doll when the screen is lowered than when there are two. Similarly babies are shown to look longer at a ball which appears to roll on its own than a ball that is rolled by a person. Neither of these really tells us anything about the use of deus ex machina in literature and in fact could be twisted out of recognition to support some theory that says people prefer unexpected events or solutions. Sadly these shoehorned studies do not suddenly save us in exploring why deus ex machina are generally unsatisfying in stories.

Deus ex machina are definitely undesirable in science. Scientists devise hypotheses, deduce implications for observations from them, and test those implications. Any explanation that invokes some mysterious, unexpected solution to a problem without reference to the internal logic, i.e. established scientific laws of the universe, is not a scientific theory at all. Even Bayesian statistics or “inverse probabilities”, which start with a prior distribution and make assumptions about probability, can be used to check scientific models. Implications of assumptions of the model are compared to the empirical evidence. If the model makes wild claims from unlikely data that doesn’t fit the existing “good” evidence then it is likely not an accurate model. I’m talking to you Andrew Wakefield. Wakefield being another person in this post that’s not a real doctor.

None of this however answers our original (and likely now nearly forgotten) question as to whether the sonic screwdriver is a deus ex machina. As hinted I would now argue that it isn’t. It certainly would fit our first criterion in acting as a solution or a quick fix. Also the third criterion in that the problems may be unsolvable without the screwdriver. However it is certainly not unexpected. As Andrew Ellard, script editor on such popular television programmes as The IT Crowd and Red Dwarf has argued, The Doctor as a Time Lord is an alien with extremely advanced technology. Sufficiently advanced in fact to often appear as magic. The sonic screwdriver is an example of this. The fact that it has a lot of functions appearing for the first time in certain episodes is also in keeping with this. You don’t use all the applications of your smartphone all the time. An episode where The Doctor lists every function of the sonic screwdriver, set in stone for the rest of the series’ lifetime would not be interesting. Unless the idea of a Time Lord-inspired Top Gear-style, “Top Screwdrivers” appeals to you.

The sonic screwdriver is used to solve realistic (locked doors, wounds and flying sharks) but dull problems. We don’t want our hero to spend an episode staring at a locked door, fiddling with his scarf. We want him to use his established technology to move through the story to the more interesting problems. The sonic screwdriver allows this. It is not a deus ex machina and if used responsibly and not too frequently it is not a problem. Also Doctor Who is a thoroughly enjoyable series and even if the sonic screwdriver were an occasional deus ex machina I’m not sure it would make it any less fun. Even if you are a surprised baby.

Wondering About Wonders of Life: Does it matter that Brian Cox is not a biologist?

If a brain in a jar thinks it watched “Take Me Out”, will it still experience the irreversible damage?

Professor Brian Cox has been on the telly recently and like everyone who goes on the telly he received a bit of criticism. Not from the various groups that harbour general furiousness at Professor Cox for showing that the Earth orbits the sun, or that very small particles can tell us something important about the entire universe or that unicorns don’t wear yellow wellingtons (or indeed any wellingtons), but by some scientists.

Over the past few weeks the image of Professor Cox has been flying into our homes, lounging at our delighted and curious faces over a partially lit mountain, a photosynthesising jellyfish in one hand, a mantis shrimp in the other and an infectious grin and a great big particle physicist’s brain in the centre. Following his successful series Wonders of the Universe and Wonders of the Solar System, Professor Cox is starring (or the biological equivalent not involving stars) in Wonders of Life. This show is about life in general and more specifically about how physics affects life and its definition, development and everyday functioning. For example the first two episodes were about what life “is” and how living things used physics to detect their surroundings. Sounds good doesn’t it? It is.

So why are some people annoyed? Well, it’s because Brian Cox is a physicist and this is about life, the turf of the biologists. And if anyone should know about turf it’s the biologists. This seems like an issue that has passed, and indeed a lot of the criticisms seemed only to be present just before and during the first episode. However I took the radical decision to watch the whole series and have a think before putting my precarious and confused thoughts on the matter in a blog post. I enjoyed the series and learned a lot. I certainly don’t agree with the mean-spirited and badly thought out argument by Eliane Glaser that Brian Cox presents science in a manner that evokes wonder at science but little else. Quite the contrary. I am much more in agreement with Stephen Curry’s Occam’s Corner article on the subject. The show invoked wonder, gave explanation and invited further investigation by the viewers themselves. The nobbers! But does the fact that a show about life in its disgusting and multiple forms is presented by a physicist rather than a biologist matter?

Not really. The point of the show is to relay science, with some pretty pictures if possible, into the waiting and receptive eyes and brains of the viewers. As long as the facts and the scientific methods used to achieve these facts are as true as the scientific model will allow then does it matter who passes them along? As long as it’s done with a modicum of charm, is clear and conveys understanding, which Brian Cox of course manages to do, it should be fine right? He is a well known figure, known for his expertise and enthusiasm for science in general and this renown is useful in gaining audience trust and attracting an audience to a more credible show in the first instance. Brian Cox is a particle physicist and nobody seemed to mind when he made shows about space. It’s all just physics innit? It’s all the same innit? (No)

But it’s about biology, the biologists cry in a manner as loud and as plaintive as that of the kakapo, the loudest and one of the most amusingly named birds, whose mating call can be heard from 4-5 miles away! Isn’t it? Tangentially, yes. The show is about biophysics and as such and by the arguments of those who are complaining should need a biophysicist rather than a biologist. It’s not as if the show doesn’t have biologists behind it, advising on the presentation of their incredible and complex subject. To ignore their input and focus on Cox as the star (Starfish? Is this a biological equivalent?) is to miss their own point. Biologists and biophysicists do have a vital input to the show; they’re just not relaying the words.

As I have already mentioned Professor Cox has previous form in this area. He makes interesting and accessible shows that get people interested in science and if people need a recognisable face to draw them in then is that really a problem? Sir David Attenborough himself recently said he would gladly pass the torch of his legacy in science presenting to Professor Cox. This must have felt an astounding honour and I’m not sure many would argue that Attenborough has no experience of and no right to talk about nature. If they do it’s probably while whispering while hiding in a bush. Which is ironic.

Or is a trick being missed like a magician with bad aim? Could the show have been used to add another inspirational figure to the television roster? A biologist or biophysicist would have even more enthusiasm talking about a field they know in even greater detail and could act as another role model for people to admire and to seduce them further into scientific interest, exploration and understanding.

This is obviously a balancing act. Use a new expert for every programme and you lose the “celebrity” draw and trust in the show that initially attracts people and likely allows the show to be created in the first place. Only use one echinoderm (It doesn’t work, does it?) for every science programme and you lose credibility. After all if you’re only using the presenter as a celebrity to learn and repeat some facts and explanations without fully understanding them themselves then we might as well be watching Ant and Dec’s Saturday Night Geology or The Davina McCall of Nature. And as interesting as it is we can’t make every science TV show or documentary somehow about physics. This way lies such shows as Dead or Alive?: The Quantum Physicist’s Guide to Feline First-Aid and Wonders of the Art of Salvador Dali.

But the real star (I think we’ll just stick with that) of the show is of course the science and if that’s getting across accurately and entertainingly then that’s fine with me. Would it be good to have a biologist or biophysicist presenting this show and shows in a similar vein? Yes of course, the more the merrier, up to the point of celebrity dilution and audience loss. This feels like it needs a graph.

Is it bad for Brian Cox to do it? No, not really. I’m not an expert in television or physics so by a lot of the arguments relevant to this discussion my opinion on it doesn’t matter. And it doesn’t. This seems like a good point to end the blog post. So does it matter that Brian Cox is not a biologist but is making TV programmes tangentially connected to biology? A bit, but not really. Not an astounding or strong conclusion but like many issues in science communication the issue isn’t clear cut. Carry on Professor Cox! This is not a film suggestion.