Tag Archives: Richard Lilford

To Subgroup or Not to Subgroup?

Far from being ‘stamp collecting’, as Ernest Rutherford is said to have claimed, classifying things is central to the scientific enterprise – imagine biology without the Linnaean taxonomy (multi-dimensional classification) of plants, animals and minerals (now plants, animals, fungi, protists, chromists, archaea and eubacteria kingdoms). Or medicine without its nosology. Classification has been the basis of all knowledge, and Rutherford was wrong – for example, astronomy is also built on a classification of stars and planets.

However, classification does not come free of problems. On the contrary, I call it the ‘central dilemma of epidemiology’. For a start, it is a human attempt to organise an underlying (latent) and often disorganised world. That is both its strength and its weakness. By organising the underlying complexity it allows abstractions to be made regarding the organising principles that underlie phenomena we observe about us. But the price we must pay is that we are often superimposing a classification over an underlying continuum. Thus, astronomical objects like Pluto can be ‘demoted’ and species change from one genus to another. Many health conditions do not fit neatly into one group or another, bearing features of both – think auto-immune disease and mental illness. Of course, any classification system is useful, insofar as it leads to new knowledge about underlying mechanisms and it is quite natural that the process is iterative, such that new classifications emerge – clades in biology rather than the original Linnaean family tree, for example.

But in the practice of epidemiology the issues of groups and subgroups can be a problem, not just because groups overlap, or misclassification may occur. A problem also arises in the interpretation of observed differences between groups. On the one hand, we do not want to miss important subgroup differences in the effect of an exposure on an outcome. On the other hand, we also want to avoid spurious associations. There are many examples, especially in the context of treatment trials, of subgroup associations that were subsequently over-turned.

The usual argument put forward to avoid spurious associations is that only subgroups specified in advance should be considered as a test of an hypothesis – all else is a fishing expedition, the results of which are to be down-weighted.

This is all very well but it just moves the problem from the analysis stage to the design stage. The corollaries are two-fold:

  1. Any subgroup must be selected on the basis of sound principles – there should be a theoretical model for why the effect of the exposure on the outcome would differ in that subgroup. The statistical subgroup analysis is then designed to strengthen or weaken the credibility of the model. Note that the issue is the interaction between subgroup membership and the treatment effect on the outcome. A direct effect of subgroup membership on the outcome is neither here nor there.
  2. Since precision is often low in a subgroup, and always lower than in the group as a whole, hypothesis tests are even less appropriate to subgroup effects than to the overall effect. Dichotomising the results into positive and null, and using this dichotomy to make a decision, is always stupid and is risible in a subgroup.
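
The loss of precision in a subgroup can be illustrated with simple arithmetic. The sketch below assumes a continuous outcome with a common standard deviation and uses the standard error of a mean (standard deviation divided by the square root of the sample size); the sample sizes and the function name `standard_error` are hypothetical and chosen only for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean for outcome standard deviation sd and sample size n."""
    return sd / math.sqrt(n)

sd = 1.0            # assumed outcome standard deviation (hypothetical)
n_total = 1000      # hypothetical size of the whole study group
n_subgroup = 100    # hypothetical size of one subgroup

se_total = standard_error(sd, n_total)
se_subgroup = standard_error(sd, n_subgroup)

# How many times wider the subgroup's uncertainty is than the whole group's
ratio = se_subgroup / se_total
print(f"SE whole group: {se_total:.4f}, SE subgroup: {se_subgroup:.4f}, ratio: {ratio:.2f}")
```

Under these assumptions a subgroup one tenth the size of the whole group has a standard error larger by a factor of the square root of ten, so its confidence interval is roughly three times wider.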

Some subgroups derive from an underlying, if latent, scale. Socio-economic groups, for example, or age. But others are irrevocably categorical. Gender, for example, or rural vs. urban residence. In the former situation – where the group is homologous (scalable) – a small subgroup is not a large problem, because the statistical model can look for a trend. The situation is more problematic when a small subgroup is not part of a homologous continuum. Any examination in the small subgroup will be imprecise, and the smaller the subgroup the greater the imprecision. Amalgamating it within a larger group makes sense on the basis that ‘it’s better to have a precise answer to a general problem, than an imprecise answer to a specific one.’

But this logic breaks down if there is a sound theoretical reason to expect a different result in the small sub-group. Grouping trans people with male or female would be unsuitable for many purposes. In such a situation it is better to have an imprecise answer to a specific question.

Richard Lilford, ARC WM Director


Science Denial and the Importance of Engaging the Public with Science

A recent paper in JAMA, concerning science denial, tackles a problem of immense importance.[1] For us scientists, science denial negates our reason for being. Far more important though, is the effect on society. We need to think only of the vaccination fiasco. The JAMA paper used the difficulties that people with certain neurological conditions have with processing information as an analogy for the challenges that people with low scientific literacy have with interpreting complex graphs. Such difficulties leave room for false beliefs, including beliefs in conspiracy theories. While this analogy might shed light on neural mechanisms, there are far more important determinants of science denial in the population at large. One issue is the effect of education. Lack of educational attainment is consistently associated with science denial and the propensity to believe in conspiracy theories.[2]

Of course, this does not prove that improving science education would solve the problem. It may simply be the case that the cause of low educational achievement is also the cause of a predisposition to believe conspiracy theories. For example, low self-esteem or cognitive ability may be determinants of both low educational attainment and science denial. More likely, education plays a part, and both nature and nurture are involved. In that case, educational achievement conditional on early-life cognitive ability should correlate with resistance to conspiracy theories. We do not know whether this possibility has been examined.

Debunking misinformation with evidence or education is not enough. In responding to COVID-19, behavioural scientists were quick to point out that debunking could even lead to a backlash and increase the belief in misinformation. While the evidence on backlash is mixed, alternative approaches are still needed. One alternative is ‘pre-bunking’,[3] which is analogous to medical inoculation: people are exposed to a little bit of misinformation that activates their ability to critique it, but not so much misinformation as to be overwhelming. Web-based games like ‘Get Bad News’ apply this approach and are used by governments and schools to reduce people’s susceptibility to fake news. Reminding people before they engage with information to assess the accuracy of sources may also help.[4]

Yet, education, pre-bunking, and reminders are arguably ‘demand-side’ factors, which largely rely on the public selecting into engagement with science. These may be the very people least likely to denounce it. Given this, it is incumbent upon policymakers – and academics – to address the ‘supply-side’ factors, too. They must consider how to provide trustworthy, transparent, and accessible information, including to those with lower levels of education or cognitive ability. Sadly, this does not always happen; for example, little effort appears to have been directed towards testing some of the public health messaging about COVID-19 in the UK.[5] Confusing messaging can breed uncertainty, which is easily filled with simple but false information – including scientific information. Critiquing conspiracy theorists for their ‘bad science’ is unlikely to be persuasive. Instead, we advocate building trust in rigorous science.

Engaging the public with science is critically important; we can hardly think of a more important issue. Here at ARC West Midlands we take public engagement very seriously. We continuously seek opportunities to engage on science. In previous news blogs, we tested some of the government’s COVID-19 messaging ourselves,[6][7] and described our plans to use geospatially referenced maps to engage communities where COVID-19 infections are not under control.[8] We are engaging the public in numerous implementation science projects, including one based on mathematical modelling and another on the role of chance in decision-making. In all of these, development of the service, engagement with decision-makers, and with the public, go hand in hand.

Richard Lilford, ARC WM Director; Laura Kudrna, Research Fellow


  1. Miller BL. Science Denial and COVID Conspiracy Theories: Potential Neurological Mechanisms and Possible Responses. JAMA. 2020.
  2. Van Prooijen J-W. Why Education Predicts Decreased Belief in Conspiracy Theories. Appl Cognit Psychol. 2016; 31(1).
  3. Van Bavel JJ, et al. Using social and behavioural science to support COVID-19 pandemic response. Nat Hum Behav. 2020; 4:460-71.
  4. Pennycook G, et al. Fighting COVID-19 misinformation on social media: Experimental evidence for a scalable accuracy-nudge intervention. Psychol Sci. 2020; 31(7):770-80.
  5. BBC News. Coronavirus: Minister defends ‘stay alert’ advice amid backlash. 10 May 2020.
  6. Kudrna L, Schmidtke KA. Changing the Message to Change the Response – Psychological Framing Effects During COVID-19. NIHR ARC West Midlands News Blog. 2020; 2(7): 7-9.
    See also our London School of Economics and Political Science blog.
  7. Schmidtke KA, Kudrna L. Speaking to Hearts Before Minds: Increasing Influenza Vaccine Uptake During COVID-19. NIHR ARC West Midlands News Blog. 2020; 2(10):9-11.
    See also our London School of Economics and Political Science blog.
  8. Lilford RJ, Watson S, Diggle P. The Land War in the Fight Against COVID-19. NIHR ARC West Midlands News Blog. 2020; 2(10):1-4.

Use of Causal Diagrams to Inform the Analysis of Observational Studies

Observational studies usually involve some sort of multi-variable analysis. To make sense of the association between an explanatory variable (E) and an outcome (O), it is necessary to control for confounders – age for example in clinical studies. A confounder (C) is a variable that is associated with both E and O. Indeed it is causal of E and O as shown by the direction of arrows in Figure 1.

Fig 1. Causal Diagram for a Confounder
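
A small simulation can make the confounder structure of Figure 1 concrete. The sketch below is illustrative only: the linear relationships, the coefficients, the sample size and the helper names (`slope`, `residuals`, `adjusted_slope`) are assumptions introduced here, not part of any analysis described in this article. C causes both E and O, while E has no effect on O at all.

```python
import random

def slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_slope(x, y, z):
    """Slope of y on x adjusting for z, obtained by residualising both on z."""
    return slope(residuals(z, x), residuals(z, y))

random.seed(1)
n = 20000
C = [random.gauss(0, 1) for _ in range(n)]       # confounder
E = [c + random.gauss(0, 1) for c in C]          # C causes E
O = [2 * c + random.gauss(0, 1) for c in C]      # C causes O; E plays no part

crude = slope(E, O)                  # picks up the C -> E and C -> O paths
adjusted = adjusted_slope(E, O, C)   # controlling for C removes the spurious link
print(f"crude: {crude:.2f}, adjusted for C: {adjusted:.2f}")
```

With these assumed coefficients the crude slope should sit near 1 even though E has no causal effect, and the slope adjusted for C should sit near 0.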

A common error is to mistake a confounder for a mediator. If the variable lies on the causal pathway between E and O, then it is a Mediator – M in Figure 2.

Fig 2. Causal Diagram to Distinguish Between a Confounder (C) and a Mediator (M)

Failing to make this distinction, and therefore adjusting for M, will reduce or remove the apparent effect of E on the outcome. In a study of the effect of money spent on tobacco on lung cancer, it would be self-defeating to adjust for smoking! If we are interested in decomposing different causal pathways, then we should adapt the multivariable analysis to examine how much of the effect of E on O is explained by the putative mediator (M in Figure 2) – a structural equation model or ‘mediator’ analysis.
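
The tobacco example can be sketched as a toy simulation. Everything numerical here is an assumption made for illustration (linear effects, unit-variance noise, a coefficient of 1.5, and the helper names): E stands for money spent on tobacco, M for smoking, and O for the outcome, with E acting on O only through M.

```python
import random

def slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_slope(x, y, z):
    """Slope of y on x adjusting for z, obtained by residualising both on z."""
    return slope(residuals(z, x), residuals(z, y))

random.seed(2)
n = 20000
E = [random.gauss(0, 1) for _ in range(n)]           # exposure
M = [e + random.gauss(0, 1) for e in E]              # mediator, caused by E
O = [1.5 * m + random.gauss(0, 1) for m in M]        # outcome, caused only via M

total_effect = slope(E, O)                 # the effect we actually care about
after_adjusting = adjusted_slope(E, O, M)  # adjusting for the mediator wipes it out
print(f"total effect: {total_effect:.2f}, after adjusting for M: {after_adjusting:.2f}")
```

Under these assumptions the total effect should be close to 1.5, while the estimate adjusted for M should be close to 0, which is the self-defeating result described above.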

There are some issues to consider:

  1. It may not be possible to say for certain whether a variable is a mediator or confounder and some variables may be both. Then try the analysis three ways: omit it, treat it as a confounder, or treat it as a mediator.
  2. It is hard to know which variables to include as confounders. A dataset was sent for analysis by 29 different teams of statisticians.[1] They came up with different results that varied wildly. This was because they adjusted for different combinations of variables. The corollary is that choice of variables should not be left to statisticians – it turns on causal theory that distinguishes between variables that are likely to have arrows pointing from E to O via M, and those pointing from C to both E and O (Figure 2). Context matters!
  3. There may be an interaction between variables, such that the causal effect of one variable on E or O is amplified or attenuated in the presence of another. Four variables, each with four ‘levels’, yield 4 × 4 × 4 × 4 = 256 possible combinations of levels across which effects might differ. So, again, theory is needed to determine which variables to include in such interaction tests.

A variable may exist that is an independent cause of C or M (let’s call these C* and M*), as in Figure 3. There is no reason to adjust for these variables. Likewise, do not adjust for any variable that ‘precedes’ E, as also shown in Figure 3.

Fig 3. Variables That Cause Change in Other Variables

In this example, C* and M* are not causally linked to O, except through C and M respectively. But a situation may occur where such a link is possible. It is well known that maternal smoking is causally linked to both low birth-weight and to neonatal deaths, as per Figure 4. The theory is that smoking is toxic and leads to both a small baby and, via that pathway and other pathways, leads to neonatal death.

Fig 4. Causal Pathway for Smoking and Neonatal Deaths

If this analysis is conducted controlling for ‘small baby’, then smoking is associated with lower mortality – it appears protective. The obvious fault was to control for a variable on the causal pathway, as per Figure 2. But this could explain why the association may be reduced, but not reversed.

The explanation for the reversal lies in a putative third variable (perhaps a ‘genetic’ defect, G), which predisposes to both a small baby and neonatal death (Figure 5). Note that both E and G collide on M, and such a scenario leads to ‘collider bias’ – by controlling for one source of bias, the door is opened to another. It is well known that there may be unobserved (‘lurking’) confounders in any association. The same applies, of course, to a variable that might completely alter the meaning of an association once one has conditioned on another variable.

Fig 5. Collider Bias
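
The reversal can be reproduced in a toy simulation of the structure in Figure 5. This is a sketch under loudly stated assumptions: all variables are treated as continuous with linear effects, and the coefficients (0.1 for the direct harm of smoking, 0.2 for the harm via a small baby, 1.0 for the defect G) are invented for illustration rather than estimated from any data. E is smoking, G the unobserved defect, M ‘small baby’, and O neonatal death risk.

```python
import random

def slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_slope(x, y, z):
    """Slope of y on x adjusting for z, obtained by residualising both on z."""
    return slope(residuals(z, x), residuals(z, y))

random.seed(3)
n = 20000
E = [random.gauss(0, 1) for _ in range(n)]    # smoking
G = [random.gauss(0, 1) for _ in range(n)]    # unobserved defect, independent of E
# E and G both cause a small baby: M is the collider
M = [e + g + random.gauss(0, 0.5) for e, g in zip(E, G)]
# Smoking harms directly (0.1) and via M (0.2); G harms strongly (1.0)
O = [0.1 * e + 0.2 * m + 1.0 * g + random.gauss(0, 0.5)
     for e, m, g in zip(E, M, G)]

crude = slope(E, O)                        # true total effect of smoking: harmful
conditioned = adjusted_slope(E, O, M)      # conditioning on M opens the E-G path
print(f"crude: {crude:.2f}, conditioned on small baby: {conditioned:.2f}")
```

With these assumed values the crude slope should be close to +0.3 and the slope conditioned on M close to -0.7: among babies of a given size, those whose mothers smoked are less likely to carry G, so smoking appears protective.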

These analyses show that conducting a multivariable analysis is not, or rather should never be, an entirely data-driven / empirical exercise. Choices have to be made, such that the statistical model informs on, but does not determine, the causal model. For a brilliant example of extensive causal chains involving confounders, colliders and mediators, see an example from Andrew Forbes and colleagues.[2]

Note, we are not arguing against adjustment per se. It is an essential part of the analysis. We argue against adjusting without reference to a causal model.

Richard Lilford, ARC WM Director; Sam Watson, Senior Lecturer [With thanks to Peter Diggle (Lancaster University & Health Data Research UK) for comments.]


  1. Silberzahn R, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Adv Method Pract Psychological Science. 2018; 1(3).
  2. Williamson EJ, et al. Introduction to causal diagrams for confounder selection. Respirology. 2014; 19(3): 303-11.

The Land War in the Fight Against COVID-19

Gone are the days of thinking there is a quick fix to the COVID-19 pandemic. Another country-wide lockdown would reduce COVID-19 infection, but at the same time would damage the economy and pose a threat to other long-term health conditions, with disproportionate effects on the more disadvantaged groups in society. The Great Barrington Declaration – aiming for herd immunity while sequestering high-risk people – does not bear close examination.[1] Vaccination is not an automatic get out of jail card – we do not yet know when vaccination will be available at the required volume, nor what degree of protection it will confer. So, this is the land war. We must work on supply chains, procedures, detection and contact tracing, getting ever slicker at the operation. Personal protection, social distancing and graded lockdowns can all play a part, but only if they are accepted by the general public, who deserve clear explanations of when, where and why unwelcome restrictions will be imposed and what these restrictions are intended to achieve.

While central government has an obvious role to play, it has become clear that the battle must go local; and the more local the better. The risk of being hospitalised with COVID-19 in Birmingham varies dramatically across the various electoral wards, with the seven-day rolling rate of new cases (for week ending 14 October 2020) ranging from 43.8 per 100,000 in Nechells, to 825.8 in Selly Oak.[2] So, supported by the MRC, NIHR ARC West Midlands and our host hospital (University Hospitals Birmingham NHS Foundation Trust) we are developing a computer application to track the evolving pattern of the COVID-19 pandemic. We have developed software that uses geostatistical models to identify “hot spots”, however one defines them, across a broad space such as an urban conurbation or a country. Within such a space we identify localities at whatever scale is relevant for local decision-making and that the data can support. We can map rates of infection per unit of population in real time on these maps to show the current state of the epidemic and its direction of travel (see Example). These maps can direct decision-makers to specific localities where incidence is increasing rapidly and hence where urgent action is needed.

But there is a problem with policy action directed at small areas and particular communities – dictatorial edicts are likely to provoke resentment rather than effective action, especially when carried out at a very local level. It is one thing to place restrictions across a whole country or even a large city, but quite another to try to lock down an area such as Lady Pool in Birmingham or Chapel Town in Leeds. Indeed, the disease has highest incidence in BAME communities who may feel victimised or disenfranchised. Already only 18% of people fully comply with UK regulations regarding self-isolation.[3] So here we come to the second use of our application and the maps it produces.

We think that policy-makers should increasingly turn to local communities and ask them to be the architects, not recipients, of policy. In essence we are arguing for an ‘assets-based’ or ‘participatory’ approach based on ‘co-invention’. And here our application can help by providing scientific data at a local level in a form that can be easily assimilated. We are arguing at a local level for the type of thing that Prof Chris Whitty used at a national level in his Downing Street presentation with the Prime Minister and Chancellor (12 October 2020). There is evidence that populations relate well to local maps and they are sometimes used in qualitative research as a method to promote discussion among people.[4] The approach we are advocating here, of high-risk spatio-temporal identification, followed by case-area targeted intervention, has proven effective in limiting the spread of cholera outbreaks,[5] and we advocate a similar approach with respect to the COVID-19 pandemic.

We would be pleased to hear from news blog readers regarding:

  1. Your opinions and advice.
  2. Whether you would like to hear more or use the application when it is developed.
  3. Whether you have examples of similar initiatives elsewhere in the world.
  4. Whether you would like to collaborate.

You can contact us at ARCWM@warwick.ac.uk.

Richard Lilford, ARC WM Director; Sam Watson, Senior Lecturer; Peter Diggle, Distinguished Professor at Lancaster University

Example of Real-Time Surveillance of COVID-19

For this example we have aggregated the results to MSOA (middle-layer Super Output Area) level across the catchment area of University Hospitals Birmingham NHS Foundation Trust, although we have retained other areas of Birmingham to make the boundary of the city clear. One could aggregate to smaller or larger levels as needed. A case here is an admission to hospital for COVID-19.

We have produced these outputs as if we were working on March 26 2020 using data from the preceding two weeks. The first thing someone interested in tracking COVID-19 in the city might ask is what is the incidence of the disease that day?

There is a lot of variation across the different MSOAs, with one area standing out as being high (yellow area). The variation here could be explained by differences in demographics or socioeconomic status, and we might want to ask whether any differences are for unexpected reasons. We can break down the incidence into different components:


  • Expected is the number of cases we would expect that day from each area based on the size of its population.
  • Observed shows the relative risk in each area associated with observable characteristics (age, ethnicity, and deprivation). For example, suppose the average incidence across the city were one case per 10,000 person-days. An area with a larger proportion of older residents would have a higher risk; if this risk were double the average then it would have a relative risk of two.
  • Latent is the relative risk in each area due to unexplained factors or unobserved variables. Our area with more older people may have an expected incidence of two cases per 10,000 person-days (a ‘baseline’ of 1 per 10,000 person-days times a relative risk of two), but if we observe an average rate of four cases per 10,000 person-days, then there is an additional unexplained relative risk of 2.
  • Posterior SD is the posterior standard deviation, which indicates the uncertainty in the prediction.
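
The worked numbers in the list above can be written out as a short calculation. This is only the arithmetic of the example, not the geostatistical model itself (which is not described here); the function name `latent_relative_risk` and its parameter names are hypothetical.

```python
def latent_relative_risk(observed_rate, baseline_rate, covariate_relative_risk):
    """Relative risk left unexplained once observable characteristics are accounted for.

    All rates are cases per 10,000 person-days.
    """
    # Rate we would predict from the baseline and the observable characteristics
    explained_rate = baseline_rate * covariate_relative_risk
    return observed_rate / explained_rate

baseline = 1.0        # city-wide average, per 10,000 person-days
age_related_rr = 2.0  # relative risk from observable characteristics
observed = 4.0        # rate actually seen in the area

latent = latent_relative_risk(observed, baseline, age_related_rr)
print(f"explained rate: {baseline * age_related_rr}, latent relative risk: {latent}")
```

Here the explained rate is two cases per 10,000 person-days, so an observed rate of four leaves a latent relative risk of two, matching the example in the text.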

So based on these plots the area with high incidence in the North of Birmingham would appear to be higher than we would expect based on the observed variables by a factor of 2 or 3. This may indicate the need for public health intervention. We might finally ask how this compares to previous days.

The next plot shows the incidence rate ratio, which here is the ratio of incidence compared to seven days prior for each area. A value of one indicates no change, two a doubling, and so forth. One can clearly see that it is above one, i.e. it is increasing, city-wide. The greatest relative increases are centred on the area we identified as being of high concern.
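
As a minimal sketch of the ratio just described, the snippet below divides each area's current incidence by its incidence seven days earlier. The area labels and rates are invented for illustration, and the function name `incidence_rate_ratio` is an assumption; a real analysis would use model-based estimates with their uncertainty rather than raw rates.

```python
def incidence_rate_ratio(rate_now, rate_week_ago):
    """Ratio of current incidence to incidence seven days earlier.

    Returns None when the earlier rate is zero, since the ratio is undefined.
    """
    if rate_week_ago == 0:
        return None
    return rate_now / rate_week_ago

# Hypothetical rates per 10,000 person-days: (today, seven days ago)
rates = {
    "Area 1": (3.0, 1.5),
    "Area 2": (2.0, 2.0),
    "Area 3": (1.0, 2.0),
}

irr = {area: incidence_rate_ratio(now, before) for area, (now, before) in rates.items()}
# Areas where incidence has risen over the week
rising = [area for area, value in irr.items() if value is not None and value > 1]
print(irr, rising)
```

A value of one means no change, two a doubling and a half a halving, so only the first hypothetical area would be flagged as rising.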


  1. Alwan NA, et al. Scientific consensus on the COVID-19 pandemic: we need to act now. Lancet. 2020.
  2. Public Health England. Coronavirus (COVID-19) in the UK: Interactive Map. 19 October 2020.
  3. Smith LE, et al. Adherence to the test, trace and isolate system: results from a time series of 21 nationally representative surveys in the UK (the COVID-19 Rapid Survey of Adherence to Interventions and Responses [CORSAIR] study). MedRXiv. 2020. [Pre-print].
  4. Boschmann EE, Cubbon E. Sketch maps and qualitative GIS: Using cartographies of individual spatial narratives in geographic research. Professional Geographer. 2014;66(2):236-48.
  5. Ratnayake R, et al. Highly targeted spatiotemporal interventions against cholera epidemics, 2000-19: a scoping review. Lancet Infect Dis. 2020.

When Waiting is Not Enough

Healthcare is emerging from the immediate crisis response of COVID-19 into a hugely uncertain environment. One of the very few things of which we can be sure is significantly longer waiting times for elective procedures.

The Health Foundation recently published a report drawn from pre-COVID data,[1] which starkly portrayed the challenges around the 18 weeks Referral to Treatment target. The report estimated that the NHS needed to treat an additional 500,000 patients per year for the next four years to restore delivery of the target. Using data from NHS England following the first month of COVID-19 induced elective shutdown, Dr Rob Findlay noted a jump, both in the number of patients waiting over 52 weeks, and the average wait time for patients, which rose to 6 months.[2] These figures are likely to increase further in coming months. The article also noted that very few long-wait patients were treated. Longer wait patients should be de facto low clinical urgency, as it is this that has made them appropriate to wait.

There are two significant decision-making points for the treatment of patients on waiting lists: clinical urgency, which of course affects those near the start of their waiting time, and being in imminent danger of breaching a waiting time target, which necessarily affects those towards the end. Between these decision-making points at the start and end of the waiting list lies a huge volume of patients with little categorisation or prioritisation.

Herein lies a significant future challenge: as waiting times increase and a growing number of patients breach waiting time targets, how do you ensure that limited elective capacity is targeted towards those with greatest clinical need?

If NHS England and NHS Improvement do not relax waiting time restrictions, maximum wait times will continue to be an important decision-making point. This incentivises providers to make a trade-off and treat longer waiting, but clinically less urgent, patients over short waiting, but clinically more urgent, ones. This would be a difficult position to justify ordinarily but in a time of likely constrained resource, the policy is likely to do far more harm than good.

It is crucially important to use need as the basis for prioritising which patients to treat. A recent literature review described some of the efforts made around the start of the millennium to develop a more systematic and transparent approach to prioritisation based on need. This approach developed from the Western Canada Waiting List Project [3] and the New Zealand Priority Criteria Project.[4] These approaches were rigorously reviewed through a range of academic articles and evaluated well, showing both transparency and consistency of decision making and prioritisation. Importantly, they also carried strong public support when reviewed with focus groups.

These ‘point-count’ systems work by creating a scoring chart for each clinical condition, such as cataract surgery, major joint replacement, coronary bypass graft. However, they have also been successfully used and evaluated for topics such as the use of Magnetic Resonance Imaging (MRI) and children’s mental health. The scoring grid is unique to each clinical condition and developed through consensus discussion with clinicians to balance a range of clinical and social factors. The objective is to prioritise patients for treatment who will gain the most substantial benefit from intervention.
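
To show the general shape of such a scoring grid, here is a minimal sketch. The criteria, weights and patient values are entirely hypothetical and are not taken from the Western Canada or New Zealand tools; in practice each grid would be agreed by clinical consensus for a specific condition.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights for one condition (illustration only)
WEIGHTS = {
    "pain": 3,
    "functional_limitation": 2,
    "potential_for_progression": 4,
}

@dataclass
class Patient:
    name: str
    levels: dict  # criterion name -> assessed level, here on a 0 to 4 scale

def priority_score(patient):
    """Weighted sum of the patient's assessed level on each criterion."""
    return sum(weight * patient.levels[criterion] for criterion, weight in WEIGHTS.items())

patients = [
    Patient("A", {"pain": 2, "functional_limitation": 1, "potential_for_progression": 0}),
    Patient("B", {"pain": 4, "functional_limitation": 3, "potential_for_progression": 2}),
    Patient("C", {"pain": 1, "functional_limitation": 2, "potential_for_progression": 4}),
]

# Highest score first: those expected to gain most from intervention are treated first
ranked = sorted(patients, key=priority_score, reverse=True)
for p in ranked:
    print(p.name, priority_score(p))
```

The value of a scheme like this lies less in the arithmetic than in making the weights explicit, so that clinicians, booking staff and the public can see why one patient is prioritised over another.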

‘Point-count’ systems have translated successfully into several healthcare settings but not to the NHS. Often these types of changes are put into the ‘too difficult’ category as the resource required to implement them is seen to be greater than the benefit gained. However, we are moving to a different paradigm post COVID-19 where integrated care systems are more accountable to their population and a more objective and transparent decision-making process is desirable.

Think too of the benefits of a shared language of waiting lists. We should not forget that many non-clinical staff are involved in the booking and scheduling of elective patients. A common currency in which objective comparisons can be made on the likely benefit of surgery or intervention across clinical indications and specialties is highly appealing.

One of the most keenly-debated elements of the development of these ‘point-count’ systems was what factors should be considered as part of the scoring criteria. Repeatedly the idea of including some reflection of how long a patient had waited was considered, and strongly rejected. Instead a measure of ‘potential for disease progression’ was included to ensure that those waiting for, say, a joint replacement procedure were not constantly usurped by patients with a more acute presentation. This approach also guards against the current system, in which those waiting longest receive priority at the potential expense of others who would derive greater clinical benefit.

So, as a policy directive there is a clear indication – the maintenance of the current maximum wait times will prioritise many clinically less urgent patients over more urgent cases. It remains to be seen whether the evidence base is substantial enough, and whether there is sufficient appetite within the NHS to revisit some of these clinical prioritisation approaches, but their use should be considered and their implementation would make a fascinating piece of research in the coming years.

Paul Bird, Head of Programme Delivery (Engagement), Richard Lilford, ARC WM Director

With thanks to Prof. Tim Hofer (University of Michigan) for discussion and input.


  1. Charlesworth A, Watt T, Gardner T. Returning NHS waiting times to 18 weeks for routine treatment. The Health Foundation. 2020.
  2. Findlay R. Average waiting time for NHS operations hits six months thanks to covid. Health Serv J. 2020.
  3. Noseworthy TW, McGurran JJ, Hadorn DC, et al. Waiting for scheduled services in Canada: development of priority-setting scoring system. J Eval Clin Pract. 2003; 9(1): 23-31.
  4. Hadorn DC, Holmes AC. The New Zealand priority criteria project. Part 1: Overview. BMJ. 1997; 314: 131.

Recognising the rising tide in service delivery and health systems research

With rising demands and finite resources, health systems worldwide are under constant financial pressure. The US has been at the extreme end of high spending, with health expenditure accounting for 17% of its GDP in 2017 – compared with 9.8% for the UK and 8.7% on average across OECD countries.[1] Therefore, the imperative of containing healthcare cost is mounting in the US. Under the Affordable Care Act (ACA), alternative payment models (often known as value-based payments) have been widely introduced to replace the fee-for-service model.

A recent article in JAMA highlighted a paradox,[2] in which an apparent plateau in overall healthcare expenditure (at around 18% of US GDP) is contrasted with lack of significant success reported in individual evaluations of these alternative payment models. Why has health spending as a proportion of GDP plateaued when the interventions to reduce spending have been ineffective in doing so? The authors ruled out the explanation that the growth in GDP has outpaced the growth of health expenditures as the latter seems to be genuinely flattening. So how can this discrepancy be reconciled?

The authors offered three explanations:

  1. Anticipation of ACA-driven expansion of alternative payment models may have induced changes in the psychology and practice of clinicians and health care organisations, leading to curbs on spending irrespective of the introduction of alternative payment models.
  2. Primed by the above change in mindset, clinicians and health care organisations may have been influenced by their peers and emulated their practice. This would cause a wider spread of the change beyond the institutions where the alternative payment models were first introduced and evaluated (e.g. from within the Medicaid system to those covered by commercial insurers).
  3. Simultaneous introduction of a large number of alternative models in different places may have led to contamination of control groups in individual evaluations, where the control group chosen in one evaluation may be subject to the introduction of another alternative payment model.

Taken in the round, these explanations suggest a secular trend of system-wide changes (in this case cost containment), which may take various forms and be achieved through different means, but which are triggered by heightened awareness of the same issue and shared social pressure to tackle it across the board – what we have described as the ‘rising tide phenomenon’.[3] The phenomenon is by no means a rare occurrence in health services and systems research and so is well worth considering when a null finding is observed in a controlled study. The corollary is that when there is a rising tide, null findings do not disprove the potential effectiveness of the intervention being evaluated. A more nuanced interpretation taking into account the secular trend is required, as the authors of the aforementioned paper did.
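
A simple numerical sketch shows how a rising tide can produce a near-null result in a controlled evaluation. The figures are invented for illustration (annual growth in spending, in percentage points, before and after an intervention), and the difference-in-differences comparison is used here as a generic evaluation design, not as the method of any particular study cited above.

```python
def difference_in_differences(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)

# Hypothetical annual spending growth (percentage points)
treated_before, treated_after = 5.0, 2.0   # sites with the new payment model
control_before, control_after = 5.0, 2.5   # 'control' sites, also swept up by the tide

estimated_effect = difference_in_differences(
    treated_before, treated_after, control_before, control_after
)
# Average change across both groups: the system-wide secular trend
system_wide_change = ((treated_after - treated_before) + (control_after - control_before)) / 2

print(f"estimated effect: {estimated_effect}, system-wide change: {system_wide_change}")
```

In this made-up example the evaluation attributes only half a percentage point to the intervention, yet spending growth fell by almost three points everywhere, which is the pattern a rising tide would be expected to leave.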

Yen-Fu Chen, Associate Professor; Richard Lilford, ARC WM Director


  1. Organisation for Economic Co-operation and Development. Health. 2020. Available at: https://stats.oecd.org/Index.aspx?ThemeTreeId=9
  2. Navathe AS, Boyle CW, Emanuel EJ. Alternative Payment Models—Victims of Their Own Success? JAMA. 2020; 324(3):237-8.
  3. Chen Y-F, Hemming K, Stevens AJ, Lilford RJ. Secular trends and evaluation of complex interventions: the rising tide phenomenon. BMJ Qual Saf. 2016; 25(5): 303-10.

The Holy Grail of Quality Measurement

Writing in JAMA, Austin and Kachalia argue for automation of quality measurements.[1] We ourselves have argued that the proliferation of routine quality measures is getting out of hand.[2]

The authors argue, as we have argued, that using quality measures to incentivise organisations is a blunt tool, subject to gaming. Far better is to use quality measures in real time to prompt doctors to provide high quality care.

In fact, this is what computerised decision support offers. There is considerable empirical support for use of this type of decision tool. Working with Prof Aziz Sheikh and colleagues, NIHR ARC West Midlands has investigated decision support for prescribing,[3] and we are now investigating its use in antibiotic stewardship.[4] We are entirely in support of the use of decision support to improve care in real time.

However, we question the idea that the majority of healthcare can be guided by online decision support. Working with Prof Timothy Hofer in Michigan, ARC WM co-investigators have shown that the measurement of the quality of hospital care is extremely unreliable.[5] Kappa measures of agreement between reviewers were about 20%. This means that seven reviewers would be needed for each case note, to achieve a reliability of 80%.

That is to say, for much of medical care, there is no agreed standard. Truly, the majority of medical care is more art than science.

We think that the time has arrived to abandon hubristic notions about standardising and quality assuring the generality of clinical care. Medicine is not like aviation. Commercial aviation is almost entirely computerised. Emergencies aside, the whole process can be guided algorithmically. Our paper in Milbank Quarterly shows quite clearly that this is not the case for medicine.[5]

Working with Prof Julian Bion, the ARC WM Director had an opportunity to audit numerous case notes from patients with sepsis.[6] The idea was to observe quality of care against a package of evidence-based criteria. Many of these criteria were based on actions that should be carried out within a specified time from diagnosis. The exercise proved almost impossible, since the point of diagnosis was ephemeral. In most cases there was no clear point to start the clock and the very diagnosis of sepsis had to be reverse-engineered from the time at which a sepsis-associated action took place! This exercise provided eloquent testimony to the judgemental, rather than rules-based, nature of much medical practice. We should use algorithmic decision support where clear rules exist, but we must stop pretending that the whole of medicine can be guided in this way. Perhaps we should just stand back a little, and accept some of the imperfections in our systems. Like a harm-free world, perfection will always lie beyond our grasp.[7]

Richard Lilford, ARC WM Director


  1. Austin JM, Kachalia A. The State of Health Care Quality Measurement in the Era of COVID-19. The Importance of Doing Better. JAMA. 2020.
  2. Lilford RJ. Measuring Quality of Care. NIHR CLAHRC West Midlands News Blog. 21 April 2017.
  3. Yao GL, Novielli N, Manaseki-Holland S, et al. Evaluation of a predevelopment service delivery intervention: an application to improve clinical handovers. BMJ Qual Saf. 2012;21:i29-38.
  4. Usher Institute. ePrescribing-Based Antimicrobial Stewardship. 2020.
  5. Manaseki-Holland S, Lilford RJ, Te AP, et al. Ranking Hospitals Based on Preventable Hospital Death Rates: A Systematic Review With Implications for Both Direct Measurement and Indirect Measurement Through Standardized Mortality Rates. Milbank Q. 2019;97(1):228-84. 
  6. Lord JM, Midwinter MJ, Chen YF, et al. The systemic immune response to trauma: an overview of pathophysiology and treatment. Lancet. 2014;384(9952):1455-65.
  7. Meddings J, Saint S, Lilford RJ, Hofer TP. Targeting Zero Harm: A Stretch Goal That Risks Breaking the Spring. NEJM Catal Innov Care Del. 2020; 1(4).

Leadership, Heroism and Heroic Leadership

Some years ago, two outstanding academic leaders, Peter Pronovost and Lord Ara Darzi, wrote an article in which they argued for an end to heroism in medicine.[1] I responded in the pages of our previous CLAHRC WM News Blog along the lines of ‘be careful what you wish for’.[2]

I was reminded of this interchange by the evening celebration of health workers seen across many countries of the world during the COVID-19 pandemic. What were members of the public doing, if not allowing health service staff to feel just a little heroic? Quite right too: health staff risk their lives on an almost daily basis and have a higher mortality compared to other people of similar ages.

One recent morning I heard a poem about nurses on the radio. The poet was making the point that nursing is not just another profession. I have been a doctor and a patient and I can tell you that from my perspective being a doctor or a nurse is certainly not just another profession. Yes, it is a calling, even if the call comes from inside.

Doctors and nurses put their lives on the line when necessary. They will work all night. They will stay on at the end of the afternoon if they still have patients to see. These are the things we do, we like to do them, and we are admired for doing them. We put ourselves out and we go the extra mile. The patient is not a client, or rather they are privileged clients.

But let us also be aware of the dangers of heroism that might turn self-indulgent and become almost narcissistic. Leadership involves determining a course of action, often an unpopular or dangerous one, and then carrying people with you. Leadership can be demonstrated anywhere within an organisation. My business colleagues talk about dispersed leadership. I have both led people senior to me and been led by people junior to me. So there is no room for arrogance in leadership and leaders must listen. They must listen to others and to that quiet, still voice within.

Can leadership be taught? James Stoller has conducted a systematic review of leadership training.[3] On self-reported outcomes, leadership training provides consistent improvement. But objective evidence is hard to find. People who have done leadership training are more likely to go on to senior management roles. But this hardly proves cause and effect. Indeed, trainees who score highly at baseline on leadership qualities, such as emotional intelligence, are more likely to gain senior management positions than those with lower scores. So, I would guess leadership training helps a bit, but most of the variance is explained by innate characteristics.

Richard Lilford, ARC WM Director


  1. Pronovost PJ, Ravitz AD, Stoll RA, Kennedy SB. Transforming Patient Safety: A Sector-wide Systems Approach. Doha, Qatar: World Innovation Summit for Health. 2015.
  2. Lilford RJ. Can We Do Without Heroism in Health Care? NIHR CLAHRC West Midlands News Blog. 20 March 2015.
  3. Stoller J. Developing Physician Leaders: Does It Work? BMJ Leader. 2020; 4(1): 1-5.

Policy Makers Should Use Evidence, But What Should They Do In an Evidence Vacuum?

There are two points of view concerning the obligations of policy makers when there is no direct evidence to guide them:

  1. It is wrong to take any action or intervene unless there is evidence to support your decision.
  2. A lack of evidence is neutral; it neither allows a decision-maker to intervene, nor does it sanction non-intervention.

Which is correct? Writing in the Lancet recently, Feng, et al. advocate the use of face masks in public to prevent the spread of COVID-19.[1] They say it is an asymmetrical choice: wearing masks is unlikely to do harm and may do much good by preventing the spread of the disease from pre-symptomatic people to people who are unaffected.

The ARC WM Director sides with the ‘lack of evidence is neutral’ principle. In my opinion the argument that a policy maker should not intervene in the absence of direct evidence is flawed for a series of linked reasons:

  1. The obligation to use evidence when it exists does not entail the requirement to fail to act when there is no such evidence.
  2. Further, there is never a circumstance in which no relevant evidence is available. Granted, there may be no direct, comparative evidence, but this is not tantamount to no evidence at all.
  3. There can be no automatic supposition that the expected value of a proposed intervention is less than that of the status quo. That is to say, the balance of benefits, harms and costs may go either way when there is no incontrovertible comparative evidence. It is then a matter of judgment as to the relative probabilities of benefit and cost that must sit alongside values in determining the best course of action.
  4. The theoretical basis for decisions under uncertainty derives from expected utility theory, which reconciles probability and values/preferences.[2][3] Under this axiomatic theory, probability refers to the decision maker’s degree of belief.
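
The expected utility comparison described in points 3 and 4 can be illustrated with a small sketch. Everything numerical here is a hypothetical assumption chosen for the example: the probabilities stand for a decision maker's degrees of belief, and the utilities for their values, on an arbitrary scale. None of the figures come from the cited papers.

```python
# Hypothetical expected-utility comparison for a decision made without
# direct comparative evidence. All inputs are illustrative assumptions.

def expected_utility(p_benefit, benefit, p_harm, harm, cost):
    """Probability-weighted benefit minus probability-weighted harm and cost."""
    return p_benefit * benefit - p_harm * harm - cost


def choose(intervene_eu, status_quo_eu):
    """Pick the option with the higher expected utility."""
    return "intervene" if intervene_eu > status_quo_eu else "status quo"


# Degrees of belief (probabilities) and values (utilities), both assumed.
intervene_eu = expected_utility(
    p_benefit=0.25, benefit=8.0,   # belief that the intervention helps, and by how much
    p_harm=0.125, harm=4.0,        # belief that it harms, and by how much
    cost=1.0,                      # resource cost on the same utility scale
)
status_quo_eu = 0.0                # doing nothing is taken as the reference point

if __name__ == "__main__":
    print("Expected utility of intervening:", intervene_eu)
    print("Preferred option:", choose(intervene_eu, status_quo_eu))
```

Changing the assumed beliefs or values can tip the choice either way, which is the sense in which a lack of direct evidence is neutral rather than an argument for inaction.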

Of course, nothing written above should be misinterpreted to imply either that good evidence should not inform decisions or that policy makers have no obligation to try to collect evidence to better inform future decisions. Indeed, the mandate to collect and use evidence is now enshrined in law in many states in the USA and was a manifesto commitment for the current UK government.

The US state of Oregon is well known for ground-breaking policies. As far back as 2003 it passed legislation requiring evidence-based procurement of clinical services in the field of addictions, beginning in 2005.[4] By 2011, 75% of addiction services commissioned by public money had to be evidence-based.[5] Likewise, nearby Washington state passed a law in 2012 requiring policy makers to use empirically supported services for children’s health and welfare.[6]

The British government supports policy trials through four routes:

  1. Funding universities to carry out policy trials to inform the government’s programme. A good example is The Work and Health Unit (WHU) trial of an intervention to encourage small- and medium-sized enterprises (SMEs) to do more to promote employee health and welfare.[7] The WHU have sponsored ARC WM faculty, supported by the West Midlands Combined Authority and RAND Europe, to carry out a four-arm cluster randomised trial of 100 SMEs.[8]
  2. Funding external ‘what works’ centres, such as the Education Endowment Foundation that was established in 2011 by The Sutton Trust with £125m funding from the Department for Education. This organisation has conducted a very large series of educational RCTs, in which England now leads the world, as recently described in your news blog.[9]
  3. In-house trials conducted by individual government departments. I am a member of the Cabinet Office ‘What works trial advice panel’ that advises on in-house and externally commissioned trials (whatworks.blog.gov.uk/trial-advice-panel/). HMRC has conducted the largest-ever RCT of self-assessment tax schemes, for example. The Environment Agency has recently conducted an RCT to tackle waste crime. I am currently part of a small group advising government departments on the design and evaluation of an intervention to help people who have recently become carers to adapt to their new circumstances without becoming depressed and, in some cases, to continue to work.
  4. Funding academic centres, such as DHSC policy research centres.

ARC West Midlands will continue to promote local and international studies to provide evidence for evidence-based policy. We like to work very closely with policy makers and service managers so that our work addresses their immediate needs. We like to think of ourselves as pioneers in the fields of rapid response and opportunistic research, and can cite a number of on-going and recent examples, many covering the areas of public health and social care.

Richard Lilford, ARC WM Director; with thanks to Emily Power for contributions.


  1. Feng S, et al. Rational use of face masks in the COVID-19 pandemic. Lancet Resp Med. 2020.
  2. Thornton JG, Lilford RJ, Johnson N. Decision analysis in medicine. BMJ. 1992; 304: 1099-103. 
  3. Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996; 313: 603.
  4. Oregon Legislative Assembly. Human Service Issues: Health Care. Senate Bill 267. In: 2003 Summary of Legislation. Oregon: Legislative Fiscal Office; 2003. p59.
  5. Rieckmann T, et al. Employing Policy and Purchasing Levers to Increase the Use of Evidence-Based Practices in Community-Based Substance Abuse Treatment Settings: Reports from Single State Authorities. Eval Program Plann. 2011; 34(4): 366-74.
  6. Trupin E, Kerns S. Introduction to the Special Issue: Legislation Related to Children’s Evidence-Based Practice. Admin Policy Ment Health. 2017; 44(1): 1-5.
  7. Thrive at Work Wellbeing Programme Collaboration. Evaluation of a policy intervention to promote the health and wellbeing of workers in small and medium sized enterprises – a cluster randomised controlled trial. BMC Public Health. 2019; 19: 493.
  8. Lilford R, Russell S, Sutherland A. Thrive at Work Wellbeing Premium – Evaluation of a Cluster Randomised Controlled Trial. AEA RCT Registry. October 17 2018.
  9. Lilford RJ. UK Takes Over From the US as the Home of Trials of Educational Interventions. NIHR CLAHRC West Midlands News Blog. June 1 2018.

Organisational Consequences of Coronavirus, COVID-19

Health services around the world are scrambling to deal with COVID-19. The virus massively disrupts services. Modelling the spread of the disease is allowing governments to formulate public policy. Modelling patient flows – operations research – is helping health care organisations to manage the surge in demand – for example by releasing spare capacity and redeploying human and physical resources from elective to emergency care. Infectious diseases create a conundrum for the services since sick people need to attend facilities, but congregation of infected cases in health facilities increases transmission of the infectious agent. So the trick is to visit facilities virtually (mobile [m] consulting) rather than physically. Enter ARC West Midlands.

We have a well-established programme of m-Health including (but not limited to):

  1. Our host hospital, University Hospitals Birmingham NHS Foundation Trust (UHBFT), is working with Babylon Health to enhance its virtual clinic capacity.
  2. Building on work of Gill Combes, Sarah Damery and James Ferguson, we plan a more extensive evaluation of the UHBFT m-Consulting programme that is expanding rapidly to cope with COVID-19.
  3. From her UK work on m-Consulting, Frances Griffiths has made quick guides freely available for specialist teams maintaining contact with their patients managing long-term health conditions at home.[1] She leads projects on m-Consulting in Africa and South Asia and, with her collaborators, is developing policy briefs underpinned by evidence-based principles to guide application.
  4. Melanie Calvert is an international authority on Patient-Reported Outcome Measures, which could help determine who should attend facilities and who should not. Modern aeroplane engines incorporate sensors that send signals to land-based workshops. This real-time monitoring, rather than just the schedule, determines the need for repairs. Likewise, patients in future will be monitored by their symptoms and test results, and these will be used to trigger visits to the clinic.

ARC WM members are planning a suite of studies in this country and abroad. The COVID-19 pandemic has precipitated a sharp shift towards m-Health / m-Consulting that is likely to prove indelible. In UK general practice all patients are now having phone consultations before any necessary face-to-face contact. Many practices have systems in place for video-conferencing. Last week, author FG took just ten minutes to learn how to use the secure and confidential system via her own phone so she could set eyes on an immune-compromised patient with infection, without asking the patient to leave her place of safety. Patients are learning rapidly too. The same patient could not get their sound to work so they used the landline too – but that patient is now urgently sorting out the sound.

We know that many other centres are also gearing up to study the organisational issues of epidemics generally, and m-Health specifically. M-Consulting warrants study – it is open to abuse/fraud, poor quality control and medical error, and can result in inequalities in care received. Experienced health professionals are good at mitigating these dangers,[2] but we need to understand how to systematise and embed m-Consulting to optimise health gains. We warmly invite other people in the UK and beyond to join our enterprise to share ideas and formulate research plans. In the meantime James Ferguson is leading an initiative to track use of m-Consulting to identify opportunities and barriers, and identify training needs for staff and patients. 

Richard Lilford, ARC WM Director; Frances Griffiths, Professor of Medicine in Society


  1. LYNC study team. LYNC Study Quick Reference e-book and Topic Guides. Warwick: University of Warwick; 2017.
  2. Griffiths F, Bryce C, Cave J, et al. Timely digital patient-clinician communication in specialist clinical services for young people: a mixed-methods study (the LYNC study). J Med Internet Res. 2017; 19(4): e102.