
To Subgroup or Not to Subgroup?

Far from being ‘stamp collecting’, as Ernest Rutherford is said to have claimed, classifying things is central to the scientific enterprise – imagine biology without the Linnaean taxonomy (multi-dimensional classification) of plants, animals and minerals (now plants, animals, fungi, protists, chromists, archaea and eubacteria kingdoms). Or medicine without its nosology. Classification has been the basis of all knowledge, and Rutherford was wrong – for example, astronomy is also built on a classification of stars and planets.

However, classification does not come free of problems. On the contrary, I call it the ‘central dilemma of epidemiology’. For a start, it is a human attempt to organise an underlying (latent) and often disorganised world. That is both its strength and its weakness. By organising the underlying complexity it allows abstractions to be made regarding the organising principles that underlie phenomena we observe about us. But the price we must pay is that we are often superimposing a classification over an underlying continuum. Thus, astronomical objects like Pluto can be ‘demoted’ and species change from one genus to another. Many health conditions do not fit neatly into one group or another, bearing features of both – think auto-immune disease and mental illness. Of course, any classification system is useful, insofar as it leads to new knowledge about underlying mechanisms and it is quite natural that the process is iterative, such that new classifications emerge – clades in biology rather than the original Linnaean family tree, for example.

But in the practice of epidemiology the issues of groups and subgroups can be a problem, and not just because groups overlap or misclassification may occur. A problem also arises in the interpretation of observed differences between groups. On the one hand, we do not want to miss important subgroup differences in the effect of an exposure on an outcome. On the other hand, we also want to avoid spurious associations. There are many examples, especially in the context of treatment trials, of subgroup associations that were subsequently overturned.

The usual argument put forward to avoid spurious associations is that only subgroups specified in advance should be considered as a test of an hypothesis – all else is a fishing expedition, the results of which are to be down-weighted.

This is all very well but it just moves the problem from the analysis stage to the design stage. The corollaries are two-fold:

  1. Any subgroup must be selected on the basis of sound principles – there should be a theoretical model for an interaction between exposure and outcome. The statistical subgroup analysis is then designed to strengthen or weaken the credibility of the model. Note, the issue is the interaction between subgroup and outcome through the treatment effect. A direct effect on outcome is neither here nor there.
  2. Since precision is often low in a subgroup, and always lower than in the group as a whole, hypothesis tests are even less appropriate to subgroup effects than to the overall effect. Dichotomising the results into positive and null, and using this dichotomy to make a decision, is always stupid and is risible in a subgroup.
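The loss of precision in a subgroup can be illustrated with a minimal sketch. It assumes a simple two-arm comparison of means with equal arm sizes and a common standard deviation; the function name and all numbers are hypothetical and are not taken from any study.

```python
import math


def se_difference(n_per_arm: int, sd: float = 1.0) -> float:
    """Standard error of a difference in means between two equal-sized arms."""
    return sd * math.sqrt(2.0 / n_per_arm)


overall = se_difference(1000)   # whole trial: 1,000 per arm (hypothetical)
subgroup = se_difference(100)   # a subgroup holding 10% of the participants

# A 95% confidence interval spans roughly 1.96 standard errors either side.
print(f"overall  SE = {overall:.3f}, CI half-width = {1.96 * overall:.3f}")
print(f"subgroup SE = {subgroup:.3f}, CI half-width = {1.96 * subgroup:.3f}")
print(f"subgroup interval is {subgroup / overall:.2f} times wider")
```

Under these assumptions the interval widens with the square root of the reduction in sample size, which is why an estimate with its interval says more about a subgroup than a positive/null verdict.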

Some subgroups derive from an underlying, if latent, scale. Socio-economic groups, for example, or age. But others are irrevocably categorical. Gender, for example, or rural vs. urban residence. In the former situation – where the group is homologous (scalable) – a small subgroup is not a large problem, because the statistical model can look for a trend. The situation is more problematic when a small subgroup is not part of a homologous continuum. Any examination in the small subgroup will be imprecise in proportion to its size. Amalgamating it within a larger group makes sense on the basis that ‘it’s better to have a precise answer to a general problem, than an imprecise answer to a specific one.’

But this logic breaks down if there is a sound theoretical reason to expect a different result in the small sub-group. Grouping trans people with male or female would be unsuitable for many purposes. In such a situation it is better to have an imprecise answer to a specific question.

Richard Lilford, ARC WM Director


Use of Causal Diagrams to Inform the Analysis of Observational Studies

Observational studies usually involve some sort of multi-variable analysis. To make sense of the association between an explanatory variable (E) and an outcome (O), it is necessary to control for confounders – age for example in clinical studies. A confounder (C) is a variable that is associated with both E and O. Indeed it is causal of E and O as shown by the direction of arrows in Figure 1.

Fig 1. Causal Diagram for a Confounder

A common error is to mistake a confounder for a mediator. If the variable lies on the causal pathway between E and O, then it is a Mediator – M in Figure 2.

Fig 2. Causal Diagram to Distinguish Between a Confounder (C) and a Mediator (M)

Failing to make this distinction, and so adjusting for M, will reduce or remove the effect of E on the outcome. In a study of the effect of money spent on tobacco on lung cancer, it would be self-defeating to adjust for smoking! If we are interested in decomposing different causal pathways, then we should adapt the multivariable analysis to examine how much of the effect of E on O is explained by the putative mediator (M in Figure 2) – a structural equation model or ‘mediator’ analysis.
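A small simulation can make the point concrete. This is a sketch under stated assumptions, not an analysis of real data: E affects O only through M, every relationship is linear with unit coefficients and standard normal noise, and the helper names `slope` and `residuals` are introduced here for illustration. Adjustment for M is carried out by removing the linear dependence on M from both E and O before regressing one on the other.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(2020)  # fixed seed so the run is repeatable
n = 20000
E = [rng.gauss(0, 1) for _ in range(n)]
M = [e + rng.gauss(0, 1) for e in E]   # E causes M
O = [m + rng.gauss(0, 1) for m in M]   # M causes O; there is no direct E -> O arrow

total_effect = slope(E, O)
# Coefficient on E in a regression of O on both E and M.
adjusted_effect = slope(residuals(M, E), residuals(M, O))

print(f"effect of E on O, unadjusted:     {total_effect:.2f}")
print(f"effect of E on O, adjusted for M: {adjusted_effect:.2f}")
```

In this set-up the unadjusted estimate recovers the true total effect of about one, while the estimate adjusted for the mediator falls to about zero.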

There are some issues to consider:

  1. It may not be possible to say for certain whether a variable is a mediator or confounder and some variables may be both. Then try the analysis three ways: omit it, treat it as a confounder, or treat it as a mediator.
  2. It is hard to know which variables to include as confounders. A dataset was sent for analysis by 29 different teams of statisticians.[1] They came up with results that varied wildly, because they adjusted for different combinations of variables. The corollary is that choice of variables should not be left to statisticians – it turns on causal theory that distinguishes between variables that are likely to have arrows pointing from E to O via M, and those pointing from C to both E and O (Figure 2). Context matters!
  3. There may be an interaction between variables, such that the causal effect of one variable on E or O is amplified or attenuated in the presence of another. Four variables, each with four ‘levels’, yield 256 (4 × 4 × 4 × 4) possible combinations of levels across which effects could differ. So, again, theory is needed to determine which variables to include in such interaction tests.
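The figure of 256 equals 4 × 4 × 4 × 4, the number of distinct combinations of levels across four four-level variables. The short sketch below enumerates them and, for comparison, counts the pairs of variables that could be tested for a two-way interaction. The variable names are placeholders.

```python
from itertools import combinations, product

variables = ["A", "B", "C", "D"]   # hypothetical variable names
levels = range(4)                  # four levels per variable

# Every way of choosing one level for each of the four variables.
cells = list(product(levels, repeat=len(variables)))

# Every unordered pair of variables that might interact.
pairs = list(combinations(variables, 2))

print(len(cells))
print(len(pairs))
```

Either count grows quickly as variables are added, which is the practical reason for letting theory pick the interactions to test.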

A variable may exist that is an independent cause of C or M (let’s call these C* and M*), as in Figure 3. There is no reason to adjust for these variables. Likewise, do not adjust for any variable that ‘precedes’ E, as also shown in Figure 3.

Fig 3. Variables That Cause Change in Other Variables

In this example, C* and M* are not causally linked to O, except through C and M respectively. But a situation may occur where such a link is possible. It is well known that maternal smoking is causally linked to both low birth-weight and to neonatal deaths, as per Figure 4. The theory is that smoking is toxic and leads to both a small baby and, via that pathway and other pathways, leads to neonatal death.

Fig 4. Causal Pathway for Smoking and Neonatal Deaths

If this analysis is conducted controlling for ‘small baby’, then smoking is associated with lower mortality – it appears protective. The obvious fault was to control for a variable on the causal pathway, as per Figure 2. But that would explain only why the association is reduced, not why it is reversed.

The explanation for the reversal lies in a putative third variable (perhaps a ‘genetic’ defect, G), which predisposes to both a small baby and neonatal death (Figure 5). Note, that both E and G collide on M, and such a scenario leads to ‘collider bias’ – by controlling for one source of bias, the door is opened to another. It is well known that there may be unobserved (‘lurking’) confounders in any association. The same applies, of course, to a variable that might completely alter the meaning of an association once one has conditioned on another variable.
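The reversal can be reproduced with an exact calculation on an invented population. Every probability below is an assumption chosen for illustration, not an estimate from data: a rare defect G that usually produces a small baby and carries a high risk of death, and smoking that makes a small baby more likely and adds modestly to the risk of death.

```python
from itertools import product

P_DEFECT = 0.05  # assumed prevalence of the latent defect G


def p_small(smoker, defect):
    """Assumed probability of a small baby (M) given smoking (E) and defect (G)."""
    if defect:
        return 0.9
    return 0.3 if smoker else 0.05


def p_death(smoker, defect, small):
    """Assumed probability of neonatal death (O)."""
    if defect:
        return 0.5
    return 0.005 + 0.005 * smoker + 0.02 * small


def death_risk(smoker, small=None):
    """P(death | smoking), optionally also conditioning on birth size."""
    num = den = 0.0
    for defect, is_small in product((0, 1), repeat=2):
        if small is not None and is_small != small:
            continue
        p_g = P_DEFECT if defect else 1 - P_DEFECT
        p_m = p_small(smoker, defect)
        if not is_small:
            p_m = 1 - p_m
        weight = p_g * p_m
        num += weight * p_death(smoker, defect, is_small)
        den += weight
    return num / den


print(f"all babies:   smokers {death_risk(1):.3f}, non-smokers {death_risk(0):.3f}")
print(f"small babies: smokers {death_risk(1, small=1):.3f}, "
      f"non-smokers {death_risk(0, small=1):.3f}")
```

With these numbers smoking raises mortality overall, yet among small babies the smokers' babies fare better. A small baby born to a non-smoker is much more likely to be small because of the defect, so conditioning on M links E and G.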

Fig 5. Collider Bias

These analyses show that conducting a multivariable analysis is not, or rather should never be, an entirely data-driven / empirical exercise. Choices have to be made, such that the statistical model informs on, but does not determine, the causal model. For a brilliant example of extensive causal chains involving confounders, colliders and mediators, see an example from Andrew Forbes and colleagues.[2]

Note, we are not arguing against adjustment per se. It is an essential part of the analysis. We argue against adjusting without reference to a causal model.

Richard Lilford, ARC WM Director; Sam Watson, Senior Lecturer [With thanks to Peter Diggle (Lancaster University & Health Data Research UK) for comments.]


  1. Silberzahn R, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Adv Method Pract Psychological Science. 2018; 1(3).
  2. Williamson EJ, et al. Introduction to causal diagrams for confounder selection. Respirology. 2014; 19(3): 303-11.

The Land War in the Fight Against COVID-19

Gone are the days of thinking there is a quick fix to the COVID-19 pandemic. Another country-wide lockdown would reduce COVID-19 infection, but at the same time would damage the economy and pose a threat to other long-term health conditions, with disproportionate effects on the more disadvantaged groups in society. The Great Barrington Declaration – aiming for herd immunity while sequestering high-risk people – does not bear close examination.[1] Vaccination is not an automatic get out of jail card – we do not yet know when vaccination will be available at the required volume, nor what degree of protection it will confer. So, this is the land war. We must work on supply chains, procedures, detection and contact tracing, getting ever slicker at the operation. Personal protection, social distancing and graded lockdowns can all play a part, but only if they are accepted by the general public, who deserve clear explanations of when, where and why unwelcome restrictions will be imposed and what these restrictions are intended to achieve.

While central government has an obvious role to play, it has become clear that the battle must go local; and the more local the better. The risk of being hospitalised with COVID-19 in Birmingham varies dramatically across the various electoral wards, with the seven-day rolling rate of new cases (for week ending 14 October 2020) ranging from 43.8 per 100,000 in Nechells, to 825.8 in Selly Oak.[2] So, supported by the MRC, NIHR ARC West Midlands and our host hospital (University Hospitals Birmingham NHS Foundation Trust) we are developing a computer application to track the evolving pattern of the COVID-19 pandemic. We have developed software that uses geostatistical models to identify “hot spots”, however one defines them, across a broad space such as an urban conurbation or a country. Within such a space we identify localities at whatever scale is relevant for local decision-making and that the data can support. We can map rates of infection per unit of population in real time on these maps to show the current state of the epidemic and its direction of travel (see Example). These maps can direct decision-makers to specific localities where incidence is increasing rapidly and hence where urgent action is needed.

But there is a problem with policy action directed at small areas and particular communities – dictatorial edicts are likely to provoke resentment rather than effective action, especially when carried out at a very local level. It is one thing to place restrictions across a whole country or even a large city, but quite another to try to lock down an area such as Lady Pool in Birmingham or Chapel Town in Leeds. Indeed, the disease has its highest incidence in BAME communities, who may feel victimised or disenfranchised. Already only 18% of people fully comply with UK regulations regarding self-isolation.[3] So here we come to the second use of our application and the maps it produces.

We think that policy-makers should increasingly turn to local communities and ask them to be the architects, not recipients, of policy. In essence we are arguing for an ‘assets-based’ or ‘participatory’ approach based on ‘co-invention’. And here our application can help by providing scientific data at a local level in a form that can be easily assimilated. We are arguing at a local level for the type of thing that Prof Chris Whitty used at a national level in his Downing Street presentation with the Prime Minister and Chancellor (12 October 2020). There is evidence that populations relate well to local maps and they are sometimes used in qualitative research as a method to promote discussion among people.[4] The approach we are advocating here, of high-risk spatio-temporal identification followed by case-area targeted intervention, has proven effective in limiting the spread of cholera outbreaks,[5] and we advocate a similar approach with respect to the COVID-19 pandemic.

We would be pleased to hear from news blog readers regarding:

  1. Your opinions and advice.
  2. Whether you would like to hear more or use the application when it is developed.
  3. Whether you have examples of similar initiatives elsewhere in the world.
  4. Whether you would like to collaborate.

You can contact us at

Richard Lilford, ARC WM Director; Sam Watson, Senior Lecturer; Peter Diggle, Distinguished Professor at Lancaster University

Example of Real-Time Surveillance of COVID-19

For this example we have aggregated the results to MSOA (middle-layer Super Output Area) level across the catchment area of University Hospitals Birmingham NHS Foundation Trust, although we have retained other areas of Birmingham to make the boundary of the city clear. One could aggregate to smaller or larger levels as needed. A case here is an admission to hospital for COVID-19.

We have produced these outputs as if we were working on March 26 2020 using data from the preceding two weeks. The first thing someone interested in tracking COVID-19 in the city might ask is what is the incidence of the disease that day?

There is a lot of variation across the different MSOAs, with one area standing out as being high (yellow area). The variation here could be explained by differences in demographics or socioeconomic status, and we might want to ask whether any differences are for unexpected reasons. We can break down the incidence into different components:


  • Expected is the number of cases we would expect that day from each area based on the size of its population.
  • Observed shows the relative risk in each area associated with observable characteristics (age, ethnicity, and deprivation). For example, suppose the average incidence across the city were one case per 10,000 person-days. An area with a larger proportion of older residents would have a higher risk; if this risk were double the average then it would have a relative risk of two.
  • Latent is the relative risk in each area due to unexplained factors or unobserved variables. Our area with more older people may have an expected incidence of two cases per 10,000 person-days (a ‘baseline’ of 1 per 10,000 person-days times a relative risk of two), but if we observe an average rate of four cases per 10,000 person-days, then there is an additional unexplained relative risk of 2.
  • Posterior SD is the posterior standard deviation, which indicates the uncertainty of the prediction.
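The arithmetic in the worked example above can be written out as a few lines of code. This is only a sketch of the multiplicative decomposition the bullets describe, using the numbers given there; the variable names are ours and the application's actual model is not shown.

```python
baseline_rate = 1.0   # city-wide average, cases per 10,000 person-days
covariate_rr = 2.0    # relative risk from observable characteristics (older population)
observed_rate = 4.0   # rate actually seen in the area, cases per 10,000 person-days

# Rate we would expect once the observable characteristics are accounted for.
expected_rate = baseline_rate * covariate_rr

# Whatever remains is attributed to unexplained (latent) factors.
latent_rr = observed_rate / expected_rate

print(f"expected rate: {expected_rate} per 10,000 person-days")
print(f"latent relative risk: {latent_rr}")
```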

So, based on these plots, incidence in the high-incidence area in the north of Birmingham would appear to be higher than we would expect from the observed variables by a factor of 2 or 3. This may indicate the need for public health intervention. We might finally ask how this compares to previous days.

The next plot shows the incidence rate ratio, which here is the ratio of incidence compared to seven days prior for each area. A value of one indicates no change, two a doubling, and so forth. One can clearly see that it is above one, i.e. it is increasing, city-wide. The greatest relative increases are centred on the area we identified as being of high concern.
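As a rough illustration of how the ratio is read, the sketch below computes a crude incidence rate ratio for three invented areas. The area names and rates are hypothetical, and the maps described here come from a geostatistical model rather than from a simple division like this one.

```python
def incidence_rate_ratio(rate_today: float, rate_week_ago: float) -> float:
    """Ratio of today's incidence rate to the rate seven days earlier."""
    if rate_week_ago <= 0:
        raise ValueError("rate seven days ago must be positive")
    return rate_today / rate_week_ago


# Hypothetical rates per 10,000 person-days: (today, seven days ago).
rates = {
    "Area A": (3.0, 1.5),
    "Area B": (2.0, 2.0),
    "Area C": (1.0, 2.0),
}

irr = {area: incidence_rate_ratio(*pair) for area, pair in rates.items()}
rising = [area for area, value in irr.items() if value > 1]

for area, value in irr.items():
    print(f"{area}: {value:.1f}")
print("incidence rising in:", rising)
```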


  1. Alwan NA, et al. Scientific consensus on the COVID-19 pandemic: we need to act now. Lancet. 2020.
  2. Public Health England. Coronavirus (COVID-19) in the UK: Interactive Map. 19 October 2020.
  3. Smith LE, et al. Adherence to the test, trace and isolate system: results from a time series of 21 nationally representative surveys in the UK (the COVID-19 Rapid Survey of Adherence to Interventions and Responses [CORSAIR] study). MedRXiv. 2020. [Pre-print].
  4. Boschmann EE, Cubbon E. Sketch maps and qualitative GIS: Using cartographies of individual spatial narratives in geographic research. Professional Geographer. 2014;66(2):236-48.
  5. Ratnayake R, et al. Highly targeted spatiotemporal interventions against cholera epidemics, 2000-19: a scoping review. Lancet Infect Dis. 2020.

Recognising the rising tide in service delivery and health systems research

With rising demands and finite resources, health systems worldwide are under constant financial pressure. The US has been at the extreme end of high spending, with health expenditure accounting for 17% of its GDP in 2017 – compared with 9.8% for the UK and an average of 8.7% across OECD countries.[1] Therefore, the imperative of containing healthcare cost is mounting in the US. Under the Affordable Care Act (ACA), alternative payment models (often known as value-based payments) have been widely introduced to replace the fee-for-service model.

A recent article in JAMA highlighted a paradox,[2] in which an apparent plateau in overall healthcare expenditure (at around 18% of US GDP) is contrasted with lack of significant success reported in individual evaluations of these alternative payment models. Why has health spending as a proportion of GDP plateaued when the interventions to reduce spending have been ineffective in doing so? The authors ruled out the explanation that the growth in GDP has outpaced the growth of health expenditures as the latter seems to be genuinely flattening. So how can this discrepancy be reconciled?

The authors offered three explanations:

  1. Anticipation of ACA-driven expansion of alternative payment models may have induced changes in the psychology and practice of clinicians and health care organisations, leading to curbs on spending irrespective of the introduction of alternative payment models.
  2. Primed by the above change in mindset, clinicians and health care organisations may have been influenced by their peers and emulated their practice. This would cause a wider spread of the change beyond the institutions where the alternative payment models were first introduced and evaluated (e.g. from within the Medicaid system to those covered by commercial insurers).
  3. Simultaneous introduction of a large number of alternative models in different places may have led to contamination of control groups in individual evaluations, where the control group chosen in one evaluation may be subject to the introduction of another alternative payment model.

Taken in the round, these explanations suggest a secular trend of system-wide changes (in this case cost containment), which may take various forms and be achieved through different means, but which are triggered by heightened awareness of the same issue and shared social pressure to tackle it across the board – what we have described as the ‘rising tide phenomenon’.[3] The phenomenon is by no means a rare occurrence in health services and systems research and so is well worth considering when a null finding is observed in a controlled study. The corollary is that when there is a rising tide, null findings do not disprove the potential effectiveness of the intervention being evaluated. A more nuanced interpretation that takes the secular trend into account is required, such as the one offered by the authors of the aforementioned paper.

Yen-Fu Chen, Associate Professor; Richard Lilford, ARC WM Director


  1. Organisation for Economic Co-operation and Development. Health. 2020. Available at:
  2. Navathe AS, Boyle CW, Emanuel EJ. Alternative Payment Models—Victims of Their Own Success? JAMA. 2020; 324(3):237-8.
  3. Chen Y-F, Hemming K, Stevens AJ, Lilford RJ. Secular trends and evaluation of complex interventions: the rising tide phenomenon. BMJ Qual Saf. 2016; 25(5): 303-10.

The Holy Grail of Quality Measurement

Writing in JAMA, Austin and Kachalia argue for automation of quality measurements.[1] We ourselves have argued that the proliferation of routine quality measures is getting out of hand.[2]

The authors argue, as we have argued, that using quality measures to incentivise organisations is a blunt tool, subject to gaming. Far better is to use quality measures in real-time to prompt doctors to provide high quality care.

In fact, this is what computerised decision support offers. There is considerable empirical support for use of this type of decision tool. Working with Prof Aziz Sheikh and colleagues, NIHR ARC West Midlands has investigated decision support for prescribing,[3] and we are now investigating its use in antibiotic stewardship.[4] We are entirely in support of the use of decision support to improve care in real-time.

However, we question the idea that the majority of healthcare can be guided by online decision support. Working with Prof Timothy Hofer in Michigan, ARC WM co-investigators have shown that the measurement of the quality of hospital care is extremely unreliable.[5] Kappa measures of agreement between reviewers were about 20%. This means that seven reviewers would be needed for each case note, to achieve a reliability of 80%.

That is to say, that for much of medical care, there is no agreed standard. Truly, the majority of medical care is more art than science.

We think that the time has arrived to abandon hubristic notions about standardising and quality assuring the generality of clinical care. Medicine is not like aviation. Commercial aviation is almost entirely computerised. Emergencies aside, the whole process can be guided algorithmically. Our paper in Milbank Quarterly shows quite clearly that this is not the case for medicine.[5]

Working with Prof Julian Bion, the ARC WM Director had an opportunity to audit numerous case notes from patients with sepsis.[6] The idea was to observe quality of care against a package of evidence-based criteria. Many of these criteria were based on actions that should be carried out within a specified time from diagnosis. The exercise proved almost impossible, since the point of diagnosis was ephemeral. In most cases there was no clear point to start the clock and the very diagnosis of sepsis had to be reverse-engineered from the time at which a sepsis-associated action took place! This exercise provided eloquent testimony to the judgemental, rather than rules-based, nature of much medical practice. We should use algorithmic decision support where clear rules exist, but we must stop pretending that the whole of medicine can be guided in this way. Perhaps we should just stand back a little, and accept some of the imperfections in our systems. Like a harm-free world, perfection will always lie beyond our grasp.[7]

Richard Lilford, ARC WM Director


  1. Austin JM, Kachalia A. The State of Health Care Quality Measurement in the Era of COVID-19. The Importance of Doing Better. JAMA. 2020.
  2. Lilford RJ. Measuring Quality of Care. NIHR CLAHRC West Midlands News Blog. 21 April 2017.
  3. Yao GL, Novielli N, Manaseki-Holland S, et al. Evaluation of a predevelopment service delivery intervention: an application to improve clinical handovers. BMJ Qual Saf. 2012;21:i29-38.
  4. Usher Institute. ePrescribing-Based Antimicrobial Stewardship. 2020.
  5. Manaseki-Holland S, Lilford RJ, Te AP, et al. Ranking Hospitals Based on Preventable Hospital Death Rates: A Systematic Review With Implications for Both Direct Measurement and Indirect Measurement Through Standardized Mortality Rates. Milbank Q. 2019;97(1):228-84. 
  6. Lord JM, Midwinter MJ, Chen YF, et al. The systemic immune response to trauma: an overview of pathophysiology and treatment. Lancet. 2014;384(9952):1455-65.
  7. Meddings J, Saint S, Lilford RJ, Hofer TP. Targeting Zero Harm: A Stretch Goal That Risks Breaking the Spring. NEJM Catal Innov Care Del. 2020; 1(4).

Leadership, Heroism and Heroic Leadership

Some years ago, two outstanding academic leaders, Peter Pronovost and Lord Ara Darzi, wrote an article in which they argued for an end of heroism in medicine.[1] I responded in the pages of our previous CLAHRC WM News Blog along the lines of, be careful what you wish for.[2]

I was reminded of this interchange by the evening celebration of health workers seen across many countries of the world during the COVID-19 pandemic. What were members of the public doing, if not allowing health service staff to feel just a little heroic? Quite right too: health staff risk their lives on an almost daily basis and have a higher mortality compared to other people of similar ages.

One recent morning I heard a poem about nurses on the radio. The poet was making the point that nursing is not just another profession. I have been a doctor and a patient and I can tell you that from my perspective being a doctor or a nurse is certainly not just another profession. Yes, it is a calling, even if the call comes from inside.

Doctors and nurses put their lives on the line when necessary. They will work all night. They will stay on at the end of the afternoon if they still have patients to see. These are the things we do, we like to do them, and we are admired for doing them. We put ourselves out and we go the extra mile. The patient is not a client, or rather they are privileged clients.

But let us also be aware of the dangers of heroism that might turn self-indulgent and become almost narcissistic. Leadership involves determining a course of action, often an unpopular or dangerous one, and then carrying people with you. Leadership can be demonstrated anywhere within an organisation. My business colleagues talk about dispersed leadership. I have both led people senior to me and I have been led by people junior to me. So there is no room for arrogance in leadership and leaders must listen. They must listen to others and to that quiet, still voice within.

Can leadership be taught? James Stoller has conducted a systematic review of leadership training.[3] On self-reported outcomes, leadership training provides consistent improvement. But objective evidence is hard to find. People who have done leadership training are more likely to go on to senior management roles. But this hardly proves cause and effect. Indeed, trainees who score highly at baseline on leadership qualities, such as emotional intelligence, are more likely to gain senior management positions than those with lower scores. So, I would guess leadership training helps a bit, but most of the variance is explained by innate characteristics.

Richard Lilford, ARC WM Director


  1. Pronovost PJ, Ravitz AD, Stoll RA, Kennedy SB. Transforming Patient Safety: A Sector-wide Systems Approach. Doha, Qatar: World Innovation Summit for Health. 2015.
  2. Lilford RJ. Can We Do Without Heroism in Health Care? NIHR CLAHRC West Midlands News Blog. 20 March 2015.
  3. Stoller J. Developing Physician Leaders: Does It Work? BMJ Leader. 2020; 4(1): 1-5.

Policy Makers Should Use Evidence, But What Should They Do In an Evidence Vacuum?

There are two points of view concerning the obligations of policy makers when there is no direct evidence to guide them:

  1. It is wrong to take any action or intervene unless there is evidence to support your decision.
  2. A lack of evidence is neutral; it neither allows a decision-maker to intervene, nor does it sanction non-intervention.

Which is correct? Writing in the Lancet recently, Feng, et al. advocate the use of face masks in public to prevent the spread of COVID-19.[1] They say it is an asymmetrical choice; unlikely to do harm and may do much good by preventing the spread of the disease from pre-symptomatic people to people who are unaffected.

The ARC WM Director sides with the ‘lack of evidence is neutral’ principle. In my opinion the argument that a policy maker should not intervene in the absence of direct evidence is flawed for a series of linked reasons:

  1. The obligation to use evidence when it exists does not entail the requirement to fail to act when there is no such evidence.
  2. Further, there is never a circumstance in which no relevant evidence is available. Granted, there may be no direct, comparative evidence, but this is not tantamount to no evidence at all.
  3. There can be no automatic supposition that the expected value of a proposed intervention is less than that of the status quo. That is to say, the balance of benefits, harms and costs may go either way when there is no incontrovertible comparative evidence. It is then a matter of judgment as to the relative probabilities of benefit and cost that must sit alongside values in determining the best course of action.
  4. The theoretical basis for decisions under uncertainty derives from expected utility theory, which reconciles probability and values/preferences.[2][3] Under this axiomatic theory, probability refers to the decision maker’s degree of belief.
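How expected utility combines degrees of belief with values can be shown in a few lines. The probabilities and utilities below are invented purely for illustration; in practice both would be elicited from the decision maker, and the function name is ours.

```python
def expected_utility(outcomes):
    """Sum of probability x utility over mutually exclusive outcomes.

    `outcomes` is a list of (probability, utility) pairs whose
    probabilities must sum to one.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Hypothetical degrees of belief and valuations for a proposed intervention:
# it probably helps, may do nothing, and has a small chance of net harm.
intervene = [(0.6, 10.0), (0.3, 0.0), (0.1, -5.0)]

# The status quo is taken as the reference point, with utility zero.
status_quo = [(1.0, 0.0)]

eu_intervene = expected_utility(intervene)
eu_status_quo = expected_utility(status_quo)

print(f"intervene:  {eu_intervene:.1f}")
print(f"status quo: {eu_status_quo:.1f}")
```

With different beliefs or values the comparison could just as easily favour the status quo, which is the sense in which a lack of direct evidence is neutral.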

Of course, nothing written above should be misinterpreted to imply either that good evidence should not inform decisions or that policy makers have no obligation to try to collect evidence to better inform future decisions. Indeed, the mandate to collect and use evidence is now enshrined in law in many states in the USA and was a manifesto commitment for the current UK government.

The US state of Oregon is well known for ground-breaking policies. Right back in 2003 it passed legislation requiring evidence-based procurement of clinical services in the field of addictions beginning 2005.[4] By 2011, 75% of addiction services commissioned by public money had to be evidence-based.[5] Likewise, nearby Washington state published a law in 2012 requiring policy makers to use empirically supported services for children’s health and welfare.[6] 

The British government has a four-part structure for policy trials:

  1. Funding universities to carry out policy trials to inform the government’s programme. A good example is The Work and Health Unit (WHU) trial of an intervention to encourage small- and medium-sized enterprises (SMEs) to do more to promote employee health and welfare.[7] The WHU have sponsored ARC WM faculty, supported by the West Midlands Combined Authority and RAND Europe, to carry out a four-arm cluster randomised trial of 100 SMEs.[8]
  2. Funding external ‘what works’ centres, such as the Education Endowment Foundation that was established in 2011 by The Sutton Trust with £125m funding from the Department for Education. This organisation has conducted a very large series of educational RCTs, in which England now leads the world, as recently described in your news blog.[9]
  3. In-house trials conducted by individual government departments. I am a member of the Cabinet Office ‘What works trial advice panel’ that advises on in-house and externally commissioned trials. HMRC has conducted the largest-ever RCT of self-assessment tax schemes, for example. The Environment Agency has recently conducted an RCT to tackle waste crime. I am currently part of a small group advising government departments on the design and evaluation of an intervention to help people who have recently become carers to adapt to their new circumstances without becoming depressed and, in some cases, to continue to work.
  4. Funding academic centres, such as DHSC policy research centres.

ARC West Midlands will continue to promote local and international studies to provide evidence for evidence-based policy. We like to work very closely with policy makers and service managers so that our work addresses their immediate needs. We like to think of ourselves as pioneers in the fields of rapid response and opportunistic research, and can cite a number of on-going and recent examples, many covering the areas of public health and social care.

Richard Lilford, ARC WM Director; with thanks to Emily Power for contributions.


  1. Feng S, et al. Rational use of face masks in the COVID-19 pandemic. Lancet Resp Med. 2020.
  2. Thornton JG, Lilford RJ, Johnson N. Decision analysis in medicine. BMJ. 1992; 304: 1099-103. 
  3. Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996; 313: 603.
  4. Oregon Legislative Assembly. Human Service Issues: Health Care. Senate Bill 267. In: 2003 Summary of Legislation. Oregon: Legislative Fiscal Office; 2003. p59.
  5. Rieckmann T, et al. Employing Policy and Purchasing Levers to Increase the Use of Evidence-Based Practices in Community-Based Substance Abuse Treatment Settings: Reports from Single State Authorities. Eval Program Plann. 2011; 34(4): 366-74.
  6. Trupin E, Kerns S. Introduction to the Special Issue: Legislation Related to Children’s Evidence-Based Practice. Admin Policy Ment Health. 2017; 44(1): 1-5.
  7. Thrive at Work Wellbeing Programme Collaboration. Evaluation of a policy intervention to promote the health and wellbeing of workers in small and medium sized enterprises – a cluster randomised controlled trial. BMC Public Health. 2019; 19: 493.
  8. Lilford R, Russell S, Sutherland A. Thrive at Work Wellbeing Premium – Evaluation of a Cluster Randomised Controlled Trial. AEA RCT Registry. October 17 2018.
  9. Lilford RJ. UK Takes Over From the US as the Home of Trials of Educational Interventions. NIHR CLAHRC West Midlands News Blog. June 1 2018.

Organisational Consequences of Coronavirus, COVID-19

Health services around the world are scrambling to deal with COVID-19. The virus massively disrupts services. Modelling the spread of the disease is allowing governments to formulate public policy. Modelling patient flows – operations research – is helping health care organisations to manage the surge in demand – for example by releasing spare capacity and redeploying human and physical resources from elective to emergency care. Infectious diseases create a conundrum for the services since sick people need to attend facilities, but congregation of infected cases in health facilities increases transmission of the infectious agent. So the trick is to visit facilities virtually (mobile [m] consulting) rather than physically. Enter ARC West Midlands.

We have a well-established programme of m-Health including (but not limited to):

  1. Our host hospital, University Hospitals Birmingham NHS Foundation Trust (UHBFT), is working with Babylon Health to enhance its virtual clinic capacity.
  2. Building on work of Gill Combes, Sarah Damery and James Ferguson, we plan a more extensive evaluation of the UHBFT m-Consulting programme that is expanding rapidly to cope with COVID-19.
  3. From her UK work on m-Consulting Frances Griffiths has quick guides freely available for specialist teams maintaining contact with their patients managing long-term health conditions at home.[1] She leads projects on m-Consulting in Africa and South Asia and, with her collaborators, is developing policy briefs underpinned by evidence-based principles to guide application.
  4. Melanie Calvert is an international authority on Patient-Reported Outcome Measures, which could help determine who should attend facilities and who should not. Modern aeroplane engines incorporate sensors that send signals to land-based workshops. This real-time monitoring, rather than just the schedule, determines the need for repairs. Likewise, patients in future will be monitored by their symptoms and test results, and these will be used to trigger visits to the clinic.

ARC WM members are planning a suite of studies in this country and abroad. The COVID-19 pandemic has precipitated a sharp shift towards m-Health / m-Consulting that is likely to prove indelible. In UK general practice all patients are now having phone consultations before any necessary face-to-face contact. Many practices have systems in place for video-conferencing. Last week, author FG took just ten minutes to learn how to use the secure and confidential system via her own phone so she could set eyes on an immune-compromised patient with infection, without asking the patient to leave her place of safety. Patients are learning rapidly too. The same patient could not get their sound to work so they used the landline too – but that patient is now urgently sorting out the sound.

We know that many other centres are also gearing up to study the organisational issues of epidemics generally, and m-Health specifically. M-Consulting warrants study – it is open to abuse/fraud, poor quality control and medical error, and can result in inequalities in care received. Experienced health professionals are good at mitigating these dangers,[2] but we need to understand how to systematise and embed m-Consulting to optimise health gains. We warmly invite other people in the UK and beyond to join our enterprise to share ideas and formulate research plans. In the meantime James Ferguson is leading an initiative to track use of m-Consulting to identify opportunities and barriers, and identify training needs for staff and patients. 

Richard Lilford, ARC WM Director; Frances Griffiths, Professor of Medicine in Society


  1. LYNC study team. LYNC Study Quick Reference e-book and Topic Guides. Warwick: University of Warwick; 2017.
  2. Griffiths F, Bryce C, Cave J, et al. Timely digital patient-clinician communication in specialist clinical services for young people: a mixed-methods study (the LYNC study). J Med Internet Res. 2017; 19(4): e102.

Guidelines on How to Change Services for the Better

Theory of Change

If you want to improve care at the front line against a standard (e.g. kindness to clients, implementing cancer treatment, etc.) then you have to intervene at the service level. The development of service interventions is stock in trade for service managers/clinicians; they are doing so all the time. But how should an intervention be developed? As you might expect, the question of how is an immense one, but there is broad agreement on the process, recently described by Wight et al., and detailed below.[1]

  1. Define and understand the problem.
  2. Identify things that might change.
  3. Come up with a causal change mechanism/theory of change.
  4. Identify how to deliver the change.
  5. Test and refine on a small scale.
  6. Roll out and evaluate (summative evaluation).

Well, that’s pretty basic and fits well with the Medical Research Council guidance referred to in a previous CLAHRC West Midlands News Blog.[2]

Different Approaches

For a much more extensive discussion see a recent paper by Alicia O’Cathain, which discusses different approaches.[3] In fact the approaches are not hermetically sealed from each other and many have overlapping constraints. The emphasis, of course, varies. Few do not highlight the importance of involving service users in the development and design. No one thinks that an intervention should not be preceded by “diagnosis of the causes of a developed problem.” Piloting before widespread application is widely supported if not always adhered to. Some (intervention mapping for example) are more elaborate and formulaic than that. However, it is hard to insist on a one-size-fits-all approach. Having an explicit theory does not increase the probability of success, but it does make it easier to explain the intervention to others.

Behavioural Psychology

One way to obtain change is to mandate certain behaviours and to enforce compliance. Such coercion is often justified, but in the grey area of healthcare in general, and medical care in particular, few activities are governed by hard rules. Mandating correct clinical diagnosis, for example, does not make a lot of sense. So we are into more subtle methods to change behaviour. 

Some interventions are truly straightforward and do not require conscious behaviour change: certain engineering solutions, such as a forcing function to prevent misconnecting anaesthetic gas pipes, for example. But most require those annoying creatures, human beings, to change their behaviours in some way. Perhaps the single greatest contribution to providing a framework comes from the development of the trans-theoretical model[4] and its further distillation in the form of the COM-B model.[5] These models are built up from analysis and categorisation of the myriad preceding psychological theories that seek to explicate behaviour change.

Thoughts from ARC WM

A recent article published by the Council for Allied Health Professions Research highlights Krysia Dziedzic’s top tips for implementation.[6] Krysia is part of our Long-term conditions theme and directs the Impact Accelerator Unit in the School of Primary, Community and Social Care at Keele University. Here I give my own tips for service change.

Some Frequently Flouted ‘Rules’ of Behaviour Change When Service Interventions are Designed and Implemented

Incentives (expectancy theory): Never use an incentive, positive or negative, when the people at whom it is targeted do not believe they can achieve it under their own volition.[7] [8]
Even if an intervention is targeted at the frontline of operations, intervene also at ‘higher’ levels: In general, when intervening at the operational level, also activate higher levels, not only to liberate resource but also to create the right social environment in line with Social Expectancy Theory.[9] [10]
Political work: Do not intervene when people are not expecting it and when it may change patterns of work, without first doing political work to ‘win hearts and minds’. People might not oppose what you are attempting, but you need active support. I think it is worth considering compensating the first generation of losers, following Aneurin Bevan’s “I stuffed their mouths with gold” dictum.[11]
Be persistent, but also patient: Expect prolonged resistance if skill substitution or material disruption of work is involved.[12] Recall Elinor Ostrom’s emphasis on developing personal relations and providing lots of time for dialogue – cheap talk.[13] It also takes time for people in different roles to share the same intellectual map or ‘logics’.[12]
Piloting: Whenever possible, pilot interventions to iron out problems. If possible, alpha test them before they are rolled out. Incremental change is generally better than business process re-engineering, which involves greater risk.[14]
Involve service users in the design of interventions at all stages: Co-design not only makes sense, but is supported by experimental evidence.[15] [16] The ARC WM approach is to involve public contributors simultaneously in intervention design and evaluations.
Address multiple barriers to implementation: Interventions are more likely to succeed if all material barriers are identified and addressed.[17] Frameworks, such as COM-B / trans-theoretical model can help identify ‘lurking’ barriers.
Seek risk-sharing agreements when purchasing equipment: Equipment often fails and repair can be very expensive because the vendor is in a monopoly position. Build in service contracts or even reimbursement by hours of trouble-free service.
Do not overload the intervention description: Be parsimonious by describing the essential features of a service intervention. Consider ‘essential’ and optional elements. Remember, if a compound intervention has n components, and the probability of successful implementation of each is p, then only p^n (p to the power n) will get the complete bundle.[18]
Encourage innovation: Mentor front-line staff to be the architects of their own destiny, rather than prescribe solutions – try to be an ‘invisible leader’.
Always read the previous literature concerning the proposed intervention: Failure to do so is scientific and management malpractice. Yes, contexts vary, but not to the degree that systematic analysis of previous experience can be jettisoned.
Evaluations: Conduct (and distinguish between) intra-mural (formative) and extra-mural (summative) evaluations. The former are necessary to identify unanticipated problems and probe the limits of what may be achieved.[19] [20] Intra-mural evaluations are an integral part of Plan-Do-Study-Act (PDSA) cycles, Total Quality Management (TQM), and so on.
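
The arithmetic behind the rule on not overloading the intervention description can be made concrete with a small sketch. It assumes, as the rule implicitly does, that each of the n components is implemented independently and with the same probability p; the function name and the value p = 0.9 are purely illustrative and do not come from any study.

```python
def full_bundle_probability(p: float, n: int) -> float:
    """Chance that all n components are implemented.

    Assumption (for illustration only): components succeed independently,
    each with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


# Hypothetical example: each component has a 90% chance of being implemented.
for n in (1, 3, 5, 10):
    print(n, round(full_bundle_probability(0.9, n), 3))
# prints:
# 1 0.9
# 3 0.729
# 5 0.59
# 10 0.349
```

Even with a fairly reliable 90% per component, under these assumptions roughly a third of recipients would get the complete ten-component bundle, which is the case for keeping the list of ‘essential’ elements short.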

Richard Lilford, ARC WM Director; Krysia Dziedzic,  Director of the Impact Accelerator Unit


  1. Wight D, Wimbush E, Jepson R, et al. Six steps in quality intervention development (6SQuID). J Epidemiol Community Health. 2016;70:520-525. 
  2. Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M. Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ. 2008; 337: a1655.
  3. O’Cathain A, Croot L, Duncan E, et al. Guidance on how to develop complex interventions to improve health and healthcare. BMJ Open. 2019;9:e029954.
  4. Prochaska JO, Velicer WF. The transtheoretical model of health behaviour change. Am J Health Promot. 1997; 12(1): 38-48.
  5. Michie S, van Stralen MM, West R. The behaviour change wheel: a new method for characterising and designing behaviour change interventions. Implement Sci. 2011;6:42.
  6. Swaithes L, Campbell L, Fowler-Davis S, Dziedzic K. Top Tips. Implementation for Impact. Council for Allied Health Professions Research. 2019.
  7. Lilford RJ. Financial Incentives for Providers of Health Care: The Baggage Handler and the Intensive Care Physician. NIHR CLAHRC West Midlands News Blog. 25 July 2014.
  8. Lilford RJ. Two Things to Remember About Human Nature When Designing Incentives. NIHR CLAHRC West Midlands News Blog. 27 January 2017.
  9. Lilford RJ. Monumental Study of Service Interventions to Drive up the Quality of Care in Low- and Middle- Income Countries. NIHR CLAHRC West Midlands News Blog. 19 October 2018.
  10. Ferlie E, & Shortell S. Improving the quality of health care in the United Kingdom and the United States: a framework for change. Milbank Quart. 2001; 79(2): 281-315.
  11. BBC News. Making Britain Better. 1 July 1998.
  12. Lilford RJ. How Theories Inform our Work in Service Delivery Practice and Research. NIHR CLAHRC West Midlands News Blog. 21 September 2018.
  13. Lilford RJ. Polycentric Organisations. NIHR CLAHRC West Midlands News Blog. 25 July 2014.
  14. Lilford RJ. Introducing Hospital IT Systems – Two Cautionary Tales. NIHR CLAHRC West Midlands News Blog. 4 August 2017.
  15. Lilford RJ, Skrybant M. Our CLAHRC’s Unique Approach to Public and Community Involvement Engagement and Participation (PCIEP). NIHR CLAHRC West Midlands News Blog. 24 August 2018.
  16. Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The Stepped Wedge Cluster Randomised Trial: Rationale, Design, Analysis, and Reporting. BMJ. 2015; 350: h391.
  17. Lilford RJ. It Really Is Possible to Intervene to Reduce Teenage Pregnancy. NIHR CLAHRC West Midlands News Blog. 14 November 2014.
  18. Resar R, Griffin FA, Haraden C, Nolan TW. Using Care Bundles to Improve Health Care Quality. IHI Innovation Series white paper. Cambridge, Massachusetts: Institute for Healthcare Improvement; 2012.
  19. Lilford RJ, Foster J, Pringle M. Evaluating eHealth: How to Make Evaluation More Methodologically Robust. PLoS Med. 2009; 6(11): e1000186.
  20. Lilford RJ. The MRC Framework for Complex Interventions – The Blind Spot. NIHR CLAHRC West Midlands News Blog. 7 June 2019.

Measuring Things That Are Not Themselves Directly Observable

Much of science concerns concepts, not material entities. We talk easily and glibly about wealth, satisfaction, liberal democracy and metropolitan elites. But in science we need to quantify these types of thing. To paraphrase Galileo: if something is not measurable, make it so!

The ARC WM Director was made very aware of definitional and measurement issues while attending the African Research Collaboration on Sepsis research meeting in Dar es Salaam. The Collaboration is funded by an NIHR Global Health Group grant awarded to Jamie Rylance at Malawi Liverpool Wellcome Research Centre. The meeting covered many fascinating topics. One recurring theme was how to define sepsis. Since 1991, three international conferences have been held to “define sepsis” – the most recent consensus statement (2016) was published in JAMA.[1]

Right off the bat, reading the literature reveals a problem: the measurement task is often referred to as that of finding an operational definition or, worse, simply “a definition”. This is a problem because referring to the measurement task as “defining sepsis” can obscure the fact that there is currently a well specified and seemingly widely accepted conceptual definition of sepsis from Sepsis-3, namely “Sepsis should be defined as life-threatening organ dysfunction caused by a dysregulated host response to infection.”[1] But as noted in the same publication, “There are, as yet, no simple and unambiguous clinical criteria or biological, imaging, or laboratory features that uniquely identify a septic patient.” So, to be clear, virtually all of the arguments and difficulties that have arisen after each consensus conference establishing a conceptual definition are in how to design a measurement procedure, including the selection of a population, a set of observable variables and the mathematical model that combines them. So this got us thinking about measurements of scientific constructs.

A clearly defined conceptual entity that is not directly observable is often referred to as a latent construct or variable.  Building on Bollen and Bauldry,[2] and Hand,[3] three scenarios are possible when defining a measurement procedure for a latent construct, such as ‘sepsis’:

  1. Where a measurable reference category or gold-standard for a latent construct exists, such as the molecular classification of intersex or the chemical classification of endocrine disorders. The reference category is then held to be the observable representation of the construct. Other potentially more easily measured observable features can be assessed directly as to how accurately and precisely they represent the construct through their relationship to the reference category.  
  2. Where theory is “poorly formulated” with regard to how the latent construct exerts its effects, some observable features can be combined in what Hand called a “pragmatic measurement”[3] procedure to produce a measurement that is useful not because you understand what is going on but only to the extent that the pragmatic measures have some ability to predict an outcome of interest, as is the case with the concept of socioeconomic status, the histological grading of tumours, or the APACHE score of acute illness severity. In the absence of a model causally relating the construct to the observed features, the combination of the features into an index can only be said to summarise the observable features rather than represent the underlying construct. In turn, the index is actionable only because of its ability to predict. Finally, as the index is only a summary of observable features, the components of such an index cannot be changed or left out without changing the nature of what is being measured.
  3. Where there is a well-specified formal conceptual definition the task is to identify a pool of exchangeable and observable features that theory would suggest are caused by the construct. By use of a statistical model that includes those observable features the latent variable that causes them can then be identified. Yet, the hypothesised causal relationship between the underlying construct and the observed effects requires a continuing effort to collect evidence supporting the argument that the observed effects are a valid representation of the underlying construct. The example here would be schizophrenia, where the American College of Psychiatrists definition has allowed the science to proceed. A latent social construct (‘this is a schizophrenic’) is hypothesised to predict the observed clinical manifestations that can be measured. This measurement model is itself a theory that remains open to revision or being abandoned entirely, but which still can be employed as a useful tool.
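
To make the first scenario above concrete: when a reference category exists, an easier-to-measure feature can be scored directly against it. The sketch below is illustrative only. The function name and the ten made-up patients are hypothetical, and sensitivity and specificity are just one common way of summarising agreement with a reference standard, not a procedure taken from the sources cited here.

```python
from typing import Dict, Sequence


def accuracy_against_reference(
    candidate: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, float]:
    """Sensitivity and specificity of a candidate measure versus a reference standard."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have the same length")

    pairs = list(zip(candidate, reference))
    true_pos = sum(1 for c, r in pairs if c and r)
    false_neg = sum(1 for c, r in pairs if not c and r)
    true_neg = sum(1 for c, r in pairs if not c and not r)
    false_pos = sum(1 for c, r in pairs if c and not r)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("reference must contain both positive and negative cases")

    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
    }


# Hypothetical data: ten patients classified by a reference test and by a
# simpler bedside feature. These values are invented for illustration.
reference = [True, True, True, True, False, False, False, False, False, False]
candidate = [True, True, True, False, True, False, False, False, False, False]

result = accuracy_against_reference(candidate, reference)
print(result["sensitivity"])  # 0.75 (3 of the 4 reference-positive cases detected)
print(result["specificity"])  # 5 of the 6 reference-negative cases correctly excluded
```

No such direct check is available in the second and third scenarios, which is why they rely on predictive ability or on a continually tested measurement model instead.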

In our opinion the latter ‘third way’ is appropriate for ‘sepsis’. The conceptual definition is not, cannot be, perfect but it is based on broad consensus. Once the conceptual definition has crystallised, science can proceed to develop one or more measurement procedures.  These measurement procedures may well need to be refined or changed in different settings of care. The research may one day yield a reference standard reflecting basic mechanisms; possibly this point is within reach in the case of schizophrenia, where genome-wide association studies have yielded stunning findings.[4] We think this is the approach the sepsis field should follow. It is more profitable than devoting endless effort to attempting to find the holy grail of a reference standard for sepsis. It seems reasonable to accept the JAMA proposal for an operational measurement of the construct. While using it, continue to collect evidence that supports or refutes the theory represented in the measurement model.

Richard Lilford, ARC WM Director; Timothy Hofer, Professor of General Medicine


  1. Singer M, Deutschman CS, Seymour CW, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016; 315(8): 801-10.
  2. Bollen KA, & Bauldry S. Three Cs in Measurement Models: Causal Indicators, Composite Indicators, and Covariates. Psychol Methods. 2011; 16(3): 265-84.
  3. Hand DJ. Measurement theory and practice: the world through quantification. London: Wiley-Blackwell; 2004.
  4. Lilford RJ. Psychiatry Comes of Age. NIHR CLAHRC West Midlands News Blog. 11 March 2016.