Far from being ‘stamp collecting’, as Ernest Rutherford is said to have claimed, classifying things is central to the scientific enterprise – imagine biology without the Linnaean taxonomy (multi-dimensional classification) of plants, animals and minerals (now plants, animals, fungi, protists, chromists, archaea and eubacteria kingdoms). Or medicine without its nosology. Classification has been the basis of all knowledge, and Rutherford was wrong – for example, astronomy is also built on a classification of stars and planets.
However, classification does not come free of problems. On the contrary, I call it the ‘central dilemma of epidemiology’. For a start, it is a human attempt to organise an underlying (latent) and often disorganised world. That is both its strength and its weakness. By organising the underlying complexity it allows abstractions to be made regarding the organising principles that underlie the phenomena we observe around us. But the price we must pay is that we are often superimposing a classification over an underlying continuum. Thus, astronomical objects like Pluto can be ‘demoted’ and species change from one genus to another. Many health conditions do not fit neatly into one group or another, bearing features of both – think autoimmune disease and mental illness. Of course, any classification system is useful insofar as it leads to new knowledge about underlying mechanisms, and it is quite natural that the process is iterative, such that new classifications emerge – clades in biology rather than the original Linnaean family tree, for example.
But in the practice of epidemiology the issues of groups and subgroups can be a problem, and not just because groups overlap or because misclassification may occur. A problem also arises in the interpretation of observed differences between groups. On the one hand, we do not want to miss important subgroup differences in the effect of an exposure on an outcome. On the other hand, we also want to avoid spurious associations. There are many examples, especially in the context of treatment trials, of subgroup associations that were subsequently overturned.
The usual argument put forward to avoid spurious associations is that only subgroups specified in advance should be considered as a test of an hypothesis – all else is a fishing expedition, the results of which are to be down-weighted.
This is all very well but it just moves the problem from the analysis stage to the design stage. The corollaries are two-fold:
- Any subgroup must be selected on the basis of sound principles – there should be a theoretical model for why the effect of the exposure on the outcome would differ in that subgroup, that is, for an interaction between subgroup and exposure. The statistical subgroup analysis is then designed to strengthen or weaken the credibility of the model. Note that the issue is whether the subgroup modifies the treatment effect on the outcome. A direct effect of the subgroup on the outcome is neither here nor there.
- Since precision is often low in a subgroup, and always lower than in the group as a whole, hypothesis tests are even less appropriate to subgroup effects than to the overall effect. Dichotomising the results into positive and null, and using this dichotomy to make a decision, is always stupid and is risible in a subgroup.
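The two corollaries above can be illustrated with a minimal sketch. It assumes a two-arm trial with a binary outcome and uses simple Wald intervals for the risk difference. The counts, the function names `risk_difference_ci` and `interaction_ci`, and the choice of the risk difference as the effect measure are all hypothetical, chosen only for illustration.

```python
import math


def risk_difference_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Risk difference (treated minus control), its standard error and a Wald interval."""
    p_t = events_treated / n_treated
    p_c = events_control / n_control
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    return rd, se, (rd - z * se, rd + z * se)


def interaction_ci(effect_a, se_a, effect_b, se_b, z=1.96):
    """Difference between two independent subgroup effects, with a Wald interval."""
    diff = effect_a - effect_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se, (diff - z * se, diff + z * se)


# Hypothetical counts: a large subgroup A and a small subgroup B
# with identical event proportions in each arm.
rd_a, se_a, ci_a = risk_difference_ci(30, 200, 50, 200)
rd_b, se_b, ci_b = risk_difference_ci(6, 40, 10, 40)
rd_all, se_all, ci_all = risk_difference_ci(36, 240, 60, 240)

# The quantity of interest for a subgroup claim is the interaction,
# not whether each subgroup separately crosses a significance threshold.
inter, se_inter, ci_inter = interaction_ci(rd_a, se_a, rd_b, se_b)

for label, est, ci in [("A", rd_a, ci_a), ("B", rd_b, ci_b),
                       ("all", rd_all, ci_all), ("A minus B", inter, ci_inter)]:
    print(f"{label}: {est:+.3f} ({ci[0]:+.3f} to {ci[1]:+.3f})")
```

With these made-up counts the two subgroups have the same point estimate, yet only the larger one has an interval that excludes zero. Dichotomising would label one subgroup ‘positive’ and the other ‘null’, while the interaction estimate shows no evidence that the effect differs between them.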
Some subgroups derive from an underlying, if latent, scale. Socio-economic groups, for example, or age. But others are irrevocably categorical. Gender, for example, or rural vs. urban residence. In the former situation – where the group is homologous (scalable) – a small subgroup is not a large problem, because the statistical model can look for a trend across the ordered groups. The situation is more problematic when a small subgroup is not part of a homologous continuum. Any examination in the small subgroup will be imprecise, and the smaller the subgroup the greater the imprecision. Amalgamating it within a larger group makes sense on the basis that ‘it’s better to have a precise answer to a general problem, than an imprecise answer to a specific one.’
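Looking for a trend across ordered subgroups might be sketched as follows. This assumes each subgroup already has an effect estimate and a standard error, and fits an inverse-variance weighted straight line through them (a fixed-effect approach, which is one of several possible choices). The quintile scores, estimates, standard errors and the function name `weighted_trend` are hypothetical.

```python
import math


def weighted_trend(scores, effects, std_errors, z=1.96):
    """Inverse-variance weighted linear trend of effect estimates across ordered groups.

    Treats the standard errors as known, so the variance of the slope
    is 1 / (weighted sum of squared deviations of the scores).
    """
    weights = [1 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, scores)) / w_sum
    mean_y = sum(w * y for w, y in zip(weights, effects)) / w_sum
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, scores))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, scores, effects))
    slope = sxy / sxx
    se_slope = math.sqrt(1 / sxx)
    return slope, se_slope, (slope - z * se_slope, slope + z * se_slope)


# Hypothetical socio-economic quintiles scored 1 to 5; the fifth is a small
# subgroup, so its own estimate is far less precise than the others.
scores = [1, 2, 3, 4, 5]
effects = [-0.02, -0.05, -0.08, -0.11, -0.14]
std_errors = [0.03, 0.03, 0.03, 0.03, 0.08]

slope, se_slope, ci_slope = weighted_trend(scores, effects, std_errors)
print(f"change in effect per quintile: {slope:+.3f} "
      f"({ci_slope[0]:+.3f} to {ci_slope[1]:+.3f})")
```

The small fifth group contributes little weight on its own, but because it sits on a scale with the others, the trend is estimated with far more precision than the effect in that group alone.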
But this logic breaks down if there is a sound theoretical reason to expect a different result in the small subgroup. Grouping trans people with males or with females would be unsuitable for many purposes. In such a situation it is better to have an imprecise answer to a specific question.
Richard Lilford, ARC WM Director