
It is sometimes assumed that the trans population is so small that replacing sex with gender identity will have a negligible impact on data accuracy. However, we currently have no reliable data on the size of the trans population, either in the population as a whole or within sub-groups such as university students, particular occupations, gay, lesbian and bisexual people, and people with particular health conditions. Crucially, it is impossible to predict how the trans population may change over time. The trans population is also unlikely to be evenly distributed, for example by age, sex and geography, so the effects on data reliability are likely to be greater at the sub-group level.
This can have extreme consequences for particular sub-groups. The trans population is growing rapidly, particularly among young females; the reasons for this are not well understood and require investigation.
Social statisticians are typically interested not in a single characteristic but in the intersection of several, e.g. sex, ethnic group, social class and age. One cannot assume that membership of one category has the same effect across sub-groups.
There are clear and obvious distortive effects that can result from small misclassifications of sex data. This can be illustrated using three examples relevant to public policy.
These examples were compiled by Professor Alice Sullivan, Professor Lindsey Patterson, Dr Colin Mills, Dr Amanda Gosling and Dr Nicola Williams and formed part of the expert evidence provided to the High Court in FAIR PLAY FOR WOMEN LTD v THE UK STATISTICS AUTHORITY (CO/715/2021).
Distortive Effects On Criminality Patterns
Males commit most crimes, particularly violent and sexual crimes (see the figure below, taken from page 3 of the Ministry of Justice statistics for 2019).
This means that misclassifying even a small number of males as female can have large distorting effects on female statistics.
For example, HMPPS prison statistics show that approximately 13,000 male sex offenders and 125 female sex offenders are in prison in England and Wales. Data obtained from the MOJ reveal that 74 of the male sex offenders identify as female. Misclassifying these 74 males would have a negligible impact on the male total (a change of about 0.6%), but it would have a huge impact on the female figure, raising it from 125 to 199, an increase of almost 60%.
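The asymmetry is simple arithmetic: the same 74 people are a tiny fraction of one group and a large fraction of the other. A minimal sketch using the approximate figures quoted above; the function name `reclassify_effect` is hypothetical, chosen only for illustration.

```python
def reclassify_effect(source_total: int, target_total: int, moved: int) -> dict:
    """Recorded counts and percentage changes when `moved` people are
    recorded in the target group instead of the source group."""
    return {
        "source_after": source_total - moved,
        "target_after": target_total + moved,
        "source_change_pct": -100 * moved / source_total,
        "target_change_pct": 100 * moved / target_total,
    }


# Approximate HMPPS / MOJ figures quoted in the text.
effect = reclassify_effect(source_total=13_000, target_total=125, moved=74)
print(f"Male count:   {effect['source_after']} ({effect['source_change_pct']:.2f}%)")
print(f"Female count: {effect['target_after']} (+{effect['target_change_pct']:.1f}%)")
```

Run as written, this reports a male count of 12926 (a change of -0.57%) and a female count of 199 (+59.2%).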
Such misguided data collection leads in turn to incorrect conclusions and can result in bad policy decisions. Even a small proportion of male offenders migrating into the female offender pool can give the impression that violent and sexual crimes are increasing among women.
This was demonstrated most starkly after a recent BBC radio documentary claimed that “Between 2015 and 2019, the numbers of reported cases of female-perpetrated child sexual abuse to police in England and Wales rose from 1,249 to 2,297 – an increase of 84%.” Alarming newspaper headlines followed, such as “Number of female paedophiles nearly doubles”. Nowhere was it mentioned that police now record crimes according to gender identity rather than birth sex. Much of that increase could be due to males who had been misclassified as female. The point is that we will never know, because birth sex was not recorded alongside gender identity.
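The recorded figure is a single number that mixes two quantities that are no longer recorded separately. In the minimal sketch below, only 1,249 and 2,297 come from the quoted claim. The two decompositions of the 2019 figure are hypothetical extremes, and the second assumes, for simplicity, that no males were recorded as female in 2015. Both produce exactly the same recorded total.

```python
def pct_increase(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return 100 * (after - before) / before


recorded_2015, recorded_2019 = 1_249, 2_297
print(f"Recorded increase: {pct_increase(recorded_2015, recorded_2019):.0f}%")

# Hypothetical splits of the 2019 figure into
# (female perpetrators, males recorded as female).
# Without birth sex, the recorded data cannot tell these apart.
scenarios = {
    "all growth among females": (2_297, 0),
    "no growth among females": (1_249, 1_048),
}
for label, (females, misclassified_males) in scenarios.items():
    print(f"{label}: recorded total = {females + misclassified_males}")
```

The first line printed is the 84% increase from the quotation; both scenarios then print a recorded total of 2297.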
Perceived changes in the levels of male violence against women, compared with recorded violence by “women”, can be used to justify removing sex segregation. Fair Play For Women set out its concerns about this in the recent Home Office consultation on violence against women and girls. The removal of sex segregation is already happening in facilities such as “gender-neutral” toilets and changing rooms, both in public facilities such as local authority-run sports centres and in workplaces. Those who wish to erode sex boundaries use data to suggest that men and women face equal risks, and that sex segregation is therefore unnecessary. However, males and females present very different risks, and we therefore segregate everyone by sex in order to reduce access for the few who are a threat to others.
Distortive Effects On Education Patterns
A current UK policy concern is increasing the proportion of women among higher education students who take degrees in physical science or engineering. There has been slow growth both in the number of women studying these subjects and in their share of students in them.
The Higher Education Statistics Agency reports the following numbers of first-year students in these subject areas in 2016 and 2018.
However, plausible rates of males self-identifying as female could swamp these trends. What looks like increasing female participation could in fact be falling female participation, concealed by male self-identification as female.
Research in Sweden has indicated that, among people aged 22-29, 6.3% would like to ‘live as or be treated as someone of a different sex’. That rate is relevant to higher education students, who are predominantly in that age range. Suppose that in 2018 the true numbers of women and men in the tables above were the same as in 2016, but 6.3% of males were misclassified as female. The number of students recorded as women would then be 10,908 in physical science and 9,158 in engineering, greater than the number of women actually recorded in 2018. In other words, plausible rates of misclassification based on gender self-identification could account for the entire growth in female participation in these subjects, or even more.
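The scenario above amounts to a one-line calculation: recorded women equal true women plus the misclassified share of men. The sketch below uses round hypothetical counts (10,000 women and 40,000 men), not the HESA figures. Like the scenario in the text, it ignores females recorded as male. The function name is illustrative.

```python
def recorded_counts(true_women: int, true_men: int, rate: float) -> tuple[float, float]:
    """Return (recorded women, recorded men) when a fraction `rate` of
    males is recorded as female. Females recorded as male are ignored,
    matching the scenario described in the text."""
    moved = rate * true_men
    return true_women + moved, true_men - moved


# Hypothetical counts for illustration only (not HESA data).
women, men = recorded_counts(true_women=10_000, true_men=40_000, rate=0.063)
print(f"Recorded women: {women:,.0f}, recorded men: {men:,.0f}")
```

With these made-up inputs, the recorded number of women rises from 10,000 to 12,520 without any change in the true number of women.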
Other studies report varying rates of identification with a different sex, and most show that for this age group the rate of males identifying as female is lower than the rate of females identifying as male. But even a male-to-female self-identification rate as low as 1.5% (instead of the 6.3% postulated here) would be enough to account for the whole of the recorded growth between 2016 and 2018, and most studies show the male-to-female rate to be higher than this. In short, replacing sex with gender identity would very likely make it impossible to assess whether policies aimed at increasing female participation in these subjects were having an impact.
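The break-even rate follows from the same reasoning. Write $W_{2016}$ and $M_{2016}$ for the numbers of women and men in a subject in 2016, treated as accurate, and $W^{\mathrm{rec}}_{2018}$ for the number of women recorded in 2018. Assume, as in the scenario above, that the true 2018 counts equal the 2016 ones and that only males are misclassified. The recorded number of women in 2018 is then $W_{2016} + r M_{2016}$; the 10,908 and 9,158 figures correspond to $r = 0.063$. The rate $r^{*}$ at which misclassification alone would reproduce the whole recorded growth is:

$$
\begin{aligned}
W_{2016} + r^{*} M_{2016} &= W^{\mathrm{rec}}_{2018} \\
r^{*} &= \frac{W^{\mathrm{rec}}_{2018} - W_{2016}}{M_{2016}}
\end{aligned}
$$

Any rate at or above $r^{*}$ is enough to account for the whole of the recorded growth. The 1.5% figure above is the statement that $r^{*}$ is no more than about 1.5% for these subjects.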
Distortive Effects On Employment Patterns
Many occupations are predominantly male. For example, fewer than 1% of carpenters and vehicle technicians are female. Misclassifying a small number of males as female would barely affect the male totals, but it could make a big difference to the female totals and obscure a problem with poor accessibility of some careers for women. This would be particularly true where low levels of female representation coincide with high numbers of males who identify as women in particular sectors of the labour market, such as IT. This type of equality monitoring is used to identify when and how women are discriminated against. If analysts and other interested parties cannot identify the problem, policy solutions will not materialise, or those that do will be considerably less effective.
Similar concerns apply to gender pay gap data. Companies currently report pay gap data based on the self-declared sex of employees, and non-binary people are excluded from the data set. The highest-paid transgender people are likely to be late-transitioning males in established careers, whereas the lowest-paid are likely to be young females who identify as male or non-binary. This differential means that in some careers, misclassification of highly paid males could distort and obscure the sex pay gap. Whilst some commentators argue that this employment discrimination is based on perceived sex or gender identity rather than birth sex, without data on all of these characteristics it is difficult, if not impossible, to undertake adequate studies to determine what drives such discrimination.
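A toy sketch of how recording one highly paid male as female can shrink a measured gap. Here the gap is the difference in mean pay as a percentage of men's mean pay. The hourly rates are hypothetical, invented purely for illustration, and the function name is our own.

```python
from statistics import mean


def mean_pay_gap(men_pay: list[float], women_pay: list[float]) -> float:
    """Difference in mean pay as a percentage of men's mean pay."""
    return 100 * (mean(men_pay) - mean(women_pay)) / mean(men_pay)


# Hypothetical hourly pay (illustrative only).
men = [50.0, 60.0, 70.0, 80.0]
women = [40.0, 45.0, 50.0]
print(f"Gap with sex recorded:      {mean_pay_gap(men, women):.1f}%")

# The highest-paid male is recorded as female instead.
men_after = sorted(men)[:-1]
women_after = women + [max(men)]
print(f"Gap after reclassification: {mean_pay_gap(men_after, women_after):.1f}%")
```

In this made-up example, moving a single person cuts the measured gap from about 30.8% to about 10.4%, although no individual's pay has changed.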
Census data is important in allowing us to track changes over time, including in the labour market, such as sex segregation by sector, commuting times and hours of work. Census data is ideal for this work because the sample is large enough to be broken down into particular occupations or industries for each region and for each age. The census is unique precisely in being a census, covering the entire population, rather than a survey, which covers only a fraction of it. No other source has enough data to be broken down in this way: particularly at local-area level, the cell sizes (numbers in each group) become too small to be reliable.
Consider a researcher interested in how the segregation of women into different occupations differs across regions. The measure of segregation used is the well-known Duncan Segregation Index. The data are from the 2001 census, series “CS033”, extracted using the NOMIS app.
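For reference, the Duncan index (also called the index of dissimilarity) is conventionally defined as follows. Here $m_i$ and $f_i$ are the numbers of men and women in occupation $i$, and $M$ and $F$ are the totals across all occupations. $D$ runs from 0 (identical occupational distributions) to 1 (complete segregation). It can be read as the share of either sex that would have to change occupation for the two distributions to match.

$$
D = \frac{1}{2} \sum_{i} \left| \frac{m_i}{M} - \frac{f_i}{F} \right|
$$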
The researcher can compute an upper bound on the segregation index by moving 1% of people from the under-represented to the over-represented sex within each occupation, and a lower bound by doing the reverse. (See the figure below, kindly generated by Dr Amanda Gosling, University of Kent.)
While clear differences between London and the rest of the country emerge, the uncertainty levels for all regions bar London are unacceptably high. That is to say, the differences between those regions are smaller than the uncertainty in each number that arises from a 1% misclassification of sex. This means we cannot know which regions have more or less sex segregation in employment because the range for each region is so wide compared to the differences between them.
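The bounding exercise can be sketched in a few lines. This is a minimal illustration, not the code behind the figure, and it rests on three assumptions. "Under-represented" means the sex with the smaller share of its own total ($m_i/M$ versus $f_i/F$) in that occupation. "1%" means 1% of the occupation's total employment. The data are two hypothetical regions with two occupations each, not census data. All function names are hypothetical.

```python
def duncan_index(men: list[float], women: list[float]) -> float:
    """Duncan index: D = 0.5 * sum_i |m_i/M - f_i/F|."""
    total_m, total_f = sum(men), sum(women)
    return 0.5 * sum(abs(m / total_m - f / total_f) for m, f in zip(men, women))


def shift_sex(men, women, share, increase_segregation):
    """Move `share` of each occupation's workers between the sexes.

    increase_segregation=True moves people from the under-represented to the
    over-represented sex (upper bound); False does the reverse (lower bound).
    """
    total_m, total_f = sum(men), sum(women)
    new_men, new_women = [], []
    for m, f in zip(men, women):
        delta = share * (m + f)
        women_under = f / total_f < m / total_m
        # Women -> men for the upper bound where women are under-represented,
        # or for the lower bound where men are; otherwise men -> women.
        if women_under == increase_segregation:
            moved = min(delta, f)
            new_men.append(m + moved)
            new_women.append(f - moved)
        else:
            moved = min(delta, m)
            new_men.append(m - moved)
            new_women.append(f + moved)
    return new_men, new_women


def segregation_bounds(men, women, share=0.01):
    """Return (lower bound, point estimate, upper bound) for the Duncan index."""
    lower = duncan_index(*shift_sex(men, women, share, increase_segregation=False))
    upper = duncan_index(*shift_sex(men, women, share, increase_segregation=True))
    return lower, duncan_index(men, women), upper


# Hypothetical regions: (men, women) counts in two occupations.
regions = {
    "Region A": ([800, 200], [200, 800]),
    "Region B": ([790, 210], [210, 790]),
}
for name, (men, women) in regions.items():
    lo, d, hi = segregation_bounds(men, women)
    print(f"{name}: D = {d:.2f} (range {lo:.2f} to {hi:.2f})")
```

For these made-up regions the point estimates are 0.60 and 0.58, but the ranges are 0.58 to 0.62 and 0.56 to 0.60. The ranges overlap, so the two regions cannot be ranked. This mirrors the situation described above for every region except London.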
Not being able to show a difference is not the same as showing that there is no difference. This example demonstrates how even small errors in a variable can have large effects on our ability to understand the data.