Computers which magnify our prejudices

As AI spreads, this will become an increasingly important and controversial issue:

For one British university, what began as a time-saving exercise ended in disgrace when a computer model set up to streamline its admissions process exposed – and then exacerbated – gender and racial discrimination.

As detailed here in the British Medical Journal, staff at St George’s Hospital Medical School decided to write an algorithm that would automate the first round of its admissions process. The formulae used historical patterns in the characteristics of candidates whose applications were traditionally rejected to filter out new candidates whose profiles matched those of the least successful applicants.

By 1979 the list of candidates selected by the algorithms was a 90-95% match for those chosen by the selection panel, and in 1982 it was decided that the whole initial stage of the admissions process would be handled by the model. Candidates were assigned a score without their applications having passed a single human pair of eyes, and this score was used to determine whether or not they would be interviewed.

Quite aside from the obvious concerns that a student would have upon finding out a computer was rejecting their application, a more disturbing discovery was made. The admissions data that was used to define the model’s outputs showed bias against females and people with non-European-looking names.

The truth was discovered by two professors at St George’s, and the university co-operated fully with an inquiry by the Commission for Racial Equality, both taking steps to ensure the same would not happen again and contacting applicants who had been unfairly screened out, in some cases even offering them a place.

There is more here, and I thank the excellent Mark Thorson for the pointer.


The Steve Sailer Effect.

Actually there is already a name for part of it, confirmation bias. You don't look at who you have accepted to prove that you have the correct acceptance criteria. You look at who you didn't accept. The rest is the education conundrum that people in education are revealed to be completely unaware of because they actually believe "people we bestow with our acceptance and grace with our education are by definition successes."

I think the issue is that the prevailing faith amongst our elites has as a central tenet something that contradicts nature. That is the belief that humans, with the proper education, can rise above the natural limits of our biology. In this case, humans like *their* kind and dislike the *not their kind* by default. This sort of behavior is easy to understand as an evolutionary advantage. In fact, it is surprising that they don't see how bias is baked into the cake, which tells you the power of belief to blinker the adherent.

It's a bit rich to call this 'confirmation bias' when the PDF provides zero evidence that the 'discrimination' was not justified.

Notice, for example, that the author, despite being in high dudgeon and ex cathedra mode, offers no real evidence like scores or graduation rates improving after removing the biased program - which is what *should* have happened if the program were filtering out good candidates. Did they fail to look for such evidence? Then that is the literal definition of confirmation bias. Did they look and not find it? Then they are being actively misleading and dishonest.

Quite aside from the obvious concerns that a student would have upon finding out a computer was rejecting their application

Happens all the time with job applications. Might as well already train students to expect it.

For that you'd have to inform them. But employment algorithms are probably not so rigidly based on prior applicants. And if they are then they are wrong. I suspect they are not as wrong because businesses don't have the same incentives as status credentialing and signaling institutions (with a minor in education).

All this goes away, by the way, if there is unlimited capacity in education (MRU!) and a pathway from nursing to doctoring post-grad.

I think one way of summarizing this is that they forgot to program the computer to be politically correct and toss the idea of a meritocracy out the window.

I was impressed that someone was using algorithmic computer screening back in the 1970s!

Basing application success on application success? Y'er doin' it wrong.

I'd call the computer algorithm extremely useful, although they weren't using it optimally. Building an algorithm that is designed to imitate the admissions officers' decisions is a great way to find out what biases the admissions officers have, since it turns their opaque act of "human judgment" into a relatively transparent & straightforward process of multiplying and adding which other people can then evaluate. (Unfortunately, it appears that no one looked at this algorithm very closely for the first several years.)

It is also probably easier to eliminate bias from a computer algorithm than from human judgment. In theory, all you have to do is remove a couple variables from the equation, rather than changing people's patterns of thinking by somehow training them to be non-prejudiced. (Although in practice making an algorithm non-prejudiced could be somewhat complicated, since it may be using other variables as a proxy for race or gender. Still, probably a lot easier than training a bunch of humans.)

Apparently you aren't familiar with the research on statistical prediction rules (e.g., Dawes, 1979). Simple linear prediction models have been found to do better than experts at predictions in many different domains, and to pretty much never do worse than the experts. This is true even when the linear model is designed to imitate the expert (fit to match the expert's judgments, rather than the actual outcome data).

Dawes, 1979. The Simple Beauty of Improper Linear Models in Decision Making

Take note, I did not say the computers weren't doing exactly what they were programmed to do.

"Basing algorithmically calculated application success on past manual application success" is more like it.

That is what I question. A human will factor in "oops" we don't have enough aggregate females even subconsciously. A computer will just look at the chance of success of past applicants. And as it iterates it could easily converge toward zero.

"and to pretty much never do worse than the experts"

Please temper your enthusiasm for freshman level prediction models.

In domains where expertise has value (eg. chess, physics) using linear models for prediction is rubbish.

In domains that are extremely complex and where experts don't do so well, doing as bad as the experts is not something to write home about.

The point is that they are using the wrong model. They should base application success on some metric of studies success (grades, for example). Simply basing the model on the human-based application success just transfers the problems.

They ought to program it to admit a certain number randomly each year, track the long-term success of those so admitted, and then look at the characteristics of these random successes to determine admissions in the future.

"The admissions data that was used to define the model’s outputs showed bias against females and people with non-European-looking names."

Isn't it easier to eliminate gender and name as inputs to the algorithm than, say, to hide gender and ethnicity from a human interviewer making a subjective judgement? I understand that it would still be possible that the algorithm would pick up on a proxy for gender or ethnicity. However, it seems easier to prevent computers from considering discriminatory inputs like gender and race than to prevent humans from doing the same. So, by explicitly determining the inputs to the algorithm, one would think that using an algorithm would actually reduce the influence of prejudices. At worst, it seems like AI trained to match human results would reflect, rather than "magnify", human prejudices.

So, do you just get the same rejection rate, or does that rate increase? Let's think about how we could make the latter clearly true. Let's say a human rejects someone because they are female. Well, that female may have other things in their resume that other females also have. Maybe their extracurriculars are more likely to be in music compared to males. Well, the computer only knows that she was rejected, not that she was only marginally rejected for being female, so it might assign a relatively negative score to music. So, rather than getting just a -1 for female, she might get a -1 for female AND a -1 for music.
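The double-penalty scenario above can be sketched with a toy calculation. Everything here is hypothetical: the counts, the feature names, and the crude "admit rate with the feature minus admit rate without it" weighting, which is an assumption chosen for illustration and not a description of the St George's program. In this made-up history the human panel penalises only sex and never looks at music, yet a scorer that weighs each feature on its own still ends up with a negative weight on music.

```python
# Hypothetical past decisions: (is_female, does_music, admitted_by_human).
# By construction the panel penalises only sex; music never affects a decision.
history = (
    [(1, 1, 0)] * 30 + [(1, 1, 1)] * 30    # women with music: 50% admitted
    + [(1, 0, 0)] * 10 + [(1, 0, 1)] * 10  # women without music: 50% admitted
    + [(0, 1, 0)] * 2 + [(0, 1, 1)] * 8    # men with music: 80% admitted
    + [(0, 0, 0)] * 6 + [(0, 0, 1)] * 24   # men without music: 80% admitted
)

def admit_rate(rows):
    """Fraction of the given rows that were admitted."""
    return sum(r[2] for r in rows) / len(rows)

def naive_weight(feature_index):
    """Admit rate with the feature minus admit rate without it,
    ignoring every other feature (the assumed, deliberately crude rule)."""
    with_f = [r for r in history if r[feature_index] == 1]
    without_f = [r for r in history if r[feature_index] == 0]
    return admit_rate(with_f) - admit_rate(without_f)

w_female = naive_weight(0)
w_music = naive_weight(1)

print(f"weight on female: {w_female:+.3f}")
print(f"weight on music:  {w_music:+.3f}")
# A woman who lists music is docked by both weights, although the panel
# only ever penalised the first attribute.
print(f"combined penalty: {w_female + w_music:+.3f}")
```

A model that estimates the weights jointly (for example, a regression with both variables included) would not make this particular mistake on this data, which is one reason the choice of fitting method matters as much as the choice of inputs.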

I think this could be summarized in that if the computer does not contain feedback to make sure the aggregate female acceptance rates are consistent, then applying acceptance rates individually could mathematically converge towards zero. But I'm not a programmer or statistician.

Why do you assume they were inputs? The racist algorithm (proto-Skynet) had to Google them for itself!

Right, this was my first thought, too. I assume it used some sort of Bayesian classifier, so it's not clear why they would include name and sex as variables in the first place.

Why? Why? It can't be bargained with. It can't be reasoned with. It doesn't feel pity, or remorse, or fear. And it absolutely will not stop, ever, until you are dead.

There are only two possibilities:

1) These are variables, and the "fix" is really easy.

2) These aren't variables, but the "affected" groups just aren't up to snuff on the things the university believes it wants in students.

I find your title prejudiced. Why shouldn't this be "Computers which confirm our prejudices" ?

I agree. I think we have an awful lot of bias clouding what should be a fairly obvious and instructive example here.

Details are scant, but I think there are a few assumptions that can be safely made here. From the link:

"Details of each candidate were obtained from his or her University Central Council for Admission (UCCA) form, but since this contains no reference to race this was deduced from the surname and place of birth. The computer used this information to generate a score which was used to decide which applicants should be interviewed. Women and those from racial minorities had a reduced chance of being interviewed independent of academic considerations."

I'm willing to assume that the computer was not guessing the races of the applicants based on their names. It seems to have selected candidates based upon the data in the admissions form, nothing more.

Rather than presenting us with evidence that the humans had embedded bias in their judgements, it appears to be quite the opposite - that minorities were underrepresented based on objective criteria, rather than the familiarity of their surnames. I find it remarkable how willingly this has been twisted around to show that the programmers have somehow imbedded a hidden racism in their algorithm.

If you both had thought this through a second more you would realize this was a "magnification" of the prejudice, because the algorithm's input set and output set were not decided based on performance, but on previous admissions.

i.e. if you'd had an algorithm that took in application inputs and predicted performance metrics (grades, matriculation rate, placement, etc), then used such metrics to reinforce its predictions, that would be ok. But this looked at who was admitted after passing through the algorithm and reinforced itself on that. So any bias would be magnified over time...
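Whether a bias grows or merely persists depends on how the model is retrained. The sketch below shows one assumed update rule under which it grows: two groups with identical ability, a fixed 20% admit rate overall, and a penalty on group B that is increased each round by the gap in the model's own admit rates. The group labels, the starting penalty, the learning rate, and the update rule itself are all hypothetical and are not taken from the BMJ account.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # both groups share the same ability distribution

def admit_rates(penalty_b, overall_rate=0.2):
    """Admit rates for groups A and B (equal-sized pools) when B's scores are
    shifted by penalty_b and the cutoff admits overall_rate of all applicants."""
    lo, hi = -10.0, 10.0
    for _ in range(100):  # bisection on the score cutoff
        cut = (lo + hi) / 2
        rate_a = 1 - STD_NORMAL.cdf(cut)
        rate_b = 1 - STD_NORMAL.cdf(cut - penalty_b)
        if (rate_a + rate_b) / 2 > overall_rate:
            lo = cut  # too many admitted: raise the cutoff
        else:
            hi = cut
    return rate_a, rate_b

def simulate(rounds=10, initial_penalty=-0.1, learning_rate=1.0):
    """Return group B's admit rate in each round of self-reinforcement."""
    penalty = initial_penalty
    trace = []
    for _ in range(rounds):
        rate_a, rate_b = admit_rates(penalty)
        trace.append(rate_b)
        # Assumed update rule: the gap in the model's OWN decisions is treated
        # as fresh evidence about group B and added to the existing penalty.
        penalty += learning_rate * (rate_b - rate_a)
    return trace

trace = simulate()
for round_no, rate in enumerate(trace, start=1):
    print(f"round {round_no:2d}: group B admit rate = {rate:.3f}")
```

If the penalty were instead refit against an outcome the model does not control, such as grades, the loop would be broken, because the model's own decisions would no longer count as evidence.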

Nice try though. Easier to blame shrill liberals than to think things through. btw it's spelled "embedded"

How's that flux capacitor coming?

I'm not defending their admissions criteria. I'm pointing out that the evidence shows a _lack_ of racial bias in applying it.

I'm glad you noticed the misspelling though.

I note that the original BMJ article is from 1988. Thankfully, the excellent Guardian's hot shot journalists were right on the ball, reporting the story a mere 25 years later.

News, hot off the press...

Surely a bias against women isn't a prejudice, but entirely rational in this case? I don't know what they use for admissions in Britain, but if you look at the US, women have significantly lower MCAT scores than men.

On the other hand, we do have an even better proxy for MCAT scores. MCAT scores.

So Anon. here's your data for GMATs and GPAs by gender [pdf]: Yes, the scores of male applicants, on average, are higher than female applicants ... probably even a statistically significant difference. But using gender in place of GMAT scores would be a huge loss of individual information. (This was Andrew's point too.)

Oh, certainly. But that assumes there are no interaction effects, while in reality we know that there are. Women receive disproportionately higher grades when grading is subjective, for example. So given two identical candidates the male is going to be, on average, better, and the model would "correctly" be "prejudiced". See for example.

I think we need to discuss the term "better" ... the paper you linked to discusses the behavior of male students in addition to their cognitive skills. Recall this article is about identifying promising candidates to become a doctor. Yes, there are important cognitive skills to screen for but behavior is not a trivial aspect of the job. Soft skills are important, especially in service professions, like medical services. I am not disputing that statistical discrimination could arise (where gender proxies for the underlying traits that should sort candidates), but your MCAT score differences by gender, objectively scored, do not strike me as compelling evidence of your hypothesis. If women got a bunch of puffball grades in school, how are they so close on the MCATs?

Anon.'s point, obviously, is that they're NOT using gender in the place of MCAT scores. They're just using the actual MCAT scores, and women perform worse on that metric, so the algorithm gets accused of being "biased against women".

I'd be surprised if the "bias" were unconditional of test scores. And the diffs in MCAT scores are modest ... a significant fraction of female applicants have higher scores than male applicants. Bias is when those women don't tend to get offers.

Are you dense? The "bias" is in the result, not the algorithm. It's typical grievancemongering legerdemain. It doesn't say ANYWHERE that wymyn with higher MCAT scores were being rejected while men with lower scores were being admitted. That's your own personal grievancemongering affirmative-action inferiority complex talking. Clearly getting the best people in the most important jobs is a distant second priority to affirmative-action bean counting in a world where FEEEEEEEEELINGS trump results. Who cares if a doctor is any good at medicine or not, there must be some "soft skills" or some other unfalsifiable nonsense that we can throw in to justify quotas and bonus points so we get a result that doesn't hurt Claudia's FEEEEEEEEELINGS.

No, I'm not dense. THANKS for chatting.


Go see my discussions on Asians and UC college and note that my position is 100% consistent.

We know that women are better team members, or we know in as much as we can know from the studies that have been done. So, if MCAT doesn't capture that kind of thing and the algorithm - and Tyler is referring to AI you should keep in mind - over-emphasizes MCAT then women could be hindered from the start and then their lack of past success could quite easily, depending on the algorithm, converge towards zero in ways that would eventually shock even the humans that the algorithm was trying to emulate.

Were the computers right or wrong in rejecting them?

As AI spreads, we will have to ask: What if AI develops decision-making rules that are by our standards prejudiced? Do we need to change our standards (they weren't prejudiced after all) or do we change the rules AI works by?
If we change AI's rules, will we encounter conflicts like in 2001's HAL?

They'll just proclaim math is racist.
Butlerian Jihad!

The crucial part is the definition of "showed bias against females." What exactly do they mean by that? With reference to their percent in the population? In the applicant pool? Or benchmarked against a committee of human selectors?

A chap I knew who worked in a vet school said they should have started by rejecting anyone who couldn't spell veterinary.

That school's name should appeal to the more childish among the American readers of this blog.

Ahem, our...figureheads don't go around showing theirs off!

Surely this must have been the very first headline over there, but Harry showed us the real meaning of The Royal 'We'


Know Thyself,

And Those You Mimic

When I was a research assistant, I took a Visual Basic programming class for fun and I remember the teacher constantly, loudly reminding us that "computers are stupid." They do exactly what we tell them to do and nothing more. Whereas my RA, when I now ask him to do something, occasionally points out the logical impossibility of my request, a computer will blithely try to run with my human stupidity. Backward-looking rules are going to bake all kinds of prejudices and trends that ceased to exist into the current decisions, whether the rules are applied by computers or people.

I thought the article was way too alarmist. There is much promise in computer algorithms for sorting information. I suspect it's easier to get algorithms to run forward-looking rules and adapt quickly to circumstances than human decision makers. I do worry that this puts a lot more emphasis on the programmers and the data scientists, who usually sit a few layers beneath the big policy decisions and are more removed from the people that are affected.

Here's a more eloquent piece with some similar points:


Eventually reality cannot be papered over by political correctness. Essentially, this algorithm learnt the rules of the MCAT, leading to disparate impact on applicants. In other words, it found the principal properties of a candidate that predicted success in medical school. Medical school is hard, and you need to be pretty smart and conscientious to make it. Apparently women and some minorities (not Asian-Americans, for example) find it harder.
It is the same in racial differences in IQ, heritability in IQ, for example recently with Jason Richwine. The data and the science are clear, but then there is this enormous effort to paper it over and get rid of the messenger.

I think we are having a communication problem. I am not politically correct (I am here, right?) and I am not a feminist ... being a woman is neither a necessary nor sufficient condition for either attribute. I am a fanatic about empirical inference and that's my problem here. The selected article is pretty poorly written, but when I read "the model’s outputs showed bias against females and people with non-European-looking names" I take that to mean that if a female, non-Euro male, and a Euro male all applied with the same other objective characteristics (MCAT-like score, GPA, etc.), the computer algorithm chose the Euro male. Bias means something that can't be explained away on merit. I have several female friends who are doctors and yes, I know it's a hard course of study ... they reminded me what a cakewalk an econ PhD was all the time. I am also willing to accept that more white men are well prepared to be doctors than women or minorities, but that does NOT mean gender or race should be used to screen out candidates.

Also see Andrew's comment above to mike about the weighting matrix. Further, suppose that the type of applicants or the demands of the job changed over time; then a static, backward-looking matrix could expose or create some serious imbalances in the chosen candidates. There is no reason why a computer algorithm can't be more adaptive than the typical human judge, but that must be a design feature and not taken for granted.

How did computer programs created in the 1970s and 1980s recognize "non-European" names? More to the point, how on Earth could the software analyze names at all? Something about this doesn't sound right.

Think of it like training a spam filter. The spam filter doesn't know what the words in the email mean. All it knows is that if the email contains certain words, it is likely to be spam. And if the email contains different words, it is likely to be genuine. As you classify more and more email, the filter becomes more and more accurate.

You apply the same concept to the admissions filter, and train it in the same way. Eventually it starts identifying names like "Richard" and "David" to be more likely to be successful and "Ahmed" to be less successful. It doesn't know that "Richard" is an English name, and "Ahmed" is a non-English name. All it knows is that admissions with "Richard" are more likely to be approved.
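The spam-filter analogy can be made concrete with a few lines of counting. This is only a sketch of the analogy, not a claim about how the St George's program worked: the decision counts are invented, and the add-one-smoothed log-odds score is an assumed stand-in for whatever a real filter would compute. The scorer never sees any notion of nationality; it only tallies which tokens appeared on accepted and rejected forms.

```python
import math
from collections import Counter

# Hypothetical past decisions: (first name on the form, was the applicant accepted?)
past_decisions = (
    [("Richard", True)] * 8 + [("Richard", False)] * 2
    + [("David", True)] * 7 + [("David", False)] * 3
    + [("Ahmed", True)] * 2 + [("Ahmed", False)] * 8
)

def train(decisions):
    """Count how often each token appeared on accepted and on rejected forms."""
    accepted, rejected = Counter(), Counter()
    for token, was_accepted in decisions:
        (accepted if was_accepted else rejected)[token] += 1
    return accepted, rejected

def log_odds(token, accepted, rejected):
    """Add-one-smoothed log-odds that a form containing `token` was accepted.
    Positive means the token counts in the applicant's favour."""
    return math.log((accepted[token] + 1) / (rejected[token] + 1))

accepted, rejected = train(past_decisions)
for name in ("Richard", "David", "Ahmed", "Unseen"):
    print(f"{name:8s} {log_odds(name, accepted, rejected):+.2f}")
```

A token the filter has never seen scores exactly zero here, so any preference it shows comes entirely from the past decisions it was trained on.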

If you think that these computers were programmed to take peoples' names into account as admissions criteria, you're exactly the kind of idiot the article was trying to sucker.


It's not the names, necessarily, but the correlations. If the computer really is AI and/or a neural network that emulates people perfectly then it would be just as biased as people. Then if its iteration does not have any dampening like people would and it bases acceptance on its past acceptance rates, it could converge toward zero.

IMHO the actual output of a neural network is not the acceptance rates per se, but the weightings that must be rebalanced.

Hi Tyler.

I've been fed up with Liberal economists who think they know everything.

What do you think about a more distributed system of economic thought, whereby people can weigh in through a rational conversational format?

I think it might do some good for our economic system. I think the authorities are unreliable and in some cases, downright deceptive.

Economics is a field where people make lots of money no matter their views. Perhaps we just need to get more people involved in these discussions to make them more relevant to those who don't have money.

It is very odd to see this kind of thing happen, but also very predictable.
