Too Good to Be True

In ancient Israel a court of 23 judges called the Sanhedrin would decide matters of importance such as death penalty cases. The Talmud prescribes a surprising rule for the court: if a majority votes for death, then death is imposed, except, “If the Sanhedrin unanimously find guilty, he is acquitted.” Why the peculiar rule?

In an excellent new paper, Too Good to Be True, Lachlan J. Gunn et al. show that more evidence can reduce confidence. The basic idea is simple. We expect that in most processes there will normally be some noise, so the absence of noise suggests a kind of systemic failure. The police are familiar with one type of example. When the eyewitnesses to a crime all report exactly the same story, that reduces confidence that the story is true. Eyewitness stories that match too closely suggest not truth but a kind of systemic failure, namely that the witnesses have collaborated on telling a lie.

What Gunn et al. show is that the accumulation of consistent (non-noisy) evidence can reverse one’s confidence surprisingly quickly. Consider a police lineup, but now consider a more likely cause of systemic failure than witness conspiracy. Suppose that there is a small probability, say 1%, that the police arrange the lineup, either on purpose or by accident, so that the “suspect” is the only one who comes close to matching the description of the criminal. Now consider what happens to our rational (Bayesian) probability that the suspect is guilty as the number of eyewitnesses saying “that’s the guy” increases. The first eyewitness to identify the suspect increases our confidence that the suspect is guilty, and our confidence increases when the second and third eyewitnesses corroborate, but when a fourth eyewitness points to the same man our rational confidence should actually begin to fall.

Even though the systemic failure rate is only 1%, that small probability starts to weigh more heavily the more consistent (less noisy) the evidence becomes. The red line in the graph at right shows (using a 1% systemic failure rate and realistic probabilities of eyewitness identification) that after 3 witnesses more evidence decreases our confidence, and when more than 10 witnesses identify the same suspect we should be less certain of guilt than when one witness identifies the suspect! The yellow line shows how certainty increases when there is no possibility of systemic failure, which is what most people imagine is the case. Notice from the green line that even when the probability of systemic failure is tiny (0.01%) it begins to dominate the results quite early.
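The reversal is easy to reproduce in a few lines. Here is a minimal sketch, assuming a 50% prior, a 1% chance of a biased lineup (in which every witness picks the suspect regardless of guilt), and illustrative identification rates for a fair lineup; these rates are assumptions for the sketch, not the paper's exact figures:

```python
# Sketch of the "too good to be true" effect. Assumed parameters:
#   prior       - P(guilty) before any identifications
#   p_fail      - probability the lineup is systemically biased,
#                 in which case every witness picks the suspect
#   p_id_guilty - P(witness identifies suspect | guilty, fair lineup)
#   p_id_innoc  - P(witness identifies suspect | innocent, fair lineup)
# The identification rates below are illustrative, not Gunn et al.'s values.

def posterior_guilt(n, prior=0.5, p_fail=0.01,
                    p_id_guilty=0.48, p_id_innoc=0.12):
    """P(guilty | n unanimous identifications of the suspect)."""
    # Likelihood of unanimity is a mixture: a biased lineup (all agree
    # regardless of guilt) or a fair lineup (independent witnesses).
    like_guilty = p_fail * 1.0 + (1 - p_fail) * p_id_guilty ** n
    like_innoc = p_fail * 1.0 + (1 - p_fail) * p_id_innoc ** n
    num = prior * like_guilty
    return num / (num + (1 - prior) * like_innoc)

for n in (1, 2, 3, 4, 5, 10):
    print(n, round(posterior_guilt(n), 3))
```

With these numbers the posterior peaks around the third witness and then drifts back toward the 50% prior, because unanimity becomes better explained by a biased lineup than by independent honest witnesses.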

What matters is not that the probability of systemic failure is tiny but how it compares to the probability of consistency which, with any reasonable estimate of noise, is itself getting tinier and tinier as evidence accumulates. In another application, the authors show how even the minuscule probability of a stray cosmic ray flipping a bit in machine code can materially reduce our confidence in common cryptographic procedures.

In summary, the peculiar rule of the Talmud receives support from Bayesian analysis: too much consistency suggests systemic failure.


Or put another way: people aren't rational. In particular, they follow the herd, whether in determining guilt or innocence or in making investments. I was fortunate to have represented a real estate contrarian: when other developers (of industrial real estate) were selling, my client was buying, and when other developers were buying, he was selling. It helped that he purchased much of his inventory of land from the Indians (that was the joke among his competitors). It smoothed out my workload and my income (when I mostly represented real estate developers and investors). My contrarian client died many years ago and I found another way to smooth out my workload and income: health care transactions. Am I rational?

And more importantly, people love to spout about how much smarter they are than other people.

AT is right that the unanimity rule protects against all sorts of systemic failures, but the one that leapt to my mind was based on rationality. Suppose the accused is very unpopular for some reason. People want to condemn him regardless of his guilt -- but even more importantly, they don't want to be singled out as his defender. Now every judge has a rational incentive to condemn the accused *even if they know that the result will be that he gets off*.

> Or put another way: people aren’t rational.

Maybe for the Talmud and police eyewitness examples. But it looks like this finding can apply to all evidence (in the Bayesian sense), not just that relating to human behavior.

"realistic probabilities of eyewitness identification"

Forgive me, but aren't the priors contingent upon how much the suspects resemble each other?

Implications from the 97% consensus, anyone?

When "4 out of 5 dentists agree ... " we take that as a good recommendation. If it were "5 out of 5" we would say baloney sandwich.

When the vote goes 79% - 21% for some candidate we are inclined to presume it was fair and represents a popular mandate. And speaking of 97%, when his excellency and Great Father of his People, Turkmenistan President Gurbanguly Berdimuhamedov gets 97% of the vote, then the enemies of mankind suspect the dictator's dilemma.

Great Father of his People, Turkmenistan President my ass:

An independence referendum was held in Macedonia on 8 September 1991.[1] It was approved by 96.4% of votes, with a turnout of 75.7%.[2]

And you win the thread.

What is surprising to me, at least, is that so many people find this result surprising when reading of it, despite the fact we have that old adage that is the blog entry title.

+1 you made me laugh out loud. Great comment!

To be fair though, it just means that they asked the wrong question.

Probably about the same implications as the 99.99% consensus that the Earth is round...

It's an oblate spheroid

"Round (adjective): shaped like or approximately like a sphere."
Sounds like an oblate spheroid is round to me...

Oblate spheroids are round. (And violets are blue.) I wonder what induced the shift in the literature from ovary and planetary ellipsoids of a century ago to the prolate and oblate spheroids of today. I find ellipsoids in texts from the 1920s and 1930s. Batchelor's 1967 Fluid Dynamics is between the nomenclature of then and now with "prolate ellipsoids" and "oblate ellipsoids."

I was taught that the Earth is shaped like a geoid. A geoid is a geometric solid shaped like the Earth.
I am not joking.

This was an excellent find. Simple and thought-provoking. If I didn't have to go to work, I'd post a bit more in-depth about possible policy implications/advice combinations from Wisdom of Crowds.

So there are no objective facts/truths that humans can all agree upon because unanimity indicates falsity ??

This alleged 'lesson' has sharply limited applications that were ignored. Seems more a general caution on human fallibility, but we have swarms of politicians and news people as daily reminders of that.

"So there are no objective facts/truths that humans can all agree upon because unanimity indicates falsity ??"

Objective yes, but subjective is the key. If you personally observe something and 9 other people observe and agree with you, no internal alarm is rung. However, if you don't personally observe something and all the people who agree come from a relatively small select group, then there's an ever increasing probability that group bias might be affecting the consensus.

The surprise to this is even more interesting than the subject. In almost every endeavor things are very complex, and only silly people see things in black and white. Even 'objective truths' are very limited, the objective truth usually ending up being the beginning of a very long story with very little objective evidence except the seed idea.

Global warming is a great example of this. Starting from the energy characteristics of CO2 in the atmosphere we are told we need to subsidize third world countries who later this century are going to be affected by rising water levels. Where does the 'objective truth' end and an enormous amount of speculation and modelling begin? Quite early in the whole chain of events.

This is a very complex situation where any calculation or conclusion is going to be dwarfed by the errors in calculation.

And anyone that proclaims certainty about the solution is immediately suspect, and probably should get out a bit more.

Denialists are completely left out of the discussion of what to do because they are too busy denying basic science to play any role whatsoever in discussing the suitable response to AGW.

Since the denialists are heavily concentrated in the same ideological groups who are ideologically disinclined to intervention and transfers, their opposition to these approaches gets a very poor hearing because, well, as I said, they're too busy denying the basic science to have any credibility when it comes to discussing the appropriate solution.

Well, the fine print on the graph shows that the assumed prior was 50%. For other applications (I was thinking of scientific theories such as QM and GR that never seem to fail a test, but then by the premises of the article must generate suspicion based on their very success!) the prior must be substantially larger.

Of course at that point the chart might then say "it is impossible to reach 99.95% certainty..."

+1. The 50% prior is doing a lot of work here. In the criminal justice context, where the prior ought to be considerably closer to guilt (even in a "Making a Murderer" context), the posterior for even fairly hefty unanimous opinions can rise considerably above 95%. That said, the use of a "probably guilty" prior here is bound to be controversial in the context of a presumption of innocence. But if we allow the eyewitness testimony to be calculated after we have adjusted the prior for the other evidence in the case it might not be too bad.
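The prior's influence is easy to see in a toy version of the unanimity model (self-contained here; the failure rate and identification rates are assumptions for illustration, not the paper's figures):

```python
# Toy unanimity model: P(guilty | n unanimous identifications).
# p_fail is the assumed chance of a biased lineup in which every
# witness picks the suspect regardless of guilt; the fair-lineup
# identification rates are illustrative assumptions.
def posterior_guilt(n, prior, p_fail=0.01,
                    p_id_guilty=0.48, p_id_innoc=0.12):
    like_guilty = p_fail + (1 - p_fail) * p_id_guilty ** n
    like_innoc = p_fail + (1 - p_fail) * p_id_innoc ** n
    num = prior * like_guilty
    return num / (num + (1 - prior) * like_innoc)

# With many unanimous witnesses the posterior collapses back toward
# the prior, so the prior ends up doing most of the work.
for prior in (0.5, 0.8, 0.95):
    print(prior, round(posterior_guilt(20, prior), 3))
```

At 20 unanimous witnesses the posterior is essentially the prior again, which is why starting from "probably guilty" rather than 50/50 changes the conclusion so much.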

I've been waiting for your review of Netflix's "Making a Murderer" - is this it?

+1, you stole my thunder!
I was going to say: Professor Tabarrok finally watched Making a Murderer and this is his meta-critique.

I'm puzzled by the strategic implications of the Talmudic rule. I believe that I am the only judge of the 23 who thinks the defendant is innocent. So I vote for his guilt with the intended result that the defendant goes free. Except that one of the other judges who believes him to be guilty also believes that I, despite my belief in the defendant's innocence, will vote for his guilt. So that judge votes for the defendant's innocence to preclude unanimous support for the defendant's guilt and, thereby, secures the defendant's execution. And so on with all sorts of strategic complexities involving, e.g., judges masking their own beliefs about guilt or innocence.

Are the judges allowed to coordinate?


In practice the rule probably didn't make much difference in the results. But like the Official Opposition arrangements in parliaments, it institutionalizes opposition to the majority. In many, many spheres today the desire is to demonize anyone who doesn't agree with whatever is the fashion of the moment, a profoundly dangerous impulse; thankfully, even against institutionalized and broad consensus, opposing voices find a way, and the tyrannical impulse ends up discrediting the side that uses it over time.

Like most rules it doesn't work quite as well as one would want, but the opposite being a desire for unanimity and the stomping of opposing voices works out far far worse.

It could be that the rule wasn't to protect the accused, but to protect anyone who disagreed with the majority fashion of the moment.

Perhaps. What was really going on, though, was that attitudes towards the death penalty had changed in Jewish society and the rabbis needed a work-around to accommodate them.

The list of capital crimes in the Torah is very long, including such things as disobeying your parents or having an affair with a married woman. The death penalty couldn't be abolished for any of them---God had commanded it---but Talmudists were free to make conviction for capital crimes so difficult as to be practically impossible, effectively abolishing the death penalty in Jewish law no later than the first century CE.

In point of fact, the passage in Mishnah Makkot 1:10 reads: "A Sanhedrin that puts a man to death once in seven years is called a murderous one. R. Eleazar ben Azariah says, 'Or even once in 70 years.'"

Also worth noting: the voting procedure ran from most junior to most senior, to avoid intimidating the more junior judges with the authoritative view of the more senior members of the court. That would certainly complicate the game theory of voting to acquit in order to convict and vice versa.

Climate science, speech regulations/political correctness on campus, string theory, all have this aspect to them. The critics aren't welcome at the party.

In reliability testing people will do test-to-failure (TTF) for this reason. To prove that the test is actually doing something and find the limits of the reliability space. Until you bound the space the data has no good meaning. This concept can apply to lots of things including academic studies and modelling.

So, back to Making a Murderer, is the point of the post that:

1: There is sufficient evidence, but not an overwhelming amount, and that should make us more confident that he did do it.
2: Because all the prosecution and detectives involved were coordinated on their response that he was obviously guilty, then we should be less likely to believe them and more confident that in fact he did not do it.

I'm not familiar with that particular case, but there was one a few years ago in Manitoba where someone was convicted of a murder iirc and later exonerated. There was lots of discussion about what went wrong, but the most interesting data point came from listening to the falsely convicted in an interview. After a few sentences my personal reaction was one of abhorrence. His attitude, way of speaking, everything told me on a deep level that he was guilty of something, no matter. My reaction surprised me and explained everything about what happened.

There was no evidence other than that the fellow happened to be somewhere within a rather generous radius of where the crime happened, while a shocking accumulation of evidence pointed elsewhere. It took a long time and a beating down of very strongly convinced people in positions of decision to fix the error.

The adversarial legal system provides a check on consensus by requiring guilt to be proved beyond a reasonable doubt. In some situations those safeguards aren't enough.

The US system of plea bargaining and almost unlimited power of the prosecutors is troublesome. There has to be a strong institutional check to the human tendency to go headstrong in the direction you start out on.

See also the question of significantly insignificant significance tests and the controversy of R.A. Fisher and Mendel's peas.

In tractate Sanhedrin of the Babylonian Talmud, which deals with crimes and punishments, paradoxically, the more heinous the crime or severe the punishment, the harder it is to convict someone, which makes studying it very interesting. Along the way, there is much disagreement, of course.

The Synoptic Gospels offer a distinctive commentary on Sanhedrin proceedings, too.

What does this say about federal case juries, where 12 angry men (and women) must be unanimous for the verdict to stick? Compare to state juries, where often a majority or supermajority is sufficient (in Georgia, back in the 1960s, a man could be sent to the electric chair on a 3-out-of-5 jury majority vote for the crime of rape).

Another possibility: whoever was copying the Talmud (because they had to hand-copy it back in the day) copied it wrong.

Hang on, can this stuff be right? The Talmud is a creation of the years AD. Palestine was then ruled by Rome which would not (I strongly suspect) put routine power of execution in the hands of some conclave of shamans.

The Talmud, including the Mishna, contains discussions of laws even if they were no longer applicable at the time of codification.
Animal sacrifice was only practiced in the Temple, which was destroyed around 70 AD, yet there are significant portions of the Mishna and Talmud dedicated to those laws.

So you're saying that 9/11 was an inside job?

All those witnesses. It doesn't add up.


Well this certainly describes pretty much every conviction of a poor black man in this country.

First off, eyewitness identification, especially across social class, is unbelievably unreliable.

Second, identification of perpetrators, either by photo or lineup is only valid if the person working with the victim does not know who the person suspected is. Otherwise the policeman either consciously or unconsciously guides the victim to identify the person the police suspect.

Finally the issue about personality is often overlooked. Angry, odd or otherwise unpleasant people are far more likely to seem suspicious to an investigator than other possibly more likely suspects causing them to be singled out for more investigation. Further, focusing upon a likely suspect will increase the chance that the investigators will suffer from confirmation bias and overlook or discount exculpatory evidence.

Sadly, all of this works against the primary argument that an excess of evidence should raise questions, or, in the case of evidence ignored, reaffirm the thought that the chosen suspect is guilty even when they are not.

The vote of the Sanhedrin in the trial of Jesus was not unanimous - at least one member, Joseph of Arimathea, dissented. Did he dissent because he thought Jesus innocent or to avoid the unanimity rule?

How did anyone among his followers know what happened in the trial of Jesus? The guy was arrested and taken away. The court did not sit in public. So how did anyone know?

Since you missed the point, others may have too. All four canonical Gospels include the trial of Jesus by the Sanhedrin (it's somewhat different in John), including the dissenting vote. Did the authors of the Gospels include the dissenting vote in order to avoid the unanimity rule (i.e., if the vote had been unanimous, Jesus would have been found innocent)? Surely the authors of the Gospels were aware of the unanimity rule. Then the question is whether the authors included the dissent just to emphasize the malevolence of the Sanhedrin (i.e., Joseph of Arimathea dissented not because he thought Jesus innocent but because he thought Jesus guilty and he cast a false vote in order to avoid the unanimity rule).

Ah, so the evangelists just invented the voting, but consistent with your point. Fair enough.

E.T. Jaynes makes the same point explicit in his masterly Probability Theory: The Logic of Science. More evidence of miraculous claims, or uncommonly strong evidence in and of itself, tends to increase our suspicion that we are being deceived. For example, on extrasensory perception (ESP):

"[T]he very evidence which the ESP'ers throw at us to convince us, has the opposite effect on our state of belief; issuing reports of sensational data defeats its own purpose. For if the prior probability of deception is greater than that of ESP, then the more improbable the alleged data are on the null hypothesis of no deception and no ESP, the more strongly we are led to believe, not in ESP, but in deception."

Jaynes goes on to note that Laplace perceived this phenomenon long ago: "Those who make recitals of miracles decrease rather than augment the belief which they wish to inspire; for then those recitals render very probable the error or the falsehood of their authors."

That from Laplace's Essai philosophique sur les probabilités (1819). Not as old as the Sanhedrin, but good to see that the thread of common sense runs through history.


Also keep in mind these findings are only for human affairs, not natural phenomena. That the sun comes up every morning from the east does not make it more false.

It would be nice to add a strategic element to this.

If too much evidence is suspicious then maybe don't reveal all of it?

If people expect such understatement, what is the equilibrium?

In fact, Dan Stone has a nice paper on this, though the emphasis is not on this point but on using it to understand continuing disagreement: "A few bad apples."

There is a long stats literature on the too good to be true idea with a continuous signal starting with Dawid (1973): "Posterior expectations for large observations," Biometrika 60(3), 664--667. In that literature if the tails of the signal distribution are thicker than the tails of the prior distribution then the posterior distribution converges back to the prior distribution. It doesn't immediately ring as true as the nice binomial model in this paper, but it seems there is a connection in that one reason for the thick tails of the signal can be that it is a mixture of two different distributions.

