Model this
Doctors were given cases to diagnose, with half getting GPT-4 access to help. The control group got 73% right & the GPT-4 group 77%. No big difference.
But GPT-4 alone got 92%. The doctors didn’t want to listen to the AI.
Here is more from Ethan Mollick. And now the tweet is reposted with (minor) clarifications:
A preview of the coming problem of working with AI when it starts to match or exceed human capability: Doctors were given cases to diagnose, with half getting GPT-4 access to help. The control group scored 73% in diagnostic accuracy (a measure of diagnostic reasoning) & the GPT-4 group 77%. No big difference. But GPT-4 alone got 88%. The doctors didn’t change their opinions when working with AI.