The study, funded by the William and Flora Hewlett Foundation, compared the software-generated ratings given to more than 22,000 short essays, written by junior high school students and high school sophomores, to the ratings given to the same essays by trained human readers.
The differences, across several brands of automated essay scoring (AES) software and several essay types, were minute. “The results demonstrated that over all, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items,” the Akron researchers write, “with equal performance for both source-based and traditional writing genre.”
“In terms of being able to replicate the mean [ratings] and standard deviation of human readers, the automated scoring engines did remarkably well,” Mark D. Shermis, the dean of the college of education at Akron and the study’s lead author, said in an interview.
Here is more.