SWoRD for Peer Grading

Jennifer Imazeki, who blogs at Economics for Teachers, has an interesting post about SWoRD, a peer-review system for grading writing. Basically, students write an assignment and submit it to SWoRD, which randomly assigns 5 other students to grade the assignment and offer helpful comments; the gradees, in turn, grade the quality of the comments. Here's Imazeki on some neat aspects of the system:

One of the coolest things about the SWoRD system is how it calculates grades. Students receive a grade both for reviewing and for writing. The reviewing grades are based half on ‘consistency’, which takes into account things like if a student just gives all high scores or all low scores, or scores that are really different from the other reviewers of the same papers, and half on the back evaluation ‘helpfulness’ scores. The writing grades are based on the numeric rubric scores from the reviewers but adjusted for the consistency of the reviewers so, for example, if a paper has four reviewers who give high scores and one reviewer who gives low scores, the low scores from that one reviewer will be given less weight. The instructor can also adjust how much weight is given to the reviewing and the writing scores for each assignment.
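The weighting scheme Imazeki describes can be sketched in a few lines. This is a minimal illustration of the idea, not SWoRD's actual algorithm (the post does not spell out the formulas): downweight reviewers whose scores deviate from the consensus of the other reviewers on the same papers, then grade each paper by a weighted average.

```python
# Illustrative sketch only -- the post does not give SWoRD's real formulas.
# scores[r] maps paper -> the score reviewer r gave that paper.

def consistency_weights(scores):
    """Weight each reviewer by agreement with the other reviewers of the same papers."""
    weights = {}
    for r, graded in scores.items():
        deviations = []
        for paper, s in graded.items():
            others = [g[paper] for rr, g in scores.items()
                      if rr != r and paper in g]
            if others:
                consensus = sum(others) / len(others)
                deviations.append(abs(s - consensus))
        mean_dev = sum(deviations) / len(deviations) if deviations else 0.0
        weights[r] = 1.0 / (1.0 + mean_dev)  # larger deviation -> smaller weight
    return weights

def writing_grade(paper, scores, weights):
    """Weighted average of reviewer scores for one paper."""
    num = sum(weights[r] * g[paper] for r, g in scores.items() if paper in g)
    den = sum(weights[r] for r, g in scores.items() if paper in g)
    return num / den
```

In Imazeki's four-high-one-low example, the dissenting reviewer ends up with a small weight, so the paper's grade sits closer to the four high scores than a plain average would.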

The big virtue of SWoRD is that it makes written assignments feasible even in a large class. I will be interested to read how Imazeki's pilot experiment works out.


This depends on students being qualified to judge work that they are in the process of mastering. For good students, such peer review would be a wonderful component of education. For others, it would be the blind grading the blind.

Statistically, some kids will get a clump of papers of similar quality, so grading strictly on (in)consistency is wrong.

You want to grade the kids on accuracy, which is a totally separate question.
You may be able to set up a system in which the teacher grades 1/10th of the papers (randomly chosen); the kids who tend to agree with the teacher are then both (1) regarded as high-scoring graders and (2) trusted more heavily to assign proper grades to the peers whose papers the teacher didn't grade.
Then it becomes an empirical question of how much "signal" needs to be provided by the teacher-evaluated grades (that is, how many papers the teacher must grade) in order to make the whole system accurate within an acceptable margin.
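A minimal sketch of this audit scheme, with all names, the 10% fraction, and the trust formula being my own illustrative choices: the teacher grades a random tenth of the papers, and each student grader earns trust in proportion to agreement with the teacher on that audited subset.

```python
import random

def audit_sample(papers, fraction=0.1, seed=0):
    """Randomly choose the subset of papers the teacher will grade."""
    rng = random.Random(seed)
    k = max(1, int(len(papers) * fraction))
    return rng.sample(papers, k)

def grader_trust(teacher_grades, student_grades):
    """Trust = inverse of the mean gap between a student's scores and the
    teacher's, on the papers both happened to grade."""
    gaps = [abs(student_grades[p] - t)
            for p, t in teacher_grades.items() if p in student_grades]
    if not gaps:
        return 1.0  # no overlap with the audit: neutral trust
    return 1.0 / (1.0 + sum(gaps) / len(gaps))
```

The empirical question then becomes how large `fraction` must be before the trust-weighted grades track the teacher's grades within an acceptable margin.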

However, if the bottleneck here is the teacher's time, why not make use of the internet? Suppose there is an online clearinghouse where accredited teachers can log on and grade papers for money. When you have extra time and need money, you go on and grade some. When you don't, you put your papers up to be graded.

Perhaps one other method to judge accuracy would be to distribute one pre-prepared paper to every student that the teacher has already judged to be low, medium, or high quality. Every student would thus have 6 papers to grade, not knowing which one is the pre-prepared paper. The identity of the student they are grading would be left hidden as well to help reduce bias (if it happens to be a friend) as well as gaming the system.
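One way to sketch that seeded-paper check (the numeric mapping of low/medium/high and the tolerance are assumptions for illustration, not part of the proposal): compare a student's score on the planted paper against the teacher's known rating.

```python
# Hypothetical check for the seeded-paper idea above; the rubric
# mapping and the tolerance are illustrative choices on a 1-10 scale.
RUBRIC = {'low': 2, 'medium': 5, 'high': 8}

def passes_calibration(student_score, teacher_label, tolerance=1):
    """True if the student's score for the planted paper falls within
    `tolerance` points of the teacher's known rating."""
    return abs(student_score - RUBRIC[teacher_label]) <= tolerance
```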

There is actually a different online site, called Calibrated Peer Review (developed by folks at UCLA, I think), that does something sort of similar. Students first evaluate papers that the instructor has also evaluated (one each of low, medium and high quality) and the students learn what is and isn't a good paper by seeing if their evaluations match the instructor's.

What next, your students do all the teaching, grading, and research? Oh...wait....

Anything's better than a teacher taking home piles of work to mark. So 19th century!

Positive comments (and the corresponding positive grade) will almost surely tend to be seen by the gradee as more helpful than criticism. Being considered "helpful" is rewarded and being "consistent" is rewarded. Solve for the equilibrium...

The solution is to have the professor sample audit the papers and the grades. But when people go from 100% work to 0% work, they usually aren't happy about going back to 10% work.

Btw, do you guys really grade EVERY assignment? So much low-hanging fruit...

Yes and no. Students actually tended to ding vague comments (including "looks good to me" or the equivalent); criticism was acceptable IF the reviewer also made suggestions to improve. In evaluations afterwards, students actually said things like, "I realized it was OK to be harsh because when revising my own papers, I actually found those comments more helpful than the ones that just said everything looked great".

Would it be possible to work out a quid pro quo among the students and thus game the system? It can't be that hard to find out who got assigned my essays to grade. Say, use an md5sum and post it on Facebook?

OTOH, what kind of major is "Leisure Studies"? It is included in the list of disciplines that use SWoRD on their website.

I remember taking a math class in college that used an unconventional approach to teaching. We worked through the problems in a book and basically rediscovered the entire subject ourselves. The teacher didn't teach and if asked a direct question would answer in riddles. Needless to say we all hated it. I wanted to really learn the subject so I retook the class, differential equations, at the community college. I'm all for alternative teaching methods but make sure it is appropriate for the students.

Sounds like an abuse of the Moore method. Done right, it develops students who are really good mathematicians. Unfortunately, this comes at the expense of their knowledge of mathematics.

As an adjunct in an MBA program and also a law school, I've done this group correction before, but with the students grading against a near-perfect prior essay exam of another student, with my corrections.

Not only does it help them to grade, but, more importantly, it gives them some idea of what to aspire to and what my expectations and the class norms are.

A lot of education revolves around expected norms. When I did this, the papers and exams got much better.

That's what competition does.

how is this any different from the peer review journal publication process?

surely no serious researcher will claim they have mastered their field.

but there is a difference - one would imagine the teacher being better able to judge writing than his/her students. Otherwise why is the teacher there? A similar situation does not hold for the journal publication process. There is no "grandmaster" who is able to judge article quality better than those writing it.

Nevertheless it is a reasonable thing to try - at the very least, the students gain by trying to criticize. So I do not view this as substituting for understanding class material; rather, it can at best serve as a complement. But perhaps a very good complement.

In my field, editors of top journals are grandmasters and they know who to turn to for qualified opinions on the papers submitted to them.

I did this for my class at UC Berkeley in 2009 (ripping off my HS teacher's assignment in 1987!)

Each essay went to three students; they ranked the essays they reviewed in 1/2/3 place and wrote a critique that WE -- TAs and me -- graded.

It was a great set-up. Biggest problem: students who didn't trust peers. Biggest surprise: overuse/incomprehension of game theory ("if I give a good paper a bad grade, that will make my grade relatively better" -- total fail).

Highly recommended to reduce workload AND improve the writing (someone WILL read this!) and the critical-thinking skills of reviewers -- who all gained more understanding by reading others' work.

Seems like a good idea. Students can learn from other students' efforts as well.

This idea has been around for decades; a friend started developing the software for a system like this in the 70s on the RSTS/E timesharing system, under grants for using technology in social-science and literature teaching, with the lead researcher a psychology/behavioral-science professor. Others were working on similar research projects at other universities as well; Ball State was nearby and was a co-researcher on some of the grants, as I recall.

My understanding from my friend, who spent a lot of time on the human-interface engineering and on implementing the social-engineering features, was that the method required a lot of extra commitment on the part of the instructor. The teaching method must change to move students into active learning, away from having the teacher's talking points fed into their brains so they can regurgitate them back for a grade, with understanding and knowledge as a side effect.

Obviously, doing this on timesharing terminals restricted accessibility, but the system replaced discussion groups, which required showing up at fixed times, prepared and actively engaged, to get good grades, with the meek often not faring well. This project fueled the purchase of networked personal computers in labs across campus, and later the networking of the dorm rooms. All before the Internet became the lingua franca of personal computers (most of the systems were networked with AppleTalk).

My point is this is not a "pilot" or innovative.

The question is why has academia been so slow to adopt new technology and learning innovation to promote active learning?

If this were a bad idea, it should have been discarded two decades ago, shouldn't it?

Out of curiosity, I googled for info on the system my friend worked on and found this excerpt from an article by Stephen C. Ehrmann and Mauri Collins, published in the September 2001 issue of Educational Technology Magazine:

"Delphi: About a quarter century ago, Jerome Woolpy of Earlham College pioneered an early use of computer bulletin boards to support more extensive interaction among students. Each week each student in the course was required to do three things:

Ø Post a question on an electronic bulletin board

Ø Respond to another student's question

Ø Comment on another student's response

Variations of this approach have been used occasionally since then as generations of computer technologies have come and gone. The faculty time required is relatively small and the potential gains for students large. Yet this simple, structured interaction format is still apparently little used. Even faculty developers and technology trainers seem to know little about these simple kinds of interaction structures and so rarely inform faculty members about the possibilities. Nor have software vendors as yet automated this kind of structured interaction process (e.g., offering the faculty member an "automatic readout" of which students have taken which steps, or providing "automatic reminders" by e-mail to students who do not complete each of the three steps on time each week."

The authors were asking in 2001: "Can distance learning..." and observing:

"In a 1999 paper, Murray Turoff quotes Thorstein Veblen as saying "Institutions are habits of thought" (Turoff, 1999, p. 1). So are courses and classrooms, whether paper or face-to-face. In this essay, we suggest that most instructors, administrators and software developers are missing major opportunities because they assume that online collaboration among students must follow the same forms as traditional interaction in face-to-face classrooms. Yet, dating back to the 1970s, a few educators and software developers have been pioneering far more imaginative ways of helping students learn with one another in virtual space; they have been multiplying the advantages of extended access with the strengths of enriched learning environments."

As economists who think the best ideas simply win out in the marketplace of ideas by the force of an invisible hand, why would Alex Tabarrok post about an obviously failed idea, one so little known decades after it was "invented" that it is considered innovative?

A similar thing existed in Canada, called PeerScholar; its use at the University of Toronto was curtailed by a union grievance alleging that allowing students to do grading "took away" work from unionized teaching assistants. In a craven move, the university bowed to the pressure.

See: http://www.cupe3902.org/settlement-in-peer-scholar-case/

File under: unionization dampening innovation

