Michael L. Scott, Tim Stelzer, Gary E. Gladding
Phys. Rev. ST Phys. Educ. Res. 2, 020102 (2006)
Abstract
The reliability and validity of professionally written multiple-choice exams have been extensively
studied for exams such as the SAT, GRE, and the Force Concept Inventory. Much of the success
of these multiple-choice exams is attributed to the careful construction of each question, as well
as each response. In this study, the reliability and validity of scores from multiple-choice exams
written for and administered in the large introductory physics courses at the University of Illinois,
Urbana-Champaign were investigated. The reliability of exam scores over the course of a semester
corresponds to an uncertainty of approximately 3% in students’ total semester exam score. This
uncertainty in the semester exam score yields an uncertainty in the students’ assigned letter grade
that is less than 1/3 of a letter grade. To study the validity of exam scores, a subset of students was ranked
independently based on their multiple-choice score, graded explanations, and student interviews.
The ranking of these students based on their multiple-choice score was found to be consistent with
the ranking assigned by physics instructors based on the students’ written explanations (r > 0.94
at the 95% confidence level) and oral interviews ($r = 0.94^{+0.06}_{-0.09}$).