On Saturday I submitted the final grades for Math 10A, the new UC Berkeley freshman math class for intended biology majors that I taught this semester. In assigning students their grades, I had a chance to reflect again on the system we use and its substantial shortcomings.

The system is broken, and my grade assignment procedure illustrates why. Math 10A had 223 students this semester, and they were graded according to the following policy: homework 10%, quizzes 10%, two midterms at 20% each, and the final 40%. If midterm 1 was skipped then midterm 2 counted 40%. Similarly, if midterm 2 was skipped then the final counted 60%. This produced a raw score for each student and the final distribution is shown below (zeroes not shown):
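Before turning to the distribution, here is a minimal sketch of the weighting policy just described. The function name `raw_score` and its arguments are hypothetical, each component is assumed to be a percentage between 0 and 100, and a skipped midterm is passed as `None`. The policy above does not say what happens when both midterms are skipped; the sketch assumes the weight simply keeps rolling forward to the final.

```python
def raw_score(homework, quizzes, final, midterm1=None, midterm2=None):
    """Weighted raw score on a 0-100 scale.

    Each argument is a percentage in [0, 100]. Pass None for a skipped midterm.
    """
    # Baseline weights in percent: 10 + 10 + 20 + 20 + 40 = 100.
    w_hw, w_quiz, w_m1, w_m2, w_final = 10, 10, 20, 20, 40

    if midterm1 is None:
        # Skipped midterm 1: its 20% moves to midterm 2 (now 40%).
        w_m2 += w_m1
        w_m1 = 0
        midterm1 = 0.0

    if midterm2 is None:
        # Skipped midterm 2: its weight moves to the final (now 60%).
        # ASSUMPTION: if midterm 1 was also skipped, the final carries 80%.
        w_final += w_m2
        w_m2 = 0
        midterm2 = 0.0

    total = (w_hw * homework + w_quiz * quizzes
             + w_m1 * midterm1 + w_m2 * midterm2 + w_final * final)
    return total / 100


if __name__ == "__main__":
    # A made-up student who skipped midterm 1.
    print(raw_score(homework=80, quizzes=90, final=70, midterm2=60))
```

Keeping the weights as whole percentages and dividing once at the end avoids accumulating small floating-point errors in the weights themselves.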

The distribution seems fairly “reasonable”. One student didn’t do any work or show up and got a 5/100. At the other end of the spectrum some students aced the class. The average score was 74.48 and the standard deviation 15.06. An optimal raw score distribution should allow for detailed discrimination between students (e.g. if everyone gets the same score that’s not helpful). I think my distribution could have been a bit better but overall I am satisfied with it.

The problem comes with the next step: after obtaining raw scores in a class, the professor has to set cutoffs for A+/A/A-/B+/B/B-/C+/C/C-/D+/D/D-/F. Depending on how the cutoffs are set, the grade distribution can change dramatically. In fact, it is easy to see that, ties aside, any discrete distribution on letter grades is achievable from any raw score distribution. One approach to letter grades would be to fix an A at, say, any raw score greater than or equal to 90%, i.e., no curving. I found that threshold on Wikipedia. But that is rarely how grades are set, partly because of large variability in the difficulty of exams. Almost every professor I know “curves” to some extent. At Berkeley one can examine grade distributions here.
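To make the dependence on cutoffs concrete, here is a small sketch that assigns letters to the same raw scores under two different sets of cutoffs. Everything in it is illustrative: the nine scores are made up, the function name `assign_letters` is hypothetical, only whole letters are used, and apart from the 90% threshold for an A the cutoff values are assumptions rather than anything used in Math 10A.

```python
from collections import Counter


def assign_letters(scores, cutoffs):
    """Map raw scores to letters.

    cutoffs is a list of (minimum_score, letter) pairs sorted from the
    highest minimum to the lowest. Scores below every minimum get an F.
    """
    letters = []
    for score in scores:
        for minimum, letter in cutoffs:
            if score >= minimum:
                letters.append(letter)
                break
        else:
            # No cutoff was met.
            letters.append("F")
    return letters


# Made-up raw scores, not data from the class.
scores = [95, 88, 82, 79, 74, 71, 66, 58, 41]

# "No curving": A at 90 and above; the lower cutoffs are assumed.
fixed = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

# A more generous, equally arbitrary curve.
generous = [(80, "A"), (70, "B"), (55, "C"), (40, "D")]

if __name__ == "__main__":
    print(Counter(assign_letters(scores, fixed)))
    print(Counter(assign_letters(scores, generous)))
```

With the fixed cutoffs these nine scores yield one A and two Fs; with the generous ones the same scores yield three As and no Fs. Nothing about the students changed, only the thresholds.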

It turns out that Roger Purves from statistics used to aim for a uniform distribution:

Roger Purves’ Stat 2 grade distribution over the past 6 years.

The increase in C- grades is explained by an artifact of the grading system at Berkeley. If a student fails the class they can take it again and record the passing grade for their GPA (although the F remains on the transcript). A grade of D is not only devastating for the GPA, but also permanent: it cannot be improved by retaking the class. Therefore many students try to fail when they are doing poorly in a class, and many professors simply avoid assigning Ds. In other words, Purves’ C- includes his Ds. Another issue is that an A+ vs. A does not affect GPA, but an A vs. A- does; the latter is obviously a very subjective difference that varies widely between classes and professors. Note that Roger Purves just didn’t assign A+ grades, presumably because they have no GPA significance (although they do arguably have a psychological impact).

Marina Ratner from math failed more students [Update November 9, 2014: Prof. Ratner has pointed out to me that she receives excellent reviews from students on Ratemyprofessors, while explaining that “the large number of F in my classes are due to the large number of students who drop the class but are still on the list or don’t do any work” and that “One of the reasons why my students learned and others did not was precisely because of my grading policy.”]. Her grade distribution for Math 1B in the Spring semester of 2009 is below:

Marina Ratner’s Math 1B, Spring 2009.

In the same semester, in a parallel section, her colleague Richard Borcherds gave the following grades:

Richard Borcherds’ Math 1B, Spring 2009.

Unlike Ratner, Borcherds appears to be averse to failing students. Only 7 students failed out of 441 who were enrolled in his two sections that semester. Fair?

And then there are those who believe in the exponential distribution, for example Heino Nitsche who teaches Chem 1A:

Heino Nitsche’s Chem 1A, Spring 2011.

The variability in grade assignment is astonishing. As can be seen above, curving is prevalent and arbitrary, and the idea that grades have an absolute meaning is not credible. It is statistically highly unlikely that Ratner’s students were always terrible at learning math (whereas Borcherds “luckily” got the good students). Is chemistry inherently easy, to the point where an average student taking the class deserves an A?

This messed-up system is different, yet similar, at other schools. Sadly, many schools have used letter grading to manipulate GPAs via continual grade inflation. Just three weeks ago on December 3rd, the dean of undergraduate education at Harvard confirmed that the median grade at Harvard is an A- and the most common grade an A. The reasons for grade inflation are manifold, but I can understand it on a personal level. It is tempting for a faculty member to assign As because those are likely to immediately translate to better course evaluations (both internal, and public on sites such as Ninja Courses and ratemyprofessor). Local grade inflation can quickly lead to global inflation as professors, and at a higher level their universities, are competing with each other for the happiest students.

How did I assign letter grades for Math 10A?