Every year, professors around the world write millions of letters of recommendation. They write letters for admission to graduate schools, law schools, and medical schools. They write letters for tenure cases to help colleagues with their promotions. They write letters for students who wish to study abroad. They write letters for fellowships and scholarships. They write letters for summer workshop programs and for graduate student research grants.
Surely, this massive effort must be justified. If we ask our faculty to write these letters every year and for nearly every program, there must be some evidence that they provide valuable information.
That, however, is false.
The peer-reviewed research on letters of recommendation, called “LoRs” by scholars, shows that they possess very little value. These letters are junk on fancy letterhead.
My argument may strike readers as strange and counter-intuitive. Wouldn’t a professor be in a good position to know a student and write a letter? In theory, yes. Professors are in a better position to evaluate a student than, say, their neighbor. But the practice of letter writing is very different from the theory.
The volume of requests is so large that older, more established professors develop all kinds of tricks for managing it, which results in unreliable letters. Some only write letters for “A” students or their favorites. Other instructors write the same letter over and over again because they simply don’t know what to write for the fifteenth law school applicant who got a B+ in their course.
Even worse, some faculty simply don’t write letters at all. They ignore requests or promise to do it and then break their promise. Writing a reference letter requires time and effort that many professors don’t have—or don’t care to have.
Perhaps those anecdotes are misleading. Even though LoRs require extra effort from faculty, they might give valuable evaluations. Unfortunately, though, they don’t. For the last 40 years, researchers in workplace psychology and education have analyzed LoRs and the answer is pretty clear: LoRs are a really bad way to evaluate people.
Workplace psychologists treat LoRs the way psychologists treat any measurement of the human mind: One must ask if LoRs have biases. One must also ask if the measurement is valid (i.e., it is correlated with what you care about) and reliable (i.e., if you repeat the measurement, you get similar results). A measurement tool is only useful if it has few biases, correctly measures your desired outcome, and can be used repeatedly with similar results.
One of the best articles to address these issues is a 1993 article in Public Personnel Management, written by Michael G. Aamodt, Deon A. Bryan, and Alan J. Whitcomb. They carefully reviewed the evidence on LoRs and described their many problems.
First, Aamodt, Bryan, and Whitcomb noted that LoRs suffer from extreme bias. Students often choose the professor who likes them the most, which means that many letters are unusually positive. Conversely, many professors refuse to write letters for weaker students. Another issue is confidentiality: a professor might write a more positive letter knowing that the letter could be leaked to the student. Finally, there is evidence that women and people of color receive different evaluations than white men.
The most damning evidence comes from a systematic analysis of studies that try to link letters to job performance. Aamodt, Bryan, and Whitcomb review the results:
Even though references are commonly used to screen and select employees, they have not been successful in predicting future employee success (Muchinsky, 1979). In fact, the average validity coefficient for references is only .13 (Browning, 1968; Mosel & Goheen, 1959). This low validity is due mostly to four main problems found with references and letters of recommendation: Leniency, knowledge of the applicant, low reliability, and extraneous factors involved in the writing and reading of letters of recommendation.
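To see how weak a validity coefficient of .13 is, recall that a validity coefficient is just a correlation, and the share of outcome variance a predictor explains is the square of that correlation. A minimal sketch of the arithmetic (the helper function name is my own, not from the studies cited):

```python
# A validity coefficient is a correlation (r) between a predictor
# (here, the reference letter's evaluation) and an outcome (later
# performance). The predictor explains r**2 of the outcome variance.

def variance_explained(r: float) -> float:
    """Share of outcome variance explained by a predictor with correlation r."""
    return r ** 2

r = 0.13  # average validity coefficient for references, per the quoted review
print(f"r = {r} -> {variance_explained(r):.1%} of variance explained")
# -> r = 0.13 -> 1.7% of variance explained
```

In other words, even taking the studies' average at face value, letters account for under 2 percent of the variation in how people actually perform.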
More recent research backs up the general claim that LoRs are atrocious. In 2014, the Journal of Academic Medicine published an analysis of 437 letters submitted for three cohorts of medical students. The results? Of 76 types of information contained in LoRs, only three were found to have a significant effect on graduation rates. The results were so modest that the authors wrote that “LoRs have limited value to admission committees, as very few LoR characteristics predict how students perform during medical school.”
Even worse, a 2014 meta-analysis of letters in graduate programs shows that LoRs have almost no predictive ability. Published in the International Journal of Selection and Assessment, the paper combines data from dozens of studies. For example, the statistical model that predicts GPA in graduate school is only improved by a measly 1 percent when you include data for LoRs. LoRs have a small effect on graduation rates, too; they improve that model by 6 percent. The author’s assessment of the reliability and validity of LoRs is very negative: “If letters were a new psychological test they would not come close to meeting minimum professional criteria (i.e., Standards) for use in decision making (AERA, APA, & NCME, 1999).” (emphasis added)
According to the experts, the problems of low validity and inherent bias are too severe for letters of recommendation to be useful. Their use, instead, rests on tradition and outdated requirements, not reason. The question, then, is what to do?
The best thing would be simply to abolish them and evaluate students on things like grade point averages, standardized tests, and writing samples. Those measures are imperfect but, unlike letters of recommendation, the research on those practices shows that they have some value. For example, multiple reviews find that the Graduate Record Exam predicts graduate school grades, though it does so modestly.
I do not think academia will abandon letters of recommendation anytime soon. We professors will resist even the mildest reforms, though abolishing letters would save us considerable time and effort. Still, professors can do a few things.
First, when evaluating students for admission to programs, focus on what students have done. Look at the courses they have taken. Look at their writing sample. Just ignore letters, or quickly scan them for red flags. Second, when writing your own letters for students, save time and effort by keeping them short and focused; long, complex letters rarely matter. Finally, gently direct your colleagues toward the substantial body of research showing the problems with letters of recommendation. Hopefully, this will help undermine the fetishization of letters of recommendation and bring about a more evidence-based approach to student evaluation.
Fabio Rojas is a professor of sociology at Indiana University, Bloomington, and is the co-editor of Contexts: Understanding People in Their Social Worlds.