The Great AP Score Recalibration

The College Board’s recent testing move may be a problem. Its monopoly power definitely is.

Average course grades tend to be lower in some college subjects than others. Engineering and the “hard” sciences, for example, retain reputations for being “harder” subjects than the humanities and social sciences, even though a naïve observer could just as well assume that students in the latter subjects are smarter.

Do score-average comparisons really matter, though, in practical terms? After college, most graduates will be compared to one another from within their chosen fields. A “C+” engineering graduate will still be chosen ahead of a “C-” engineering graduate, just as an “A+” history grad will be chosen ahead of an “A-” history grad.

Score scales rank individuals’ performances so that they may be compared to one another. There are no “true” test scores; all are relative and at least somewhat subjective. So long as test content and test-taking populations remain similar, test makers can “equate” scores across time, producing the trend lines so popular with journalists and politicians.

Occasionally, however, changes in either the test content or the test-taking population stretch or compress score distributions so much that the scales themselves must be adjusted to remain usefully discriminating. Older readers may remember when the SAT score scales were “re-centered” in 1995-96. College Board, the SAT’s maker, explained that the scales needed to be adjusted because the test-taking population had changed so dramatically, in sheer number and demographic make-up, since the 1940s, when test-takers were predominantly middle- and upper-class white males applying to elite colleges.

A less publicized goal of the recentering was to synchronize the verbal and math score distributions. Over time, the SAT’s math and verbal scales had developed quite different shapes, and College Board worried that this lack of symmetry threatened the SAT’s face validity among its vast non-technical customer base. Despite these explanations, however, critics accused College Board of score inflation and watering down the content of the SAT.

Now, College Board is “recalibrating” some Advanced Placement (AP) exam scales and eliciting similar reactions from critics. College Board invokes a desire to make score distributions similar across AP exams, just as they crafted similar distributions for the verbal and math SAT subtests in 1995-96.

In all, three changes are under way. Standards for all AP exams are being set by a new “evidence-based standard setting” (EBSS) method. Score distributions across all AP exams will have similar shapes. And score distributions for a minority of AP exams have been or soon will be adjusted, all upwards thus far.

Does the recalibration really matter for AP exam users—students, schools, teachers, and colleges? To counteract AP exam score inflation, all a college need do is raise the threshold at which it awards course credit. That’s not hard.

Not hard for independent, private colleges, that is. Indeed, some of them no longer award course credit for exam scores of three, four, or five. Public colleges, however, may not be so flexible. According to College Board,

As of spring 2024, 37 states have implemented statewide or systemwide AP credit policies, which typically require all public higher education institutions to award credit for AP Exam scores of 3 or higher. AP policies that grant credit for scores of 3 have grown 22% since 2015, and the number of policies for credit overall has grown 14%. Both trends are largely attributable to state and system policies.

Criticism of College Board’s “Great Recalibration” comes in two flavors. First, some insist that either college grade inflation induced the AP recalibration, or the AP recalibration validates college grade inflation. If either were the case, though, all AP score distributions would likely be moving up or down, not just some.

AP standard-setting in the past typically utilized professional judgment. Groups of experts (teachers, professors, and other subject-field experts) read draft test items, one by one, and estimated how many test-takers would or should know the correct answer. It was a subjective process.

It was also one that produced varying score distributions across AP tests, as the expert standard-setters in some fields were more conservative or liberal than those in others.
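The judgment-based process described above resembles what the measurement literature calls a modified Angoff procedure, though that label is an assumption here and not something College Board's account confirms. As a rough sketch only, with entirely hypothetical panelists, items, and ratings, the arithmetic might look like this: each panelist estimates the probability that a borderline test-taker answers each item correctly, the ratings are averaged per item, and the averages are summed to give a raw cut score.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a raw cut score from panelist ratings.

    ratings[p][i] is panelist p's estimate (0.0 to 1.0) of the probability
    that a borderline test-taker answers item i correctly.
    All names and numbers in this sketch are hypothetical.
    """
    n_items = len(ratings[0])
    if any(len(panelist) != n_items for panelist in ratings):
        raise ValueError("every panelist must rate every item")
    # Average the panel's estimates item by item, then sum across items.
    item_means = [mean([panelist[i] for panelist in ratings])
                  for i in range(n_items)]
    return sum(item_means)


# Two hypothetical panels rating the same four items.
stricter_panel = [[0.8, 0.7, 0.6, 0.9], [0.9, 0.7, 0.7, 0.8]]
lenient_panel = [[0.6, 0.5, 0.4, 0.7], [0.5, 0.5, 0.5, 0.6]]

print(angoff_cut_score(stricter_panel))
print(angoff_cut_score(lenient_panel))
```

The panel with the higher expectations produces the higher cut score on the same items, which is the kind of field-to-field variation the older method allowed.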

College Board asserts that its new EBSS method incorporates vastly more information that complements those expert judgments. It sounds wonderful.

One will not, however, find detailed technical specifications at the website of the American Council on Education (ACE), in the foundational EBSS research article, or from College Board. It is highly likely that technical process reports are written for each exam’s standard setting. But none have been made public. College Board alludes vaguely to ACE having assumed much of the responsibility for the new black-box process, and ACE reports that “the validity evidence for AP scores … was ‘exceptionally strong.’”

Neither does the single standard-setting example offered in EBSS’s foundational research article reassure. It describes the process employed in the mid-2000s for the high-school course that became the model for Common Core Standards development, implemented without any higher-education input, contrary to initial promises.

The AP program head offered,

[EBSS] guards against variations in panelists because it keeps the standards tied to specific skills and content knowledge demonstrations that we can maintain over time.

And just where might “specific skills and … demonstrations” come from? They’re unlikely to come from anywhere along a wide, disparate range of thousands of college courses. That would offer no uniformity—no standard. More likely they will emanate from the dreaded, fuzzy K-12 “college and career-ready” Common Core Standards, whose “architect” now heads College Board.

The second flavor of criticism of AP’s recent changes assumes that the profit motive must lurk beneath. Raising score distributions of some AP exams could induce more students to take, and more high schools to offer, AP courses, in the belief that higher scores and college credits have become more easily attainable.

It is certainly true that College Board has come to depend more and more on the AP program for revenue volume and stability. Revenues from College Board’s other large program, the “SAT Suite of Tests,” have become less reliable of late. There was the test-optional trend, competition with ACT for statewide contracts, Covid-19, and then the outright elimination of SAT scores from enrollment decisions in both the University of California and the California State University systems (and elsewhere). Advanced Placement’s proportion of College Board’s revenue split increased from 46 percent in 2012 to 54 percent in 2022. It should increase even more in 2025 as competitor ACT assumes management of the statewide college-admission testing program in Illinois, currently College Board’s largest.

The recent reinstatement of the college-admission test requirement at several elite northeastern universities (plus the University of Texas) does not begin to compensate for all these customer losses.

In addition, federal funding for College Board programs has been generally declining for over a decade, and overall AP program participation has plateaued since 2017 after steadily rising for decades.

College Board (and ACT) generally claim any program a success that induces more students to enroll in college. The U.S.’s low rates of college completion and high rates of college indebtedness contradict this self-serving assumption. A stronger case might be made that too many students enroll in four-year colleges.

College Board also alleges that all students benefit from taking AP courses, even those earning scores of one or two and, implicitly, even the large number of students who take a course but not the exam. A large research literature contesting these notions of ubiquitous benefits flowing from college-going and AP course-taking exists but remains unmentioned by the testing firms.

In sum, the “great recalibration” is something of a red herring. It affects only a minority of AP subjects. Though it is inflating AP scores in those subjects, which may inflate College Board revenue in the short term, it comes at some risk: it may provoke a backlash from colleges that more than offsets the gain.

But even that could be a beneficial outcome. College Board has been successful in lobbying governments to subsidize or force educational institutions to purchase its products. Those most adversely affected by the AP program’s success, such as the students who end up inappropriately at college, accumulating debt but not degrees, are widely dispersed, penniless, and powerless. Colleges, however, are not powerless, and they are organized. Moreover, colleges have a direct incentive to push back against College Board. Course credits granted via the AP program represent for them both a loss of revenue and a loss of curricular control.

People are right to suspect College Board’s intentions, as well as its competence. Scandals, blunders, and ideological bias have plagued College Board management ever since the unqualified David Coleman took charge as CEO a decade ago.

In the Advanced Placement program, the private, unelected College Board effectively owns a monopoly. Its monopoly profits pay to lobby governments to further entrench its monopoly power. Relatively small issues such as the “Great Recalibration” disserve us if they divert our attention from the more serious and enduring problem—College Board’s unregulated AP monopoly and consequent lack of public accountability.

Richard Phelps has authored, edited, and co-authored books on standardized testing, learning, and psychology. His newest book is The Malfunction of US Education Policy: Elite Misinformation, Disinformation, and Selfishness (2023).