Last year, Forbes ran the headline “Americans rank a Google internship over a Harvard degree.” Higher education, it seems, is quickly losing its claim to be the best way to prepare for a job or advance in one’s career.
And that’s not because of a lack of new advances or insights. In statistics and data science in particular (both among the fastest-growing occupations), there has been tremendous progress in understanding how best to analyze data. As a PhD student in organizational psychology, I have been taught a wide variety of advanced analytical methods: structural equation modeling, meta-analysis, multilevel modeling, machine learning algorithms, mixture modeling, and factor analysis, to name a few.
I’m not the only one to experience this rapid growth in the advanced analytics curriculum. Today’s graduates of advanced degree programs, especially in the social sciences, are often expected to understand a long list of pedantic jargon that few others understand, much less care about. For instance, in my program at George Mason University, the 2005 curriculum covered only regression, psychometrics, and multivariate statistics (all generally considered introductory content).
Since then, advances in methods have abounded and made their way into curricula. Psychological Methods, one of the top journals in psychological statistics, began publishing in 1996 and has since published an average of 800 pages of articles annually. Its impact factor (a measure of how often a journal’s articles are cited) has skyrocketed in the past decade. The growing role of computing in statistical analysis has further changed the landscape of analytics curricula.
It would seem, at first glance, that these developments should be viewed positively, as evidence that advanced degree graduates are being prepared to join a workforce that demands data analytics as a critical skill. After all, many have argued that higher education should invest more in aligning curriculum with employers’ expectations, especially as it relates to technical skills.
However, I argue that the gap between education and actual job requirements persists because of the way these advanced topics are being taught to students. Advancements in statistical analysis curricula do not necessarily mean that students are leaving better prepared to excel in modern jobs that emphasize data analytics. In many ways, they may be even less prepared.
By overemphasizing these advanced methods, many programs are losing sight of “simpler” methods and skills that are actually more relevant to graduates’ future careers. My colleague and I conducted an informal survey of about 100 alumni working in non-academic jobs, asking them which statistical methods they used most frequently at work. The most frequently used methods were simple correlation (62 percent used it “a lot”), data visualization (55 percent), and regression (49 percent); the advanced methods taught in most curricula were used only “a little” or “not at all.” In fact, the most frequently used software was Tableau, a data visualization platform, which reflects data visualization’s growing status as the key skill in analytics jobs.
Why the discrepancy?
Advanced analytics are notoriously difficult to explain to a non-academic audience. Many students graduate able to conduct advanced tests but unable to explain them. This has been called a “black box” problem: students are trained only to point and click their way through an advanced test, without learning what the test actually did or how to explain the results succinctly to a non-academic audience. The problem persists even among students who go on to become faculty. For example, one study reviewed 784 structural equation models reported in top academic journals and found that 38 percent of them misreported or misexplained a basic element of the analysis, known as degrees of freedom.
Data visualization, on the other hand, is all about keeping things simple yet accurate. “Less is more” is the mantra, and the goal is always to ensure audience understanding of the data. It’s considered “simple” in academia because it doesn’t employ most of the advanced methods discussed earlier. But it’s certainly not easy to do, as there are hundreds of factors to consider when designing a visualization, and it takes a lot of practice to get good at it.
Students need to be able to explain data and research findings to non-academic audiences, but they’re not being trained to do so. This leaves them unprepared for a job market that requires data visualization skills, where they will primarily be working with non-academic audiences. After all, academic jobs are on the decline (especially in the humanities and social sciences), and most graduates of master’s and PhD programs will end up in non-academic jobs.
Moreover, the repercussions of poorly designed data visualizations and of an inability to explain statistical findings are potentially far more damaging than those of a flawed academic study. How many people read a paper in an academic journal that uses complex, state-of-the-art statistics to answer a very specific research question? If you’re lucky, maybe a hundred faculty and their students. Many of these papers are also locked behind paywalls that keep non-academics from reading them.
On the other hand, how many people will see and then share or retweet a chart or graph posted online? Websites like Chartr exist solely to create and share data visualizations widely, and there’s even a Reddit community dedicated to data visualization with 16.3 million members.
The potential impact of a good (or bad) data visualization far exceeds the potential impact of a good (or bad) complex research study. I’ve written previously about how easy it is for a visualization to be misleading, even when the designer does not intend it, and when the data being visualized concern high-impact topics like COVID-19, such misleading elements could lead to major societal problems.
To be clear, I’m not arguing that advanced statistical methods are unimportant. They are critical to advancing science and research. But for the vast majority of the population, advanced methods are much less valuable than the ability to communicate and visualize data. It does no good for students to be able to perform a “latent class analysis” if they are unable to explain the method, demonstrate why the findings are important, and visualize the results for people who have no idea what a latent class analysis is.
Both faculty and students should do their part in enacting change. Higher education curricula, especially in the social sciences, should include data visualization as a requirement, with courses teaching techniques and software such as Tableau or Power BI. No student should graduate without being able to explain their analysis and visualize their results for a non-academic audience.
At the same time, I would encourage all students, especially those in the social sciences, to supplement their statistical training by teaching themselves these skills. Many free or inexpensive online courses and training programs in data visualization are widely available. edX and Udemy are popular platforms for learning data visualization, and books such as Berinato’s Good Charts (2016) and Knaflic’s Storytelling with Data (2015) are popular guides as well. Before I started my PhD, I worked in human resources and data analytics for a couple of years; it was there that I discovered just how important data visualization is and how necessary the skill is for career success.
Advancements in statistical methods will continue, and rightfully so. Academic training thus far has largely kept up with these advances. However, until training in visualization and communication of data catches up with the training in statistical analysis, higher education will likely continue to face a skills gap between graduates’ actual abilities and expectations from employers.
Steven Zhou is a PhD student in organizational psychology at George Mason University, where he studies leadership, teams, and statistics. He previously worked in human resources and data analytics for a large global consumer goods company.