There are nowhere near enough “data scientists.”
It is currently accepted that the people who understand Big Data – the enormous datasets of information being collected with nearly every click of every computing device on the planet – will rule the roost in the future. If you can predict behavior by measuring and monitoring people’s machines down to an almost atomic level, you can make both your customers and your shareholders much happier.
There’s just one problem: outside of companies like Google that have long made use of rich rosters of PhDs, there are nowhere near enough “data scientists” — graduate-level candidates with backgrounds in machine learning or statistics — to analyze the massive streams of information that are being produced, and that gap is growing by the day.
Cofounder Ali Behnam of Riviera Partners, a Palo Alto-based technology recruiting firm, says he’s aware of “thousands” of data science jobs that are awaiting candidates at nearly every type of Bay Area-based tech company. (Even Riviera Partners employs a couple.)
And there are exponentially more nationally, according to McKinsey & Co., which estimates that the U.S. has roughly 140,000 to 190,000 fewer people with analytic expertise than it needs, and that things are going to grow worse. To wit, McKinsey projects that by 2018, the U.S. will need 60 percent more people with advanced degrees in statistics or machine learning than will be available.
“Not a lot of engineers grow up saying, ‘Gee, I want to be a data scientist,’” notes Teri McFadden, a recruiting VP at Norwest Venture Partners. She says that there are “nowhere near” enough candidates with data science backgrounds to fill the openings she sees at Norwest’s portfolio companies.
Solutions to the situation are far from clear-cut. While some startups with deep pockets can pay top dollar now, the rates are rising so fast that even they may not be able to keep up.
Behnam says pay for data scientists has rocketed from a range of $125,000 to $150,000 two years ago to upwards of $225,000 these days – even for those straight out of school. In cases where a data scientist has a few years of experience in the working world, pay can reach even more dizzying levels. Stephen Purpura, the co-founder and CEO of Seattle-based software company Context Relevant, recently lost out on one job candidate who had a PhD and seven years of work experience. He was dying to land the guy. As he puts it, “These people are almost like unicorns.” But out of the blue, Microsoft came knocking with an offer of $650,000 in annual salary and guaranteed bonuses. “We can’t compete with that kind of offer,” says Purpura.
For those hoping that equity can bridge the gap, it can’t, say those doing the wooing. “A lot of these folks don’t have that entrepreneurial, I’m-going-to-make-a-bazillion-dollars-at-a-startup type mentality,” observes McFadden. “Their motivations aren’t the same as many software engineers.” It’s something that venture capitalist Venky Ganesan of Globespan Capital Partners has seen repeatedly in his firm’s attempts to help its portfolio companies. (He said two data scientists recently joined two of Globespan’s portfolio companies, leaving a whopping 18 slots to fill.)
Says Ganesan, “Data scientists look at the data [around job stability at companies like Google] and they look at the data on startups, and they understand, probably better than most people, that betting on a startup is like buying a lottery ticket, so why do that?”
Some say that while problematic, the talent shortage isn’t dire, exactly. One chief scientist at an online advertising company who asked not to be named says that while “everyone talks about data scientists and how much they’re needed, [the digital media industry] isn’t Wall Street [which has long employed and is heavily dependent on data science]. There’s a lot of need [for data science], but there’s more perceived need than really exists.”
A common mistake for startups, says this person, is trying to “hire the smartest people in the world” without first ensuring that they have good data to work with — data that he says can be “cleaned” and “normalized” by “reasonably smart” people without advanced degrees, as long as they’ve been trained properly. “Our world is nascent, and the data is all over the place, but people are in a rush to hire all these PhDs,” he says. “It’s like putting crude oil into a car: you need to do a lot of things before you create gasoline, or the car is just going to blow up.”
Ganesan also suggests that companies can “solve” the data scientist shortage by casting a wider net. “Instead of saying, ‘I need someone who’s familiar with ad data,’ [entrepreneurs should be] looking for someone who understands large data sets, and PhDs in biocomputation, statistics, and physics all understand these statistical principles.” Indeed, among those whom Ganesan suggests that startups pursue are professors, for whom a data scientist job would likely be a “1.5x to 2x change in salary, considering that some of these guys are making $100,000 a year.”
“What really motivates these people are the types of problems a company is trying to solve,” says Ganesan. “If there’s a new data set, and they feel like they can use their skills to make an impact, that gets them excited.”
Still, it can be very hard to dislodge professors from their chosen profession, and they’re often not a good fit for the startup world. Purpura, who has advanced degrees from Harvard and expects to complete a PhD in information science from Cornell this year, recently hired a professor, but he says he wouldn’t hire many others.
“Professors run on a different schedule than the rest of us,” he says. “Most also stop writing code when they become professors, so getting performance out of a professor might be more difficult than you’d think it would be. There’s a limit to how much [other employees] will listen to [someone who doesn’t write code].”
And as for the suggestion that undergraduates can handle most responsibilities provided they are dealing with “clean” data, Purpura argues that there’s a vast difference between people with undergraduate degrees and PhDs. “A PhD from a great school is forced to go through competitive learning about theory and the reasons that things work versus don’t work. Undergrads don’t get that experience, and industry – which wants solutions yesterday – doesn’t incent them to do it.
“Even when we interview people who’ve worked at Bing or Google for years, we find that those without advanced training have a very difficult time generalizing, so the solutions they build are very short-sighted. They can solve the problem at hand, but they don’t anticipate the problems that will occur a year or even six months down the road.”
Of course, Purpura has a vested interest in believing the data scientist shortage isn’t something that can be solved through hiring. His 11-month-old company is selling prepackaged applications intended to allow companies to solve specific, data-related business problems without specialists like himself.
He launched the business, he says, because the world is “dividing between [data scientists] who are getting paid huge salaries and those who … are being asked to find ways to step up. Would you rather risk finding the former or use tools that reduce your risk and let you get more value out of the people you have? We’re betting on the latter, because the former is unsustainable.”