Academia to Industry: Data Science Myths and Truths

Emily Thompson

October 26, 2015

Emily Thompson
Program Director, Insight

Before I decided to make the jump to a new career in the data science space from my postdoc in particle physics, I had a few doubts about whether or not I’d be a good fit outside of academia. These doubts were compounded by the fact that most of my colleagues and mentors, while supportive, were not in a position to offer unbiased advice to help me understand if I was making the right decision, as they had never worked outside the university/lab setting themselves. As a result, I developed a few preconceived notions of what data science was, which I soon came to realize were not so accurate after speaking to more data scientists and leaders in the industry.

During my first 10 months working as a Program Director at Insight, with a bird’s-eye view of the data science scene here in Silicon Valley, I’ve gained a pretty unique perspective on what it means to be a data scientist. After talking with some of my previous colleagues, as well as a few prospective Insight Fellows, I’ve collected some answers to the concerns often felt by academics wondering about the transition to data science:

“Isn’t a ‘data scientist’ just a fancy way of saying ‘business analyst’? I’ve heard they’re basically the same thing.” It turns out that data science is an extremely broad term, covering many diverse topics. While business analytics still makes up a large portion of data science, data scientists also build data products, create software platforms, develop visualizations and dashboards, and develop new cutting-edge machine learning algorithms. Machine learning itself is applied to many challenges in industry surrounding big data. Tools that historically were used in more traditional academic disciplines are now being deployed in the business space; for example, natural language processing is used for sentiment analysis and to automatically detect content, and deep-learning tools are being used for image recognition.

Maybe the biggest difference between a data “scientist” and an “analyst” is the level of independence in the role. Old-school business analysts had to be “given” the data that they needed to work with, already cleaned up and packaged for use. Data scientists must be adept programmers so that they can extract, transform and load data from where it lives in the company, and tend to be a lot less reliant on other teams to do their job.

“Where is data science used? I don’t necessarily want to work on ads ... or be a quant, for that matter.” The domains where data science is being deployed are as varied and diverse as the field of data science itself. Quantitative finance and advertising might seem like the more traditional industries using data mining that academics transition into, but in recent years, with the explosion of raw data being produced, data scientists have come to work on topics covering almost every aspect of life we can think of.

The healthcare industry is undergoing a data revolution, with millions of electronic medical records, vast amounts of genomic data, and drug discovery now being approached with a data-driven perspective. Wearable technologies are bringing health awareness directly to the user, but are also making it possible to collect and analyze huge amounts of aggregated personal data to uncover insights, from how to exercise properly to how sleep affects your mood.

Media is another big domain where data science is being put to use. Big media corporations like News Corp., The New York Times, and Bloomberg are all employing data scientists to attack challenging questions around reader behavior and retention in the digital news age (check out this talk from Rachel Schutt about data science at News Corp.). Netflix uses analytics to recommend movies, but also to help determine what new television shows to produce. Samba TV, a startup in the Bay Area, also employs machine learning techniques for content recommendation, and is in a unique position to use data collected from its users to form a more accurate picture of television viewership than has ever been achieved before.

Insight Fellows are now working as data scientists in many different industries, from those mentioned above to social networking (Facebook, LinkedIn, Twitter), to gaming (Kixeye, Kabam), to travel (Airbnb, Uber), to apparel (Stitch Fix, Gilt, Trunk Club), to education (Khan Academy, Creative Live, Remind), to internet security (Vectra, Cisco)...the list goes on. So if you’re just beginning to research companies and find data at the core of their products, it’s likely there is a need for a data scientist at almost every one of them.

“I want to make a positive impact in the world ... making more money for a company seems like a conflict of interest.” Working for a for-profit company as opposed to an academic institution doesn’t mean the bottom line is always in conflict with making a positive impact in people’s lives. For example, Premise is a real-time economic data tracking platform that takes millions of observations captured by local contributors on the ground in developing countries. They then apply machine learning techniques to this massive dataset to uncover otherwise-unrecognized issues that deserve attention, for example helping development banks allocate money to neighborhoods in need or getting unconnected houses onto the electric grid. Stitch Fix uses machine learning techniques on their data to pick clothes out of their merchandise inventory that their clients will love. Grand Rounds is a healthcare solution that helps patients get second opinions and find specialized care, which ultimately leads to lower costs. Also in the health sector, the Memorial Sloan Kettering Cancer Center in New York is building a data science team to create data products that doctors can use to recommend better medication to cancer patients.

There are also options for data scientists to work at non-profit organizations. Insight alumni at Khan Academy are using data science for many aspects of the business, like A/B testing new types of problem sets or building predictive models of user success in completing them. In past sessions, Insight Fellows have collaborated with Zidisha, a non-profit microfinance organization building a platform for peer-to-peer lending to developing countries. The Fellows have helped Zidisha find valuable insights surrounding their finance model that have had a huge impact on the way they provide their service.

“I like the freedom of being my own boss in academia. I don’t think I’ll be a good fit in a corporate-structured environment.” While it’s true that the structure in business is quite different from academia, the sort of Mad-Men-esque corporate structure isn’t as prevalent as you might think when it comes to 21st-century companies with data at the core of their business. If you are one of the first few employees at a startup company, you have the chance to make an enormous impact on the direction that company takes and will be leading the conversation at a very early stage. Even large companies such as Facebook and LinkedIn are split into smaller working groups, and strive to retain that startup atmosphere with individual teams operating almost like startups themselves. Though there may be a team leader, data science teams are highly collaborative, which is something we strongly emulate in the Insight Fellows program.

The “9 to 5” work day is also evolving...in fact, I’m not sure if I know any data scientist who has such rigid hours. More and more companies are implementing work-from-home policies, and have “unlimited” vacation (which is another way of saying that work hours are flexible...though workaholics like myself often have to be told to take time off!). If you’re skeptical about whether or not you really do have the freedom to pursue your ideas on your own terms at a particular company, informational interviews are a great way to find out what a company’s culture and day-to-day working style look like.

“I feel like I’m taking a huge risk if I leave academia without knowing what the next 10 years of my career looks like. What am I going to do if I work at a company that fails?” There is no way to predict where your career will take you, but then academia isn’t so predictable either. The average amount of time data scientists stay with a particular company is 3-4 years. Data scientists stay in their positions while they are challenged, and it’s expected that after some time, they will seek out new challenges. One great thing about data science in industry is that there are so many options out there, and the field is constantly evolving. Data scientists are in such high demand that if you find yourself at a company that fails, many more opportunities will be available. Sometimes whole data science teams are scooped up by other companies.

That being said, any time spent at a company, successful or not, is invaluable. One of the most important things you should look for in your first data science job is a collaborative environment where you can learn a ton from your colleagues. In industry, every minute of your time is valued. While you should approach every problem with initiative, you should never expect to sort out an issue all by yourself. When hunting for a job, you should definitely prioritize the team that can provide the best learning experience during your first year outside of academia.

Another important area to focus on while making the transition into industry is building up a strong network. Go to meetups. Attend data conferences. The field of data science doesn’t exist in a vacuum, and when the time comes to look for your next big adventure, you want to be sure you have a large network of peers to give you advice as well as inside information on teams you’re starting to consider.

“Is ‘data science’ a bubble?” I’ve been asked many times about what happens to the role of the data scientist once the tools we use to do analytics are automated and whether or not the role will eventually become obsolete. However, with the amount of data being produced growing exponentially, there’s no clear sign that the need to gain insights from data will slow down anytime soon.

But even while some parts of data science can be automated, I think there will always be a need for scientists to use their skills in industry. Data can be messy, and not applying the right tools or understanding all of the relevant features can lead to misleading results.

In fact, while the term “data science” itself might be a bit of a misnomer to some people (all science has data!), I’d argue that the term “data scientist” has real meaning. Trained scientists know better than anyone that there is an art to understanding data that can take years to master. The first thing my PhD adviser had me do as a graduate student was to look at the data. I recently came across one of the first talks I gave to my research group, and it was full of plots of histograms...most of which were clearly made without understanding what was in them. If I were making those figures now, I’d know to ask myself “do I really expect that distribution to look like that?” or “why is there a crazy feature at 0?” or even “why did I bother making this plot at all?”, etc. The intuition for data analysis built up over years of research is what makes us good scientists, but also makes us ideal people to tackle the challenges in this new era of big data.

“I’m worried I don’t have the right skills to be a data scientist.” While having a strong coding ability is important, data science isn’t all about software engineering (in fact, have a good familiarity with Python and you’re good to go). Data scientists live at the intersection of coding, statistics, and critical thinking. The most sought-after hard skills, statistics knowledge and coding ability, make up the foundation of a good data scientist’s toolkit. The less definable skills we pick up as scientists-in-training during the PhD program are equally important, like having the ability to look at data and understand bias, understanding validation, problem solving with messy data often created by someone else, working in a team, and communicating effectively to present results.

Some graduates might worry that unless they have a physics, statistics or computer science degree, they won’t have a shot at becoming a data scientist. This is completely false! A study by June Andrews while she was employed as a data scientist at LinkedIn shows that the distribution of degrees obtained by graduates who are now working as data scientists is extremely varied. Not only does this add to the multidisciplinary nature of data science teams, but additionally, as companies are beginning to use more domain-specific data, there is a growing need for experts in fields such as psychology, medicine, ecology, linguistics, and countless others.

This variety is reflected in the diversity of backgrounds of the Insight Fellows, with any given session being made up of PhDs and postdocs from fields as varied as particle physics, neuroscience, biostatistics, and social psychology. As long as you are quantitatively minded, enjoy working with data, and have a curiosity that will lead you to ask and answer important questions with data, you’ll have no problem making the leap out of academia into the field of data science.

 

 
