Harvard Business Review

1/8

Getting Started with Data Science

Welcome to the first issue of HBR’s email series on managing data science! For the next eight weeks, you’ll receive advice from experts in data, analytics, and artificial intelligence on handling the challenges organizations face when managing a data science operation. This week, Cloudera’s Hilary Mason explains how to pick the right data science projects.

BY Hilary Mason

Hilary Mason is the founder of Fast Forward Labs. She is also a data scientist in residence at Accel Partners and was previously the chief scientist at Bitly. @hmason

Here’s how most companies select which data projects to pursue: Management identifies a set of projects it would like to see built and creates the ubiquitous prioritization scatter plot. One axis represents a given project’s value to the business, and the other represents its estimated complexity or cost of development. Management allocates the company’s limited resources to the projects that it believes will cost the least and have the highest business value.

This approach falls short because it evaluates each project in isolation. Picking the right projects is an essential part of data strategy, and the value/cost scatter plot is, on its own, a recipe for mediocrity.

Data strategy is about more than picking projects, of course. An excellent data strategy starts with a centralized technology investment and well-selected and coordinated defaults for the architecture of data applications. It is specific in the short term and flexible in the long term. And when it comes to picking projects, an excellent data strategy takes into account the fact that data science projects are not independent from one another. With each completed project, successful or not, you create a foundation to build later projects more easily and at lower cost.

Here’s what project selection looks like in a firm with an excellent data strategy: First, the company collects ideas. This effort should be spread as broadly as possible across the organization, at all levels. If you see only good and obvious ideas on your list, you should be worried: it’s a sign that you are missing out on creative thinking. Once you have a large list, filter it by each idea’s technical plausibility. Then create the scatter plot described above, which evaluates each project on its relative cost/complexity and value to the business.

Now it gets interesting. On your scatter plot, draw lines between potentially related projects. These connections exist where projects share data resources; where one project may enable data collection that would be helpful to another project; or where foundational work on one project is also foundational work on another. This approach acknowledges the realities of data science work, such as the fact that building a precursor project makes successor projects faster and easier (even if the precursor fails). The costs of gathering data and building shared components are amortized across projects.

This second approach will reveal that some ambitious high-value projects may be more efficient and safer to proceed with than lower-value projects that looked attractive in a naive analysis. Acknowledging that different projects play off one another is the heart of an excellent data strategy.
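To make the connected scatter plot concrete, here is a minimal Python sketch. It is an illustration, not a method prescribed in this article. The project names, values, costs, and shared components are all hypothetical. Splitting a shared component's cost evenly across the projects that use it is an assumed simplification, and it holds only if all of those projects are eventually built.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Project:
    name: str
    value: float                        # estimated business value (hypothetical units)
    own_cost: float                     # cost of work unique to this project
    components: frozenset = frozenset() # shared data or infrastructure it relies on


# Hypothetical one-time cost of building each shared component.
COMPONENT_COST = {"clickstream_pipeline": 6.0, "customer_table": 3.0}

# Hypothetical candidate projects that survived the plausibility filter.
PROJECTS = [
    Project("churn_model", 10.0, 2.0,
            frozenset({"clickstream_pipeline", "customer_table"})),
    Project("recommendations", 9.0, 3.0, frozenset({"clickstream_pipeline"})),
    Project("sales_dashboard", 4.0, 3.0),
    Project("segment_report", 5.0, 1.0, frozenset({"customer_table"})),
]


def naive_cost(p: Project) -> float:
    """Cost if the project is judged alone and pays for every component itself."""
    return p.own_cost + sum(COMPONENT_COST[c] for c in p.components)


def amortized_cost(p: Project, projects: List[Project]) -> float:
    """Cost when each shared component is split evenly among the projects using it."""
    total = p.own_cost
    for c in p.components:
        users = sum(1 for q in projects if c in q.components)
        total += COMPONENT_COST[c] / users
    return total


def related_pairs(projects: List[Project]) -> List[Tuple[str, str]]:
    """The 'lines' on the scatter plot: pairs of projects sharing a component."""
    return [(a.name, b.name)
            for a, b in combinations(projects, 2)
            if a.components & b.components]


def rank(projects: List[Project],
         cost_fn: Callable[[Project], float]) -> List[Project]:
    """Order projects by value per unit of cost, best first."""
    return sorted(projects, key=lambda p: p.value / cost_fn(p), reverse=True)


if __name__ == "__main__":
    print("Related:", related_pairs(PROJECTS))
    print("Naive:    ", [p.name for p in rank(PROJECTS, naive_cost)])
    print("Amortized:", [p.name for p in
                         rank(PROJECTS, lambda p: amortized_cost(p, PROJECTS))])
```

With these made-up numbers, the naive ranking puts the low-value dashboard first and the high-value churn model last. Once shared costs are spread across connected projects, the churn model moves up to second place and the dashboard drops to last.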


We also recommend:

How to Spot a Machine Learning Opportunity, Even If You Aren’t a Data Scientist

By Kathryn Hume, HBR

What’s Your Data Strategy?

By Leandro DalleMule and Thomas H. Davenport, HBR

A Short Guide to Strategy for Entrepreneurs

By Kevin J. Boudreau, HBR

Why Businesses Fail at Machine Learning

By Cassie Kozyrkov, Hacker Noon

Ten Red Flags Signaling Your Analytics Program Will Fail

By Oliver Fleming, Tim Fountaine, Nicolaus Henke, and Tamim Saleh, McKinsey & Company

plus

Andrew Ng has a rule of thumb for AI: “If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.” Andrew, who has run AI teams at Stanford, Google, and Baidu, will explore AI adoption in a later issue of this newsletter.

In this 8-part series

1/8

Getting Started with Data Science
By Hilary Mason, general manager of machine learning at Cloudera and founder of Fast Forward Labs

2/8

Coming next
Managing Data Scientists
By Angela Bassa, head of analytics, data science, and machine learning at iRobot

3/8

Building Great Data Products
By Emily Glassberg Sands, vice president of data science at Coursera

4/8

The Kinds of Data Scientist
By Yael Garten, director of Siri Data Science and Engineering at Apple

5/8

Adopting AI
By Andrew Ng, general partner at AI Fund and CEO of Landing AI

6/8

Setting Up an AI Lab
By Foteini Agrafioti, chief science officer at the Royal Bank of Canada and head of Borealis AI

7/8

Curiosity-Driven Data Science
By Eric Colson, chief algorithms officer emeritus at Stitch Fix

8/8

What Analysts Do
By Cassie Kozyrkov, chief decision scientist at Google


Copyright © 2021 Harvard Business School Publishing,
an affiliate of Harvard Business School. All rights reserved.

Harvard Business Publishing
20 Guest St, Suite 700, Brighton, MA 02135
