1/8

Getting Started with Data Science

Welcome to the first issue of HBR’s email series on managing data science! For the next eight weeks, you’ll receive advice from experts in data, analytics, and artificial intelligence on handling the challenges organizations face when managing a data science operation. This week, Cloudera’s Hilary Mason explains how to pick the right data science projects.

Hilary Mason is the founder of Fast Forward Labs. She is also a data scientist in residence at Accel Partners and was previously the chief scientist at Bitly. @hmason

Here’s how most companies select which data projects to pursue: Management identifies a set of projects it would like to see built and creates the ubiquitous prioritization scatter plot. One axis represents a given project’s value to the business, and the other represents its estimated complexity or cost of development. Management allocates the company’s limited resources to the projects that it believes will cost the least and have the highest business value.
This approach isn’t optimal. Picking the right projects is an essential part of data strategy, and the value/cost scatter plot is, on its own, a recipe for mediocrity.
Data strategy is about more than picking projects, of course. An excellent data strategy starts with a centralized technology investment and well-selected and coordinated defaults for the architecture of data applications. It is specific in the short term and flexible in the long term. And when it comes to picking projects, an excellent data strategy takes into account the fact that data science projects are not independent from one another. With each completed project, successful or not, you create a foundation to build later projects more easily and at lower cost.
Here’s what project selection looks like in a firm with an excellent data strategy: First, the company collects ideas. This effort should be spread as broadly as possible across the organization, at all levels. If you only see good and obvious ideas on your list, you should be worried — it’s a sign that you are missing out on creative thinking. Once you have a large list, filter by the technical plausibility of an idea. Then create the scatter plot described above, which evaluates each project on its relative cost/complexity and value to the business.
Now it gets interesting. On your scatter plot, draw lines between potentially related projects. These connections exist where projects share data resources; where one project may enable data collection that would be helpful to another project; or where foundational work on one project is also foundational work on another. This approach acknowledges the realities of working on data science, like the fact that building a precursor project makes successor projects faster and easier (even if the precursor fails). The costs of gathering data and building shared components are amortized across projects.
This second approach will reveal that some ambitious high-value projects may be more efficient and safer to proceed with than lower-value projects that looked attractive in a naive analysis. Acknowledging that different projects play off one another is the heart of an excellent data strategy.
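
To make the difference between the two approaches concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the project names, the value and cost figures, the shared components, and above all the assumption that a shared component's cost is split evenly across every project that needs it. Real amortization will be messier, but the sketch shows how connecting related projects can reorder a value-per-cost ranking.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Project:
    name: str
    value: float      # estimated business value (arbitrary units)
    own_cost: float   # cost of the work unique to this project
    shared: set = field(default_factory=set)  # shared components it relies on


# Hypothetical cost of building each shared component once.
SHARED_COST = {"clickstream_pipeline": 6.0, "customer_graph": 4.0}

# Hypothetical project list; all numbers are made up.
PROJECTS = [
    Project("report_automation", value=4.2, own_cost=3.0),
    Project("recommendations", value=14.0, own_cost=3.0,
            shared={"clickstream_pipeline", "customer_graph"}),
    Project("search_ranking", value=9.0, own_cost=3.0,
            shared={"clickstream_pipeline"}),
    Project("fraud_alerts", value=8.0, own_cost=2.0,
            shared={"customer_graph"}),
]


def related_pairs(projects):
    """The 'lines' on the scatter plot: pairs of projects sharing a component."""
    return [(a.name, b.name, a.shared & b.shared)
            for a, b in combinations(projects, 2)
            if a.shared & b.shared]


def naive_cost(project, projects):
    """Each project pays the full cost of every component it needs."""
    return project.own_cost + sum(SHARED_COST[c] for c in project.shared)


def amortized_cost(project, projects):
    """Assumption: a shared component's cost is split evenly among its users."""
    total = project.own_cost
    for component in project.shared:
        users = sum(1 for p in projects if component in p.shared)
        total += SHARED_COST[component] / users
    return total


def rank(projects, cost_fn):
    """Project names ordered by value per unit of cost, best first."""
    return [p.name for p in sorted(
        projects, key=lambda p: p.value / cost_fn(p, projects), reverse=True)]


if __name__ == "__main__":
    print("Related projects:", related_pairs(PROJECTS))
    print("Naive ranking:   ", rank(PROJECTS, naive_cost))
    print("Amortized ranking:", rank(PROJECTS, amortized_cost))
```

With these made-up numbers, the small standalone project ranks first in the naive analysis and last once shared costs are split, while the ambitious recommendations project moves from third to second. The even split is only one possible model; the point is that the ranking depends on which projects are connected.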