
To be agile, or not to be, that is the question

Should data science embrace Agile? Why and why not?


I guess the answer depends on whom you ask.

I have seen many data scientists bitterly oppose Agile and Scrum.

The crux of the argument is that Data Science is science and not engineering. Therefore:

  • Estimating the time requirement is very difficult.

  • Its nature is not iterative: unlike software, you can’t build a piece that partly works, and then fill in more pieces to make it more complete.

  • Its nature is waterfall: when an idea doesn’t work well, you might have to go all the way back to tweaking the problem formulation and collecting a different kind of data.

  • Agile means more meetings (stand up, sprint planning, retrospective, etc.) and less work.

  • Agile means constant change of priorities (as a consequence of constantly evolving understanding of requirements and business needs).

  • Agile Methodology makes you mechanical and hinders creativity.

In some sense, and to some extent, all of it is true.

Déjà vu for “old” enough Software Engineers.

Interestingly, software engineers who are old enough will feel déjà vu. Programmers had the same arguments in the late 90s:

  • Programming is part art and part science. It is a highly creative process.

  • Estimating software development efforts is a notoriously hard problem.

  • When you discover a problem in the software design, often you have to go back to the very beginning (i.e. it’s waterfall-ish).

  • Do you want me to sit in so many meetings for requirement review, design, estimate, integration plan, test plan, or do you want me to code and finish the stuff?

And here we are! Now most developers follow some kind of iterative process, and data scientists often think that engineers and managers don’t get “science and research”.

Just as software was (and is) only a means to an end, data science and machine learning are means to business goals.

So, what can we do?

I believe that with time, we will figure out how to manage the unpredictabilities of data science better, just as we figured that out for software development.

First, let’s step back and revisit the Agile manifesto:

  • Individuals and interactions over processes and tools

  • Working software over comprehensive documentation

  • Customer collaboration over contract negotiation

  • Responding to change over following a plan

Instead of the rituals of Agile, we need to go back to its essence and adapt it to machine learning.

In my experience, the following helps avoid ML project failures and improves the chances of successfully deploying ML models:

  • Consolidate Ownership: Make one cross-functional team of product, developers, and data scientists responsible for the end-to-end project.

  • Integrate Early: Implement a simple (rule-based or dummy) model and develop the end-to-end product feature first.

  • Iterate Often: Build better models and replace the simple model, monitor, and repeat.

Consolidating into a single team exposes data scientists and developers to each other’s requirements early on.

Counterintuitively, integrating early actually decouples model development from software development (that great software engineering principle: cohesion over coupling). The two can follow different cadences and still stay in the same rhythm.
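To make "integrate early, iterate often" concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the ticket-triage feature, the `Scorer` interface, the keyword rule, and the hand-picked weights that stand in for a trained model. The point is only that the product code depends on a small interface, so a dummy model can ship first and a better one can replace it later.

```python
from typing import Protocol


class Scorer(Protocol):
    """Contract the product code depends on; any model must satisfy it."""

    def score(self, text: str) -> float:
        ...


class RuleBasedScorer:
    """Iteration 0: a hand-written rule, enough to ship the feature end to end."""

    URGENT_WORDS = {"refund", "broken", "cancel"}  # hypothetical keyword list

    def score(self, text: str) -> float:
        words = set(text.lower().split())
        return 1.0 if words & self.URGENT_WORDS else 0.0


class WeightedScorer:
    """Later iteration: a stand-in for a trained model (weights are made up)."""

    def __init__(self, weights: dict[str, float]) -> None:
        self.weights = weights

    def score(self, text: str) -> float:
        total = sum(self.weights.get(w, 0.0) for w in text.lower().split())
        return min(1.0, max(0.0, total))  # clamp to [0, 1]


def triage(tickets: list[str], scorer: Scorer, threshold: float = 0.5) -> list[str]:
    """Product feature: pick tickets to escalate. It knows nothing about the model."""
    return [t for t in tickets if scorer.score(t) >= threshold]


tickets = ["Please cancel my order", "Love the new release", "App is slow and laggy"]

# First release: the whole feature works with the dummy model.
first_release = triage(tickets, RuleBasedScorer())

# Later release: only the scorer changes; triage() is untouched.
later_release = triage(
    tickets, WeightedScorer({"cancel": 0.9, "slow": 0.4, "laggy": 0.3})
)
```

In this sketch the developers can harden `triage` and everything around it while the data scientists iterate on scorers, and each new model is a drop-in replacement that can be monitored against the previous one.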

It has started.

MLOps, which manages the machine learning lifecycle, exists to avoid the risks associated with waterfall development. Several voices are calling for adapting Agile to data science and machine learning, and some of it is already happening.

So, what do you think? What parts of Agile philosophy and process are suitable to adopt in data science and for taking machine learning to production?

Is Agile roller coaster suitable for data scientists?

Photo by Matt Bowden on Unsplash


Satish Chandra Gupta
Data/ML Practitioner