A Product Manager’s Guide to MLOps++ (Part 2/2)

A product manager's blueprint for successful ML rollouts at scale

Anand Ramani
5 min read · Dec 30, 2021

In part 1 of this article, I covered:

  • Barriers to operationalizing ML that a product manager may encounter
  • The “ML is just another software application” mindset that leads to problems
  • The need for a business- or user-centric rather than system-centric approach to MLOps

If you haven’t already read through that article, you can find it here.

In this article, I will summarize the framework, or construct, that I follow to roll out ML-enabled applications in the corporate and commercial banking space.

Exploring MLOps++

While a tech-centric approach may work for e-commerce, social media and other firms with a “digital-only” footprint, large, complex, highly regulated, customer-facing institutions with siloed functions and legacy technology, such as banks, need a little “extra” tacked on to traditional MLOps.

I started with what was available, using CRISP-DM and KDDS, and built on aspects that were specific to the domain I am working in. Those of you familiar with Basel norms would recognize the construct: foundation, pillars and oversight. I’ve also built on what’s already available as follows:

  • Combined the data understanding, data preparation and modeling steps from CRISP-DM into one process group
  • Called out monitoring and feedback, and model maintenance, explicitly rather than just planning for them (as CRISP-DM does)
  • Added existing SDF processes: every organization usually has a mature set of processes to develop and roll out traditional software applications, and these should be leveraged and built upon for ML products
[Image: A sample MLOps++ capabilities framework, by Author]
[Image: Typical ML product lifecycle, by Author]
[Image: Overlaying ML lifecycle stages with tooling, by Author]

Going into the specifics of MLOps++

Foundation: Platform/tooling and talent

  • Getting these elements in place before starting on enterprise ML products is paramount. Overlooking any of them is the easiest way to fail or to stall progress post-MVP.
  • Talent: SMEs provide domain understanding and “common-sense” validation of model results; the data science CoE ideates and builds the solution; technology partners support the data science CoE and provide SDF process guidance
  • Platform/tooling: Everything required to develop, test, roll out, support and maintain an ML product at enterprise scale. A best practice is to build out a strategic tooling stack first as an enabler for all data science products in the organization; this way, systems are standardized, and data, features and models can be seamlessly reused by different teams (a toy sketch of this reuse idea follows at the end of this section).

Decisions on cloud/on-prem/edge/hybrid deployment, the necessary granular approvals from country compliance teams and local regulators, and the choice of vendor partners should all be made by a central team beforehand. When handled ad hoc by data science teams, these tasks are the biggest time and money drain, add little value and become a maintenance headache.
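To make the reuse point concrete, here is a toy sketch in Python. The FeatureRegistry class, its methods and the feature/column names are all hypothetical placeholders for whatever strategic feature store or registry the central platform team standardizes on; the only point is that a feature defined once can be consumed by other teams without being re-implemented.

```python
# Toy, in-memory stand-in for a shared enterprise feature store.
# FeatureRegistry and the feature/column names below are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict

import pandas as pd


@dataclass
class FeatureRegistry:
    _features: Dict[str, Callable[[pd.DataFrame], pd.Series]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[pd.DataFrame], pd.Series]) -> None:
        """Team A publishes a feature definition once..."""
        self._features[name] = fn

    def compute(self, name: str, raw: pd.DataFrame) -> pd.Series:
        """...and Team B reuses the same definition without re-implementing it."""
        return self._features[name](raw)


registry = FeatureRegistry()
registry.register(
    "avg_balance_per_customer",
    lambda df: df.groupby("customer_id")["balance"].transform("mean"),
)

raw = pd.DataFrame({"customer_id": [1, 1, 2], "balance": [100.0, 300.0, 50.0]})
raw["avg_balance_per_customer"] = registry.compute("avg_balance_per_customer", raw)
print(raw)
```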

Execution: The processes

1. Start with the key business objective and define a use case, covering:

  • What is the problem, and what is its impact
  • How is the problem being addressed today, and how will an ML solution be better
  • What is the quantified benefit, and what will the cost be
  • MoSCoW analysis
  • What are the timelines

2. Assess whether the use case is suitable for a data science project:

  • What are the regulatory and responsible AI (RAI) implications of using ML
  • Whether we have the tooling and tech maturity to build, deploy and support an ML solution (this should be a non-issue if the foundation is in place)
  • What are the data requirements, feasibility of getting data in the form, shape, volume required, systems to source from, country-specific compliance and data privacy requirements

3. If proceeding with an ML solution, document the assessment steps that led to this conclusion. If not, or if the solution is being put on ice for some reason, document those reasons. Get agreements/sign-offs from relevant stakeholders; this becomes the first set of artifacts for the business case definition phase gate.

4. Get buy-in from key stakeholders, secure funding, get the team together, and define scope and delivery plans (project initiation). You’d generally proceed to set up the project in JIRA, create the necessary initiative, epics and user stories at this stage, and set up Confluence pages to serve as a wiki. Use the artifacts from step (3) to obtain funding and to set up the project in a tool like Clarity at this stage.

5. Create a POC/MVP: a working prototype that establishes base business value and is scalable. Data understanding, data preparation, feature engineering, model development and evaluation will be the key activities at this stage; monitoring is not necessary if the MVP model is not going to be run in shadow mode.
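To give a feel for the modeling and evaluation slice of this stage, here is a minimal baseline sketch using scikit-learn. The file path, column names and choice of metric are placeholders, and it assumes a prepared, numeric dataset; a real banking use case would add far more rigorous validation and documentation.

```python
# Minimal MVP baseline: placeholder path/columns, no tuning, no monitoring.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("prepared_use_case_data.csv")  # placeholder: output of data preparation
X = df.drop(columns=["target"])                 # placeholder label column
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate against a metric the business has signed off on (AUC here as an example)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Baseline MVP AUC: {auc:.3f}")
```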

6. Once the MVP is accepted as successful and the go-ahead is obtained, we kick into large-scale build-out and scaling:

  • Develop on two fronts: the serving application, and the data and machine learning model
  • Build the ML pipeline: Data engineering, feature engineering, model development, model testing
  • Integrate and test the application as a whole
  • Get all necessary approvals: country compliance, regulatory, model sponsor, internal committees, model risk
  • Deploy the application
  • Commercialize the application, train users on interpreting results, integrate into business processes
  • Monitor the application (technical and qualitative); a simple drift-check sketch follows this list
  • Gather feedback and improve the model
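On the technical side of monitoring, a common first check in banking is the population stability index (PSI), which compares the live distribution of a feature or model score against the distribution seen at training/validation time. Below is a minimal sketch; the bin count and the usual >0.25 investigation threshold are rules of thumb, not hard standards.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """Compare a live ('actual') distribution of a feature or score
    against the reference ('expected') distribution from training time."""
    # Bin edges are taken from the reference distribution
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0] -= 1e-9   # widen the outer edges slightly so
    edges[-1] += 1e-9  # boundary values fall inside a bin

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual)

    # Guard against empty bins before taking logs
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


# Example: compare this month's model scores against validation-time scores
reference_scores = np.random.default_rng(0).normal(0.40, 0.10, 10_000)
live_scores = np.random.default_rng(1).normal(0.45, 0.10, 10_000)

psi = population_stability_index(reference_scores, live_scores)
print(f"PSI = {psi:.3f}")  # rule of thumb: > 0.25 usually triggers investigation
```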

Governance

Overlay Governance on all steps to ensure the process is well documented, auditable, traceable and evidenced. Governance broadly addresses the following:

  • Do we know why and how we built the product we did, and are there accountability and checks and balances for the decisions taken at each phase gate? Have approvals from different stakeholder groups like operational risk, compliance, tech architects, cyber security etc. been obtained where necessary?
  • Do we have clear ownership of the model?
  • Does the ML product we built comply with RAI requirements from regulators in the markets where it will be used, as well as with the organization’s data governance and responsible AI use policies?
  • Is everyone aware of this product and its upstream/downstream impacts?
  • Were existing process controls fulfilled?

That’s pretty much all there is to it. Every product manager should put such a framework in place and use it uniformly across products, so that standardization keeps process and governance overhead down and avoids last-minute surprises. If there is a data science CoE that manages ML/DL projects across the organization, such a framework can be maintained and mandated centrally as well.

In all, organizations adopting this approach should see a far greater return on ML investments and far fewer failures caused by a key aspect being overlooked at the outset.


Anand Ramani

Product Manager in Advanced Analytics at a leading bank. Aspiring polymath. Spontaneous motivational speaker and philosopher over cups of coffee.