At Ramp, we are obsessed with saving our customers time and money. Product velocity is how we get there - making developers 1% faster leads to compounding product improvements for our customers. Meanwhile, machine learning models are famously hard to deploy and monitor: our first Credit Risk model took nearly 4 months and 3,000 lines of code to deploy. Beyond development speed, reproducible and explainable machine learning pipelines are critical to Ramp. As a financial services company, the ability to reproduce our model outputs on demand is a requirement.
To improve velocity and reproducibility, we focused on how to decrease time-to-production for new models. To aid this effort, we created a YAML-based configuration system called Turbo. This post discusses why we built Turbo, Turbo's architecture, and how Turbo allows developers to move faster.
Below, we show a stripped-down configuration a developer could use to train a machine learning model. This toy example represents a common goal at Ramp – predicting a customer's spending in the next 90 days using a regression model. The toy example walks through loading data, processing data into features, partitioning data into train and test folds, training a model, and persisting the model in a model registry.
An example of a configuration file for a machine learning model.
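For illustration, a stripped-down training config in this spirit might look like the following sketch. Every key and value below is a made-up example, not Turbo's actual schema.

# Hypothetical sketch of a training config; all keys and values are
# illustrative assumptions, not Turbo's actual schema.
dataset:
  source: customer_spend            # where the raw data comes from
  partitioner:
    type: train_test_split          # partition into train and test folds
    test_size: 0.2
feature_pipeline:                   # transform raw data into features
  - step: impute
    strategy: median
  - step: one_hot_encode
    columns: [industry]
model_pipeline:
  model: linear_regression          # regression model for 90-day spend
  target: spend_next_90_days
model_store:
  registry: mlflow                  # where the trained pipeline is persisted
  name: customer_spend_forecaster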
You may notice that this configuration is more than just a single component of a machine learning workflow (such as a model); it is the full end-to-end pipeline. We discuss this pipeline in the next section.
Riffing off Thomas Edison, machine learning is 1% modeling inspiration and 99% data perspiration. Most machine learning discussions focus on the model pipeline but miss the other key component, the feature pipeline. The feature pipeline is responsible for transforming raw data into usable features. This step includes general data cleaning like type casting and filtering, as well as machine learning feature engineering such as imputation, scaling and one-hot encoding.
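To make the feature pipeline concrete, here is a minimal scikit-learn sketch covering imputation, scaling, and one-hot encoding. The column names are made up, and scikit-learn is an assumption for illustration, not a statement about Turbo's internals.

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: fill missing values, then standardize.
numeric = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
])
# Categorical columns: one-hot encode, tolerating unseen categories.
feature_pipeline = ColumnTransformer([
    ('numeric', numeric, ['monthly_spend', 'account_age_days']),
    ('categorical', OneHotEncoder(handle_unknown='ignore'), ['industry']),
])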
The end-to-end flow for an ML model.
The diagram above shows this process, from taking in “Raw Data” to generating “Model Outputs”.
Building a reusable and stable feature pipeline is often 99% of the battle. Where does the raw data come from? How do you decide to pre-process it? How are the transformations represented in code? And most importantly, how do you make everything reproducible and explainable?
We could solve all this without building an abstraction. For example, we could create standards and templates, enforce them at review time, and allow each developer to code up their pipelines however they want. However, some truths about machine learning make an abstraction attractive.
To summarize, we wanted an abstraction that made our machine learning pipelines reproducible, explainable, and fast to ship.
We have already given a general overview of the core components of a machine learning workflow. There are two pipelines: a Feature Pipeline and a Model Pipeline.
There are multiple distinct workflows or “jobs” to be done. The two primary ones we built for are a Train Job, where a developer trains and registers a new model pipeline, and a Predict Job, where a deployed pipeline generates predictions.
We have also built other jobs, such as the Hyperparameter Tuning Job, where a developer tunes a model's hyperparameters. However, this job can be considered an iterative case of the Train Job, so we will not discuss it further in this post.
The diagram below shows how both types of Jobs share standard processes, i.e., building datasets and pipelines to generate outputs (or predictions). Let's discuss these two processes further.
The goal of building datasets is to build the rows and columns necessary to construct features, which can be considered data preprocessing before feature engineering. The flow involves getting raw data and potentially partitioning it for train-validation-test splits. The diagram abstracts away the complex aspects of raw data fetching, such as filtering to a specific time range or set of columns the developer wants.
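A rough sketch of this step might look like the following; the column names, date filter, and split parameters are all illustrative assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split

def build_dataset(raw: pd.DataFrame) -> dict:
    # Preprocess before feature engineering: filter to the time range and
    # columns the developer wants, then partition into train/test folds.
    data = raw.loc[raw['as_of_date'] >= '2022-01-01',
                   ['monthly_spend', 'account_age_days', 'industry',
                    'spend_next_90_days']]
    train, test = train_test_split(data, test_size=0.2, random_state=0)
    return {'train': train, 'test': test}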
The goal of building pipelines is to create a set of operations that transforms raw data into features (Feature Pipeline) and another that transforms features into model predictions (Model Pipeline). If the developer is Training, a new pipeline must be created, whereas for Prediction, they are usually reading an already trained machine learning pipeline from a database, i.e., a Model Store (such as MLflow).
The process for building a dataset and model pipeline.
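In code, the two paths might look roughly like this. MLflow is one plausible model-store backend (mentioned above), and the model URI and pipeline steps are made-up examples.

import mlflow
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

def build_pipeline(job_type: str):
    if job_type == 'train':
        # Training: compose a fresh feature + model pipeline to be fit.
        return Pipeline([
            ('impute', SimpleImputer(strategy='median')),
            ('model', LinearRegression()),
        ])
    # Prediction: read an already-trained pipeline from the model store.
    return mlflow.sklearn.load_model('models:/customer_spend_forecaster/1')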
We use a light Domain-Driven Design approach to abstract these processes and find the “Entities” (highlighted in yellow) that encapsulate disjoint functional areas. The key takeaway from this picture is that entities are shared between our Train Job and Predict Job, meaning we can build a unified abstraction for different machine learning jobs.
The entities that encapsulate disjoint functional areas.
We can organize these entities around the questions they answer to explain the responsibility of each.
A description of the entities' roles.
Turbo is built in Python, utilizing Pydantic for our Entities and Aggregates in an object-oriented style. All an end user needs to deploy ML workflows is a YAML file and a few lines of code. Implementation details are abstracted away behind a simple interface.
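To give a flavor of that style, here is a hypothetical sketch of two Pydantic entities; the names and fields are illustrative assumptions, not Turbo's real classes.

from pydantic import BaseModel

class Dataset(BaseModel):
    # Hypothetical entity describing where the raw data comes from.
    source: str
    columns: list[str]

class ModelPipeline(BaseModel):
    # Hypothetical entity describing the model and its target.
    model: str
    target: str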
Before diving in, we need to introduce one more concept: the JobSpec. This class comprises our entities and materializes Python objects from our YAML configuration. The JobSpec is used to communicate with external services, such as submitting a job to run in a compute environment (a Jupyter Notebook if run locally, or AWS if run remotely) or storing a model in the model store. The diagram below is a simplified representation of a training job: the JobSpec takes in entities, gets run in some environment, and persists a model pipeline in our model store.
A simplified representation of a training job.
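A stripped-down JobSpec in this style might look like the following sketch, which mirrors the usage shown below; the fields and enum values are assumptions, not Turbo's actual implementation.

from enum import Enum

import yaml
from pydantic import BaseModel

class JobType(str, Enum):
    train = 'train'
    predict = 'predict'

class JobSpec(BaseModel):
    # Composes the entities (simplified to dicts here) parsed from YAML.
    job_type: JobType
    dataset: dict
    model_pipeline: dict

    @classmethod
    def from_yaml(cls, job_type: JobType, job_config_yaml: str) -> 'JobSpec':
        # Materialize Python objects from the YAML configuration file.
        with open(job_config_yaml) as f:
            return cls(job_type=job_type, **yaml.safe_load(f))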
With these abstractions, we can realize the final payoff: reproducible pipelines that are simple to create and run. A training job from a config like the one shown in the introduction can be run from the following lines of code. This same code and configuration file can be run in 6 months, 1 year, or 2 years and will produce the same results, as long as your data hasn't changed.
from turbo import JobType, JobSpec
from turbo.train import run_train_job
# Create orchestrator to build and train pipeline from yaml configuration
job_spec = JobSpec.from_yaml(
    job_type=JobType.train,
    job_config_yaml='example_train_config.yml'
)
# Train the model and register the model to our model store
run_train_job(job_spec=job_spec)
So far, we have only shown examples of the train job. Our unified framework means a user only needs to learn one schema for all machine learning jobs. Below, we show how a developer would run a predict job by modifying a training config (in green) into a predict config (in blue). The key takeaway is that the two configurations leverage the same concepts and data. Training a model and deploying it for predictions is as painless as copying a YAML file and editing a few lines.
Going from a training config to a predict config is as easy as changing a few lines.
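For illustration, a hypothetical predict config in the same made-up schema as the training sketch earlier might look like this; again, these keys are assumptions, not Turbo's actual ones.

# Hypothetical predict config, derived from the training sketch above.
dataset:
  source: customer_spend            # same raw data; no train/test partitioner
feature_pipeline:                   # unchanged from the training config
  - step: impute
    strategy: median
  - step: one_hot_encode
    columns: [industry]
model_pipeline:
  registry: mlflow                  # load the trained pipeline from the
  name: customer_spend_forecaster   # model store instead of fitting a new one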
As mentioned at the beginning of the article, our first Credit Risk Model took 4 months and over 3,000 lines of code to deploy. Today, a new ML model can be deployed in under 5 minutes and requires, on average, ~40 lines of YAML. In addition to deploying models faster, our interface democratizes model development. We currently have roughly 20 deployed models built by 9 different developers, many of whom had never trained a machine learning model before, let alone deployed one into production.
In addition to developer velocity, our configuration system has allowed us to create reproducible pipelines. As a financial services company, reproducible model development standards are essential for our bottom line. For example, we use time series models to optimize our cash flow and set company goals. Previously, every time series model handled holidays differently, introducing inconsistent forecasts. Holiday handling is essential for a business card company, as business spending patterns are highly seasonal. By migrating our time series models to the configuration system, every model now gets holiday alignment for free, both today and going forward.
Standards are critical to innovation. An apt historical analogy is the railroad industry and the history of track gauges (the width between rails). Early on, different companies built tracks with different gauges, so trains moved quickly within a company's own network but had to slow down when switching to other companies' tracks. Standardizing gauges let trains run faster by allowing every train car company to stop focusing on the wheels and start focusing on the cars themselves. Similarly, for machine learning, Turbo allows developers to focus less on the wheels (getting models into production) and more on the cars (all the use cases for the models themselves).
With Turbo, we have standardized Ramp's track for developing and deploying machine learning. Each of those machine learning models furthers our goal of saving Ramp's customers time and money and improving Ramp's business outcomes.