Managing Models Using ModelDB

In Part 8 of “How To Deploy And Use Kubeflow On OpenShift”, we looked at deployment operations using Kubeflow Pipelines. In this final part of the series, we look at the last Kubeflow component we will describe: ModelDB, which provides model management.

Many organizations build hundreds of models a day, but it is very hard to manage all the models that are built over time. ModelDB is an end-to-end system that tracks models as they are built, extracts and stores relevant metadata (e.g., hyperparameters, data sources) for models, and makes this data available for easy querying and visualization. ModelDB organizes model data in a three-level hierarchy, from bottom to top:

  • Experiment run: every execution of a script/program creates an experiment run.
  • Experiment: related experiment runs can be grouped into an Experiment (for example, “running hyperparameter optimization for the Neural Network”).
  • Project: Finally, all experiments belong to a Project (for example, “recommender”).
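The hierarchy above can be pictured as plain nested records. The following sketch is purely illustrative (the class names are ours, not the ModelDB API), showing how runs roll up into experiments and experiments into projects:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch of ModelDB's three-level hierarchy; these
# dataclasses are hypothetical and not part of the ModelDB client.

@dataclass
class ExperimentRun:
    script: str                                  # the script/program that was executed
    hyperparameters: Dict[str, float] = field(default_factory=dict)

@dataclass
class Experiment:
    name: str                                    # e.g. a hyperparameter sweep
    runs: List[ExperimentRun] = field(default_factory=list)

@dataclass
class Project:
    name: str                                    # e.g. "recommender"
    experiments: List[Experiment] = field(default_factory=list)

# One run of a neural-network sweep inside the "recommender" project
run = ExperimentRun("train.py", {"learning_rate": 0.01})
experiment = Experiment("NN hyperparameter optimization", [run])
project = Project("recommender", [experiment])
```

Every execution of a script adds one more `ExperimentRun` at the bottom of this tree, which is what makes the accumulated metadata easy to query later.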

The main use cases for ModelDB include:

  • Tracking Modeling Experiments
  • Versioning Models
  • Ensuring Reproducibility
  • Visual exploration of models and results
  • Collaboration

ModelDB is not part of the “standard” Kubeflow install. It can be installed using the following command:

$ ks generate modeldb modeldb
$ ks apply default -c modeldb

Once installed, ModelDB can be populated (by writing directly to the ModelDB database) in one of the following ways:

  • Light API (Python): a generic way for users to integrate any ML workflow with ModelDB.
  • scikit-learn client (Python): a library for logging scikit-learn models to ModelDB.
  • spark.ml client (Scala): a library for logging Spark ML models to ModelDB.

Some usage examples of the ModelDB APIs can be found in the ModelDB repository. Below is a simple notebook (leveraging the Light API) that populates ModelDB:

# Install ModelDB
!pip install modeldb --upgrade
from modeldb.basic.Structs import (
    Model, ModelConfig, ModelMetrics, Dataset)
from modeldb.basic.ModelDbSyncerBase import Syncer
# Create a syncer using a convenience API
syncer_obj = Syncer.create_syncer("Project Name",
    "test_user",
    "project description",
    host="modeldb-backend")
# Create Datasets by specifying their filepaths and optional metadata,
# associate a tag (key) with each Dataset (value), and sync them
datasets = {
    "train" : Dataset("/path/to/train", {"num_cols" : 15, "dist" : "random"}),
    "test" : Dataset("/path/to/test", {"num_cols" : 15, "dist" : "gaussian"})
}
syncer_obj.sync_datasets(datasets)
# Create the Model, ModelConfig, and ModelMetrics instances and sync them
model = "model_obj"
model_type = "NN"
mdb_model1 = Model(model_type, model, "/path/to/model1")
model_config1 = ModelConfig(model_type, {"l1" : 10})
model_metrics1 = ModelMetrics({"accuracy" : 0.8})
syncer_obj.sync_model("train", model_config1, mdb_model1)
syncer_obj.sync_metrics("test", mdb_model1, model_metrics1)
# Actually write everything to ModelDB
syncer_obj.sync()
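Note that the metrics dictionary passed to `ModelMetrics` is computed by the user, not by ModelDB. For example, an accuracy value like the `0.8` synced above could come from a simple comparison of predictions against labels (a generic sketch, not part of the ModelDB API):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# 4 of 5 predictions correct -> 0.8, the kind of value
# one would pass to ModelMetrics({"accuracy": ...})
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
acc = accuracy(y_true, y_pred)
```

Any scalar metric computed this way (accuracy, RMSE, AUC, and so on) can be recorded in the same dictionary and queried later through the ModelDB frontend.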

Once this program runs, we can expose the modeldb-frontend service as a route and see what was created. The ModelDB frontend shows the list of projects, along with the information about the project that we just created.

I hope you enjoyed this series on setting up and deploying machine learning on Kubeflow and OpenShift. If you’d like to get professional guidance on best-practices and how-tos with Machine Learning, simply contact us to learn how Lightbend can help.
