In Part 8 of “How To Deploy And Use Kubeflow On OpenShift”, we looked at deployment operations using Kubeflow pipelines. In this final part of the series, we look at model management with ModelDB, the last Kubeflow component that we will describe.
Many organizations build hundreds of models a day, but it is very hard to manage all the models that are built over time. ModelDB is an end-to-end system that tracks models as they are built, extracts and stores relevant metadata (e.g., hyperparameters, data sources) for models, and makes this data available for easy querying and visualization. ModelDB organizes model data in a three-level hierarchy, from bottom to top:
The main use cases for ModelDB include:
ModelDB is not part of the “standard” Kubeflow install. It can be installed using the following commands:
$ ks generate modeldb modeldb
$ ks apply default -c modeldb
Once installed, ModelDB can be populated (by writing directly to the ModelDB database) in one of the following ways:
Some usage examples of the ModelDB APIs can be found here. We will show a simple notebook (using the Light API) that populates ModelDB:
# Install ModelDB
!pip install modeldb --upgrade

from modeldb.basic.Structs import (
    Model, ModelConfig, ModelMetrics, Dataset)
from modeldb.basic.ModelDbSyncerBase import Syncer

# Create a syncer using a convenience API
syncer_obj = Syncer.create_syncer("Project Name",
                                  "test_user",
                                  "project description",
                                  host="modeldb-backend")

# Create Datasets by specifying their file paths and optional metadata.
# Associate a tag (key) with each Dataset (value) and sync them.
datasets = {
    "train": Dataset("/path/to/train", {"num_cols": 15, "dist": "random"}),
    "test": Dataset("/path/to/test", {"num_cols": 15, "dist": "gaussian"})
}
syncer_obj.sync_datasets(datasets)

# Create the Model, ModelConfig, and ModelMetrics instances and sync them
model = "model_obj"  # placeholder standing in for a real model object
model_type = "NN"
mdb_model1 = Model(model_type, model, "/path/to/model1")
model_config1 = ModelConfig(model_type, {"l1": 10})
model_metrics1 = ModelMetrics({"accuracy": 0.8})
syncer_obj.sync_model("train", model_config1, mdb_model1)
syncer_obj.sync_metrics("test", mdb_model1, model_metrics1)

# Actually write everything to ModelDB
syncer_obj.sync()
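The notebook assumes that the modeldb-backend service can be reached from wherever the notebook runs. As a rough pre-flight check, the sketch below tries to open a TCP connection before a syncer is created. The helper name backend_reachable is hypothetical and not part of ModelDB; it uses only the Python standard library.

```python
import socket


def backend_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it is not part of the ModelDB API.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

For example, calling backend_reachable("modeldb-backend", 6543) before Syncer.create_syncer would catch a missing or misnamed service early. The port 6543 is an assumption here, so check the port that the backend service in your deployment actually exposes.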
Once the notebook has run, we can expose the modeldb-frontend service as a route and see what was created. Here is the list of projects:
And here is the information about the project that we created:
I hope you enjoyed this series on setting up and deploying machine learning on Kubeflow and OpenShift. If you’d like professional guidance on best practices and how-tos for machine learning, simply contact us to learn how Lightbend can help.