How To Deploy Kubeflow On Lightbend Platform With OpenShift - Part 7: ML Model Serving

Boris Lublinsky Principal Architect, Lightbend, Inc.

Using Kubeflow For ML Model Serving

In Part 6 of “How To Deploy And Use Kubeflow On OpenShift”, we examined ML parameter tuning with Kubeflow. In this part, we look at serving ML models in production.

Once your model is built, Kubeflow lets you serve it. In this post I will focus on TensorFlow (TF) Serving. Kubeflow’s implementation treats each deployed model as two components in your application: tf-serving-deployment and tf-serving-service. You can think of the service as the model, and of the deployment as a version of that model. For more information on the TF Serving implementation and architecture, please refer to this article.

TF Serving is based on the standard SavedModel format, which defines both the on-disk layout of a saved model and the information stored in it. The layout on disk looks as follows:

test/
  1/
    saved_model.pb
    variables/

Here the model directory (test) has a subdirectory for every version of the model (1). Each version subdirectory contains a binary protobuf of the model representation (saved_model.pb) and a variables directory with the model variables. The saved model also contains a signature that lists all required inputs and outputs, defining their names, types and shapes.

Such a standardized layout makes a generic implementation of model loading and serving possible, and this is the foundation of TF model serving.
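To make the layout concrete, here is a minimal Python sketch that creates an empty placeholder tree in the shape described above and then lists the version subdirectories it finds. The helper names (build_example_layout, list_versions) are hypothetical, and the saved_model.pb written here is an empty stand-in, not a real model.

```python
import tempfile
from pathlib import Path
from typing import List


def build_example_layout(root: Path, model: str = "test", version: int = 1) -> Path:
    """Create <root>/<model>/<version>/ with a placeholder saved_model.pb and variables/."""
    version_dir = root / model / str(version)
    (version_dir / "variables").mkdir(parents=True, exist_ok=True)
    # Empty placeholder file; a real export would contain the serialized graph.
    (version_dir / "saved_model.pb").write_bytes(b"")
    return root / model


def list_versions(model_dir: Path) -> List[int]:
    """Return the numeric version subdirectories that contain a saved_model.pb file."""
    versions = []
    for child in model_dir.iterdir():
        if child.is_dir() and child.name.isdigit() and (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)


with tempfile.TemporaryDirectory() as tmp:
    model_dir = build_example_layout(Path(tmp), version=1)
    build_example_layout(Path(tmp), version=2)
    print(list_versions(model_dir))
```

Because every version lives in its own numbered subdirectory, publishing a new version only requires adding a new subdirectory next to the existing ones.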

The Kubeflow TF Serving components can currently point to a model located either on Google Cloud or on S3. In this post we will use S3.

In order to use S3, you first need to create a secret that contains the AWS access credentials.

apiVersion: v1
kind: Secret
metadata:
  name: secretkubeflow
data:
  AWS_ACCESS_KEY_ID: xxxxxxxxxx
  AWS_SECRET_ACCESS_KEY: xxxxxxxxxxxxxxxxxxxxx

Note: do not forget that values for the ID and key have to be base64 encoded. To encode a value, run the command echo -n 'xxx' | base64
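If you prefer to script the encoding, the following Python sketch does the same thing as the echo/base64 pipeline and assembles the secret as a JSON manifest. The function names are hypothetical, and the credential values passed in are placeholders.

```python
import base64
import json


def b64(value: str) -> str:
    """Base64-encode a string, equivalent to: echo -n '<value>' | base64"""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def aws_secret_manifest(name: str, access_key_id: str, secret_access_key: str) -> dict:
    """Build a Kubernetes Secret manifest whose data values are base64 encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "data": {
            "AWS_ACCESS_KEY_ID": b64(access_key_id),
            "AWS_SECRET_ACCESS_KEY": b64(secret_access_key),
        },
    }


# Placeholder credentials; substitute your own values.
manifest = aws_secret_manifest("secretkubeflow", "my-access-key-id", "my-secret-access-key")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied in the same way as a YAML manifest.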

Save this YAML to a file named awsaccess.yaml and deploy the secret using the following command:

$ oc apply -f awsaccess.yaml -n kubeflow

With this in place, the TF serving pod can be generated using the following command:

$ ks generate tf-serving-deployment-aws mnist-v1 --name=mnist

The generation output uses the image tensorflow/serving:1.11.1, which does not work well. Update the tensorflow/serving image version by running the following commands:

$ ks param set mnist-v1 defaultCpuImage tensorflow/serving:1.12.0 
$ ks param set mnist-v1 defaultGpuImage tensorflow/serving:1.12.0-gpu

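After the component is generated, it needs to be configured with some parameters specific to S3. The s3SecretName value must match the name of the secret you deployed; the commands below use awsaccess, so substitute your own secret name (for example secretkubeflow) if it differs: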

$ ks param set mnist-v1 modelBasePath s3://fdp-killrweather-data/kubeflow/mnist
$ ks param set mnist-v1 s3Enable true
$ ks param set mnist-v1 s3SecretName awsaccess

Additionally, if your S3 bucket is not in the us-west-1 region, provide the correct S3 connection information, for example:

$ ks param set mnist-v1 s3AwsRegion eu-west-1
$ ks param set mnist-v1 s3Endpoint

TF Serving can now be deployed using:

$ ks apply default -c mnist-v1

Once the pod starts, it periodically prints error messages to the console. Apparently this is normal.

Then we need to deploy the service, which will run as a separate pod:

$ ks generate tf-serving-service mnist-service
$ ks param set mnist-service modelName mnist
$ ks apply default -c mnist-service

To access model serving, you also need to create a route for the mnist service. Once this is done, you can call several REST APIs that provide information about the installed services, including the deployment status (http://<route>/v1/models/mnist/versions/1):

{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}

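A deployment script can use the status response to decide whether a model version is ready. The sketch below builds the status URL and checks a response body of the shape shown above; the function names and the host mnist.example.com are hypothetical, and the sample body is embedded so that the code runs without a cluster.

```python
import json


def status_url(route: str, model: str, version: int) -> str:
    """Build the model version status URL for a given route host."""
    return f"http://{route}/v1/models/{model}/versions/{version}"


def is_available(status_body: str, version: str) -> bool:
    """Return True if the given version is reported as AVAILABLE with error_code OK."""
    doc = json.loads(status_body)
    for entry in doc.get("model_version_status", []):
        if entry.get("version") == version:
            error_code = entry.get("status", {}).get("error_code")
            return entry.get("state") == "AVAILABLE" and error_code == "OK"
    return False


# Sample body copied from the status response for version 1 of the mnist model.
SAMPLE_STATUS = """
{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {"error_code": "OK", "error_message": ""}
    }
  ]
}
"""

print(status_url("mnist.example.com", "mnist", 1))
print(is_available(SAMPLE_STATUS, "1"))
```
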
For model metadata (http://<route>/v1/models/mnist/versions/1/metadata):

{
  "model_spec": {
    "name": "mnist",
    "signature_name": "",
    "version": "1"
  },
  "metadata": {
    "signature_def": {
      "signature_def": {
        "serving_default": {
          "inputs": {
            "image": {
              "dtype": "DT_FLOAT",
              "name": "Placeholder_1:0"
            },
            "key": {
              "dtype": "DT_INT64",
              "name": "Placeholder:0"
            }
          },
          "outputs": {
            "scores": {
              "dtype": "DT_FLOAT",
              "name": "Softmax:0"
            },
            "key": {
              "dtype": "DT_INT64",
              "name": "Identity:0"
            },
            "prediction": {
              "dtype": "DT_INT64",
              "name": "ArgMax:0"
            }
          },
          "method_name": "tensorflow/serving/predict"
        }
      }
    }
  }
}

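The metadata response is the quickest way to find out which named inputs a request must supply and which outputs come back. As an illustration, this sketch extracts that information from a document of the shape shown above; summarize_signature is a hypothetical helper, and the sample dictionary repeats only the fields listed in this post.

```python
def summarize_signature(metadata_doc: dict, signature: str = "serving_default") -> dict:
    """Map each input and output name to its (dtype, tensor name) pair."""
    sig = metadata_doc["metadata"]["signature_def"]["signature_def"][signature]
    return {
        direction: {
            name: (spec["dtype"], spec["name"])
            for name, spec in sig[direction].items()
        }
        for direction in ("inputs", "outputs")
    }


# Sample document copied from the metadata response for the mnist model.
SAMPLE_METADATA = {
    "model_spec": {"name": "mnist", "signature_name": "", "version": "1"},
    "metadata": {"signature_def": {"signature_def": {"serving_default": {
        "inputs": {
            "image": {"dtype": "DT_FLOAT", "name": "Placeholder_1:0"},
            "key": {"dtype": "DT_INT64", "name": "Placeholder:0"},
        },
        "outputs": {
            "scores": {"dtype": "DT_FLOAT", "name": "Softmax:0"},
            "key": {"dtype": "DT_INT64", "name": "Identity:0"},
            "prediction": {"dtype": "DT_INT64", "name": "ArgMax:0"},
        },
        "method_name": "tensorflow/serving/predict",
    }}}},
}

summary = summarize_signature(SAMPLE_METADATA)
for direction, tensors in summary.items():
    for name, (dtype, tensor) in tensors.items():
        print(direction, name, dtype, tensor)
```
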
For the actual model serving, download this sample JSON file, save it as input.json, and run the following:

$ curl -X POST -d @input.json http://<route>/v1/models/mnist:predict
{
    "predictions": [
        {
            "scores": [0.0185772, 7.83087e-05, 0.00904454, 0.566416, 1.58575e-05, 0.379032, 0.000685176, 0.00700155, 0.0174811, 0.00166794],
            "key": 0,
            "prediction": 3
        }
    ]
}

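The same call can be prepared and interpreted from code. The sketch below is only an illustration: it assumes the request uses the row ("instances") format of the TF Serving REST API with the two inputs named in the signature (image and key), and it treats the image as a flat list of floats, which is an assumption because the contents of the sample file are not shown here. The helper names and the host mnist.example.com are hypothetical. The response parsing uses the prediction shown above.

```python
import json
from typing import List


def predict_url(route: str, model: str) -> str:
    """Build the predict URL for a given route host and model name."""
    return f"http://{route}/v1/models/{model}:predict"


def build_request(images: List[List[float]], keys: List[int]) -> str:
    """Serialize a row-format request with one instance per (image, key) pair."""
    instances = [{"image": image, "key": key} for image, key in zip(images, keys)]
    return json.dumps({"instances": instances})


def top_class(prediction: dict) -> int:
    """Return the index of the highest score, which should match 'prediction'."""
    scores = prediction["scores"]
    return max(range(len(scores)), key=scores.__getitem__)


# Response copied from the curl output for the sample input.
SAMPLE_RESPONSE = {
    "predictions": [
        {
            "scores": [0.0185772, 7.83087e-05, 0.00904454, 0.566416, 1.58575e-05,
                       0.379032, 0.000685176, 0.00700155, 0.0174811, 0.00166794],
            "key": 0,
            "prediction": 3,
        }
    ]
}

print(predict_url("mnist.example.com", "mnist"))
for item in SAMPLE_RESPONSE["predictions"]:
    print(item["key"], item["prediction"], top_class(item))
```

The scores array holds one probability per digit class, so the prediction field is simply the index of the largest score.
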
If you do not have access to S3, you can use Minio instead, which is installed as part of Kubeflow, as discussed in Part 3 of this series. To do this, you first need to copy the model data to Minio. The easiest way is to use the Minio client’s cp command, for example:

$ mc cp saved_model.pb minio/serving/mnist/1/saved_model.pb

Once this is done, you need to create a secret containing the Minio credentials. By default, the Kubeflow Minio install uses access_key='minio' and secret_key='minio123'. With this in place, the configuration of the mnist deployment is as follows:

$ ks param set mnist-v1 modelBasePath s3://serving/mnist
$ ks param set mnist-v1 s3Enable true
$ ks param set mnist-v1 s3SecretName minioaccess
$ ks param set mnist-v1 s3AwsRegion us-west-1
$ ks param set mnist-v1 s3Endpoint minio-service:9000

This will allow you to run TensorFlow serving using Minio as a backend.
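Note how the mc cp target and the modelBasePath value relate: the bucket and prefix in the s3:// URL are the same path components that follow the mc alias, with the version directory appended. The following sketch derives one from the other; mc_target is a hypothetical helper, and the alias minio is taken from the mc command above.

```python
from urllib.parse import urlparse


def mc_target(model_base_path: str, version: int, filename: str, alias: str = "minio") -> str:
    """Translate an s3:// model base path into the matching mc cp destination."""
    parsed = urlparse(model_base_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix URL, got {model_base_path!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    parts = [alias, bucket]
    if prefix:
        parts.append(prefix)
    parts.extend([str(version), filename])
    return "/".join(parts)


print(mc_target("s3://serving/mnist", 1, "saved_model.pb"))
```

Keeping the two paths in sync this way avoids the common mistake of pointing modelBasePath at the version directory instead of its parent.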

Kubeflow also provides support for additional model serving frameworks.

That’s all for this part. Check out the next post on Kubeflow pipelines, and thanks for reading!

p.s. If you’d like to get professional guidance on best-practices and how-tos with Machine Learning, simply contact us to learn how Lightbend can help.

