How To Deploy Kubeflow On Lightbend Platform With OpenShift - Part 7: ML Model Serving

Boris Lublinsky Principal Architect, Lightbend, Inc.

Using Kubeflow For ML Model Serving

In Part 6 of “How To Deploy And Use Kubeflow On OpenShift”, we examined ML parameter tuning with Kubeflow. In this part, we look at serving ML models in production.

Once your model is built, Kubeflow lets you serve it. In this post I will focus on TensorFlow (TF) serving. Kubeflow’s implementation treats each deployed model as two components in your application: tf-serving-deployment and tf-serving-service. We can think of the service as a model, and the deployment as the version of the model. For more information on the TF Serving implementation and architecture, please refer to this article.

TF serving is based on a standard saved model format, which defines both the layout of the saved model on disk and the information stored in the model. The layout on disk looks as follows:

test/
  1/
    saved_model.pb
    variables/

Here the model name (test) has a subdirectory for every version of the model (here, 1). Each version subdirectory contains a binary protobuf of the model representation (saved_model.pb) and a variables directory holding the model variables. The saved model also contains a signature that lists all required inputs and outputs, defining their names, types, and shapes.

This standardized layout makes a generic implementation of model loading and serving possible, and that generic implementation is the basis of TF model serving.
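Before uploading a model, it can help to confirm that a local export follows this layout. The following is a minimal sketch, not part of Kubeflow or TF serving: the function name check_saved_model_dir is hypothetical, and it checks only the two entries named above (saved_model.pb and the variables directory) for each version subdirectory.

```shell
#!/bin/sh
# Check that a local model directory follows the layout described above:
#   <model>/<version>/saved_model.pb and <model>/<version>/variables/
# Prints one line per version; returns non-zero if any version is incomplete
# or if no version directory is found. (Hypothetical helper, for illustration.)
check_saved_model_dir() {
    model_dir=$1
    result=0
    found=0
    for version_dir in "$model_dir"/*/; do
        [ -d "$version_dir" ] || continue        # glob matched nothing
        found=1
        version=$(basename "$version_dir")
        if [ -f "$version_dir/saved_model.pb" ] && [ -d "$version_dir/variables" ]; then
            echo "version $version: ok"
        else
            echo "version $version: incomplete"
            result=1
        fi
    done
    [ "$found" -eq 1 ] || result=1
    return "$result"
}

# Example: build a throwaway model named "test" with version 1 and check it.
workdir=$(mktemp -d)
mkdir -p "$workdir/test/1/variables"
: > "$workdir/test/1/saved_model.pb"   # empty placeholder, not a real model
check_result=$(check_saved_model_dir "$workdir/test")
echo "$check_result"
```

The check is deliberately shallow: it does not open saved_model.pb or validate the signature, it only confirms the directory structure.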

TensorFlow Serving currently supports loading a model stored either on Google Cloud or on S3. In this post we will use S3.

To use S3, you first need to create a secret that contains the AWS access credentials.

apiVersion: v1
kind: Secret
metadata:
  name: awsaccess  # must match the s3SecretName parameter set below
data:
  AWS_ACCESS_KEY_ID: xxxxxxxxxx
  AWS_SECRET_ACCESS_KEY: xxxxxxxxxxxxxxxxxxxxx

Note: do not forget that values for the ID and key have to be base64 encoded. To encode a value, run the command echo -n 'xxx' | base64.
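The encoding step in the note above can also be scripted. The sketch below is one possible approach rather than a required procedure: the helper names b64 and secret_manifest are hypothetical, the credentials are dummy placeholders, and the secret name awsaccess is passed because that is the value later given to the s3SecretName parameter.

```shell
#!/bin/sh
# Base64-encode a value without a trailing newline
# (equivalent to: echo -n 'value' | base64).
b64() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Print a Secret manifest for the given secret name and credentials.
# (Hypothetical helper, for illustration.)
secret_manifest() {
    name=$1; key_id=$2; secret_key=$3
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $name
data:
  AWS_ACCESS_KEY_ID: $(b64 "$key_id")
  AWS_SECRET_ACCESS_KEY: $(b64 "$secret_key")
EOF
}

# Dummy credentials, for illustration only; substitute your own.
manifest=$(secret_manifest awsaccess 'EXAMPLEKEYID' 'example-secret-key')
echo "$manifest"
# To keep the result for deployment: echo "$manifest" > awsaccess.yaml
```

Using printf instead of echo avoids accidentally encoding a trailing newline, which is the same reason the note uses echo -n.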

Save this YAML to a file named awsaccess.yaml, and deploy the secret using the following command:

$ oc apply -f awsaccess.yaml -n kubeflow

With this in place, the TF serving component can be generated using the following command:

$ ks generate tf-serving-deployment-aws mnist-v1 --name=mnist

The generated component uses the image tensorflow/serving:1.11.1, which does not work well. Update the tensorflow/serving image version by running the following commands:

$ ks param set mnist-v1 defaultCpuImage tensorflow/serving:1.12.0
$ ks param set mnist-v1 defaultGpuImage tensorflow/serving:1.12.0-gpu

After the component is generated, it needs to be configured with some parameters specific to S3:

$ ks param set mnist-v1 modelBasePath s3://fdp-killrweather-data/kubeflow/mnist
$ ks param set mnist-v1 s3Enable true
$ ks param set mnist-v1 s3SecretName awsaccess

Additionally, if your S3 bucket is not in the us-west-1 region, provide the correct S3 connection information, for example:

$ ks param set mnist-v1 s3AwsRegion eu-west-1
$ ks param set mnist-v1 s3Endpoint s3.eu-west-1.amazonaws.com
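The region and endpoint in this example follow a regular pattern, so the two commands can be derived from the region name alone. This is a small sketch under one assumption: that the bucket uses the standard regional endpoint form s3.<region>.amazonaws.com (some partitions use a different domain). The function name s3_region_params is hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Print the two ks commands that point a component at a given AWS region.
# Assumes the standard regional endpoint form s3.<region>.amazonaws.com.
# (Hypothetical helper, for illustration.)
s3_region_params() {
    component=$1; region=$2
    echo "ks param set $component s3AwsRegion $region"
    echo "ks param set $component s3Endpoint s3.${region}.amazonaws.com"
}

# Reproduce the eu-west-1 example for the mnist-v1 component.
region_cmds=$(s3_region_params mnist-v1 eu-west-1)
echo "$region_cmds"
```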

TF serving can now be deployed using:

$ ks apply default -c mnist-v1

Once the pod starts, it periodically writes error messages to the console. Apparently this is normal.

Then we need to deploy the service, which will run as a separate pod:

$ ks generate tf-serving-service mnist-service
$ ks param set mnist-service modelName mnist
$ ks apply default -c mnist-service

To access model serving, it is also necessary to create a route for the mnist service. Once the route exists, you can call several REST endpoints that provide information about the installed services, including the deployment status of a model version (http://<route>/v1/models/mnist/versions/1):

{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}

For model metadata (http://<route>/v1/models/mnist/versions/1/metadata):

{
  "model_spec": {
    "name": "mnist",
    "signature_name": "",
    "version": "1"
  },
  "metadata": {
    "signature_def": {
      "signature_def": {
        "serving_default": {
          "inputs": {
            "image": {
              "dtype": "DT_FLOAT",
              "name": "Placeholder_1:0"
            },
            "key": {
              "dtype": "DT_INT64",
              "name": "Placeholder:0"
            }
          },
          "outputs": {
            "scores": {
              "dtype": "DT_FLOAT",
              "name": "Softmax:0"
            },
            "key": {
              "dtype": "DT_INT64",
              "name": "Identity:0"
            },
            "prediction": {
              "dtype": "DT_INT64",
              "name": "ArgMax:0"
            }
          },
          "method_name": "tensorflow/serving/predict"
        }
      }
    }
  }
}
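The status and metadata URLs above, and the predict URL used for serving, differ only in model name, version, and suffix. A minimal sketch of building them in one place follows; the function names are hypothetical, and mnist.example.com is a placeholder host standing in for your actual route.

```shell
#!/bin/sh
# Build TF serving REST URLs for a model exposed behind a route.
# ROUTE_HOST is a placeholder; substitute the host of your OpenShift route.
ROUTE_HOST=${ROUTE_HOST:-mnist.example.com}

# Arguments: model name, version (hypothetical helpers, for illustration).
status_url()   { echo "http://$ROUTE_HOST/v1/models/${1}/versions/${2}"; }
metadata_url() { echo "http://$ROUTE_HOST/v1/models/${1}/versions/${2}/metadata"; }
# Argument: model name.
predict_url()  { echo "http://$ROUTE_HOST/v1/models/${1}:predict"; }

# Print the three URLs for model "mnist", version 1.
status_url mnist 1
metadata_url mnist 1
predict_url mnist

# Against a live route, the status could then be fetched with, for example:
#   curl -s "$(status_url mnist 1)"
```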

For the actual model serving, download this sample JSON, save it as input.json, and run the following:

$ curl -X POST -d @input.json http://<route>/v1/models/mnist:predict
{
    "predictions": [
        {
            "scores": [0.0185772, 7.83087e-05, 0.00904454, 0.566416, 1.58575e-05, 0.379032, 0.000685176, 0.00700155, 0.0174811, 0.00166794],
            "key": 0,
            "prediction": 3
        }
    ]
}
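In a script, usually only the predicted class is needed from this response. The sketch below extracts it with sed, which is enough for this flat response; a dedicated JSON tool would be more robust. The function name extract_prediction is hypothetical, and the response is the sample output shown above rather than a live call.

```shell
#!/bin/sh
# Pull the integer "prediction" field out of a TF serving predict response.
# The pattern requires the closing quote right after "prediction", so the
# enclosing "predictions" key does not match. (Hypothetical helper.)
extract_prediction() {
    sed -n 's/.*"prediction": *\([0-9][0-9]*\).*/\1/p'
}

# Sample response copied from the output shown above.
response='{
    "predictions": [
        {
            "scores": [0.0185772, 7.83087e-05, 0.00904454, 0.566416, 1.58575e-05, 0.379032, 0.000685176, 0.00700155, 0.0174811, 0.00166794],
            "key": 0,
            "prediction": 3
        }
    ]
}'

predicted=$(printf '%s\n' "$response" | extract_prediction)
echo "predicted class: $predicted"
```

With a live route, the same function could be placed at the end of the curl pipeline instead of reading from a variable.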

If you do not have access to S3, you can use Minio instead, which is installed as part of Kubeflow, as discussed in Part 3 of this series. To do this, you first need to copy the model data to Minio. The easiest way is to use the cp command of the Minio client (mc), for example:

$ mc cp saved_model.pb minio/serving/mnist/1/saved_model.pb
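The command above copies a single file, while a saved model version also includes a variables directory. As a hedged sketch, the following prints one mc cp command per file in a local version directory, keeping the relative paths under the target prefix. The function name mc_upload_cmds is hypothetical, the example files are empty placeholders, and the commands are only printed so they can be reviewed before running.

```shell
#!/bin/sh
# Print one "mc cp" command per file under a local model version directory,
# preserving relative paths under the target prefix. Nothing is uploaded;
# the commands are printed for review. (Hypothetical helper.)
mc_upload_cmds() {
    src=$1      # local directory, e.g. ./mnist/1
    target=$2   # Minio target, e.g. minio/serving/mnist/1
    ( cd "$src" && find . -type f | sed 's|^\./||' | sort ) |
    while IFS= read -r rel; do
        echo "mc cp \"$src/$rel\" \"$target/$rel\""
    done
}

# Example with a throwaway directory standing in for an exported model.
model_tmp=$(mktemp -d)
mkdir -p "$model_tmp/1/variables"
: > "$model_tmp/1/saved_model.pb"              # empty placeholder
: > "$model_tmp/1/variables/variables.index"   # empty placeholder
upload_cmds=$(mc_upload_cmds "$model_tmp/1" minio/serving/mnist/1)
echo "$upload_cmds"
```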

Once this is done, you will need to create a secret containing the Minio credentials. By default, the Kubeflow Minio installation uses access_key='minio' and secret_key='minio123'. With this in place, the configuration of the mnist deployment is as follows:

$ ks param set mnist-v1 modelBasePath s3://serving/mnist
$ ks param set mnist-v1 s3Enable true
$ ks param set mnist-v1 s3SecretName minioaccess
$ ks param set mnist-v1 s3AwsRegion us-west-1
$ ks param set mnist-v1 s3Endpoint minio-service:9000

This will allow you to run TensorFlow serving using Minio as a backend.

Kubeflow also provides support for additional model serving frameworks:

PyTorch serving, based on the seldon-core component.
Seldon serving, based on seldon-core, which supports deployment of any machine learning runtime that can be packaged in a Docker container.
Model serving using TRT Inference Server, which offers REST and gRPC services for deep-learning inference with TensorRT, TensorFlow, and Caffe2 models. The server is optimized to deploy machine learning and deep learning algorithms on both GPUs and CPUs at scale.

That’s all for this part. Check out the next post on Kubeflow pipelines, and thanks for reading!

p.s. If you’d like to get professional guidance on best-practices and how-tos with Machine Learning, simply contact us to learn how Lightbend can help.

Author

Boris Lublinsky

Principal Architect, Lightbend, Inc.

Boris Lublinsky is a Principal Architect at Lightbend. Boris has over 30 years experience in enterprise, technical architecture, and software engineering. He is an active member of OASIS SOA RM committee, co-author of Applied SOA: Service-Oriented Architecture and Design Strategies (Wiley), Professional Hadoop Solutions (Wiley), Serving Machine Learning Models (O’Reilly) and Kubeflow for Machine Learning: From Lab to production (O’Reilly).

February 28, 2019
