How To Install Kubeflow On OpenShift

In Part 1 of this series “How To Deploy And Use Kubeflow On Red Hat OpenShift”, we discussed what Kubeflow is, and how it can be useful for running Machine Learning applications in production. In this post, we discuss installation of Kubeflow.

Kubeflow’s installation is currently based on ksonnet1, a configurable, typed templating system for Kubernetes application developers. The goal of ksonnet is to improve the developer experience by providing options beyond writing YAML or text templating. Additionally, ksonnet separates Kubernetes object definitions from the actual cluster destination, which simplifies moving a developed system to any platform.

The first step is to install ksonnet by following its installation instructions. On macOS, the installation can be done using Homebrew:

$ brew install ksonnet/tap/ks

Once this is done, you can run ks --help to validate that the installation succeeded and that the ksonnet CLI is on your PATH.
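The PATH check can also be scripted, which is handy when the same setup is repeated on several machines. The following is a minimal sketch rather than part of the Kubeflow tooling; the function name require_tools is hypothetical, and it only confirms that a command is discoverable, not that its version is suitable.

```shell
# Report whether each named CLI is discoverable on PATH.
# Returns 0 if all tools are found and 1 if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this post, the relevant call would be require_tools ks oc, since both the ksonnet and OpenShift CLIs are needed in the steps that follow.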

The next step is to connect to the OpenShift cluster. We’ll use an OpenShift 3.11 cluster for the following steps. Once the OpenShift CLI tool oc is configured correctly, we can create a ksonnet Kubeflow project. Recall that I am using version 0.4.1 of Kubeflow for this blog post series.

Now we can run the following commands to download the installation shell script, kfctl.sh:

$ export KUBEFLOW_SRC=kubeflow
$ mkdir -p ${KUBEFLOW_SRC}
$ cd ${KUBEFLOW_SRC}
$ export KUBEFLOW_TAG=v0.4.1
$ curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_TAG}/scripts/download.sh | bash

Now run the following commands to set up Kubeflow:

$ export KFAPP=openshift
$ scripts/kfctl.sh init ${KFAPP} --platform none
$ cd ${KFAPP}
$ ../scripts/kfctl.sh generate k8s

This copies all Kubeflow artifacts locally and creates the directory layout described below.

On the top level there are 4 main folders:

  • kubeflow - containing the main Kubeflow artifacts (the directories correspond to the main Kubeflow applications)
  • openshift - containing generated applications (as a result of deployment) in the ks_app subdirectory. This directory is used as a working directory for the main operations (discussed below). It also contains a params.libsonnet file that defines the configuration for all of the components.
  • deployment - containing GKE (Google Kubernetes Engine) specific deployment artifacts.
  • scripts - containing scripts for different operations, including kfctl.sh that we used above.

The Kubeflow install process has been tested mostly on GKE. As a result, deploying it on OpenShift requires relaxing some of the security constraints by running the following commands (they grant the anyuid security context constraint to the listed service accounts, allowing their pods to run as any user)2:

$ oc adm policy add-scc-to-user anyuid -z ambassador -nkubeflow
$ oc adm policy add-scc-to-user anyuid -z jupyter -nkubeflow
$ oc adm policy add-scc-to-user anyuid -z katib-ui -nkubeflow
$ oc adm policy add-scc-to-user anyuid -z default -nkubeflow
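The four commands above differ only in the service account name, and footnote 2 notes that the project name may need to change. A small wrapper makes both adjustable in one place. This is a sketch under stated assumptions: the function name grant_anyuid and the DRY_RUN variable are hypothetical, while the oc adm policy invocation is the same one used above.

```shell
# Grant the anyuid SCC to each listed service account in one project.
# Usage: grant_anyuid <project> <service-account>...
# Set DRY_RUN=1 to print the commands instead of running them.
grant_anyuid() {
  namespace="$1"
  shift
  for sa in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "oc adm policy add-scc-to-user anyuid -z $sa -n $namespace"
    else
      oc adm policy add-scc-to-user anyuid -z "$sa" -n "$namespace" || return 1
    fi
  done
}
```

Calling grant_anyuid kubeflow ambassador jupyter katib-ui default reproduces the commands above. Running it first with DRY_RUN=1 shows exactly what would be executed before any cluster policy is changed.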

Refer to this blog post for what the oc adm policy commands mean. Additionally, the current version of Kubeflow uses the image gcr.io/kubeflow-images-public/tf_operator:v0.4.0, which has bugs. It is necessary to switch to the image gcr.io/kubeflow-images-public/tf_operator:latest, where those bugs are fixed3. Because ks commands have to be run from inside a ksonnet application directory, the following command changes into the ks_app subdirectory in a subshell to update the image version:

$ (cd ks_app && ks param set tf-job-operator tfJobImage gcr.io/kubeflow-images-public/tf_operator:latest)

Once this is done, we can use the following command to install Kubeflow on the cluster:

$ ../scripts/kfctl.sh apply k8s

NOTE: If you want to see what this command installs, take a look inside kfctl.sh. You will see:

ks apply default -c ambassador
ks apply default -c jupyter
ks apply default -c centraldashboard
ks apply default -c tf-job-operator
ks apply default -c pytorch-operator
ks apply default -c metacontroller
ks apply default -c spartakus
ks apply default -c argo
ks apply default -c pipeline

You can comment out the lines for any components you do not want to install, or add lines for additional components.
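As an alternative to editing kfctl.sh, the component list can be filtered in a small script. The sketch below is illustrative only: the COMPONENTS variable and the select_components function are hypothetical names, and the list simply mirrors the ks apply lines shown above for v0.4.1.

```shell
# Components in the order they appear in the ks apply lines above.
COMPONENTS="ambassador jupyter centraldashboard tf-job-operator pytorch-operator metacontroller spartakus argo pipeline"

# Print the components to install, one per line,
# omitting any component named in the arguments.
select_components() {
  for component in $COMPONENTS; do
    skip=0
    for unwanted in "$@"; do
      if [ "$component" = "$unwanted" ]; then
        skip=1
      fi
    done
    if [ "$skip" = "0" ]; then
      echo "$component"
    fi
  done
}

# Example (run from inside the ks_app directory), skipping spartakus:
#   for c in $(select_components spartakus); do ks apply default -c "$c"; done
```

Keeping the skip list on the command line leaves the downloaded script untouched, which makes it easier to see later which components were deliberately left out.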

The installation creates a new project on the cluster called kubeflow and deploys everything there.

Once the installation is complete, you should see the following components running:

  • JupyterHub - a multi-user hub which spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
  • TensorFlow support, including the job dashboard and the TensorFlow operator.
  • Katib - scalable and flexible hyperparameter tuning framework inspired by Google Vizier. The following components are deployed to support Katib - studyjob-controller and vizier application.
  • ML-pipelines - a platform for building and deploying portable and scalable end-to-end ML workflows, based on containers. The following are pipeline components - ml-pipeline, ml-pipeline-persistenceagent, ml-pipeline-scheduledworkflow and ml-pipeline-ui.
  • Ambassador - open source API Gateway built using Envoy.
  • Minio - a high performance distributed object storage server, designed for large-scale private cloud infrastructure.
  • Spartakus - collects usage information in Kubernetes clusters.
  • Argo - a workflow engine for Kubernetes.
  • Central dashboard - Kubeflow navigation UI.
  • Metacontroller - an add-on for Kubernetes that makes it easy to write and deploy custom controllers in the form of simple scripts.

To verify that the installation finished correctly:

1. Verify that none of the pods are failing by going to the OpenShift console and viewing the running pods in the kubeflow project.

2. Go to the ambassador service and create a route.

3. Go to the URL exposed by the route. You should see the main Kubeflow page, which lets you interact with the different Kubeflow components.
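The same three checks can be done from the command line instead of the console. The following is a minimal sketch, assuming oc is already logged in to the cluster and that no route named ambassador exists yet; the function name expose_ambassador is hypothetical. It relies on oc expose naming the route after the service, and on the route being unsecured (plain HTTP).

```shell
# List the pods, expose the ambassador service as a route,
# and print the URL of that route.
# Usage: expose_ambassador [project]   (project defaults to kubeflow)
expose_ambassador() {
  namespace="${1:-kubeflow}"
  # Step 1: show pod status so failing pods can be spotted.
  oc get pods -n "$namespace" || return 1
  # Step 2: create a route for the ambassador service.
  oc expose service ambassador -n "$namespace" || return 1
  # Step 3: look up the host name assigned to the route.
  host=$(oc get route ambassador -n "$namespace" -o jsonpath='{.spec.host}') || return 1
  echo "http://$host"
}
```

The last line printed is the address to open in a browser. If the route already exists, oc expose reports an error and the function stops, in which case the oc get route command can be run on its own.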

If you need to delete an existing Kubeflow installation, you can use the same script that you used for installation. Run the following command:

$ ../scripts/kfctl.sh delete k8s

NOTE: The delete script uninstalls all the Kubeflow applications and deletes the kubeflow namespace, so anything else you have installed in this namespace will be deleted as well.

That’s all for this part. Check out the next post on Kubeflow’s support components, and thanks for reading!



1 Although the current version of Kubeflow uses ksonnet for installation, this might change moving forward.
2 Here I assume that installation is going to be done in the project kubeflow (default). If you want to install to another project, adjust commands accordingly.
3 At the time of writing, the latest version is the only one available where these fixes exist.
