How To Deploy Kubeflow On Lightbend Platform With OpenShift - Part 3: Kubeflow Support Components

Boris Lublinsky Principal Architect, Lightbend, Inc.

In Part 2 of “How To Deploy And Use Kubeflow On OpenShift”, we looked at installation and introduced some of the additional components used by Kubeflow, such as Ambassador, Spartakus, Argo, and Minio. In this post, we describe these components and show how to use some of them.

Ambassador

Ambassador is an open source, Kubernetes-native microservices API gateway built on the Envoy Proxy. It is designed to support multiple, independent teams that need to rapidly publish, monitor, and update services for end users. Ambassador can also handle the functions of a Kubernetes ingress controller and load balancer.

The main features of Ambassador are:

Self-Service via Kubernetes Annotations. Ambassador is built from the start to support self-service deployments. A developer working on a new service doesn't have to go to operations to get the service added to the mesh; they can do it themselves by adding annotations to the service, for example:
```
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: centralui-mapping
      prefix: /
      rewrite: /
      service: centraldashboard.kubeflow
   ………………...
```
Kubernetes-Native Architecture. Ambassador relies entirely on Kubernetes for reliability, availability, and scalability. It persists all state in Kubernetes, instead of requiring a separate database. Scaling Ambassador is as simple as changing the replicas in your deployment, or using a horizontal pod autoscaler.
Authentication. Ambassador supports authenticating incoming requests. When configured, Ambassador checks with a third-party authentication service before routing an incoming request.
Integrated Diagnostics. Ambassador includes a diagnostics service so that you can quickly debug issues associated with configuring Ambassador. Just go to http://<ambassador-url>/ambassador/v0/diag/ and you will see the diagnostics screen:

This shows the current Ambassador state and all routes known to Ambassador. To get information about a specific route, click it to display its details.
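Returning to the self-service annotations described above, the following is a minimal Python sketch of how such an annotation could be generated programmatically. It renders only the Mapping fields shown in the example (the elided part of that annotation is not reproduced), and wraps the result in a merge patch. The function names `ambassador_mapping` and `annotation_patch` are hypothetical helpers introduced for illustration; they are not part of Ambassador or Kubeflow.

```python
import json


def ambassador_mapping(name: str, prefix: str, service: str, rewrite: str = "/") -> str:
    """Render an Ambassador v0 Mapping as the YAML text stored in the annotation."""
    return "\n".join([
        "---",
        "apiVersion: ambassador/v0",
        "kind: Mapping",
        f"name: {name}",
        f"prefix: {prefix}",
        f"rewrite: {rewrite}",
        f"service: {service}",
    ])


def annotation_patch(mapping_yaml: str) -> dict:
    """Wrap the Mapping in a merge patch that sets getambassador.io/config."""
    return {"metadata": {"annotations": {"getambassador.io/config": mapping_yaml}}}


if __name__ == "__main__":
    # Values taken from the annotation example above.
    mapping = ambassador_mapping("centralui-mapping", "/", "centraldashboard.kubeflow")
    print(json.dumps(annotation_patch(mapping), indent=2))
```

The printed JSON should be usable as the body of a merge patch, for example with `oc patch service <service-name> -n kubeflow --type merge -p '<json>'`, where the service name is whichever service you want Ambassador to route to.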

Argo

Argo is an open-source, container-native workflow engine for Kubernetes. It is a foundation for Kubeflow Pipelines (see Part 8 of this series). Kubeflow installs all of the Argo components.

Note: The Argo UI installed by Kubeflow is not accessible out of the box. To enable it, go to the argo-ui deployment in the OpenShift console and edit the environment portion in deployment.yaml as follows:

        - env:
            - name: ARGO_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: IN_CLUSTER
              value: 'true'
            - name: ENABLE_WEB_CONSOLE
              value: 'false'
            - name: BASE_HREF
              value: /

Once this is done, you can create a route to argo-ui and use it to access the UI.
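As a rough illustration of what such a route looks like, the Python sketch below builds a minimal OpenShift Route manifest and prints it as JSON. It assumes the UI is exposed by a Service named argo-ui in the kubeflow namespace (an assumption; check the actual service name in your cluster), and `route_manifest` is a hypothetical helper, not an OpenShift API.

```python
import json


def route_manifest(name: str, service: str, namespace: str) -> dict:
    """Build a minimal OpenShift Route that forwards traffic to a Service."""
    return {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {"name": name, "namespace": namespace},
        # No host is set, so OpenShift generates one from its routing suffix.
        "spec": {"to": {"kind": "Service", "name": service}},
    }


if __name__ == "__main__":
    # Assumption: the Service is called argo-ui and lives in the kubeflow namespace.
    print(json.dumps(route_manifest("argo-ui", "argo-ui", "kubeflow"), indent=2))
```

The output can be piped to `oc apply -f -`. If the target service exposes more than one port, the route would also need a `spec.port.targetPort` entry. In practice, `oc expose service argo-ui -n kubeflow` achieves much the same result in one command.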

You can install the Argo CLI locally by following these instructions. On macOS, Argo can be installed using Homebrew:

$ brew install argoproj/tap/argo

Before using Argo, it is necessary to create an additional role and role binding, because the Argo service account under which workflow-controller runs needs additional RBAC permissions. To do this, create the following YAML file defining both the role and the role binding, and save it locally as argo-role.yaml:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argo-role
  labels:
    app: argo  
rules:
- apiGroups: ["argoproj.io"]
  resources: ["workflows", "workflows/finalizers"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argo-rolebinding
  labels:
    app: argo 
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-role
subjects:
  - kind: ServiceAccount
    name: argo

Next, run the following command to install it:

$ oc apply -f argo-role.yaml -n kubeflow
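To make explicit what the role above grants, here is a small Python sketch that evaluates the role's rules against a requested action. It is a deliberately simplified model of RBAC rule matching (it handles only `apiGroups`, `resources`, `verbs`, and the `*` wildcard, and ignores fields such as `resourceNames`), not the Kubernetes authorizer itself. The names `ARGO_ROLE_RULES` and `role_allows` are hypothetical.

```python
# Rules copied from the argo-role definition above.
ARGO_ROLE_RULES = [
    {
        "apiGroups": ["argoproj.io"],
        "resources": ["workflows", "workflows/finalizers"],
        "verbs": ["get", "list", "watch", "update", "patch"],
    }
]


def _matches(allowed: list, value: str) -> bool:
    """True if the value is listed explicitly or covered by the '*' wildcard."""
    return "*" in allowed or value in allowed


def role_allows(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """Simplified check: does any rule permit this verb on this resource?"""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


if __name__ == "__main__":
    for verb in ("get", "patch", "create", "delete"):
        print(verb, role_allows(ARGO_ROLE_RULES, "argoproj.io", "workflows", verb))
```

Under this model, the role lets the service account read and modify existing workflows but not create or delete them. To check the effective permissions on a real cluster, `oc auth can-i patch workflows --as=system:serviceaccount:kubeflow:argo -n kubeflow` is the authoritative test.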

It is also necessary to give additional (privileged) permissions to the argo service account using the following command:

$ oc adm policy add-scc-to-user privileged -z argo -n kubeflow

Now we can run the Argo examples, following these instructions. After starting the examples¹, you will see the following:

$ argo list -n kubeflow
NAME                STATUS      AGE   DURATION
loops-maps-4mxp5    Succeeded   30m   12s
hello-world-wsxbr   Succeeded   39m   15s

You can also get information about a specific workflow execution, for example with the following command:

$ argo get hello-world-wsxbr -n kubeflow
Name:                hello-world-wsxbr
Namespace:           kubeflow
ServiceAccount:      default
Status:              Succeeded
Created:             Tue Feb 12 10:05:04 -0600 (2 minutes ago)
Started:             Tue Feb 12 10:05:04 -0600 (2 minutes ago)
Finished:            Tue Feb 12 10:05:23 -0600 (1 minute ago)
Duration:            19 seconds

STEP                  PODNAME            DURATION  MESSAGE
 ✔ hello-world-wsxbr  hello-world-wsxbr  18s

To get the execution logs, use this command:

$ argo logs hello-world-wsxbr -n kubeflow
 _____________ 
< hello world >
 ------------- 
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/
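For context, the workflow that produces the output above is very small. The Python sketch below reconstructs the hello-world definition as I recall it from the Argo examples repository (image `docker/whalesay:latest` running `cowsay`); treat the exact field values as an assumption and compare them with the example file you actually submitted. `hello_world_workflow` is a hypothetical helper name.

```python
import json


def hello_world_workflow(message: str = "hello world") -> dict:
    """Build an Argo Workflow with a single container step that prints a message."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName explains the random suffix in names such as hello-world-wsxbr.
        "metadata": {"generateName": "hello-world-"},
        "spec": {
            "entrypoint": "whalesay",
            "templates": [
                {
                    "name": "whalesay",
                    "container": {
                        "image": "docker/whalesay:latest",
                        "command": ["cowsay"],
                        "args": [message],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(hello_world_workflow(), indent=2))
```

Because the manifest uses `generateName`, it has to be created rather than applied, for example by piping the output to `oc create -f - -n kubeflow`.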

You can also delete a specific workflow using the following command:

$ argo delete hello-world-wsxbr -n kubeflow

Alternatively, you can get the same information using the Argo UI:

You can also look at the details of the flow by clicking a specific workflow:

Minio

Another supported component installed by Kubeflow is Minio, a high-performance distributed object storage server, designed for large-scale private cloud infrastructure. Minio can be deployed in several different configurations:

Single Docker mode, in which Minio runs in a single Docker container installed on your machine.
Distributed Minio, which lets you pool multiple drives (even on different machines) into a single object storage server. Because the drives are distributed across several nodes, distributed Minio can withstand multiple node failures and still ensure full data protection.
Minio Gateway, which puts S3 APIs on top of Azure Blob Storage, Google Cloud Storage, or NAS storage.

Kubeflow installs both the Minio server and UI. This allows you to expose a minio-service as a Route and get to the Minio UI to explore its content:

In addition, you can install the Minio CLI (mc) on your workstation. On macOS, use Homebrew:

$ brew install minio/stable/mc

Configure the Minio access point like this:

$ mc config host add minio http://[your-minio-service-URL] minio minio123

Here, the Minio service URL is the URL of the Minio route. This allows you to use mc to access information inside Minio in the cluster:

$ mc ls minio
[2018-12-13 18:23:41 CST]     0B mlpipeline/

Additional mc commands can be found here. Minio also provides support for several languages, including Java, Python and Go. We will show some examples of Minio usage in a Jupyter notebook in the next post.
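As a small preview of the Python route, the sketch below lists bucket names using the `minio` Python package (a third-party dependency installed with `pip install minio`). The helper names `split_endpoint` and `list_bucket_names` are hypothetical, and the route URL in the usage note is a placeholder; the credentials are the ones passed to mc above. The client expects a bare `host[:port]` endpoint plus a `secure` flag instead of a full URL, which is why the URL is split first.

```python
from urllib.parse import urlparse


def split_endpoint(url: str) -> tuple:
    """Split a route URL into the (host[:port], secure) pair the Minio client expects."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return parsed.netloc, parsed.scheme == "https"


def list_bucket_names(url: str, access_key: str, secret_key: str) -> list:
    """Return the names of the buckets visible to the given credentials."""
    # Imported lazily so the URL helper above works without the package installed.
    from minio import Minio

    endpoint, secure = split_endpoint(url)
    client = Minio(endpoint, access_key=access_key, secret_key=secret_key, secure=secure)
    return [bucket.name for bucket in client.list_buckets()]
```

Calling `list_bucket_names("http://<your-minio-route>", "minio", "minio123")` against the cluster used in this post would be expected to include the mlpipeline bucket shown in the mc listing.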

Spartakus

Project Spartakus aims to collect usage information about Kubernetes clusters. This information helps the Kubernetes development team understand what people are doing with their clusters and how to better utilize them.

Spartakus differentiates between two roles (both supported by the same program):

A volunteer periodically generates reports using the Kubernetes API and publishes them to the collector.
A collector receives reports from volunteers and stores them in a reporting back-end.

Each cluster you're reporting on is uniquely identified by a user-provided cluster identifier. Reports include this identifier, the version string of your Kubernetes master, and some information about each node in the cluster, including the operating system version, kubelet version, container runtime version, and CPU and memory capacity. An example of a report can be found here.

Kubeflow installs a Spartakus volunteer with the following configuration:

- volunteer
- '--cluster-id=551974437'
- '--database=https://stats-collector.kubeflow.org'

Change the database URL to set up your own Spartakus database location for data collection.

Additionally, Kubeflow supports Istio for TensorFlow Serving (Part 7 in this series). Istio provides additional metrics for TensorFlow Serving and simplifies updating models. It also provides integration with Prometheus for metrics collection while components run.

That’s all for this part. Check out the next post on using JupyterHub with Kubeflow, and thanks for reading!

p.s. If you’d like to get professional guidance on best-practices and how-tos with Machine Learning, simply contact us to learn how Lightbend can help.

¹ The coinflip example requires some additional capabilities, so I skipped it.

Author

Boris Lublinsky is a Principal Architect at Lightbend. Boris has over 30 years experience in enterprise, technical architecture, and software engineering. He is an active member of OASIS SOA RM committee, co-author of Applied SOA: Service-Oriented Architecture and Design Strategies (Wiley), Professional Hadoop Solutions (Wiley), Serving Machine Learning Models (O’Reilly) and Kubeflow for Machine Learning: From Lab to production (O’Reilly).

February 28, 2019