Kubeflow
Per the Kubeflow documentation,
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Installation
NOTE: As of this writing, the installation guidance only works for a single-node MicroK8s cluster. Do not add the Pi nodes as workers. The first apparent issue: the
hostpath-storage
addon creates a PersistentVolume on the master node (zephyrus
) but the PersistentVolumeClaim (PVC) created by Juju as part of installing Kubeflow maps to a Pi node, meaning the PVC will pend indefinitely.It is also possible that the RAM requirements of Kubeflow would exceed those available on the Pi nodes.
As discussed in Managing a K8s cluster, we will
install
Charmed Kubeflow using juju
.
With microk8s
and juju
already installed, the steps are:
- Bootstrap Juju to MicroK8s, deploying a controller to MicroK8s' Kubernetes:
juju bootstrap microk8s
The contoller is Juju's agent, running on Kubernetes, which can be used to
deploy and control the components of Kubeflow. The controller works with
models
, which map to namespaces in Kubernetes.
- Add a model! For Kubeflow, the model must be named
kubeflow
:
juju add-model kubeflow
- Deploy a Kubeflow bundle. We will deploy a lighter option,
kubeflow-lite
:
juju deploy kubeflow-lite --trust
- Wait for and monitor the deployment process. It can take tens of minutes:
watch -c juju status --color
Dashboard access
With Kubeflow deployed, the next step is dashboard access. The dashboard
is accessed through a central istio-ingressgateway
. We first need to
find the external IP assigned to its LoadBalancer. To do that, run the
following:
k -n kubeflow get svc istio-ingressgateway-workload \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}'
NOTE: Here,
k
is an alias formicrok8s kubectl
.
For this documentation, we will presume the IP is 192.168.1.193
.
Next, set up Dex authentication credentials:
juju config dex-auth static-username=admin
juju config dex-auth static-password=password
NOTE: These are trivial credentials, feel free to choose a more secure password.
Finally, access the dashboard with the IP captured above via browser, and log in using your chosen credentials.
NOTE: As of this writing, only insecure HTTP access is working. Secure HTTP access is being refused. Documentation will be updated when this is resolved.
Challenges
JupyterHub
With JupyterHub, there were a couple problems:
- Version assumptions in the Dashboard. The Kubeflow Dashboard Notebooks
links make assumptions about the API endpoints of the running JupyterLab
servers. Trying to run the latest Jupyter images (e.g.,
tensorflow/tensorflow:latest-jupyter
) causes errors when trying to connect to the running notebook. - Inability to save notebooks. The notebooks ran fine when selecting older images (such as those suggested in the images dropdown) but were not saveable. JupyterLab would regularly display an error modal window indicating that autosave could not be completed because the targeted location on disk was read-only.
Kubeflow Pipelines
Kubeflow Pipelines worked fine when using the Kubeflow Pipelines compiler,
as demonstrated in the
Charmed Kubeflow example.
However, attempts to use the
kfp.Client
class,
ultimately met with SSL errors attempting to interact with the ml-pipeline
service, raising errors like the following:
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1131)
Katib and Tensorboard
The hyperparameter tuning and experiment visualization tools were out of
scope of this exploration. Despite being visible on the Kubeflow Dashboard,
the kubeflow-lite
bundle does not include Katib or Tensorboard controllers.