Workflows using Argo & Kubernetes

Before we start, you'll need to have the following packages installed on your computer (I've been using macOS High Sierra 10.13.6).
Requirements
Running a Kubernetes cluster
In order to run Argo you'll have to run a Kubernetes cluster. Since we're running it locally, we're going to use minikube.
minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0
minikube will start the cluster, passing --vm-driver, which tells minikube to use hyperkit as the virtual machine driver.
Another option is to set the virtual machine driver as the default using the command: minikube config set vm-driver hyperkit
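If you set the driver this way, minikube config view is a quick way to double-check the stored value (the exact output may look slightly different between minikube versions):
minikube config view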
- To check if the cluster is running, type:
minikube status
- Once the cluster is up and running you can type:
minikube dashboard
this will open your browser and take you to the Kubernetes dashboard.
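If you'd rather not have minikube open a browser for you, it should also be possible to just print the dashboard URL instead (assuming your minikube version supports the --url flag):
minikube dashboard --url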
Adding Argo to our Kubernetes cluster
- Argo's requirements are Kubernetes 1.9 or later (we're using 1.10.0), kubectl, and a ~/.kube/config file.
- I've used brew to install Argo:
brew install argoproj/tap/argo
- Create the argo namespace:
kubectl create ns argo
- Load the default configuration from the Argo GitHub repository:
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.2.1/manifests/install.yaml

- This step is optional. Let's give Argo admin permissions:
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default
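Before moving on, it's worth a quick sanity check that everything came up. The commands below just confirm the Argo CLI is installed and list the pods created in the argo namespace (pod names will differ on your machine):
argo version
kubectl get pods -n argo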
Running your first Argo task
argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
The hello-world.yaml file represents the task:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
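You don't have to submit straight from the GitHub URL. If you save the manifest locally (here I'm assuming a file named hello-world.yaml in the current directory), the same command works against the local file:
argo submit --watch hello-world.yaml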
We can print a list of the running Argo tasks using argo list:
NAME STATUS AGE DURATION
hello-world-j6bpn Succeeded 3m 1m
To see a specific task, in my case I had to run: argo get hello-world-j6bpn
Name: hello-world-j6bpn
Namespace: default
ServiceAccount: default
Status: Succeeded
Created: Sat Oct 27 17:46:24 -0700 (4 minutes ago)
Started: Sat Oct 27 17:46:24 -0700 (4 minutes ago)
Finished: Sat Oct 27 17:47:46 -0700 (3 minutes ago)
Duration: 1 minute 22 seconds
STEP PODNAME DURATION MESSAGE
✔ hello-world-j6bpn hello-world-j6bpn 1m
To view the logs from the task, I had to type argo logs hello-world-j6bpn
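Since every Argo step runs as a plain Kubernetes pod, the same logs should also be reachable through kubectl. Here I'm assuming the step's container is named main, which is how the pods Argo created looked on my cluster:
kubectl logs hello-world-j6bpn -c main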
Running your second Argo task
Look at you, you’re already a pro!
- Submit the task:
argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/loops-maps.yaml
This is the same as adding a YAML file with the following configuration:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-maps-
spec:
  entrypoint: loop-map-example
  templates:
  - name: loop-map-example
    steps:
    - - name: test-linux
        template: cat-os-release
        arguments:
          parameters:
          - name: image
            value: "{{item.image}}"
          - name: tag
            value: "{{item.tag}}"
        withItems:
        - { image: 'debian', tag: '9.1' }
        - { image: 'debian', tag: '8.9' }
        - { image: 'alpine', tag: '3.6' }
        - { image: 'ubuntu', tag: '17.10' }
  - name: cat-os-release
    inputs:
      parameters:
      - name: image
      - name: tag
    container:
      image: "{{inputs.parameters.image}}:{{inputs.parameters.tag}}"
      command: [cat]
      args: [/etc/os-release]
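The withItems list is what makes this a loop: each map in the list gets substituted into {{item.image}} and {{item.tag}}, so the single test-linux step fans out into four steps, one per image/tag pair. Once it's been submitted you can inspect the fan-out the same way as before (the workflow name below is just a placeholder, use the one argo list prints for you):
argo list
argo get loops-maps-<generated-suffix>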
Argo UI
Argo comes with a basic UI. In order to get access to it you'll need to port-forward the UI port using: kubectl -n argo port-forward deployment/argo-ui 8001:8001
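With the port-forward running, the UI should be reachable in your browser at http://localhost:8001 (assuming nothing else is already bound to that port).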


What’s Next?
Argo is a great framework for processing data, running ETL processes, and more. It's very similar to Airflow and other workflow frameworks. A good analogy would be: if Airflow is Django, Argo is Flask (for non-Python developers, this means Airflow is batteries-included compared to Argo).
With a simple UI and very easy integration with Kubernetes, I highly recommend using Argo.
There are a number of features Argo supports (taken from Argo's GitHub page):
- DAG or Steps based declaration of workflows
- Artifact support (S3, Artifactory, HTTP, Git, raw)
- Step level input & outputs (artifacts/parameters)
- Timeouts (step & workflow level)
- Retry (step & workflow level) and resubmit (memoized)
- Suspend & Resume
- Cancellation
- K8s resource orchestration
- Exit Hooks (notifications, cleanup)
- Garbage collection of completed workflows
- Scheduling (affinity/toleration/node selectors)
- Volumes (ephemeral/existing)
- Parallelism limits
- Daemoned steps
- DinD (docker-in-docker)
- Script steps
Currently, Argo is still kinda new, but I'm sure in the next year we're going to see a whole bunch of containers for processing data that run with Argo.
Errors:
- Minikube can’t start a cluster:
Waiting for SSH to be available…
When I was trying to create a new cluster or start an existing one, I got this error after running: minikube start --logtostderr --v=3 --vm-driver=hyperkit
Starting local Kubernetes v1.10.0 cluster…
Starting VM…
I1221 14:34:17.937881 29522 utils.go:100] retry loop 0
I1221 14:34:17.937982 29522 cluster.go:74] Skipping create…Using existing machine configuration
cluster.go:82] Machine state: Stopped
(minikube) Using UUID ....
(minikube) Generated MAC ....
(minikube) Starting with cmdline: loglevel=3 user=docker console=ttyS0 console=tty0 noembed nomodeset norestore waitusb=10 systemd.legacy_systemd_cgroup_controller=yes base host=minikube
Waiting for SSH to be available…
- Solution:
Try deleting the hyperkit pid file, stopping minikube, and deleting the cluster:
rm ~/.minikube/machines/minikube/hyperkit.pid
minikube stop
minikube delete
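After that, recreating the cluster with the same start command from the beginning of this post worked for me:
minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0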
I hope you find this useful. Leave your comments below, and I encourage you to read more about Argo on their GitHub page and come up with your own workflow.