Workflow using ARGO & Kubernetes

argo — workflow engine for Kubernetes

Before we start, you'll need the following packages installed on your computer (I've been using macOS High Sierra 10.13.6).

Requirements

  • Install a hypervisor (I've been using HyperKit)
  • Install kubectl
  • Install Minikube, a great tool for running Kubernetes locally.
  • To make sure everything was installed correctly, open a new terminal window and run: hyperkit -v, kubectl version --client, and minikube version (see the sketch below).
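
If you're on macOS with Homebrew, the installs might look something like this minimal sketch (the exact package names are assumptions; check each project's install docs):

brew install hyperkit          # hypervisor
brew install kubernetes-cli    # kubectl
brew install minikube          # local Kubernetes
hyperkit -v                    # verify HyperKit
kubectl version --client       # verify kubectl
minikube version               # verify Minikube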

Running a Kubernetes cluster

In order to run Argo you'll need a Kubernetes cluster. Since we're running it locally, we're going to use Minikube.

  • Run minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0. This starts the cluster; passing --vm-driver tells Minikube to use HyperKit as the virtual machine driver.
    Another option is to set the virtual machine driver as the default using the command: minikube config set vm-driver hyperkit
  • To check if the cluster is running, type: minikube status
  • Once the cluster is up and running you can type minikube dashboard; this will open your browser and take you to the Kubernetes dashboard.
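
Putting those steps together (same commands as above):

minikube start --vm-driver=hyperkit --kubernetes-version v1.10.0   # start the cluster with the HyperKit driver
minikube status                                                    # check that the cluster is running
minikube dashboard                                                  # open the Kubernetes dashboard in your browser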

Adding Argo to our Kubernetes cluster
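
A minimal sketch of getting Argo into the cluster, assuming the Argo CLI from Homebrew and the controller from the project's install manifest (the tap name and manifest URL are assumptions; check Argo's GitHub page for the current instructions):

brew install argoproj/tap/argo                       # Argo CLI (assumed Homebrew tap)
kubectl create namespace argo                        # namespace for the Argo components
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/master/manifests/install.yaml   # assumed install manifest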

After loading Argo's configuration, you should be able to find it in the Kubernetes dashboard.
  • This step is optional. Let's give Argo admin permissions:
    kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default

Running your first Argo task

argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml

The hello-world.yaml represents the task:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]

We can print a list of the running Argo tasks using argo list:

NAME                STATUS     AGE   DURATION
hello-world-j6bpn   Succeeded  3m    1m

To see the specific task, in my case I had to run: argo get hello-world-j6bpn

Name: hello-world-j6bpn
Namespace: default
ServiceAccount: default
Status: Succeeded
Created: Sat Oct 27 17:46:24 -0700 (4 minutes ago)
Started: Sat Oct 27 17:46:24 -0700 (4 minutes ago)
Finished: Sat Oct 27 17:47:46 -0700 (3 minutes ago)
Duration: 1 minute 22 seconds
STEP                  PODNAME             DURATION  MESSAGE
✔ hello-world-j6bpn   hello-world-j6bpn   1m

To view the logs from the task, I had to type: argo logs hello-world-j6bpn

Running your second Argo task

Look at you, you’re already a pro!

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-maps-
spec:
  entrypoint: loop-map-example
  templates:
  - name: loop-map-example
    steps:
    - - name: test-linux
        template: cat-os-release
        arguments:
          parameters:
          - name: image
            value: "{{item.image}}"
          - name: tag
            value: "{{item.tag}}"
        withItems:
        - { image: 'debian', tag: '9.1' }
        - { image: 'debian', tag: '8.9' }
        - { image: 'alpine', tag: '3.6' }
        - { image: 'ubuntu', tag: '17.10' }
  - name: cat-os-release
    inputs:
      parameters:
      - name: image
      - name: tag
    container:
      image: "{{inputs.parameters.image}}:{{inputs.parameters.tag}}"
      command: [cat]
      args: [/etc/os-release]

Argo UI

Argo comes with a basic UI. In order to access it, you'll need to port-forward the UI port using: kubectl -n argo port-forward deployment/argo-ui 8001:8001
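
Once the port-forward is running, the UI should be reachable locally (the port follows from the command above):

kubectl -n argo port-forward deployment/argo-ui 8001:8001   # forward the UI port
# then open http://localhost:8001 in your browser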

[Screenshot: list of all Argo tasks]
[Screenshot: a specific Argo task (that's our second example)]

What’s Next?

Argo is a great framework for processing data, running ETL processes, and more. It's very similar to Airflow and other workflow frameworks. A good analogy: if Airflow is Django, then Argo is Flask (for non-Python developers, that means Airflow comes with batteries included compared to Argo).

With a simple UI and very easy integration with Kubernetes, I highly recommend using Argo.

There are a number of features Argo supports (taken from Argo's GitHub page):

  • DAG or Steps based declaration of workflows
  • Artifact support (S3, Artifactory, HTTP, Git, raw)
  • Step level input & outputs (artifacts/parameters)
  • Timeouts (step & workflow level)
  • Retry (step & workflow level) and resubmit (memoized)
  • Suspend & Resume
  • Cancellation
  • K8s resource orchestration
  • Exit Hooks (notifications, cleanup)
  • Garbage collection of completed workflows
  • Scheduling (affinity/toleration/node selectors)
  • Volumes (ephemeral/existing)
  • Parallelism limits
  • Daemoned steps
  • DinD (docker-in-docker)
  • Script steps
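
As an example of the first item, a DAG-based declaration looks roughly like this (a minimal sketch based on Argo's DAG examples; the diamond shape and task names are illustrative):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-diamond-
spec:
  entrypoint: diamond
  templates:
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: diamond
    dag:
      tasks:
      - name: A                      # runs first
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B                      # runs after A
        dependencies: [A]
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C                      # runs after A, in parallel with B
        dependencies: [A]
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D                      # runs after both B and C
        dependencies: [B, C]
        template: echo
        arguments:
          parameters: [{name: message, value: D}]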

Currently, Argo is still kinda new, but I'm sure in the next year we're going to see a whole bunch of containers for processing data that run with Argo.

Errors:

  • Minikube can’t start a cluster:
    Waiting for SSH to be available…
    When trying to create a new cluster or start an existing one, I got this error:
    minikube start --logtostderr --v=3 --vm-driver=hyperkit
    Starting local Kubernetes v1.10.0 cluster…
    Starting VM…
    I1221 14:34:17.937881 29522 utils.go:100] retry loop 0
    I1221 14:34:17.937982 29522 cluster.go:74] Skipping create…Using existing machine configuration
    cluster.go:82] Machine state: Stopped
    (minikube) Using UUID ....
    (minikube) Generated MAC ....
    (minikube) Starting with cmdline: loglevel=3 user=docker console=ttyS0 console=tty0 noembed nomodeset norestore waitusb=10 systemd.legacy_systemd_cgroup_controller=yes base host=minikube
    Waiting for SSH to be available…
  • Solution:
    Delete the HyperKit process id file, stop Minikube, and delete the cluster:
rm ~/.minikube/machines/minikube/hyperkit.pid
minikube stop
minikube delete

I hope you found this useful. Leave your comments below, and I encourage you to read more about Argo on their GitHub page and come up with your own workflows.
