Your First Workflow

This guide will cover writing your first workflow. The focus will be on the structure of the workflow and the pitfalls involved in making one, rather than any specific science. But if you'd rather skip straight to the finished template, you can find it at the end!

Boilerplate to start

Start with the following boilerplate. With this, you can jump straight to adding your steps/logic.

There are three significant parts to this boilerplate. Note that the kind is currently Workflow, not ClusterWorkflowTemplate. This is a convenience that lets you submit the workflow directly while testing.

1) Metadata

At the top is the metadata section. Here you must name the template; the name must be unique within the namespace (i.e. session ID) you are using. You can also add a science-group, title and description. These fields are optional, but filling them in is recommended because they are used when filtering and searching for templates. At the time of writing these are the only fields, but more will be added in future.

2) The Entrypoint

All workflows need an entrypoint to start from. In workflows where only one step is used, the entrypoint can be a normal template. Here, however, we use a dag (directed acyclic graph). The dag is itself a template that lets us link multiple other templates together as tasks.
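
To make the idea of a dag concrete, here is a minimal Python sketch of how task dependencies determine a valid run order. It uses the standard library's graphlib and is only an illustration, not how Argo schedules tasks internally. The task names are borrowed from the workflow built later in this guide, and execution_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
# The names are illustrative and mirror the DAG built later in this guide.
dependencies = {
    "install-dependencies": set(),
    "pre-processing": {"install-dependencies"},
    "plot-figure": {"pre-processing"},
}


def execution_order(deps):
    """Return one valid order in which the tasks could run (hypothetical helper)."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because each task here depends on exactly one predecessor, only one order is possible. With branching dependencies, several orders would be valid and independent tasks could run in parallel.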

3) Volume Mounts

For security, a workflow's own filesystem is read-only, but in practice almost every workflow needs to write files. To allow this, add one or more volumeClaimTemplates to the workflow spec. Steps that mount the same volumeClaimTemplate share the same files.

Volume Mounts

When creating artifacts, you must use a volume mount to store them in. Be careful not to overwrite artifacts in parallel steps - particularly if reusing the same template multiple times or using loops.
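
One way to avoid such collisions is to derive each output path from the task name and loop index. The Python sketch below is illustrative only: artifact_path, its arguments and the directory layout are hypothetical and not part of the workflows tooling.

```python
from pathlib import PurePosixPath


def artifact_path(mount_path, task_name, item_index, filename):
    """Build a per-task, per-iteration path so parallel steps do not share a file.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    return str(PurePosixPath(mount_path) / f"{task_name}-{item_index}" / filename)


# Three iterations of the same template each get their own output file.
paths = [artifact_path("/tmp", "pre-processing", i, "data.json") for i in range(3)]
print(paths)
```

A script using this scheme would also need to create the sub-directory before writing to it.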

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: boilerplate-example
  labels:
    workflows.diamond.ac.uk/science-group: workflows-examples
  annotations:
    workflows.argoproj.io/title: boiler-plate-example for docs
    workflows.argoproj.io/description: |
        This is an example demo-ing the boilerplate

spec:
  entrypoint: workflow-entry
  volumeClaimTemplates:
  - metadata:
      name: tmpdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path

  templates:
  - name: step-one
    script:
      image: busybox
      command: [sh]
      source: |
        echo "Hello world"

  - name: workflow-entry
    dag:
      tasks:
      - name: step-one
        template: step-one

Validating The Workflow

Now that you have the boilerplate, it is worth knowing how to check that the workflow is valid. Load the workflows module to access the workflows tool.

module load workflows
workflows lint workflow.yaml

If all has gone to plan, this should return no linting errors.

Adding Some Functionality

The goal is to create a workflow that will accept some inputs, do some pre-processing and then run analysis on the result of that pre-processing.

The workflow will take in a start, stop and step and then plot a sine function across that range. It will be built from the ground up, starting with the individual steps and then linking those steps together to form a complete workflow.

1) Writing The Tasks

First, we must install the required dependencies. I am using scripts here rather than containers for clarity, but the actual functionality can come from a range of sources, including local files. Remember that the filesystem is read-only, so we must create a venv in a writeable location and then install the requirements into it.

No Caching

Images often will not have a .cache directory, which can lead to issues when installing packages using pip. To avoid this, use the --no-cache-dir flag, as seen below.

  - name: install-dependencies
    script:
      image: python:3.10
      volumeMounts:
      - name: tmpdir
        mountPath: /tmp
      command: [bash]
      source: |
        python -m venv /tmp/venv
        /tmp/venv/bin/pip install --no-cache-dir numpy matplotlib

Now that we have our venv configured, we can do some pre-processing to inform the next step. The pre-processing step accepts the bounds and step described above. Here I write the range to a JSON file and output it as an artifact for convenience, but this isn't necessary. See the creating artifacts page for further information about this.

There are many ways to pass different types of information between workflow steps, used for both data-transfer and conditional steps/looping. See the examples section for more complex demonstrations of these.

  - name: pre-processing
    inputs:
      parameters:
        - name: start
        - name: stop
        - name: step
    script:
      image: python:3.10
      volumeMounts:
      - name: tmpdir
        mountPath: /tmp
      command: [/tmp/venv/bin/python]
      source: |
        import numpy as np
        import json
        start = {{inputs.parameters.start}}
        stop = {{inputs.parameters.stop}}
        step = {{inputs.parameters.step}}

        vals = np.arange(start,stop,step).tolist()
        with open("/tmp/data.json", "w") as f:
          json.dump(vals, f)
    outputs:
      artifacts:
      - name: gridPoints
        path: /tmp/data.json
        archive:
          none: { }
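
To get a feel for what this step writes, the grid can be reproduced locally. The sketch below is a standard-library approximation of np.arange, not NumPy itself; grid_points is a hypothetical helper, and its rounding at the upper bound may differ from NumPy in edge cases.

```python
import json
import math


def grid_points(start, stop, step):
    """Approximate np.arange(start, stop, step): stop itself is excluded."""
    count = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(count)]


# Same defaults as the workflow parameters used in this guide.
vals = grid_points(0, 10, 0.1)

# The step stores the list as JSON; round-trip it to confirm nothing is lost.
encoded = json.dumps(vals)
decoded = json.loads(encoded)
print(len(vals), vals[0], vals[-1])
```

Note that the upper bound is not part of the grid, so the plotted range stops just short of the stop value.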

Now it's time to plot a figure. This step reads the grid points that the pre-processing step wrote to the shared volume, computes the corresponding function values, then plots and saves the figure.

  - name: plot-the-figure
    script:
      image: python:3.10
      volumeMounts:
      - name: tmpdir
        mountPath: /tmp
      command: [/tmp/venv/bin/python]
      source: |
        import matplotlib.pyplot as plt
        from math import sin
        import json

        with open("/tmp/data.json", "r") as f:
          x = json.load(f)
        y = [sin(val) for val in x]
        plt.plot(x,y)
        plt.savefig("/tmp/output_fig.png")
    outputs:
      artifacts:
      - name: sin-figure
        path: "/tmp/output_fig.png"
        archive:
          none: { }
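
The plotting script depends on /tmp/data.json having been written to the shared volume by the pre-processing step. That handoff can be tried out locally with a temporary directory standing in for the mounted volume. This is a minimal sketch: pre_process and compute_curve are hypothetical stand-ins for the two scripts, and the plotting itself is left out so that only the standard library is needed.

```python
import json
import math
import tempfile
from pathlib import Path


def pre_process(shared_dir, values):
    """Stand-in for the pre-processing step: write the grid to the shared directory."""
    (Path(shared_dir) / "data.json").write_text(json.dumps(values))


def compute_curve(shared_dir):
    """Stand-in for the plotting step: read the grid back and evaluate sin on it."""
    x = json.loads((Path(shared_dir) / "data.json").read_text())
    return x, [math.sin(v) for v in x]


# The temporary directory plays the role of the volume mounted at /tmp.
with tempfile.TemporaryDirectory() as shared:
    pre_process(shared, [0.0, 0.5, 1.0])
    x, y = compute_curve(shared)

print(list(zip(x, y)))
```

If the second function ran before the first, the file would not exist yet, which is why the plotting task must depend on the pre-processing task.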

The bulk of the work is now done, so we can fill in the DAG from our original boilerplate. This workflow has three steps with linear dependencies, so arranging them shouldn't be difficult. Note the templating in the pre-processing task. Because the parameters are defined in the top-level arguments, they can easily be overridden when re-using the template. The respective arguments can be seen below.

  - name: workflow-entry
    dag:
      tasks:
      - name: install-dependencies
        template: install-dependencies

      - name: pre-processing
        dependencies: [install-dependencies]
        template: pre-processing
        arguments:
          parameters:
          - name: start
            value: "{{workflow.parameters.start}}"
          - name: stop
            value: "{{workflow.parameters.stop}}"
          - name: step
            value: "{{workflow.parameters.step}}"

      - name: plot-figure
        dependencies: [pre-processing]
        template: plot-the-figure

spec:
  entrypoint: workflow-entry
  arguments:
    parameters:
      - name: start
        value: "0"
      - name: stop
        value: "10"
      - name: step
        value: "0.1"
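
The values travel from the top-level arguments, through the task arguments, into the {{inputs.parameters.*}} tags in the script source. The last hop can be pictured as plain text substitution, which is why the string "0" ends up as the Python literal 0. The sketch below is a simplified illustration under that assumption, not Argo's actual implementation; render is a hypothetical helper.

```python
import re


def render(source, parameters):
    """Replace {{name}} tags in source with the matching string values."""
    def lookup(match):
        return parameters[match.group(1)]

    return re.sub(r"\{\{\s*([\w.-]+)\s*\}\}", lookup, source)


template = "start = {{inputs.parameters.start}}\nstep = {{inputs.parameters.step}}"
rendered = render(
    template,
    {"inputs.parameters.start": "0", "inputs.parameters.step": "0.1"},
)
print(rendered)

# The rendered text is ordinary Python, so the values become numbers when it runs.
namespace = {}
exec(rendered, namespace)
```

Because the substitution is textual, a parameter value that is not a valid Python expression would make the rendered script fail when it runs.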

Now we have a whole workflow! After putting all the parts together and linting it, you can run it to verify the behavior. This test workflow should run on the Argus cluster, so load the argus module:

module load argus

Also load the workflows module, if you haven't already. Submit the workflow with the argo tool, using your session ID as the namespace:

argo submit formatted-workflow.yaml -n <SESSION-ID> --server https://kubernetes.workflows.diamond.ac.uk

Once the workflow has been validated, switch the kind to ClusterWorkflowTemplate and the template is finished!

The whole workflow can be viewed below.

The Complete Workflow

Complete Workflow Template

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: boilerplate-example
  labels:
    workflows.diamond.ac.uk/science-group: workflows-examples
  annotations:
    workflows.argoproj.io/title: boiler-plate-example for docs
    workflows.argoproj.io/description: |
        This is an example demo-ing the boilerplate

spec:
  entrypoint: workflow-entry
  arguments:
    parameters:
      - name: start
        value: "0"
      - name: stop
        value: "10"
      - name: step
        value: "0.1"
  volumeClaimTemplates:
  - metadata:
      name: tmpdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path

  templates:
  - name: install-dependencies
    script:
      image: python:3.10
      volumeMounts:
      - name: tmpdir
        mountPath: /tmp
      command: [bash]
      source: |
        python -m venv /tmp/venv
        /tmp/venv/bin/pip install --no-cache-dir numpy matplotlib

  - name: pre-processing
    inputs:
      parameters:
        - name: start
        - name: stop
        - name: step
    script:
      image: python:3.10
      volumeMounts:
      - name: tmpdir
        mountPath: /tmp
      command: [/tmp/venv/bin/python]
      source: |
        import numpy as np
        import json
        start = {{inputs.parameters.start}}
        stop = {{inputs.parameters.stop}}
        step = {{inputs.parameters.step}}

        vals = np.arange(start,stop,step).tolist()
        with open("/tmp/data.json", "w") as f:
          json.dump(vals, f)
    outputs:
      artifacts:
      - name: gridPoints
        path: /tmp/data.json
        archive:
          none: { }

  - name: plot-the-figure
    script:
      image: python:3.10
      volumeMounts:
      - name: tmpdir
        mountPath: /tmp
      command: [/tmp/venv/bin/python]
      source: |
        import matplotlib.pyplot as plt
        from math import sin
        import json

        with open("/tmp/data.json", "r") as f:
          x = json.load(f)
        y = [sin(val) for val in x]
        plt.plot(x,y)
        plt.savefig("/tmp/output_fig.png")
    outputs:
      artifacts:
      - name: sin-figure
        path: "/tmp/output_fig.png"
        archive:
          none: { }

  - name: workflow-entry
    dag:
      tasks:
      - name: install-dependencies
        template: install-dependencies

      - name: pre-processing
        dependencies: [install-dependencies]
        template: pre-processing
        arguments:
          parameters:
          - name: start
            value: "{{workflow.parameters.start}}"
          - name: stop
            value: "{{workflow.parameters.stop}}"
          - name: step
            value: "{{workflow.parameters.step}}"

      - name: plot-figure
        dependencies: [pre-processing]
        template: plot-the-figure