Data Science Pipelines workaround for an object storage connection with a self-signed certificate

Solution Unverified - Updated

Environment

  • Red Hat OpenShift Data Science
    • Version: < 2.5

Issue

  • The pipeline server fails to start when its object storage uses a certificate "signed by unknown authority" (for example, a self-signed certificate)

Resolution

Summary

This workaround involves 5 high-level steps:

  1. Manually disable object storage health checks.
  2. Store your certificate authority certificate (CA) in a configmap.
  3. Mount that configmap as a volume to the Data Science Pipelines apiserver pod.
  4. Mount that configmap as a volume to the Data Science Pipelines persistence agent pod.
  5. Manually edit the artifact upload script.

Procedure

Navigate to your ODH dashboard and create a new Data Science Project.

Add a data connection that points to your object storage instance that uses a self-signed certificate. Make sure your object storage has a writable bucket, and enter that bucket name in the bucket field of the dialog. In this example, the endpoint is an https URL, and its certificate is self-signed (that is, not publicly trusted).

Click the Create a pipeline server button. Select the data connection you just created, and click Configure.

This pipeline server will fail to come up. There is no need to wait for it to fail; proceed with the steps below to fix it.

High level step 1 - Manually disable object storage health checks

Note: This step depends on your version. If the disableHealthCheck parameter is not present in your Data Science Pipelines Application (DSPA) custom resource (CR), skip this step and proceed with step 2.

Clicking Configure creates a Data Science Pipelines Application custom resource in your data science project’s namespace.

Switch over to the OpenShift console and navigate to Administration > CustomResourceDefinitions > Data Science Pipelines Application > Instances.

Select that Data Science Pipelines Application custom resource you just created and disable the object storage health check by setting disableHealthCheck to true. Click Save.

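If you prefer the command line, the same change can be made with oc patch. The following is a minimal Python sketch that only builds the JSON merge patch body. It assumes the field lives at spec.objectStorage.disableHealthCheck and that the CR is named pipelines-definition (inferred from the ds-pipeline-*-pipelines-definition deployment names); both are assumptions, so check your CR's YAML before applying anything.

```python
import json


def health_check_patch(disable: bool = True) -> str:
    """Return a JSON merge patch body that sets disableHealthCheck.

    Assumption: the field sits at spec.objectStorage.disableHealthCheck in
    your DSPA custom resource; verify this in the CR's YAML first.
    """
    patch = {"spec": {"objectStorage": {"disableHealthCheck": disable}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # Hypothetical CR name and namespace placeholder; adjust to your project.
    print(
        "oc patch datasciencepipelinesapplications pipelines-definition "
        "-n <your-project> --type=merge "
        f"-p '{health_check_patch()}'"
    )
```

A merge patch only touches the keys it names, so the rest of the objectStorage settings are left as they are.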

High level step 2 - Store your certificate authority certificate (CA) in a configmap.

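Make sure your CA certificate is saved locally in a file (usually with a .pem extension). If you don’t have it, the following one-liner saves the first certificate that the server presents to myCA.pem. For a self-signed certificate, that is the certificate you need; if your server’s certificate was instead issued by a private CA, the first certificate is not the CA certificate, so use the CA’s own certificate.
(Replace object_storage_server_address with your object storage server’s address, and 443 with its port if it differs.)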

    openssl s_client -showcerts -connect object_storage_server_address:443 </dev/null 2>/dev/null | openssl x509 -outform PEM > myCA.pem
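If the server sends a certificate chain, you may want to see every certificate it presents before choosing one. Here is a small Python sketch that splits saved -showcerts output into its PEM blocks. It is an optional helper, not part of the original procedure; the file names chain.txt and myCA.pem are hypothetical, and choosing the last block is an assumption (it is often, but not always, the issuing CA).

```python
import re
import sys

# Matches one PEM certificate block, header through footer, across lines.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_blocks(text: str) -> list:
    """Return every PEM certificate block in text, in the order sent."""
    return [m.group(0) for m in PEM_RE.finditer(text)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names):
    #   openssl s_client -showcerts -connect host:443 </dev/null 2>/dev/null > chain.txt
    #   python pem_blocks.py chain.txt > myCA.pem
    with open(sys.argv[1]) as f:
        blocks = pem_blocks(f.read())
    if not blocks:
        sys.exit("no PEM certificates found")
    print(f"found {len(blocks)} certificate(s); writing the last one", file=sys.stderr)
    print(blocks[-1])
```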

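Navigate to Workloads > ConfigMaps, select your data science project’s namespace (the ConfigMap must be in the same namespace as the pods that mount it), and click Create ConfigMap. We’ll use the name my-ca. Use myCA.pem for the key, and paste in the contents of the myCA.pem file from above. Click Create.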

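For reference, the ConfigMap you just created maps the key myCA.pem to the PEM text, and that key is what the subPath in the volume mounts below refers to. A minimal Python sketch that builds the equivalent manifest; the PEM header check is our own sanity check, not something OpenShift requires.

```python
import json


def ca_configmap(pem_text: str, name: str = "my-ca", key: str = "myCA.pem") -> dict:
    """Build a ConfigMap manifest holding the CA certificate under one key."""
    # Cheap sanity check so an empty or wrong file is caught early.
    if "-----BEGIN CERTIFICATE-----" not in pem_text:
        raise ValueError("input does not look like a PEM certificate")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {key: pem_text},
    }


if __name__ == "__main__":
    demo = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
    print(json.dumps(ca_configmap(demo), indent=2))
```

Saved to a file, such a manifest could be applied with oc apply -n <your-project> -f my-ca.json. The command oc create configmap my-ca --from-file=myCA.pem produces the same key, because --from-file uses the file name as the key by default.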

High level step 3 - Mount that configmap as a volume to the Data Science Pipelines apiserver pod

Still in the OpenShift console, navigate to Workloads > Pods and select your data science project’s namespace. You should see 4 pods, 1 or 2 of which will be in CrashLoopBackOff. If you see only 1 mariadb pod here, make sure you disabled the health check above.

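To check the same thing from the command line, you can save the output of oc get pods -n <your-project> -o json to a file and scan it. A minimal Python sketch: it reports a pod if any of its containers is waiting with reason CrashLoopBackOff. The file name pods.json is hypothetical.

```python
import json
import sys


def crashlooping_pods(pod_list: dict) -> list:
    """Names of pods with a container waiting in CrashLoopBackOff.

    pod_list is the parsed output of: oc get pods -n <your-project> -o json
    """
    names = []
    for pod in pod_list.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(pod["metadata"]["name"])
                break  # one crashing container is enough to report the pod
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python crashloop.py pods.json
    with open(sys.argv[1]) as f:
        for name in crashlooping_pods(json.load(f)):
            print(name)
```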

Navigate to Workloads > Deployments. Select the ds-pipeline-pipelines-definition deployment (it runs the Data Science Pipelines apiserver pod) and scale it to 0 pods.

Navigate back to Workloads > Deployments > ds-pipeline-pipelines-definition. Click the YAML tab. We’re going to paste in two snippets of YAML:

  1. the volume (in the pod template, under spec.template.spec.volumes)
  2. the volume mount (in the apiserver container)

First, fold the containers block so the volumes element is easy to find. Paste in this YAML as a new entry in the volumes list, and click Save.

        - name: my-ca
          configMap:
            name: my-ca
            defaultMode: 420

After saving, wait a few seconds, and then click Reload.

Scroll back down to the containers element and fold each container (there are 2).

There are two containers: the apiserver and an OAuth proxy sidecar. We need to add a volume mount to the apiserver container. Expand the first container and check that it has name: ds-pipeline-api-server. If the first container is the OAuth sidecar instead, collapse it, expand the second one, and confirm that it has name: ds-pipeline-api-server. I like to fold some of the blocks so the container fits on one screen.

There is no volumeMounts block yet, so paste this YAML at the same indentation level as the image field. Click Save.

          volumeMounts:
            - name: my-ca
              mountPath: /etc/ssl/certs/myCA.pem
              subPath: myCA.pem
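The YAML edits in this step can also be expressed as a JSON Patch (RFC 6902) and applied with oc patch --type=json. The sketch below builds those operations from the deployment's current JSON (for example, saved from oc get deployment ds-pipeline-pipelines-definition -o json). It looks the container up by name rather than by position, which sidesteps the question of whether the OAuth sidecar comes first, and it creates the volumes or volumeMounts list when one is missing. It is an optional sketch, not part of the original procedure; the file name deploy.json is hypothetical.

```python
import json
import sys

# The same volume and mount as the YAML snippets above.
CA_VOLUME = {"name": "my-ca", "configMap": {"name": "my-ca", "defaultMode": 420}}
CA_MOUNT = {
    "name": "my-ca",
    "mountPath": "/etc/ssl/certs/myCA.pem",
    "subPath": "myCA.pem",
}


def ca_mount_patch(deployment: dict, container_name: str) -> list:
    """Return JSON Patch operations that add the my-ca volume and mount."""
    pod_spec = deployment["spec"]["template"]["spec"]
    ops = []

    volumes = pod_spec.get("volumes")
    if volumes is None:
        # No volumes list yet: create it with our single entry.
        ops.append({"op": "add", "path": "/spec/template/spec/volumes",
                    "value": [CA_VOLUME]})
    elif not any(v.get("name") == "my-ca" for v in volumes):
        # Append to the existing list ("-" means end of array).
        ops.append({"op": "add", "path": "/spec/template/spec/volumes/-",
                    "value": CA_VOLUME})

    names = [c["name"] for c in pod_spec["containers"]]
    if container_name not in names:
        raise ValueError(f"container {container_name!r} not found in {names}")
    idx = names.index(container_name)
    mounts = pod_spec["containers"][idx].get("volumeMounts")
    base = f"/spec/template/spec/containers/{idx}/volumeMounts"
    if mounts is None:
        ops.append({"op": "add", "path": base, "value": [CA_MOUNT]})
    elif not any(m.get("name") == "my-ca" for m in mounts):
        ops.append({"op": "add", "path": base + "/-", "value": CA_MOUNT})
    return ops


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python ca_patch.py deploy.json ds-pipeline-api-server
    with open(sys.argv[1]) as f:
        print(json.dumps(ca_mount_patch(json.load(f), sys.argv[2])))
```

The printed operations could then be applied with oc patch deployment ds-pipeline-pipelines-definition -n <your-project> --type=json -p '<output>'. Because it creates a missing volumes list, the same function also fits step 4, where the persistence agent deployment has no volumes element yet.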

Click the Details tab and scale up the ds-pipeline-pipelines-definition pod count to 1.

If you navigate to Pods, you’ll see that this pod is no longer in CrashLoopBackOff.

High level step 4 - Mount that configmap as a volume to the Data Science Pipelines persistence agent pod

Navigate to Workloads > Deployments > ds-pipeline-persistenceagent-pipelines-definition. Scale this deployment down to 0.

Click the YAML tab. We’re going to paste in two snippets of YAML:

  1. the volume (in the pod template, under spec.template.spec.volumes)
  2. the volume mount (in the persistence agent container)

Fold the single container definition in the containers block.

The volumes element doesn’t exist by default, so create it at the same level as schedulerName. Paste this entire block and click Save.

      volumes:
        - name: my-ca
          configMap:
            name: my-ca
            defaultMode: 420

After saving, wait a few seconds, and then click Reload.

Scroll back down to the containers element and fold some of the high level elements in the container definition.

There is no volumeMounts block yet, so paste this YAML at the same indentation level as the image field. Click Save.

          volumeMounts:
            - name: my-ca
              mountPath: /etc/ssl/certs/myCA.pem
              subPath: myCA.pem

Click the Details tab and scale up the ds-pipeline-persistenceagent-pipelines-definition pod count to 1.

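After scaling the deployment back up, you can confirm that both edits landed and that the right deployment is running. A minimal Python sketch that inspects a deployment's JSON (for example, from oc get deployment ds-pipeline-persistenceagent-pipelines-definition -o json) and lists anything missing. Because the persistence agent has a single container, the sketch checks the only container when no name is given; that default, and the file name agent.json, are our assumptions.

```python
import json
import sys

WANTED_MOUNT = {"name": "my-ca", "mountPath": "/etc/ssl/certs/myCA.pem", "subPath": "myCA.pem"}


def ca_config_problems(deployment: dict, container_name=None) -> list:
    """Return a list of problems; an empty list means the edits look right."""
    spec = deployment["spec"]
    pod_spec = spec["template"]["spec"]
    problems = []

    vols = [v for v in pod_spec.get("volumes", []) if v.get("name") == "my-ca"]
    if not vols or vols[0].get("configMap", {}).get("name") != "my-ca":
        problems.append("volume my-ca backed by ConfigMap my-ca is missing")

    containers = pod_spec["containers"]
    if container_name is None:
        if len(containers) != 1:
            return problems + ["several containers found; pass container_name"]
        container = containers[0]
    else:
        matches = [c for c in containers if c["name"] == container_name]
        if not matches:
            return problems + [f"container {container_name!r} not found"]
        container = matches[0]

    mounts = container.get("volumeMounts", [])
    if not any(all(m.get(k) == v for k, v in WANTED_MOUNT.items()) for m in mounts):
        problems.append("volumeMount for my-ca at /etc/ssl/certs/myCA.pem is missing")

    # Kubernetes defaults replicas to 1 when the field is absent.
    if spec.get("replicas", 1) == 0:
        problems.append("deployment is still scaled to 0 replicas")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ca.py agent.json
    with open(sys.argv[1]) as f:
        issues = ca_config_problems(json.load(f))
    print("\n".join(issues) or "looks good")
```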

At this point, your pipeline server is up. One last step remains.

High level step 5 - Manually edit the artifact upload script

In the OpenShift console, navigate to Workloads > ConfigMaps > ds-pipeline-artifact-script-pipelines-definition. Scroll down. Copy the contents of the artifact_script element (use the copy to clipboard button) and paste it into an empty text file.

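We need to edit the two calls to aws s3: on both lines that begin with aws s3, add --no-verify-ssl between s3 and --endpoint. Note that this flag turns off TLS certificate verification for these artifact uploads; it does not make the AWS CLI trust your CA.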

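If you would rather not edit the script by hand, the change is a simple text substitution. A minimal Python sketch, assuming (as described above) that the two calls appear on lines beginning with aws s3. It inserts the flag right after aws s3, which places it before --endpoint, and leaves lines that already carry the flag alone, so running it twice is harmless. The file names are hypothetical.

```python
import sys


def add_no_verify_ssl(script: str) -> str:
    """Insert --no-verify-ssl after 'aws s3' on each line that starts with it."""
    out = []
    for line in script.splitlines(keepends=True):
        if line.lstrip().startswith("aws s3 ") and "--no-verify-ssl" not in line:
            # Only the first occurrence on the line is rewritten.
            line = line.replace("aws s3 ", "aws s3 --no-verify-ssl ", 1)
        out.append(line)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python patch_script.py artifact_script.sh > artifact_script.edited
    with open(sys.argv[1]) as f:
        sys.stdout.write(add_no_verify_ssl(f.read()))
```

The edited file could then be loaded with oc create configmap custom-artifacts-script -n <your-project> --from-file=artifact_script=artifact_script.edited, which stores it under the key artifact_script.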

Back in the OpenShift console, navigate to Workloads > ConfigMaps > Create ConfigMap. Name it custom-artifacts-script, use artifact_script as the key, and paste the edited script into the value text box. Click Create.

Navigate to Administration > CustomResourceDefinitions > Data Science Pipelines Application > Instances and select yours. Click the YAML tab.

Add this YAML block under the apiServer element. Click Save.

    artifactScriptConfigMap:
      name: custom-artifacts-script
      key: artifact_script
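The same change can be applied as a JSON merge patch. A minimal Python sketch that builds it; it assumes apiServer sits directly under spec in your DSPA CR, which matches the indentation of the snippet above, and the CR name pipelines-definition is inferred, not confirmed.

```python
import json


def artifact_script_patch(name: str = "custom-artifacts-script",
                          key: str = "artifact_script") -> dict:
    """Merge patch pointing the API server at the custom artifact script."""
    return {"spec": {"apiServer": {"artifactScriptConfigMap": {"name": name, "key": key}}}}


if __name__ == "__main__":
    body = json.dumps(artifact_script_patch())
    # Hypothetical CR name; a merge patch leaves other apiServer fields intact.
    print("oc patch datasciencepipelinesapplications pipelines-definition "
          f"-n <your-project> --type=merge -p '{body}'")
```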

You should now be able to create a pipeline, run it, and have the artifacts saved to your object storage.

Root Cause

Background

As of Data Science Pipelines version 1.5.0, you must use an object storage connection (“data connection” in the ODH Dashboard) that uses either http or https with a certificate signed by a publicly trusted certificate authority (commonly called a “valid certificate”). An object storage connection with a self-signed or otherwise invalid certificate is not supported. This document describes a workaround that allows you to use a self-signed certificate.

Caveats:
We plan to add built-in support for an object storage connection with a self-signed certificate in Data Science Pipelines 1.6, so this document should quickly become outdated.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.