Data Science Pipelines workaround for an object storage connection with a self-signed certificate
Environment
- Red Hat OpenShift Data Science
- Version: < 2.5
Issue
- Pipeline server fails with a "signed by unknown authority" error when the object storage connection uses a self-signed certificate
Resolution
Summary
This workaround involves 5 high-level steps:
- Manually disable object storage health checks.
- Store your certificate authority (CA) certificate in a configmap.
- Mount that configmap as a volume to the Data Science Pipelines apiserver pod.
- Mount that configmap as a volume to the Data Science Pipelines persistence agent pod.
- Manually edit the artifact upload script.
Procedure
Navigate to your ODH dashboard and create a new Data Science Project.
Add a data connection that points to your instance of object storage that uses a self-signed certificate. Make sure you have a writable bucket in your object storage, and paste that bucket name into the bucket field in the dialog. Note that I’m using an https URL, and that the certificate behind it is self-signed (i.e. not publicly trusted).
Click the Create a pipeline server button. Select the data connection you just created, and click Configure.
Left alone, this pipeline server will fail to come up. There is no need to wait for the failure; let’s proceed to fix it.
High level step 1 - Manually disable object storage health checks
Note: This step is optional. If the disableHealthCheck parameter is not present in the Data Science Pipelines Application (DSPA) custom resource, skip this step and proceed with step 2.
Clicking Configure creates a Data Science Pipelines Application custom resource in your data science project’s namespace.
Switch over to the OpenShift console and navigate to Administration > CustomResourceDefinitions > Data Science Pipelines Application > Instances.
Select the Data Science Pipelines Application custom resource you just created and disable the object storage health check by setting disableHealthCheck to true. Click Save.
High level step 2 - Store your certificate authority (CA) certificate in a configmap
Make sure you have your CA certificate saved locally in a file (it usually has a .pem file extension). If you don’t have it, this one-liner saves the first certificate the server presents, which for a self-signed certificate is the certificate itself. Replace object_storage_server_address with your object storage server’s address:
openssl s_client -showcerts -connect object_storage_server_address:443 </dev/null 2>/dev/null|openssl x509 -outform PEM > myCA.pem
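If the connection fails, the one-liner above still creates myCA.pem, only empty, so it is worth checking the file before going further. A minimal sketch of such a check; check_pem is a hypothetical helper name used for illustration, not part of any tool:

```shell
# check_pem FILE
# Succeeds only if FILE is non-empty and contains both PEM certificate markers.
# (Hypothetical helper; it does not validate the certificate itself.)
check_pem() {
  [ -s "$1" ] &&
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -e '-----END CERTIFICATE-----' "$1"
}
```

For example, check_pem myCA.pem && echo "myCA.pem contains a certificate". To see who issued the certificate and when it expires, openssl x509 -in myCA.pem -noout -subject -issuer -enddate prints those fields.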
Navigate to Workloads > ConfigMaps, and click Create ConfigMap. We’ll use the name my-ca. Use myCA.pem for the key. Paste in the contents of the myCA.pem file from above. Click Create.
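If you prefer the command line, the same ConfigMap can likely be created with oc instead of the console. This is a sketch under assumptions: you are already logged in with oc, and the function name and its two arguments (the project namespace and the path to the certificate file) are made up for illustration.

```shell
# create_ca_configmap NAMESPACE CA_FILE
# Creates the my-ca ConfigMap with a single key, myCA.pem, holding CA_FILE's contents.
# (Hypothetical wrapper around "oc create configmap".)
create_ca_configmap() {
  oc create configmap my-ca --from-file=myCA.pem="$2" -n "$1"
}
```

For example, create_ca_configmap my-project ./myCA.pem, where my-project stands in for your data science project’s namespace.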
High level step 3 - Mount that configmap as a volume to the Data Science Pipelines apiserver pod
Still in the OpenShift console, navigate to Workloads > Pods and select your data science project’s namespace. You should see 4 pods, 1 or 2 of which will be in CrashLoopBackOff. If you see only 1 mariadb pod here, make sure you disabled the health check above.
Navigate to Workloads > Deployments. Select the ds-pipeline-pipelines-definition deployment (this is the Data Science Pipelines apiserver) and scale it to 0 pods.
Navigate back to Workloads > Deployments > ds-pipeline-pipelines-definition. Click the YAML tab. We’re going to paste in two snippets of yaml:
- the volume (at the template level)
- the volume mount (at the container level)
First, fold the containers block to easily find the volumes element. Paste in this yaml as a new volume, and click Save.
- name: my-ca
  configMap:
    name: my-ca
    defaultMode: 420
After saving, wait a few seconds, and then click Reload.
Scroll back down to the containers element and fold each container. There are two: the apiserver and an oauth proxy sidecar.
We need to add a volume mount to the apiserver container. Expand the first container and make sure its name is ds-pipeline-api-server. If for some reason the first container is the oauth sidecar, fold it again and expand the second one, and make sure that one is the ds-pipeline-api-server container. I like to fold some of the inner blocks to make it fit on one screen.
There is not currently a volumeMounts block, so paste this yaml at the same level as the image block. Click Save.
volumeMounts:
  - name: my-ca
    mountPath: /etc/ssl/certs/myCA.pem
    subPath: myCA.pem
Click the Details tab and scale up the ds-pipeline-pipelines-definition pod count to 1.
If you navigate to Pods, this pod is no longer in CrashLoopBackOff.
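The console edits in this step (scale down, add the volume and its mount, scale back up) can probably be done from the command line as well. The sketch below is an assumption-laden alternative, not the procedure this article was tested with: it relies on oc set volume, assumes you are logged in with oc, and mount_ca with its arguments is a hypothetical helper.

```shell
# mount_ca NAMESPACE DEPLOYMENT [CONTAINER]
# Scales DEPLOYMENT to 0, adds the my-ca ConfigMap as a volume mounted at
# /etc/ssl/certs/myCA.pem (subPath myCA.pem), then scales back to 1.
# CONTAINER defaults to '*' (all containers), which is oc's own default.
mount_ca() {
  oc scale "deployment/$2" --replicas=0 -n "$1" &&
    oc set volume "deployment/$2" --add --name=my-ca \
      --type=configmap --configmap-name=my-ca \
      --mount-path=/etc/ssl/certs/myCA.pem --sub-path=myCA.pem \
      --containers="${3:-*}" -n "$1" &&
    oc scale "deployment/$2" --replicas=1 -n "$1"
}
```

For the apiserver this would be mount_ca my-project ds-pipeline-pipelines-definition ds-pipeline-api-server, with my-project standing in for your namespace. The same helper should also fit the persistence agent deployment in step 4 by passing its deployment name and omitting the container argument, since that deployment has a single container.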
High level step 4 - Mount that configmap as a volume to the Data Science Pipelines persistence agent pod
Navigate to Workloads > Deployments > ds-pipeline-persistenceagent-pipelines-definition. Scale this deployment down to 0.
Click the YAML tab. We’re going to paste in two snippets of yaml:
- the volume (at the template level)
- the volume mount (at the container level)
Fold the only container definition in the containers block.
The volumes element doesn’t exist by default, so create it at the same level as schedulerName. Paste this entire block and click Save.
volumes:
  - name: my-ca
    configMap:
      name: my-ca
      defaultMode: 420
After saving, wait a few seconds, and then click Reload.
Scroll back down to the containers element and fold some of the high level elements in the container definition.
There is not currently a volumeMounts block, so paste this yaml at the same level as the image block. Click Save.
volumeMounts:
  - name: my-ca
    mountPath: /etc/ssl/certs/myCA.pem
    subPath: myCA.pem
Click the Details tab and scale up the ds-pipeline-persistenceagent-pipelines-definition pod count to 1.
At this point your pipeline server is up. One last step.
High level step 5 - Manually edit the artifact upload script
In the OpenShift console, navigate to Workloads > ConfigMaps > ds-pipeline-artifact-script-pipelines-definition. Scroll down. Copy the contents of the artifact_script element (use the copy to clipboard button) and paste it into an empty text file.
We need to edit the two calls to aws s3. Add --no-verify-ssl between s3 and --endpoint, on both lines that begin with aws s3.
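If you would rather not edit the script by hand, the same change can be made with sed. A minimal sketch, assuming the copied script is saved in a local file; add_no_verify_ssl is a hypothetical helper name:

```shell
# add_no_verify_ssl
# Reads a script on stdin and writes it to stdout, inserting --no-verify-ssl
# between "aws s3" and "--endpoint" on lines that begin with "aws s3"
# (leading whitespace is tolerated). All other lines pass through unchanged.
add_no_verify_ssl() {
  sed 's/^\([[:space:]]*aws s3\) \(--endpoint\)/\1 --no-verify-ssl \2/'
}
```

For example, add_no_verify_ssl < artifact_script.orig > artifact_script.edited, where both file names are placeholders. Running it twice is harmless, because an already edited line no longer has --endpoint directly after aws s3. The edited file is what gets pasted into the new ConfigMap in the next step.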
Back in the OpenShift console, navigate to Workloads > ConfigMaps > Create ConfigMap. Name it custom-artifacts-script. Add the key name artifact_script. Paste the edited script into the value text box. Click Create.
Navigate to Administration > CustomResourceDefinitions > Data Science Pipelines Application > Instances and select yours. Click the YAML tab.
Add this yaml block under the apiServer element. Click Save.
artifactScriptConfigMap:
  name: custom-artifacts-script
  key: artifact_script
You should now be able to create a pipeline, run it, and have the artifacts saved to your object storage.
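The last edit (adding artifactScriptConfigMap under apiServer) could also be made from the command line with a merge patch. Treat this as a sketch resting on assumptions that the article does not state: that apiServer sits under spec in the custom resource, and that the resource can be addressed as datasciencepipelinesapplications. The function name and its arguments are hypothetical.

```shell
# set_artifact_script NAMESPACE DSPA_NAME
# Merge-patches the Data Science Pipelines Application so the apiserver uses
# the custom-artifacts-script ConfigMap (key artifact_script).
# ASSUMPTIONS: resource name "datasciencepipelinesapplications" and the
# spec.apiServer path are not confirmed by this article.
set_artifact_script() {
  oc patch datasciencepipelinesapplications "$2" -n "$1" --type=merge \
    -p '{"spec":{"apiServer":{"artifactScriptConfigMap":{"name":"custom-artifacts-script","key":"artifact_script"}}}}'
}
```

DSPA_NAME is the name shown for your instance in the Instances list used above.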
Root Cause
Background
As of Data Science Pipelines version 1.5.0, the object storage connection (“data connection” in the ODH Dashboard) must use either http, or https with a certificate signed by a publicly trusted certificate authority (commonly called a “valid certificate”). An object storage connection with a self-signed or otherwise invalid certificate is not supported. This document describes a workaround that allows you to use a self-signed certificate.
Caveats:
We plan to add built-in support for an object storage connection with a self-signed certificate in Data Science Pipelines 1.6, so this document should quickly become outdated.