Quay upgrade from 3.3 to 3.6: quay-enterprise-quay-postgres-migration not ready

Solution Verified - Updated

Environment

  • Red Hat Quay 3.3

Issue

  • Upgrading Quay from 3.3 to 3.4.x or 3.6.x gets stuck: the pod quay-enterprise-quay-postgres-migration-xxxxxx stays at 1 of 2 containers ready, with the quay-postgres-migration-cleanup container reporting ready: false.

Resolution

The issue was reported in PROJQUAY-2780.

To allow the upgrade process to complete:

  1. Run the upgrade from 3.3.x to 3.4.x or 3.6.x.
  2. If the quay-postgres-migration-cleanup container fails due to a connection timeout, restart that container after the quay-postgres-migration container has successfully started.
  3. The quay-postgres-migration-cleanup container should then run, and the upgrade continues as normal.
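Before restarting anything, it can help to confirm which container in the pod is not ready. The following is a minimal sketch, not part of the original procedure: the helper name container_status is hypothetical, and the pod name and namespace are placeholders for your environment.

```shell
# Hypothetical helper: print "<container-name> <ready>" for each container
# in the given pod. $1 = pod name, $2 = namespace (both placeholders).
container_status() {
  pod="$1"
  ns="$2"
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.ready}{"\n"}{end}'
}

# Example call (names are assumptions, adjust to your cluster):
#   container_status quay-enterprise-quay-postgres-migration-xxxxxx <namespace>
```

A line ending in false for quay-postgres-migration-cleanup matches the symptom described under Issue.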

Alternatively, wait for the quay-postgres-migration container to start the DB, then access the node where the pod is scheduled and stop the cleanup container (crictl stop <container-id>) so that the pod restarts it once the DB is up.
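The node-level approach could look roughly like the sketch below. It assumes you already have a root shell on the node (for example through oc debug node/<node-name> followed by chroot /host) and that the container name reported by crictl contains quay-postgres-migration-cleanup. The function name stop_cleanup_container is hypothetical.

```shell
# Run on the node where the migration pod is scheduled.
# Hypothetical helper: find the running cleanup container and stop it,
# so that the kubelet starts it again after the DB is up.
stop_cleanup_container() {
  # --name filters running containers by name; -q prints only container IDs.
  id=$(crictl ps --name quay-postgres-migration-cleanup -q | head -n 1)
  if [ -z "$id" ]; then
    echo "cleanup container not found on this node" >&2
    return 1
  fi
  crictl stop "$id"
}

# Example call:
#   stop_cleanup_container
```

Stopping the container only helps once the DB in the quay-postgres-migration container is accepting connections; otherwise the restarted cleanup container hits the same timeout.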

Another option, although error-prone, is to open a remote shell (rsh) in the quay-postgres-migration-cleanup container and manually run the psql commands that the container itself would run:

rm -f /tmp/change-username.sql /tmp/check-user.sql
echo "ALTER ROLE \"$OLD_DB_USERNAME\" RENAME TO \"$NEW_DB_USERNAME\"; ALTER DATABASE \"$OLD_DB_NAME\" RENAME TO \"$NEW_DB_NAME\";" > /tmp/change-username.sql
echo "SELECT 1 FROM pg_roles WHERE rolname = '$NEW_DB_USERNAME';" > /tmp/check-user.sql
psql -h localhost -f /tmp/check-user.sql | grep -q 1 || psql -h localhost -f /tmp/change-username.sql

Note: The environment variables should already be set within the container.

This runs the postgres commands that rename the role and the database, after which the upgrade completes.
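Because the manual commands expand four environment variables, an empty one would produce invalid SQL. A small pre-check such as the sketch below, run inside the container shell before the psql commands, can catch that. The function name check_db_vars is hypothetical; the variable names are the ones used in the commands above.

```shell
# Hypothetical pre-check: verify that the variables used by the manual
# rename commands are set and non-empty in the current shell.
check_db_vars() {
  missing=0
  for name in OLD_DB_USERNAME NEW_DB_USERNAME OLD_DB_NAME NEW_DB_NAME; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call:
#   check_db_vars && echo "all variables set"
```

If the function reports a missing variable, do not run the rename commands until the value has been confirmed from the pod definition.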

Root Cause

  • The cleanup command that runs in the quay-postgres-migration-cleanup container can hit a race condition: if the postgres DB in the quay-postgres-migration container does not start within the first 20 seconds, the cleanup container fails with:
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
  • In the quay-postgres-migration container the DB does start later, but the script in the cleanup container waits 600 seconds before trying again.
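When running the rename commands manually, the same race can be avoided by polling until the DB accepts connections instead of relying on a fixed delay. The sketch below illustrates that idea only; it is not the script the cleanup container runs. The function name wait_for_db and its default values are assumptions, and it relies on the same psql connection settings as the manual commands in the Resolution section.

```shell
# Hypothetical helper: poll the local postgres instance until it accepts
# connections. $1 = maximum attempts (default 30), $2 = delay in seconds
# between attempts (default 2). Returns 0 once connected, 1 on timeout.
wait_for_db() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A trivial query succeeds only when the server is up and reachable.
    if psql -h localhost -c 'SELECT 1' >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example call: wait up to about 60 seconds, then run the rename commands.
#   wait_for_db 30 2 && echo "DB is up"
```

Polling with a short interval reacts as soon as the DB is available, whereas a single long wait after one failed attempt leaves the pod in the not-ready state for the whole interval.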
