Satellite manifest refresh task gets blocked by existing jobs
Environment
- Red Hat Satellite 6
Issue
- The Satellite manifest refresh task fails with a "Job blocked by the following existing jobs:" error.
- The following error message can be seen in /var/log/foreman/production.log:

```
2024-09-10T13:53:13 [E|bac|6bbcd1c7] Job blocked by the following existing jobs: 8a828eb78b1513ee018b4226046f1285 (Katello::Errors::UpstreamCandlepinError)
```
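The blocking job ID needed in the resolution steps can be pulled straight out of `production.log`. A minimal sketch, assuming the log format shown above and that the job ID is a 32-character hex string; the helper name is made up for illustration:

```shell
#!/bin/sh
# extract_blocking_job_id: print the first 32-char hex job id found in a
# "Job blocked by the following existing jobs: ..." log line.
# Illustrative helper, not part of Satellite.
extract_blocking_job_id() {
  grep 'Job blocked by the following existing jobs' "$1" \
    | grep -oE '\b[0-9a-f]{32}\b' \
    | head -n 1
}

# Example usage on a real Satellite server:
#   extract_blocking_job_id /var/log/foreman/production.log
```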
Resolution
SOLUTION 1
- Download a fresh copy of the manifest from the Red Hat Customer Portal and upload it to the Red Hat Satellite server. If the error still occurs, proceed with Solution 2.
SOLUTION 2
- If the Actions::Candlepin::Owner::StartUpstreamExport dynflow step of the manifest refresh task is the failing one:
  1. Verify whether the blocking job is running on the Satellite server by executing the following PostgreSQL query on the Satellite server and examining the output:

```
# su - postgres -c "psql candlepin -c \"select * from cp_async_jobs where id = '8a828eb78b1513ee018b4226046f1285';\""
```

  Replace 8a828eb78b1513ee018b4226046f1285 in the above query with the ID of the job from the error message.
  2. If the query in step (1) above returns 0 rows, like this:

```
# su - postgres -c "psql candlepin -c \"select * from cp_async_jobs where id = '8a828eb78b1513ee018b4226046f1285';\""
 id | created | updated | version | name | job_key | job_group | origin | executor | principal | owner_id | correlation_id | previous_state | state | attempts | max_attempts | start_time | end_time | log_level | log_execution_details | job_result
----+---------+---------+---------+------+---------+-----------+--------+----------+-----------+----------+----------------+----------------+-------+----------+--------------+------------+----------+-----------+-----------------------+------------
(0 rows)
```

  then the issue is on the Candlepin instance running on the Red Hat Customer Portal, with which Katello communicates during a manifest refresh. Please raise a support case; in the support case, refer to this KCS article and point out that a fix must be applied on the Red Hat Customer Portal side.
  3. Wait for confirmation from Red Hat Support that the issue has been resolved on the Candlepin side. Once it is confirmed, refresh the manifest again.
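The decision in the steps above can be sketched as a tiny helper: if the query finds no local row for the blocking job ID, the blockage is on the Customer Portal's Candlepin; otherwise it can be cleaned up locally. The function name is made up, and the commented psql invocation is an assumption to be verified on your system:

```shell
#!/bin/sh
# classify_blocking_job: given the number of cp_async_jobs rows returned for
# the blocking job id, say where the fix belongs. Illustrative helper only.
classify_blocking_job() {
  if [ "$1" -eq 0 ]; then
    # No local record: the job lives on the Customer Portal's Candlepin.
    echo "upstream - raise a support case"
  else
    # A local record exists: the stale job can be removed on the Satellite.
    echo "local - proceed with manual cleanup"
  fi
}

# On a real Satellite the row count could be obtained like this (as root):
#   JOB_ID=8a828eb78b1513ee018b4226046f1285
#   COUNT=$(su - postgres -c "psql candlepin -tAc \
#     \"select count(*) from cp_async_jobs where id = '$JOB_ID';\"")
#   classify_blocking_job "$COUNT"
```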
- If the failing dynflow step of the manifest refresh task is another step, and not the Actions::Candlepin::Owner::StartUpstreamExport step:
  1. Before proceeding further, ensure that a good VM snapshot or backup of the Satellite server is present.
  2. In this example, the job ID is 8a828eb78b1513ee018b4226046f1285. Confirm that this job ID exists in the Candlepin database:

```
# su - postgres -c "psql candlepin -c \"select * from cp_async_jobs where id = '8a828eb78b1513ee018b4226046f1285';\""
                id                |            created            |            updated            | version |      name       |  job_key  | job_group |         origin          | executor |   principal   |             owner_id             |            correlation_id            | previous_state | state | attempts | max_attempts | start_time | end_time | log_level | log_execution_details | job_result
----------------------------------+-------------------------------+-------------------------------+---------+-----------------+-----------+-----------+-------------------------+----------+---------------+----------------------------------+--------------------------------------+----------------+-------+----------+--------------+------------+----------+-----------+-----------------------+------------
 8a828eb78b1513ee018b4226046f1285 | 2023-10-18 15:07:49.679+05:30 | 2023-10-18 15:07:49.694+05:30 |       1 | Import Manifest | ImportJob |           | satellite.example.local |          | foreman_admin | 8a828eb75aa25c6d015aa273f7e80000 | a9e4431b-829d-439c-9cdc-1a3ef11efec6 | 0              | 3     | 0        | 1            |            |          |           | t                     |
```

  3. Delete that job manually from the Candlepin database:

```
# systemctl stop tomcat
# su - postgres -c "psql candlepin"
DELETE from cp_async_jobs where id = '8a828eb78b1513ee018b4226046f1285';
select * from cp_async_jobs where id = '8a828eb78b1513ee018b4226046f1285';
exit
# systemctl start tomcat
# sleep 20 && hammer ping
```

  4. Refresh the manifest again.
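To avoid typos when adapting the DELETE statement above to a different job ID, the statement can be generated from the ID. A minimal sketch; the helper name is made up, and the output should only ever be fed to psql after stopping tomcat and verifying the ID, as in the steps above:

```shell
#!/bin/sh
# delete_job_sql: print the DELETE statement for one cp_async_jobs row.
# Illustrative helper only; always double-check the id before deleting.
delete_job_sql() {
  printf "DELETE from cp_async_jobs where id = '%s';\n" "$1"
}

# Example (run as root, with tomcat stopped):
#   delete_job_sql 8a828eb78b1513ee018b4226046f1285 \
#     | su - postgres -c "psql candlepin"
```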
- For more KB articles/solutions related to Red Hat Satellite 6.x manifest issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Manifest Issues.
Root Cause
- When Actions::Candlepin::Owner::StartUpstreamExport is the failing dynflow step, the RHSM Candlepin instance got stuck on some auxiliary task, blocking any request to export the manifest.
- When some other dynflow step is failing, an old manifest import task did not complete (or virt-who processing got stuck), which conflicts with the Satellite manifest refresh.
Diagnostic Steps
- In the Satellite task export, the manifest refresh task fails with the warning "blocked by the following existing jobs" when the failing dynflow step shows something like:

```
statusPath: "/jobs/2c94d175917f0f3a0191db9c0662397c"
resultData: 'Job blocked by the following existing jobs: 2c947772917f0de60191dae061792706'
```

- Identify which dynflow step is failing, as the resolution depends heavily on the particular step.
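The triage logic of this article (which branch of Solution 2 applies for a given failing dynflow step) can be sketched as a small helper; the function name is made up for illustration:

```shell
#!/bin/sh
# resolution_for_step: map the failing dynflow step name to the applicable
# branch of Solution 2 above. Illustrative helper, not a Satellite tool.
resolution_for_step() {
  case "$1" in
    Actions::Candlepin::Owner::StartUpstreamExport)
      echo "query cp_async_jobs; if 0 rows, the fix is on the Customer Portal side" ;;
    *)
      echo "snapshot the server, then delete the stale job from the candlepin DB" ;;
  esac
}
```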
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.