Which repository consumes the most disk space under /var/lib/pulp on Satellite 6?
Environment
Red Hat Satellite 6.10+
Issue
- How can I identify which repository consumes the most disk space under /var/lib/pulp?
- If I disable a repository, how much disk space would Pulp orphan removal free as a result?
Resolution
Since Satellite 6.16
Since Satellite 6.16, Pulp provides a per-repository size summary that comes close to what is needed:
# sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' pulpcore-manager repository-size --include-on-demand
..
{
  "name": "Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8-61225",
  "href": "/pulp/api/v3/repositories/rpm/rpm/0193166b-9f48-72e9-acf4-7735fd78b2f7/",
  "disk-size": 2171,
  "on-demand-size": 37669070300
},
{
  "name": "Red_Hat_Enterprise_Linux_9_for_x86_64_-_AppStream_RPMs_9-102968",
  "href": "/pulp/api/v3/repositories/rpm/rpm/01931683-e2a8-70a5-bd1a-d7d381fd809a/",
  "disk-size": 0,
  "on-demand-size": 0
},
{
  "name": "Red_Hat_Enterprise_Linux_9_for_x86_64_-_AppStream_RPMs_9-5645",
  "href": "/pulp/api/v3/repositories/rpm/rpm/0193166b-9e98-7046-966f-66f89a5feeb5/",
  "disk-size": 83593391,
  "on-demand-size": 74685095836
},
..
Ignore the similarly named repositories that report no content (both sizes are 0); they represent Content View (CV) version repositories.
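To rank repositories from that output, the JSON can be post-processed. The following is a minimal sketch, assuming the command output was saved as one complete JSON array (the ".." lines above mark elided entries) and that the sizes are reported in bytes. The helper names rank_repos and human_size are hypothetical and not part of Pulp.

```python
import json


def human_size(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def rank_repos(report, key="disk-size"):
    """Return repositories with any content, sorted by `key`, largest first."""
    non_empty = [r for r in report if r["disk-size"] or r["on-demand-size"]]
    return sorted(non_empty, key=lambda r: r[key], reverse=True)


# Sample entries copied from the output above. In practice, load the saved
# command output instead (the file name is an assumption):
#   with open("repo-size.json") as f:
#       report = json.load(f)
report = json.loads("""
[
  {"name": "Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8-61225",
   "disk-size": 2171, "on-demand-size": 37669070300},
  {"name": "Red_Hat_Enterprise_Linux_9_for_x86_64_-_AppStream_RPMs_9-102968",
   "disk-size": 0, "on-demand-size": 0},
  {"name": "Red_Hat_Enterprise_Linux_9_for_x86_64_-_AppStream_RPMs_9-5645",
   "disk-size": 83593391, "on-demand-size": 74685095836}
]
""")

for repo in rank_repos(report):
    print(f'{human_size(repo["disk-size"]):>12}  {repo["name"]}')
```

Passing key="on-demand-size" to rank_repos orders the same list by the on-demand figure instead of the on-disk one.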
On any Satellite version:
Upload the attached Python script to Satellite and rename it to pulp-diskspace-repo-usage.py. Run it as follows:
cat pulp-diskspace-repo-usage.py | sudo -u pulp SORTBY=OwnSize PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' pulpcore-manager shell
Example output:
Processing repo (1/11): Red_Hat_Ansible_Engine_2_for_RHEL_8_x86_64_RPMs
Processing repo (2/11): Red_Hat_Enterprise_Linux_8_for_x86_64_-_AppStream_RPMs_8
Processing repo (3/11): Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8
Processing repo (4/11): Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8_6
Processing repo (5/11): Red_Hat_Enterprise_Linux_9_for_x86_64_-_AppStream_RPMs_9
Processing repo (6/11): Red_Hat_Enterprise_Linux_9_for_x86_64_-_BaseOS_RPMs_9
Processing repo (7/11): Red_Hat_Satellite_Capsule_6_12_for_RHEL_8_x86_64_RPMs
Processing repo (8/11): Red_Hat_Satellite_Client_6_for_RHEL_8_x86_64_RPMs
Processing repo (9/11): Red_Hat_Satellite_Maintenance_6_12_for_RHEL_8_x86_64_RPMs
Processing repo (10/11): ZOO_repo
Processing repo (11/11): zooRepo_testOrg
Reponame                                                   Copies  RemotePkgs  DownPkgs  OwnPkgs  RemoteSize   DownSize    OwnSize
--------                                                   ------  ----------  --------  -------  ----------   --------    -------
Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8           3       24567        37       32    42.3 GiB  308.9 MiB  308.9 MiB
Red_Hat_Enterprise_Linux_8_for_x86_64_-_AppStream_RPMs_8        3       29168        83       78    87.0 GiB  169.6 MiB  169.6 MiB
Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8_6         1       23268        22       11    40.2 GiB  149.7 MiB  148.4 MiB
Red_Hat_Satellite_Capsule_6_12_for_RHEL_8_x86_64_RPMs           1         297       236      227   400.5 MiB   60.0 MiB   59.4 MiB
Red_Hat_Enterprise_Linux_9_for_x86_64_-_AppStream_RPMs_9        1        8856        16       11    24.6 GiB   43.9 MiB   43.9 MiB
Red_Hat_Enterprise_Linux_9_for_x86_64_-_BaseOS_RPMs_9           1        2878        15       10     3.5 GiB   29.8 MiB   29.8 MiB
Red_Hat_Ansible_Engine_2_for_RHEL_8_x86_64_RPMs                 2          58        15       11   534.4 MiB    4.9 MiB    4.9 MiB
Red_Hat_Satellite_Client_6_for_RHEL_8_x86_64_RPMs               3          18        15       10    35.0 MiB  658.2 KiB  252.2 KiB
Red_Hat_Satellite_Maintenance_6_12_for_RHEL_8_x86_64_RPMs       1          12        18       10     1.0 MiB  341.3 KiB   55.7 KiB
ZOO_repo                                                        2          64        49        9   153.4 KiB  105.1 KiB   18.4 KiB
zooRepo_testOrg                                                 1          64        44        4   153.4 KiB   93.4 KiB    6.7 KiB
Columns description:
- Reponame: name of the (Katello root) repository that the user synchronized from the CDN or from a third party.
- Copies: number of published copies of the repository. Subtract 1 to get the number of CVs the repository is published in. This tells you how many CVs you need to update if you want to disable the repository.
- RemotePkgs: number of packages on the remote server that the repository is synchronized from. If the repository used the Immediate download policy and were synced, all of these packages would be downloaded.
- DownPkgs: number of all artifacts downloaded or generated (e.g. metadata) for the repository or its clones, including artifacts that other repositories also use. These are usually RPM packages, but can also be metadata snippets, ISO images, or other repository content, depending on the repository type.
- OwnPkgs: number of artifacts used only by this repository and its clones. For example, if package sos-4.5.0-1.el8 is present in the RHEL 8 repository, the 8.6 EUS repository, and the 8.4 AUS repository, it is not counted here, because disabling, say, the 8.4 AUS repository would not allow the package to be removed from disk while other repositories still use it.
- RemoteSize: overall size of the repository on the remote server that it is synchronized from. If the repository used the Immediate download policy and were synced, it would need this amount of disk space (some of it might be shared with other repositories that contain exactly the same packages).
- DownSize: size of all artifacts downloaded or generated (e.g. metadata) for the repository or its clones, including artifacts that other repositories also use.
- OwnSize: size of the artifacts stored only for this repository and its clones. If you disable this repository (after removing it from all CVs) and then run orphan cleanup, this amount of disk space should be freed.
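The difference between the Down* and Own* columns can be illustrated with a toy model. The sketch below is not the attached script. It assumes each repository is represented as a set of artifact identifiers, and the repository names, package names, and sizes are made up for illustration, apart from sos-4.5.0-1.el8 taken from the example above.

```python
def own_artifacts(repos, name):
    """Return the artifacts used by `name` and by no other repository."""
    used_elsewhere = set()
    for other, artifacts in repos.items():
        if other != name:
            used_elsewhere |= artifacts
    return repos[name] - used_elsewhere


# Hypothetical data: artifact sizes in bytes and artifacts per repository.
sizes = {"sos-4.5.0-1.el8": 900_000, "pkg-a-1.0": 200_000, "pkg-b-2.0": 50_000}
repos = {
    "RHEL8": {"sos-4.5.0-1.el8", "pkg-a-1.0"},
    "RHEL8_6_EUS": {"sos-4.5.0-1.el8", "pkg-b-2.0"},
}

for name, artifacts in repos.items():
    own = own_artifacts(repos, name)
    down_size = sum(sizes[a] for a in artifacts)  # everything the repo uses
    own_size = sum(sizes[a] for a in own)         # only what it uses alone
    print(f"{name}: DownPkgs={len(artifacts)} OwnPkgs={len(own)} "
          f"DownSize={down_size} OwnSize={own_size}")
```

In this model the shared sos package counts toward DownSize of both repositories but toward OwnSize of neither, so disabling either repository alone would not free it.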
The output can be sorted by any column: set the SORTBY environment variable to the column name (as with SORTBY=OwnSize above).
What types of repositories does the script handle?
It has been verified to work for RPM and docker repositories.
How to identify repositories with a UUID as their name?
If Reponame contains only a UUID and nothing descriptive, the Pulp repository was created in Satellite 6.9 or older, which did not follow the naming convention. There are two equivalent ways to get the Organization label and Repository label from Foreman/Katello. Replace 0c85a598-adf2-4910-b64a-889e7d458d9a in the examples below with the UUID in question:
# echo "rrepo = Katello::Repository.where(:pulp_id => \"0c85a598-adf2-4910-b64a-889e7d458d9a\").first().root; puts(\"Organization: #{rrepo.organization.label} Repo: #{rrepo.label}\")" | foreman-rake console
Loading production environment (Rails 6.0.4.7)
Switch to inspect mode.
rrepo = Katello::Repository.where(:pulp_id => "0c85a598-adf2-4910-b64a-889e7d458d9a").first().root; puts("Organization: #{rrepo.organization.label} Repo: #{rrepo.label}")
Organization: Default_Organization Repo: Red_Hat_Satellite_Capsule_6_9_for_RHEL_7_Server_RPMs_x86_64
nil
#
# su - postgres -c "psql foreman -c \"SELECT t.label,krr.label FROM katello_repositories AS kr INNER JOIN katello_root_repositories AS krr ON krr.id = kr.root_id INNER JOIN katello_products AS kp ON kp.id = krr.product_id INNER JOIN taxonomies AS t ON t.id = kp.organization_id WHERE kr.pulp_id = '0c85a598-adf2-4910-b64a-889e7d458d9a';\""
label | label
----------------------+-------------------------------------------------------------
Default_Organization | Red_Hat_Satellite_Capsule_6_9_for_RHEL_7_Server_RPMs_x86_64
(1 row)
#
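To find which names need this lookup, UUID-only names can be detected programmatically. This is a minimal sketch, assuming the repository names have already been collected into a list; is_uuid_name is a hypothetical helper, and the sample names are taken from the examples above.

```python
import uuid


def is_uuid_name(name):
    """Return True if the whole repository name is a canonical UUID string."""
    try:
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


names = [
    "0c85a598-adf2-4910-b64a-889e7d458d9a",
    "Red_Hat_Enterprise_Linux_8_for_x86_64_-_BaseOS_RPMs_8",
]
for name in names:
    if is_uuid_name(name):
        print(f"needs lookup: {name}")
```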
For more KB articles/solutions related to Red Hat Satellite 6.x Pulp 3.0 Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Pulp 3.0-related Issues