How to manually apply Hive DB retention period for Metering


Environment

  • Red Hat OpenShift Container Platform

Issue

  • Hive storage is full and there is no documentation on how to set a proper retention period.

NOTE: Since OCP 4.6, it is possible to configure a retention period, but only for metering Report custom resources.
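For reference, the built-in retention mentioned above is set per Report via its expiration field; a minimal sketch (the report name, query, and expiration value below are illustrative, not taken from this environment):

```yaml
apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: pod-cpu-request-hourly
  namespace: openshift-metering
spec:
  query: pod-cpu-request
  schedule:
    period: hourly
  # The Report and its data are removed this long after creation
  expiration: 30m
```

This only expires Report resources; it does not prune the underlying datasource partitions, which is what the procedure below addresses.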

Resolution

NOTE: Metering doesn't care about how you back up the PVC as long as it contains the correct data.

  • Connect to the corresponding presto pod:
$ oc -n openshift-metering exec -it "$(oc -n openshift-metering get pods -l app=presto,presto=coordinator -o name | cut -d/ -f2)"  -- bash
  • Set the following variables according to your desired retention period and DB name:
bash-4.4$ DAYS=7
bash-4.4$ DB="metering"

NOTE: The retention is calculated as CURRENT DATE - DAYS.
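The resulting cutoff date can be previewed before touching any data; a quick sketch using GNU date syntax:

```shell
# Preview the retention cutoff: partitions with dt strictly older
# than this date are the ones the procedure below will delete
DAYS=7
CUTOFF=$(date -d "-${DAYS} day" +%F)
echo "Partitions older than ${CUTOFF} will be deleted"
```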

  • Extract the table list, excluding the _raw (ReportDataSource) and report_openshift_ tables:
bash-4.4$ TABLES=$(presto-cli --server https://presto:8080 --catalog hive --schema ${DB} --user root --keystore-path /opt/presto/tls/keystore.pem --execute "SELECT table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE' AND table_schema = '${DB}';" | grep datasource | tr -d '"')

NOTE: Only the datasource_openshift_ tables contain dt partitions that can be removed with this process.


Hard pruning mode

  • Extract the DATES per TABLE and perform the DELETE operation in a single command:
bash-4.4$ for i in ${TABLES}; do
    # Extract the distinct dt partitions older than the retention cutoff for this table
    DATES=$(presto-cli --server https://presto:8080 --catalog hive --schema ${DB} --user root --keystore-path /opt/presto/tls/keystore.pem --execute "SELECT format('(%s)', dates) FROM( SELECT array_join(array_distinct(array_agg(format('''%s''', dt))), ',') AS dates FROM ${i} WHERE date(dt) < (current_date - interval '${DAYS}' day) );" | tr -d '"')
    echo -n "-> ${i} ... "
    # Delete those partitions, or print NULL when there is nothing to remove
    if [[ ${DATES} != "(null)" ]]; then
      presto-cli --server https://presto:8080 --catalog hive --schema ${DB} --user root --keystore-path /opt/presto/tls/keystore.pem --execute "DELETE FROM ${i} WHERE dt in ${DATES}"
    else
      echo "NULL"
    fi
  done

NOTE: This command is time- and resource-consuming because it has to go through all rows to extract the corresponding dt partitions (DATES) to be removed; see "Soft pruning mode" below if needed.


Soft pruning mode

  • Extract the DATES list (as a properly formatted string) from the TABLES:
bash-4.4$ DATES=$(presto-cli --server https://presto:8080 --catalog hive --schema ${DB} --user root --keystore-path /opt/presto/tls/keystore.pem --execute "SELECT format('(%s)', dates) FROM( SELECT array_join(array_distinct(array_agg(format('''%s''', dt))), ',') AS dates FROM $(echo ${TABLES} | cut -d " " -f1) WHERE date(dt) < (current_date - interval '${DAYS}' day) );" | tr -d '"')

NOTE: This command may take a long time depending on the number of rows to be processed, so please be patient. To expedite the process, only the first table from the list is used to build the DATES string.

  • (Optional) Check the content of both variables before removal, for example:
bash-4.4$ for i in ${TABLES}; do echo ${i}; done
datasource_openshift_metering_node_allocatable_cpu_cores
datasource_openshift_metering_node_allocatable_memory_bytes
datasource_openshift_metering_node_capacity_cpu_cores
datasource_openshift_metering_node_capacity_memory_bytes
datasource_openshift_metering_persistentvolumeclaim_capacity_bytes
datasource_openshift_metering_persistentvolumeclaim_phase
datasource_openshift_metering_persistentvolumeclaim_request_bytes
datasource_openshift_metering_persistentvolumeclaim_usage_bytes
datasource_openshift_metering_pod_limit_cpu_cores
datasource_openshift_metering_pod_limit_memory_bytes
datasource_openshift_metering_pod_persistentvolumeclaim_request_info
datasource_openshift_metering_pod_request_cpu_cores
datasource_openshift_metering_pod_request_memory_bytes
datasource_openshift_metering_pod_usage_cpu_cores
datasource_openshift_metering_pod_usage_memory_bytes

bash-4.4$ echo $DATES
('2021-01-18','2021-01-17')

NOTE: The DATES format has to be exactly as above, ('<d1>','<d2>','<d3>',...), in order to feed the whole string at once into the DELETE statement below.

  • Delete the old partitions for each table:
bash-4.4$ for i in ${TABLES}; do echo -n "-> ${i} ... " && presto-cli --server https://presto:8080 --catalog hive --schema ${DB} --user root --keystore-path /opt/presto/tls/keystore.pem --execute "DELETE FROM ${i} WHERE dt in ${DATES}"; done

NOTE: This command produces output like -> datasource_openshift_metering_node_allocatable_cpu_cores ... DELETE, which means the date partitions were successfully removed. If you see ... DELETE: 0 rows instead, the table(s) did not contain the requested partition dates; in that case, if needed, reduce the DAYS variable and re-run the DATES extraction above to gather a new list.


Troubleshooting

Especially when running the hard pruning mode, it is possible to see the following errors for some tables, depending on the DB size:

Query 20210126_142303_04144_7ujtp failed: Query exceeded per-node user memory limit of 409.60MB [Allocated: 390.01MB, Delta: 64.62MB, Top Consumers: {AggregationOperator=390.01MB}]

-> datasource_openshift_metering_persistentvolumeclaim_phase ... Query 20210126_142843_04170_7ujtp failed: line 1:80: mismatched input 'in'. Expecting: 'AND', 'OR', <EOF>

This is caused by the computational effort of the DATES extraction exceeding the memory limits of the pod. If this happens, there are some possible workarounds:

a) Manually purge those tables using a preexisting DATES string extracted from another table.
b) Manually generate the DATES string following the format ('<d1>','<d2>','<d3>',...), for example:

$ echo ${DATES}
('2020-04-06','2020-04-07','2020-04-08','2020-04-09','2020-04-10','2020-04-11','2020-04-12','2020-04-13','2020-04-14','2020-04-15','2020-04-16','2020-04-17','2020-04-18','2020-04-19','2020-04-20','2020-04-21','2020-04-22','2020-04-23','2020-04-24','2020-04-25','2020-04-26','2020-04-27')
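Rather than typing such a string by hand, it can be generated with a small shell loop; a sketch assuming GNU date, where OLDEST and NEWEST are day-offsets (not from the original article) delimiting the range to purge:

```shell
# Build a DATES string such as ('<d1>','<d2>','<d3>') covering a range of past days
OLDEST=9   # days back for the oldest date to include
NEWEST=7   # days back for the newest date to include
DATES="("
for n in $(seq ${OLDEST} -1 ${NEWEST}); do
  DATES="${DATES}'$(date -d "-${n} day" +%F)',"
done
DATES="${DATES%,})"   # drop the trailing comma and close the parenthesis
echo "${DATES}"
```

The result can then be fed directly into the DELETE loop.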

c) Temporarily increase the pod limits (see recommended-resource-limits.yaml on github.com).
d) Run the purge multiple times with progressively lower retention periods to reduce the computational load of each run.
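For workaround c), pod resources are typically raised through the MeteringConfig resource rather than by editing the pod directly; a hedged sketch (the exact field paths and values below are assumptions, verify them against the recommended-resource-limits.yaml referenced above before applying):

```yaml
apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: operator-metering
  namespace: openshift-metering
spec:
  presto:
    spec:
      coordinator:
        resources:
          # Illustrative values only; size these to your cluster
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            cpu: "4"
            memory: 8Gi
```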

Root Cause

  • Automatic partition-related retention properties only work on Hive 3.x, while Metering uses Hive 2.3.
  • Metering is deprecated and will be removed in future versions.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.