Backup and restore

Red Hat Edge Manager 1.2

Instructions on backing up and restoring Red Hat Edge Manager

Red Hat Edge Manager Documentation Team

Abstract

This document provides information on backing up and restoring Red Hat Edge Manager.

Preface

Back up and restore Red Hat Edge Manager for disaster recovery on Red Hat Enterprise Linux or Red Hat OpenShift Container Platform. Use flightctl-backup and flightctl-restore for full server recovery, or follow database-focused procedures when your operations team manages PostgreSQL separately.

Chapter 1. Full server backup and restore overview

Use the flightctl-backup and flightctl-restore commands to export and import the Red Hat Edge Manager server state required for disaster recovery: PostgreSQL data, PKI and mTLS material, service configuration, and deployment-specific artifacts. The commands produce a portable archive you can store, schedule, and restore on a replacement host or cluster of the same Red Hat Edge Manager version.

1.1. Backup archive contents

Before you store archives off-host or restore on a replacement host, confirm that flightctl-backup captured the components your recovery plan requires. A successful run packages at minimum:

  • Database: logical dump of the internal flightctl PostgreSQL database (when the database runs with the deployment)
  • PKI: certificate authority keys, TLS certificates, and related secrets required for enrolled devices to reconnect over mTLS without re-enrollment
  • Configuration: service configuration (for example, service-config.yaml on Red Hat Enterprise Linux or the service ConfigMap on Red Hat OpenShift Container Platform) and deployment-specific data such as the PAM Issuer volume on Podman/quadlet hosts or Helm values references on Red Hat OpenShift Container Platform

The archive also includes metadata.json (backup timestamp, Red Hat Edge Manager version, deployment type) and a companion .sha256 checksum file. Before restore, verify integrity with sha256sum -c on that file from the directory that contains the archive; the command must report OK. flightctl-restore also validates the checksum before extraction. For more information about backup archive structure and integrity verification, see Additional resources.

1.2. Full server versus database-only backup

Use the following scenarios to match your recovery goal to the right backup scope. The Related topics entry for each scenario names topics that are linked in Additional resources at the end of this topic.

  • Disaster recovery of the entire control plane (new host or cluster, same version)
    Recommended approach: flightctl-backup and flightctl-restore <archive>
    Related topics: Backing up the full server on Red Hat Enterprise Linux; Backing up the full server on Red Hat OpenShift Container Platform

  • DBA-led or storage-level PostgreSQL backup only
    Recommended approach: Your organization’s pg_dump, snapshots, or volume backup; then flightctl-restore for KV and device reconciliation if needed
    Related topics: Backing up the PostgreSQL database for Red Hat Edge Manager on Red Hat Enterprise Linux

  • External PostgreSQL (not managed by the deployment)
    Recommended approach: Back up the database with your database team’s tooling; use flightctl-backup for PKI and configuration, or follow their runbook
    Related topics: External PostgreSQL databases

Note

flightctl-restore without an archive argument continues to perform device preparation only (for example, after a manual database restore). Archive-based restore is the supported path for full server recovery.

1.3. Prerequisites for backup and restore

Confirm the following before you run flightctl-backup or flightctl-restore.

1.3.1. Software versions

  • Red Hat Edge Manager release (restore): Before restore, install Red Hat Edge Manager on the replacement host or cluster at the same release recorded in the backup archive. The backup archive must be from the same Red Hat Edge Manager release as the newly installed instance.
  • Matching CLI tools: flightctl, flightctl-backup, and flightctl-restore on the machine where you run the commands must match the Red Hat Edge Manager server release. Version checks work as follows:

    • flightctl version always reports Client Version (the flightctl CLI binary). When the CLI is configured and can reach the Red Hat Edge Manager API, the same command also reports Server Version (the running Red Hat Edge Manager instance). Use flightctl login if Server Version is missing.
    • flightctl-backup version and flightctl-restore version report the version of those binaries.

    Run the commands from a host that can reach the Red Hat Edge Manager API:

    flightctl version
    flightctl-backup version
    flightctl-restore version

    Client Version, Server Version (when shown), and the backup and restore tool versions must all match. If any version differs, do not run backup or restore until you install or replace the mismatched binaries from the same Red Hat Edge Manager release as the server.

    On Red Hat Enterprise Linux, install or upgrade packages from the same repository channel as flightctl-services (for example, sudo dnf upgrade -y flightctl-cli). On Red Hat OpenShift Container Platform, install matching CLIs on the workstation from the product repository or the Red Hat Edge Manager UI download. Re-run the version commands above to confirm alignment before you continue. For more information about the version compatibility matrix and installing the Flight Control CLI, see Additional resources.

    Important

    Cross-version restore is not supported. For more information about backup and restore limitations, see Additional resources.

1.3.2. Tools by deployment type

  • flightctl-backup, flightctl-restore
    Red Hat Enterprise Linux (Podman/quadlet): Required on the host or a management host with access to deployment paths and credentials
    Red Hat OpenShift Container Platform (Kubernetes/Helm): Required inside the cluster (run as a Kubernetes Job or CronJob with a ServiceAccount and RBAC permissions)

  • pg_dump, psql
    Red Hat Enterprise Linux (Podman/quadlet): Required in PATH when the deployment uses the internal PostgreSQL database
    Red Hat OpenShift Container Platform (Kubernetes/Helm): Must be available inside the database pod (pre-installed in the flightctl-db image); not required on the host running the backup Job

  • kubectl
    Red Hat Enterprise Linux (Podman/quadlet): Not required; flightctl-backup and flightctl-restore use the host filesystem and Podman on Red Hat Enterprise Linux
    Red Hat OpenShift Container Platform (Kubernetes/Helm): Not required by flightctl-backup, which uses the Kubernetes API through its Go client; needed only to deploy and manage the backup Job or CronJob

  • helm
    Red Hat Enterprise Linux (Podman/quadlet): Not required
    Red Hat OpenShift Container Platform (Kubernetes/Helm): Not required; flightctl-backup reads Helm release Secrets through the Kubernetes API and does not use the helm CLI

  • sha256sum or equivalent
    Red Hat Enterprise Linux (Podman/quadlet): Required to verify archive integrity
    Red Hat OpenShift Container Platform (Kubernetes/Helm): Required to verify archive integrity (available in the backup Job container)

1.3.3. Access and privileges

  • Red Hat Enterprise Linux: Root or sudo access to stop and start Flight Control systemd units, read PKI under /etc/flightctl/pki/, and run Podman commands when volumes are exported.
  • Red Hat OpenShift Container Platform: ServiceAccount with RBAC permissions to list Pods, read Secrets, and exec into Pods in the Red Hat Edge Manager namespace (for running flightctl-backup as a Job); cluster-admin or namespace admin needed to create the backup Job/CronJob initially.
  • Archive storage: Writable directory for backup output (--output) with space for the full archive; restrict permissions on backup files (archives are created with owner-only access).

1.3.4. Operational readiness

  • A maintenance window or change record for restore operations that stop application services or scale workloads to zero.
  • A tested path to copy archives off the host or cluster (backup files are local; transport to remote storage is your responsibility).
  • For restore validation, a plan to confirm API health, inventory, and device reconnection. For more information about post-restore device status changes, see Additional resources.

1.4. Backup and restore limitations

Understand the following boundaries when you plan backup and restore for Red Hat Edge Manager.

  • Same-version restore only: You can restore an archive only to a Red Hat Edge Manager instance running the same version recorded in the archive metadata. Do not attempt to restore a backup taken on one release onto a host or cluster upgraded to a newer release.
  • Full backups only: flightctl-backup creates a full snapshot. Incremental or differential backups are not supported in this release.
  • No cross-deployment restore: An archive created on a Podman/quadlet deployment cannot be restored on a Kubernetes/Helm deployment, and the reverse is not supported. The restore command detects deployment type from the archive and fails with a clear error on mismatch.
  • No built-in remote storage: The commands write a local archive (and checksum file). Copying archives to object storage, NFS, or tape is your responsibility. Red Hat Edge Manager does not ship S3, NFS, or similar backup targets.
  • No backup encryption: Archives are not encrypted by the product. Use storage-level encryption, filesystem encryption, or your organization’s secret-management practices to protect backup media.
  • No native scheduler: Red Hat Edge Manager does not run scheduled backups internally. Use cron, Ansible, Kubernetes CronJobs, or your automation platform. For more information about scheduling backups, see Additional resources.
  • External PostgreSQL: When the database runs outside the deployment, flightctl-backup does not dump that database; it prints guidance and still backs up PKI and configuration. Your database team must back up and restore external instances. For more information about external PostgreSQL databases, see Additional resources.
  • Device state after restore: Devices enrolled after the backup was taken, or devices with specification drift, can require re-approval or operator action. For more information about post-restore device status changes, see Additional resources.
  • Restore is not transactional: flightctl-restore can leave the deployment in a partial state if it fails mid-run (for example, the KV store cleared before all device records are updated). Device preparation also logs per-device KV key failures without always failing the command. For more information about recovering from a failed or partial restore, see Additional resources.

1.5. Backup archive structure and integrity verification

Each flightctl-backup run produces a timestamped archive and a companion .sha256 checksum file. Before any restore that overwrites live data, change to the directory that contains both files and run sha256sum -c on the checksum file (for example, sha256sum -c flightctl-backup-20260428T120000Z.tar.gz.sha256). The command must report OK. Do not run flightctl-restore if verification fails.

1.5.1. Archive naming

Archives use a UTC timestamp in the filename, for example:

flightctl-backup-20260428T120000Z.tar.gz
flightctl-backup-20260428T120000Z.tar.gz.sha256

1.5.2. Top-level contents

After extraction, the archive layout includes directories similar to the following:

metadata.json
db/dump.sql
pki/...
config/...
volumes/...    (Podman deployments, when applicable)

The top-level paths contain the following:

  • metadata.json: Backup timestamp, Red Hat Edge Manager version, deployment type (podman or kubernetes), and related metadata used during restore validation
  • db/dump.sql: PostgreSQL logical dump of the flightctl database (present when the internal database was backed up)
  • pki/: PKI and TLS material (filesystem copy on Red Hat Enterprise Linux; exported Secret manifests on Red Hat OpenShift Container Platform)
  • config/: Service configuration (service-config.yaml on Red Hat Enterprise Linux, Helm release Secret helm-release-<name>.yaml on Red Hat OpenShift Container Platform)
  • volumes/pam-issuer-etc.tar: PAM Issuer volume export (Podman deployments only; optional, and backup continues if the export fails)

1.5.3. Checksum file

The .sha256 file uses the standard sha256sum format:

<hash>  flightctl-backup-20260428T120000Z.tar.gz

Verify integrity from the directory that contains both files:

sha256sum -c flightctl-backup-20260428T120000Z.tar.gz.sha256

A failed check indicates a corrupt or incomplete transfer. Do not run flightctl-restore until you obtain a valid archive.
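The check can be wrapped in a small shell function so that scripts verify an archive without first changing directory. This is a minimal sketch that assumes the .sha256 file sits next to the archive and has the same name plus .sha256, as in the naming example; the function name verify_backup_archive is hypothetical and is not part of the product.

```shell
# Sketch (hypothetical helper): verify a flightctl-backup archive against its
# .sha256 companion file. Assumes "<archive>.sha256" is in the same directory
# and uses the "<hash>  <file name>" format with no directory in the name.
verify_backup_archive() {
    archive=$1
    checksum_file="$archive.sha256"
    if [ ! -f "$archive" ] || [ ! -f "$checksum_file" ]; then
        echo "missing archive or checksum file for $archive" >&2
        return 2
    fi
    # Run sha256sum -c in a subshell so the caller's working directory is
    # unchanged; the file name inside the .sha256 file is resolved relative
    # to the directory that contains both files.
    (cd "$(dirname "$archive")" && sha256sum -c "$(basename "$checksum_file")")
}
```

For example, verify_backup_archive /var/backups/rhem/flightctl-backup-20260428T120000Z.tar.gz returns 0 only when sha256sum reports OK, and returns 2 when either file is missing.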

1.5.4. File permissions

Backup archives are created with restrictive permissions (owner read/write only). Preserve those permissions when you copy archives to backup storage.
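One way to preserve permissions is cp -p, which keeps the mode and modification time of each file. The sketch below assumes the backup storage is a locally mounted directory (for example, an NFS export); the helper name copy_backup_pair is hypothetical. It re-verifies the checksum at the destination so that an incomplete copy is detected before you rely on it.

```shell
# Sketch (hypothetical helper): copy an archive and its .sha256 file to a
# mounted backup location, keeping owner-only permissions and timestamps.
copy_backup_pair() {
    archive=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    # cp -p preserves mode and timestamps (and ownership when permitted).
    cp -p "$archive" "$archive.sha256" "$dest_dir"/ || return 1
    # Re-check integrity at the destination to catch a truncated copy.
    (cd "$dest_dir" && sha256sum -c "$(basename "$archive").sha256" >/dev/null)
}
```

Keeping the modification time also means that age-based retention on the destination (for example, find -mtime) reflects when the backup was taken rather than when it was copied.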

1.6. External PostgreSQL databases

When Red Hat Edge Manager is configured to use an external PostgreSQL instance (not the database container or pod shipped with the deployment), database backup and restore remain your organization’s responsibility.

1.6.1. Backup behavior

flightctl-backup detects an external database connection and does not run pg_dump against that instance. The command logs "External database detected - database backup skipped" and continues to collect PKI materials, service configuration, and other deployment artifacts needed for control plane recovery. The resulting archive does not contain db/dump.sql, and metadata.json shows "databaseIncluded": false.

1.6.2. Restore behavior

flightctl-restore <archive> restores PKI and configuration from the archive. If the archive does not contain db/dump.sql, the restore command prints instructions for restoring the external database before you complete device reconciliation.
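To tell in advance whether an archive needs a separate database restore, list its members and look for db/dump.sql. This sketch assumes GNU tar or bsdtar and allows member paths with a leading ./ or a directory prefix; the function name archive_includes_db is hypothetical.

```shell
# Sketch (hypothetical helper): succeed only if the archive contains a
# database dump. Archives taken against external PostgreSQL have no
# db/dump.sql. grep reads the whole listing (no -q) so tar is not cut off early.
archive_includes_db() {
    tar -tzf "$1" | grep -E '(^|/)db/dump\.sql$' >/dev/null
}
```

If archive_includes_db returns non-zero, coordinate the external database restore with your database team before device reconciliation.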

1.7. Backing up the full server on Red Hat Enterprise Linux

Run flightctl-backup on the Red Hat Enterprise Linux host that runs your Podman/quadlet Red Hat Edge Manager deployment to create a portable archive of database, PKI, configuration, and related state.

Prerequisites

  • Prerequisites for backup and restore are satisfied.
  • Sufficient disk space in the --output directory for a full archive.
  • pg_dump is available in PATH when the deployment uses the internal PostgreSQL database.

Procedure

  1. Confirm that flightctl and flightctl-backup match the running Red Hat Edge Manager server:

    Run flightctl version when the CLI can reach the Red Hat Edge Manager API (use flightctl login if needed) so you can compare Client Version, Server Version, and flightctl-backup version. For more information about prerequisites for backup and restore, see Additional resources.

    flightctl version
    flightctl-backup version
  2. Create a backup directory owned by the user that runs the command (example path):

    sudo mkdir -p /var/backups/rhem
    sudo chown "$(whoami)" /var/backups/rhem
  3. Run the backup. Use --output to set the destination directory and --config only if your service configuration is not in the default location:

    flightctl-backup --output /var/backups/rhem

    Optional: specify a custom service configuration path:

    flightctl-backup --output /var/backups/rhem --config /etc/flightctl/service-config.yaml
    Note

    If --config is not specified, the default path /etc/flightctl/service-config.yaml is used.

  4. Verify that the command created a timestamped archive and checksum file:

    ls -l /var/backups/rhem/flightctl-backup-*.tar.gz*
    cd /var/backups/rhem && sha256sum -c flightctl-backup-*.tar.gz.sha256
  5. Copy the archive and checksum to backup storage outside the host failure domain.

Verification

  • sha256sum -c reports OK.
  • Archive permissions are owner-read/write only (-rw-------).
  • If the deployment uses an external database, the command log includes DBA instructions and the archive still contains pki/ and config/.

1.8. Restoring the full server on Red Hat Enterprise Linux

Run flightctl-restore with a backup archive from flightctl-backup to recover Red Hat Edge Manager on a Red Hat Enterprise Linux host of the same version. The command verifies the archive checksum, restores database, PKI, and configuration, and performs device preparation so enrolled devices can reconnect without re-enrollment when PKI and endpoints are unchanged.

Prerequisites

  • A valid backup archive and matching .sha256 file from flightctl-backup on the same Red Hat Edge Manager version as the target host.
  • Prerequisites for backup and restore are satisfied.
  • Application services can be stopped during restore while PostgreSQL and the KV store remain running (same pattern as restore using quadlets and flightctl-restore).

Procedure

  1. Verify that flightctl and flightctl-restore match the running Red Hat Edge Manager server:

    Run flightctl version from a host where the CLI can reach the Red Hat Edge Manager API (use flightctl login if needed). The command reports Client Version (the CLI binary) and, when connected, Server Version (the running Red Hat Edge Manager instance). Run flightctl-restore version to report the restore binary version.

    flightctl version
    flightctl-restore version

    Compare the output: Client Version, Server Version, and the flightctl-restore version must all report the same Red Hat Edge Manager release. If Server Version is missing or any version differs, update the mismatched binaries before you continue. For more information about prerequisites for backup and restore, see Additional resources.

    Important

    Cross-version restore is not supported. The CLI client, the running server, and flightctl-restore must be on the same release before you continue.

  2. Verify archive integrity before any service changes. Change to the directory that contains the archive and .sha256 file before you run the check; the .sha256 file records the archive name without a directory, so sha256sum -c looks for the archive in the current directory. The output must include OK:

    cd /path/to && sha256sum -c flightctl-backup-YYYYMMDDTHHMMSSZ.tar.gz.sha256

    If verification fails, do not continue. For more information about the checksum file, see Additional resources.

  3. Stop Red Hat Edge Manager application services only. Do not stop flightctl-db.service or flightctl-kv.service:

    sudo systemctl stop flightctl-api.service
    sudo systemctl stop flightctl-worker.service
    sudo systemctl stop flightctl-periodic.service
    sudo systemctl stop flightctl-alert-exporter.service
    sudo systemctl stop flightctl-alertmanager-proxy.service
    sudo systemctl stop flightctl-telemetry-gateway.service
    sudo systemctl stop flightctl-pam-issuer.service
    sudo systemctl stop flightctl-cli-artifacts.service
    sudo systemctl stop flightctl-alertmanager.service
    sudo systemctl stop flightctl-imagebuilder-api.service
    sudo systemctl stop flightctl-imagebuilder-worker.service
    sudo systemctl stop flightctl-ui.service
    Note

    Skip units that do not exist on your host. Do not run systemctl stop flightctl.target.

  4. If the archive-based restore requires database and KV passwords (for example, when the command connects to local services), retrieve credentials from Podman secrets as in the manual restore procedure, publish ports if your environment requires it, then run restore with the archive path:

    DB_APP_PASSWORD=$(sudo podman secret inspect flightctl-postgresql-user-password --showsecret | jq -r '.[0].SecretData')
    KV_PASSWORD=$(sudo podman secret inspect flightctl-kv-password --showsecret | jq -r '.[0].SecretData')
    export DB_PASSWORD="$DB_APP_PASSWORD" KV_PASSWORD="$KV_PASSWORD"
    flightctl-restore /path/to/flightctl-backup-YYYYMMDDTHHMMSSZ.tar.gz

    Monitor the command output until it completes successfully. flightctl-restore is not transactional: a mid-run failure can leave the deployment partially restored (for example, KV store cleared before all device annotations are updated). Do not start application services until restore finishes without error.

    Important

    If the command fails or logs per-device KV errors, keep application services stopped. For more information about recovering from a failed or partial restore, see Additional resources before you continue.

    Note

    If your release does not require separate port publishing, check flightctl-restore --help for the options your version supports. The archive path is the first positional argument.

  5. Start application services again (same units you stopped):

    sudo systemctl start flightctl-api.service
    sudo systemctl start flightctl-worker.service
    sudo systemctl start flightctl-periodic.service
    sudo systemctl start flightctl-alert-exporter.service
    sudo systemctl start flightctl-alertmanager-proxy.service
    sudo systemctl start flightctl-telemetry-gateway.service
    sudo systemctl start flightctl-pam-issuer.service
    sudo systemctl start flightctl-cli-artifacts.service
    sudo systemctl start flightctl-alertmanager.service
    sudo systemctl start flightctl-imagebuilder-api.service
    sudo systemctl start flightctl-imagebuilder-worker.service
    sudo systemctl start flightctl-ui.service
  6. Confirm API health and review device status in the console or CLI.
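Steps 3 and 5 can be scripted so that units missing from a host are skipped instead of producing errors. The sketch below is an assumption-based helper, not a product command: the function name rhem_app_units and the SYSTEMCTL override are hypothetical, the unit list is copied from the procedure, and it relies on systemctl show reporting LoadState=not-found for units that are not installed. flightctl-db.service and flightctl-kv.service are deliberately left out of the list, and flightctl.target is never used.

```shell
# Sketch (hypothetical helper): stop or start the application units from the
# restore procedure, skipping units that are not installed. The database and
# KV units are intentionally absent so PostgreSQL and the KV store keep running.
# SYSTEMCTL may be set to another command (for example, a dry-run wrapper);
# it defaults to "sudo systemctl".
RHEM_APP_UNITS="flightctl-api flightctl-worker flightctl-periodic
flightctl-alert-exporter flightctl-alertmanager-proxy
flightctl-telemetry-gateway flightctl-pam-issuer flightctl-cli-artifacts
flightctl-alertmanager flightctl-imagebuilder-api
flightctl-imagebuilder-worker flightctl-ui"

rhem_app_units() {
    action=$1   # "stop" or "start"
    case "$action" in
        stop|start) ;;
        *) echo "usage: rhem_app_units stop|start" >&2; return 2 ;;
    esac
    for name in $RHEM_APP_UNITS; do
        unit="$name.service"
        # systemctl show reports LoadState=not-found for units that are not installed.
        state=$(${SYSTEMCTL:-sudo systemctl} show -p LoadState --value "$unit")
        if [ "$state" = "not-found" ]; then
            echo "skipping $unit (not installed)"
            continue
        fi
        # Stop at the first failure so the operator can investigate.
        ${SYSTEMCTL:-sudo systemctl} "$action" "$unit" || return 1
    done
}
```

Run rhem_app_units stop in place of step 3, and run rhem_app_units start only after flightctl-restore completes without error.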

Verification

  • flightctl API calls succeed.
  • Devices transition from AwaitingReconnect to online where expected.
  • No checksum or deployment-type errors appeared during restore.

1.9. Backing up the full server on Red Hat OpenShift Container Platform

Run flightctl-backup as a Kubernetes Job or CronJob inside the cluster with appropriate RBAC permissions to create an archive of database, PKI Secrets, configuration, and Helm values for disaster recovery.

Prerequisites

  • Prerequisites for backup and restore are satisfied.
  • kubectl context points at the cluster and namespace where Red Hat Edge Manager is installed.
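  • Your credentials can read the Helm release Secrets for the release that deployed Red Hat Edge Manager (flightctl-backup reads them through the Kubernetes API to capture user-supplied values; the helm CLI is not required).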
  • Writable local or mounted directory for --output.

Procedure

  1. Set the namespace variable for your deployment (replace rhem-chart-namespace):

    export RHEM_NS=rhem-chart-namespace
    kubectl config set-context --current --namespace="$RHEM_NS"
  2. Confirm that flightctl and flightctl-backup match the Red Hat Edge Manager deployment:

    Run flightctl version when the CLI can reach the Red Hat Edge Manager API (use flightctl login if needed) so you can compare Client Version, Server Version, and flightctl-backup version. For more information about prerequisites for backup and restore, see Additional resources.

    flightctl version
    flightctl-backup version
  3. Run the backup:

    mkdir -p "$HOME/rhem-backups"
    flightctl-backup --output "$HOME/rhem-backups"

    The command detects the Kubernetes deployment, dumps the internal database when applicable, exports PKI Secrets and the service ConfigMap, and captures Helm values used for the release.

  4. Verify the archive and checksum:

    ls -l "$HOME/rhem-backups"/flightctl-backup-*.tar.gz*
    cd "$HOME/rhem-backups" && sha256sum -c flightctl-backup-*.tar.gz.sha256
  5. Copy the archive pair to storage outside the cluster failure domain.

Verification

  • Checksum verification succeeds.
  • metadata.json inside the archive (optional inspection) lists deployment type kubernetes and the expected Red Hat Edge Manager version.
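To inspect metadata.json without extracting the full archive, you can stream that single member to standard output with tar. This is a sketch; the function name show_backup_metadata is hypothetical, and apart from databaseIncluded this guide does not name the fields in metadata.json, so read the output from your release rather than scripting against assumed field names.

```shell
# Sketch (hypothetical helper): print metadata.json from a backup archive
# without writing the archive contents to disk.
show_backup_metadata() {
    archive=$1
    # Look up the member path first; it may be stored as "metadata.json"
    # or "./metadata.json".
    member=$(tar -tzf "$archive" | grep -E '(^|/)metadata\.json$' | head -n 1)
    if [ -z "$member" ]; then
        echo "metadata.json not found in $archive" >&2
        return 1
    fi
    # -O extracts the member to standard output.
    tar -xzOf "$archive" "$member"
}
```

If jq is installed, piping the output through jq . pretty-prints the metadata.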

1.10. Restoring the full server on Red Hat OpenShift Container Platform

Run flightctl-restore with a backup archive from flightctl-backup to recover Red Hat Edge Manager on Red Hat OpenShift Container Platform at the same version. The command verifies checksum and deployment type, restores database, PKI, and configuration, and reconciles device state.

Prerequisites

  • A valid archive and .sha256 file from a Kubernetes/Helm backup of the same Red Hat Edge Manager version.
  • Prerequisites for backup and restore are satisfied.
  • Permission to scale Red Hat Edge Manager deployments to zero in the target namespace during restore.

Procedure

  1. Verify that flightctl and flightctl-restore match the Red Hat Edge Manager deployment:

    From a workstation where flightctl is configured to reach the Red Hat Edge Manager API, run flightctl version (use flightctl login first if needed). The command reports Client Version (the CLI binary) and, when connected, Server Version (the running Red Hat Edge Manager instance). Run flightctl-restore version to report the restore binary version.

    flightctl version
    flightctl-restore version

    Compare the output: Client Version, Server Version, and the flightctl-restore version must all report the same Red Hat Edge Manager release. If Server Version is missing or any version differs, update the mismatched binaries before you continue. For more information about prerequisites for backup and restore, see Additional resources.

    Important

    Cross-version restore is not supported. The CLI client, the running server, and flightctl-restore must be on the same release before you continue.

  2. Verify archive integrity before you scale down workloads. Change to the directory that contains the archive and .sha256 file before you run the check; the .sha256 file records the archive name without a directory, so sha256sum -c looks for the archive in the current directory. The output must include OK:

    cd /path/to && sha256sum -c flightctl-backup-YYYYMMDDTHHMMSSZ.tar.gz.sha256

    If verification fails, do not continue. For more information about the checksum file, see Additional resources.

  3. Scale down application deployments in the Red Hat Edge Manager namespace (replace rhem-chart-namespace):

    export RHEM_NS=rhem-chart-namespace
    kubectl scale deployment flightctl-api --replicas=0 -n "$RHEM_NS"
    kubectl scale deployment flightctl-worker --replicas=0 -n "$RHEM_NS"
    kubectl scale deployment flightctl-periodic --replicas=0 -n "$RHEM_NS"
    kubectl scale deployment flightctl-alert-exporter --replicas=0 -n "$RHEM_NS"
    kubectl scale deployment flightctl-alertmanager-proxy --replicas=0 -n "$RHEM_NS"
    kubectl get pods -n "$RHEM_NS"
  4. Run restore with the archive path. When the command needs database and KV access from your workstation, retrieve Secrets and use port-forwarding as documented in the manual PostgreSQL restore procedure on Red Hat OpenShift Container Platform. For more information about the restore procedure (manual database steps), see Additional resources. Then pass credentials:

    DB_APP_PASSWORD=$(kubectl get secret flightctl-db-app-secret -n "$RHEM_NS" -o jsonpath='{.data.userPassword}' | base64 -d)
    KV_PASSWORD=$(kubectl get secret flightctl-kv-secret -n "$RHEM_NS" -o jsonpath='{.data.password}' | base64 -d)
    export DB_PASSWORD="$DB_APP_PASSWORD" KV_PASSWORD="$KV_PASSWORD"
    flightctl-restore /path/to/flightctl-backup-YYYYMMDDTHHMMSSZ.tar.gz

    Monitor the command output until it completes successfully. flightctl-restore is not transactional: a mid-run failure can leave the deployment partially restored (for example, KV store cleared before all device annotations are updated). Do not scale deployments back up until restore finishes without error.

    Important

    If the command fails or logs per-device KV errors, keep workloads scaled to zero. For more information about recovering from a failed or partial restore, see Additional resources before you continue.

    Note

    Apply Helm values from the archive manually if the restore output directs you to config/helm-values.yaml in the extracted layout. Your runbook should match the version of flightctl-restore you are running.

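  5. Scale deployments back to their normal replica counts. The example uses one replica per deployment; substitute the counts your deployment used before you scaled down: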

    kubectl scale deployment flightctl-api --replicas=1 -n "$RHEM_NS"
    kubectl scale deployment flightctl-worker --replicas=1 -n "$RHEM_NS"
    kubectl scale deployment flightctl-periodic --replicas=1 -n "$RHEM_NS"
    kubectl scale deployment flightctl-alert-exporter --replicas=1 -n "$RHEM_NS"
    kubectl scale deployment flightctl-alertmanager-proxy --replicas=1 -n "$RHEM_NS"
    kubectl get pods -n "$RHEM_NS"
  6. Validate the API, console, and device inventory.
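The procedure scales each deployment to zero and back to one replica. If your deployments normally run more replicas, record the counts first. The following sketch assumes kubectl access to the namespace; the function names, the KUBECTL override, and the state file are hypothetical. It refuses to overwrite an existing state file, because a second run after scale-down would otherwise record zeros.

```shell
# Sketch (hypothetical helpers): record replica counts before scale-down and
# restore them afterward. KUBECTL may be set to another command for a dry run;
# it defaults to "kubectl".
RHEM_APP_DEPLOYMENTS="flightctl-api flightctl-worker flightctl-periodic flightctl-alert-exporter flightctl-alertmanager-proxy"

save_and_scale_down() {
    ns=$1
    state_file=$2
    # Refuse to overwrite saved counts: a second run after scale-down would record zeros.
    if [ -s "$state_file" ]; then
        echo "$state_file already exists; restore or remove it first" >&2
        return 1
    fi
    : > "$state_file" || return 1
    for d in $RHEM_APP_DEPLOYMENTS; do
        replicas=$(${KUBECTL:-kubectl} get deployment "$d" -n "$ns" -o jsonpath='{.spec.replicas}' </dev/null) || return 1
        [ -n "$replicas" ] || return 1
        echo "$d $replicas" >> "$state_file"
        ${KUBECTL:-kubectl} scale deployment "$d" --replicas=0 -n "$ns" </dev/null || return 1
    done
}

restore_replicas() {
    ns=$1
    state_file=$2
    # Each line of the state file is "<deployment> <replicas>".
    while read -r d replicas; do
        ${KUBECTL:-kubectl} scale deployment "$d" --replicas="$replicas" -n "$ns" </dev/null || return 1
    done < "$state_file"
}
```

For example, run save_and_scale_down "$RHEM_NS" "$HOME/rhem-replicas.txt" in place of step 3 and restore_replicas "$RHEM_NS" "$HOME/rhem-replicas.txt" in place of step 5, then delete the state file after the restore is validated.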

Verification

  • Pods are running and the API responds.
  • Devices reconnect without full re-enrollment when PKI and endpoints are unchanged.

1.11. Scheduling backups

Red Hat Edge Manager does not include a built-in backup scheduler. Run flightctl-backup from cron on Red Hat Enterprise Linux or from a Kubernetes CronJob on Red Hat OpenShift Container Platform, similar to how you automate other operational tasks. Retention (for example, keep the last n archives) is implemented in your wrapper script or storage lifecycle policy.

1.11.1. Red Hat Enterprise Linux example (cron)

Run the backup daily at 02:00, write archives to /var/backups/rhem, and delete archive and checksum files older than 14 days:

# /etc/cron.d/rhem-backup (example; adjust paths and retention to your environment)
# cron does not support line continuation, so each job must stay on one line.
0 2 * * * root /usr/local/bin/flightctl-backup --output /var/backups/rhem && find /var/backups/rhem -name 'flightctl-backup-*.tar.gz*' -mtime +14 -delete
Note

Install flightctl-backup on the host where cron runs (typically the Red Hat Enterprise Linux host that runs Red Hat Edge Manager). Ensure that the job runs non-interactively and that its log output is captured by your logging stack.
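The find command in the cron example prunes by age. If you prefer to keep a fixed number of backups, as in the "keep the last n archives" retention mentioned at the start of this section, a wrapper script can prune by count instead. This sketch relies on the UTC timestamp in archive names sorting chronologically; the function name prune_old_backups and its default of 7 are illustrative assumptions.

```shell
# Sketch (hypothetical helper): keep the newest N archive pairs in a directory
# and delete older archives together with their .sha256 files.
# Names embed a YYYYMMDDTHHMMSSZ timestamp, so a lexicographic sort is also
# a chronological sort (oldest first).
prune_old_backups() {
    dir=$1
    keep=${2:-7}
    archives=$(ls -1 "$dir" | grep -E '^flightctl-backup-[0-9]{8}T[0-9]{6}Z\.tar\.gz$' | sort)
    total=$(printf '%s\n' "$archives" | grep -c . || true)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # Delete only the oldest $excess archives and their checksum files.
    printf '%s\n' "$archives" | head -n "$excess" | while read -r name; do
        rm -f -- "$dir/$name" "$dir/$name.sha256"
    done
}
```

Call it only after flightctl-backup succeeds (for example, flightctl-backup --output /var/backups/rhem && prune_old_backups /var/backups/rhem 14 in a wrapper script that defines the function), so a failed run never removes the last good archives.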

1.11.2. Red Hat OpenShift Container Platform example (CronJob)

Schedule backups with a CronJob that runs flightctl-backup inside the cluster, from an image that provides the tool and under a ServiceAccount with the RBAC permissions described in Access and privileges. The following manifest is illustrative; replace the image, command, volume mounts, and service account with values that match your cluster security policy:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: rhem-backup
  namespace: rhem-chart-namespace
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
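          serviceAccountName: rhem-backup  # example name; the account needs the RBAC permissions listed in Access and privileges
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: registry.redhat.io/edge-manager/edge-manager-rhel9:1.2
            command:
            - /bin/sh
            - -c
            # The PVC is mounted at /backup, so the archive and .sha256 file are written to persistent storage
            - flightctl-backup --output /backup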
            volumeMounts:
            - name: backup-pvc
              mountPath: /backup
          volumes:
          - name: backup-pvc
            persistentVolumeClaim:
              claimName: rhem-backup-pvc
Note

Image version must match your deployed Red Hat Edge Manager version.

Note

Ansible Automation Platform on Red Hat OpenShift Container Platform uses a dedicated backup operator for long-term cluster backup patterns. Red Hat Edge Manager documents CronJob-based scheduling so you can integrate with your existing backup platform (for example, copy the PVC snapshot or archive object to OpenShift APIs for Data Protection or enterprise backup). You are not required to deploy a separate Red Hat Edge Manager operator for backups.
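The CronJob Pod needs a ServiceAccount with the permissions listed in Access and privileges (list Pods, read Secrets, exec into Pods), referenced through serviceAccountName in the Pod template. The sketch below creates one with standard kubectl create subcommands. The ServiceAccount and Role names are hypothetical, the grant of both create and get on pods/exec is an assumption meant to cover clients that open exec sessions with either HTTP method, and flightctl-backup in your release may need additional rules (for example, to read the service ConfigMap); add them if the Job logs a permission error.

```shell
# Sketch (hypothetical helper): create a ServiceAccount, Roles, and
# RoleBindings for a backup CronJob in the Red Hat Edge Manager namespace.
# KUBECTL may be set to another command for a dry run; defaults to "kubectl".
create_backup_rbac() {
    ns=$1
    sa=${2:-rhem-backup}
    k=${KUBECTL:-kubectl}
    $k create serviceaccount "$sa" -n "$ns" || return 1
    # Read access to Pods and Secrets (Secrets include PKI and Helm release data).
    $k create role "$sa-read" --verb=get,list --resource=pods,secrets -n "$ns" || return 1
    # Exec into the database Pod through the pods/exec subresource.
    $k create role "$sa-exec" --verb=create,get --resource=pods/exec -n "$ns" || return 1
    for role in "$sa-read" "$sa-exec"; do
        $k create rolebinding "$role" --role="$role" --serviceaccount="$ns:$sa" -n "$ns" || return 1
    done
}
```

For example, create_backup_rbac rhem-chart-namespace creates an account named rhem-backup that matches the serviceAccountName value you set in the CronJob.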

1.11.3. Verification in scheduled jobs

After each scheduled run:

  • Confirm the archive and .sha256 files exist.
  • Run sha256sum -c on the newest archive.
  • Periodically perform a full restore test in a non-production environment. For more information about testing backups, see Additional resources.
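The first two checks can be combined in a post-run script, for example as a follow-up step in your cron wrapper or monitoring. In this sketch the function name check_latest_backup is hypothetical; it picks the newest archive by its timestamped name rather than by file modification time, so copies made with a different modification time do not change which archive is checked.

```shell
# Sketch (hypothetical helper): confirm that the newest archive in a directory
# has a .sha256 file and that the checksum verifies. Returns 0 only if both hold.
check_latest_backup() {
    dir=$1
    latest=$(ls -1 "$dir" | grep -E '^flightctl-backup-[0-9]{8}T[0-9]{6}Z\.tar\.gz$' | sort | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no flightctl-backup archive found in $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/$latest.sha256" ]; then
        echo "missing checksum file for $latest" >&2
        return 1
    fi
    echo "checking $latest"
    # Verify from inside the directory; the .sha256 file holds a bare file name.
    (cd "$dir" && sha256sum -c "$latest.sha256")
}
```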

1.12. Troubleshooting backup and restore

Use this table to resolve common flightctl-backup and flightctl-restore failures. Exact messages can vary by release; compare CLI output to flightctl-backup --help and flightctl-restore --help for your version.

  • Symptom: sha256sum -c fails
    Likely cause: Corrupt or partial archive copy
    What to do: Re-copy the archive from the source; re-run backup if the source file is damaged

  • Symptom: Restore reports checksum mismatch before extraction
    Likely cause: Archive or .sha256 file altered
    What to do: Do not restore; obtain a known-good archive from backup storage

  • Symptom: Restore reports deployment type mismatch
    Likely cause: Archive from a Podman host restored on Red Hat OpenShift Container Platform, or the reverse
    What to do: Take a new backup on the target deployment type, or rebuild the environment to match the archive metadata

  • Symptom: Backup fails: pg_dump not found
    Likely cause: Client tools not installed on the backup execution host
    What to do: Install PostgreSQL client packages so that pg_dump is in PATH

  • Symptom: Backup or restore fails: database connection error
    Likely cause: Wrong credentials, database not running, or network policy blocking access
    What to do: Verify that the database pod or container is running; confirm passwords from Secrets or Podman secrets; on Red Hat OpenShift Container Platform, check port-forwarding and the namespace

  • Symptom: Backup completes but db/dump.sql is absent
    Likely cause: External PostgreSQL configured
    What to do: Follow the printed instructions and your DBA runbook. For more information about external PostgreSQL databases, see Additional resources.

  • Symptom: Restore fails: version mismatch
    Likely cause: flightctl, flightctl-backup, or flightctl-restore does not match the server version
    What to do: Install or replace mismatched binaries from the same Red Hat Edge Manager release as the server. For more information about prerequisites for backup and restore, see Additional resources.

  • Symptom: Devices do not reconnect after restore
    Likely cause: PKI not restored, wrong archive, or network/DNS change
    What to do: Verify that pki/ was restored; confirm that the server URL and certificates are unchanged. For more information about post-restore device status changes, see Additional resources.

  • Symptom: ConflictPaused or AwaitingReconnect for many devices
    Likely cause: Expected after a point-in-time restore
    What to do: For more information about post-restore device status changes, see Additional resources.

  • Symptom: flightctl-restore exits with an error
    Likely cause: Archive restore, database import, PKI restore, or device preparation stopped before completion
    What to do: Keep application services stopped or workloads scaled to zero. Read the command output and logs. Fix the reported error (credentials, connectivity, permissions, disk space). Re-run flightctl-restore with the same archive path while services remain down. Do not start the API until the command completes successfully.

  • Symptom: Restore command finished but logs show device or KV key errors
    Likely cause: Device preparation is not transactional. The KV store can be cleared and database device rows updated even when later steps fail, and individual per-device awaiting-reconnection KV keys can fail while the command still exits successfully.
    What to do: Keep services stopped. Re-run flightctl-restore with the same archive path, or without an archive if the database and PKI are already restored and you only need device preparation. Compare the device count in the completion log with your inventory. For more information about recovering from a failed or partial restore, see Additional resources.

1.12.1. Recovering from a failed or partial restore

flightctl-restore does not roll back automatically. If the command fails or stops mid-run, assess what completed before you bring application services back online.

What device preparation changes

When the command reaches device preparation (with or without an archive), it typically performs these steps in order:

  1. Clears the KV store (removes cached control-plane data).
  2. Updates enrolled devices in PostgreSQL (sets the device-controller/awaitingReconnect annotation, clears last_seen, and sets summary status to AwaitingReconnect for affected devices).
  3. Updates non-approved enrollment requests for restore.
  4. Adds per-device awaiting-reconnection keys in the KV store.

If step 1 succeeds and a later step fails, the command exits with an error but the KV store can already be empty. If step 4 reports errors for individual devices, the command can still exit successfully; check the logs for Failed to add awaiting reconnection key or similar messages.
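Because per-device key failures do not change the exit status, it can help to capture the restore output to a file and scan it. The sketch below assumes the log contains the message text quoted above. The function name is hypothetical, and you might need to adjust the pattern if your release words the message differently.

```shell
# Hypothetical helper: count per-device awaiting-reconnection key failures
# in a saved flightctl-restore log. Adjust the pattern if your release words
# the message differently.
count_kv_key_failures() {
  local log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "Cannot read $log_file" >&2
    return 1
  fi
  # grep -c prints 0 and exits 1 when nothing matches; that is not an error here.
  grep -c 'Failed to add awaiting reconnection key' "$log_file" || true
}
```

For example, capture the output with flightctl-restore ... 2>&1 | tee restore.log, then run count_kv_key_failures restore.log. A pipe through tee hides the exit status of flightctl-restore unless you enable set -o pipefail in bash, so check both the exit status and the count.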

Archive-based restore can also fail earlier while importing the database dump, PKI, or configuration. In that case, the replacement host or cluster may be inconsistent until you complete a successful restore.

Recovery steps

  1. Keep Red Hat Edge Manager application services stopped on Red Hat Enterprise Linux, or keep API and worker deployments scaled to zero on Red Hat OpenShift Container Platform. Do not start the control plane for production use until restore completes successfully.
  2. Collect the flightctl-restore output and logs. Note whether the failure occurred during archive import or during device preparation.
  3. Fix the root cause (for example, database or KV connectivity, wrong DB_PASSWORD or KV_PASSWORD, port-forward dropped, or insufficient permissions).
  4. Re-run restore:

    • Full archive restore not yet complete: Run flightctl-restore again with the same archive path and credentials while services remain stopped.
    • Database and PKI already restored; only device preparation incomplete: Run flightctl-restore without an archive argument (same credentials and connectivity as your manual restore procedure) to retry device preparation.
  5. Review the completion log line for counts of devices and awaiting-reconnection keys updated. Investigate any devices that remain offline or lack AwaitingReconnect status after services are running. For more information about post-restore device status changes, see Additional resources.

If repeated restore attempts fail or the instance remains inconsistent, reinstall Red Hat Edge Manager at the target release from a clean state and run flightctl-restore again from a known-good archive, following your tested disaster-recovery runbook.

If problems persist, collect the flightctl-backup or flightctl-restore logs, the metadata.json file from the archive, and the deployment type, and open a support case through your Red Hat Edge Manager support channel.

Chapter 2. Backing up the PostgreSQL database for Red Hat Edge Manager on Red Hat Enterprise Linux

Red Hat Edge Manager stores control-plane data in PostgreSQL when you install the flightctl-services RPM on Red Hat Enterprise Linux. This topic covers database backup scope, recovery point objectives (RPOs), and practices that complement backups for an RPM deployment. For more information about backing up the full server on Red Hat Enterprise Linux, see Additional resources.

2.1. Overview

PostgreSQL is the primary data store for the Red Hat Edge Manager control plane. It holds device and fleet configuration, organizations, templates, user-related metadata, and other system state required to operate Flight Control.

You must define a Recovery Point Objective (RPO) that matches your business requirements, compliance obligations, and risk tolerance. Typical baselines include the following:

  • Mission-critical environments: RPO on the order of 15 minutes
  • Non-production or development environments: RPO up to 24 hours

For full control plane recovery (database, PKI, and configuration in one archive), use flightctl-backup and flightctl-restore. For more information about full server backup and restore, see Additional resources.

Important

You are responsible for retention, scheduling, and off-host storage of backup archives. Red Hat Edge Manager provides backup and restore commands and documents backup strategy; it does not manage your backup infrastructure or remote storage targets.

2.2. Database backup scope

This topic describes the database portion of backup scope. For a single archive that includes the database, PKI, and service configuration, use flightctl-backup. For more information about backing up the full server on Red Hat Enterprise Linux, see Additional resources.

To recover control plane data after loss or corruption when you manage PostgreSQL separately, back up the complete flightctl database, including all application data tables and the following:

  • Data: Devices, fleets, organizations, templates, and related records
  • Schema: Table definitions and migrations that match your installed Red Hat Edge Manager release
  • Security metadata: Database roles, grants, and users required for the application to access the database

The database runs as part of the Flight Control services you manage with systemctl on the Red Hat Enterprise Linux host (for example, under flightctl.target). Use the backup tooling your organization standardizes on (for example, logical dumps or filesystem snapshots) as long as you can restore a consistent copy of the flightctl database that matches your operational runbooks.
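As one example of a logical dump, the following sketch runs pg_dump inside the database container. It is a sketch under assumptions rather than a supported procedure. The container name flightctl-db and the postgres role match the restore steps in this document, but the sketch also assumes that the role can connect inside the container without a password and that pg_dump and pg_dumpall ship in the image. The function name and backup directory are hypothetical. Run it as root so that podman can see the rootful containers.

```shell
# Hypothetical helper: logical backup of the flightctl database on an RPM host.
# Assumptions: run as root; the container is flightctl-db; the postgres role
# can connect inside the container without a password; the image includes
# pg_dump and pg_dumpall.
backup_flightctl_db() {
  local backup_dir="$1" stamp
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  mkdir -p "$backup_dir" || return 1

  # Data and schema of the flightctl database in custom format (for pg_restore).
  if ! podman exec flightctl-db pg_dump -U postgres -d flightctl -Fc \
      > "$backup_dir/flightctl-$stamp.dump"; then
    rm -f "$backup_dir/flightctl-$stamp.dump"
    return 1
  fi

  # pg_dump does not include roles, so dump cluster-wide globals separately.
  if ! podman exec flightctl-db pg_dumpall -U postgres --globals-only \
      > "$backup_dir/globals-$stamp.sql"; then
    rm -f "$backup_dir/globals-$stamp.sql"
    return 1
  fi

  # Checksums let you verify copies later with sha256sum -c.
  (cd "$backup_dir" && sha256sum "flightctl-$stamp.dump" "globals-$stamp.sql" \
    > "flightctl-$stamp.sha256")
}
```

The pg_dumpall --globals-only output contains role definitions, which can include password hashes, so store it with the same care as other secrets. For a single archive that also includes PKI and configuration, flightctl-backup remains the simpler option.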

2.3. Testing backups

Validate backups regularly. Untested backups are a common source of failed recoveries.

  1. Create an isolated test environment that mirrors production closely enough to exercise a restore (same major Red Hat Edge Manager version and similar configuration).
  2. Restore from your most recent backup using flightctl-restore with a flightctl-backup archive, or using your chosen database-only restore procedure.
  3. Verify data integrity by listing Flight Control resources (for example, devices and fleets) and comparing them to a known-good source.
  4. Confirm that the Red Hat Edge Manager API and web console behave as expected against the restored database.
  5. Record any gaps, errors, and corrective actions in your runbooks.
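For step 3, one simple check is to compare lists of resource names. The sketch below assumes that you have already saved one name per line for the expected and the restored environment (for example, device names copied from flightctl get devices output); the helper name is hypothetical, and it does not call flightctl itself. Capture the expected list when the backup is taken, because resources created after the backup are expected to be absent from the restore.

```shell
# Hypothetical helper: report names that are in the expected list but missing
# from the restored list. Both files contain one name per line.
compare_inventory() {
  local expected="$1" actual="$2" missing
  # -v -x -F -f: print expected lines that do not exactly match any restored name.
  missing=$(grep -vxF -f "$actual" "$expected")
  if [ -n "$missing" ]; then
    echo "Missing after restore:"
    echo "$missing"
    return 1
  fi
  echo "All expected names are present"
}
```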

2.4. Recommended practices

Follow these practices to complement your backup strategy for Red Hat Edge Manager on Red Hat Enterprise Linux.

  • Full server backups: Run flightctl-backup on a schedule that meets your RPO and copy archives off the host. For more information about backing up the full server on Red Hat Enterprise Linux, see Additional resources.
  • Git-backed configuration: Store declarative configuration in Git and sync it with the control plane using Repository resources and gitRef in device and fleet specifications where appropriate. For more information about managing the device configuration from a Git repository on the CLI, see Additional resources.
  • Separate failure domain for Git: Host Git repositories outside the same failure domain as the Red Hat Edge Manager host when possible (for example, a managed Git service or a separate cluster).
  • Back up deployment configuration: flightctl-backup includes service configuration (for example, /etc/flightctl/service-config.yaml and drop-in files under /etc/flightctl/conf.d/) and PKI under /etc/flightctl/. Also track configuration in Git or configuration management so you can rebuild the host consistently if needed.
  • Version and tag changes: Track changes to configuration and automation with version control and tags so you can roll back or redeploy predictably.
  • Restoring on Red Hat Enterprise Linux: When you need to recover the database on the host, follow the restore procedure for Red Hat Enterprise Linux. For more information about restoring the PostgreSQL database for Red Hat Edge Manager on Red Hat Enterprise Linux, see Additional resources.
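If you also keep your own copy of host configuration, a tar archive of /etc/flightctl is one straightforward option. This sketch is hypothetical (the function name and destination are illustrative) and does not replace flightctl-backup. Because the directory holds PKI material, the sketch creates the archive readable by its owner only.

```shell
# Hypothetical helper: archive host configuration (service-config.yaml, conf.d
# drop-ins, and PKI) as a supplement to flightctl-backup. Run as root.
archive_flightctl_config() {
  local dest_dir="$1" src_dir="${2:-/etc/flightctl}" stamp archive
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  archive="$dest_dir/flightctl-config-$stamp.tar.gz"
  mkdir -p "$dest_dir" || return 1

  # umask 077 in a subshell creates the archive with mode 600, because it
  # contains private keys. -C stores paths relative to the parent directory.
  (umask 077 && tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")") \
    || return 1
  echo "$archive"
}
```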

Chapter 3. Restoring the PostgreSQL database for Red Hat Edge Manager on Red Hat Enterprise Linux

Use this topic when you recover the PostgreSQL database with your own tooling (for example, pg_restore or storage snapshots) and then reconcile Red Hat Edge Manager state. For disaster recovery from a flightctl-backup archive, use the full server restore procedure. For more information about restoring the full server on Red Hat Enterprise Linux, see Additional resources.

A typical flightctl-services deployment runs application containers as systemd-managed Podman quadlets. During manual restore you must keep the PostgreSQL and KV store services running while application services are stopped: the procedure below restores the database, runs flightctl-restore without an archive for device preparation, then starts application services again. An alternative path skips flightctl-restore but still stops only application services while the database and KV remain up.

3.1. Prerequisites

The following items cover access, tooling, backups, and release compatibility for the restore procedures in this topic on a Red Hat Enterprise Linux host where Red Hat Edge Manager runs as systemd-managed Podman quadlets. The mandatory requirements apply to both the primary restore procedure and the database-only alternative. Optional tools listed later are needed only when you run the steps that use them (for example, temporary port publishing for flightctl-restore).

  • Host access: Root privileges or sudo on the host where the quadlets run, with permission to stop and start systemd units, run systemctl daemon-reload, create unit drop-in files under .d directories, and run Podman commands (including podman secret and podman exec).
  • Core tools: systemctl and podman installed and usable by your restore account.
  • Flight Control CLI tools: flightctl and flightctl-restore available on the host (or on a jump host if your runbook runs commands remotely). Before you restore, confirm that flightctl-restore matches the Red Hat Edge Manager server version.
  • Backup artifacts: A tested backup of the flightctl PostgreSQL database, in a form your team can replay (logical dump, physical backup, or other agreed method). Include matching configuration backups from /etc/flightctl/ when your recovery plan requires them.
  • Compatible versions: The Red Hat Edge Manager / flightctl-services release on the host should be compatible with the data in the backup (typically the same major release as when the backup was created).

Optional tools for verification steps in this topic (install only what you use):

  • jq — parse output from podman secret inspect when retrieving passwords.
  • pg_isready — check PostgreSQL readiness on localhost after temporary port publishing.
  • redis-cli (Red Hat Enterprise Linux 9) or valkey-cli (Red Hat Enterprise Linux 10) — check KV store connectivity on localhost; use the client that matches your host’s major Red Hat Enterprise Linux version and container image.
  • ss or netstat — confirm that ports 5432 and 6379 listen during the restore window.
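To confirm the mandatory and optional tools before a maintenance window, a small preflight check can help. This hypothetical helper only tests whether each named command is on PATH; it does not check versions or permissions.

```shell
# Hypothetical preflight check: print each named command that is not on PATH
# and return non-zero if any are missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_tools systemctl podman flightctl jq pg_isready ss redis-cli, substituting valkey-cli on Red Hat Enterprise Linux 10. If you run flightctl-restore from a path such as bin/ rather than from PATH, check that file directly instead.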
Note

If Red Hat Edge Manager runs on Red Hat OpenShift Container Platform or another Kubernetes cluster rather than on a Red Hat Enterprise Linux quadlet host, prerequisites differ (for example, kubectl access and cluster networking). For more information about restoring the PostgreSQL database for Red Hat Edge Manager on Red Hat OpenShift Container Platform, see Additional resources.

Important

Perform full restores during a maintenance window. Restoring data requires stopping application services and can interrupt device management until the procedure completes.

3.2. Restore using quadlets and flightctl-restore

Typical RPM installations run Red Hat Edge Manager application components as separate systemd units backed by Podman quadlets, while the PostgreSQL and KV store containers keep running until you intentionally restart them.

Note

For recovery from a flightctl-backup archive, use the full server restore procedure. For more information about restoring the full server on Red Hat Enterprise Linux, see Additional resources. The steps below apply when you restore the database manually and then run flightctl-restore without an archive argument for device preparation.

This sequence matches the quadlet layout: stop application services only, restore the flightctl database, supply credentials, expose ports locally for flightctl-restore, run the restore binary, remove temporary port publishing, then start application services again.

  1. Verify that flightctl-restore matches the Red Hat Edge Manager server version:

    Run flightctl version from a host where the CLI can reach the Red Hat Edge Manager API (use flightctl login if needed). The command reports Client Version (the CLI binary) and, when connected, Server Version (the running Red Hat Edge Manager instance). Run flightctl-restore version to report the restore binary version.

    flightctl version
    flightctl-restore version

    Compare the output: Client Version, Server Version, and the flightctl-restore version must all report the same Red Hat Edge Manager release. If Server Version is missing or any version differs, update the mismatched binaries before you continue. For more information about prerequisites for backup and restore, see Additional resources.

    Important

    Cross-version restore is not supported. The CLI client, the running server, and flightctl-restore must be on the same release before you continue.

  2. Stop the Red Hat Edge Manager application services so they do not write to the database during restore. Do not stop flightctl-db.service, flightctl-kv.service, or other database or KV units; they must keep running while you restore data and run flightctl-restore.

    # Stop only application services (keep database and KV store running)
    sudo systemctl stop flightctl-gateway.service
    sudo systemctl stop flightctl-api.service
    sudo systemctl stop flightctl-worker.service
    sudo systemctl stop flightctl-periodic.service
    sudo systemctl stop flightctl-alert-exporter.service
    sudo systemctl stop flightctl-alertmanager-proxy.service
    sudo systemctl stop flightctl-telemetry-gateway.service
    sudo systemctl stop flightctl-pam-issuer.service
    sudo systemctl stop flightctl-cli-artifacts.service
    sudo systemctl stop flightctl-alertmanager.service
    sudo systemctl stop flightctl-imagebuilder-api.service
    sudo systemctl stop flightctl-imagebuilder-worker.service
    sudo systemctl stop flightctl-ui.service
    Note

    If systemctl reports Unknown unit for a service, your host might not ship that component (for example, image builder or UI). Skip that line. Do not run systemctl stop flightctl.target here; that would stop the database and KV services as well.

  3. Confirm that the application units you stopped are inactive:

    sudo systemctl status flightctl-api.service
    systemctl is-active flightctl-api.service
  4. Restore the flightctl PostgreSQL database using the method that matches your backup (for example, pg_restore, psql with a SQL dump, or storage-level recovery). Ensure the database is consistent and reachable from the deployment before you run flightctl-restore.
  5. Retrieve the database application password from the Podman secret:

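    DB_APP_PASSWORD=$(sudo podman secret inspect flightctl-postgresql-user-password --showsecret | jq -r '.[0].SecretData')
    # Report success only if a value was returned (jq prints "null" when the field is missing)
    [ -n "$DB_APP_PASSWORD" ] && [ "$DB_APP_PASSWORD" != "null" ] && echo "Database password retrieved successfully" || echo "Failed to retrieve the database password" >&2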
  6. Retrieve the KV store password from the Podman secret:

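    KV_PASSWORD=$(sudo podman secret inspect flightctl-kv-password --showsecret | jq -r '.[0].SecretData')
    # Report success only if a value was returned (jq prints "null" when the field is missing)
    [ -n "$KV_PASSWORD" ] && [ "$KV_PASSWORD" != "null" ] && echo "KV store credentials retrieved successfully" || echo "Failed to retrieve the KV store password" >&2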
  7. Optional: verify database and KV connectivity from the host through the running containers:

    Database: verify readiness:

    sudo podman exec flightctl-db pg_isready -U postgres

    To connect to the database for additional verification (optional):

    sudo podman exec -it flightctl-db psql -U flightctl_app -d flightctl

    KV store: on Red Hat Enterprise Linux 9 use redis-cli inside the container; on Red Hat Enterprise Linux 10 use valkey-cli. Pass the KV password retrieved in the previous step, for example:

    sudo podman exec -e REDISCLI_AUTH="$KV_PASSWORD" flightctl-kv redis-cli ping
  8. Publish the database and KV ports on localhost so flightctl-restore can reach them. The database and KV containers use a private network by default; use temporary systemd drop-in files to add port publishing, then reload and restart only those services:

    DB_CONTAINER_FILE=$(systemctl show flightctl-db.service -p SourcePath --value)
    KV_CONTAINER_FILE=$(systemctl show flightctl-kv.service -p SourcePath --value)
    DB_DROPIN_DIR="${DB_CONTAINER_FILE}.d"
    KV_DROPIN_DIR="${KV_CONTAINER_FILE}.d"
    sudo mkdir -p "$DB_DROPIN_DIR" "$KV_DROPIN_DIR"
    
    sudo tee "$DB_DROPIN_DIR/10-publish-port.conf" > /dev/null <<'EOF'
    [Container]
    PublishPort=5432:5432
    EOF
    
    sudo tee "$KV_DROPIN_DIR/10-publish-port.conf" > /dev/null <<'EOF'
    [Container]
    PublishPort=6379:6379
    EOF
    
    sudo systemctl daemon-reload
    sudo systemctl restart flightctl-db.service flightctl-kv.service

    Verify listening ports and basic connectivity (adjust the KV client for your Red Hat Enterprise Linux major version):

    ss -tlnp | grep -E ':5432|:6379' || true
    pg_isready -h localhost -p 5432
    REDISCLI_AUTH="$KV_PASSWORD" redis-cli -h localhost -p 6379 ping

    On Red Hat Enterprise Linux 10, use VALKEYCLI_AUTH with valkey-cli instead of REDISCLI_AUTH with redis-cli if that matches your environment.

  9. Run flightctl-restore with the database and KV passwords. The example assumes the binary is at bin/flightctl-restore under the current directory; use the full path instead, or run flightctl-restore directly if it is on your PATH:

    DB_PASSWORD="$DB_APP_PASSWORD" KV_PASSWORD="$KV_PASSWORD" ./bin/flightctl-restore

    Monitor the command output until it completes successfully. If it fails or logs per-device KV errors, keep application services stopped and leave port publishing in place until you resolve the problem. For more information about recovering from a failed or partial restore, see Additional resources.

  10. Remove the temporary port publishing drop-ins and restart the database and KV services:

    DB_CONTAINER_FILE=$(systemctl show flightctl-db.service -p SourcePath --value)
    KV_CONTAINER_FILE=$(systemctl show flightctl-kv.service -p SourcePath --value)
    DB_DROPIN_DIR="${DB_CONTAINER_FILE}.d"
    KV_DROPIN_DIR="${KV_CONTAINER_FILE}.d"
    sudo rm -f "$DB_DROPIN_DIR/10-publish-port.conf" "$KV_DROPIN_DIR/10-publish-port.conf"
    sudo rmdir "$DB_DROPIN_DIR" 2>/dev/null || true
    sudo rmdir "$KV_DROPIN_DIR" 2>/dev/null || true
    sudo systemctl daemon-reload
    sudo systemctl restart flightctl-db.service flightctl-kv.service
  11. Start the application services again (same set you stopped; omit units your host does not use):

    sudo systemctl start flightctl-api.service
    sudo systemctl start flightctl-worker.service
    sudo systemctl start flightctl-periodic.service
    sudo systemctl start flightctl-alert-exporter.service
    sudo systemctl start flightctl-alertmanager-proxy.service
    sudo systemctl start flightctl-telemetry-gateway.service
    sudo systemctl start flightctl-pam-issuer.service
    sudo systemctl start flightctl-cli-artifacts.service
    sudo systemctl start flightctl-alertmanager.service
    sudo systemctl start flightctl-imagebuilder-api.service
    sudo systemctl start flightctl-imagebuilder-worker.service
    sudo systemctl start flightctl-ui.service
    sudo systemctl start flightctl-gateway.service

    Verify with sudo systemctl status on each unit and sudo podman ps --filter "name=flightctl-".

  12. Confirm that the API responds and that inventory looks correct in the Red Hat Edge Manager web console or with flightctl CLI commands.
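Steps 2 and 11 use the same set of application units in slightly different orders: the gateway stops first and starts last. The following sketch keeps that list in one place and skips units that systemd does not know, following the note in step 2. It is hypothetical, assumes a root shell, and deliberately never touches flightctl.target, flightctl-db.service, or flightctl-kv.service.

```shell
# Hypothetical helpers for steps 2 and 11. Run in a root shell.
# Application units in step 2 stop order (gateway first). flightctl-db and
# flightctl-kv are intentionally absent so the database and KV keep running.
APP_UNITS="flightctl-gateway flightctl-api flightctl-worker flightctl-periodic
flightctl-alert-exporter flightctl-alertmanager-proxy flightctl-telemetry-gateway
flightctl-pam-issuer flightctl-cli-artifacts flightctl-alertmanager
flightctl-imagebuilder-api flightctl-imagebuilder-worker flightctl-ui"

# Succeeds when systemd has loaded the unit; components your host does not
# ship report LoadState=not-found and are skipped.
unit_is_loaded() {
  [ "$(systemctl show -p LoadState --value "$1")" = "loaded" ]
}

# Run "systemctl <action>" on one unit, or report that it is absent.
act_on_unit() {
  if unit_is_loaded "$2.service"; then
    systemctl "$1" "$2.service"
  else
    echo "Skipping $2.service (not present on this host)" >&2
  fi
}

stop_app_units() {
  local unit
  for unit in $APP_UNITS; do
    act_on_unit stop "$unit"
  done
}

# Step 11 order: every unit except the gateway, then the gateway last.
start_app_units() {
  local unit
  for unit in $APP_UNITS; do
    [ "$unit" = "flightctl-gateway" ] || act_on_unit start "$unit"
  done
  act_on_unit start flightctl-gateway
}
```

For example, source the functions in a root shell, call stop_app_units in place of step 2, and call start_app_units in place of step 11.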

3.3. Alternative: Database-only restore without flightctl-restore

If your operations team restores the PostgreSQL data without running flightctl-restore (for example, a DBA-led replay into the live database), use the same rule as the main procedure: stop only application services so the database and KV store keep running. Do not use systemctl stop flightctl.target, which stops flightctl-db, flightctl-kv, and everything else.

  1. Stop the application services only (same list as in the Restore using quadlets and flightctl-restore procedure, step 2). Skip units that do not exist on your host.
  2. Restore the flightctl database using the procedure that matches your backup while PostgreSQL remains available:

    • Logical backups (dump files): Use pg_restore, psql, or equivalent clients per your DBA standards.
    • Storage or snapshot restores: Restore the data directory or volume following your infrastructure playbook.

      Ensure database name, roles, and grants match what Red Hat Edge Manager expects.

  3. Verify that PostgreSQL accepts connections and that the flightctl database is present (for example, pg_isready or a short test query).
  4. Start the application services again with the same systemctl start sequence as in the Restore using quadlets and flightctl-restore procedure, step 11. Do not rely on systemctl start flightctl.target unless your runbook confirms it does not disrupt the database or KV units you left running.
  5. Confirm health (for example, sudo systemctl status flightctl-api.service) and validate data in the web console or with flightctl CLI commands.
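For a logical backup in PostgreSQL custom format, the restore in step 2 might look like the following sketch. It assumes a dump created with pg_dump -Fc, a container named flightctl-db, that the postgres role can connect inside the container, and that the roles referenced by the dump already exist; the function name is hypothetical. The --clean --if-exists options drop existing objects before recreating them, so confirm that this matches your DBA runbook before you use it.

```shell
# Hypothetical helper for step 2 with a custom-format dump (pg_dump -Fc).
# Assumptions: run as root; container flightctl-db; the postgres role can
# connect inside the container; roles referenced by the dump already exist.
restore_flightctl_db() {
  local dump_file="$1"
  if [ ! -r "$dump_file" ]; then
    echo "Cannot read $dump_file" >&2
    return 1
  fi
  # With no file argument, pg_restore reads the archive from stdin (-i keeps
  # stdin open for podman exec). --clean --if-exists drops existing objects
  # before recreating them.
  podman exec -i flightctl-db \
    pg_restore -U postgres -d flightctl --clean --if-exists \
    < "$dump_file"
}
```

Afterward, a short query such as podman exec flightctl-db psql -U flightctl_app -d flightctl -c 'SELECT 1' can confirm that the application role connects, which covers step 3.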

3.4. After you restore

When the restore commands finish and application services are healthy again, validate the control plane and plan for device reconciliation. Restoring the database changes what the service knows about devices; edge devices must reconnect and compare their live state to the restored specifications.

Operational follow-up

  • Re-run checks from the Testing backups section if you need a structured validation checklist. For more information about testing backups, see Additional resources.
  • Record commands, secret handling, and timing in your runbooks so the next restore repeats cleanly.

3.5. Post-restore device status changes

After a successful restore, devices move through automatic status transitions while they reconnect and reconcile with the restored control plane data.

AwaitingReconnect
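Devices are placed in AwaitingReconnect first, except devices without an enrollment request and with a zero specification version, which move to normal status. The service waits for each device to report its current state again. Spec reconciliation for those devices remains paused until they reconnect.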
Enrollment requests and post-restore approval

Devices approved after the restored backup was taken do not exist after the restore and must be approved again. After restore:

  • Devices created from a restored enrollment request are placed in AwaitingReconnect and follow the normal AwaitingReconnect behavior.
  • Devices without an enrollment request before backup, with a non-zero deployed specification version, are placed in AwaitingReconnect and follow the normal AwaitingReconnect behavior.
  • Devices without an enrollment request before backup, with a zero specification version, move to normal status.
ConflictPaused
After a device reconnects and reports its current state, the service compares the specification stored in the restored backup with the device-reported version. If the restored backup specification is older (for example, the device had moved forward while backups lagged), the device can enter ConflictPaused. Rendering of new specifications stops for that device until an operator resolves the mismatch. Human review is required before you force configuration forward.
Normal operation
When the restored specification and the device-reported state are compatible, the device returns to normal operational statuses (for example, online or updating) and usual reconciliation resumes.

Monitor device status

Use the flightctl CLI to see which devices need attention:

flightctl get devices
flightctl get devices --field-selector=status.summary.status=AwaitingReconnect
flightctl get devices --field-selector=status.summary.status=ConflictPaused
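To track progress without reading full tables, the commands above can be wrapped in a small counter. This is a hypothetical helper that assumes flightctl get devices prints a table with a single header line; if your release prints a different format, adjust the tail and grep filter.

```shell
# Hypothetical helper: count devices in one summary status.
# Assumption: flightctl get devices prints a table with one header line.
count_devices_in_status() {
  local status="$1" output
  output=$(flightctl get devices --field-selector="status.summary.status=$status") || return 1
  # Drop the header line, then count the remaining non-empty lines.
  printf '%s\n' "$output" | tail -n +2 | grep -c . || true
}
```

For example, run count_devices_in_status AwaitingReconnect and count_devices_in_status ConflictPaused every few minutes after restore. The AwaitingReconnect count should fall as devices reconnect, while ConflictPaused devices remain until you resolve them.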

Resolve ConflictPaused devices

  1. Review the specification source: if the device belongs to a fleet, inspect the fleet template and selector; if not, inspect the device spec directly. Review labels and ownership to confirm how the restored specification applies to the device.
  2. When you are confident the restored specification is what you want, resume the device or a group of devices. Replace example-device with your device resource name and adjust selectors to match your environment:

    flightctl resume device example-device
    flightctl resume device --selector="environment=production"

    Use additional flightctl resume device options your deployment supports (for example, field selectors) if you need to resume many devices in bulk.

Chapter 4. Backing up the PostgreSQL database for Red Hat Edge Manager on Red Hat OpenShift Container Platform

This topic describes PostgreSQL backup strategy for Red Hat Edge Manager on Red Hat OpenShift Container Platform: recovery point objectives (RPOs), database backup scope, how to validate restores, and practices that complement backups in a cluster deployment. For more information about backing up the full server on Red Hat OpenShift Container Platform, see Additional resources.

4.1. Overview

Red Hat Edge Manager uses PostgreSQL to store mission-critical data, including device configurations, fleet management records, user profiles, and system state. To protect this data and ensure business continuity, implement backup strategies that support disaster recovery.

Define your Recovery Point Objective (RPO) based on your business requirements, compliance obligations, and risk tolerance. Typical baselines include the following:

  • Mission-critical environments: RPO on the order of 15 minutes
  • Non-production or development environments: RPO up to 24 hours
Important

For full control plane recovery on Red Hat OpenShift Container Platform, use flightctl-backup and flightctl-restore from a workstation with cluster access. For more information about full server backup and restore, see Additional resources.

You are responsible for retention, scheduling, and off-cluster storage of backup archives. Red Hat Edge Manager provides backup and restore commands and documents architectural guidance; it does not manage your backup infrastructure or remote storage targets.

4.2. Database backup scope

This topic describes the database portion of backup scope. For a single archive that includes the database, PKI Secrets, and configuration, use flightctl-backup. For more information about backing up the full server on Red Hat OpenShift Container Platform, see Additional resources.

When you use database-only tooling, back up the flightctl database, including the following:

  • Data: Devices, fleets, organizations, templates, and related records
  • Schema and structure: The database architecture and table definitions that match your installed Red Hat Edge Manager release
  • Security metadata: Database users, roles, and access permissions required for the application
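When your operations team takes database-only backups on the cluster, a logical dump can be streamed out of the database pod with kubectl exec. This sketch is hypothetical: it assumes the database runs in a Deployment named flightctl-db in the namespace you pass (for example, rhem-chart-namespace), that pg_dump exists in the image, and that the postgres role can connect inside the pod. Check the actual workload name in your namespace before you use it.

```shell
# Hypothetical helper: stream a custom-format dump of the flightctl database
# out of the cluster. Assumptions: the database runs in a Deployment named
# flightctl-db in namespace $1, the image includes pg_dump, and the postgres
# role can connect inside the pod.
backup_flightctl_db_k8s() {
  local namespace="$1" dest_file="$2" dir name
  dir=$(dirname "$dest_file")
  name=$(basename "$dest_file")

  # kubectl exec without -t keeps the binary dump intact on stdout.
  if ! kubectl exec -n "$namespace" deploy/flightctl-db -- \
      pg_dump -U postgres -d flightctl -Fc > "$dest_file"; then
    rm -f "$dest_file"
    return 1
  fi

  # Write the checksum next to the dump so sha256sum -c works from that directory.
  (cd "$dir" && sha256sum "$name" > "$name.sha256")
}
```

Roles and grants are not part of pg_dump output; if your recovery plan needs them, capture them separately (for example, with pg_dumpall --globals-only).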

4.3. Testing backups

Regularly test backup and restore procedures. Untested backups are a common source of failed recoveries.

  1. Create an isolated test environment that mirrors your production configuration closely enough to exercise a restore (same major Red Hat Edge Manager version and similar configuration).
  2. Restore from your most recent backup into the test environment using flightctl-restore with a flightctl-backup archive, or using your chosen database-only restore procedure.
  3. Verify data integrity by listing Flight Control resources (for example, devices and fleets) and comparing them to production records.
  4. Confirm that the Red Hat Edge Manager API and web console behave as expected against the restored database.
  5. Record any gaps, errors, and corrective actions in your runbooks.

4.4. Recommended practices

Follow these practices to complement your backup strategy for Red Hat Edge Manager on Red Hat OpenShift Container Platform.

  • Full server backups: Run flightctl-backup on a schedule (for example, a CronJob) and copy archives off the cluster. For more information about backing up the full server on Red Hat OpenShift Container Platform, see Additional resources.
  • Git-backed configuration: Store declarative configuration in Git. Use the Repository resource to reference configurations with gitRef in device and fleet specifications where appropriate. For more information about managing the device configuration from a Git repository on the CLI, see Additional resources.
  • Separate failure domain for Git: Host Git repositories on external services (for example, GitHub, GitLab, or Bitbucket) in a different failure domain than the cluster that runs Red Hat Edge Manager.
  • Back up deployment configuration: flightctl-backup captures Helm values and the service ConfigMap; also store values files in Git so you can redeploy consistently.
  • Version and tag changes: Tag and version configuration changes so you can roll back or redeploy predictably.

Chapter 5. Restoring the PostgreSQL database for Red Hat Edge Manager on Red Hat OpenShift Container Platform

If you lost data or need to recover the control plane on Red Hat OpenShift Container Platform and you have a flightctl-backup archive, use the full server restore procedure instead. For more information about restoring the full server on Red Hat OpenShift Container Platform, see Additional resources.

This topic covers manual PostgreSQL recovery: scaling workloads, restoring the database with your own tooling, running flightctl-restore without an archive for device preparation, and bringing services back. Exact database restore commands still depend on your backup format.

5.1. Prerequisites

The following prerequisites apply to the restore procedure for Red Hat Edge Manager on Red Hat OpenShift Container Platform.

  • Cluster access to the Kubernetes cluster that hosts the Red Hat Edge Manager deployment.
  • The OpenShift project (Kubernetes namespace) where you installed the Red Hat Edge Manager Helm chart. The examples in this topic use the placeholder rhem-chart-namespace; substitute your real namespace name everywhere it appears (production deployments often use a single namespace for all Red Hat Edge Manager workloads).
  • Kubernetes tools: kubectl installed and configured with administrative permissions.
  • Flight Control CLI: flightctl and flightctl-restore available locally or in your restore environment.
  • Backup artifacts: access to the database backup files required for recovery.
  • Optional verification tools: pg_isready and redis-cli for checking database and KV store connectivity.
Important

The database restore steps that you perform before you run flightctl-restore must match your backup strategy (for example, pg_restore, psql, or volume-level recovery). Follow your organization’s runbooks together with the outline below.
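
If your backup strategy uses custom-format dumps from pg_dump, the database restore in the procedure below might look like the following Bash sketch. It assumes the following conditions:

  • The dump was created with pg_dump -Fc.
  • The flightctl database is reachable on localhost:5432, for example through kubectl port-forward svc/flightctl-db 5432:5432.
  • The hypothetical DB_ADMIN_USER variable and the standard PGPASSWORD variable hold credentials for a PostgreSQL role that can drop and recreate the database objects.

This sketch is not a supported product procedure, so validate it against your own runbooks.

```shell
#!/usr/bin/env bash
# Sketch: restore a pg_dump custom-format archive into the flightctl database.
# Assumptions (not from the product documentation): the dump was created with
# `pg_dump -Fc`, the database listens on localhost:5432 (for example through a
# port forward), and DB_ADMIN_USER / PGPASSWORD identify a privileged role.
set -euo pipefail

restore_flightctl_db() {
  local dump_file="$1"
  local db_user="${DB_ADMIN_USER:?set DB_ADMIN_USER to a PostgreSQL admin role}"

  # Refuse to continue if the archive is missing
  [ -f "$dump_file" ] || { echo "dump not found: $dump_file" >&2; return 1; }

  # --clean --if-exists drops existing objects before recreating them;
  # --single-transaction rolls the whole restore back if any statement fails.
  pg_restore -h localhost -p 5432 -U "$db_user" -d flightctl \
    --clean --if-exists --single-transaction "$dump_file"
}
```

For plain SQL dumps you would use psql instead. With volume snapshots, the restore happens at the storage layer and pg_restore is not involved.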

5.2. Restore procedure

Use this procedure to restore the flightctl PostgreSQL database for Red Hat Edge Manager on Red Hat OpenShift Container Platform when you are not using an archive from flightctl-backup.

Note

For recovery from a flightctl-backup archive, use the full server restore procedure. For more information about restoring the full server on Red Hat OpenShift Container Platform, see Additional resources.

  1. Verify that the flightctl-restore version matches the Red Hat Edge Manager server version:

    From a workstation where flightctl is configured to reach the Red Hat Edge Manager API, run flightctl version (use flightctl login first if needed). The command reports Client Version (the CLI binary) and, when connected, Server Version (the running Red Hat Edge Manager instance). Run flightctl-restore version to report the restore binary version.

    flightctl version
    flightctl-restore version

    Compare the output: Client Version, Server Version, and the flightctl-restore version must all report the same Red Hat Edge Manager release. If Server Version is missing or any version differs, update the mismatched binaries before you continue. For more information about prerequisites for backup and restore, see Additional resources.

    Important

    Cross-version restore is not supported. The CLI client, the running server, and flightctl-restore must be on the same release before you continue.

  2. Scale down the Red Hat Edge Manager services to avoid data conflicts during the restore process. On Red Hat OpenShift Container Platform, all of these workloads run in a single OpenShift project, so use the same namespace value in every command. The placeholder rhem-chart-namespace stands for that project; replace it with yours:

    # Replace rhem-chart-namespace with your OpenShift project (one namespace for all deployments below)
    kubectl scale deployment flightctl-api --replicas=0 -n rhem-chart-namespace
    kubectl scale deployment flightctl-worker --replicas=0 -n rhem-chart-namespace
    kubectl scale deployment flightctl-periodic --replicas=0 -n rhem-chart-namespace
    kubectl scale deployment flightctl-alert-exporter --replicas=0 -n rhem-chart-namespace
    kubectl scale deployment flightctl-alertmanager-proxy --replicas=0 -n rhem-chart-namespace

    Wait for pods to terminate, then verify:

    # Same namespace as above
    kubectl get pods -n rhem-chart-namespace
  3. Restore the flightctl PostgreSQL database from your existing backup.

    After all Red Hat Edge Manager services are scaled down, restore the database using the method that matches your backup strategy.

    • Target database: Restore the PostgreSQL database instance named flightctl.
    • Supported methods: Use your preferred recovery procedure, such as pg_restore, psql (for SQL dumps), or infrastructure-level volume snapshots.
    • Verification: Before you proceed, confirm that the database is accessible and that the restored data is complete and consistent.
    Important

    The specific restoration commands depend on your backup strategy and tooling. Ensure the database is fully restored and consistent before you continue.

  4. Retrieve database and KV store credentials (same namespace as the scale commands):

    DB_APP_PASSWORD=$(kubectl get secret flightctl-db-app-secret -n rhem-chart-namespace -o jsonpath='{.data.userPassword}' | base64 -d)
    KV_PASSWORD=$(kubectl get secret flightctl-kv-secret -n rhem-chart-namespace -o jsonpath='{.data.password}' | base64 -d)
    
    # Confirm that both values are non-empty without printing the secrets
    [ -n "$DB_APP_PASSWORD" ] && [ -n "$KV_PASSWORD" ] && echo "Database and KV store credentials retrieved successfully"
  5. Set up port forwarding for the database and the KV store. Use separate terminal sessions for each port forward, or run them in the background.

    Forward the database service:

    # Forward database port (run in a separate terminal or in the background)
    kubectl port-forward svc/flightctl-db 5432:5432 -n rhem-chart-namespace &
    DB_PORT_FORWARD_PID=$!
    
    # Wait briefly for the forward to start, then verify database connectivity (requires pg_isready)
    sleep 2
    pg_isready -h localhost -p 5432

    Forward the KV store service:

    # Forward KV store port (run in a separate terminal or in the background)
    kubectl port-forward svc/flightctl-kv 6379:6379 -n rhem-chart-namespace &
    KV_PORT_FORWARD_PID=$!
    
    # Wait briefly for the forward to start, then verify KV store connectivity (requires redis-cli)
    sleep 2
    REDISCLI_AUTH="$KV_PASSWORD" redis-cli -h localhost -p 6379 ping
  6. Run the restore command using environment variables for the database and KV store passwords:

    DB_PASSWORD="$DB_APP_PASSWORD" KV_PASSWORD="$KV_PASSWORD" flightctl-restore

    Monitor the command output until it completes successfully. If the command fails or logs per-device KV errors, keep the workloads scaled to zero. Do not stop the port forwards or scale services back up until you have followed the guidance for recovering from a failed or partial restore. For more information about recovering from a failed or partial restore, see Additional resources.

  7. Stop the port-forward processes when the restore finishes:

    kill $DB_PORT_FORWARD_PID $KV_PORT_FORWARD_PID

    If you ran the port forwards in separate terminals instead, stop them with Ctrl+C in those terminals.

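  8. Restart the Red Hat Edge Manager services by scaling the deployments back up in the same OpenShift project as in step 2. The following commands set one replica per deployment. If a deployment ran more replicas before you scaled it down, use that count instead: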

    # Replace rhem-chart-namespace with your OpenShift project (same single namespace for every command)
    kubectl scale deployment flightctl-api --replicas=1 -n rhem-chart-namespace
    kubectl scale deployment flightctl-worker --replicas=1 -n rhem-chart-namespace
    kubectl scale deployment flightctl-periodic --replicas=1 -n rhem-chart-namespace
    kubectl scale deployment flightctl-alert-exporter --replicas=1 -n rhem-chart-namespace
    kubectl scale deployment flightctl-alertmanager-proxy --replicas=1 -n rhem-chart-namespace
    kubectl get deployments -n rhem-chart-namespace
    kubectl get pods -n rhem-chart-namespace
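
Step 8 scales every deployment back to one replica, which might not match your original configuration. The following Bash sketch uses the deployment names from the procedure above. It records each deployment's spec.replicas value before scaling down and reuses the recorded values when scaling back up. The NAMESPACE and STATE_FILE variables and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: record replica counts before scaling Red Hat Edge Manager deployments
# to zero, then scale them back to the recorded counts after the restore.
# NAMESPACE, STATE_FILE, and the function names are hypothetical placeholders.
set -euo pipefail

NAMESPACE="${NAMESPACE:-rhem-chart-namespace}"
STATE_FILE="${STATE_FILE:-./rhem-replicas.txt}"
DEPLOYMENTS="flightctl-api flightctl-worker flightctl-periodic flightctl-alert-exporter flightctl-alertmanager-proxy"

save_and_scale_down() {
  local d replicas
  : > "$STATE_FILE"   # start with an empty state file
  for d in $DEPLOYMENTS; do
    # Read the desired replica count from the deployment spec
    replicas=$(kubectl get deployment "$d" -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')
    echo "$d $replicas" >> "$STATE_FILE"
    kubectl scale deployment "$d" --replicas=0 -n "$NAMESPACE"
  done
}

scale_back_up() {
  local d replicas
  # Restore each deployment to the count recorded by save_and_scale_down
  while read -r d replicas; do
    kubectl scale deployment "$d" --replicas="$replicas" -n "$NAMESPACE"
  done < "$STATE_FILE"
}
```

Run save_and_scale_down once in place of the step 2 commands, and run scale_back_up in place of the step 8 commands. If you run save_and_scale_down a second time, it overwrites the file with zero replica counts. Keep the state file until the services are back up.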

5.3. After you restore

When the restore finishes and Red Hat Edge Manager workloads are healthy again in your Red Hat OpenShift Container Platform namespace, validate the control plane and plan for device reconciliation. Restoring the database changes what the service knows about devices; edge devices must reconnect and compare their live state to the restored specifications.

Operational follow-up

  • Re-run any validation steps from the Testing backups section if you need a structured checklist. For more information about testing backups, see Additional resources.
  • Document deviations, incidents, and command variants in your runbooks so the next restore follows the same path.

5.4. Post-restore device status changes

After a successful restore, devices move through automatic status transitions while they reconnect and reconcile with the restored control plane data.

AwaitingReconnect
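Devices are placed in AwaitingReconnect first, except for the zero-specification-version case described under Enrollment requests and post-restore approval. The service waits for each device to report its current state again. Spec reconciliation for those devices remains paused until they reconnect.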
Enrollment requests and post-restore approval

Devices whose enrollment requests were approved after the backup was taken do not exist in the restored data and must be approved again. After restore:

  • Devices created from a restored enrollment request are placed in AwaitingReconnect and follow the normal AwaitingReconnect behavior.
  • Devices that had no enrollment request at backup time and have a non-zero deployed specification version are placed in AwaitingReconnect and follow the normal AwaitingReconnect behavior.
  • Devices that had no enrollment request at backup time and have a specification version of zero move to normal status.
ConflictPaused
After a device reconnects and reports its current state, the service compares the specification stored in the restored backup with the device-reported version. If the restored backup specification is older (for example, the device had moved forward while backups lagged), the device can enter ConflictPaused. Rendering of new specifications stops for that device until an operator resolves the mismatch. Human review is required before you force configuration forward.
Normal operation
When the restored specification and the device-reported state are compatible, the device returns to normal operational statuses (for example, online or updating) and usual reconciliation resumes.
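
The following Bash functions model the status rules above for illustration only. They are not the Red Hat Edge Manager implementation. They also simplify ConflictPaused to a plain version comparison, whereas the service only states that a device can enter that status. The function names and the yes/no and integer arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative model only: encodes the post-restore status rules described in
# this section. It is not the product implementation; names are hypothetical.
set -euo pipefail

# Status immediately after restore.
# $1: "yes" if the device was created from a restored enrollment request
# $2: deployed specification version recorded in the backup
initial_status() {
  local from_enrollment_request="$1" spec_version="$2"
  if [ "$from_enrollment_request" = "yes" ] || [ "$spec_version" -gt 0 ]; then
    echo "AwaitingReconnect"
  else
    echo "normal"
  fi
}

# Status after an AwaitingReconnect device reconnects.
# $1: specification version in the restored backup
# $2: specification version reported by the device
status_after_reconnect() {
  local restored_version="$1" reported_version="$2"
  if [ "$restored_version" -lt "$reported_version" ]; then
    echo "ConflictPaused"   # restored spec is older than what the device reports
  else
    echo "normal"
  fi
}
```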

Monitor device status

Use the flightctl CLI to see which devices need attention:

flightctl get devices
flightctl get devices --field-selector=status.summary.status=AwaitingReconnect
flightctl get devices --field-selector=status.summary.status=ConflictPaused
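
To get a quick count instead of full listings, you can wrap the same field selectors in a loop. This Bash sketch assumes that flightctl get supports -o name and prints one device per line. If your flightctl version does not, count the rows of the default table output instead. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print how many devices are in each post-restore status.
# Assumption: `flightctl get devices -o name` prints one device per line.
set -euo pipefail

count_devices_by_status() {
  local status count
  for status in AwaitingReconnect ConflictPaused; do
    # awk counts non-empty lines and always exits 0, even when none match
    count=$(flightctl get devices --field-selector="status.summary.status=$status" -o name |
      awk 'NF { n++ } END { print n + 0 }')
    echo "$status: $count"
  done
}
```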

Resolve ConflictPaused devices

  1. Review the specification source: if the device belongs to a fleet, inspect the fleet template and selector; if not, inspect the device spec directly. Review labels and ownership to confirm how the restored specification applies to the device.
  2. When you are confident the restored specification is what you want, resume the device or a group of devices. Replace example-device with your device resource name and adjust selectors to match your environment:

    flightctl resume device example-device
    flightctl resume device --selector="environment=production"

    To resume many devices in bulk, you can also use other flightctl resume device options that your flightctl version supports, such as field selectors.
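
After you review the affected devices, you can use a Bash sketch like the following to resume ConflictPaused devices one at a time by name. This approach keeps each resume visible in your terminal history. By default, the sketch runs in dry-run mode and only prints the commands. Set the hypothetical APPLY variable to yes to run them. The sketch assumes that flightctl get devices -o name prints one device per line, optionally prefixed with device/.

```shell
#!/usr/bin/env bash
# Sketch: resume every device currently in ConflictPaused, one at a time.
# Dry run by default; set APPLY=yes to execute. APPLY and the function name are
# hypothetical. Assumption: `-o name` prints one device per line.
set -euo pipefail

resume_conflict_paused() {
  local name
  flightctl get devices --field-selector=status.summary.status=ConflictPaused -o name |
  while read -r name; do
    [ -n "$name" ] || continue
    name="${name#*/}"   # drop an optional "device/" prefix
    if [ "${APPLY:-no}" = "yes" ]; then
      flightctl resume device "$name"
    else
      echo "flightctl resume device $name"
    fi
  done
}
```

Resuming by name rather than by label selector limits the action to devices that were in ConflictPaused when you listed them.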

Legal Notice

Copyright © Red Hat.
Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license . If you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, LLC. or its subsidiaries in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.
The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.
All other trademarks are the property of their respective owners.