Multi-Site Architecture for Red Hat Gluster Storage
Multi-Site Disaster Recovery Clusters
- A multi-site cluster established for disaster recovery comprises two or more completely separate clusters
- These clusters typically have the same configuration, with one active and the other(s) passive
- If the primary site fails, the secondary site is manually activated and takes over all services
- The secondary site is an asynchronous replica of the primary site
- Multi-site clusters are generally supported without any special considerations, because the implementation consists of two separate clusters with the same configuration/architecture at two physical locations
Stretch Clusters
- Stretch clusters are designed to withstand the loss or failure of all members at a given physical site. This can be a challenge for a number of reasons:
- A large percentage of cluster members might be lost simultaneously
- Loss of connectivity to all members at a given site might be more likely because site-to-site network and storage connectivity is often less redundant, more expensive, and less reliable than single-site connectivity
- Some method of multi-site storage replication is required so that clustered services data is still available after site loss
- Stretch cluster replicas are always in sync with minimal latency
For the purposes of this document, a stretch cluster is one that comprises a single infrastructure and membership spanning all sites. Membership of the cluster is logically divided into two groups so that cluster services can continue with minimal disruption when an entire group fails or becomes unreachable. Data is replicated via software replication mechanisms so that each group has access to a replica. The groups are typically, but not necessarily, at different physical locations, often with reduced inter-site connectivity and increased latency compared to a single site.
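Whether services can keep running after an entire group is lost comes down to quorum arithmetic. The sketch below is a minimal illustration, assuming a strict-majority rule (quorum holds only while more than 50% of nodes are reachable), which is what the "> 50" quorum setting in the limitations of this document describes. The functions `has_quorum` and `survives_any_site_loss` and the site names are hypothetical; this is plain arithmetic, not Gluster's own quorum implementation.

```python
from __future__ import annotations

from fractions import Fraction


def has_quorum(alive: int, total: int, ratio: Fraction = Fraction(1, 2)) -> bool:
    """Return True if strictly more than `ratio` of the nodes are still reachable."""
    return Fraction(alive, total) > ratio


def survives_any_site_loss(nodes_per_site: dict[str, int]) -> dict[str, bool]:
    """For each site, report whether quorum holds after that whole site is lost."""
    total = sum(nodes_per_site.values())
    return {
        site: has_quorum(total - count, total)
        for site, count in nodes_per_site.items()
    }


if __name__ == "__main__":
    # Hypothetical layouts: two equal groups vs. three equal sites.
    print(survives_any_site_loss({"site-a": 3, "site-b": 3}))
    # {'site-a': False, 'site-b': False}
    print(survives_any_site_loss({"site-a": 2, "site-b": 2, "site-c": 2}))
    # {'site-a': True, 'site-b': True, 'site-c': True}
```

With two equal groups, losing either one leaves exactly half of the nodes, which is not a strict majority. Spreading nodes evenly over three sites lets any single site fail while a majority remains.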
The following are some examples of configurations that qualify as stretch clusters:
- Multiple connected physical chassis where no chassis has a majority of the cluster nodes
- Cluster members that are located in the same room or data center but are not all connected to the same switch within one hop
- Cluster members that are located in different physical sites connected by a physical site link
The limitations, requirements, and guidelines listed in the remainder of this document generally apply to stretch clusters.
Overview
Only certain configurations of stretch clusters can be supported by Red Hat; the specific restrictions and limitations are noted below.
Requirements
Stretch-cluster deployments should have a burn-in/testing period during which the architecture is validated in a non-production environment, but with production loads and under a variety of failure conditions, to confirm that the behavior of the cluster meets the requirements of the deployment.
Limitations
- Both physical sites must be connected by a network interconnect (for example, a site-to-site fiber interconnect) that provides LAN-like latency of 5 ms round-trip time (RTT) or less (<=5ms RTT). Higher-latency site-to-site connections are not supported. For measuring latency, see the article "How can I determine the latency of my Multi-site cluster?".
- A stretch cluster can span at most 3 physical sites, including any third site used only to host a quorum node.
- The 5 ms latency limit applies between data nodes. For a quorum node (not a data node), latency can be up to 15 ms.
- The cluster nodes must be distributed evenly across the three physical sites; each physical site must contain an equal number of cluster nodes.
- If you plan a cluster that spans only 2 sites, note that if one site goes down, you will be unable to access data from the other site, because quorum will not be met. A two-site cluster also requires a support exception to be filed.
- Quorum must always be set to > 50% or auto. Other quorum settings are not supported.
- Storage Consulting must be involved at the beginning of the architecture discussion, as stretch clusters are not an officially supported configuration.
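Once round-trip times have been measured (for example, with the method from the linked article), the latency, site-count, and distribution limits above can be checked mechanically. The following is a minimal sketch under stated assumptions: `Node`, `check_layout`, the host and site names, and the RTT values are all hypothetical; measured RTTs are supplied as input rather than collected; any link involving a quorum-only node is held to the 15 ms limit; and the even-distribution rule counts every cluster node, including quorum nodes.

```python
from __future__ import annotations

from dataclasses import dataclass

DATA_RTT_LIMIT_MS = 5.0     # data node <-> data node (<= 5 ms RTT)
QUORUM_RTT_LIMIT_MS = 15.0  # any link involving a quorum-only node (assumption)
MAX_SITES = 3


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    is_data: bool  # False for a quorum-only node


def check_layout(nodes: list[Node], rtt_ms: dict[frozenset, float]) -> list[str]:
    """Return a list of violations; an empty list means none were found.

    rtt_ms maps frozenset({name_a, name_b}) to a measured round-trip time in ms.
    """
    problems: list[str] = []
    by_name = {n.name: n for n in nodes}

    # Site-count limit.
    sites = {n.site for n in nodes}
    if len(sites) > MAX_SITES:
        problems.append(f"cluster spans {len(sites)} sites (max {MAX_SITES})")

    # Per-link latency limit, chosen by the roles of the two endpoints.
    for pair, rtt in rtt_ms.items():
        a, b = (by_name[name] for name in sorted(pair))
        limit = DATA_RTT_LIMIT_MS if a.is_data and b.is_data else QUORUM_RTT_LIMIT_MS
        if rtt > limit:  # a value equal to the limit is still acceptable
            problems.append(f"{a.name}<->{b.name}: {rtt} ms RTT exceeds {limit} ms")

    # Even-distribution rule: every site must hold the same number of nodes.
    counts: dict[str, int] = {}
    for n in nodes:
        counts[n.site] = counts.get(n.site, 0) + 1
    if len(set(counts.values())) > 1:
        problems.append(f"uneven node distribution across sites: {counts}")

    return problems


if __name__ == "__main__":
    nodes = [
        Node("gl1", "site-a", is_data=True),
        Node("gl2", "site-b", is_data=True),
        Node("q1", "site-c", is_data=False),
    ]
    rtt = {
        frozenset({"gl1", "gl2"}): 3.2,
        frozenset({"gl1", "q1"}): 12.0,
        frozenset({"gl2", "q1"}): 16.5,
    }
    print(check_layout(nodes, rtt))
    # ['gl2<->q1: 16.5 ms RTT exceeds 15.0 ms']
```

Returning a list of messages rather than stopping at the first problem lets a single run report every violation in a proposed layout.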