JGroups cluster transport configuration for Data Grid Server 8.x


Environment

  • Red Hat Data Grid (RHDG)
    • 8

Issue

  • How to configure JGroups ports for Data Grid Server without using infinispan.xml?
  • Which JGroups transport configuration is recommended?
  • What discovery protocol is recommended?
  • What ports are used for communication between nodes in a cluster?
  • Is it possible to cluster Data Grid instances running on two different hosts?

Resolution

Red Hat Data Grid is mainly used in clustered configurations that balance client requests to distribute load and avoid excessive memory resource consumption from any one instance.

RHDG uses JGroups technology as the underlying framework that provides clustering capabilities. JGroups provides mechanisms for cluster discovery and transport and also offers features such as failure detection through the use of additional network sockets.

This article is intended to provide information that helps you understand how to tailor JGroups settings for different network environments and successfully set up RHDG clustering.

RHDG defaults

RHDG provides default JGroups stack configurations as part of the Data Grid distribution.

The default stacks provide basic functionality that should work out of the box in most cases. The default TCP and UDP stacks use multicast and full port ranges for all cluster members.

If it is possible to use UDP with multicast, the udp stack is recommended because it scales better: messages are sent one-to-many.

If it is not possible to use UDP or multicast, you can use the tcp stack. However, this can negatively affect performance because messages are sent one-to-one, causing more network traffic and more processing on the nodes.

Change the JGroups stack that the cluster uses with the -j <stack> or --cluster-stack=<stack> argument when starting the server, or set the stack attribute of the <transport> element in the configuration (bound to the infinispan.cluster.stack property by default).
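As a sketch, assuming a Linux installation where the launch script is bin/server.sh, selecting the stack at startup can look like this:

```shell
# Start the server with the UDP stack (short option)
bin/server.sh -j udp

# Equivalent long form
bin/server.sh --cluster-stack=udp
```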

Hint: The -b option sets the IP address of the server endpoints (Hot Rod, HTTP, and so on) and does not affect the IP address used for internal JGroups cluster communication. That address is selected according to the JGroups configuration, by default SITE_LOCAL; see [JGroups Transport](http://jgroups.org/manual4/index.html#Transport "JGroups 4.x documentation").
The server start parameter -k or --cluster-address sets the cluster IP address if the configuration contains the jgroups.bind.address expression; otherwise the address must be set directly in the configuration.

In production, the recommended approach is to use the start options or properties to apply static settings, so that the IP addresses and ports in use are known.
The settings should be fixed: use the individual properties to set explicit port numbers and override port_range with "0" so that no unexpected variations can occur.
The port offset provided by -o, which adds an offset to the ports in use so that several instances can run on the same machine, is not recommended for production because of its opaque behavior.
Hint: Up to 8.1.1 the offset affects only the RHDG endpoint port (infinispan.bind.port); from 8.1.1 onwards it is added to jgroups.bind.port as well.
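A minimal sketch of such a production start command, assuming a Linux installation with bin/server.sh; the IP address 192.0.2.10 is a placeholder for the host's real address:

```shell
# Fix the JGroups bind address and port so they are known in advance;
# 192.0.2.10 is a placeholder for the host's real address.
bin/server.sh \
  -Djgroups.bind.address=192.0.2.10 \
  -Djgroups.bind.port=7800
```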

Important properties for configuration
  • infinispan.cluster.stack
    The default is tcp. It can be set with the start option -j <stack> or --cluster-stack=<stack>.

  • infinispan.bind.address
    Use this to set a fixed IP address for the server endpoints.

  • infinispan.bind.port
    Use this to change the public server port (default 11222) if necessary.

  • jgroups.bind.address
    The interface JGroups binds to for the udp and tcp stacks (the cloud discovery protocols use it as well, since they run over TCP).
    It defaults to SITE_LOCAL for the transport, but uses 127.0.0.1 for MPING.

  • jgroups.bind.port
    The port JGroups binds to for the udp and tcp stacks (the cloud discovery protocols use it as well, since they run over TCP).
    For the tcp transport the default is 7800.
    For udp it is set to 0, which means a random port is chosen.
    port_range is not set for UDP and TCP and defaults to a value greater than 0, so restricting it requires a configuration change.

  • jgroups.mcast_addr
    Multicast address for discovery used by udp PING and tcp MPING

  • jgroups.mcast_port
    Multicast port for discovery used by udp PING and tcp MPING

  • jgroups.join.timeout
    Timeout for membership detection when joining the cluster

For a simple start command there is a new option: from RHDG 8.1.1 onwards the option -P or --properties=<file> for the server start command can be used to provide all the settings in a single configuration file.
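As a sketch (the file name and all values below are placeholders), the settings from the list above can be collected in a properties file and passed with -P:

```shell
# Collect the settings in a single file (property names from the list above).
cat > rhdg.properties <<'EOF'
infinispan.cluster.stack=tcp
infinispan.bind.address=192.0.2.10
infinispan.bind.port=11222
jgroups.bind.address=192.0.2.10
jgroups.bind.port=7800
EOF

# RHDG 8.1.1+: load all settings from the file at startup.
bin/server.sh -P rhdg.properties
```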

Cluster node detection

Different membership discovery protocols are supported, especially for cloud environments. See the Default JGroups Stacks, Cluster Discovery, and Running Data Grid on OpenShift documentation.

With the recommended UDP configuration, JGroups uses UDP multicast with the <PING> protocol to detect other nodes in the same subnet; if a wider network is used or the network is restricted, multicast availability needs to be checked.
It is highly recommended to check the multicast addresses, or to set -Djgroups.mcast_addr= and/or -Djgroups.mcast_port= to a different and unique value.
This prevents problems if other (default) instances started in the same network join the cluster by accident! Consider that any application using JGroups, such as EAP or RHDG, might connect. Different JGroups versions connecting to each other can produce confusing warning or error messages, since protocol compatibility is not guaranteed. At a minimum, a warning that messages from unknown members are discarded should be seen.
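For example (the multicast address and port below are placeholders; pick values that are unique within your network):

```shell
# Give this cluster its own multicast address and port so that
# unrelated JGroups applications cannot join by accident.
bin/server.sh \
  -Djgroups.mcast_addr=239.255.100.100 \
  -Djgroups.mcast_port=46655
```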

The default TCP configuration uses MPING, which relies on multicast as well, and therefore has the same restrictions as the UDP transport.
To switch to static discovery, use <TCPPING>, which does not use multicast but needs a static configuration.
The following configuration for infinispan.xml extends the default TCP stack and changes the discovery protocol:

<infinispan …>
   <jgroups>
    <stack name="tcp-ping" extends="tcp">
      <TCPPING stack.combine="REPLACE" stack.position="MPING" initial_hosts="${jgroups.bind.address:127.0.0.1}[7800],otherIP[7800]" port_range="0" />
    </stack>
   </jgroups>
   <cache-container name="default" statistics="true">
      <transport cluster="${infinispan.cluster.name}" stack="tcp-ping" node-name="${infinispan.node.name:}"/>
   </cache-container>
</infinispan>

Important Note: When using extends="tcp", it is important to use the parameters stack.combine="REPLACE" and stack.position="MPING" so that TCPPING is used instead of MPING.
Without them, MPING from the default tcp stack is used. Refer to the [documentation](https://access.redhat.com/documentation/en-us/red_hat_data_grid/8.3/html-single/data_grid_server_guide/index#customizing-jgroups-stacks_cluster-transport).

Note that if the bind port is already used by another process, JGroups probes and increases the port number, unless this is suppressed by port_range. This allows starting multiple instances of RHDG on the same machine.
The TCPPING initial_hosts list should include all cluster members, including the local one, so that the list is the same for all instances and detection works reliably.
If the addresses are all fixed and known, set port_range to "0" so that other instances are not probed unnecessarily.

Hint: Up to 8.1.1 the -o start option that adds a port offset is not applied to JGroups. The initial port remains, or is increased by 1 if it is in use, because the default port_range is not "0".
From 8.1.1 onwards the port offset is applied and added to the bind port if the jgroups.bind.port expression is used. In this case, set port_range to "0" to prevent an unexpected change of the port and, as a consequence, unexpected discovery results.
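As a sketch of the 8.1.1+ behavior, starting a second instance on the same host with a port offset:

```shell
# Second instance on the same host with an offset of 100:
# the endpoint listens on 11322 (11222 + 100) and, from 8.1.1 onwards,
# JGroups binds to 7900 (7800 + 100) when the jgroups.bind.port
# expression is used in the stack.
bin/server.sh -o 100
```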

Use JGroups static port or autoincrement for 8.1.1+

To remove the port offset and use the automatic JGroups increment, set the transport bind_port to a hard-coded value instead of the ${jgroups.bind.port} expression. The -o parameter then has no effect on it, and JGroups increments the value by 1 if port_range allows it.

Example for a static port setting

<jgroups>
    <stack name="tcp2" extends="tcp">
        <TCP stack.combine="COMBINE" bind_port="7800" port_range="0"/>
    </stack>
 </jgroups>

If multiple instances are started with port_range="0", or the port_range is too low, for the same JGroups IP address, the server will not start and fails with a FATAL message:

Caused by: java.net.BindException: no port available in range [7800 .. 7800] (bind_addr=/<some IP>)

Other used ports

The FD_SOCK protocol also uses a port, to detect node failure quickly.
Note that by default it uses a random port, which might be affected by firewall restrictions. To restrict the port to a known number, start_port and port_range can be used to limit the ports in use.

<infinispan …>
  <jgroups>
    ..
    <stack name="changedFDSOCK" extends="udp">
      <FD_SOCK start_port="50000" port_range="0" />
    </stack>
  </jgroups>
  ...

With the configuration above, the FD_SOCK protocol is completely replaced in the default udp stack at the same position.

There is a request to use a fixed start port and to let the -o option take effect for the FD_SOCK port: ISPN-12531

For more information, see the Configure Data Grid Clustering documentation.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.