UFC protocol in JGroups communication and its importance for async cross-site replication
Environment
- Red Hat JBoss Enterprise Application Platform (EAP)
- 7.x
- Red Hat Data Grid (RHDG)
- 7.x/8.x
Issue
- What is the UFC protocol? Is UFC important?
- Can its absence cause a crash?
- When using max-idle, outgoing and incoming messages are exchanged for every read, and the outgoing messages block in flow control.
Resolution
What is UFC?
Unicast Flow Control (UFC) is a simple flow control protocol based on a credit system, where each sender has a number of credits (bytes to send). When the credits have been exhausted, the sender blocks. Each receiver also keeps track of how many credits it has received from a sender. When credits for a sender fall below a threshold, the receiver sends more credits to the sender. UFC applies to unicast messages; MFC is its counterpart for multicast messages.
In other words, UFC limits the amount of data that can be sent to another node before the sender gets an acknowledgement that the other node has processed it. This stops a node from getting too backed up with network traffic.
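The credit exchange described above can be illustrated with a small toy model. This is only a sketch of the idea, not JGroups code: the class names `CreditSender` and `CreditReceiver` are hypothetical, and the parameters `max_credits` and `min_threshold` merely borrow the names of the corresponding configuration attributes.

```python
class CreditSender:
    """Toy model of one sender's credits toward one receiver (not JGroups code)."""

    def __init__(self, max_credits):
        self.max_credits = max_credits
        self.credits = max_credits

    def try_send(self, size):
        """Return True if the message fits in the remaining credits.

        False means the real protocol would block the sending thread here.
        """
        if size > self.credits:
            return False
        self.credits -= size
        return True

    def replenish(self, amount):
        """Add credits granted by the receiver, capped at max_credits."""
        self.credits = min(self.max_credits, self.credits + amount)


class CreditReceiver:
    """Toy model of the receiver-side bookkeeping for one sender."""

    def __init__(self, max_credits, min_threshold):
        self.max_credits = max_credits
        # Threshold expressed as a fraction of max_credits (assumption of this sketch).
        self.min_credits = round(max_credits * min_threshold)
        self.remaining = max_credits

    def on_receive(self, size):
        """Account for processed bytes; return the credits to grant back (0 if none)."""
        self.remaining -= size
        if self.remaining <= self.min_credits:
            grant = self.max_credits - self.remaining
            self.remaining = self.max_credits
            return grant
        return 0
```

With `max_credits=1000` and 300-byte messages, the fourth `try_send` is refused until the receiver has processed enough data to grant credits back, which is the point at which a real sender would be released.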
UFC vs async replication
With async replication (for async cross-site scenarios), there is no implicit rate limiting on the replication traffic. If another node is not processing the replications fast enough (for any reason, such as GC operations), then without any type of flow control the sender can just keep sending more and cause the other node to get backed up. With flow control, the sender is slowed down to let the other node catch up.
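The effect can be seen in a very rough simulation. The sketch below is an illustration under simplifying assumptions, not a model of JGroups internals: `simulate_backlog` is a hypothetical helper, time advances in discrete ticks, and `max_credits` is treated as a cap on the number of unprocessed messages.

```python
def simulate_backlog(ticks, send_rate, drain_rate, max_credits=None):
    """Return the receiver backlog after each tick.

    send_rate   -- messages the sender wants to send per tick
    drain_rate  -- messages the receiver can process per tick
    max_credits -- if set, the sender may not have more than this many
                   unprocessed messages outstanding (toy flow control)
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        to_send = send_rate
        if max_credits is not None:
            # The sender is throttled once its credits are used up.
            to_send = min(send_rate, max_credits - backlog)
        backlog += to_send
        backlog -= min(backlog, drain_rate)
        history.append(backlog)
    return history
```

Calling `simulate_backlog(5, 3, 1)` gives a backlog that grows on every tick, while `simulate_backlog(5, 3, 1, max_credits=4)` levels off because the sender is held back to the receiver's pace.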
UFC vs TCP protocols
TCP also has built-in flow control, but blocking at that layer also blocks JGroups control messages such as acknowledgements and retransmissions, whereas UFC blocks only application (Infinispan) messages and not JGroups control messages. Blocked JGroups control messages can then lead to excessive retransmissions, making a problem worse (in case of overflow). In a table:
| Protocol | Effect |
|---|---|
| TCP | TCP flow control blocks everything on that TCP connection (and more, i.e. it blocks the JGroups sending thread) |
| UFC | UFC blocks only application (Infinispan) messages, not JGroups control messages. It blocks messages from the protocols above it in the stack |
Example
UFC blocks messages above it in the stack (below it in the config), which are basically just application messages.
And it does not block the things below it in the stack (above in the config) like FD, UNICAST, NAKACK, GMS, etc.
In the example below:
<stack name="udp">
    <transport type="UDP" socket-binding="jgroups-udp">
        <property name="ip_mcast">false</property>
    </transport>
    <protocol type="org.jgroups.protocols.TCPPING">
        <property name="initial_hosts">127.1.1.1[7600]</property>
        <property name="port_range">0</property>
    </protocol>
    <protocol type="MERGE3"/>
    <socket-protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
    <protocol type="FD_ALL"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2"/>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>   <!-- not blocked -->
    <protocol type="pbcast.GMS"/>      <!-- not blocked -->
    <protocol type="UFC"/>             <!-- UFC here -->
    <protocol type="MFC"/>             <!-- blocked -->
    <protocol type="FRAG3"/>           <!-- blocked -->
</stack>
UFC will block messages coming from MFC and FRAG3 (above it in the stack).
It will not block messages from the protocols below it in the stack: FD, UNICAST, NAKACK, GMS, etc.
UFC recommendation
UFC is always recommended, but is particularly important with asynchronous replication.
Can the absence of UFC cause a crash?
Yes. Without UFC, async messages can overflow JGroups. In an async scenario the sender just keeps sending even if the other end is not reading fast enough (for any reason). If messages are sent faster than they are read for long enough, they back up in UNICAST3, and this overflow can end in a crash.
JBoss EAP is missing UFC; how can it be added?
The UFC protocol can be added with the jboss-cli command: /subsystem=jgroups/stack=tcp/protocol=UFC:add(add-index=10) (10 is the index at which the protocol is inserted; the right value depends on where UFC belongs in the stack). See options here.
UFC absent in DG Operator 8.2.x when using cross-site (jgroups-relay.xml)
DG Operator 8.2.x does not set the UFC protocol, which is a mistake, nor does it allow a custom jgroups-relay.xml. Operator versions prior to 8.3.0 generate the server configuration files (infinispan.xml, jgroups-relay.xml, log4j.xml) and, by an oversight, the generated configuration does not include UFC. The recommendation is to use the latest DG Operator 8.3.x; 8.3.7 has the ability to use a custom jgroups-relay.xml configuration.
What is UFC_NB?
UFC_NB is the non-blocking variant of UFC. It is used together with MFC_NB, and the two replace their blocking counterparts. Instead of blocking sender threads, the non-blocking flow control protocols queue messages when not enough credits are available to send them, allowing the sender threads to return immediately.
Using MFC_NB/UFC_NB with a multicast transport (UDP) or with TCP_NIO2, which also never blocks, provides a completely non-blocking stack. More details can be found in JDG jgroups non-blocking protocols UFC_NB/MFC_NB.
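The difference from the blocking variant can be sketched as follows. This is a toy model only: `NonBlockingCreditSender` is a hypothetical class, and reporting a full queue as "queue-full" is an assumption of the sketch; how the real protocol behaves when `max_queue_size` is reached should be checked in the JGroups documentation.

```python
from collections import deque


class NonBlockingCreditSender:
    """Toy model: queue messages instead of blocking when credits run out."""

    def __init__(self, max_credits, max_queue_size):
        self.max_credits = max_credits
        self.credits = max_credits
        self.max_queue_size = max_queue_size
        self.queue = deque()    # sizes of messages waiting for credits
        self.queued_bytes = 0
        self.sent = []          # sizes of messages handed to the transport

    def send(self, size):
        """Never blocks: returns 'sent', 'queued' or 'queue-full' immediately."""
        # Keep ordering: once something is queued, later messages queue behind it.
        if not self.queue and size <= self.credits:
            self.credits -= size
            self.sent.append(size)
            return "sent"
        if self.queued_bytes + size <= self.max_queue_size:
            self.queue.append(size)
            self.queued_bytes += size
            return "queued"
        return "queue-full"

    def replenish(self, amount):
        """Add credits from the receiver, then drain as much of the queue as fits."""
        self.credits = min(self.max_credits, self.credits + amount)
        while self.queue and self.queue[0] <= self.credits:
            size = self.queue.popleft()
            self.queued_bytes -= size
            self.credits -= size
            self.sent.append(size)
```

The caller of `send` always gets control back at once; queued messages go out later, when `replenish` is called with new credits.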
Default UFC_NB/MFC_NB values:
For example, JDG 7 has the following default properties for UFC_NB and MFC_NB:
./jdg-7.3.0-src/server/integration/jgroups/src/main/resources/jgroups-defaults.xml
...
<UFC_NB
    max_credits="2m"
    max_queue_size="2m"
    min_threshold="0.40"/>
<MFC_NB
    max_credits="2m"
    max_queue_size="2m"
    min_threshold="0.40"/>
Expiration max-idle property and UFC flow control
When expiration via max-idle is used, every read has to send outgoing messages to the other nodes. Those outgoing messages can block in flow control, and in doing so they can also block the incoming thread that is handling the request. Under load this can use up all the threads, at which point the nodes start deadlocking, with their threads all stuck waiting on each other.
How to have a full non-blocking stack for sending messages?
Use a stack composed of MFC_NB and UFC_NB on top of either a multicast transport (UDP) or TCP_NIO2. This provides a completely non-blocking stack.
Comparing TCP with TCP_NIO2, the latter uses non-blocking I/O (NIO), which eliminates the thread-per-connection model. Instead, TCP_NIO2 uses a single selector to poll for incoming messages and dispatches the handling of those to a (configurable) thread pool.
Diagnostic Steps
- Verify that the JGroups stack contains the UFC protocol, between GMS and MFC:
<protocol type="pbcast.GMS"/>
<protocol type="UFC"/> <!-- here -->
<protocol type="MFC"/>
- When using the DG Operator 8.2.x, confirm that UFC is not present in jgroups-relay.xml:
oc exec -it $podname -- ls /opt/infinispan/server/conf/
admin groups.properties infinispan-xsite.xml infinispan.xml jgroups-relay.xml log4j2.xml users.properties
cat jgroups-relay.xml | grep -i ufc
<--- absent
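The checks above can also be scripted. The following is a minimal sketch, assuming the stack is available as a standalone XML fragment whose direct children are the protocols, either in the `<protocol type="..."/>` style shown above or with one element per protocol. The helper names `protocol_names` and `check_ufc` and the result strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def protocol_names(stack_xml):
    """Return the short protocol names of a stack fragment, in file order."""
    root = ET.fromstring(stack_xml)
    names = []
    for child in root:
        tag = child.tag.rsplit("}", 1)[-1]       # drop an XML namespace, if any
        name = child.get("type", tag)            # EAP style uses the type attribute
        names.append(name.rsplit(".", 1)[-1])    # "pbcast.GMS" -> "GMS"
    return names


def check_ufc(stack_xml):
    """Return 'missing', 'check-order' or 'ok' for UFC/UFC_NB in the stack."""
    names = protocol_names(stack_xml)
    ufc = [i for i, n in enumerate(names) if n in ("UFC", "UFC_NB")]
    if not ufc:
        return "missing"
    idx = ufc[0]
    gms = [i for i, n in enumerate(names) if n == "GMS"]
    mfc = [i for i, n in enumerate(names) if n in ("MFC", "MFC_NB")]
    # Order from the diagnostic step above: GMS, then UFC, then MFC.
    if (gms and idx < gms[0]) or (mfc and idx > mfc[0]):
        return "check-order"
    return "ok"
```

A result of "missing" corresponds to the empty grep output shown above; "check-order" only means the position differs from the GMS, UFC, MFC order used in this article and should be reviewed by hand.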
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.