EJB/Remoting XA transaction across multiple servers can leave unfinished transactions if JVM/Network crashes in 2PC prepare
Environment
- Red Hat JBoss Enterprise Application Platform (EAP) 7.2
  - Update 8
  - Update 7
Issue
- EJB/Remoting XA transaction across multiple servers can leave unfinished transactions if JVM/Network crashes in 2PC prepare
Issue description: an EJB remote server-to-server sub-transaction may remain locked and never be rolled back in the case of a network failure or a JVM crash.
Details: When server A calls server B using JBoss Remoting EJB calls (this does not happen with EJB2 IIOP communication), the EJB remoting layer behaves as an XAResource from the transaction manager's perspective. To allow correct transaction recovery after a failure, EJB remoting stores a persistent record on server A (the side initiating the remote call). If a system failure occurs (e.g. an intermittent network failure or a JVM crash), the rollback of the participant on the remote server (server B) cannot be completed, because EJB remoting erases the persistent record before the prepare on server B has been processed.
The system failure has to happen during the first phase of the 2PC protocol, when the participants on server B are prepared: they hold their locks while waiting for the final decision from the transaction manager. If some other participant fails to prepare, the final outcome is rollback, and the transaction manager tries to roll back the participants on server B. If the system failure occurs at this particular moment, the participants on server B may never be rolled back.
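The failure window can be illustrated with a toy model. This is a minimal sketch, not EAP or Narayana code: the class `TwoPhaseWindowSketch`, its `RemoteBranch` type, and the in-memory list standing in for server A's persistent record are all hypothetical. It only shows why the early erasure matters: periodic recovery retries the branches that are still recorded, so if the record for server B is gone when the rollback call is lost, the branch on server B stays prepared and keeps its locks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not EAP/Narayana code: server A coordinates 2PC and
// reaches server B through an EJB remoting "XAResource".
final class TwoPhaseWindowSketch {

    enum State { ACTIVE, PREPARED, ROLLED_BACK }

    /** Stand-in for the participants on server B. */
    static final class RemoteBranch {
        State state = State.ACTIVE;

        void prepare() { state = State.PREPARED; }      // locks are taken and held
        void rollback() { state = State.ROLLED_BACK; }  // locks are released
    }

    /**
     * Plays through the network-failure scenario and returns B's state after recovery.
     *
     * @param recordSurvives true if A still holds its persistent record for B
     *                       when recovery runs; false models the defect
     */
    static State run(boolean recordSurvives) {
        RemoteBranch serverB = new RemoteBranch();
        // Server A's persistent recovery records (in memory here, on disk in reality).
        List<RemoteBranch> recoveryLog = new ArrayList<>(List.of(serverB));

        // Phase 1: B prepares and keeps its locks, but the reply to A is lost.
        serverB.prepare();
        if (!recordSurvives) {
            recoveryLog.remove(serverB); // defect: record erased too early
        }

        // A only sees a network exception for B's prepare, so the outcome is rollback.
        // Phase 2: the network is down, so A's rollback call never reaches B.

        // Network restored: periodic recovery retries only the branches still recorded.
        for (RemoteBranch branch : recoveryLog) {
            branch.rollback();
        }
        return serverB.state;
    }

    public static void main(String[] args) {
        System.out.println("record erased:   " + run(false)); // PREPARED: locks held indefinitely
        System.out.println("record retained: " + run(true));  // ROLLED_BACK by periodic recovery
    }
}
```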
Some possible scenarios in which this may happen start the same way: server A calls server B, all business activity succeeds, and the EJB method finishes. The EJB method worked with two resources on server A: an insert into a database and a call to server B (which behaves as an XAResource). The transaction manager then starts 2PC.
- Server A calls prepare on server B, all participants/resources on server B prepare, and then a network error occurs. The response reporting the successful prepare is lost and server A receives only a network exception. The transaction manager decides to roll back the whole transaction. The DB is rolled back, but because the network is down, the abort on server B fails. Periodic recovery should then roll back the prepared resources on server B, but because of this issue that may never happen.
- Server A calls prepare on server B, all participants on server B prepare, and success is returned to server A. Then the JVM of server A crashes, leaving server B with prepared participants (XA resources). When server A is restarted, these prepared participants are expected to be finished by rolling them back, but this may never happen.
Resolution
Being fixed in Update 9
Manual Recovery:
Since these transactions cannot be resolved automatically, they need to be resolved manually. The manual procedure for resolving such in-doubt transactions is difficult and would need to be verified for:
- all supported resource managers
- all supported transaction log storage (filesystem, Artemis journal, JDBC database)
The procedure is:
- locate the EAP transaction logs for all outstanding transactions:
  - for a JDBC store, it is an SQL SELECT on the relevant table
  - for a filesystem store, it is a filesystem command
  - for the Artemis journal, we would need to write a tool
- locate the in-doubt branches for all resource managers known to an EAP instance
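For the filesystem store, and for resource managers reachable through an XA driver, the two inventory steps can be sketched in Java. This is a hedged sketch: the class name `InDoubtInventory` is hypothetical, the default location `standalone/data/tx-object-store` used in `main` is an assumption to check against your configuration, and treating every regular file below that directory as a transaction record is a simplification. The in-doubt query uses the standard `javax.transaction.xa.XAResource.recover` call with `TMSTARTRSCAN | TMENDRSCAN`; how the `XAResource` is obtained (for example via `javax.sql.XADataSource.getXAConnection().getXAResource()`) depends on the resource manager. The JDBC store and the Artemis journal are not covered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical inventory helper; paths and layout are assumptions, not documented EAP behaviour.
final class InDoubtInventory {

    /** Lists every regular file under a filesystem object store, relative to its root. */
    static List<String> loggedTransactions(Path storeDir) throws IOException {
        try (Stream<Path> files = Files.walk(storeDir)) {
            return files.filter(Files::isRegularFile)
                        .map(p -> storeDir.relativize(p).toString()) // keeps the record-type directories
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Asks one resource manager for all of its prepared (in-doubt) branches in a single scan. */
    static Xid[] inDoubtBranches(XAResource rm) throws XAException {
        Xid[] xids = rm.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        return xids == null ? new Xid[0] : xids;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass your server's object store directory as args[0].
        Path store = Paths.get(args.length > 0 ? args[0] : "standalone/data/tx-object-store");
        loggedTransactions(store).forEach(System.out::println);
    }
}
```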
Next, decode the entries from those two sources and match them up. Use the node name of the EAP instance to determine which resource manager entries need to be rolled back and which ones need to be committed. Use the tooling provided by the resource manager to perform this manual commit or rollback.
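The matching step can be expressed as a presumed-abort decision table. This is a sketch under stated assumptions, not the supported procedure: it assumes each Xid has already been decoded into a global transaction id and the name of the owning EAP node (that decoding is product-specific and not shown), that a record in the node's transaction log means the commit decision was taken, and that branches owned by other nodes are left to those nodes. `ManualResolution`, `Branch`, and `decide` are hypothetical names; the resulting plan would still be carried out with the resource manager's own tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical planning helper: it only decides; the actual commit/rollback
// is performed with the resource manager's own tooling.
final class ManualResolution {

    enum Action { COMMIT, ROLLBACK, SKIP }

    /** An in-doubt branch whose Xid has already been decoded (decoding not shown). */
    static final class Branch {
        final String globalTxId; // global transaction id decoded from the Xid
        final String nodeName;   // node name of the EAP instance assumed to own it

        Branch(String globalTxId, String nodeName) {
            this.globalTxId = globalTxId;
            this.nodeName = nodeName;
        }
    }

    /**
     * Presumed abort: for branches owned by localNode, commit only when the node's
     * transaction log still holds a record for that transaction, otherwise roll back.
     * Branches owned by other nodes are skipped and left to those nodes.
     */
    static List<Action> decide(String localNode, Set<String> loggedTxIds, List<Branch> inDoubt) {
        List<Action> plan = new ArrayList<>();
        for (Branch b : inDoubt) {
            if (!b.nodeName.equals(localNode)) {
                plan.add(Action.SKIP);
            } else if (loggedTxIds.contains(b.globalTxId)) {
                plan.add(Action.COMMIT);
            } else {
                plan.add(Action.ROLLBACK);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Branch> branches = List.of(
                new Branch("tx-1", "node-a"),   // logged on node-a: commit decision recorded
                new Branch("tx-2", "node-a"),   // no record on node-a: presumed abort
                new Branch("tx-3", "node-b"));  // owned by another node
        System.out.println(decide("node-a", Set.of("tx-1"), branches)); // [COMMIT, ROLLBACK, SKIP]
    }
}
```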
Root Cause
Unfinished transactions in JMS crash recovery scenario using JTA
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.