XA transaction commit is inconsistent when the server process is slowed down

Solution Verified - Updated

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 6.x
    • 7.x

Issue

  • An XA transaction is in flight.
  • The server is slowed down by an external factor (for example, high CPU load).
  • The XA transaction reproducibly ends in the "heuristic" state.

Resolution

  • EAP 6.4.CP9 ships JBoss Transaction Manager 4.17.34.Final-redhat-1, which includes a fix for BZ#1310603. It allows additional XAResourceOrphanFilters to be enabled with the following command-line parameter:

      -DJTAEnvironmentBean.xaResourceOrphanFilterClassNames="com.arjuna.ats.internal.jta.recovery.arjunacore.JTATransactionLogXAResourceOrphanFilter com.arjuna.ats.internal.jta.recovery.arjunacore.JTANodeNameXAResourceOrphanFilter com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateJTAXAResourceOrphanFilter"
    
  • EAP 7.0.2 ships Narayana 5.2.17.Final-redhat-1, which allows additional XAResourceOrphanFilters to be enabled with either of the following command-line parameters:

      -DJTAEnvironmentBean.xaResourceOrphanFilterClassNames="com.arjuna.ats.internal.jta.recovery.arjunacore.JTATransactionLogXAResourceOrphanFilter com.arjuna.ats.internal.jta.recovery.arjunacore.JTANodeNameXAResourceOrphanFilter com.arjuna.ats.internal.jta.recovery.arjunacore.SubordinateJTAXAResourceOrphanFilter"
    

    or

      -Dorg.jboss.narayana.wildfly.useActionStatusServiceRecoveryFilter.deprecated=true
    

    This behaviour is enabled by default in EAP 7.0.7, EAP 7.1.0, and later (JBEAP-6039/BEAP-6171).

Root Cause

  • By design, the Transaction Manager (TM) and the Recovery Manager (RM) are independent processes; they can even run in separate JVMs.
  • Both the TM and the RM rely on the file-system-based transaction store (tx-store).
  • If at least one of these processes is massively slowed down (or stalls), race conditions can occur, because updating the tx-store then takes much longer than reading it.
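The race described above is a check-then-act problem: the RM reads the tx-store before the stalled TM has written its log, and then acts on that stale answer. The following Java sketch models it under simplifying assumptions: the tx-store is a hypothetical in-memory map rather than ShadowNoFileLockStore, and a CountDownLatch forces the slow-write interleaving instead of real CPU starvation. The class name, Xid, and log values are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the TM/RM race on a shared tx-store; not Narayana code.
public class TxStoreRace {

    /** Returns {what the recovery pass saw, what the store holds afterwards}. */
    static String[] run() {
        // Stand-in for the tx-store: Xid -> log record (hypothetical).
        Map<String, String> txStore = new ConcurrentHashMap<>();
        String xid = "xid-1";
        CountDownLatch recoveryChecked = new CountDownLatch(1);
        String[] recoveryView = new String[1];

        // TM thread: all branches are prepared, but the (slowed-down) log write
        // only happens after the recovery pass has already read the store.
        Thread tm = new Thread(() -> {
            try {
                recoveryChecked.await();
                txStore.put(xid, "log written");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // RM thread: the recovery pass checks for the log, then lets the TM continue.
        Thread rm = new Thread(() -> {
            recoveryView[0] = txStore.containsKey(xid) ? "log present" : "no log";
            recoveryChecked.countDown();
        });

        tm.start();
        rm.start();
        try {
            tm.join();
            rm.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return new String[] {recoveryView[0], txStore.get(xid)};
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("Recovery pass saw: " + result[0]);
        System.out.println("Store afterwards:  " + result[1]);
    }
}
```

Running main shows that the recovery pass saw no log even though the store holds one afterwards; in the real product, a decision taken on that stale read is what opens the door to an inconsistent outcome.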

Diagnostic Steps

  • Build a test case involving three XA resources (a local queue, a remote queue, and a database) that runs the following sequence:

    • Two-phase commit is initiated.
    • topLevelPrepare is performed on the three XAResources (local queue, remote queue, and Oracle).
    • ShadowNoFileLockStore is asked to write the transaction log.
    • [BYTEMAN artificial load: sleep for 7 min]
  • What you will observe:

    • The recovery thread launches a recovery pass over the transaction.
    • It asks the orphan filters to vote on each XAResource, starting with the Oracle XAResource:
      • JTATransactionLogXAResourceOrphanFilter asks ShadowNoFileLockStore for the transaction status.
      • ShadowNoFileLockStore looks for the transaction log, but it has not been written to disk yet.
      • JTATransactionLogXAResourceOrphanFilter therefore abstains from voting.
      • JTANodeNameXAResourceOrphanFilter votes to roll back.
      • [BYTEMAN artificial load: sleep for 3 min]
    • The other (committing) thread wakes up:
      • ShadowNoFileLockStore writes the transaction log.
      • The doCommit method is called.
      • [BYTEMAN artificial load: sleep for 1 min]
    • The recovery thread wakes up:
      • The database transaction branch is rolled back.
      • Orphan handling is then invoked on the remote queue XAResource and on the local queue XAResource.
      • Because the transaction log now exists, JTATransactionLogXAResourceOrphanFilter returns LEAVE_ALONE.
      • Neither queue branch is rolled back.
    • The committing thread wakes up again:
      • topLevelCommit on local queue => SUCCESS
      • topLevelCommit on remote queue => SUCCESS
      • topLevelCommit on Oracle => FAILURE (for example, ORA-24756: transaction does not exist, because it has already been rolled back)
    • Result: both queue branches are committed while the database branch was rolled back, so the transaction ends in a heuristic state.
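The interleaving above leaves the queue branches committed and the Oracle branch rolled back, and that split follows from how the orphan-filter votes combine at two different moments. The following Java sketch replays the scenario with a simplified, hypothetical model: the Vote enum reuses the vote names from this article, and the combination rule (any LEAVE_ALONE vetoes a rollback; otherwise a single ROLLBACK vote is enough) is an assumption chosen to reproduce the described behaviour, not Narayana's actual implementation.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical replay of the diagnostic scenario; not Narayana code.
public class OrphanVoteScenario {

    enum Vote { ABSTAIN, LEAVE_ALONE, ROLLBACK }

    // Models JTATransactionLogXAResourceOrphanFilter: abstain while the log is
    // missing from disk, LEAVE_ALONE once it exists.
    static Vote logFilter(boolean logOnDisk) {
        return logOnDisk ? Vote.LEAVE_ALONE : Vote.ABSTAIN;
    }

    // Models JTANodeNameXAResourceOrphanFilter: in this scenario it always
    // votes ROLLBACK for the in-doubt branch.
    static Vote nodeNameFilter() {
        return Vote.ROLLBACK;
    }

    // Assumed combination rule: any LEAVE_ALONE vetoes the rollback;
    // otherwise one ROLLBACK vote is enough.
    static boolean shouldRollback(boolean logOnDisk) {
        List<Vote> votes = List.of(logFilter(logOnDisk), nodeNameFilter());
        return !votes.contains(Vote.LEAVE_ALONE) && votes.contains(Vote.ROLLBACK);
    }

    /** Replays the interleaving; returns branch -> topLevelCommit result. */
    static Map<String, String> run() {
        Set<String> rolledBackByRecovery = new HashSet<>();

        // The recovery pass reaches the Oracle branch before the log is written.
        if (shouldRollback(false)) {
            rolledBackByRecovery.add("oracle");
        }
        // The committing thread has now written the log; recovery continues with the queues.
        for (String branch : List.of("remote queue", "local queue")) {
            if (shouldRollback(true)) {
                rolledBackByRecovery.add(branch);
            }
        }
        // The committing thread commits every branch; a branch that recovery
        // already rolled back can no longer be committed.
        Map<String, String> commits = new LinkedHashMap<>();
        for (String branch : List.of("local queue", "remote queue", "oracle")) {
            commits.put(branch, rolledBackByRecovery.contains(branch) ? "FAILURE" : "SUCCESS");
        }
        return commits;
    }

    public static void main(String[] args) {
        Map<String, String> commits = run();
        commits.forEach((branch, result) ->
                System.out.println("topLevelCommit on " + branch + " => " + result));
        boolean heuristic = commits.containsValue("SUCCESS") && commits.containsValue("FAILURE");
        System.out.println(heuristic ? "Outcome: heuristic" : "Outcome: consistent");
    }
}
```

Running main prints SUCCESS for both queue branches and FAILURE for Oracle. The only difference between the two recovery decisions is whether the log was on disk when the log filter was consulted, which is why enabling the additional orphan filters listed under Resolution matters here.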

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.