Enqueue threshold exceptions triggered for qpid queue with few messages only

Solution Verified - Updated

Environment

  • Red Hat Enterprise MRG Messaging 2.*

Issue

We had a production issue where "Enqueue threshold exception" were getting triggered for a durable queue even though only 10 messages were pending. Total size of the messages was quite small, compared to the size of the journal.

  • Why it happened?
  • How to prevent it?
  • How to recover from it?

Resolution

The journal behavior can be hardly changed (see Root Cause for description), so let describe just workaround how to recover from that issue.

A workaround for these pathological cases is to have a trivial script / application that consumes all the messages in a batch, buffers them locally and enqueues them again - now the enqueues should be placed in the journal in adjacent places. Optionally, one can use transaction for the re-enqueues to ensure the specific messages will be enqueued (and thus written to the journal) in sequence.

The workaround helps once the problem occurs. It can be run as a preventive action to mitigate future problem, but in this case it is highly recommended to run it when no other activity exists on the queue. Otherwise the action might not prevent the problem completely and mainly ordering of messages won't be maintained (this is applicable mainly to LVQ where older message might rewrite newer one with the same LVQ key).

Root Cause

How storage module works:
It tracks in journal all enqueues (full messages including header and body) and also dequeues as separate records. The journal keeps current position (as file id and position within the file) where to write next record. It also tracks number of enqueued (and not-yet-dequeued) messages in each journal file - once this number decrements to zero, the journal file is marked as empty (by default all journal files). If journal reaches end of a journal file during a write operation, it moves to the next empty journal file. If there is no empty journal file, it raises the exception you noticed.

(technical note: see "Important" notice This content is not included.here explaining that in fact only approximatelly 80% of whole journal capacity is really utilized in the manner described above)

So there are ways how to put into a journal just few small messages such that the journal can't enqueue another:

  • send a specific message to it
  • send and receive ordinary messages to it until the very first journal file gets full
    • ordinary message means that consumer receives just these message but does not consume the specific message from the first step
    • examples of specific and ordinary messages are:
      • using priority queue and sending low-priority message followed by high-priority enqueues and dequeues, or
      • using message selectors in Java client such that the selector filters out the specific message, or
      • using concurrent consumer in client acknowledge mode that fetches the specific message but does not acknowledge it

The procedure above leaves one journal file with just one message enqueued but non-useable for further enqueues.

Running the procedure for all journal files leads to the situation you described ("full" journal with just few messages).

Diagnostic Steps

First reproducer (for priority messages, very similar exists also for LVQ):

  • create durable priority queue:
qpid-config add queue Durable --durable --argument=x-qpid-priorities=10
  • following the procedure from Root Cause, fill first six journal files (out from eight files):
for i in 1 2 3 4 5 6; do
  qpid-send --durable=yes -a Durable -m 1 --priority=1 --content-size=200
  qpid-send --durable=yes -a Durable -m 3070 --priority=2 --content-size=200
  qpid-receive -a Durable -m 3070 --print-content=no
  sleep 2
done
  • send some more messages to create enqueue+dequeue records in the seventh journal file such that exactly 80% of the journal has been written:
qpid-send --durable=yes -a Durable -m 1 --priority=1 --content-size=200
qpid-send --durable=yes -a Durable -m 1228 --priority=2 --content-size=200
qpid-receive -a Durable -m 1228 --print-content=no
  • check the queue has just few messages enqueued:
# qpid-stat -q | egrep '(ueue|=|Durable)'
Queues
  queue                                                               dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ===================================================================================================================================================
  Durable                                                             Y                      7  19.7k  19.6k   1.40k  3.93m    3.93m        0     1
#
  • try to enqueue a message:
# qpid-send --durable=yes -a Durable -m 1 --priority=1 --content-size=200
2012-11-26 15:32:30 warning Broker closed connection: 501, Enqueue capacity threshold exceeded on queue "Durable". (JournalImpl.cpp:597)
qpid-send: framing-error: Enqueue capacity threshold exceeded on queue "Durable". (JournalImpl.cpp:597)
#
  • check journal content:
# /usr/libexec/qpid/store_chk /var/lib/qpidd/rhm/jrnl/0000/Durable
Journal info file /var/lib/qpidd/rhm/jrnl/0000/Durable/JournalData.jinf:
  journal_version = 1
  string = 2012/11/26 15:26:04.586198646
  seconds = 1353939964
  nanoseconds = 586198646
  number_jrnl_files = 8
  JRNL_RMGR_PAGES = 16
  wcache_num_pages = 32
  jrnl_file_size_sblks = 3072
  base_filename = JournalData
  directory = /var/lib/qpidd/rhm/jrnl/0000/Durable/
  id_string = Durable
  JRNL_DBLK_SIZE = 128
  auto_expand = False
  JRNL_RMGR_PAGE_SIZE = 128
  JRNL_SBLK_SIZE = 4
  wcache_pgsize_sblks = 64

Journal files analyzed in directory /var/lib/qpidd/rhm/jrnl/0000/Durable (* = earliest full):
  * JournalData.0000.jdat: owi=False rid=0x1, fro=0x200 ts=Mon Nov 26 14:26:09.999091630 2012
    JournalData.0001.jdat: owi=False rid=0x17fe, fro=0x200 ts=Mon Nov 26 14:26:12.389464236 2012
    JournalData.0002.jdat: owi=False rid=0x2ffb, fro=0x200 ts=Mon Nov 26 14:26:14.848733090 2012
    JournalData.0003.jdat: owi=False rid=0x47f8, fro=0x200 ts=Mon Nov 26 14:26:17.485843806 2012
    JournalData.0004.jdat: owi=False rid=0x5ff5, fro=0x200 ts=Mon Nov 26 14:26:19.946051586 2012
    JournalData.0005.jdat: owi=False rid=0x77f2, fro=0x200 ts=Mon Nov 26 14:26:22.575791213 2012
    JournalData.0006.jdat: owi=False rid=0x8fed, fro=0x200 ts=Mon Nov 26 14:26:22.949026518 2012
    JournalData.0007.jdat: <empty>

Recovering journals . . . . . . .

 === REPORT ====

Records:         19655 non-transactional
                     0 transactional
                 19655 total

Transactions:        0 aborts
                     0 commits
                     0 total

Remaining enqueued records (sorted by rid): 
  fid=0 0x00000200: ["RHMe" v=1 e=0 f=0x0000 rid=0x1] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x1] 
  fid=1 0x00000200: ["RHMe" v=1 e=0 f=0x0000 rid=0x17fe] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x17fe] 
  fid=2 0x00000200: ["RHMe" v=1 e=0 f=0x0000 rid=0x2ffb] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x2ffb] 
  fid=3 0x00000200: ["RHMe" v=1 e=0 f=0x0000 rid=0x47f8] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x47f8] 
  fid=4 0x00000200: ["RHMe" v=1 e=0 f=0x0000 rid=0x5ff5] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x5ff5] 
  fid=5 0x00000200: ["RHMe" v=1 e=0 f=0x0000 rid=0x77f2] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x77f2] 
  fid=6 0x00000400: ["RHMe" v=1 e=0 f=0x0000 rid=0x8fef] data(305)="\00\00\00e\0b\01\00\15\00\01 ... XXXXXXXXXX"  ["RHMe" rid=0x8fef] 
WARNING: Enqueue-Dequeue mismatch, 7 enqueued records remain.

No incomplete transactions found (tmap empty).

19655 enqueues, 39338 journal records processed.

See that each of 7 enqueued (but not dequeued) messages sits in its "own" journal file ("fid" in "Remaining enqueued records").

Second reproducer leaving the journal with a single one small message enqueued but further enqueue attempt fails with "Enqueue capacity threshold exceeded" error:

rm -rf /var/lib/qpidd/*
service qpidd restart

echo "creating queue.."
qpid-config add queue Durable --durable --argument=x-qpid-priorities=10

echo "sending and receiving some priority messages.."
qpid-send --durable=yes -a Durable -m 1 --priority=1 --content-size=200
qpid-send --durable=yes -a Durable -m 3070 --priority=3 --content-size=200
qpid-receive -a Durable -m 3070 --print-content=no

for i in 2 3 4 5 6; do
  qpid-send --durable=yes -a Durable -m 3070 --priority=3 --content-size=200
  qpid-send --durable=yes -a Durable -m 1 --priority=2 --content-size=200
  qpid-receive -a Durable -m 3070 --print-content=no
  sleep 2
done

qpid-send --durable=yes -a Durable -m 1 --priority=2 --content-size=200
qpid-send --durable=yes -a Durable -m 1228 --priority=3 --content-size=200
qpid-receive -a Durable -m $((1228+6)) --print-content=no

echo "Done."
qpid-stat -q
/usr/libexec/qpid/store_chk /var/lib/qpidd/rhm/jrnl/0000/Durable

echo "Sending one message of 200 bytes body"
qpid-send -m 1 -a Durable --durable=yes --content-size=200
SBR
Components
Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.