Troubleshooting an accumulation of messages on object queues
An accumulation of messages on object queues is characterized by a high number of pending messages on an object queue, such as 1000 messages, that decreases slowly.
The number of pending messages may not decrease because long running messages prevent other messages from running on the queue. A high value for the Processing time (ms): mean JMX statistic indicates long processing times.
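To judge whether a queue is draining slowly, you can compare two or more pending message counts taken at known times. The following Python sketch is illustrative only: the function name drain_estimate and the sample values are hypothetical, and it assumes that you have already collected the pending counts yourself, for example, from the SQL queries or the Statistics page.

```python
from datetime import datetime, timedelta


def drain_estimate(samples):
    """Estimate how quickly a queue is draining.

    samples: list of (timestamp, pending_count) tuples ordered by time.
    Returns (messages_per_minute, minutes_to_empty). minutes_to_empty is
    None when the pending count is not decreasing.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    elapsed_min = (t1 - t0).total_seconds() / 60.0
    if elapsed_min <= 0:
        raise ValueError("samples must span a positive time interval")
    rate = (n0 - n1) / elapsed_min  # positive when the queue is draining
    eta = n1 / rate if rate > 0 else None
    return rate, eta


# Hypothetical pending counts sampled every 10 minutes.
start = datetime(2024, 1, 1, 9, 0)
samples = [
    (start, 1000),
    (start + timedelta(minutes=10), 980),
    (start + timedelta(minutes=20), 960),
]
rate, eta = drain_estimate(samples)
```

A low rate combined with a large estimated time to empty suggests that messages are being processed slowly rather than that the queue is stuck. A result of None for the estimate means the pending count is flat or growing.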
Typical causes
Accumulation of messages on object queues is typically caused by one or more of the following issues:
- A recent bulk data load or update
- Errors in the database that cause a high number of failures and retries, for example, deadlocks on ACLRebuildObjectQueue
- Long running queries on the object queue dispatcher due to a database that requires tuning, for example, because of high index fragmentation on OQMessage or Oracle statistics that are out of date
- Other long running queries or general database slowness
- An accumulation of messages on a higher priority queue
- A database connection pool that is exhausted
Diagnostic checklist
Collect the following information to assist with troubleshooting:
- Application logs
- Oracle AWR reports on an hourly basis
- Queue depth status retrieved using SQL queries
- The following JMX statistics:
- Processing time (ms): mean
- Dispatch time (ms): mean
- Active Receiver Count: mean, which you can access from . You can use this statistic to identify whether long running messages may be consuming available processing threads.
- Java thread dumps if queues are stuck or if the Dispatch rate (Hz) statistic is 0, which can indicate a stuck dispatcher.
Troubleshooting
When troubleshooting the issue, consider whether a bulk data load or mass update was executed recently that may impact object queues. Review application logs for database related errors, such as deadlocks, or long running queries originating from the queue. Long running queries indicate that each message or transaction may be processing slowly.
If your environment uses an Oracle database, review Oracle AWR reports, preferably hourly snapshots, and look for:
- Databases that are waiting on locks for a significant amount of time
- Queries that are very numerous, long running, or that consume a significant amount of CPU or I/O resources
Consider whether there is a long running query related to the object queue dispatcher. Verify whether indexes on the OQMessage table have become fragmented and require tuning. If your environment uses an Oracle database, also verify that Oracle statistics are up to date.
If a queue is stuck and messages are not being processed, take thread dumps from all application nodes before restarting servers or terminating database transactions.
Monitoring object queues
You can monitor object queues in several ways. You can monitor them from the Object Queues page in NexJ System Admin Console and view JMX statistics for object queues from the Statistics page.
Finally, you can run SQL queries on a SYSTEM database, as shown in the following examples.
The following example shows how to get overall message counts and status per queue:
select q.name, m.statecode, count(1)
from oqmessage m inner join oqqueue q on q.id = m.queueid
group by m.queueid, m.statecode, q.name
order by count(1) desc
The following example shows how to check the age of messages on a specific queue:
select min(m.deliveryTime) as oldestTime
from oqqueue q, oqmessage m
where m.queueId = q.id and q.name = 'BatchMail_SendEmail_Queue'
and m.statecode not in ('B','E','R')
The deliveryTime value reflects the time, in UNIX Epoch format, when each message is put on the queue.
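Because deliveryTime is an epoch value, it can help to convert the result of the age query to a readable timestamp and an age. The following Python sketch is an illustration only. The function names are hypothetical, and the unit of the stored value is an assumption: the sketch defaults to milliseconds, so pass unit="s" if your values are stored in seconds.

```python
from datetime import datetime, timedelta, timezone


def delivery_time_to_utc(value, unit="ms"):
    """Convert an epoch deliveryTime value to a timezone-aware UTC datetime.

    unit is an assumption about how the value is stored:
    "ms" for milliseconds (default) or "s" for seconds.
    """
    if unit not in ("ms", "s"):
        raise ValueError("unit must be 'ms' or 's'")
    seconds = value / 1000.0 if unit == "ms" else float(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def message_age_seconds(value, now, unit="ms"):
    """Return the age in seconds of a message, relative to 'now' (UTC)."""
    return (now - delivery_time_to_utc(value, unit)).total_seconds()


# Hypothetical oldestTime value returned by the age query.
oldest = delivery_time_to_utc(1700000000000)
age = message_age_seconds(1700000000000, oldest + timedelta(seconds=90))
```

An oldest message that is much older than the normal processing time for the queue suggests that messages are accumulating rather than being processed promptly.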
Related links
Monitoring object queues
Interpreting object queue statistics from the Statistics page