Troubleshooting object queues
Use NexJ System Admin Console to monitor all object queues at run time.
In NexJ System Admin Console, navigate to the Object Queues page to see a list of all available object queues along with specific attributes, statistics, and messages currently in the queue. Review the queues to investigate system health or troubleshoot potential issues.
To determine the dispatcher node in a clustered environment, in NexJ System Admin Console, navigate to the System page and find the node whose Dispatcher node check box is selected in the Channel Administration table. This is the same location where process segregation can be applied.
Debug log classes for object queue issues
When troubleshooting object queue problems, it is sometimes useful to turn on additional debug logging to support the investigation.
The following is a list of key classes, which provide useful debug information for runtime object queue components:
nexj.system.*
Set to DUMP level logging to make the DEBUG level logging visible for the classes in this list.
nexj.core.rpc.queueing.ObjectQueueDispatcher
Logs information about the dispatcher activity.
nexj.core.rpc.queueing.ObjectQueueMessage
Logs information about the state of the object queue messages.
nexj.core.rpc.queueing.ObjectQueueSemaphore
Logs information about the object queue concurrency controls.
nexj.core.rpc.queueing.ObjectQueueConnection
Logs all exceptions caught on threads entering the application code from any resource adapter. The messaging engine relies on a resource adapter for thread management, so this can be a useful starting point if you suspect a failure within the engine.
nexj.core.rpc.queueing.ObjectQueueConsumer
Logs information about object queue thread activity.
nexj.core.rpc.queueing.ObjectQueueConsumerPool
Logs information about object queue thread activity.
If the object queue engine determines that a preemptive fail-over of the dispatcher is required, this activity is logged as WARN messages in the log files as follows:
Object queue dispatcher node "NodeName" is overloaded
Unsuccessful pre-emptive fail over attempt from node "NodeName", cause of failure
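When you review large log files, it can help to pull out only the fail-over messages. The following Python sketch searches log lines for the two message texts shown above and reports the node name from each match. It assumes nothing about the rest of the log line layout. The function name and the sample lines are hypothetical and are for illustration only.

```python
import re

# Patterns taken from the two WARN message texts quoted above.
# Only the message text is matched; timestamp and logger layout are not assumed.
OVERLOADED = re.compile(r'Object queue dispatcher node "([^"]+)" is overloaded')
FAILED = re.compile(r'Unsuccessful pre-emptive fail over attempt from node "([^"]+)"')


def find_failover_events(lines):
    """Return (line_number, kind, node_name) for each fail-over related line."""
    events = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in (("overloaded", OVERLOADED), ("failed", FAILED)):
            match = pattern.search(line)
            if match:
                events.append((number, kind, match.group(1)))
    return events


# Hypothetical sample lines; real log lines carry additional fields.
SAMPLE_LINES = [
    'some unrelated log line',
    'WARN Object queue dispatcher node "node1" is overloaded',
    'WARN Unsuccessful pre-emptive fail over attempt from node "node1", connection refused',
]

if __name__ == "__main__":
    for number, kind, node in find_failover_events(SAMPLE_LINES):
        print(number, kind, node)
```

To scan a real file, pass an open file object instead of the sample list, because the function accepts any iterable of lines.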
SQL queries for messaging engine state assessment
View the state of the messaging engine directly from the database when troubleshooting potential object queue issues.
You can use the following SQL queries to assess the state of the messaging engine.
The following query shows the runtime configuration of object queues.
SELECT
A.ID, A.CLASSCODE, A.NAME, A.CAPTION, A.PRIORITY, A.CONCURRENCY, A.CUSTOMIZED, A.SYSTEM, A.THROTTLECOUNTERID,
TIMEOUT, SENDENABLED, RECEIVEENABLED, ERRORCOUNT, C.NAME ERRORQUEUE, NODETYPE
FROM
OQQUEUE A
JOIN OQOBJECTQUEUE B ON B.ID = A.ID
LEFT JOIN OQQUEUE C ON C.ID = B.ERRORQUEUEID
ORDER BY
A.NAME
The following query shows the messages that are not in the error queue.
SELECT
A.*, B.NAME QUEUENAME, C.NAME ORIGINALQUEUE, D.NAME RESENDQUEUE
FROM
OQMESSAGE A
JOIN OQQUEUE B ON B.ID = A.QUEUEID
JOIN OQQUEUE C ON C.ID = A.ORIGINALQUEUEID
JOIN OQQUEUE D ON D.ID = A.RESENDQUEUEID
WHERE
B.ID NOT IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE WHERE ERRORQUEUEID IS NOT NULL)
-- To filter by a state, uncomment one of the following lines:
-- AND A.STATECODE = 'B' /* blocked */
-- AND A.STATECODE = 'D' /* dispatched */
-- AND A.STATECODE = 'E' /* errored */
-- AND A.STATECODE = 'N' /* new */
-- AND A.STATECODE = 'P' /* currently processing */
-- AND A.STATECODE = 'W' /* waiting on a resource */
ORDER BY
A.NAME, A.DELIVERYTIME ASC
The following query shows the messages that are in the error queue.
SELECT
A.*, B.NAME ERRORQUEUENAME
FROM
OQMESSAGE A
JOIN OQQUEUE B ON B.ID = A.QUEUEID
WHERE
A.QUEUEID IN (SELECT ERRORQUEUEID FROM OQOBJECTQUEUE)
ORDER BY
A.DELIVERYTIME ASC
The following query shows the messages that are ready for processing but are waiting for a node.
SELECT
A.*, B.NAME QUEUENAME, C.NAME ORIGINALQUEUE, D.NAME RESENDQUEUE
FROM
OQMESSAGE A
JOIN OQQUEUE B ON B.ID = A.QUEUEID
JOIN OQQUEUE C ON C.ID = A.ORIGINALQUEUEID
JOIN OQQUEUE D ON D.ID = A.RESENDQUEUEID
WHERE
A.STATECODE = 'P' AND A.NODEID = 0X0
ORDER BY
A.NAME, A.DELIVERYTIME ASC
The following query shows the currently processing messages by node.
SELECT
E.NAME NODENAME, A.*, B.NAME QUEUENAME, C.NAME ORIGINALQUEUE, D.NAME RESENDQUEUE
FROM
OQMESSAGE A
JOIN OQQUEUE B ON B.ID = A.QUEUEID
JOIN OQQUEUE C ON C.ID = A.ORIGINALQUEUEID
JOIN OQQUEUE D ON D.ID = A.RESENDQUEUEID
JOIN NJNODE E ON E.ID = A.NODEID
WHERE
A.STATECODE = 'P' AND A.NODEID <> 0X0
ORDER BY
A.NODEID, A.NAME, A.DELIVERYTIME ASC
The following query shows the queue depth count by queue, state, and message.
SELECT
A.NAME OBJECTQUEUE, B.STATECODE, B.NAME MESSAGE, COUNT(1) COUNT
FROM
OQQUEUE A
JOIN OQMESSAGE B ON B.QUEUEID = A.ID
WHERE
A.ID NOT IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE WHERE ERRORQUEUEID IS NOT NULL)
GROUP BY
A.NAME, B.STATECODE, B.NAME
ORDER BY
A.NAME, B.STATECODE, B.NAME
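The result of the queue depth query can be long when many message types are queued. The following Python sketch totals the counts by queue and state, and replaces each state code with the meaning given in the state filter comments earlier in this topic. The function name and the sample rows are hypothetical. The rows are assumed to arrive as (queue, state code, message, count) tuples, in the column order of the query above.

```python
from collections import defaultdict

# State code meanings as listed in the state filter comments earlier in this topic.
STATE_LABELS = {
    "B": "blocked",
    "D": "dispatched",
    "E": "errored",
    "N": "new",
    "P": "currently processing",
    "W": "waiting on a resource",
}


def summarize_depth(rows):
    """Total the message counts by (queue name, readable state label)."""
    totals = defaultdict(int)
    for queue, state, _message, count in rows:
        label = STATE_LABELS.get(state, "unknown (%s)" % state)
        totals[(queue, label)] += count
    return dict(totals)


# Hypothetical rows in the column order OBJECTQUEUE, STATECODE, MESSAGE, COUNT.
SAMPLE_ROWS = [
    ("QueueA", "N", "messageOne", 40),
    ("QueueA", "N", "messageTwo", 2),
    ("QueueA", "P", "messageOne", 4),
    ("QueueB", "W", "messageThree", 7),
]

if __name__ == "__main__":
    for (queue, label), total in sorted(summarize_depth(SAMPLE_ROWS).items()):
        print(queue, label, total)
```

A large total in the "waiting on a resource" or "blocked" state for one queue is a reasonable place to start looking when throughput drops.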
The following query shows the error queue depth count by queue, state, and message.
SELECT
A.NAME ERRORQUEUE, B.STATECODE, B.NAME MESSAGE, COUNT(1) COUNT
FROM
OQQUEUE A
JOIN OQMESSAGE B ON B.QUEUEID = A.ID
WHERE
A.ID IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE WHERE ERRORQUEUEID IS NOT NULL)
GROUP BY
A.NAME, B.STATECODE, B.NAME
ORDER BY
A.NAME, B.STATECODE, B.NAME
The following query identifies the dispatcher node.
SELECT
A.*, B.*
FROM
OQOBJECTQUEUEDISPATCHER A
JOIN NJNODE B ON B.ID = A.NODEID
The following statement triggers a dispatcher restart.
UPDATE OQOBJECTQUEUEDISPATCHER SET NODEID = NULL, ADDRESS = NULL;
The following statement restricts object queue traffic to a specific subnet.
UPDATE OQOBJECTQUEUEDISPATCHER SET NETWORK = '0/0' /*ACCEPTS CIDR ADDRESSES*/
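Before you change the NETWORK value, you may want to confirm which node addresses a given CIDR block covers. The following Python sketch uses the standard ipaddress module to test addresses against a block. It illustrates CIDR matching in general and does not show how the messaging engine itself evaluates the value. The function name and the addresses are hypothetical, and 0.0.0.0/0 is used as the fully written form of an IPv4 block that matches every address, on the assumption that '0/0' in the statement above is shorthand for it.

```python
import ipaddress


def in_network(address, cidr):
    """Return True if the address falls inside the CIDR block."""
    # strict=False accepts blocks written with host bits set, such as 10.1.2.3/16.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    # Hypothetical node addresses checked against a hypothetical subnet.
    for address in ("10.1.2.3", "10.2.0.1"):
        print(address, in_network(address, "10.1.0.0/16"))
```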
Investigating performance issues using semaphore data
Semaphores are used to limit the number of messages that are processed concurrently on a particular queue. Together with the maxMessageRecievers environment file property, semaphores help balance the workload and prevent system resource overload.
If a message has completed processing, but the pool dump shows that the dispatcher continues to hold a semaphore for the message, there has been a semaphore leak. Leaked semaphores reduce queue concurrency, which reduces throughput and can sometimes cause the queue to stall completely. Identifying leaked semaphores may be a necessary step in troubleshooting object queue performance issues.
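The effect of a leak can be seen in a small model. The following Python sketch is a toy illustration of a counting semaphore that limits concurrency on one queue. It is not the messaging engine's implementation, and the class and message names are hypothetical. It shows that each permit that is never released permanently lowers the number of messages that can run at once.

```python
import threading


class QueueSemaphore:
    """Toy model of a per-queue concurrency limit (illustration only)."""

    def __init__(self, concurrency):
        self._permits = threading.Semaphore(concurrency)
        self.holders = set()  # message IDs that currently hold a permit

    def try_acquire(self, message_id):
        """Take a permit for the message if one is free; never blocks."""
        if self._permits.acquire(blocking=False):
            self.holders.add(message_id)
            return True
        return False

    def release(self, message_id):
        """Return the permit held by the message."""
        self.holders.remove(message_id)
        self._permits.release()


if __name__ == "__main__":
    semaphore = QueueSemaphore(concurrency=2)
    semaphore.try_acquire("m1")
    semaphore.try_acquire("m2")
    semaphore.release("m1")              # m1 finishes and returns its permit
    # m2 also finishes, but its permit is never released: a leak.
    print(semaphore.try_acquire("m3"))   # the one remaining permit is taken
    print(semaphore.try_acquire("m4"))   # refused: effective concurrency is now 1
    print(sorted(semaphore.holders))     # m2 is still listed as a holder
```

In this model, the leaked entry stays in the holders set after its work is done, which is the same symptom the pool dump reveals for a real leak.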
Semaphore data is included in the resource pool dump, which you can generate using the dumpPoolState command in NexJ System Admin Console. The following is an example resource pool dump with semaphore data.
JMSConnectionPool@1059036600(channel="RetryQueue", busyTimeout=10000, idleTimeout=60000, maxSize=16, activeCount=0, idleCount=0)
DefaultObjectConsumerPool@1248972060(config=ObjectDispatcherQueue SysObjectQueueDispatcher, idleTimeout=60000, maxSize=16, activeCount=6, idleCount=0)
Semaphore data:
Message: #z10265E6A26FD4642ADBDED491C00412169, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z104A0A5E30539647DD9981A4095640A668
Message: #z10FB1DB11CAC2F4555B5213E318093EA53, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
Message: #z109D2136148DFE4EE38B7F370EA832F8FC, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
Message: #z10AC5D3CFF259E414EBD8343DC95CC3958, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
Message: #z104E404107C74849A391DFE2CB1D42C1F5, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
Resource #z104A0A5E30539647DD9981A4095640A668 saturated since Mon Oct 07 09:53:31 EDT 2019 (10 seconds)
Resource #z10BBEB7D996E56490A9F3DD59DE2BA860B saturated since Mon Oct 07 09:53:31 EDT 2019 (10 seconds)
Active:
ObjectConsumer@1626015073(lastUse=10455ms)
DiagnosticTraceFactory$DiagnosticTraceHolder@42676591(threadName=NexJ nexj.core.rpc.queueing.ObjectConsumerPool$RepeatedWork$1@40fdff88 #1211048872)
systemStack:
at nexj.core.rpc.pool.DiagnosticTraceFactory.createTrace(DiagnosticTraceFactory.java:83)
at nexj.core.util.pool.consumer.GenericConsumerPool.getConsumer(GenericConsumerPool.java:614)
at nexj.core.rpc.queueing.ObjectConsumerPool.access$2(ObjectConsumerPool.java:1)
at nexj.core.rpc.queueing.ObjectConsumerPool$DispatcherState$4.run(ObjectConsumerPool.java:1936)
at nexj.core.rpc.queueing.ObjectConsumerPool$RepeatedWork$1.run(ObjectConsumerPool.java:533)
at nexj.core.rpc.pool.ThreadPool$Worker$1$1.run(ThreadPool.java:236)
The resource in the semaphore data often represents the queue ID.
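When a dump contains many semaphore entries, grouping them by resource makes it easier to see which queue is saturated and which messages hold its semaphores. The following Python sketch parses the semaphore lines in the format shown in the example dump above. The function name is hypothetical, and the line formats are inferred only from that example, so other versions of the dump may need adjusted patterns. Acquire times are kept as text, because parsing time zone abbreviations such as EDT is platform dependent.

```python
import re
from collections import defaultdict

# Line formats inferred from the example dump above.
MESSAGE = re.compile(r'Message: (#\w+), node: "([^"]+)", acquire time: (.+)')
RESOURCE = re.compile(r'Resource: (#\w+)')
SATURATED = re.compile(r'Resource (#\w+) saturated since (.+) \((\d+) seconds\)')


def parse_semaphores(dump_text):
    """Return (held, saturated).

    held maps a resource ID to a list of (message ID, node, acquire time).
    saturated maps a resource ID to the number of seconds it has been saturated.
    """
    held = defaultdict(list)
    saturated = {}
    pending = None  # the Message line waiting for its Resource line
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        match = MESSAGE.match(line)
        if match:
            pending = match.groups()
            continue
        match = RESOURCE.match(line)
        if match and pending is not None:
            held[match.group(1)].append(pending)
            pending = None
            continue
        match = SATURATED.match(line)
        if match:
            saturated[match.group(1)] = int(match.group(3))
    return dict(held), saturated


# Excerpt of the example dump above.
SAMPLE_DUMP = """\
Semaphore data:
Message: #z10265E6A26FD4642ADBDED491C00412169, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z104A0A5E30539647DD9981A4095640A668
Message: #z10FB1DB11CAC2F4555B5213E318093EA53, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
Message: #z109D2136148DFE4EE38B7F370EA832F8FC, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
Resource #z104A0A5E30539647DD9981A4095640A668 saturated since Mon Oct 07 09:53:31 EDT 2019 (10 seconds)
Resource #z10BBEB7D996E56490A9F3DD59DE2BA860B saturated since Mon Oct 07 09:53:31 EDT 2019 (10 seconds)
"""

if __name__ == "__main__":
    held, saturated = parse_semaphores(SAMPLE_DUMP)
    for resource, holders in held.items():
        print(resource, len(holders), "held,", saturated.get(resource), "seconds saturated")
```

A message that appears as a holder in successive dumps, although it has already completed processing, is a candidate for a leaked semaphore.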
For more information about generating resource pool dumps, see Collecting diagnostic data on dynamic resource pools.