Troubleshooting object queues

Use the NexJ System Admin Console to monitor all object queues at runtime.

In NexJ System Admin Console, navigate to the Object Queues page to see a list of all available object queues along with specific attributes, statistics, and messages currently in the queue. Review the queues to investigate system health or troubleshoot potential issues.

To determine the dispatcher node in a clustered environment, navigate to the System page in NexJ System Admin Console and look for the node whose Dispatcher node check box is selected in the Channel Administration table. This is the same location where process segregation can be applied.

Debug log classes for object queue issues

When troubleshooting object queue problems, it is sometimes useful to turn on additional debug logging.

The following key classes provide useful debug information for runtime object queue components:

nexj.system.*

Set to DUMP level logging to make the DEBUG level logging of the other classes in this list visible.

nexj.core.rpc.queueing.ObjectQueueDispatcher

Logs information about the dispatcher activity.

nexj.core.rpc.queueing.ObjectQueueMessage

Logs information about the state of the object queue messages.

nexj.core.rpc.queueing.ObjectQueueSemaphore

Logs information about the object queue concurrency controls.

nexj.core.rpc.queueing.ObjectQueueConnection

Logs all exceptions caught on threads entering the application code from any resource adapter. The messaging engine relies on a resource adapter for thread management, so this can be a useful starting point if you suspect a failure within the engine.

nexj.core.rpc.queueing.ObjectQueueConsumer

Logs information about object queue thread activity.

nexj.core.rpc.queueing.ObjectQueueConsumerPool

Logs information about object queue thread activity.

If the object queue engine determines that a preemptive fail-over of the dispatcher is required, this activity is logged as WARN messages in the log files as follows:

CODE

Object queue dispatcher node "NodeName" is overloaded
Unsuccessful pre-emptive fail over attempt from node "NodeName", cause of failure
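
When reviewing large log files, it can help to pull out only the lines that report a pre-emptive fail-over. The following is a minimal Python sketch that matches the two WARN messages quoted above. The function name `find_failover_events` is hypothetical, and nothing is assumed about the rest of the log line layout (timestamp, level, logger name).

```python
import re

# Patterns built only from the two WARN messages quoted in this section.
OVERLOADED = re.compile(r'Object queue dispatcher node "([^"]+)" is overloaded')
FAILED = re.compile(r'Unsuccessful pre-emptive fail over attempt from node "([^"]+)"')


def find_failover_events(lines):
    """Return (line_number, kind, node_name) for each matching log line."""
    events = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in (("overloaded", OVERLOADED), ("failover_failed", FAILED)):
            match = pattern.search(line)
            if match:
                events.append((number, kind, match.group(1)))
    return events
```

To use it, pass an open log file (or any iterable of lines) to `find_failover_events` and print the returned tuples.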

SQL queries for messaging engine state assessment

View the state of the messaging engine directly from the database when troubleshooting potential object queue issues.

You can use the following SQL queries to assess the state of the messaging engine.

The following query shows the runtime configuration of object queues.

CODE

SELECT 
         A.ID, A.CLASSCODE, A.NAME, A.CAPTION, A.PRIORITY, A.CONCURRENCY, A.CUSTOMIZED, A.SYSTEM, A.THROTTLECOUNTERID,
         TIMEOUT, SENDENABLED, RECEIVEENABLED, ERRORCOUNT, C.NAME ERRORQUEUE, NODETYPE
FROM 
         OQQUEUE A 
         JOIN OQOBJECTQUEUE B ON B.ID = A.ID
         LEFT JOIN OQQUEUE C ON C.ID = B.ERRORQUEUEID
ORDER BY
         A.NAME

The following query shows the messages that are not in the error queue.

CODE

SELECT 
        A.*, B.NAME QUEUENAME, C.NAME ORIGINALQUEUE, D.NAME RESENDQUEUE 
FROM 
        OQMESSAGE A
        JOIN OQQUEUE B ON B.ID = A.QUEUEID
        JOIN OQQUEUE C ON C.ID = A.ORIGINALQUEUEID
        JOIN OQQUEUE D ON D.ID = A.RESENDQUEUEID
WHERE 
        B.ID NOT IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE WHERE 
        ERRORQUEUEID IS NOT NULL)
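        -- Uncomment one of the following lines to filter by a state:
        -- AND A.STATECODE = 'B'  -- blocked
        -- AND A.STATECODE = 'D'  -- dispatched
        -- AND A.STATECODE = 'E'  -- errored
        -- AND A.STATECODE = 'N'  -- new
        -- AND A.STATECODE = 'P'  -- currently processing
        -- AND A.STATECODE = 'W'  -- waiting on a resource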
ORDER BY 
        A.NAME, A.DELIVERYTIME ASC
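
If you run this query from a script, the optional state filter can be generated rather than edited by hand. The following Python sketch maps the state codes listed in the query above to their meanings and builds the filter fragment. The names `STATE_CODES` and `state_filter` are hypothetical helpers, not part of the product.

```python
# State codes and their meanings, as listed in the query above.
STATE_CODES = {
    "B": "blocked",
    "D": "dispatched",
    "E": "errored",
    "N": "new",
    "P": "currently processing",
    "W": "waiting on a resource",
}


def state_filter(code):
    """Return the SQL fragment that restricts the query to one message state."""
    if code not in STATE_CODES:
        raise ValueError(f"Unknown state code: {code!r}")
    # The code is validated against a fixed set, so it is safe to embed directly.
    return f"AND A.STATECODE = '{code}'"
```

For example, `state_filter("W")` produces the fragment for messages that are waiting on a resource, which can be appended to the WHERE clause before the ORDER BY.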

The following query shows the messages that are in the error queue.

CODE

SELECT 
         A.*, B.NAME ERRORQUEUENAME 
FROM 
         OQMESSAGE A
         JOIN OQQUEUE B ON B.ID = A.QUEUEID
WHERE 
         A.QUEUEID IN (SELECT ERRORQUEUEID FROM OQOBJECTQUEUE)
ORDER BY 
         A.DELIVERYTIME ASC

The following query shows the messages that are ready for processing but are waiting for a node.

CODE

SELECT 
         A.*, B.NAME QUEUENAME, C.NAME ORIGINALQUEUE, D.NAME RESENDQUEUE 
FROM 
         OQMESSAGE A
         JOIN OQQUEUE B ON B.ID = A.QUEUEID  
         JOIN OQQUEUE C ON C.ID = A.ORIGINALQUEUEID
         JOIN OQQUEUE D ON D.ID = A.RESENDQUEUEID
         AND A.STATECODE = 'P' AND NODEID = 0X0
ORDER BY 
         A.NAME, A.DELIVERYTIME ASC

The following query shows the currently processing messages by node.

CODE

SELECT 
        E.NAME NODENAME, A.*, B.NAME QUEUENAME, C.NAME ORIGINALQUEUE, D.NAME RESENDQUEUE 
FROM 
        OQMESSAGE A
        JOIN OQQUEUE B ON B.ID = A.QUEUEID
        JOIN OQQUEUE C ON C.ID = A.ORIGINALQUEUEID
        JOIN OQQUEUE D ON D.ID = A.RESENDQUEUEID
        JOIN NJNODE E ON E.ID = A.NODEID
        AND A.STATECODE = 'P' AND NODEID <> 0X0
ORDER BY 
        A.NODEID, A.NAME, A.DELIVERYTIME ASC

The following query shows the queue depth count by queue, state, and message.

CODE

SELECT 
        A.NAME OBJECTQUEUE, B.STATECODE, B.NAME MESSAGE, COUNT(1) COUNT
FROM 
        OQQUEUE A
        JOIN OQMESSAGE B ON B.QUEUEID = A.ID
WHERE 
        A.ID NOT IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE WHERE ERRORQUEUEID IS NOT NULL)
GROUP BY 
        A.NAME, B.STATECODE, B.NAME
ORDER BY 
        A.NAME, B.STATECODE, B.NAME
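
The `ERRORQUEUEID IS NOT NULL` condition in the subquery matters: in standard SQL, `NOT IN` against a list that contains NULL never evaluates to true, so the query would return no rows. The following Python sketch illustrates this with an in-memory SQLite database. The toy schema is an assumption that includes only the columns referenced by the query above; it is not the real NexJ schema, and the queue and message names are made up.

```python
import sqlite3

# Depth query adapted from the one above; {null_filter} lets us toggle the NULL check.
DEPTH_QUERY = """
SELECT A.NAME, B.STATECODE, B.NAME, COUNT(1)
FROM OQQUEUE A
JOIN OQMESSAGE B ON B.QUEUEID = A.ID
WHERE A.ID NOT IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE {null_filter})
GROUP BY A.NAME, B.STATECODE, B.NAME
ORDER BY A.NAME, B.STATECODE, B.NAME
"""


def build_toy_database():
    """Create a hypothetical minimal schema with one work queue and its error queue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE OQQUEUE (ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE OQOBJECTQUEUE (ID INTEGER PRIMARY KEY, ERRORQUEUEID INTEGER);
        CREATE TABLE OQMESSAGE (ID INTEGER PRIMARY KEY, QUEUEID INTEGER,
                                NAME TEXT, STATECODE TEXT);
        INSERT INTO OQQUEUE VALUES (1, 'WorkQueue'), (2, 'WorkErrorQueue');
        -- The error queue itself has no error queue, so its ERRORQUEUEID is NULL.
        INSERT INTO OQOBJECTQUEUE VALUES (1, 2), (2, NULL);
        INSERT INTO OQMESSAGE VALUES
            (10, 1, 'sync', 'N'), (11, 1, 'sync', 'N'),
            (12, 1, 'sync', 'P'), (13, 2, 'sync', 'E');
    """)
    return conn


def depth_counts(conn, exclude_nulls=True):
    """Run the depth query, with or without the NULL check in the subquery."""
    null_filter = "WHERE ERRORQUEUEID IS NOT NULL" if exclude_nulls else ""
    return conn.execute(DEPTH_QUERY.format(null_filter=null_filter)).fetchall()
```

With the NULL check in place, `depth_counts` returns one row per queue, state, and message name for the non-error queue. Without it, the result is empty.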

The following query shows the error queue depth count by queue, state, and message.

CODE

SELECT 
        A.NAME ERRORQUEUE, B.STATECODE, B.NAME MESSAGE, COUNT(1) COUNT
FROM 
        OQQUEUE A
        JOIN OQMESSAGE B ON B.QUEUEID = A.ID
WHERE 
        A.ID IN (SELECT DISTINCT (ERRORQUEUEID) FROM OQOBJECTQUEUE WHERE ERRORQUEUEID IS NOT NULL)
GROUP BY 
        A.NAME, B.STATECODE, B.NAME
ORDER BY 
        A.NAME, B.STATECODE, B.NAME

The following query identifies the dispatcher node.

CODE

SELECT
         A.*, B.* 
FROM 
         OQOBJECTQUEUEDISPATCHER A
         JOIN NJNODE B ON B.ID = A.NODEID

The following statement triggers a dispatcher restart.

CODE

UPDATE OQOBJECTQUEUEDISPATCHER SET NODEID = NULL, ADDRESS = NULL;

The following statement restricts object queue traffic to a specific subnet.

CODE

UPDATE OQOBJECTQUEUEDISPATCHER SET NETWORK = '0/0'; /*ACCEPTS CIDR ADDRESSES*/
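
Before changing the NETWORK value, it can be useful to check which node addresses a candidate CIDR range would cover. The following Python sketch uses the standard `ipaddress` module for this. It assumes that node addresses are plain IPv4 addresses, which may not match how addresses are stored in your environment. The function name `nodes_in_network` and the sample addresses are hypothetical. Note that Python's `ipaddress` module expects the full form `0.0.0.0/0` rather than the shorthand `0/0`.

```python
import ipaddress


def nodes_in_network(cidr, node_addresses):
    """Return the node addresses that fall inside the given CIDR range."""
    # strict=False accepts ranges written with host bits set, such as 10.1.2.5/24.
    network = ipaddress.ip_network(cidr, strict=False)
    return [addr for addr in node_addresses if ipaddress.ip_address(addr) in network]
```

For example, `nodes_in_network("10.1.2.0/24", ["10.1.2.15", "10.1.3.7"])` keeps only the first address, while `0.0.0.0/0` covers every IPv4 address.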

Investigating performance issues using semaphore data

Semaphores restrict the number of messages that can be processed concurrently on a particular queue. Together with the maxMessageRecievers environment file property, semaphores help balance the workload and prevent system resource overload.

If a message has completed processing but the pool dump shows that the dispatcher still holds a semaphore for it, a semaphore leak has occurred. Leaked semaphores reduce queue concurrency, which lowers throughput and can sometimes leave the queue completely stuck. Identifying leaked semaphores may be a necessary step in troubleshooting object queue performance issues.

Semaphore data is included in the resource pool dump, which you can generate using the dumpPoolState command in NexJ System Admin Console. The following is an example resource pool dump with semaphore data.

TEXT

JMSConnectionPool@1059036600(channel="RetryQueue", busyTimeout=10000, idleTimeout=60000, maxSize=16, activeCount=0, idleCount=0)
   
DefaultObjectConsumerPool@1248972060(config=ObjectDispatcherQueue SysObjectQueueDispatcher, idleTimeout=60000, maxSize=16, activeCount=6, idleCount=0)
      Semaphore data:
         Message: #z10265E6A26FD4642ADBDED491C00412169, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
            Resource: #z104A0A5E30539647DD9981A4095640A668
         Message: #z10FB1DB11CAC2F4555B5213E318093EA53, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
            Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
         Message: #z109D2136148DFE4EE38B7F370EA832F8FC, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
            Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
         Message: #z10AC5D3CFF259E414EBD8343DC95CC3958, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
            Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
         Message: #z104E404107C74849A391DFE2CB1D42C1F5, node: "node1", acquire time: Mon Oct 07 09:53:31 EDT 2019
            Resource: #z10BBEB7D996E56490A9F3DD59DE2BA860B
         Resource #z104A0A5E30539647DD9981A4095640A668 saturated since Mon Oct 07 09:53:31 EDT 2019 (10 seconds)
         Resource #z10BBEB7D996E56490A9F3DD59DE2BA860B saturated since Mon Oct 07 09:53:31 EDT 2019 (10 seconds)
      Active:
         ObjectConsumer@1626015073(lastUse=10455ms)
            DiagnosticTraceFactory$DiagnosticTraceHolder@42676591(threadName=NexJ nexj.core.rpc.queueing.ObjectConsumerPool$RepeatedWork$1@40fdff88 #1211048872)
               systemStack:
                  at nexj.core.rpc.pool.DiagnosticTraceFactory.createTrace(DiagnosticTraceFactory.java:83)
                  at nexj.core.util.pool.consumer.GenericConsumerPool.getConsumer(GenericConsumerPool.java:614)
                  at nexj.core.rpc.queueing.ObjectConsumerPool.access$2(ObjectConsumerPool.java:1)
                  at nexj.core.rpc.queueing.ObjectConsumerPool$DispatcherState$4.run(ObjectConsumerPool.java:1936)
                  at nexj.core.rpc.queueing.ObjectConsumerPool$RepeatedWork$1.run(ObjectConsumerPool.java:533)
                  at nexj.core.rpc.pool.ThreadPool$Worker$1$1.run(ThreadPool.java:236)

The resource in the semaphore data often represents the queue ID.
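
Comparing the semaphore data against the messages that are still being processed can be tedious by hand. The following Python sketch parses the `Message:` and `Resource:` lines in the format shown in the example dump and lists messages that hold a semaphore but are not in a set of message IDs you supply (for example, IDs returned by the currently processing query). The function names are hypothetical, and the sketch assumes that the IDs in the dump and the IDs you supply use the same format, which you should verify in your environment.

```python
import re
from collections import defaultdict

# Line formats taken from the example dump above.
MESSAGE_RE = re.compile(r'Message: (#\w+), node: "([^"]*)", acquire time: (.+)')
RESOURCE_RE = re.compile(r'^\s*Resource: (#\w+)')  # the colon skips "Resource #... saturated" lines


def parse_semaphores(dump_text):
    """Return one entry per Message line, with the resources listed beneath it."""
    held = []
    for line in dump_text.splitlines():
        message = MESSAGE_RE.search(line)
        if message:
            held.append({
                "message": message.group(1),
                "node": message.group(2),
                "acquired": message.group(3).strip(),
                "resources": [],
            })
            continue
        resource = RESOURCE_RE.match(line)
        if resource and held:
            held[-1]["resources"].append(resource.group(1))
    return held


def holders_by_resource(held):
    """Group message IDs by the resource whose semaphore they hold."""
    grouped = defaultdict(list)
    for entry in held:
        for resource in entry["resources"]:
            grouped[resource].append(entry["message"])
    return dict(grouped)


def leak_candidates(held, processing_ids):
    """Return messages that hold a semaphore but are not known to be processing."""
    return [entry["message"] for entry in held if entry["message"] not in processing_ids]
```

Treat the output of `leak_candidates` as a starting point for investigation rather than proof of a leak, because a message may finish processing between the time the dump is taken and the time the database is queried.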

For more information about generating resource pool dumps, see Collecting diagnostic data on dynamic resource pools.