This is another annoying MSMQ issue that consumed most of my time. In one of my previous articles, I explained an MSMQ message rejection issue caused by a bug in a Microsoft hotfix and service packs. This time the root cause was harder to find, since no errors were logged on either the client or the server side.
Symptoms:
If you have a WCF service with an MSMQ binding in your application, messages occasionally get stuck in the Outgoing Queues in the ‘Unacknowledged’ state for no apparent reason. The application logs say the messages were successfully delivered to MSMQ. However, those messages are never consumed, and the issue occurs at random.
Analysis of the problem:
When MSMQ messages are stuck in the Outgoing Queues in the ‘Unacknowledged’ state, it means:
The application successfully delivered the messages to the Outgoing Queue.
The Outgoing Queue delivered those messages to the destination queue.
However, because the queues are transactional, the sender MSMQ server waits for an acknowledgement from the destination MSMQ server, and that acknowledgement never arrived.
Therefore, those messages remain in the Outgoing Queue in the ‘Unacknowledged’ state on the sender's side.
The main possible causes are:
1. The destination MSMQ server received the messages and sent the acknowledgement back to the sender MSMQ server, but due to network issues the acknowledgement didn't reach the sender.
2. The destination MSMQ server received the messages but sent the acknowledgement to the wrong IP address.
3. The destination MSMQ server received the messages but couldn't send the acknowledgement because of MSMQ component issues.
In this article, I consider possibility #2 above: the acknowledgement is sent to the wrong IP address. This can happen if you have more than one cloned machine working as a client (MSMQ sender) and they were all created from the same master image.
Root Cause:
If you look at the following registry path on the cloned machines (created from the same image), the QMId registry value is the same on each of them.
HKLM\Software\Microsoft\MSMQ\Parameters\Machine Cache
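If you have several suspect clients, it can help to collect the QMId value from each machine (for example by reading it in regedit at the path above) and check for collisions. The snippet below is only a minimal sketch of that comparison: the host names and GUIDs are made up, and the function name `find_duplicate_qmids` is a hypothetical helper, not part of MSMQ.

```python
from collections import defaultdict


def find_duplicate_qmids(qmid_by_host):
    """Group hosts by QMId and return only the QMIds shared by two or more hosts."""
    hosts_by_qmid = defaultdict(list)
    for host, qmid in qmid_by_host.items():
        # Normalise formatting so the same GUID always compares equal.
        key = qmid.strip().strip("{}").lower()
        hosts_by_qmid[key].append(host)
    return {q: sorted(h) for q, h in hosts_by_qmid.items() if len(h) > 1}


# Hypothetical inventory: host names and GUIDs are invented for illustration.
inventory = {
    "CLIENT-A": "{1B2C3D4E-0000-4000-8000-AAAAAAAAAAAA}",
    "CLIENT-B": "1b2c3d4e-0000-4000-8000-aaaaaaaaaaaa",
    "CLIENT-C": "{9F8E7D6C-0000-4000-8000-BBBBBBBBBBBB}",
}

duplicates = find_duplicate_qmids(inventory)
for qmid, hosts in duplicates.items():
    print(f"QMId {qmid} is shared by: {', '.join(hosts)}")
```

Any QMId reported here belongs to machines that were most likely cloned from the same image.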
The root cause is that when a client communicates with the server, the server caches the sender's QMId value (which comes along with the incoming messages) together with the client's IP address. This cache table is cleared on a regular basis. So, after the first communication, the server sends acknowledgements back to the client by looking up its cache.
Therefore, when more than one cloned client machine from the same image sends messages to the server, the server sends every ‘Acknowledgement’ to the same IP address, the one stored in its cache.
So one client machine receives many additional acknowledgements and discards them, while the other senders never receive acknowledgements for their outgoing messages. Hence those messages remain ‘Unacknowledged’ in the Outgoing Queues.
This explains why outgoing messages are sometimes transferred successfully (during the first communication, or just after the cache has been cleared), yet keep getting blocked on a regular basis.
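To make the behaviour described above easier to follow, here is a toy simulation. It is not how MSMQ is implemented internally; it only models the cache as described in this article. The class `ServerSim`, the IP addresses, and the assumption that the first sender seen for a QMId stays in the cache until the cache is cleared are all illustrative.

```python
class ServerSim:
    """Toy model of the destination server's QMId -> IP address cache."""

    def __init__(self):
        self.cache = {}  # QMId -> IP address learned from the first sender

    def receive(self, qmid, sender_ip):
        # Assumed behaviour: the first sender seen for a QMId populates the
        # cache, and later senders with the same QMId do not overwrite it.
        return self.cache.setdefault(qmid, sender_ip)

    def clear_cache(self):
        self.cache.clear()


def send(server, qmid, sender_ip):
    """Return the state the message ends up in on the sender's side."""
    ack_ip = server.receive(qmid, sender_ip)
    return "acknowledged" if ack_ip == sender_ip else "unacknowledged"


server = ServerSim()
CLONED_QMID = "same-qmid-from-master-image"  # identical on both clones

results = [
    send(server, CLONED_QMID, "10.0.0.1"),  # first contact, gets cached
    send(server, CLONED_QMID, "10.0.0.2"),  # ack is sent to 10.0.0.1 instead
    send(server, CLONED_QMID, "10.0.0.1"),  # still matches the cache
]
server.clear_cache()
results.append(send(server, CLONED_QMID, "10.0.0.2"))  # now this clone is cached
results.append(send(server, CLONED_QMID, "10.0.0.1"))  # and this one is starved
print(results)
```

Which clone gets its acknowledgements depends only on who reaches the server first after the cache is cleared, which matches the random pattern seen in production.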
Resolution:
1. Locate the “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSMQ\Parameters\Workgroup” registry entry and make sure it is set to “1” on each cloned machine
2. Stop the MSMQ service
3. Clear the QMId value completely
4. Add a SysPrep DWORD (under HKLM\Software\Microsoft\MSMQ\Parameters) and set it to 1
5. Start the MSMQ service
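If you need to repeat the stop, clear, SysPrep, and start steps on many cloned machines, they could be scripted. The following is a rough sketch only, with several assumptions: it interprets "clear the QMId value" as deleting the value with `reg delete`, it uses the registry paths exactly as written in this article (verify them on your machine first), it assumes the service name is `MSMQ`, and it leaves the Workgroup check as a manual step. It runs in dry-run mode by default and only prints the commands; running them for real requires an elevated prompt on the affected Windows machine.

```python
import subprocess

# Registry paths as written in this article; verify them before running.
PARAMETERS_KEY = r"HKLM\Software\Microsoft\MSMQ\Parameters"
MACHINE_CACHE_KEY = PARAMETERS_KEY + r"\Machine Cache"


def build_fix_commands():
    """Return the commands for the stop / clear QMId / SysPrep / start steps."""
    return [
        ["net", "stop", "MSMQ"],
        # Assumption: "clearing" QMId is done by deleting the value.
        ["reg", "delete", MACHINE_CACHE_KEY, "/v", "QMId", "/f"],
        ["reg", "add", PARAMETERS_KEY, "/v", "SysPrep",
         "/t", "REG_DWORD", "/d", "1", "/f"],
        ["net", "start", "MSMQ"],
    ]


def apply_fix(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_fix_commands():
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


apply_fix(dry_run=True)
```

Keeping the dry run as the default makes it possible to review the exact commands before touching the registry.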
Next Steps:
Make sure to exclude the MSMQ component when you clone a machine.
Reference:
http://blogs.msdn.com/b/johnbreakwell/archive/2007/02/06/msmq-prefers-to-be-unique.aspx