Monday, 29 August 2011

Memory Leak – Trace it with PoolMon


Memory leak! As I understand it, a memory leak is a block of memory that has been allocated but is no longer required or referenced by an application or service. In other words, the application or process did not release that block of memory when it was no longer needed.

Below are the two common Event IDs (2020 and 2019) that you will see when you have a memory leak issue.
Event ID: 2020
Source: Srv
Description: The server was unable to allocate from the system paged pool because the pool was empty.

Event ID: 2019
Source: Srv
Description: The server was unable to allocate from the system nonpaged pool because the pool was empty.

Additionally, your system may become unresponsive, behave unexpectedly, or hang.

So, how do we find out which culprit keeps “eating” the memory?

There is a tool from Microsoft called PoolMon, which captures and displays the current usage of the paged pool (pp) and nonpaged pool (np). You can get this tool from the Microsoft Support Tools that ship on the operating system CD/DVD.

Below is a screenshot of the options that you can use when running PoolMon:


So, how do you use this tool? First of all, you need to enable tag mode for PoolMon. You can refer to
http://support.microsoft.com/kb/177415 for how to enable tag mode. Basically, you can enable it through the registry or with the Gflags.exe utility.
Note: you do not need to enable tag mode on Windows Server 2003, as it is enabled by default.
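
If you go the registry route, a quick read-only check can confirm whether the flag is set before you start capturing. The sketch below is only an illustration: it assumes the pool tagging bit is 0x400 (the GFlags "Enable pool tagging" flag) in the GlobalFlag value under the Session Manager key, and the function names are my own, not part of any Microsoft tool. On Windows Server 2003 the bit may be clear even though tagging is on, since tagging is enabled by default there.

```python
# Assumption: 0x400 is the "Enable pool tagging" bit in GlobalFlag.
FLG_POOL_ENABLE_TAGGING = 0x400


def pool_tagging_enabled(global_flag: int) -> bool:
    """Return True if the pool tagging bit is set in a GlobalFlag value."""
    return bool(global_flag & FLG_POOL_ENABLE_TAGGING)


def read_global_flag() -> int:
    """Read GlobalFlag from the registry. Windows only, read-only."""
    import winreg  # imported here so the module still loads on other platforms

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "GlobalFlag")
    return int(value)


# Example use on a Windows machine:
#   flag = read_global_flag()
#   print(hex(flag), pool_tagging_enabled(flag))
```

The check is kept separate from the registry read so that the bit test can be reused on a value copied from regedit or from another machine.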

To run PoolMon, I normally use the switches “-u -n <logfile path>” (poolmon.exe -u -n C:\mypoolmontest.txt). This displays the data sorted by bytes and then dumps the output into the log file that you specified.

The screenshot above shows that I’m on the paged pool screen, and the tag currently using the most paged pool is “CM31”.

So, what we need to do is capture this information every 15-30 minutes, and keep doing it for several hours or days. Once you have compiled the data, trace manually which tags have paged or nonpaged pool “Bytes” values that always increase and never come down.
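
The periodic capture can be scripted instead of done by hand. Here is a minimal sketch, assuming poolmon.exe is on the PATH, that it exits by itself once the snapshot file is written, and that the same “-u -n” switches shown earlier are used. The function names, the file naming scheme and the output folder are all hypothetical choices of mine.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path


def snapshot_command(poolmon_exe: str, out_dir: Path, when: datetime) -> list[str]:
    """Build the poolmon command line for one timestamped snapshot file."""
    log_file = out_dir / f"poolmon_{when:%Y%m%d_%H%M%S}.txt"
    # Same switches as in the post; adjust them to match your poolmon version.
    return [poolmon_exe, "-u", "-n", str(log_file)]


def capture(poolmon_exe: str, out_dir: Path, interval_minutes: int = 15,
            samples: int = 8) -> None:
    """Take `samples` snapshots, waiting `interval_minutes` between them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for index in range(samples):
        cmd = snapshot_command(poolmon_exe, out_dir, datetime.now())
        subprocess.run(cmd, check=True)  # raises if poolmon reports an error
        if index < samples - 1:
            time.sleep(interval_minutes * 60)


# Example: 48 snapshots, one every 30 minutes (about a day of data).
#   capture("poolmon.exe", Path(r"C:\poolmonlogs"), interval_minutes=30, samples=48)
```

Timestamped file names sort in capture order, which makes the later backward trace easier.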

What I do is check the last PoolMon capture to see which tag is highest in Bytes, then trace backward to see whether it keeps increasing over time. (That is, as you trace backward through the earlier captures, the Bytes value should get smaller and smaller.)
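
The backward trace can also be automated once you have a few captures. The sketch below assumes each data line in a log has seven whitespace-separated columns (Tag, Type, Allocs, Frees, Diff, Bytes, Per Alloc) with Type being either Paged or Nonp. Check that against your own log files first, because the layout may differ between PoolMon versions and tags containing spaces would not parse this way. The function names are hypothetical.

```python
def parse_snapshot(text: str) -> dict[tuple[str, str], int]:
    """Map (tag, pool type) to Bytes for every data line in one capture."""
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip headers, blank lines and anything not in the assumed layout.
        if len(parts) != 7 or parts[1] not in ("Paged", "Nonp"):
            continue
        tag, pool_type, bytes_field = parts[0], parts[1], parts[5]
        if bytes_field.isdigit():
            usage[(tag, pool_type)] = int(bytes_field)
    return usage


def steadily_growing(samples: list[dict], min_samples: int = 3) -> list[tuple]:
    """Return (key, first_bytes, last_bytes) for tags whose Bytes never drop
    and end higher than they started, largest growth first.
    `samples` must be in capture order, oldest first."""
    suspects = []
    if len(samples) < min_samples:
        return suspects
    for key in samples[-1]:
        series = [sample.get(key) for sample in samples]
        if None in series:  # tag missing from an earlier capture
            continue
        never_drops = all(a <= b for a, b in zip(series, series[1:]))
        if never_drops and series[-1] > series[0]:
            suspects.append((key, series[0], series[-1]))
    suspects.sort(key=lambda item: item[2] - item[1], reverse=True)
    return suspects


# Example: parse the log files in capture order, then list the suspects.
#   samples = [parse_snapshot(p.read_text()) for p in sorted(log_dir.glob("*.txt"))]
#   for (tag, pool), first, last in steadily_growing(samples):
#       print(tag, pool, first, "->", last)
```

A tag that shows up here is only a suspect, not proof of a leak. A busy server can grow legitimately for a while, so the longer the capture period, the more trustworthy the result.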

In my recent case, the MPIO tag kept using more and more nonpaged pool. After digging up more information on the Internet, I found that the MPIO tag refers to the MPIO.sys driver, which handles multipath I/O. This driver was developed by Microsoft and is shipped by vendors with their multipath I/O applications. In my case, I had installed EMC PowerPath, which came with MPIO version 1.21.

I then found out that MPIO.sys version 1.21 has a known memory leak issue, which is fixed in MPIO.sys version 1.23. So I escalated to EMC, and I had to upgrade EMC PowerPath to the latest version to fix the issue, as the latest EMC PowerPath comes with MPIO.sys version 1.23.

In summary, the above method of detecting a memory leak is a bit too manual when it comes to data capture; however, it can still help you become a Memory Leak Hunter…

Cheers… Happy troubleshooting….