Diagnosing Microsoft Exchange Server 2007/2010 w3wp high memory/cpu
This post is largely for future me. I'm fed up with (re)writing/(re)discovering some of these queries. However, I also hope it can help other people.
This post was written specifically whilst I was finishing up with an Exchange 2010 installation. However, it should work verbatim with 2007, and some of the queries may require a little alteration for 2013. If you're still on 2003, I'm sorry.
So your Exchange server has a w3wp instance with high memory and CPU.
If you're on 2010, ensure that you're on a patch level that covers the issue described in KB2800133.
The first step is to find out what the instance is running. Use Task Manager to show the full command line of the instance; the application pool name appears in it. Now check the Windows Event Logs. Is there anything interesting? If not, move on.
Try recycling that AppPool instance. If that doesn't help long term, then we need to start analysing logs.
If you're not shipping your log files to a central location with something like logstash or nxlog, then logparser will be your friend.
If you find that it's the MSExchangePowerShellAppPool, there's probably just a console open somewhere doing a lot of talking, or recently having done a lot of talking. It'll sort itself out shortly.
If it's the MSExchangeSyncAppPool then the odds are good that you have a problem device. To figure out which, make sure that IIS is logging access. If it's not, enable it and wait a day, or at least a few hours if you can't wait that long.
Now, run the IIS logs through logparser with the following query:
SELECT
TOP 500
TO_TIMESTAMP(TO_DATE(date), TO_TIME(time)) as Time,
cs-username as User,
cs(user-agent) as DeviceID,
TO_INT(EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '_RpcC'), 0, '_')) As RPCCount,
sc-status as Status,
sc-substatus as SubStatus,
sc-bytes as Bytes,
DIV(sc-bytes, 1024) AS KBytes,
time-taken,
DIV(time-taken, 1000) as Seconds,
cs-uri-query
FROM 'path\to\log\files\*.log'
WHERE
RPCCount >= 1500
AND cs-uri-query LIKE '%Cmd=Sync%'
AND cs-uri-query LIKE '%Ty:Co%'
ORDER BY Bytes DESC
If you find a user frequently popping up at the top, it's likely their device causing the problem. Disable their ActiveSync privileges, recycle the AppPool and see how things fare. Repeat as necessary.
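For anyone puzzling over the RPCCount column in the query above: as I read it, the nested EXTRACT_SUFFIX/EXTRACT_PREFIX calls take whatever follows the first `_RpcC` in cs-uri-query and stop at the next underscore. Here is a minimal Python sketch of the same idea. The helper name `rpc_count` and the sample query string are made up for illustration, and the sketch assumes the counter is logged as `_RpcC<number>_`.

```python
import re

# Hypothetical helper mirroring the RPCCount expression in the logparser query:
# capture whatever follows the first "_RpcC", up to the next "_".
_RPCC = re.compile(r"_RpcC([^_]*)")


def rpc_count(uri_query):
    """Return the RpcC value from a cs-uri-query string, or None if absent."""
    match = _RPCC.search(uri_query)
    if match is None:
        return None
    try:
        return int(match.group(1))
    except ValueError:
        # The captured text was not a plain integer.
        return None


# Made-up query string, shaped like an ActiveSync log entry (an assumption).
sample = "Cmd=Sync&Log=V4_Ty:Co_LdapC1_RpcC1523_RpcL78_"
print(rpc_count(sample))  # prints 1523
```

This is handy for sanity-checking a handful of log lines by hand before pointing logparser at a whole directory.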
If you find it's a specific user, but you cannot “fix” their device, throttle their device instead, using a throttling policy.
If you find you're not getting anywhere, start looking for users and devices making an unusually high number of requests:
SELECT
TOP 500
cs-username AS User, cs(User-Agent) AS DeviceType,
COUNT(*) as Hits
FROM 'path\to\log\files\*.log'
WHERE cs-uri-stem LIKE '%Microsoft-Server-ActiveSync%'
GROUP BY User, DeviceType
ORDER BY Hits, DeviceType DESC
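If logparser isn't to hand, the grouping in the query above can be roughly reproduced in Python. This is only a sketch under a few assumptions: the log is in W3C extended format with a `#Fields:` header, fields are space-separated, and the function name `hits_by_user_and_device` and the sample lines are invented for illustration.

```python
from collections import Counter


def hits_by_user_and_device(lines):
    """Count ActiveSync requests per (cs-username, cs(User-Agent)) pair."""
    fields = []
    hits = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split(" ")))
        # Case-insensitive match, like LIKE in the logparser query.
        if "microsoft-server-activesync" not in row.get("cs-uri-stem", "").lower():
            continue
        hits[(row.get("cs-username"), row.get("cs(User-Agent)"))] += 1
    # Highest hit counts first.
    return hits.most_common()


# Made-up sample lines; real logs carry many more fields.
sample_log = [
    "#Fields: date time cs-uri-stem cs-username cs(User-Agent) sc-status",
    "2013-05-01 09:00:00 /Microsoft-Server-ActiveSync/default.eas alice Apple-iPhone4C1/1002.329 200",
    "2013-05-01 09:00:01 /Microsoft-Server-ActiveSync/default.eas alice Apple-iPhone4C1/1002.329 200",
    "2013-05-01 09:00:02 /Microsoft-Server-ActiveSync/default.eas bob Android/4.1 200",
    "2013-05-01 09:00:03 /owa/ bob Mozilla/5.0 200",
]

for (user, device), count in hits_by_user_and_device(sample_log):
    print(user, device, count)
```

To run it against a real file, pass an open file object instead of `sample_log`. It reads one file at a time, so logparser remains the better tool for a whole directory of logs.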
If it's the MSExchangeOWAAppPool then you may have someone repeatedly attempting to log into an account. If they've found a real account, it should be locking out. To see which client IPs are making the most OWA requests:
SELECT
TOP 500
c-ip AS IP, cs(User-Agent) AS DeviceType,
COUNT(*) as Hits
FROM 'path\to\log\files\*.log'
WHERE cs-uri-stem LIKE '%/OWA%'
GROUP BY IP, DeviceType
ORDER BY Hits, DeviceType DESC
If you're still not getting anywhere, revisit the Windows Event Logs and check that nothing relevant is showing up in there. If there really isn't anything, start narrowing down the problem.
Try to isolate your Exchange CAS server(s) from the internet temporarily. Does it quieten down? If not, isolate them from the LAN. Does it quieten down? Start looking at the logs in different ways.