Monday, June 18, 2012

Hacking FastMM for Debugging Purposes

Last week I started investigating a weird ‘out of memory’ crash in one of our services. The first problem reports came in from a client, but with some experimenting I managed to reproduce it on the test configuration. The program works nicely for a few hours, then something weird happens and it starts consuming memory, a few megabytes per minute, until it crashes.

Reproducing a problem on the test configuration usually means the battle is half won, but not this time :( I was fighting the most terrible of memory leaks: a live leak. Memory was allocated, stored away in some management structure, and properly disposed of when the program terminated, so FastMM reported no leaks at all. Such problems are always hard to find.

I started randomly checking lists and queues that could grow out of control but no luck. I couldn’t find the culprit. Then I remembered that FastMM comes with a nice usage tracker demo. Maybe I could use that?

A few hours later I had found the size of the memory allocations that were running out of control: they were all in the 1508-byte slot. Still, that was not enough information to proceed. I wanted to see the part of the program that makes those allocations, but how to find it?

Well, are we programmers or not? FastMM itself could tell me this! I started coding (there’s no better way to fix code than to add more, possibly also buggy, code ;) ) and some time later (entirely too much time, but I had to find out how some parts of FastMM work) I had my hacking done. To cut a long story short, here are the modifications I made.

1. In FastMMUsageTracker.pas, I added an OnFixedCellClick event handler to the sgBlockStatistics grid. I also had to enable the goFixedColClick option. By clicking on the fixed cell of a grid row I can now toggle breakpoints for the selected memory block size.

procedure TfFastMMUsageTracker.sgBlockStatisticsFixedCellClick(
  Sender: TObject; ACol, ARow: Integer);
begin
  if ARow > 0 then
    ToggleBreakpoint(ARow-1, 10);
end;

ToggleBreakpoint is a method I introduced into the FastMM4.pas source (more on that below). ARow-1 is the small memory block index (I found that by examining the sgBlockStatistics loader, UpdateFastMM4Data) and 10 is the number of subsequent allocations from that block size that I want to be notified about.

2. In FastMM4.pas I had to edit the TSmallBlockType structure. Before the Reserved2 filler I added my breakpoint counter. TSmallBlockType is a descriptor for one bucket (FastMM memory operations are granular: each memory request size is rounded up to the next bucket size). When the code in ToggleBreakpoint changes BreakpointOnAllocate to something greater than zero, memory allocations from that specific bucket will break into the debugger.

  {$else}
    Reserved1: Pointer;
  {$endif}
    {Added: number of upcoming allocations from this bucket that should break
     into the debugger}
    BreakpointOnAllocate: Cardinal;
  {$ifdef 64Bit}
    {Pad to 64 bytes for 64-bit}
    Reserved2: Cardinal;
  {$endif}
  end;
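
To illustrate what "rounded up to the next bucket size" means, here is a minimal console sketch. The bucket sizes and the BucketIndexFor function are made up for this example; FastMM's real table is larger, uses different values, and is searched through a lookup table rather than a loop.

```pascal
program BucketRoundingSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  { Hypothetical bucket sizes, for illustration only; not FastMM's real table }
  BucketSizes: array[0..4] of Cardinal = (16, 32, 64, 128, 256);

{ Returns the index of the smallest bucket that can hold ASize bytes,
  or -1 when the request does not fit into any bucket }
function BucketIndexFor(ASize: Cardinal): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := Low(BucketSizes) to High(BucketSizes) do
    if ASize <= BucketSizes[I] then
    begin
      Result := I;
      Exit;
    end;
end;

begin
  { 20 bytes do not fit into the 16-byte bucket, so the 32-byte bucket is used }
  Writeln('20 bytes  -> bucket index ', BucketIndexFor(20));
  { An exact fit stays in its own bucket }
  Writeln('64 bytes  -> bucket index ', BucketIndexFor(64));
  { Too large for any bucket in this toy table }
  Writeln('300 bytes -> bucket index ', BucketIndexFor(300));
end.
```

Because every request is served from one of a fixed set of buckets, a per-bucket counter is enough to single out all allocations of one size class.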

3. Then I have added the ToggleBreakpoint method.

procedure ToggleBreakpoint(SmallBlockIndex, RepeatCount: integer);
var
  LPSmallBlockType: PSmallBlockType;
begin
  LPSmallBlockType := PSmallBlockType(AllocSize2SmallBlockTypeIndX4[
    (SmallBlockTypes[SmallBlockIndex].BlockSize - 1) div SmallBlockGranularity]
    * (SizeOf(TSmallBlockType) div 4)
    + UIntPtr(@SmallBlockTypes));
  if LPSmallBlockType.BreakpointOnAllocate = 0 then
    LPSmallBlockType.BreakpointOnAllocate := RepeatCount
  else
    LPSmallBlockType.BreakpointOnAllocate := 0;
end;

This code uses SmallBlockIndex to access the proper TSmallBlockType entry (I copied this complicated calculation from the FastGetMem method) and toggles the BreakpointOnAllocate count.
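
The address arithmetic is easier to follow on a toy model. The sketch below assumes, as the name AllocSize2SmallBlockTypeIndX4 suggests, that the lookup table stores the descriptor index multiplied by four; multiplying that by SizeOf(record) div 4 then yields a byte offset into the descriptor array. TDescriptor, Descriptors, SizeToIndexX4 and Granularity are hypothetical stand-ins, not FastMM declarations.

```pascal
program LookupSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical stand-in for TSmallBlockType; 8 Cardinals = 32 bytes }
  TDescriptor = record
    BlockSize: Cardinal;
    BreakpointOnAllocate: Cardinal;
    Padding: array[0..5] of Cardinal;
  end;
  PDescriptor = ^TDescriptor;

const
  Granularity = 16; { assumed size step between buckets in this toy model }

var
  Descriptors: array[0..3] of TDescriptor;
  SizeToIndexX4: array[0..3] of Byte; { each entry holds descriptor index * 4 }
  I, Slot: Integer;
  P: PDescriptor;
begin
  for I := 0 to 3 do
  begin
    Descriptors[I].BlockSize := Cardinal(I + 1) * Granularity;
    SizeToIndexX4[I] := Byte(I * 4);
  end;

  for I := 0 to 3 do
  begin
    { Slot in the lookup table for this block size }
    Slot := Integer((Descriptors[I].BlockSize - 1) div Granularity);
    { (index * 4) * (SizeOf div 4) = index * SizeOf = byte offset }
    P := PDescriptor(NativeUInt(@Descriptors)
      + NativeUInt(SizeToIndexX4[Slot]) * NativeUInt(SizeOf(TDescriptor) div 4));
    if P = @Descriptors[I] then
      Writeln('Descriptor ', I, ': computed pointer matches')
    else
      Writeln('Descriptor ', I, ': MISMATCH');
  end;
end.
```

In this toy model the computed pointer always equals the address of Descriptors[I], which shows that the calculation is just an indexed array access written as pointer arithmetic.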

4. Finally, I modified FastGetMem and FastReallocMem to break into the debugger if BreakpointOnAllocate is greater than zero, and to decrement that value. I could also have modified FastFreeMem, but as the memory I’m interested in is never freed, that wouldn’t help much. I added this code just after the LPSmallBlockType calculation:

if LPSmallBlockType.BreakpointOnAllocate > 0 then begin
  LPSmallBlockType.BreakpointOnAllocate :=
    LPSmallBlockType.BreakpointOnAllocate - 1;
  asm int 3; end;
end;

As I am using FullDebugMode for testing, I only had to modify the Pascal version of both methods, which helped a lot.

Now I can click on a bucket size in the FastMMUsageTracker and during the next ten allocations/reallocations of that memory size the debugger will pop up.
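
The countdown behaviour can be modelled in isolation. This is only a sketch: it tracks a single hypothetical bucket with a global counter, and NotifyDebugger counts the hits instead of executing int 3, so it runs without a debugger attached. SimulatedAllocate and the other names are invented for the example.

```pascal
program CountdownBreakSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  { Hypothetical counter for a single bucket; the article keeps one per
    TSmallBlockType descriptor }
  BreakpointOnAllocate: Cardinal = 0;
  Hits: Integer = 0;

{ Stand-in for "asm int 3; end;": counts notifications instead of breaking }
procedure NotifyDebugger;
begin
  Inc(Hits);
end;

{ Models the check added to the allocation routines }
procedure SimulatedAllocate;
begin
  if BreakpointOnAllocate > 0 then
  begin
    Dec(BreakpointOnAllocate);
    NotifyDebugger;
  end;
end;

var
  I: Integer;
begin
  BreakpointOnAllocate := 3; { arm the breakpoint for the next three allocations }
  for I := 1 to 5 do
    SimulatedAllocate;
  Writeln('Notifications: ', Hits); { prints "Notifications: 3" }
end.
```

Limiting the number of notifications matters in practice: a busy bucket would otherwise stop the program on every single allocation.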

Using this tool I quickly found out that most of those allocations were coming from the OverbyteIcsWSocket unit, from the TCustomWSocket.PutDataInSendBuffer method, which is called whenever you write something to a socket. [The actual memory allocation occurred in a method called from FBufHandler.Write.]

procedure TCustomWSocket.PutDataInSendBuffer(
  Data : TWSocketData;
  Len  : Integer);
begin
  if (Len <= 0) or (Data = nil) then
    Exit;
  FBufHandler.Lock;
  try
    FBufHandler.Write(Data, Len);
    Inc(FBufferedByteCount, Len);
    bAllSent := FALSE;
  finally
    FBufHandler.UnLock;
  end;
end;

I checked FBufferedByteCount and indeed it was a very large number. It looks like a socket is put into a half-closed state and I’m still sending data to it while nobody is reading. Now I have to solve that mystery, but at least I know where the problem lies.

10 comments:

  1. Excellent!
    I'll keep that handy.
    Can't say enough good about FastMM.

  2. That is an excellent technique. I'm impressed.

    I can't decide whether I want a chance to try it or not...

  3. Note that Pierre is more than willing to accept patches. That's how my FullDebugModeCallBacks addition got into FastMM too (:

    It also allows you to set breakpoints, by routing callbacks from some central FastMM logic to your code.

    I still need to write a decent blog entry on this, need to find some time for that.

    Replies
    1. I know, there are also my suggestions and fixes present in the FastMM4.

      In this case, however, I have not decided yet whether this is an idea that could be used in a general debugging case and how to implement it in a cleaner manner.

  4. Nice. Want to try it soon

  5. why FastMM4 link is linking to itdevcon?

    Replies
    1. I have absolutely no idea how I managed to do that. Fixed now, thanks!
