Wednesday, October 10, 2007

A Case of Mysterious SUnkOSError

Or: When INVALID_FILE_POINTER doesn't signal an  error.

I was dealing with an interesting problem today.

A customer reported that our software failed to process some specific file. The logged error was "A call to an OS function failed". Hmmm?

As I'm always logging bunch of redundant information, the problem was quickly tracked to the source. It was triggered from my GpHugeF unit, more specifically from the Win32Check method.

procedure TGpHugeFile.Win32Check(condition: boolean; method: string);
begin
if not condition then begin
hfWindowsError := GetLastError;
if hfWindowsError <> ERROR_SUCCESS then
raise EGpHugeFile.CreateFmtHelp(sFileFailed+
{$IFNDEF D6PLUS}SWin32Error{$ELSE}SOSError{$ENDIF},
[method, FileName, hfWindowsError, SysErrorMessage(hfWindowsError)],
hcHFWindowsError)
else
raise EGpHugeFile.CreateFmtHelp(sFileFailed+
{$IFNDEF D6PLUS}SUnkWin32Error{$ELSE}SUnkOSError{$ENDIF},
[method, FileName], hcHFUnknownWindowsError);
end;
end; { TGpHugeFile.Win32Check }

This is a simple wrapper around Win32 API calls in the GpHugeF unit and somehow it got called with condition set to False while GetLastError returned ERROR_SUCCESS. Hmmm again?

A little more tracing showed that in this case Win32Check was wrapped around GetFileSize call. So how could it happen that GetFileSize returned error when there was no error?

GetFileSize is one of weirder Win32 API functions. It takes one parameter, which is an address of a DWORD and returns a DWORD. Lower 32 bits of the file size are returned in the function result while higher 32 bits are stored in the parameter passed via reference. Delphi declars this API as

function GetFileSize(hFile: THandle; lpFileSizeHigh: Pointer): DWORD; stdcall;

If you only need file sizes up to 4294967294 ($FFFFFFFE) bytes, you can pass nil in gthe lpFileSizeHigh parameter. But to be fully compliant with brave new world, you'll better pass an address of some DWORD variable here.

That's all good and well unless GetFileSize needs to signal a problem. Most of Win32 functions that return some integer number (functions that open files, create synchronisation primitives etc) return special value $FFFFFFFF when an error is encountered. Application can then call GetLastError to get more info about the problem.

Do you see the problem yet? $FFFFFFFF is a valid file size. Heck, even $10FFFFFFFF (where $10 is returned in lpFileSizeHigh^ and $FFFFFFFF as a function result) is a valid file size. If GetFileSize returns $FFFFFFFF, how would you know if there was an error or not?

It turns out that you should check GetLastError to be really sure. From the Microsoft's documentation:


If the function succeeds, the return value is the low-order doubleword of the file size, and, if lpFileSizeHigh is non-NULL, the function puts the high-order doubleword of the file size into the variable pointed to by that parameter.

If the function fails and lpFileSizeHigh is NULL, the return value is INVALID_FILE_SIZE. To get extended error information, call GetLastError. When lpFileSizeHigh is NULL, the results returned for large files are ambiguous, and you will not be able to determine the actual size of the file. It is recommended that you use GetFileSizeEx instead.

If the function fails and lpFileSizeHigh is non-NULL, the return value is INVALID_FILE_SIZE and GetLastError will return a value other than NO_ERROR.


So that was my problem. I incorrectly checked for error code. And it's even worse - when I was writing GpHugeF, I was fully aware of this problem but my error checking code was coded incorrectly :( In my defense I should state that the code in question was written in 1998 when it was really hard to test operation on 4 GB files.


I solved the problem with two simple wrappers that return False on failure (two because SetFilePointer API also has this feature).

function TGpHugeFile.HFGetFileSize(handle: THandle; var size: TLargeInteger): boolean;
begin
size.LowPart := GetFileSize(handle, @size.HighPart);
Result := (size.LowPart <> INVALID_FILE_SIZE) or (GetLastError = NO_ERROR);
end; { TGpHugeFile.HFGetFileSize }

function TGpHugeFile.HFSetFilePointer(handle: THandle; var distanceToMove: TLargeInteger;
moveMethod: DWORD): boolean;
begin
distanceToMove.LowPart := SetFilePointer(handle, longint(distanceToMove.LowPart),
@distanceToMove.HighPart, moveMethod);
Result := (distanceToMove.LowPart <> INVALID_SET_FILE_POINTER) or (GetLastError = NO_ERROR);
end; { TGpHugeFile.HFSetFilePointer }

The only remaining question was how they stumbled upon a file that was exactly 4294967295 bytes long. It turned out that they downloaded this file from another location with ftp. File was originally 4,6 GB long but Vista's excellent ftp client truncated it at $FFFFFFFF bytes. So my bug was only found and fixed because of Microsoft's buggy code. Thanks for that bug, Microsoft!


[New version of GpHugeF will be released soon. Really soon. That's a promise!]

4 comments:

  1. > Thanks for that bug

    Agreed. :-)
    Nice post!

    ReplyDelete
  2. Anonymous16:04

    That's an obvious thing.

    ReplyDelete
  3. Every problem is obvious once it's resolved.

    ReplyDelete
  4. Anonymous08:24

    Great detective work, there is a lot of weird parameter passing in the win32 API, thats why i like abstractions such as your own :)
    Avoid microsoft ftp client (and server) whenever you can, its included in all windows but it probably haven't been updated since win95.

    ReplyDelete