Monday, November 02, 2009

Read prefetch in GpHugeFile

There is only one big change in the latest GpHugeFile – read prefetch. Most people won’t need it at all and other will only need it occasionally, but for some people, sometimes, it will be a life saver.

The prefetch option is only useful when you read a file mostly sequentially from a relatively slow media. Useless? You never did that before? Did you ever played a video file from the network server or from the YouTube? Well, there you are!

Playing video files (especially HD) over network is not a trivial task. In some occasions (namely, slow networks or high bitrate files) the network speed is only slightly above the minimum required for the seamless video playout. Even more – the network speed is not constant because you share it with other users and at some times it may not be high enough to play the video without stuttering.

To solve this problem, video players use prefetch (or read-ahead) – they will read more data than required and use this buffer when the network slows down. Better said – video will always play from this buffer but the buffer size will vary depending on current network speed.

So how’s this typically done? One way is with a background thread that sequentially reads through the file and buffers the data and another is with asynchronous read operations. This very powerful approach is part of the standard ReadFileEx Win32 API and is relatively easy to use – you just start the read operation and some time later the system will notify you that the data is available. There are some problems, though, the biggest of them the requirement that your reading thread must be in a special alertable sleep state for this notification to occur.

The third option is not to use threads or asynch file ops, but to pass hfoPrefetch and hfoBuffered flags to the ResetEx. In the same call you can also set the number of prefetched buffers. As for the buffer size – it is also settable with a ResetEx parameters and will be rounded up to the next multiplier of the system page size (async file io requirement) or it will be set to 64 KB if you leave the parameter at 0.

When you se hfoPrefetch, TGpHugeFile will create background thread and this thread will issue asynchronous file io calls. Prefetched data is stored in a cache which is shared between the worker thread and the owner. Unfortunately for some, this option is only available in Delphi 2007 and newer because the worker object is implemented using OmniThreadLibrary.

Maybe you’ll wonder why the thread is not issuing normal synchronous reads? For two reasons – I didn’t want the thread to block reading data when owner executes a Seek (file repositioning will immediately tell the prefetcher that it should start reading from a different file offset) and I wanted to issue multiple read commands at the same time (namely 2).

Enough talk – if you want to learn more, look at the code. I’ll only give you the simplest possible demo:

program Project12;



hf : TGpHugeFile;
buf: array [1..65536] of byte;
bytesRead: cardinal;
bytesTotal: int64;

hf := TGpHugeFile.Create(ParamStr(1));
if hf.ResetEx(1, 0, 0, 0, [hfoBuffered, hfoPrefetch]) <> hfOK then
else begin
bytesTotal := 0;
hf.BlockRead(buf, SizeOf(buf), bytesRead);
Inc(bytesTotal, bytesRead);
until bytesRead = 0;
Writeln('Total bytes read: ', bytesTotal);
finally FreeAndNil(hf); end;


  1. Anonymous19:50

    I entered the above code using OmniThreadLibrary-3.05 and the latest GpHugeF.pas from GpDelphiUnits-master. I got an error Exception Class EAssetionFailed with message Assertion failure GPHugeF.pas line 2603 the actual line of code in GPHugeF.pas is Assert((trans = hfBufferSize) or isEof);

    Any ideas what went wrong the file size was 1.5GB.


    1. Absolutely no idea until you show a small program that reproduces the problem.