Tuesday, January 30, 2007

Microsoft Keyboard Layout Creator 1.4 has been released

If you need it, download it from here.

If you don't know it - MKLC is a great tool that will help you create customized keyboards (with some changes or a completely scrambled layout) in minutes. It's also a great programming tool - you can make sure that both important special characters (i.e. []{}) and national characters are easily accessible.

[Source: Sorting It All Out]

Friday, January 26, 2007

Fast binary searching, part 2: Hard data

This is an update on my Faster than the speed of binary search entry. Maybe it would be good to refresh your memory before proceeding, brave reader ...

Last time I promised some hard data for multi-GB files. I finally had some time to do the measurements. I was doing the usual space & time profiling (we are talking about 30-60 minute execution times here, so each 10% helps a lot) and this was a natural extension of those tests.

Sadly (from the viewpoint of the prospective author that introduced the improved binary search algorithm into the application), I have redesigned the most time-consuming part of our app, and the biggest list (requiring the most lookups) is not used anymore. Therefore, the numbers are not as large as I expected three months ago. The data is still interesting, especially as the analysis exposed two anomalous cases.

Let's first take a look at a 'normal' case. The list had between 1,596,266 and 1,803,027 entries (lists are live and in this case entries were removed from the list during execution) and the program executed 171,191 searches. Standard binary search needed 20.8 rounds (bisection steps) on average to resolve a search query, while my 'biased' version needed only 6.7 rounds. A more detailed distribution of the number of rounds is shown in the following graph.

standard vs. biased search

All good and well then, but this was exactly the situation for which the algorithm was designed. Let's take a look at the first anomalous case, where biased search exhibits really bad behaviour.

anomalous behaviour, bad biased search

As you can see, the list had from 0 to 37,300 entries (0 at the beginning) and there were 37,301 searches. In fact, each addition to the list was followed by a search request, which always found the result somewhere in the last 30 entries. Standard binary search needed 13.9 rounds on average while the biased version needed 13.5 rounds, which is almost the same (and is in fact slower, due to the slower inner loop of the biased version).

The distribution of rounds in biased search is also weird. It had to be either a bug in the algorithm or a weird data pattern. It turned out to be the latter. In this case, the list contained data in clusters, with big differences between the clusters. Something like this list with 40 entries: 1, 2, 3 ... 20, 1,000,000, 1,000,001 ... 1,000,019 (notice the big jump between 20 and 1,000,000). Even worse, the code tries to find the smallest number (or a number very close to it) in the last cluster (1,000,000 in this case). My biased version first sets the lower and upper bounds of the search and then calculates the bias (1,000,000 - 1)/(1,000,019 - 1), which is in this case 0.999981, or almost 1 [see the code in the previous article]. That causes the selected 'midpoint' to be equal to the upper bound - 1. The search loop is then repeated, the bias is almost the same and the upper bound is again reduced by 1. This repeats until the upper bound reaches the element we are searching for.
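To make the degeneration concrete, here is a small Python sketch (not the actual Delphi code from the previous article, just the same idea) of the biased search, run on exactly this kind of two-cluster data, with a round counter added:

```python
# Python sketch of the biased (interpolation-weighted) binary search,
# instrumented to count bisection rounds. The two-cluster test data
# mirrors the example in the text: 1..20 followed by 1000000..1000019.

def biased_search(a, value):
    """Biased binary search; returns (found, index, rounds)."""
    low, high = 0, len(a) - 1
    rounds = 0
    found = False
    while low <= high:
        rounds += 1
        if low == high:
            mid = low
        else:
            # weighted 'midpoint' - this is the bias calculation
            bias = (value - a[low]) / (a[high] - a[low])
            mid = low + round(bias * (high - low))
            mid = min(max(mid, low), high)
        if a[mid] < value:
            low = mid + 1
        else:
            high = mid - 1
            if a[mid] == value:
                found = True
                low = mid
    return found, low, rounds

# Two clusters with a huge gap; search for the start of the last cluster.
data = list(range(1, 21)) + list(range(1_000_000, 1_000_020))
found, idx, rounds = biased_search(data, 1_000_000)
```

On this data the bias stays at almost 1 and the upper bound shrinks by one per round, so the search needs 20 rounds for a 40-element list - one per element of the upper cluster - where a plain binary search would need at most 6.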

As you've seen, the search degrades to a simple linear search (with complicated calculations behind it) in this case. That is good when we want to find the last element (which happens when the last cluster has only one element) or an element very near the end of the list (when the last cluster is small), but it degrades quite badly when the last cluster is large.

The second anomalous case is actually a special case of the first one, but here the cluster was always just one element big. In other words, the code always searched for the largest value in the list. In this case, biased search almost always retrieved the correct number on the first attempt, which is significantly faster than the standard binary search (16.5 rounds on average).

anomalous behaviour, good biased search

So what can we do to improve the situation? One way is to add a special search mode for my 'weird' data, but I really dislike this idea, as I don't know what kind of data will be passed to my (fairly generic) classes in the future.

Another idea was to limit the maximum bias. In standard binary search, the bias is always 0.5 (midpoint between low and high bound) while in the problematic case the bias is almost 1. If we limit the maximum bias to, say, 0.9, the code never degenerates into linear search.
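The clamp itself is a one-line change to the midpoint calculation - sketched here in Python (the 0.9 limit and the symmetric lower clamp are assumptions, not values from the original code):

```python
# Hypothetical clamped-bias midpoint: the interpolation weight is
# limited to [0.1, 0.9] so the probe can never stick to a bound and
# degenerate the search into a linear scan.

MAX_BIAS = 0.9  # assumed limit; tune for your data

def clamped_midpoint(a, value, low, high):
    """Return the next probe index between low and high (low < high)."""
    bias = (value - a[low]) / (a[high] - a[low])
    bias = min(max(bias, 1.0 - MAX_BIAS), MAX_BIAS)  # clamp both ends
    return low + round(bias * (high - low))
```

With the clamp, searching for 1,000,000 in the clustered list probes index 9 of an 11-element range instead of the last index, so each round still discards a fixed fraction of the interval.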

When such clamping is applied, the first anomalous case behaves much better (the average number of rounds drops to 8.2).

anomalous behaviour, bad biased search, clipping

That's fine, but how would that change affect the other two cases? The 'normal' case behaves worse than before (the average number of rounds went from 6.7 to 9.6).

standard vs. biased vs. clipped biased search

That could be acceptable, but in the second anomalous case the number of rounds increases by almost a factor of 6.

anomalous behaviour, good biased search, clipped

It's clear that this simple solution indeed improves the worst case but degrades all other searches by a noticeable amount.

I had other ideas on how to improve the search algorithm, but in the end I left it as it is now (the 'biased' version). After all, in the worst case biased search still makes fewer rounds than standard search; in the normal case, biased search works as expected; and in the best case, biased search works much, much better than the standard search.

I hope that I made my point clear - always, but really always, analyse the program behaviour on real data, not on test cases.

The other point that is not so obvious is - Excel 2007 charting capabilities rock :)

 

Thursday, January 25, 2007

There is no good place to advertise Delphi blog ...

Craig Stunz kindly noticed my blog (thanks, Craig) and in a private discussion (because commenting on teamb.com blogs still doesn't work) we talked about ways to tell other Delphi programmers about your new blog.

A few possibilities came up - the nontech newsgroup, the DelphiFeeds aggregator (my ideas), the community homepage and 'Get Published' on CDN (his ideas) - but they are all flawed: many developers don't read any of those ...

What we really need is a Delphi wiki, maintained by CodeGear and linked from the welcome page. Plus an RSS feed for this wiki, so that one can follow updates without too much trouble ... (Yes, I know that there is already a Delphi Wiki, but I don't think it has a large audience. And the blog list there is quite short.)

CodeGear, if anybody is listening ... could something like this happen?

Using interfaces for multithreading synchronisation

This seems to be an interesting idea so I'm noting it down for possible future work on the subject.

 


Recovering from Windows Update errors

Mostly for my own backup - here are two links to articles on how to make misbehaving Windows Update work again.

Michael Kaplan: Maybe your computer doesn't like you?

Scott Hanselman: Resetting Microsoft Update - Error 0x8024001D

 


Tuesday, January 23, 2007

Viewing composite files in Total Commander

[This is the third and probably last article in the 'embedded file system' series. If you want to refresh your memory, or if you're new to my blog, read the previous two installments here and here. You may also want to refresh your copy of the GpStructuredStorage unit, as it was updated after the last article in the series was published.]

I'm a great fan of Total Commander. I've been using it for who-knows-how long - I started in Windows 3.1 times, when it was still called Windows Commander. Even before that, in DOS, I was using Norton Commander. You'd never guess, huh? ;)

One of the things I like about Total Commander is its ability to work with various packed formats as if they were a part of the file system. You just press Enter and a ZIP file opens as if it were a normal folder, and you can use the standard Copy, Move, Rename etc. commands on files inside the archive. Great stuff.

Another thing I like about Total Commander is its extensibility. The author has documented four types of extension interfaces (to access packed files, foreign file systems, display new file formats and extract data from files). You have to write a DLL exporting a few simple (or not so simple) functions and plunk it into the TC plugins folder. And it "just works" after that.

The third thing I like about Total Commander is the fact that header files for plugins are all available in .h and .pas format, making Delphi implementation a snap :)

File system or packed file?

For quite some time I couldn't decide whether I should implement support for GpStructuredStorage composite files as a file system plugin or as a packed file (archive) plugin. In the end, two things prevailed - packed file plugins are simpler to write and they are simpler to use (in Total Commander).

I started by browsing the WCX packer plugins documentation (WCX is the extension for packer plugin DLLs). As a WCX is a simple DLL with only stdcall functions, there are no problems implementing it in Delphi. Great. I also found out that I don't have to implement much, as all plugin functions are accessed dynamically and I can just ignore the ones I don't need. Even better!

It turns out that there are three classes of functions - for reading, writing, and packing in memory (as opposed to packing directly to a file). I only had to implement the first batch of functions, as I only need read access to composite files. (Yes, writing to composite files would sometimes be nice, too, but I haven't had time to implement it. You are welcome to improve my packer in that direction, if you have both time and knowledge.)

The WCX documentation told me that I only have to implement six functions:

function OpenArchive(var ArchiveData: TOpenArchiveData): THandle; stdcall;
function ReadHeader(hArcData: THandle; var HeaderData: THeaderData): integer; stdcall;
function ProcessFile(hArcData: THandle; Operation: integer; DestPath, DestName: PChar): integer; stdcall;
function CloseArchive(hArcData: THandle): integer; stdcall;
procedure SetChangeVolProc(hArcData: THandle; pChangeVolProc: TChangeVolProc); stdcall;
procedure SetProcessDataProc(hArcData: THandle; pProcessDataProc: TProcessDataProc); stdcall;

I also chose to implement two additional functions:

function GetPackerCaps: integer; stdcall;
function CanYouHandleThisFile(FileName: PChar): boolean; stdcall;

GetPackerCaps is really simple - it informs Total Commander of your plugin's capabilities. I specified the options PK_CAPS_BY_CONTENT (TC should detect composite files by content, not by extension) and PK_CAPS_SEARCHTEXT (searching inside composite files is allowed).

function GetPackerCaps: integer;
begin
  Result := PK_CAPS_BY_CONTENT + PK_CAPS_SEARCHTEXT;
end; { GetPackerCaps }

The tricky part here is that this function is called only when the plugin is installed. If you later change the plugin capabilities and recompile, the new capabilities won't be detected! You have to remove the plugin from TC and reinstall it so that GetPackerCaps is called again.


If there were a standard extension for GpStructuredStorage composite files, I could just register it with Total Commander and skip the CanYouHandleThisFile function. As there is no such extension, I had to write it, but it was really simple, as GpStructuredStorage already contains a function to check for a valid composite file:

function CanYouHandleThisFile(FileName: PChar): boolean; stdcall;
begin
  Result := CreateStructuredStorage.IsStructuredStorage(FileName);
end; { CanYouHandleThisFile }

Now I can just select any composite file and press Enter or Ctrl+PgDn to open it inside TC. Of course, I still need to implement a few functions to actually read the composite file ...


Implementing read access


To allow Total Commander to list the contents of a composite file, we need to implement the OpenArchive, ReadHeader, ProcessFile, and CloseArchive functions. An example from the WCX plugin documentation shows how they are used in TC to access the archive contents:



OpenArchive()          with OpenMode==PK_OM_LIST
repeat
   ReadHeader()
   ProcessFile(...,PK_SKIP,...)
until error returned
CloseArchive()


OpenArchive must return a unique handle to the archive - an opaque object that is passed to ReadHeader, ProcessFile, and CloseArchive so that they know which archive they are working on. In theory, we could return an IGpStructuredStorage instance converted to a THandle, but there is a problem with this - TC expects the contents of an archive to be a simple list of files, while we have a directory structure stored inside composite files.


Somehow, we have to walk the directory structure inside the composite file and convert it into a flat list. This could, for example, be done in the OpenArchive call. Still, we have to keep both this list and the IGpStructuredStorage interface around, as we may have to extract data from the file later.
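The flattening itself is just a recursive walk. Here is a Python sketch of the idea; the `files(path)`/`folders(path)` interface is a hypothetical stand-in, not the real IGpStructuredStorage API:

```python
# Sketch: flatten a composite file's folder tree into the flat
# 'folder/sub/name' list that TC expects. `storage` is assumed to
# expose files(path) and folders(path) - a hypothetical interface
# used only for illustration.

def flatten(storage, path=""):
    """Return every file in the tree as a flat list of full names."""
    entries = []
    for name in storage.files(path):        # files in this folder
        entries.append(f"{path}{name}")
    for folder in storage.folders(path):    # then recurse into subfolders
        entries.extend(flatten(storage, f"{path}{folder}/"))
    return entries
```

Each entry in the resulting list then maps to one ReadHeader/ProcessFile round in the TC loop above.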


If the user copies a file from an archive, TC executes a loop similar to the one above:



OpenArchive()          with OpenMode==PK_OM_EXTRACT
repeat
   ReadHeader()
   if WantToExtractThisFile()
      ProcessFile(...,PK_EXTRACT,...)
   else
      ProcessFile(...,PK_SKIP,...)
until error returned
CloseArchive()


Because some archive formats have neither a directory nor direct access to files (tar comes to mind), TC reads the header of each file and then either skips or extracts it.


Therefore I have created a global list containing TGpStructuredStorageManipulator objects, each holding a reference to an IGpStructuredStorage interface plus the internal state required for file enumeration. The WCX DLL functions are just mappers into the global list and manipulator objects:

function OpenArchive(var ArchiveData: TOpenArchiveData): THandle; stdcall;
begin
  Result := GStructuredStorageList.OpenArchive(ArchiveData.ArcName, ArchiveData.OpenResult);
  ArchiveData.CmtBuf := nil;
  ArchiveData.CmtBufSize := 0;
  ArchiveData.CmtSize := 0;
  ArchiveData.CmtState := 0;
end; { OpenArchive }

function ReadHeader(hArcData: THandle; var HeaderData: THeaderData): integer; stdcall;
var
  manip: TGpStructuredStorageManipulator;
begin
  manip := GStructuredStorageList.LocateHandle(hArcData);
  if not assigned(manip) then
    Result := E_NOT_SUPPORTED
  else if manip.GetNextHeader(HeaderData) then
    Result := 0
  else
    Result := E_END_ARCHIVE;
end; { ReadHeader }

function ProcessFile(hArcData: THandle; Operation: integer; DestPath, DestName: PChar): integer; stdcall;
var
  manip  : TGpStructuredStorageManipulator;
  outFile: TFileStream;
begin
  if (Operation = PK_SKIP) or (Operation = PK_TEST) then
    Result := 0
  else begin
    manip := GStructuredStorageList.LocateHandle(hArcData);
    if not assigned(manip) then
      Result := E_NOT_SUPPORTED
    else begin
      try
        outFile := TFileStream.Create(string(DestPath) + string(DestName), fmCreate);
        try
          Result := manip.ExtractCurrentTo(outFile);
        finally FreeAndNil(outFile); end;
      except
        on E: EFCreateError do
          Result := E_ECREATE;
      end;
    end;
  end;
end; { ProcessFile }

function CloseArchive(hArcData: THandle): integer; stdcall;
begin
  Result := GStructuredStorageList.CloseArchive(hArcData);
end; { CloseArchive }

If you browse the code, you'll notice that I decided against creating a long list of all files and folders inside the composite file in the OpenArchive procedure. Instead, I'm keeping the internal enumerator state in the TGpStructuredStorageManipulator object, and the GetNextHeader method of this class uses that state to walk over the composite file. It seemed neater that way.


Oh, and what about SetChangeVolProc and SetProcessDataProc? They just set callback functions that we don't need. Read more about them in the WCX plugin documentation.


Writing to composite files


Adding write and delete access to the plugin should be easy - one only has to implement PackFiles and DeleteFiles. The implementation should be quite straightforward, as they don't use the 'iterate until you find the correct file' approach. Instead, TC sends the full file name (with folders and everything) to the procedure. But as I had no need for write access, I haven't implemented that part yet. Feel free to do it instead of me (and don't forget to update GetPackerCaps accordingly :) ).

Tuesday, January 09, 2007

No Delphi content here, just woodworking ...

No, I didn't forget that I'm writing a blog. Yes, it has been a long time since I wrote anything here. I am really sorry for that. I'm just fixing a thousand things all at once in an important part of our software (and I intend to blog about some aspects of that - as soon as I can find a little time to prepare the article).

In the meantime, I'd like to show you a little of what I do in my free time - make stuff from wood. No real photos today (for the curious, there is a gallery on my web site), just two SketchUp models.

First, a table I've been building for my wife's sister and her family. It will be made of ash and is currently in the preparation phase - half of the wood has been planed and cut to size and the other half is still in raw form. The dimensions are 100 x 80 x 35 cm and I think it will weigh around 50 kg (yep, lots of solid wood here).

Another project got stuck in the design phase - my 'customer' (I only work for friends and family, I'm not selling products on the open market) decided that she will continue to live without a coffee table.

It ought to be a simple wood-and-glass table, looking roughly like this (all rough edges are courtesy of my inability to draw nice curves with SketchUp).

If somebody wants a deeper look, I can provide the SketchUp source files. 


Friday, December 01, 2006

Installing BDS Update 2 on Vista

A few days ago I had an opportunity to test BDS 2006 on Vista RTM (I was writing an article on Vista for the Monitor magazine and did some personal tests when all the other work was finished). Installing BDS was not a problem, but Update 2 stubbornly refused to find the BDS installation. All I was getting was this error:

Setup cannot continue. This is Update 2 setup for Borland Developer
Studio 2006 Architect Edition. Please cancel this and install 
Edition of Update 2

(Sorry, no picture - I forgot to take a screen shot at the time.) There was a double space between 'install' and 'Edition', so obviously something went very wrong with the installer and it no longer knew what it was supposed to do.

Soon I found out that I'm not the only one with the problem. I also found a workaround on the Borland support site. As I was already very skeptical about the whole UAC thing, I switched it off without a thought.

This is how you do it: Click the Start button, type user in the search box and run the User Accounts applet. Click the Turn User Account Control on or off link. Uncheck Use User Account Control (UAC) to help protect your computer. Click OK. Be aware that a restart is required for the setting to take effect.

After installing BDS 2006 Update 2 you can turn UAC back on with a similar procedure - but that's up to you. I left it disabled. Hate the damn thing.

To return to my installation problems - this workaround didn't work for me. I was not getting that error anymore, but it got even worse - Update 2 just closed and Vista reported something like "Microsoft Installer found a problem and cannot continue". I tried a few times (stubborn programmer) but the result was always the same. I even rebooted and retried, and then I got lucky - Update 2 still closed, but Vista displayed an error hint in the taskbar saying that my app was closed due to DEP protection. (Yes, I was testing Vista on an AMD64 processor, which has hardware DEP support built in. No, I was running 32-bit Vista.)

That's something, but how do we disable DEP? Vista, like Windows XP, has only two options - enable DEP for all system programs (the default) or enable DEP for all programs except the ones in the exclusion list. (DEP settings are hidden in System Properties, Performance Options, in case you can't find them.)

I searched MSDN and found out that I have to disable DEP during boot. Usually (since NT times) one would edit the c:\boot.ini file. Surprise, surprise - this file is gone in Vista! To edit boot configurations, you have to use the command-line tool bcdedit. The documentation on it is not very well organized; the best starting point is probably Boot Options for Driver Testing and Debugging.

It turns out it's easiest to create a new configuration with the /copy switch:

C:\Users\gabr>bcdedit /copy {current} /d "DEP disabled"
The entry was successfully copied to {69ef4ad0-8123-11db-aa33-0015c53ad3ed}.

I based the new configuration on the current one (which is the simplest approach to creating a new configuration). Bcdedit returned the GUID of the new configuration. We'll need this GUID to modify it:

C:\Users\gabr>bcdedit /set {69ef4ad0-8123-11db-aa33-0015c53ad3ed} nx AlwaysOff
The operation completed successfully.

The 'nx AlwaysOff' setting disables execution prevention (see BCDEdit /set for more parameters).

We can verify that the configuration really got created:

Windows Boot Manager
--------------------
identifier              {bootmgr}
device                  partition=C:
description             Windows Boot Manager
locale                  en-US
inherit                 {globalsettings}
default                 {current}
resumeobject            {53a24ff1-7f3d-11db-9cd4-a2298b078311}
displayorder            {current}
                        {69ef4ad0-8123-11db-aa33-0015c53ad3ed}
toolsdisplayorder       {memdiag}
timeout                 30

Windows Boot Loader
-------------------
identifier              {current}
device                  partition=C:
path                    \Windows\system32\winload.exe
description             Microsoft Windows Vista
locale                  en-US
inherit                 {bootloadersettings}
osdevice                partition=C:
systemroot              \Windows
resumeobject            {53a24ff1-7f3d-11db-9cd4-a2298b078311}
nx                      OptIn

Windows Boot Loader
-------------------
identifier              {69ef4ad0-8123-11db-aa33-0015c53ad3ed}
device                  partition=C:
path                    \Windows\system32\winload.exe
description             DEP disabled
locale                  en-US
inherit                 {bootloadersettings}
osdevice                partition=C:
systemroot              \Windows
resumeobject            {53a24ff1-7f3d-11db-9cd4-a2298b078311}
nx                      AlwaysOff

Alternatively, you can check System Properties, Startup and Recovery.

 

(I apologize for the quality of the picture - that's purely Windows Live Writer's doing.)

If you make those changes and reboot the system, it will display the usual NT-style 'select your configuration' prompt. I did that, booted without DEP and could finally install Update 2. Then I rebooted back with DEP enabled, and BDS 2006 works just fine.

Sadly, I had to return the test computer the next day, so now I'm running Vista only on my Dell notebook. At least BDS works fine here, too.

Wednesday, November 15, 2006

Bulk update

I have updated a few of my BSD-licensed units. Enjoy.

GpLists 1.22a

GpHugeFile 4.0b

GpStructuredStorage 1.11

GpSync 1.19a

  • Raise exception if caller tries to post a message that is larger than the message queue.

GpTextFile 4.0a

  • Bug fixed [found by Brian D. Ogilvie]: When cfUseLF was set, /CR/ was used as line delimiter in Writeln (/LF/ should be used).

GpTextStream 1.04a

  • Bug fixed: When cfUseLF was set, /CR/ was used as line delimiter in Writeln (/LF/ should be used).

GpTimezone 1.21b

  • Modified UTCTo[TZ]LocalTime/[TZ]LocalTimeToUTC functions to automatically execute FixDT on the result.

Friday, October 27, 2006

Faster than the speed of binary search

[Updated 2006-10-28, new text and code at the end.]

Everybody (at least everybody competent enough to be worth mentioning) knows how to search a sorted array with the binary search algorithm. But everybody (or almost everybody) thinks that this is the fastest approach to searching. Is everybody (or almost everybody) correct on that topic? It depends.

In general, the answer is yes - binary search is fastest. But that only holds if we know nothing about the data stored in the array - in other words, if we pretend that it is random (but sorted, of course).

Want an example? Take an array [1..99] of byte. Fill it with numbers in the range 1..100 but leave one (random) number out. Use each number only once. Sort it. You'll get an array with the first N numbers in the same position as the array index (A[i] = i for all i in 1..N) and the rest of it off by one (A[i] = i+1 for all i in N+1..99). Do you think that binary search is the fastest possible search in that case? Of course not - to find a random number (say m) you only have to check A[m] and A[m-1]. Two lookups, that's all.
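The same trick, sketched in Python (0-based lists, so the two probes shift by one relative to the 1-based Pascal array above):

```python
# Sketch of the 'two lookups suffice' example: a sorted list of 1..100
# with one value left out. Before the gap a[i] == i + 1; after it
# a[i] == i + 2 (0-based indices), so m can only sit at m-1 or m-2.

def make_array(missing):
    """1..100 sorted, with one value left out (99 elements)."""
    return [n for n in range(1, 101) if n != missing]

def two_lookup_find(a, m):
    """Locate m with at most two probes; -1 if m is the missing value."""
    for i in (m - 1, m - 2):  # probe both candidate positions
        if 0 <= i < len(a) and a[i] == m:
            return i
    return -1
```

Knowing the structure of the data beats the O(log n) of binary search with a flat two probes - which is exactly the point of what follows.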

Are there more general guidelines to speeding up a binary search? Yes, and it's quite simple - we only have to be able to guess the position of the value we're searching for better than with the "it's somewhere in the middle" approach. In a way, that's exactly what we did in the example above.

This approach is very simple to use if the data in the array is distributed uniformly. In that case, we can take the values at the low and high ends of the interval (not the indices as in binary search, but the values at those indices) and use linear interpolation to guess a possible position of the value.

Something similar happened to me last week. I was solving a DVB-related problem and built an array of video fragments (video PES, if you know anything about DVB internals) in a video file (actually I only stored the file offset of each fragment). Later, I needed to find (many times) the fragment closest to some file offset (every time a different offset). Of course, I could use a simple binary search, but it was such a nice opportunity to try out a better approach and run some tests ...

For test data I used a small, 40 MB video file. There were 51,737 entries in the array and the code executed 3,128 searches. With standard bisection, the code did 15.45 rounds through the bisection loop per search (on average). With interpolated search, this number decreased to 4.13 rounds per search. That's better by a factor of 3.7! And in real life, the file will be some 50x larger - around 2 GB - and the difference will be even bigger.

The number of rounds/number of searches distribution for both approaches:

We can see that standard binary search needed 16 rounds for the majority of the searches and almost no searches were completed in fewer than 15 rounds.

On the other hand, interpolated binary search shows an almost Gaussian distribution from 2 to 6 rounds.

But as the code is more complicated (see below), the total speedup was not that impressive - only somewhere around 1.6 times. Still, it was a personal satisfaction to see the theoretically better code run faster in reality, too :)

function TGpTSPIDIndex.Find(transportStreamOffset: int64; var index: integer): boolean;
var
  L, H, I, C: integer;
begin
  {More work is needed to check for boundary conditions}
  if Count = 0 then begin
    index := 0;
    Result := false;
  end
  else if transportStreamOffset <= Chains[0].TransportStreamOffset then begin
    index := 0;
    Result := (transportStreamOffset = Chains[0].TransportStreamOffset);
  end
  else if transportStreamOffset = Chains[Count-1].TransportStreamOffset then begin
    index := Count-1;
    Result := true;
  end
  else if transportStreamOffset >= Chains[Count-1].TransportStreamOffset then begin
    index := Count;
    Result := false;
  end
  else begin
    Result := false;
    L := 0;
    H := Count - 1;
    while Chains[L].TransportStreamOffset <= Chains[H].TransportStreamOffset do begin
      //instead of
      //I := (L + H) shr 1;
      //we're using
      if L = H then
        I := L
      else begin
        I := Round((transportStreamOffset - Chains[L].TransportStreamOffset) /
          (Chains[H].TransportStreamOffset - Chains[L].TransportStreamOffset) * (H-L)) + L;
        if I < L then
          I := L
        else if I > H then
          I := H;
      end;
      //offsets are int64 numbers
      C := Int64Compare(Chains[I].TransportStreamOffset, transportStreamOffset);
      if C < 0 then
        L := I + 1
      else begin
        H := I - 1;
        if C = 0 then begin
          Result := true;
          L := I;
        end;
      end;
    end;
    index := L;
  end;
end; { TGpTSPIDIndex.Find }


[Updated 2006-10-28]


While thinking about a comment posted by an anonymous reader, I looked at the code from a new angle and suddenly realized that I'm an idiot. The 'while' condition in the bisection loop in my version should not differ from the condition in the standard binary search! When I put this right, I could suddenly remove all the extra tests at the beginning, as they were no longer needed. The new code (tested to work the same as the old code and as the standard binary search - at least on my data) is posted below.


A word of warning: all this works only if there are no repeated elements in the sorted array! If this condition is not satisfied, you'll soon get a 'division by 0' exception in the calculation of the weighted midpoint.

function TGpTSPIDIndex.Find(transportStreamOffset: int64; var index: integer): boolean;
var
  L, H, I, C: integer;
begin
  Result := false;
  L := 0;
  H := Count - 1;
  while L <= H do begin
    //standard binary search
    //I := (L + H) shr 1;
    //weighted version
    if L = H then
      I := L
    else begin
      I := Round((transportStreamOffset - Chains[L].TransportStreamOffset) /
        (Chains[H].TransportStreamOffset - Chains[L].TransportStreamOffset) * (H-L)) + L;
      if I < L then
        I := L
      else if I > H then
        I := H;
    end;
    C := Int64Compare(Chains[I].TransportStreamOffset, transportStreamOffset);
    if C < 0 then
      L := I + 1
    else begin
      H := I - 1;
      if C = 0 then begin
        Result := true;
        L := I;
      end;
    end;
  end;
  index := L;
end; { TGpTSPIDIndex.Find }
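If repeated elements can occur in your data, one possible guard (sketched here in Python, not part of the code above) is to fall back to the plain binary-search midpoint whenever the boundary values are equal, since the interpolation would otherwise divide by zero:

```python
# Sketch of a duplicate-safe probe for the weighted search: when the
# values at both bounds are equal, the interpolation denominator is 0,
# so degrade gracefully to the classic midpoint.

def safe_midpoint(a, value, low, high):
    """Interpolated probe index that survives a[low] == a[high]."""
    if a[low] == a[high]:
        return (low + high) // 2  # classic binary-search midpoint
    bias = (value - a[low]) / (a[high] - a[low])
    mid = low + round(bias * (high - low))
    return min(max(mid, low), high)
```

A run of equal elements then costs a few extra halving rounds instead of raising an exception.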

Tuesday, October 17, 2006

Delphi to the rescue (and one little Win32 API call)

A new day, a new hardware to support ...

I plugged the board (a DVB receiver/transmitter, in case you are interested) into the computer, powered it up and got an "interesting" hardware wizard window.

This one is localized into Slovenian, but you probably know it by heart - it is a standard Windows wizard for adding new hardware devices. Except that this time it came with a twist - for some reason the window was only 200 pixels high. Did I have to mention that this wizard is not resizable? Probably not. (I hate non-resizable applets!)

So what did I do? Reboot, of course. You can probably guess that it didn't help (or I wouldn't be writing this). The HW wizard appeared after the next boot and it was exactly the same as in the picture above.

An ordinary user would probably start kicking the computer, blaming it all on Microsoft and calling his/her best friend who supposedly knows "everything" about computers. Me? I kicked the computer, blamed it all on Microsoft and fired up Delphi 2006.

First, I started one of my old applications that lists all windows on the system (using the EnumWindows API) and located the window handle of the HW wizard. I could have also done this with Microsoft's Spy++, but at that moment I was positively mad at the guys in Redmond and didn't want to use their tools.

Next I wrote a small app that called the SetWindowPos API with this handle and bigger window boundaries. And behold, the HW wizard was magically enlarged! Sometimes it's really good to know your Win32 API ...

OK, so I lied a little. I didn't write a small app that called ... etc., as I already had something lying handy - an old app of mine that searches for a window with a specified caption and resizes it. As it is a burden to type "Čarovnik za najdeno novo strojno opremo" into a command line (and I would be running into all sorts of problems with codepages in the DOS window that way), I updated it to also accept a window handle instead of a window caption. (Yes, I could type everything into Start, Run without any codepage problems, but this solution was way more satisfying :) )

If you run into something like that in the future, feel free to use my little app.

program SetWindowPlacement;

{$APPTYPE CONSOLE}

uses
  Windows,
  SysUtils;

procedure SetWindowPos(windowTitle: string; windowLeft, windowTop,
  windowWidth, windowHeight: integer);
var
  flags  : cardinal;
  hWindow: THandle;
begin
  if Copy(windowTitle, 1, 1) = '#' then begin
    // caller passed a window handle, e.g. #123456
    Delete(windowTitle, 1, 1);
    hWindow := StrToInt(windowTitle);
  end
  else
    hWindow := FindWindow(nil, PChar(windowTitle));
  if hWindow = 0 then
    Writeln('Window >', windowTitle, '< not found!')
  else begin
    flags := SWP_NOACTIVATE OR SWP_NOZORDER;
    if (windowWidth = 0) and (windowHeight = 0) then
      flags := flags OR SWP_NOSIZE; // move only, keep the current size
    Windows.SetWindowPos(hWindow, 0, windowLeft, windowTop, windowWidth,
      windowHeight, flags);
  end;
end; { SetWindowPos }

procedure Usage;
begin
  Writeln('SetWindowPlacement');
  Writeln('(c) 2006 Primoz Gabrijelcic');
  Writeln;
  Writeln('Usage: SetWindowPlacement (<window title>|#<window handle>) <left> <top> [<width> <height>]');
end; { Usage }

begin
  if (ParamCount < 3) or (ParamCount > 5) then
    Usage
  else try
    if ParamCount = 3 then
      SetWindowPos(ParamStr(1), StrToInt(ParamStr(2)), StrToInt(ParamStr(3)),
        0, 0)
    else
      SetWindowPos(ParamStr(1), StrToInt(ParamStr(2)), StrToInt(ParamStr(3)),
        StrToInt(ParamStr(4)), StrToInt(ParamStr(5)));
  except Usage; end;
end.

Monday, October 16, 2006

My very personal Algorithms and Data Structures

Thanks to the wonders of print on demand, I have received my very personal (printed just for me! merry!) copy of Julian M Bucknall's Algorithms and Data Structures. (previous post on that topic)

I won't comment much on the content - this is a great book and every serious Delphi programmer should have it on the shelf. Plus, Julian's writing is fluid, well organized, and always to the point.

The printing could be improved a little, though. The font on the inner pages is slightly too black for my taste and the shading under the code fragments is waaay too dense. The covers are weird - it seems that the font on the front and back cover was antialiased against a different background color than the one actually used on the cover. It looks unsharp from a distance and halo'd if you look closer.

Still, the book is totally readable and the printing is much better than I expected. The binding is fine, too. I have no worries that the book will fall apart in my hands after only short use.

I hope Julian won't mind if I attach a few photos here for the yet undecided. (Yes, the bottom right corner is slightly damaged, but that's courtesy of the mail service. The book could be wrapped and packed better, though, so that such things wouldn't happen.)

front cover pages 80 and 81 pages 294 and 295 back cover

Wednesday, October 04, 2006

Algorithms and Data Structures

Julian Bucknall has made his excellent book The Tomes of Delphi: Algorithms and Data Structures available on Lulu.

If you don't have it yet, don't wait. Get your copy now! Every Delphi developer should have it!

[I don't know if it shows ;) but I'm a great fan of Julian's writing.]

Saturday, September 30, 2006

There's no firewall like PF

[No Delphi content here, sorry.]

If you are, by any chance, using OpenBSD/FreeBSD for firewalling, be sure to read three excellent articles on PF written by Daniel Hartmeier. Yes, the maker of PF himself.

Firewall Ruleset Optimization

Testing Your Firewall

Firewall Management

Tuesday, September 26, 2006

GpStructuredStorage internals

[This is the second article in the 'embedded file system' series. If you missed the first part, you can read it here.]

A GpStructuredStorage compound file is organized in 1 KB blocks. The first block contains the header; after that, the content alternates between a file allocation table (FAT) fragment block and the 256 blocks managed by the preceding FAT fragment. Each block is identified by a number - the header block is block #0, the first FAT fragment is block #1, and so on.

[header:HEADER:1024]
[fat entry:FATENTRY:1024]
256 x [block:FOLDER/FILE:1024]
[fat entry:FATENTRY:1024]
256 x [block:FOLDER/FILE:1024]
...
[fat entry:FATENTRY:1024]
<=256 x [block:FOLDER/FILE:1024]
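To make the layout concrete, here is a small sketch that maps a data block number to the FAT fragment that manages it. `LocateFatEntry` and its contract are my own assumptions, derived purely from the layout above - this is not part of the GpStructuredStorage source.

```pascal
const
  CBlockSize = 1024; // all blocks are 1 KB

// For a data block (header = #0, FAT fragments = #1, #258, #515, ...),
// compute the block number of the FAT fragment that manages it and the
// index of its entry inside that fragment. Hypothetical helper.
procedure LocateFatEntry(blockNum: cardinal; var fatBlock, fatIndex: cardinal);
begin
  Assert(blockNum >= 2); // #0 is the header, #1 the first FAT fragment
  fatBlock := (blockNum - 2) div 257 * 257 + 1; // each group = 1 FAT + 256 blocks
  fatIndex := blockNum - fatBlock - 1;          // 0..255 within the fragment
end;
// The file offset of any block is simply blockNum * CBlockSize.
```

For example, block #2 is entry 0 of the FAT fragment at block #1, and block #259 is entry 0 of the fragment at block #258.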

Header starts with a signature, which must always be 'GpStructuredStorage file'#13#10#26#0.

HEADER
[signature:32] // PChar
[unused:964]
[storage attribute file:4]
[storage attribute file size:4]
[first FAT block:4]
[first unused block:4]
[root folder:4]
[root folder size:4]
[version:4] // storage system version
The header ends with:
  • block number and size of the internal file containing global storage attributes
  • block number of the first FAT block (always 1)
  • block number of the first unused file block
  • block number and size of the root folder
  • storage system version (at the moment $01000200, or 1.0.2.0)
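For reference, the same layout can be expressed as a Delphi packed record. This is only a sketch of the on-disk layout described above; the field names are mine, not the library's.

```pascal
type
  THeader = packed record
    Signature     : array [0..31] of AnsiChar; // 'GpStructuredStorage file'#13#10#26#0
    Unused        : array [0..963] of byte;
    AttrFileBlock : cardinal; // storage attribute file
    AttrFileSize  : cardinal; // storage attribute file size
    FirstFatBlock : cardinal; // always 1
    FirstUnused   : cardinal; // head of the empty block list
    RootFolder    : cardinal; // root folder block
    RootFolderSize: cardinal;
    Version       : cardinal; // $01000200 = 1.0.2.0
  end; // SizeOf(THeader) = 32 + 964 + 7*4 = 1024, exactly one block
```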

Each FAT fragment contains 256 32-bit numbers, one for each of the 256 file blocks that follow it. This number is either 0, if the block is the last in its FAT chain, or the number of the next file block in the chain. Each block is linked into exactly one chain: it either belongs to a file/folder or to the empty block list. The address of the first block in the empty list is stored in the header (the [first unused block] entry).

FATENTRY
256*[next block:4] // next-pointers for this block; 0 = unused
The FAT structure defines the limitations of my file format - the last block must have a number less than 4294967294. As blocks are 1 KB in size, the total file size cannot exceed 4294967294 * 1 KB, which is slightly less than 4 TB. Enough for all practical purposes, I think.
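Reading a file is then just a matter of following its chain. A sketch, where `ReadFatEntry` is a hypothetical helper that returns the 32-bit next-pointer stored for a block in its FAT fragment (it does not exist under that name in the library):

```pascal
// Collect the block numbers of a file by following its FAT chain.
// ReadFatEntry is assumed to return the next-pointer for the given
// block; a zero entry terminates the chain.
procedure CollectChain(firstBlock: cardinal; blocks: TList);
var
  block: cardinal;
begin
  block := firstBlock;
  while block <> 0 do begin
    blocks.Add(pointer(block)); // TList stores untyped pointers in Win32 Delphi
    block := ReadFatEntry(block);
  end;
end;
```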

Folders are simple - each folder is just a file. It contains file information records and is terminated with two 0 bytes.

FOLDER //potentially split over several blocks
[FILE_INFO]
[FILE_INFO]
...
[FILE_INFO]
[0:2]
The file information record is a variable-length record containing the file name (up to 64 KB), attributes, length, and the address of the first file block (additional blocks are reached by following the FAT chain).
FILE_INFO
[file name length:2]
[file name:1..65535]
[file attributes:ATTRIBUTES:4]
[file length:4] // 4 GB per file
[first file block:4]
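A folder can thus be parsed with a simple loop. The following reader is my own sketch of the layout above, not the library's actual code; it returns False when it hits the two terminating zero bytes, which read back as a zero file name length.

```pascal
type
  TFileInfoRec = record
    FileName  : string;   // AnsiString in Win32 Delphi of that era
    Attributes: cardinal;
    FileLength: cardinal;
    FirstBlock: cardinal;
  end;

// Read one FILE_INFO record from a folder stream.
function ReadFileInfo(folder: TStream; var info: TFileInfoRec): boolean;
var
  nameLen: word;
begin
  folder.ReadBuffer(nameLen, 2);
  Result := nameLen <> 0; // [0:2] terminates the folder
  if not Result then
    Exit;
  SetLength(info.FileName, nameLen);
  folder.ReadBuffer(info.FileName[1], nameLen);
  folder.ReadBuffer(info.Attributes, 4);
  folder.ReadBuffer(info.FileLength, 4);
  folder.ReadBuffer(info.FirstBlock, 4);
end;
```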
At the moment, only two attributes are defined. One specifies that the file is actually a subfolder, and the other designates a special file containing file attributes (for a discussion of attributes, see the previous article).
ATTRIBUTES
$0001 = attrIsFolder
$0002 = attrIsAttributeFile
That's just about everything there is to tell about the compound file format. Armed with this knowledge, one can easily write a compound file browser/repair utility.

Friday, September 22, 2006

Writing an embedded file system

Once upon a time, Julian M. Bucknall wrote an interesting article for The Delphi Magazine. Well, he wrote more than one and they are all interesting and (that's the best part) he's still writing them, but I particularly liked that one.

The article's name is Manage The Damage (it had to be Julian's Jimmy Somerville phase) and it was published in Issue 69 (May 2001). Yes, quite old stuff - at least in computing terms. Inside, Julian described an implementation of an embedded file system, i.e. a file system that rests inside another storage (a file in an ordinary file system, for example). That's such a useful concept that even Microsoft recognizes it (think OLE structured storage).

In short, an embedded file system (or compound file, or structured storage) allows you to store many unconnected or loosely connected pieces of information inside one physical file in a structured way. For example, it can be used to store different parts of the configuration settings of a large application, or in data acquisition to store many data packets inside one file, or as a simple database for text fragments or code snippets.

Although I liked the idea behind Manage The Damage, I couldn't live with the implementation. It was too limited for my taste (the embedded file system was limited to 32 MB) and the implementation was a mess of pointers, which made the code almost nonportable to .NET.

And then, as all fairy tales start, I decided to write my own implementation.

Why not use OLE structured storage, you may ask? Well, I like to know how my data is stored (have you ever tried to look inside an OLE structured storage file with a hex editor? I did. Not a pretty sight.), and I want the implementation to be fast and simple to use from Win32 and, potentially, .NET. Besides that, it sounded like an interesting challenge.

So how did I fare? Good, if I'm the one to answer. No pointers were killed while writing the code, the total size of the file system is limited only to 4 TB (4 GB per file stored inside the compound file) and the file internals are easy to understand (well, at least to me ;) ).

The code was used in some commercial projects. Also, GExperts uses it for snippet storage (the CodeLibrarian.fs file). It seems to be stable and mostly bug-free, and as such I'm releasing it to the public, with the usual string attached.

For the brave, here's the code and test suite: GpStructuredStorage.

If you're still interested, continue reading. I'll show a few examples of how GpStructuredStorage can be used.

A small how-to

The next fragment creates a compound file and then reopens it for reading. Note that the compound file is implemented as an interface and as such doesn't need explicit destructor calls.

Instead of a file name, one can also pass a TStream or descendant to the .Initialize method.

var
  storage: IGpStructuredStorage;

storage := CreateStructuredStorage;
storage.Initialize(CStorageFile, fmCreate);
// write and read here
storage := CreateStructuredStorage;
storage.Initialize(CStorageFile, fmOpenRead);
// from now on, only reading is allowed

Now that we have the storage interface, we can create a file and then read it.

var
  strFile: TStream;

strFile := storage.OpenFile('/folder/file.dat', fmCreate);
try
  // write to strFile
finally FreeAndNil(strFile); end;
strFile := storage.OpenFile('/folder/file.dat', fmOpenRead);
try
  // read from strFile
finally FreeAndNil(strFile); end;

There is no need to create '/folder' manually - every file access automagically creates all required folders.


Still, you are free to do it the old way.

storage.CreateFolder('/folder/subfolder');
if not storage.FolderExists('/folder/subfolder') then
  // panic

Of course, there is also a FileExists function.


File enumeration is simplified to the max.

var
  files: TStringList;

files := TStringList.Create;
try
  storage.FileNames('/folder', files);
finally FreeAndNil(files); end;

(To enumerate folders, one would use FolderNames instead of FileNames.)


Additional information on a file or folder can be accessed via the FileInfo property:

FileInfo['/full/path/to/file.dat']


Currently, FileInfo only exposes the file's size (FileInfo[].Size) and file attributes (FileInfo[].Attribute).


Attributes offer a way to store additional string information with each file and folder. An unlimited number of attributes can be stored; the only limitation is that both attribute names and values must be strings.


storage.FileInfo['/folder/file.dat'].Attribute['author'] := 'Gp';

Finally, I must mention that it is also possible to Move and Delete files/folders and to Compact (defragment) the file system.


If I have convinced you, go and use the stuff. If not, wait for the next episode.


Next in the embedded file system series:

  • Internals of a GpStructuredStorage file
  • Total Commander plugin to look into GpStructuredStorage files.


Coming soon to a feed near you.

Specialization

A human being
should be able to change a diaper,
plan an invasion,
butcher a hog,
conn a ship,
design a building,
write a sonnet,
balance accounts,
build a wall,
set a bone,
comfort the dying,
take orders,
give orders,
cooperate,
act alone,
solve equations,
analyze a new problem,
pitch manure,
program a computer,
cook a tasty meal,
fight efficiently,
die gallantly.

Specialization is for insects.

- Robert A. Heinlein

 

UPDATE: Fixed the author (sometimes I'm soooo stupid).

Thursday, September 21, 2006

Sir! Do you need a list? Cheap, just for you!

Actually, it is free and comes with only one string attached.

I have just uploaded a new version of my lists unit. At this moment it contains ten useful classes:

  • TGpIntegerList is a TList-compatible class used to store integers.
  • TGpInt64List is a TList-compatible class used to store int64 numbers.
  • TGpIntegerObjectList is a TList-compatible class used to store integers and associated objects.
  • TGpInt64ObjectList is a TList-compatible class used to store int64 numbers and associated objects.
  • TGpCountedStringList is a TStringList descendant that has each string item associated with an integer counter (internally stored in the Objects property).
  • TGpTMethodList is a list of TMethod records.
  • TGpObjectRingBuffer is a fixed-sized ring buffer of TObject references.
  • TGpObjectMap is a one-dimensional array, indexed by objects and containing objects.
  • TGpObjectObjectMap is a two-dimensional array, indexed by objects and containing objects.
  • TGpDoublyLinkedList is a doubly-linked list of TGpDoublyLinkedListObject descendants.
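To give a quick taste: since the list classes are described as TList-compatible, something along these lines should work (a sketch only - Add, Count and default indexed access are assumed from the "TList-compatible" description above; consult the unit for the exact signatures):

```pascal
var
  primes: TGpIntegerList;
begin
  primes := TGpIntegerList.Create;
  try
    primes.Add(2);
    primes.Add(3);
    primes.Add(5);
    Writeln(primes[primes.Count - 1]); // last value added
  finally
    FreeAndNil(primes);
  end;
end;
```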

Abuse it at will. Additions welcome.

Wednesday, September 20, 2006

Failed to find standard type 'TObject'? C'mon, DevCo!

I have just installed the BDS hotfix rollup - the one that everybody else on the planet is blogging about - and now my BDS claims that I don't know how to write a class!

C'mon, DevCo, you can do better than that!

LATER: OK, the third restart of BDS helped. No idea what's going on, but I certainly hate it when the syntax checker (or whatever they call it) runs amok.

Monday, September 04, 2006

If you'll be visiting Amsterdam this weekend ...

... you can meet me at IBC. Most of the time I'll be on stand 2.111. Look for the guy named Primoz - that's me.