Tuesday, January 30, 2007

Microsoft Keyboard Layout Creator 1.4 has been released

If you need it, download it from here.

If you don't know it - MKLC is a great tool that will help you create customized keyboards (with some changes or a completely scrambled layout) in minutes. It's also a great tool for programmers - you can make sure that both important special characters (e.g. []{}) and national characters are easily accessible.

[Source: Sorting It All Out]

Friday, January 26, 2007

Fast binary searching, part 2: Hard data

This is an update on my Faster than the speed of binary search entry. Maybe it would be good to refresh your memory before proceeding, brave reader ...

Last time I promised some hard data for multi-GB files. I finally had some time to do the measurements. I was doing the usual space & time profiling anyway (we are talking about 30-60 minute execution times here, so every 10% helps a lot) and this was a natural extension of those tests.

Sadly (from the viewpoint of the prospective author that introduced the improved binary search algorithm into the application), I have redesigned the most time-consuming part of our app and the biggest list (requiring the most lookups) is not used anymore. Therefore, the numbers are not as large as I expected three months ago. The data is still interesting, especially as the analysis exposed two anomalous cases.

Let's first take a look at a 'normal' case. The list had between 1,596,266 and 1,803,027 entries (lists are live and in this case entries were removed from the list during the execution) and the program executed 171,191 searches. Standard binary search needed 20.8 rounds (bisection steps) on average to resolve a search query, while my 'biased' version needed only 6.7 rounds. A more detailed distribution of the number of rounds is shown in the following graph.

standard vs. biased search

All well and good then, but this was exactly the situation for which the algorithm was designed. Let's take a look at the first anomalous case, where biased search exhibits really bad behaviour.

anomalous behaviour, bad biased search

As you can see, the list had from 0 to 37,300 entries (0 at the beginning) and there were 37,301 searches. In fact, each addition to the list was followed by a search request which always found the result somewhere in the last 30 entries. Standard binary search needed 13.9 rounds on average while the biased version needed 13.5 rounds, which is almost the same (and is in fact slower, due to the slower inner loop of the biased version).

The distribution of rounds in the biased search is also weird. It had to be either a bug in the algorithm or a weird data pattern. It turned out to be the latter. In this case, the list contained data in clusters with big differences between clusters. Something like this list with 40 entries: 1, 2, 3 ... 20, 1,000,000, 1,000,001 ... 1,000,019 (notice the big jump between 20 and 1,000,000). Even worse, the code tries to find the smallest number in the last cluster (1,000,000 in this case), or a number very close to it. My biased version first sets the lower and upper bound of the search and then calculates the bias (1000000 - 1)/(1000019 - 1), which is in this case 0.999981, or almost 1 [see the code in the previous article]. That causes the selected 'midpoint' to be equal to the upper bound - 1. The search loop is then repeated, the bias is almost the same and the upper bound is again reduced by 1. This repeats until the upper bound reaches the element we are searching for.
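To see how that plays out, here is a minimal sketch of one round of the probe-point calculation, applied to the 40-entry example list above (simplified from the code in the previous article; the names are mine):

procedure BiasedProbeExample;
var
  data: array [0..39] of int64;
  i, low, high, mid: integer;
  value: int64;
  bias: double;
begin
  // Build the example list: 1 .. 20, then 1,000,000 .. 1,000,019.
  for i := 0 to 19 do
    data[i] := i + 1;
  for i := 20 to 39 do
    data[i] := 1000000 + (i - 20);
  value := 1000000; // the smallest number in the last cluster
  low := 0;
  high := 39;
  // One round of the biased search:
  bias := (value - data[low]) / (data[high] - data[low]); // = 0.999981
  mid := low + Trunc(bias * (high - low));                // = 38 = high - 1
  // The next round shrinks [low..high] to [low..high-1]; the bias stays
  // almost 1, so each round eliminates just one element - a linear search.
end;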

As you've seen, in this case the search degrades into a simple linear search (with complicated calculations behind it). Which is good when we want to find the last element (that happens when the last cluster has only one element) or an element very near the end of the list (when the last cluster is small), but degrades quite badly when the last cluster is large.

The second anomalous case is actually a special case of the first one, but here the cluster was always just one element big. In other words, the code always searched for the largest value in the list. In this case, biased search almost always retrieved the correct number on the first attempt, which is significantly faster than the standard binary search (16.5 rounds on average).

anomalous behaviour, good biased search

So what can we do to improve the situation? One way is to add a special search mode for my 'weird' data, but I really dislike this idea, as I don't know what kind of data will be passed to my (fairly generic) classes in the future.

Another idea was to limit the maximum bias. In standard binary search, the bias is always 0.5 (the midpoint between the low and high bound), while in the problematic case the bias is almost 1. If we limit the maximum bias to, say, 0.9, the code never degenerates into linear search.
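A minimal sketch of such clamping (illustrative only - not the exact code from my class):

// Clamp the bias so the probe point can never stick to the upper bound.
// (A symmetric lower limit would guard against the mirrored case.)
function ClampedBias(value, lowValue, highValue: int64): double;
const
  CMaxBias = 0.9; // the limit discussed above
begin
  if highValue = lowValue then
    Result := 0.5 // degenerate interval; fall back to plain bisection
  else
    Result := (value - lowValue) / (highValue - lowValue);
  if Result > CMaxBias then
    Result := CMaxBias;
end;

// Used as: mid := low + Trunc(ClampedBias(value, data[low], data[high]) * (high - low));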

When such clamping is applied, the first anomalous case behaves much better (the average number of rounds is reduced to 8.2).

anomalous behaviour, bad biased search, clamped

That's fine, but how does that change affect the other two cases? The 'normal' case behaves worse than before (the average number of rounds went from 6.7 to 9.6).

standard vs. biased vs. clamped biased search

That could be acceptable, but in the second anomalous case the number of rounds increases by almost a factor of 6.

anomalous behaviour, good biased search, clamped

It's clear that this simple solution indeed improves the worst case but degrades all other searches by a noticeable amount.

I had other ideas on how to improve the search algorithm, but in the end I left it as it is now (the 'biased' version). After all, in the worst case biased search still makes fewer rounds than standard search; in the normal case, biased search works as expected; and in the best case, biased search works much, much better than the standard search.

I hope that I made my point clear - always, but really always, analyse the program behaviour on real data, not on test cases.

The other point that is not so obvious is - Excel 2007 charting capabilities rock :)

 

Thursday, January 25, 2007

There is no good place to advertise Delphi blog ...

Craig Stuntz kindly noticed my blog (thanks, Craig) and in a private discussion (because commenting on teamb.com blogs still doesn't work) we talked about ways to tell other Delphi programmers about your new blog.

A few possibilities came up - the nontech newsgroup, the DelphiFeeds aggregator (my ideas), the community homepage and 'Get Published' on CDN (his ideas) - but they are all flawed: many developers don't read any of those ...

What we really need is a Delphi wiki, maintained by CodeGear and linked from the welcome page. Plus an RSS feed for this wiki so that one can follow updates without too much trouble ... (Yes, I know that there is already a Delphi Wiki, but I don't think it has a large audience. And the blog list there is quite short.)

CodeGear, if anybody is listening ... could something like this happen?

Using interfaces for multithreading synchronisation

This seems to be an interesting idea so I'm noting it down for possible future work on the subject.
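For the record, here is my reconstruction of the pattern (a sketch only, not code from the linked article): a helper object acquires a critical section in its constructor and releases it in its destructor; because it is held through an interface, Delphi's reference counting releases the lock automatically when the reference goes out of scope.

uses
  SyncObjs;

type
  // Any interface will do; its only job is to keep the keeper object alive.
  ILockKeeper = interface
    ['{D8A9F3C2-5B1E-4A07-8C96-2E4F0B7D6A11}']
  end;

  TLockKeeper = class(TInterfacedObject, ILockKeeper)
  private
    FLock: TCriticalSection;
  public
    constructor Create(lock: TCriticalSection);
    destructor Destroy; override;
  end;

constructor TLockKeeper.Create(lock: TCriticalSection);
begin
  inherited Create;
  FLock := lock;
  FLock.Acquire;
end;

destructor TLockKeeper.Destroy;
begin
  FLock.Release;
  inherited;
end;

// Usage (GLock is a hypothetical global TCriticalSection guarding shared data):
procedure UpdateSharedData;
var
  keeper: ILockKeeper;
begin
  keeper := TLockKeeper.Create(GLock);
  // ... access the shared data; the lock is held while 'keeper' is alive ...
end; // reference count drops to 0 here and the destructor releases the lock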

 


Recovering from Windows Update errors

Mostly for my own backup - here are two links to articles on how to make a misbehaving Windows Update work again.

Michael Kaplan: Maybe your computer doesn't like you?

Scott Hanselman: Resetting Microsoft Update - Error 0x8024001D

 


Tuesday, January 23, 2007

Viewing composite files in Total Commander

[This is the third and probably last article in the 'embedded file system' series. If you want to refresh your memory or if you're new to my blog, read the previous two installments here and here. You may also want to refresh your copy of the GpStructuredStorage unit, as it was updated after the last article in the series was published.]

I'm a great fan of Total Commander. I've been using it for who-knows-how-long - I started in the Windows 3.1 times, when it was still called Windows Commander. Even before that, in the DOS days, I was using Norton Commander. You'd never guess, huh? ;)

One of the things I like about Total Commander is its ability to work with various packed formats as if they were a part of the file system. You just press Enter and a ZIP file opens as if it were a normal folder, and you can use the standard Copy, Move, Rename etc. commands on files inside the archive. Great stuff.

Another thing I like about Total Commander is its extensibility. The author has documented four types of extension interfaces (to access packed files, to access foreign file systems, to display new file formats and to extract data from files). You have to write a DLL exporting a few simple (or not so simple) functions and plunk it into the TC plugins folder. And it "just works" after that.

The third thing I like about Total Commander is the fact that header files for plugins are all available in .h and .pas format, making a Delphi implementation a snap :)

File system or packed file?

For quite some time I couldn't decide whether I should implement support for GpStructuredStorage composite files as a file system plugin or as a packed file (archive) plugin. In the end, two things prevailed - packed file plugins are simpler to write and they are simpler to use (in Total Commander).

I started by browsing the WCX packer plugin documentation (WCX is the extension used for packer plugin DLLs). As a WCX plugin is a simple DLL exporting only stdcall functions, there are no problems implementing it in Delphi. Great. I also found out that I don't have to implement much, as all plugin functions are accessed dynamically and I can simply ignore the ones I don't need. Even better!

It turns out that there are three classes of functions - for reading, writing, and packing in memory (as opposed to packing directly to a file). I only had to implement the first batch, as I only need read access to composite files. (Yes, writing to composite files would sometimes be nice too, but I haven't had time to implement it. You are welcome to improve my packer in that direction, if you have both the time and the knowledge.)

The WCX documentation told me that I only have to implement six functions:

function  OpenArchive(var ArchiveData: TOpenArchiveData): THandle; stdcall;
function ReadHeader(hArcData: THandle; var HeaderData: THeaderData): integer; stdcall;
function ProcessFile(hArcData: THandle; Operation: integer; DestPath, DestName: PChar): integer; stdcall;
function CloseArchive(hArcData: THandle): integer; stdcall;
procedure SetChangeVolProc(hArcData: THandle; pChangeVolProc: TChangeVolProc); stdcall;
procedure SetProcessDataProc(hArcData: THandle; pProcessDataProc: TProcessDataProc); stdcall;

I also chose to implement two additional functions:

function  GetPackerCaps: integer; stdcall;
function CanYouHandleThisFile(FileName: PChar): boolean; stdcall;
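In the plugin DLL project, these functions are then published through a standard exports clause; a minimal sketch (the project and unit names here are made up):

library GpStructuredStorageWCX; // hypothetical project name

uses
  GpStructuredStoragePacker; // hypothetical unit containing the implementations

exports
  OpenArchive, ReadHeader, ProcessFile, CloseArchive,
  SetChangeVolProc, SetProcessDataProc,
  GetPackerCaps, CanYouHandleThisFile;

begin
end.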

GetPackerCaps is really simple - it informs Total Commander of your plugin's capabilities. I specified the options PK_CAPS_BY_CONTENT (TC should detect composite files by content, not by extension) and PK_CAPS_SEARCHTEXT (searching inside composite files is allowed).

function GetPackerCaps: integer;
begin
  Result := PK_CAPS_BY_CONTENT + PK_CAPS_SEARCHTEXT;
end; { GetPackerCaps }

The tricky part here is that this function is called only when the plugin is installed. If you later change the plugin capabilities and recompile, the new capabilities won't be detected! You have to remove the plugin from TC and reinstall it so that GetPackerCaps is called again.


If there were a standard extension for GpStructuredStorage composite files, I could just register it with Total Commander and skip the CanYouHandleThisFile function. As there is no such extension, I had to write it, but it was really simple, as GpStructuredStorage already contains a function that checks for a valid composite file:

function CanYouHandleThisFile(FileName: PChar): boolean; stdcall;
begin
  Result := CreateStructuredStorage.IsStructuredStorage(FileName);
end; { CanYouHandleThisFile }

Now I can just select any composite file and press Enter or Ctrl+PgDn to open it inside TC. Of course, I still need to implement a few functions to actually read the composite file ...


Implementing read access


To allow Total Commander to list the contents of a composite file, we need to implement the OpenArchive, ReadHeader, ProcessFile, and CloseArchive functions. An example from the WCX plugin documentation shows how they are used by TC to access the archive contents:



OpenArchive()          with OpenMode==PK_OM_LIST
repeat
   ReadHeader()
   ProcessFile(...,PK_SKIP,...)
until error returned
CloseArchive()


OpenArchive must return a unique handle to the archive - an opaque value that is passed to ReadHeader, ProcessFile, and CloseArchive so that they know which archive they are working on. In theory, we could return an IGpStructuredStorage instance converted to a THandle, but there is a problem with this - TC expects the contents of an archive to be a simple list of files, while composite files store a full directory structure.


Somehow, we have to walk the directory structure inside the composite file and convert it into a flat list. This could, for example, be done in the OpenArchive call (see the sketch below). Still, we would have to keep both this list and the IGpStructuredStorage interface around, as we may have to extract data from the archive later.
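For illustration, such a flattening walk could look something like this (a sketch only; I'm assuming FileNames and FolderNames enumeration methods on IGpStructuredStorage - check the actual unit for the exact API; and as you'll see below, I eventually took a different approach):

// Recursively collect the full names of all files in the composite file.
procedure FlattenFolder(const storage: IGpStructuredStorage;
  const folder: string; fileList: TStrings);
var
  iName: integer;
  names: TStringList;
begin
  names := TStringList.Create;
  try
    storage.FileNames(folder, names);   // files in this folder
    for iName := 0 to names.Count - 1 do
      fileList.Add(folder + names[iName]);
    storage.FolderNames(folder, names); // subfolders; recurse into each
    for iName := 0 to names.Count - 1 do
      FlattenFolder(storage, folder + names[iName] + '/', fileList);
  finally FreeAndNil(names); end;
end;

// Usage: FlattenFolder(storage, '/', flatList);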


If the user copies a file from an archive, TC executes a loop similar to the one above:



OpenArchive()          with OpenMode==PK_OM_EXTRACT
repeat
   ReadHeader()
   if WantToExtractThisFile()
      ProcessFile(...,PK_EXTRACT,...)
   else
      ProcessFile(...,PK_SKIP,...)
until error returned
CloseArchive()


Because some archives may not offer a directory and direct access to individual files (tar comes to mind), TC reads the header of each file and then either skips or extracts it.


Therefore, I have created a global list containing TGpStructuredStorageManipulator objects, each holding a reference to an IGpStructuredStorage interface plus the internal state required for file enumeration. The WCX DLL functions are just thin mappers onto the global list and the manipulator objects:

function OpenArchive(var ArchiveData: TOpenArchiveData): THandle;
begin
  Result := GStructuredStorageList.OpenArchive(ArchiveData.ArcName, ArchiveData.OpenResult);
  ArchiveData.CmtBuf := nil;
  ArchiveData.CmtBufSize := 0;
  ArchiveData.CmtSize := 0;
  ArchiveData.CmtState := 0;
end; { OpenArchive }

function ReadHeader(hArcData: THandle; var HeaderData: THeaderData): integer; stdcall;
var
  manip: TGpStructuredStorageManipulator;
begin
  manip := GStructuredStorageList.LocateHandle(hArcData);
  if not assigned(manip) then
    Result := E_NOT_SUPPORTED
  else if manip.GetNextHeader(HeaderData) then
    Result := 0
  else
    Result := E_END_ARCHIVE;
end; { ReadHeader }

function ProcessFile(hArcData: THandle; Operation: integer; DestPath, DestName: PChar): integer; stdcall;
var
  manip  : TGpStructuredStorageManipulator;
  outFile: TFileStream;
begin
  if (Operation = PK_SKIP) or (Operation = PK_TEST) then
    Result := 0
  else begin
    manip := GStructuredStorageList.LocateHandle(hArcData);
    if not assigned(manip) then
      Result := E_NOT_SUPPORTED
    else begin
      try
        outFile := TFileStream.Create(string(DestPath) + string(DestName), fmCreate);
        try
          Result := manip.ExtractCurrentTo(outFile);
        finally FreeAndNil(outFile); end;
      except
        on E: EFCreateError do
          Result := E_ECREATE;
      end;
    end;
  end;
end; { ProcessFile }

function CloseArchive(hArcData: THandle): integer; stdcall;
begin
  Result := GStructuredStorageList.CloseArchive(hArcData);
end; { CloseArchive }

If you browse the code, you'll notice that I decided against creating a long list of all files and folders inside the composite file in the OpenArchive procedure. Instead, I'm keeping the internal enumerator state in the TGpStructuredStorageManipulator object, and the GetNextHeader method of this class uses that state to walk over the composite file. It seemed neater that way.


Oh, and what about SetChangeVolProc and SetProcessDataProc? They just set callback functions that we don't need. You can read more about them in the WCX plugin documentation.


Writing to composite files


Adding write and delete access to the plugin should be easy - one only has to implement the PackFiles and DeleteFiles functions (declarations below). The implementation should be quite straightforward, as they don't use the 'iterate until you find the correct file' approach; instead, TC sends the full file name (with folders and everything) to the procedure. But as I had no need for write access, I haven't implemented that part yet. Feel free to do it instead of me (and don't forget to update GetPackerCaps accordingly :) ).
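For reference, this is how those two functions are declared in the WCX plugin headers (copied to the best of my knowledge from the .pas header; check the documentation for the exact semantics of Flags and the format of the file lists):

function PackFiles(PackedFile, SubPath, SrcPath, AddList: PChar;
  Flags: integer): integer; stdcall;
function DeleteFiles(PackedFile, DeleteList: PChar): integer; stdcall;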

Tuesday, January 09, 2007

No Delphi content here, just woodworking ...

No, I didn't forget that I'm writing a blog. Yes, it has been a long time since I wrote anything here. I am really sorry for that. I'm just fixing a thousand things all at once in an important part of our software (and I intend to blog about some aspects of that - as soon as I can find a little time to prepare the article).

In the meantime, I'd like to show you a little of what I do in my free time - make stuff from wood. No real photos today (for the curious, there is a gallery on my web site), just two SketchUp models.

First, a table I've been building for my wife's sister and her family. It will be made of ash and is currently in the preparation phase - half of the wood has been planed and cut to size and the other half is still in raw form. The dimensions are 100 x 80 x 35 cm and I think it will weigh around 50 kg (yep, lots of solid wood here).

Another project got stuck in the design phase - my 'customer' (I only work for friends and family; I'm not selling products on the open market) decided that she will continue to live without a coffee table.

It was to be a simple wood-and-glass table, looking roughly like this (all the rough edges are courtesy of my inability to draw nice curves in SketchUp).

If somebody wants a deeper look, I can provide the SketchUp source files. 
