The Delphi Geek: 03/2008

Thursday, March 27, 2008

GpLists emergency update

TGpInt64List.Move was totally broken. Great thanks go to 'Intra' for noticing it.

Friday, March 21, 2008

Walking the key-value container

The recent discussion in comments to my latest articles (Fun with enumerators - Walking the (integer) list, On bad examples and smelly code) caused a shift in my perspective. I was always treating my TGpIntegerObjectList and TGpInt64ObjectList as lists with some baggage attached, but in practice I'm almost never using them that way. Most of the time I treat them as a background storage for key-value pairs.

What's the difference? Most of the time, I don't care about item indices. When I use my lists as containers, I never need to know where in the list some (Key, Value) pair is stored. OK, almost never. When I'm deleting from the list, I sometimes use the index, just for the performance purposes. And when I access the Value part, I have to find the index with IndexOf and then use it to reference the Objects property. There are probably other cases too - but that's something that you just have to take into account if you want to use a list as a storage media.

From time to time I'm extending lists in my GpLists unit with small wrappers that help me not to use list indices in some specific situation. Today, I used the new Walk enumerator in some code and asked myself: "Why does it have to return list index? Why couldn't it return a (Key, Value) pair?"

Good question. Why couldn't it? It turns out that it could.

A little enumerator that could

Let's set up some assumptions first. Assume that I have this pointless list.

il := TGpIntegerObjectList.Create;
il.AddObject(1, TGpString.Create('one'));
il.AddObject(2, TGpString.Create('two'));
il.AddObject(3, TGpString.Create('three'));

Further assume that I want to walk over it and display both number and text for each item. I can do this with a standard loop.

for idx := 0 to il.Count - 1 do
  Log('%d: %s', [il[idx], TGpString(il.Objects[idx]).Value]);

Or with my index-returning walker.

for idx in il.Walk do
  Log('%d: %s', [il[idx], TGpString(il.Objects[idx]).Value]);

But what I'd really like to do is.

for kv in il.WalkKV do
  Log('%d: %s', [kv.Key, TGpString(kv.Value).Value]);

Or even better.

for kv in il.WalkKV do
  Log('%d: %s', [kv.Key, kv.StrValue]);

In other words, I want to return not a single item, but a pair of items from the enumerator. Of course, Delphi expressions can only return a single result and not a tuple so we have to provide a wrapper for enumerated (Key, Value) pairs. We have to pack them in a record, class or interface.

Getting all self-destructive

My first idea was to return an interface to a key-value object from the enumerator.

  IGpKeyValue = interface
    function GetKey: int64;
    function GetValue: TObject;
    property Key: int64 read GetKey;
    property Value: TObject read GetValue;
  end; { IGpKeyValue }

  TGpKeyValue = class(TInterfacedObject, IGpKeyValue)
  private
    kvKey: int64;
    kvValue: TObject;
  protected
    function  GetKey: int64;
    function  GetValue: TObject;
  public
    constructor Create(key: int64; value: TObject);
    property Key: int64 read GetKey;
    property Value: TObject read GetValue;
  end; { TGpKeyValue }

  TGpIntegerObjectListWalkKVEnumerator = class
    //...
    function  GetCurrent: IGpKeyValue;
    property Current: IGpKeyValue read GetCurrent;
  end; { TGpIntegerObjectListWalkKVEnumerator }

function TGpIntegerObjectListWalkKVEnumerator.GetCurrent: IGpKeyValue;
var
  idx: integer;
begin
  idx := wkeListEnumerator.GetCurrent;
  Result := TGpKeyValue.Create(wkeListEnumerator.List[idx],
    TGpIntegerObjectList(wkeListEnumerator.List).Objects[idx]);
end; { TGpIntegerObjectListWalkKVEnumerator.GetCurrent }

That surely works fine, but guess what - it's incredibly slow. I wouldn't expect anything else - after all an object is allocated for every enumerated value, plus all that complications with interface reference counting...

I did some testing, of course. Thousand iterations over a list with 10.000 elements. Results are quite interesting. 5 milliseconds for a standard for loop. 50 milliseconds for my Walk enumerator. 5 seconds for interface-based WalkKV. Ouch! Let's return to the drawing board...

One allocation is enough

My next idea was to return not an interface, but an object. When you return an object, you actually return a pointer to the object data, which is quite fast. It would not help much if I would recreate this object every time the GetCurrent is called, but luckily there is no real reason to do that. I can create the object when enumerator is created and destroy it when enumerator is destroyed.

  TGpKeyValue = class
  private
    kvKey  : int64;
    kvValue: TObject;
  public
    property Key: int64 read kvKey write kvKey;
    property Value: TObject read kvValue write kvValue;
  end; { TGpKeyValue }

  TGpIntegerObjectListWalkKVEnumerator = class
  private
    wkeCurrentKV: TGpKeyValue;
    wkeListEnumerator: TGpIntegerListWalkEnumerator;
  public
    constructor Create(aList: TGpIntegerObjectList; idxFrom, idxTo: integer);
    destructor  Destroy; override;
    //...
    function  GetCurrent: TGpKeyValue;
    property Current: TGpKeyValue read GetCurrent;
  end; { TGpIntegerObjectListWalkKVEnumerator }

constructor TGpIntegerObjectListWalkKVEnumerator.Create(aList: TGpIntegerObjectList;
  idxFrom, idxTo: integer);
begin
  inherited Create;
  wkeListEnumerator := TGpIntegerListWalkEnumerator.Create(aList, idxFrom, idxTo);
  wkeCurrentKV := TGpKeyValue.Create;
end; { TGpIntegerObjectListWalkKVEnumerator.Create }

destructor TGpIntegerObjectListWalkKVEnumerator.Destroy;
begin
  FreeAndNil(wkeCurrentKV);
  FreeAndNil(wkeListEnumerator);
  inherited;
end; { TGpIntegerObjectListWalkKVEnumerator.Destroy }

function TGpIntegerObjectListWalkKVEnumerator.GetCurrent: TGpKeyValue;
var
  idx: integer;
begin
  idx := wkeListEnumerator.GetCurrent;
  wkeCurrentKV.Key := wkeListEnumerator.List[idx];
  wkeCurrentKV.Value := TGpIntegerObjectList(wkeListEnumerator.List).Objects[idx];
  Result := wkeCurrentKV;
end; { TGpIntegerObjectListWalkKVEnumerator.GetCurrent }

BTW, you can see another trick in this implementation - enumeration by delegation. I'm reusing my Walk enumerator to do the actual walking.

That works much faster than the interface-based version - 300 ms for my test case. It's still six times slower than the Walk enumerator, though, and it is not really obvious why the difference is so big. Leave that be, for a moment.

The third approach would be to use a record to store the current (Key, Value) pair. Then there is no allocation/deallocation at all, but resulting code is not faster. Record-based enumerator needs about 500 ms to run the test case.

This slowdown occurs because record is not returned as a pointer, but as a full copy. In the class-based scenario, GetCurrent returns a pointer to the TGpKeyValue object and that pointer is passed in a register. In the record version, GetCurrent returns not a pointer to the record, but the record itself - it copies full record to the stack so the caller can use this copy - and that is waaaay slower.

The speed difference

Let's return to that major speed difference between Walk and WalkKV. I looked at the code, but couldn't find any good reason. Then I turned to the CPU view and it was evident. The problem lied not in the enumerator, but in my poorly designed benchmarking code :(

You see, I was timing multiple repetitions of these three loops:

for idx := 1 to 10000 do 
  ;

for idx in il.Walk do
  ;

for kv in il.WalkKV do
  ;

Three empty loops, so I'm timing just the enumeration, yes? No!

First loop just runs from 1 to 10000. Trivial job and compiler will generate great code.

Second loop does the same, but with more overhead.

Third loop does much more than that. It also accesses il[] and il.Objects[] (inside the GetCurrent).

In reality, code inside the for statement in the first two cases would have to access il[] and il.Objects[] by itself. The code inside the third for statement has no need for that - it gets data neatly prepared in kv.Key and kv.Value.

I changed the loops to:

for idx := 0 to il.Count - 1 do begin
  il[idx];
  obj := il.Objects[idx];
end;

for idx in il.Walk do begin
  il[idx];
  obj := il.Objects[idx];
end;

for kv in il.WalkKV do begin
  kv.Key;
  obj := kv.Value;
end;

I'm just accessing and throwing the Items[]/Key information away and then I copy the Objects[]/Value information into the local variable. All loops are now running comparable jobs.

Results are now significantly different. Simple for loop needs 300 ms to run the test, Walk version needs 310 ms (really small difference) and WalkKV version needs 470 ms. It is still slower, but by less than the factor of two. If there would be a real code inside the for loop, the difference would be unnoticeable.

Morale? You should benchmark, but you should also check that you're benchmarking the right thing!

Syntactic sugar

The generic version of WalkKV only supports this kind of code:

for kv in il.WalkKV do
  Log('%d: %s', [kv.Key, TGpString(kv.Value).Value]);

But there's a really neat trick to change this into

for kv in il.WalkKV do
  Log('%d: %s', [kv.Key, kv.StrValue]);

without modifying the WalkKV itself. Can you guess it? Class helpers, of course.

You just have to declare a simple helper in the WalkKV consumer (no need to change the WalkKV itself) and you can use the StrValue instead of TGpString(kv.Value).Value.

  TGpKeyStrValue = class helper for TGpKeyValue
  public
    function StrValue: string; inline;
  end; { TGpKeyStrValue } 

function TGpKeyStrValue.StrValue: string;
begin
  Result := TGpString(Self.Value).Value;
end;

Conclusion: class helpers are great. Key-value walker is useful. Enumerator-induced performance loss is negligible. Enumerators are fun.

---

I've tagged this and all previous enumerator-related articles with the enumerators label so you can now access them all at once via this simple link.

Sunday, March 16, 2008

On bad examples and smelly code

I've received lots of criticism on my latest article on enumerators. For the convenience of my readers, here are some excerpts.

Removal loops should be iterated backwards to prevent index shuffling.

... while I can certainly see the usefulness in your Slice- and Walk-enumerators ... I don't think they're the best solution to the two examples you quoted.

Simplicity does not justify an example that fails to work correctly. A bad example is still a bad example, no matter how simple.

I'd also like to echo the other comments w.r.t your initial example. Reversing the direction of the iteration is the simplest wasy to simplify the exemplar task, not invoking enumerators and the such like.

They can be summarized in one sentence: "Your examples suck." The main rationale for that reasoning seems to be the accepted way to delete stuff from a TList/TObjectList/TAnyList - start at the end and proceed backwards.

Well, I have two things to say. [I apologize for responding in a new post, but I feel that this is an interesting topic by itself.]

1. Yes, my examples are simplified. Terribly so. Maybe they are even simplistic. At least some part of my readers seems to think so. But that's just the way I like them.

I'm not trying to enforce my ideas on my readers. I'm only displaying them to the public. That's why I tend to minimize all my examples. I'll show you the code, I'll show you the usage, but you should think about the solution and how it applies to your problems. Maybe you'll see that it can be useful and you'll adopt it. Maybe not. Maybe you'll remember it a year from now and suddenly see the light. Maybe you'll use it for a month or two and then decide that I'm a twit and that your previous ways were better. [In this case, please refer to the last paragraph in this post.] I don't really care. I'm giving my ideas out, I'm glad of every feedback and I'm happy if somebody else is using my code. I've received a lot from the Delphi community in past ten years and I like to give back as much as I can.

2. I don't like the 'traverse the list backwards and all will be well' approach to list deletion. I have a really strong aversion to it. It looks fine, it works fine but it smells really bad.

I was thinking a lot about why I dislike it so much and I think that the main reason is that it relies too much on the list internals. Admittedly, this is a very well documented behavior, which will never change in the future. Let's try again - it relies too much on the internals of the underlying structure. If at some point in the future you change the structure, this assumption may fail. OK, it is hard to imagine an underlying structure that would behave almost the same as the TList but would behave differently on Delete, but what if you just forgot to adapt this part of the code to the new structure? Unlikely, I agree.

People will complain that I can't give a strong reason for my fears and they will be correct. I don't have any good cause against this kind of list cleanup.

You see, if I learned something in my programming career, it's that some code should never be used. When you program, some things work out and some not. Some code works fine from the beginning and some causes problem after a problem and bug report after a bug report. A good programmer recognizes that and stops using the code/fragments/patterns that are likely to cause problems in the future. Somehow I have classified the 'downto deletion' in the same category. I don't remember why. That's why I'm not saying it is bad, only that it smells bad.

The other reason is the code readability. If find the code that explicitly states its intention more readable and more maintainable. Your point may, of course, differ.

---

May my ramblings not stop you from commenting. I'm happy when I receive comments that point to my errors, however justifiable they may be. Even if I don't agree with them, they make me rethink my strategies so at the end I'm even more sure in my approach.

Flame on!

Friday, March 14, 2008

Fun with enumerators - Walking the (integer) list

Do you hate such code?

    idx := 0;
    while idx < il.Count do
      if ShouldBeRemoved(il[idx]) then
        il.Delete(idx)
      else
        Inc(idx);

If you do, read on!

TGpIntegerList [For the purpose of this blog entry, TGpIntegerList token is redefined to implicitly expands to "TGpIntegerList, TGpInt64List, and all classes derived from them."] included enumeration support since Delphi 2005 times. You could, for example, write

  il := TGpIntegerList.Create([1, 3, 5, 7]);
  try
    for i in il do
      Log('%d', [i]);
  finally FreeAndNil(il); end;

and get numbers 1, 3, 5, and 7 in the log.

Slicing ...

Recently, I've extended TGpIntegerList with two additional enumerators, both (of course) springing from practice. I've noticed that I still write lots of 'old school' for loops because I want to do something special with the first element. For example, take this simple code fragment that finds a maximum element in the list.

    max := il[0];
    for idx := 1 to il.Count - 1 do
      if il[idx] > max then
        max := il[idx];

To do such thing with for-each, you have to introduce a boolean flag.

    gotMax := false;
    for i in il do
      if not gotMax then begin
        max := i;
        gotMax := true;
      end
      else if i > max then
        max := i;

Ugly.

Much nicer implementation can be written with the new Slice() enumerator.

    max := il[0];
    for i in il.Slice(1) do
      if i > max then
        max := i;

Slice takes two parameters representing starting and ending index in the list and returns all values between them (indices are inclusive). Ending index can be omitted in which case it will default to Count-1 (i.e. to the end of the list).

Slice enumerator is implemented in a traditional manner - with the enumerator factory. For more details see my older articles and GpLists source code.

... And Walking

The second and much more interesting enumerator solves the problem shown in the introduction to this article. I noticed that I write lots of code that iterates over some list and removes some entries while keeping others intact. Typical scenario: removing timed-out clients from the server's list.

Something like that is impossible to do with the TGpIntegerList default enumerator. Firstly because this enumerator returns list elements and not list indices (and we typically have to access the object stored in the .Objects[] part of the list during the process) and secondly because the enumerator does not work correctly if enumerated list is modified behind its back. The second issue also prevents the use of standard for loop.

To solve both problems at once, I've introduced the Walk enumerator. It returns list indices instead of list elements and functions correctly if Add, Insert or Delete are used on the list while enumerator is active. (The second part is implemented by using TGpIntegerList internal notification mechanism, introduced just for this purpose.)

Now I can write:

    for idx in il.Walk do
      if ShouldBeRemoved(il[idx]) then
        il.Delete(idx);

Implementation? Check the source, Luke. It's not complicated.

Now if only I could add such enumerator to the TObjectList ...

Gp* update

GpHugeFile 5.05

Delphi 2007 changed the implementation of CreateFmtHelp so that it clears the Win32 'last error'. Therefore, it was impossible to detect windows error in detection handler when EGpHugeFile help context was hcHFWindowsError. To circumvent the problem, all EGpHugeFile help contexts were changed to a negative value. HcHFWindowsError constant was removed. All Win32 API errors are passed in the HelpContext unchanged. IOW, if HelpContext > 0, it contains an Win32 API error code, otherwise it contains one of hcHF constants.
Added method TGpHugeFile.GetTime and property TGpHugeFile.FileTime.

GpStreamS 1.22

Added AppendToFile helper functions (two overloads).
Added ReadTag and WriteTag support for int64 and WideString data.
Added two overloaded SafeCreateFileStream versions returning exception message.
Added Append stream helper.
Added AutoDestroyWrappedStream property to the TGpStreamWindow class.
Added TGpBufferedStream class. At the moment, only reading is buffered while writing is implemented as a pass-through operation.
Made 'count' parameter to CopyStream optional, the same way as TStream.CopyFrom is implemented.
Check for < 0 position in TGpStreamWindow.Seek.
Fixed reading/writing of zero bytes in TGpStreamWindow.
Added bunch of 'inline' directives.

GpLists 1.35

Implemented TGpClassList, a TStringList-based list of classes. Class names must be unique. Useful when you need to implement a class registry to generate objects from class names.
Added Walk enumerator to the TGpIntegerList and TGpInt64List. This enumerator allows list modifications (Add, Insert, Delete) from the enumeration consumer. IOW, you can do this:
```
for idx in list.Walk do
  if SomeCondition(list[idx]) then
    list.Delete(idx);
```
Modified TGpCountedInt64List to store int64 counters.
Added property ItemCounter[] to TGpCountedIntegerList and TGpCountedInt64List.
Added Slice(from, to) enumerator to TGpIntegerList and TGpInt64List.
Add TGpObjectRingBuffer and TGpDoublyLinkedList enumerators. Both lock access to the list during the enumeration process if multithreaded mode is enabled.
Added method Contains to TGpIntegerList, TGpInt64List, TGpCountedStringList, TGpTMethodList.
When TGpIntegerObjectList or TGpInt64ObjectList was Sorted and with Duplicates set to dupIgnore, calling AddObject with item that was already in the list caused internal exception.
Enumerators changed to records with GetCurrent inlined, as suggested in More fun with Enumerators.

GpTimezone 1.22

Implemented DateLT, DateLE, DateGT, DateGE.

DSiWin32 1.36a

Added procedures DSiCenterRectInRect and DSiMakeRectFullyVisibleOnRect.
Added DSiTerminateProcessById procedure.
Added three performance counter helpers DSiPerfCounterToMS, DSiPerfCounterToUS, and DSiQueryPerfCounterAsUS.
Added function DSiTimeGetTime64.
Added function DSiGetProcessFileName.
Added function DSiEditShortcut.
Added function DSiInitFontToSystemDefault.
Added many SHGetSpecialFolderLocation folder constants.
Added ShGetSpecialFolderLocation flags CSIDL_FLAG_DONT_UNEXPAND and CSIDL_FLAG_DONT_VERIFY.
Added dynamically loaded API forwarder DSiSetSuspendState.
Added dynamically loaded API forwarders DSiEnumProcessModules, DSiGetModuleFileNameEx, and DSiGetProcessImageFileName.
Changed DSiIsAdmin to use big enough buffer for token data.
Changed DSiIsAdmin to ignore SE_GROUP_ENABLED attribute because function was sometimes incorrectly returning False.
Added parameter 'parameters' to DSiCreateShortcut and DSiGetShortcutInfo.
More stringent checking in DSiGetProcessWindow.

Friday, March 07, 2008

TDM Rerun #6: GExperts 1.0

The greatest (in my view), and one of the oldest of the Delphi IDE expert suites is GExperts. It was created by Gerals Nunn (in 1997, if I am not mistaken), who also developed it for almost two years. Gerald has moved to other projects, but he kindly donated the GExperts source code to the online community.

- GExperts 1.0, The Delphi Magazine 72, August 2001

I still stand by those words. GExperts is a great suite of experts and I still use it every day. I also wrote few experts that are included in the GExperts pack and few that are included only in my private build.

Links: article (PDF, 95 KB), GExperts

Monday, March 03, 2008

Debugging With Lazy Breakpoints

Sometimes you're hunting down a problem that occurs very rarely. When it does, an exception is triggered. If you're debugging, Delphi should pop up (it does) and allow you to look at the stack and inspect local variables (it doesn't - happens a lot to me). What can you do?

You could set a conditional breakpoint just before the problem occurs and test for the problem causing condition here. That would work on a short run, but if you're hunting the problem for many weeks, the breakpoint may move around or even get disabled. Delphi's breakpoint storage mechanism is somewhat unstable if you edit the code around the breakpoint a lot (and even more if you edit the code on multiple computers).

More stable solution is to do everything in code, from testing if debugger is running to pausing the program and breaking into the debugger. Following fragment illustrates the approach:

if DebugHook <> 0 then // test if debugger is running
  if problem = true then //test the condition
    asm int 3 end; //break into debugger

Not very obvious and full of arcane knowledge, but it works.

Saturday, March 01, 2008

Paint It Black (And Red)

Just in case you're not following Julian's blog, I'd like to bring your attention to his new series on red-black trees. Red-black trees are great data structure, but are notoriously hard to code correctly as the algorithm is full of weird edge cases.

I've always been a follower of his 'algorithms and data structures' articles in The Delphi Magazine and I'm sure his new series will be presented at the equally high standard. He's using C#, VB and .NET but I'm sure the series will be interesting for us Delphi users, too. [And he already gave us RB tree implementation in the EZDSL library - thanks, Julian!]

Of course, if you own his DADS book, you can read all about RB trees there.