Tuesday, November 09, 2010

Blaise Pascal Magazine Rerun #3: Real World Enumerators

This article was originally written for the Blaise Pascal Magazine and was published in Issue #5.

Relevant Delphi code is available at http://17slon.com/blogs/gabr/BPM/RealWorldEnumerators.zip

Today I’ll conclude my three part series on enumerators with some real world examples, mostly extracted from my own code. Have I mentioned already that I simply love enumerators? ;)

Generators

We’ve been looking at a various approaches to enumerators in the past two issues, but all approaches had something in common – enumerators operated on some structure, for example a string list. That’s not a requirement, though, it is entirely possible to write an enumerator that doesn’t require any external data to function, but generates values internally. Such enumerators I like to call generators but that’s entirely my own terminology and may be not widely accepted.

To demonstrate such generator I’ll code a Fibonacci sequence generator. In case you skipped that part of your basic math education, the Fibonacci sequence starts with numbers 1, 1 and then continues such that each element is a sum of previous two. After 1 and 1 comes 2 (= 1+1), 3 (= 1+2), 5 (=2+3) and so on. Sometimes you’ll also see a definition that starts with 0 and 1 but except the first element, there’s no difference between those sequences (as 0+1 is 1).

I always start coding with the test case or a code that will be using the generator.

procedure TfrmBPRealWorldEnum.btnGeneratorClick(Sender: TObject);
var
  fib: integer;
  s: string;
begin
  s := 'Fibonacci sequence:';
  for fib in Fibonacci(100) do
    s := s + ' ' + IntToStr(fib);
  lbLog.Items.Add(s);
end;

The code uses ‘for in’ loop to generate all Fibonacci number smaller than 100 (just an arbitrary number I pulled out of a thin air) and concatenates them into a string which will is logged at the end. When we run the code, it outputs the expected string “Fibonacci sequence: 1 1 2 3 5 8 13 21 34 55 89”. OK, it works, but how did it generate the sequence?

As you may remember from the previous installment, the first step is to write an enumerator factory – a record that implements function GetEnumerator. This record will be returned from the Fibonacci function and compiler will call it’s GetEnumerator function to create the enumerator. While creating this factory we must pass it the upper bound (which the factory will forward to the actual enumerator when it will be created).

type
  TFibonacciEnumFactory = record
  private
    FUpperBound: integer;
  public
    constructor Create(upperBound: integer);
    function GetEnumerator: TFibonacciEnum;
  end;

function Fibonacci(upperBound: integer): TFibonacciEnumFactory;
begin
  Result := TFibonacciEnumFactory.Create(upperBound);
end;

constructor TFibonacciEnumFactory.Create(upperBound: integer);
begin
  FUpperBound := upperBound;
end;

function TFibonacciEnumFactory.GetEnumerator: TFibonacciEnum;
begin
  Result := TFibonacciEnum.Create(FUpperBound);
end;

This is pretty much a template enumerator factory code with one small modification – no external structure is passed to the factory because we are not working with one.

The enumerator is equally simple. Constructor stores away the upper bound and prepares first two element of the Fibonacci sequence. GetCurrent method returns an element and generates new one. And MoveNext simply checks if the element to be returned is already larger than the upper bound. The code is really simple and speaks for itself.

type
  TFibonacciEnum = record
  private
    FFib1: integer;
    FFib2: integer;
    FUpperBound: integer;
  public
    constructor Create(upperBound: integer);
    function GetCurrent: integer;
    function MoveNext: boolean;
    property Current: integer read GetCurrent;
  end;

constructor TFibonacciEnum.Create(upperBound: integer);
begin
  FUpperBound := upperBound;
  FFib1 := 1;
  FFib2 := 1;
end;

function TFibonacciEnum.GetCurrent: integer;
var
  next: integer;
begin
  Result := FFib1;
  next := FFib1 + FFib2;
  FFib1 := FFib2;
  FFib2 := next;
end;

function TFibonacciEnum.MoveNext: boolean;
begin
  Result := FFib1 < FUpperBound;
end;

And that, my dear reader, is everything that can be said about generators.

Slicing

Don’t you just hate such code?

max := il[0];
for idx := 1 to il.Count - 1 do
  if il[idx] > max then
    max := il[idx];

When I see something like that I immediately start thinking about enumerators. The code really doesn’t care about indices – it wants to retrieve maximum element in the list. Still, it cannot use a simple ‘for in’ loop because the first element must be processed separately. Well, you can use ‘for in’, but the solution is less than optimal.

gotMax := false;
for i in il do
  if not gotMax then begin
    max := i;
    gotMax := true;
  end
  else if i > max then
    max := i;

A better approach is to write a Slice enumerator – an enumerator that returns only a part of the list. Then you can write much nicer code.

max := il[0];
for i in il.Slice(1) do
  if i > max then
    max := i;

If you are working with a container that implements method First returning first element (the one with index 0), the code becomes even nicer.

max := il.First;
for i in il.Slice(1) do
  if i > max then
    max := i;

Now it is really obvious what the intention of this code fragment is!

I’ll not be cluttering the article with the implementation of the Slice enumerator – you can look it up in my GpLists unit.

Real World Enumerators sample app

Fast key-value enumerator

Before the Delphi 2009 came out and brought generics into the Win32 world I liked to abuse my TGpIntegerObjectList (which is a list of (integer, TObject) pairs, much similar to the TStringObjectList) as a dictionary of (id, entity) pairs. To write a code that tried to expose this dictionary-ness to the casual observer I tried not to write too many traditional for loops operating on that code but stick to ‘for in’ whenever possible.

The trouble with the traditional list enumerator is that it only returns elements (integer part of the pair stored in my list). If you want to retrieve the associated object, you have to iterate over indices, not values, and access .Object[i]. Of course, that’s only a problem with the traditional enumerator. Nobody can stop us from writing an additional enumerator that returns (Key, Value) pairs. Who’d guess – I called it a WalkKV enumerator.

To start at the end (again!) – I wanted to be able to write a code like this.

var
  il: TGpIntegerObjectList;
  kv: TGpKeyValue;
begin
  il := TGpIntegerObjectList.Create;
  try
    il.AddObject(1, TGpString.Create('one'));
    il.AddObject(2, TGpString.Create('two'));
    il.AddObject(3, TGpString.Create('three'));
    for kv in il.WalkKV do
      lbLog.Items.Add(Format('%d: %s', [kv.Key, TGpString(kv.Value).Value]));
  finally
    FreeAndNil(il);
  end;
end;

WalkKV enumerator should walk over my list and return (Key, Value) object for each list index. I’ll then case the object to appropriate class and work with it. Even better - as a syntactic sugar, I can implement class helper for the TGpKeyValue class in the caller code and remove the ugly casting.

type
  TGpKeyStrValue = class helper for TGpKeyValue
  public
    function StrValue: string; inline;
  end; { TGpKeyStrValue }

function TGpKeyStrValue.StrValue: string;
begin
  Result := TGpString(Self.Value).Value;
end;

...

for kv in il.WalkKV do
  lbLog.Items.Add(Format('%d: %s', [kv.Key, kv.StrValue]));

The factory part is trivial and I’ll not be showing it here. You can look it up in the GpLists unit. The tricky part is returning the (Key, Value) object or, to be more specific, the lifecycle management of this object. My first idea was to return an interface to a (Key, Value) object. That surely worked but was incredibly slow. Not a surprise, after all this interface (or rather the object implementing this interface) was created and destroyed once for each returned pair.

The second idea was to return a record. That was much faster, but not very elegant. You see – when a record is passed from the function, its contents are copied byte by byte from source to result. Doing that for each pair in the list seemed very unproductive to me.

At the end I decided to return the object. Similar to the first (interface based) approach but with one small difference – object is created when the enumerator is created and destroyed when enumerator is destroyed. During the enumeration process it is not recreated at all. Only the content of the object (Key and Value) is modified. As returning an object is very quick (only an address of the object is passed to the caller), this approach was fastest of them all.

The implementation of the WalkKV enumerator is particularly interesting because it uses another neat trick. Instead of implementing the enumeration logic by itself, it simply reuses another, simpler, enumerator (called Walk and implemented by the TGpIntegerListWalkEnumerator class). Walk is an enumerator similar to the Slice (mentioned above) but with one big difference – it is stable relative to the list changes. In other words – if you delete an element from the list from inside the ‘for ... in list.Walk’ loop, the code will not crash. It will work as expected.

constructor TGpIntegerObjectListWalkKVEnumerator.Create
  (aList: TGpIntegerObjectList; idxFrom, idxTo: integer);
begin
  inherited Create;
  wkeListEnumerator := TGpIntegerListWalkEnumerator.Create(aList, idxFrom, idxTo);
  wkeCurrentKV := TGpKeyValue.Create;
end;

destructor TGpIntegerObjectListWalkKVEnumerator.Destroy;
begin
  FreeAndNil(wkeCurrentKV);
  FreeAndNil(wkeListEnumerator);
  inherited;
end;

function TGpIntegerObjectListWalkKVEnumerator.GetCurrent: TGpKeyValue;
var
  idx: integer;
begin
  idx := wkeListEnumerator.GetCurrent;
  wkeCurrentKV.Key := wkeListEnumerator.List[idx];
  wkeCurrentKV.Value := TGpIntegerObjectList(wkeListEnumerator.List)
    .Objects[idx];
  Result := wkeCurrentKV;
end;

function TGpIntegerObjectListWalkKVEnumerator.MoveNext: boolean;
begin
  Result := wkeListEnumerator.MoveNext;
end;

TDataset enumerator

At the end of my short series, I’d like to mention an enumerator written by Uwe Raabe. It demonstrates few finer points of Delphi compiler magic and allows you to write a code that iterates over datasets and is really readable. An example of the code (take from Uwe’s source) would be:

var
  Employee: Variant;
  S: string;

for Employee in QuEmployee do begin
  S := Trim(Format('%s %s', [Employee.First_Name, Employee.Last_Name]));
  if Employee.Hire_Date < EncodeDate(1991, 1, 1) then
    S := '*' + S;
end;

Uwe uses three tricks to implement his enumerator. First is a standard enumerator which enumerates over a dataset and returns special descendant of the Variant type for each record. Second is a data helper which “pushes” enumerator support into TDataset. The third, which is most “magicky” of them all, is a custom variant type which automagically accesses record fields when the caller calls .FieldName property (for example .First_Name and .LastName properties in the code above). Those properties are not defined the the TDataset enumerator, they are implemented in runtime. I must admit I didn’t event know this was possible to do in Delphi!

You can download this ingenious implementation at http://cc.codegear.com/Item/25386.

Conclusion

I believe there’s much more that can be said about enumerators and that I didn’t cover the area nearly deep enough, but nevertheless I hope I managed to show you how great the enumerators can be and how they can make your code more readable. Now go forth, and enumerate!

2 comments:

  1. Thanks for posting this. That' very inspiring! :)

    However, as little as this has to do with the subject of this article, I simply couldn't resist commenting on some of your examples:

    Slicing example: If you had simply initialized the max variable to 0 (or -MaxInt if the list may contain negative numbers) instead of il[0] then there would have been no need for the Splice enumerator and you could still write clean for..in code. But of course, then you would have had to think of a different example for the Splice enumerator... ;)

    Key/Value enumerator: When you played with returning an interface, why did you recreate the instance for every iteration? As far I can tell there is no need for that. You could simply return an interface reference to the same single object that you are now returning directly. The benefit would be that the consumer of the enumerator is no longer able to call Free or whatever other madness on the key/value object.

    Cheers,

    Oliver

    ReplyDelete
  2. @Oliver: Agree with everything you said :)

    The "constant interface" approach simply didn't cross my mind.

    ReplyDelete