Friday, March 14, 2008

Fun with enumerators - Walking the (integer) list

Do you hate such code?

    idx := 0;
while idx < il.Count do
if ShouldBeRemoved(il[idx]) then
il.Delete(idx)
else
Inc(idx);

If you do, read on!


TGpIntegerList [For the purpose of this blog entry, TGpIntegerList token is redefined to implicitly expands to "TGpIntegerList, TGpInt64List, and all classes derived from them."]  included enumeration support since Delphi 2005 times. You could, for example, write

  il := TGpIntegerList.Create([1, 3, 5, 7]);
try
for i in il do
Log('%d', [i]);
finally FreeAndNil(il); end;

and get numbers 1, 3, 5, and 7 in the log.


Slicing ...


Recently, I've extended TGpIntegerList with two additional enumerators, both (of course) springing from practice. I've noticed that I still write lots of 'old school' for loops because I want to do something special with the first element. For example, take this simple code fragment that finds a maximum element in the list.

    max := il[0];
for idx := 1 to il.Count - 1 do
if il[idx] > max then
max := il[idx];

To do such thing with for-each, you have to introduce a boolean flag.

    gotMax := false;
for i in il do
if not gotMax then begin
max := i;
gotMax := true;
end
else if i > max then
max := i;

Ugly.


Much nicer implementation can be written with the new Slice() enumerator.

    max := il[0];
for i in il.Slice(1) do
if i > max then
max := i;

Slice takes two parameters representing starting and ending index in the list and returns all values between them (indices are inclusive). Ending index can be omitted in which case it will default to Count-1 (i.e. to the end of the list).


Slice enumerator is implemented in a traditional manner - with the enumerator factory. For more details see my older articles and GpLists source code.


... And Walking


The second and much more interesting enumerator solves the problem shown in the introduction to this article. I noticed that I write lots of code that iterates over some list and removes some entries while keeping others intact. Typical scenario: removing timed-out clients from the server's list.


Something like that is impossible to do with the TGpIntegerList default enumerator. Firstly because this enumerator returns list elements and not list indices (and we typically have to access the object stored in the .Objects[] part of the list during the process) and secondly because the enumerator does not work correctly if enumerated list is modified behind its back. The second issue also prevents the use of standard for loop.


To solve both problems at once, I've introduced the Walk enumerator. It returns list indices instead of list elements and functions correctly if Add, Insert or Delete are used on the list while enumerator is active. (The second part is implemented by using TGpIntegerList internal notification mechanism, introduced just for this purpose.)


Now I can write:

    for idx in il.Walk do
if ShouldBeRemoved(il[idx]) then
il.Delete(idx);

Implementation? Check the source, Luke. It's not complicated.


Now if only I could add such enumerator to the TObjectList ...

10 comments:

  1. Anonymous15:43

    Removal loops should be itterated backwards to prevent index shuffling.

    IE: if remove items 1 and 2 and itterate forward, you want to remove item 1 and item 1, not item 1 and item 2. (Removing item 1 first from a list tends to shuffle everything forward, now your later references are all out..)

    If you itterate backwards you could just remove item 2 and then item 1, and never have to worry about it.

    ReplyDelete
  2. Hmm, I'm probably missing out on some deeper guru logic here (probably efficiency-related) but while I can certainly see the usefulness in your Slice- and Walk-enumerators and also the internal notification mechanism to make possible the enumeration of transient lists, I don't think they're the best solution to the two examples you quoted.

    Why not simply do this:

    for i := il.Count-1 downto 0 do
      if ShouldBeRemoved(il[i]) then
        il.Delete(i);

    max := -MaxInt;
    for i in il do
      if i > max then
        max := i;

    ReplyDelete
  3. Yes, counting down solves the problem in most cases.

    Examples are just that - examples. If I make them sufficiently complicated so that make sense, they are no longer examples.

    I prefer brevity and readability over efficiency in most cases. Enumerators are never efficient.

    ReplyDelete
  4. Anonymous22:02

    Simplicity does not justify an example that fails to work correctly.

    A bad example is still a bad example, no matter how simple.

    ReplyDelete
  5. Anonymous20:39

    "Slice enumerator is implemented in a traditional manner"

    But with a non-traditional API.

    Slice(N) "traditionally" (in the sense of the RTL Slice() function) returns the first N elements.

    With your implementation returning Count - N elements starting at the Nth, the potential for confusion is pretty obvious.

    fwiw I think both are wrong. Slice, by it's nature, should require first and last element indices to be specified, then there could be no confusion over what slicing with a single param might mean.


    I'd also like to echo the other comments w.r.t your initial example.

    Reversing the direction of the iteration is the simplest wasy to simplify the exemplar task, not invoking enumerators and the such like.

    So you needed an example of how enumerators help...? Then you should have started with one of your later examples (special treatment of the first element etc etc).

    In this case, a poor choice of initial example risks putting off readers who might otherwise found benefit in the REAL subject of the article.

    +0.02

    ReplyDelete
  6. @jolyon: "Slice, by it's nature, should require first and last element indices to be specified, then there could be no confusion over what slicing with a single param might mean."

    While I do agree with you, I hate to type il.Slice(1, il.Count-1). There can of course still be a confusion regarding the parameter semantics, but Delphi really helpes here - if you hover over Slice, it will show you that parameters are named idxFrom and idxTo.

    Maybe it's not a good idea that the second parameters defaults to -1. It should really default to MaxInt - I'll change this in the next GpLists release.

    ReplyDelete
  7. Anonymous02:37

    "While I do agree with you, I hate to type il.Slice(1, il.Count-1)"

    Then make it so that you can type:

    SliceTo(N);
    SliceFrom(N);
    or Slice(From, To);


    "Delphi really helps here - if you hover over Slice, it will show you that parameters are named idxFrom and idxTo."

    I don't always read code in the IDE and the IDE can't *always* find the param insight.

    :)

    ReplyDelete
  8. @Jolyon: You found my weak spot. While I like my code to be readable, I also like it to be compact and introducing three functions to do the work of one is all but that.

    ReplyDelete
  9. Anonymous03:37

    Hmm, readable vs compact. The age old tension in source.

    :)

    Of course, SliceTo/SliceFrom and Slice() would not be 3 functions to do the job of one, it would be one function with two additional intention revealing interfaces.

    With inlining those two additional functions wouldn't (likely) even *be* additional functions, since they would only call Slice() anyway so any use of SliceTo/SliceFrom should be readily inlinable by the compiler.


    SliceTo(N) == Slice(0, N-1)
    SliceFrom(N) == Slice(N-1, Length-1)


    Worth noting of course is also that the benefit of a slightly richer interface, the consumers of that interface (typically numbering much greater than the ordinality of the interface itself : i.e. 1) would themselves be more compact (for those occasions when SliceTo/SliceFrom could be used in place of a fully expanded parameter list to plain old Slice).

    But a useful post (my only point w.r.t the 'badness' of the example being that it might have put some people off from reading what might otherwise prove to be useful to them).

    ReplyDelete
  10. While I do partially agree with you, you have to admit that programming world typically isn't oriented in that direction. Heck, people even invented default parameters so that they didn't have to create multiple functions to do the same job. And even before that, TStream.CopyFrom could function in two different ways, depending on the value of the Count parameter. (Hmm, that's actually an argument _for_ your case because I don't like this CopyFrom behaviour. It's counterintuitive).

    For your case work some Visual Basic imports like LeftStr, MidStr, RightStr - nothing that Copy and Length could handle, but definitely a nice wrapper that simplifies the code.

    Your SliceTo/SliceFrom is not without its own problems. Why does it operate on element count, not on element indices? How can the user remember _this_?

    Now we see that all problems rise from the C-like usage of starting arrays with [0] and that somebody should use that time-travelling device they are building near CERN and go back in time and convince K&R in starting arrays with [1].

    ReplyDelete