Wednesday, April 01, 2009

Fluent XML [1]

A few days ago I was writing a very boring piece of code that had to generate an XML document. It was full of function calls that created nodes in the XML document and set attributes. Boooooring stuff. But even worse than that – the structure of the XML document was totally lost in the code. It was hard to tell which node was a child of which and how it was all structured.

Then I did what every programmer does when he/she has to write some boring code – I wrote a tool to simplify the process. [That process usually takes more time than the original approach, but at least it is interesting ;) .]

I started by writing the end code. In other words, I started by thinking about how I wanted to create this XML document in the first place. I quickly decided on the fluent interface approach. I had used it in the OmniThreadLibrary, where it proved to be quite useful.

This is how the first draft looked (actually, it was much longer, but this is the important part):

xmlWsdl := CreateFluentXml
  .AddProcessingInstruction('xml', 'version="1.0" encoding="UTF-8"')
  .AddChild('definitions')
    .SetAttr('xmlns', 'http://schemas.xmlsoap.org/wsdl/')
    .SetAttr('xmlns:xs', 'http://www.w3.org/2001/XMLSchema')
    .SetAttr('xmlns:soap', 'http://schemas.xmlsoap.org/wsdl/soap/')
    .SetAttr('xmlns:soapenc', 'http://schemas.xmlsoap.org/soap/encoding/')
    .SetAttr('xmlns:mime', 'http://schemas.xmlsoap.org/wsdl/mime/');

This short fragment looks quite nice, but in the full version (about 50 lines) all those SetAttr calls visually merged with the AddChild calls and the result was still unreadable (although shorter than the original code with explicit calls to the XML interface).

My first idea was to merge at least some SetAttr calls into AddChild by introducing two versions – one that takes only a node name and another that takes a node name, an attribute name and an attribute value – but that didn’t help at all. Even worse – it was hard to see which AddChild calls were setting attributes and which weren’t :(

That got me started in a new direction. If the main problem is visual clutter, I had to do something to make attribute setting stand out. I briefly considered a complicated scheme using smart records and operator overloading, but I couldn’t imagine XML-creating code that would use operators and be more readable than this, so I rejected that approach. [It may still be a valid approach – it’s just that I cannot make it work in my head.]

Then I thought about arrays. In “classical” code I could easily add array-like access to attributes so that I could write xmlNode[attrName] := 'some value', but how could I make this conform to my fluent architecture?
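For comparison, here is a minimal sketch of that “classical” array-like access as a self-contained console program. TSimpleNode is a hypothetical stand-in for an XML node that simply keeps its attributes in a TStringList; it is not part of OmniXML or any other library. Because the write accessor is a procedure, node['name'] := ... is a statement, so there is nothing for a further call to be chained onto.

```pascal
program ClassicAttribSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical stand-in for an XML node: attributes live in a TStringList
  // as name=value pairs.
  TSimpleNode = class
  private
    FAttrs: TStringList;
    function GetAttr(const name: string): string;
    procedure SetAttr(const name, value: string);
  public
    constructor Create;
    destructor Destroy; override;
    // "Classical" array-like access: node['attr'] := 'value'
    property Attr[const name: string]: string read GetAttr write SetAttr; default;
  end;

constructor TSimpleNode.Create;
begin
  inherited Create;
  FAttrs := TStringList.Create;
end;

destructor TSimpleNode.Destroy;
begin
  FAttrs.Free;
  inherited;
end;

function TSimpleNode.GetAttr(const name: string): string;
begin
  Result := FAttrs.Values[name]; // '' if the attribute is not set
end;

procedure TSimpleNode.SetAttr(const name, value: string);
begin
  FAttrs.Values[name] := value; // note: assigning '' removes the entry
end;

var
  node: TSimpleNode;
begin
  node := TSimpleNode.Create;
  try
    node['name'] := 'MyService'; // a statement: nothing can be chained after it
    node['style'] := 'rpc';
    WriteLn(node['name'], ' ', node['style']);
  finally
    node.Free;
  end;
end.
```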

To get or not to get

To be able to chain anything after the [], the indexed property hiding behind it must return Self, i.e. the same interface it lives in. And because I want to use attribute name/value pairs, this property must have two indices.

property Attrib[const name, value: XmlString]: IGpFluentXmlBuilder
  read GetAttrib; default;

That allows me to write code like this:

.AddSibling('service')['name', serviceName]
  .AddChild('port')
    ['name', portName]
    ['binding', 'fs:' + bindingName]
    .AddChild('soap:address')['location', serviceLocation];

As you can see, attributes can be chained and I can write attribute assignment in the same line as node creation and it is still obvious which is which and who is who.

But … assignment? In a getter? Why not! You can do anything in a property getter. To make this more obvious, my code calls this ‘getter’ SetAttrib. As a nice side effect, SetAttrib is the same attribute-setting method as in the first draft (where it was called SetAttr) and can even be used instead of the [] approach.
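To see the trick in isolation, here is a small self-contained sketch: a two-index default property whose read accessor performs the assignment and returns the builder itself. IAttribBuilder, TAttribBuilder and AsText are hypothetical names invented for this example, plain string stands in for XmlString, and the attributes are only collected as text rather than set on an OmniXML node.

```pascal
program FluentAttribSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical, attribute-only builder; plain string replaces XmlString.
  IAttribBuilder = interface
    ['{6E1C0F7A-2B4D-4C8E-9A51-3D7F0B2E8C14}']
    function SetAttrib(const name, value: string): IAttribBuilder;
    function AsText: string;
    // Two-index default property whose *read* accessor performs the
    // assignment and returns the builder, so [] can be chained.
    property Attrib[const name, value: string]: IAttribBuilder read SetAttrib; default;
  end;

  TAttribBuilder = class(TInterfacedObject, IAttribBuilder)
  private
    FAttrs: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    function SetAttrib(const name, value: string): IAttribBuilder;
    function AsText: string;
  end;

constructor TAttribBuilder.Create;
begin
  inherited Create;
  FAttrs := TStringList.Create;
end;

destructor TAttribBuilder.Destroy;
begin
  FAttrs.Free;
  inherited;
end;

function TAttribBuilder.SetAttrib(const name, value: string): IAttribBuilder;
begin
  FAttrs.Add(name + '="' + value + '"'); // record the attribute
  Result := Self;                         // hand back the same builder
end;

function TAttribBuilder.AsText: string;
var
  i: integer;
begin
  Result := '';
  for i := 0 to FAttrs.Count - 1 do begin
    if i > 0 then
      Result := Result + ' ';
    Result := Result + FAttrs[i];
  end;
end;

var
  builder: IAttribBuilder;
begin
  builder := TAttribBuilder.Create;
  // Each [] pair calls SetAttrib and returns the same builder.
  builder := builder['name', 'MyPort']['binding', 'fs:MyBinding'];
  // Calling the "getter" by its name works just as well.
  builder := builder.SetAttrib('style', 'rpc');
  WriteLn(builder.AsText); // name="MyPort" binding="fs:MyBinding" style="rpc"
end.
```

The design choice is the same one the post makes: since the accessor is a function returning the interface, the [] expression can be followed by another [] or by a method call.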

I’ll end today’s instalment with the complete ‘fluent XML builder’ interface and with sample code that uses this interface to build an XML document. Tomorrow I’ll wrap things up by describing the interface and its implementation in all its boring detail.

type
  IGpFluentXmlBuilder = interface ['{91F596A3-F5E3-451C-A6B9-C5FF3F23ECCC}']
    function GetXml: IXmlDocument;
  //
    function Anchor(var node: IXMLNode): IGpFluentXmlBuilder;
    function AddChild(const name: XmlString): IGpFluentXmlBuilder;
    function AddComment(const comment: XmlString): IGpFluentXmlBuilder;
    function AddSibling(const name: XmlString): IGpFluentXmlBuilder;
    function AddProcessingInstruction(const target, data: XmlString): IGpFluentXmlBuilder;
    function Back: IGpFluentXmlBuilder;
    function Here: IGpFluentXmlBuilder;
    function Parent: IGpFluentXmlBuilder;
    function SetAttrib(const name, value: XmlString): IGpFluentXmlBuilder;
    property Attrib[const name, value: XmlString]: IGpFluentXmlBuilder
      read SetAttrib; default;
    property Xml: IXmlDocument read GetXml;
  end; { IGpFluentXmlBuilder }
 
xmlWsdl := CreateFluentXml
  .AddProcessingInstruction('xml', 'version="1.0" encoding="UTF-8"')
  .AddChild('definitions')
    ['xmlns', 'http://schemas.xmlsoap.org/wsdl/']
    ['xmlns:xs', 'http://www.w3.org/2001/XMLSchema']
    ['xmlns:soap', 'http://schemas.xmlsoap.org/wsdl/soap/']
    ['xmlns:soapenc', 'http://schemas.xmlsoap.org/soap/encoding/']
    ['xmlns:mime', 'http://schemas.xmlsoap.org/wsdl/mime/']
    ['name', serviceName]
    ['xmlns:ns1', 'urn:' + intfName]
    ['xmlns:fs', 'http://online.com/soap/']
    ['targetNamespace', 'http://online.com/soap/']
    .AddChild('message')['name', 'fs:' + baseName + 'Request'].Anchor(nodeRequest)
    .AddSibling('message')['name', 'fs:' + baseName + 'Response'].Anchor(nodeResponse)
    .AddSibling('portType')['name', baseName]
    .Here
      .AddChild('operation')['name', baseName]
        .AddChild('input')['message', 'fs:' + baseName + 'Request']
        .AddSibling('output')['message', 'fs:' + baseName + 'Response']
    .Back
    .AddSibling('binding')
    .Here
      ['name', bindingName]
      ['type', 'fs:' + intfName]
      .AddChild('soap:binding')
        ['style', 'rpc']
        ['transport', 'http://schemas.xmlsoap.org/soap/http']
      .AddSibling('operation')['name', baseName]
        .AddChild('soap:operation')
          ['soapAction', 'urn:' + baseName]
          ['style', 'rpc']
        .AddSibling('input')
          .AddChild('soap:body')
            ['use', 'encoded']
            ['encodingStyle', 'http://schemas.xmlsoap.org/soap/encoding/']
            ['namespace', 'urn:' + intfName + '-' + baseName]
          .Parent
        .AddSibling('output')
          .AddChild('soap:body')
            ['use', 'encoded']
            ['encodingStyle', 'http://schemas.xmlsoap.org/soap/encoding/']
            ['namespace', 'urn:' + intfName + '-' + baseName]
    .Back
    .AddSibling('service')['name', serviceName]
      .AddChild('port')
        ['name', portName]
        ['binding', 'fs:' + bindingName]
        .AddChild('soap:address')['location', serviceLocation];
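The navigation methods are only explained in the comment thread (Here pushes the current node onto an internal stack, Back pops it and makes it current again). To make the interplay of AddChild, AddSibling, Parent, Here and Back concrete, here is a rough DOM-free sketch. IPathBuilder, TPathBuilder, ParentPath and Nodes are hypothetical names, the active node is modelled as a path string instead of an IXMLNode, and this is not the implementation behind IGpFluentXmlBuilder.

```pascal
program FluentNavigationSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

type
  // Hypothetical, DOM-free builder: the active node is just a path string.
  IPathBuilder = interface
    ['{A3D5B7C9-1E2F-4A6B-8C0D-2E4F6A8B0C1D}']
    function AddChild(const name: string): IPathBuilder;
    function AddSibling(const name: string): IPathBuilder;
    function Parent: IPathBuilder;
    function Here: IPathBuilder;
    function Back: IPathBuilder;
    function Nodes: string;
  end;

  TPathBuilder = class(TInterfacedObject, IPathBuilder)
  private
    FActive: string;         // path of the active node ('' = document root)
    FMarks: array of string; // stack: Here pushes, Back pops
    FNodes: string;          // paths of all created nodes, space-separated
  public
    function AddChild(const name: string): IPathBuilder;
    function AddSibling(const name: string): IPathBuilder;
    function Parent: IPathBuilder;
    function Here: IPathBuilder;
    function Back: IPathBuilder;
    function Nodes: string;
  end;

// Removes the last '/segment' of a path: '/a/b' -> '/a', '/a' -> ''.
function ParentPath(const path: string): string;
var
  i: integer;
begin
  i := Length(path);
  while (i > 0) and (path[i] <> '/') do
    Dec(i);
  Result := Copy(path, 1, i - 1);
end;

function TPathBuilder.AddChild(const name: string): IPathBuilder;
begin
  FActive := FActive + '/' + name; // the new child becomes the active node
  FNodes := FNodes + ' ' + FActive;
  Result := Self;
end;

function TPathBuilder.AddSibling(const name: string): IPathBuilder;
begin
  if FActive = '' then raise Exception.Create('AddSibling: no active node');
  FActive := ParentPath(FActive) + '/' + name; // same parent, new active node
  FNodes := FNodes + ' ' + FActive;
  Result := Self;
end;

function TPathBuilder.Parent: IPathBuilder;
begin
  FActive := ParentPath(FActive); // climb one level
  Result := Self;
end;

function TPathBuilder.Here: IPathBuilder;
begin
  SetLength(FMarks, Length(FMarks) + 1);
  FMarks[High(FMarks)] := FActive; // push the active node
  Result := Self;
end;

function TPathBuilder.Back: IPathBuilder;
begin
  if Length(FMarks) = 0 then raise Exception.Create('Back without a matching Here');
  FActive := FMarks[High(FMarks)]; // pop into the active node
  SetLength(FMarks, Length(FMarks) - 1);
  Result := Self;
end;

function TPathBuilder.Nodes: string;
begin
  Result := Trim(FNodes);
end;

var
  builder: IPathBuilder;
begin
  builder := TPathBuilder.Create;
  WriteLn(builder
    .AddChild('definitions')
      .AddChild('portType')
      .Here
        .AddChild('operation')
          .AddChild('input')
          .AddSibling('output')
      .Back
      .AddSibling('binding')
    .Nodes);
end.
```

Running it should print /definitions /definitions/portType /definitions/portType/operation /definitions/portType/operation/input /definitions/portType/operation/output /definitions/binding on one line. Because Back restores portType as the active node, the following AddSibling places binding directly under definitions, mirroring the portType/binding part of the WSDL sample.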

What do you think? Does my approach make any sense?

34 comments:

  1. Anonymous 22:06

    Very nice idea... Maybe jQuery's .end() is better than .Back()

  2. 1) I like this approach very much!
    Almost like a mini DSL in Delphi :)

    The only thing that may need a bit of polishing are the Back, Here, and Parent methods, as it is not immediately apparent what Here or Back does. (Do they work as a kind of bookmark?)
    Maybe one could also use Up (to go up in the hierarchy) and Top for the master node?


    .AddChild('first')
    .AddChild('first-child').up //instead of parent
    .AddChild('second');


    2) What do you think about solving this with the XML binding Wizard (which would generate class templates based on an XML file)? The Fluent approach is probably more flexible...

    3) "[That process usually takes more time than the original approach but at least it is interesting ;) .]"
    Been there, done that, but I never regretted it:)

    4) End is a reserved word in Delphi, so it can't be used as a method name.

    Best regards,
    Ajasja

  3. Anonymous 23:13

    Oh sorry, I missed that in this comment window :)

    How common is it to return to some point without knowing the element? Maybe
    Done(ElementName: string):


    .AddChild('some')
      .AddChild('subchild')
        ['some', 'value 1']
      .Done // closes nearest element
      .AddChild('subchild')
        ['some', 'value 2']
        .AddChild('note')
          .Text('notes')
    .Done('some') // return to nearest 'some' element

  4. "The only thing that may need a bit of polishing are the Back, Here, and Parent"

    Yep, those are exactly the methods I'm not happy with.

    .Up instead of .Parent is a great idea and I'll implement it.

    "Do they work as a kind of bookmark?"

    .Here pushes 'current node' onto internal stack, .Back pops a value from that stack and assigns it to the 'current node'. .Push/.Pop were also considered (but I don't like them), as were .Mark/.Return. I'm not happy with any of those solutions :(

    The XML Binding Wizard is also a great idea. If I can find some time ...

  5. Anonymous 00:15

    Very beautiful code.

    In my humble opinion, I'd recommend taking a look at jQuery
    selectors & traversing, because they have a lot of
    experience with what may be needed.

    This could be the beginning of a serious DOM library for Delphi.

  6. Anonymous 03:03

    Now that is a beauty to behold.

  7. You may be interested in XDOM aka Open XML (opensource).

    Citation:
    "Open XML is a collection of XML and Unicode tools and components for the Delphi/Kylix™ programming language. All packages are freely available including source code."
    Tutorials and AddOns are available as well as 3rd party software.

    I've been using this library every now and then... for years ;)

    http://www.philo.de/xml/

  8. Anonymous 08:28

    Nice approach!

    Regarding functions Here and Back, what about using the Pascal notation of begin..end? Even though they are reserved words, you can use &begin and &end as valid identifiers (this syntax was introduced around D8, I think). Or maybe something like BeginBlock, EndBlock...

    ChAr

  9. @Anonymous: I've been planning to look into jQuery for quite some time. There's always the same problem - only 24 hours in a day :( Maybe this project will finally force me to find some time.

    @Nick: Thanks!

    @Lois: I know OpenXML, it's just that I've been working with OmniXML since its conception. Still, the 'fluent xml' source is very independent of the underlying XML implementation and could easily be reimplemented on top of OpenXML. You'll see today.

  10. Code looks very nice indeed. We're using something similar at work, it definitely beats the clumsy beast that is the DOM!

    The unbalanced nature of Here/Back might prove to be troublesome to the readability, as it kinda breaks the flow of the code. IMO better have a distinct (full-blown) bookmarking syntax (Bookmark('bookmarkname'), etc.) and keep the child/up and node navigation strictly stack-like (and indentation friendly).

    Also given the number of interface temporaries involved, it would probably be worth it to introduce a class-based wrapper (esp. for native Delphi).
    A wrapper makes it easier to introduce helpers (for typed attribute assignments, etc.) without overloading the interface definition, and the bookmarking business can be taken out of the interface as well, thus reducing the interface implementation requirements to a bare minimum. The resulting codegen is also cleaner.

  11. Using a class instead of interface would definitely help; I agree. I'll probably make this change.

    I don't understand the 'unbalanced nature of Here/Back' part. They are balanced. Each Here must be paired with a Back. I don't like the 'bookmark by name' idea too much (but that may just be me) - I think that then again you have to keep some information in mind which you don't really need.

  12. Looking forward to reading your part 2 ...
    Indeed, already part 1 looks good :)

  13. Regarding Here/Back - I'm leaning more and more towards .Mark/.Return. Whaddayathink?

  14. IMO Here/Back is ok. Mark/Return do not really embody the idea of a "sub level".
    Suggestion:
    "OpenContent/CloseContent" would comply with the idea of a container and also match with XML syntax paradigm (open/close a node/tag)

  15. As a fan of fluent interfaces in Delphi, I like it a lot and will blog about it ASAP. What I don't like is the navigational side (.Back, .Parent, .Here), but I honestly don't know if things can be done differently. I'll think about it, and if you want to move this to direct email, feel free to.

  16. > I don't understand the 'unbalanced nature of Here/Back' part.

    Sorry, I meant in an "indenting" kind of way. It makes it look like the indenting isn't balanced, as indentation "jumps" back several levels at once. It's just cosmetic :)

  17. .Up (ex .Parent) is necessary as there's no other way to climb up from a child. .Mark/.Return are a different story. They could usually be replaced with many .Up calls but the code is then ugly and less stable - if you add a new level (new subchild of the previously deepest child) you have to add another .Up to climb up.

    I was also considering this syntax:

    .AddChild('node1').Wrap(
      AddChild('child1')
      .AddSibling('child2')
        .AddSubchild('child2.1')
    )
    .AddSibling('node2')
    .AddSibling('node2')

    but there's a big implementation problem :( Firstly, I would have to duplicate AddChild as a global function (returning a new builder interface), and then I would have to implement .Wrap, which would take one XML document (its parameter) and duplicate it at the current position. That would all work - except when you'd use .Anchor to store the active node in a variable. During the node copying this reference would become invalid (i.e. it would point to an inactive and unused node). And OmniXML doesn't support moving IXMLNode entities from one DOM to another.

  18. One can always use

        .Up
      .Up
    .Up

    if one cares about cosmetic that much :)

  19. (In the previous comment, first .Up was indented 4 spaces, second 2 spaces and third not at all.)

  20. Victor 14:19

    For the given example, and other cases like it, where the structure of the XML to be generated is fixed, I would opt for another approach:
    - Create an XML template in a file or resource string, where all values that are variables look like "#baseName#", "#serviceName#", etc.
    - At run-time, use simple string replace commands to substitute the variables.

    If the XML structure is not quite fixed, because it is to receive repeating parts, you can still use this method. Just break up the XML template in 3 parts, for header, repeating body part and footer.

    In my opinion, this method is more readable and easier to maintain.

  21. I was thinking of Mark/Return as well. They convey the intention better than Here/Back (as you are in fact marking a position in the hierarchy).

    I was also thinking: How hard would it be to trick the units generated by the existing XML binding wizard to use OmniXML?

    Can't wait for the next post.

  22. @Victor: Of course, a perfectly valid approach. What I don't like about it is that the (syntactic) correctness of the generated XML is checked very late in the process. If I generate the XML programmatically, I may get the semantics wrong, but at least the syntax will always be correct.

  23. @ajasja: I have no experience with the XML binding wizard so I cannot comment on that.

  24. Ritsaert Hornstra 08:00

    After thinking about it, it seems to me this approach makes it more difficult to debug your code when things go wrong, because you need to step into every call and you cannot use F8, which would "do" the whole statement in one go.

  25. A valid point against this approach, I do agree.

    However, the idea behind fluent interfaces is that you don't have to debug the code because a) hidden code is well written and bug free (yeah, we all wish ;) ) and b) the intent and operation of the top-level code is obvious at the first sight.

    Still, for debugging purposes one could expand the interface with the .Breakpoint method that would only do

    function IGpFluentXmlBuilder.Breakpoint: IGpFluentXmlBuilder;
    begin
      asm int 3 end;
      Result := Self;
    end;

    When you'd execute this method, the debugger would pop up.

  26. Does this work in Delphi 7?

  27. I believe it should. I don't have D7 installed to try it, though.

  28. My D7 is choking on the "strict private" declaration.

  29. Just change 'strict private' to 'private' and 'strict protected' to 'protected'.

  30. That does it. Thanks!

  31. If I recall correctly, one of the design principles of fluent interfaces is that method calls don't change the context. Therefore I'd argue that AddChild() should return the instance it was called on, not the instance it created and appended. This would also allow you to get rid of the whole navigational aspect; but it would require a bit of nesting of calls:

    XmlTree.AddChild(CreateChild('root_element')
    .AddChild('first_level_child_1')
    .AddChild(CreateChild('first_level_child_2').AddChild('second_level_child_1'))
    .AddChild('first_level_child_3'))

    etc...

    But I doubt that this would be an improvement over the with..do construct available in Delphi.

  32. Actually, AddChild _does_ return the instance it was called on. It's just that it also changes the internal state of the XML builder.

    AddChild is just
    fxbActiveNode := AppendNode(ActiveNode, name);
    Result := Self;

  33. Oh, I see it now. You're using a single master builder object with an internal DOM tree. AddChild/AddSibling/etc method names made it seem to me like they were returning individual DOM nodes, not the master object. Sorry about the confusion.

    But still, is the fluent interface much different from using with..do? It's been a couple of years since I last used Delphi, so I'm not sure about it.

    Actually, I discovered your blog while researching exactly how the fluent interface is better than with..do, which I remember using years ago, so I'd really appreciate the opinion of someone like you, who has experience with both. :)

  34. For starters, I don't even know how I would recode this using 'with' statement. It would definitely be much uglier and harder to understand.
