Saturday, October 14, 2017

Writing a Simple DSL Compiler with Delphi [6. AST Dumper]

This article provides a description of a testing tool used for my toy language project. If you are new to this series, I would recommend to start reading with this post

Please note that this article describes an initial implementation of the parser. If you want to browse the code while reading the article, make sure that you have switched to branch dsl_v1.

Now that we have a working tokenizer and parser outputting an AST, we can start working on the compiler. Still, it would be great if we can verify whether the parser output (the AST) makes any sense. In other words, we need unit tests.

Writing unit tests for a tree structure is, however, a very tedious operation. In the next post I'll show a test for a tree with only five nodes and it will already be a process one would rather skip. Luckily, we can do something more fun - we can write a code that recreate the original program from an AST.

To do that, we'll write a code generator which will not generate a code, but a source. We can do this by simply provide a new codegen factory to the flexible compiler framework.

sl := TStringList.Create;
try
  compiler := CreateSimpleDSLCompiler;
  compiler.CodegenFactory :=
   
function: ISimpleDSLCodegen
   
begin
     
Result := CreateSimpleDSLCodegenDump(sl);
   
end;
  compiler.Compile(CMultiProcCode);
  Writeln(sl.Text);
finally FreeAndNil(sl); end;

The "codegen" will store the resulting program into string list sl, which we can pass to an anonymous method (CodegenFactory) due to a helpful variable capturing (yay! for the variable capturing!).

Our code generator - TSimpleDSLCodegenDump - must implement interface ISimpleDSLCodegen containing one function - Generate. This function takes an AST and returns something that can be executed - an ISimpleDSLProgram. As our code doesn't know how to create a runnable program, it will always return nil in that parameter.

type
  ISimpleDSLCodegen = interface ['{C359C174-E324-4709-86EF-EE61AFE3B1FD}']
    function Generate(const ast: ISimpleDSLAST; var runnable: ISimpleDSLProgram): boolean;
  end;

A useful data will be stored in the string list parameter that was passed to the constructor.

constructor TSimpleDSLCodegenDump.Create(dump: TStringList);
begin
  inherited Create;
  FDump := dump;
end;

The main "code generation" function stores the AST reference into a field so that we can access it from all methods and then recreates the source function by function by calling DumpFunction.

function TSimpleDSLCodegenDump.Generate(const ast: ISimpleDSLAST;
  var runnable: ISimpleDSLProgram): boolean;
var
  i: integer;
begin
  FErrors := false;
  FAST := ast;
  for i := 0 to ast.Functions.Count - 1 do begin
    if i > 0 then begin
      WritelnText;
      WritelnText;
    end;
    DumpFunction(ast.Functions[i]);
  end;
  FDump.Text := FText;
  Result := not FErrors;
end;

Program source is collected in a field FText: string. Two WritelnText helpers are used to append data to that field.

procedure TSimpleDSLCodegenDump.WritelnText(const s: string);
begin
  WriteText(s);
  WriteText(#13#10);
end;
procedure TSimpleDSLCodegenDump.WriteText(const s: string);
begin
  FText := FText + s;
end;

I will show just two parts of this "code generator" as it is quite dull. As a first example, this function writes out a source code for one function.

procedure TSimpleDSLCodegenDump.DumpFunction(const func: TASTFunction);
begin
  FCurrentFunc := func;
  WriteText(Format('%s(%s) ',                    [func.Name, ''.Join(',', func.ParamNames.ToArray)]));
  DumpBlock('', func.Body);
  FCurrentFunc := nil;
end;

A reference to the current function is stored away as we need it in a method that writes out a variable name (not shown here). Then we write out a function name, followed by a parameter list. A Join string helper concatenates the parameters and delimits them with commas. At the end we call DumpBlock to write out the function's body.

As a second example, method DumpExpression writes out an expression. It calls DumpTerm to write out the fist term, writes the operator (if there is any) and writes out the second term.

procedure TSimpleDSLCodegenDump.DumpExpression(const expr: TASTExpression);
begin
  DumpTerm(expr.Term1);
  case expr.BinaryOp of
    opNone:        Exit;
    opAdd:         WriteText(' + ');
    opSubtract:    WriteText(' - ');
    opCompareLess: WriteText(' < ');
    else begin
      WritelnText('*** Unexpected operator');
      FErrors := true;
    end;
  end;
  DumpTerm(expr.Term2);
end 

If we run this "code generator" on a sample program calculating i-th Fibonacci number ...

'fib(i) {                          '#13#10 +
'  if i < 3 {                      '#13#10 +
'    return 1                      '#13#10 +
'  } else {                        '#13#10 +
'    return fib(i-2) + fib(i-1)    '#13#10 +
'  }                               '#13#10 +
'}                                 ' 

... we'll see that the result is fairly close to the original.

fib(i) {
  if i < 3  {
    return 1
  }  else  {
    return fib(i - 2) + fib(i - 1)
  }
}

A spacing is a bit different, but that doesn't matter as my toy language ignores white space. This gives us a confidence that the AST is indeed correctly generated (at least in some cases). To be more convinced about a proper functioning of a parser, we would have to write more tests ... or we can simply play with the compiler framework, write more programs with it and check if they are functioning correctly. To do that, however, we need a compiler.

No comments:

Post a Comment