Sunday, November 05, 2017

Writing a Simple DSL Compiler with Delphi [7. AST Compiler]

This article provides a description of an AST compiler used for my toy language project. If you are new to this series, I would recommend to start reading with this post. At least you should read the previous post, Intermezzo, as it explains some parts of the compiler that I won't touch here.

Please note that this article describes an initial implementation of the compiler. If you want to browse the code while reading the article, make sure that you have switched to branch dsl_v1.

In my toy compiler framework, a compiler (or codegen as it is called internally), is a piece of code that implements the ISimpleDSLCodegen interface. This interface exposes only one function, Generate, which takes an abstract syntax tree and converts it into an object implementing an ISimpleDSLProgram interface which allows you to call any function in a compiled program by name.

type
  TParameters = TArray;
  TFunctionCall = reference to function (const parameters: TParameters): integer;
  ISimpleDSLProgram = interface ['{2B93BEE7-EF20-41F4-B599-4C28131D6655}']
    function  Call(const functionName: string; const params: TParameters;       var return: integer): boolean;
   end;

  ISimpleDSLCodegen = interface ['{C359C174-E324-4709-86EF-EE61AFE3B1FD}']
    function Generate(const ast: ISimpleDSLAST;      
      var
runnable: ISimpleDSLProgram): boolean;
  end;


Default compiler is implemented by the TSimpleDSLCodegen class in unit SimpleDSLCompiler.Compiler. The methods in this class mostly deal with reading and understanding the AST while the actual code is created by methods in unit SimpleDSLCompiler.Compiler.Codegen.

This compiler creates a program which is an instance of the TSimpleDSLProgram class (also stored in SimpleDSLCompiler.Compiler).

The functioning of the compiler is very similar to the compiler presented in Intermezzo - with one critical difference. Expressions in my toy language can use function parameters as terms. Because of that, expression evaluator has to have access to the parameters of the current function.

The story starts in TSimpleDSLCodegen.Generate which for each function in the tree firstly compiles the function body (CompileBlock) and secondly generates the function wrapper for that body (CodegenFunction).

function TSimpleDSLCodegen.Generate(const ast: ISimpleDSLAST; var runnable:
  ISimpleDSLProgram): boolean;
var
  block      : TStatement;
  i          : integer;
  runnableInt: ISimpleDSLProgramEx;
begin
  Result := false; //to keep compiler happy
  FAST := ast;
  runnable := TSimpleDSLProgram.Create;
  runnableInt := runnable as ISimpleDSLProgramEx;
  for i := 0 to ast.Functions.Count - 1 do begin
    if not CompileBlock(ast.Functions[i].Body, block) then
      Exit;
    runnableInt.DeclareFunction(i, ast.Functions[i].Name,       CodegenFunction(block));
  end;
  Result := true;
end; 

Let's start with the latter as this function will give us more of a context (pun intended, as you'll soon see).

type
  PExecContext = ^TExecContext;
   TExecContext = record
    Functions: TArray;
  end;

  TParameters
= TArray;
function CodegenFunction(const block: TStatement): TFunction;
begin
  Result :=
    function (execContext: PExecContext; const params: TParameters): integer
    var
      context: TContext;
    begin
      context.Exec := execContext;
      context.Params := params;
      context.Result := 0;
      block(context);
      Result := context.Result;
    end;
end;

CodegenFunction creates  a main wrapper anonymous method for that function. This anonymous method will receive an execution context which will allow this function to call other functions. Anonymous method then builds function context which roughly corresponds to a stack frame produced by a "normal" compiler. This context stores a pointer to the execution context and a copy of parameters (values) passed to the function. Then it calls block(context) to execute the passed-in block.

Moving one level down ... Function TSimpleDSLCodegen.CompileBlock compiles each statement in the block by calling CompileStatement and then calls CodegenBlock to wrap compiled statements in a block.

function CodegenBlock(const statements: TStatements): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    var
      stmt: TStatement;
    begin
      for stmt in statements do
        stmt(context);
    end;
end;

A (compiled) code that implements a block is again an anonymous method. This one receives the context (passed in by the anonymous method implementing the function) and simply passes the same context to all statements in the block.

This continues on and on. Most of the code is pretty dull and predictable. For example, this is the method which generates code for an if statement.

function CodegenIfStatement(const condition: TExpression; const thenBlock,
  elseBlock: TStatement): TStatement;
begin
  Result :=
    procedure (var context: TContext)
    begin
      if condition(context) <> 0 then
        thenBlock(context)
      else
        elseBlock(context);
    end;
end;

Things get interesting once we want to compile a term. A term can represent either an (integer) constant, a parameter (called variable in the codegen as in some future variables may get supported) or a function call.

function TSimpleDSLCodegen.CompileTerm(const astTerm: IASTTerm;   var codeTerm: TExpression): boolean;
var
  termConst   : IASTTermConstant;
  termFuncCall: IASTTermFunctionCall;
  termVar     : IASTTermVariable;
begin
  Result := true;
  if Supports(astTerm, IASTTermConstant, termConst) then
    codeTerm := CodegenConstant(termConst.Value)
  else if Supports(astTerm, IASTTermVariable, termVar) then
    codeTerm := CodegenVariable(termVar.VariableIdx)
  else if Supports(astTerm, IASTTermFunctionCall, termFuncCall) then
    Result := CompileFunctionCall(termFuncCall, codeTerm)
  else
    Result := SetError('*** Unexpected term');
end;

Compiling a constant is trivial. We just need a function returning this contant.

function CodegenConstant(value: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := value;
    end;
end;

Accessing a parameter is just a bit harder. AST contains the index of this parameter and we just have to apply it to the Parameters part of the context.

function CodegenVariable(varIndex: integer): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    begin
      Result := context.Params[varIndex];
    end;
end;

Generating a function call is a bit more complicated. It must set up array of parameters that will be passed to the function call, find the correct function via the execution context and then call that function.

function CodegenFunctionCall(funcIndex: integer;   const params: TFuncCallParams): TExpression;
begin
  Result :=
    function (var context: TContext): integer
    var
      funcParams: TParameters;
      iParam    : Integer;
    begin
      SetLength(funcParams, Length(params));
      for iParam := Low(params) to High(params) do
        funcParams[iParam] := params[iParam](context);
      Result := context.Exec.Functions[funcIndex](context.Exec, funcParams);
    end;
end;

In the end, compiler produces a big anonymous method which uses another anonymous methods internally and which, when called, returns a result.

For example, this minimal program ...

inc(i) { return i+1 }

... generates something like the following monstrosity. In reality the code is even weirder as it has to handle captured variables.

function (execContext: PExecContext; const params: TParameters): integer
var
  context: TContext;
begin
  context.Exec := execContext;
  context.Params := params;
  context.Result := 0;
 (procedure (var context: TContext)
  var
    stmt: TStatement;
  begin
    for stmt in [
                procedure (var context: TContext)
                begin
                  context.Result :=
                   (function (var context: TContext): integer
                    begin
                      Result :=
                       (function (var context: TContext): integer
                        begin
                          Result := context.Params[0];
                        end)(context)
                        +
                       (function (var context: TContext): integer
                        begin
                          Result := 1;
                        end)(context);
                    end)(context);
                end
                ]
    do
      stmt(context);
  end)(context);
  Result := context.Result;
end;

It certainly isn't appropriate for the faint of heart but - hey! - you don't have to look into the compiled code (unless you are debugging the compiler, of course).

You may wonder about the speed of such code. Not very fast, I must admit. I'll give you more specific numbers in the next instalment in this series which will describe an interpreter for this language.

No comments:

Post a Comment