The Delphi Geek: Writing a Simple DSL Compiler with Delphi [7. AST Compiler]

This article provides a description of an AST compiler used for my toy language project. If you are new to this series, I would recommend to start reading with this post. At least you should read the previous post, Intermezzo, as it explains some parts of the compiler that I won't touch here.

Please note that this article describes an initial implementation of the compiler. If you want to browse the code while reading the article, make sure that you have switched to branch dsl_v1.

In my toy compiler framework, a compiler (or codegen as it is called internally), is a piece of code that implements the ISimpleDSLCodegen interface. This interface exposes only one function, Generate, which takes an abstract syntax tree and converts it into an object implementing an ISimpleDSLProgram interface which allows you to call any function in a compiled program by name.

type

TParameters = TArray;

TFunctionCall = reference to function (const parameters: TParameters): integer;

ISimpleDSLProgram = interface ['{2B93BEE7-EF20-41F4-B599-4C28131D6655}']

function Call(const functionName: string; const params: TParameters; var return: integer): boolean;

end;

ISimpleDSLCodegen = interface ['{C359C174-E324-4709-86EF-EE61AFE3B1FD}']

function Generate(const ast: ISimpleDSLAST;
var runnable: ISimpleDSLProgram): boolean;

end;

Default compiler is implemented by the TSimpleDSLCodegen class in unit SimpleDSLCompiler.Compiler. The methods in this class mostly deal with reading and understanding the AST while the actual code is created by methods in unit SimpleDSLCompiler.Compiler.Codegen.

This compiler creates a program which is an instance of the TSimpleDSLProgram class (also stored in SimpleDSLCompiler.Compiler).

The functioning of the compiler is very similar to the compiler presented in Intermezzo - with one critical difference. Expressions in my toy language can use function parameters as terms. Because of that, expression evaluator has to have access to the parameters of the current function.

The story starts in TSimpleDSLCodegen.Generate which for each function in the tree firstly compiles the function body (CompileBlock) and secondly generates the function wrapper for that body (CodegenFunction).

function TSimpleDSLCodegen.Generate(const ast: ISimpleDSLAST; var runnable:

ISimpleDSLProgram): boolean;

var

block : TStatement;

i : integer;

runnableInt: ISimpleDSLProgramEx;

begin

Result := false; //to keep compiler happy

FAST := ast;

runnable := TSimpleDSLProgram.Create;

runnableInt := runnable as ISimpleDSLProgramEx;

for i := 0 to ast.Functions.Count - 1 do begin

if not CompileBlock(ast.Functions[i].Body, block) then

Exit;

runnableInt.DeclareFunction(i, ast.Functions[i].Name, CodegenFunction(block));

end;

Result := true;

end;

Let's start with the latter as this function will give us more of a context (pun intended, as you'll soon see).

type

PExecContext = ^TExecContext;

TExecContext = record

Functions: TArray;

end;

TParameters = TArray;

function CodegenFunction(const block: TStatement): TFunction;

begin

Result :=

function (execContext: PExecContext; const params: TParameters): integer

var

context: TContext;

begin

context.Exec := execContext;

context.Params := params;

context.Result := 0;

block(context);

Result := context.Result;

end;

CodegenFunction creates a main wrapper anonymous method for that function. This anonymous method will receive an execution context which will allow this function to call other functions. Anonymous method then builds function context which roughly corresponds to a stack frame produced by a "normal" compiler. This context stores a pointer to the execution context and a copy of parameters (values) passed to the function. Then it calls block(context) to execute the passed-in block.

Moving one level down ... Function TSimpleDSLCodegen.CompileBlock compiles each statement in the block by calling CompileStatement and then calls CodegenBlock to wrap compiled statements in a block.

function CodegenBlock(const statements: TStatements): TStatement;

begin

Result :=

procedure (var context: TContext)

var

stmt: TStatement;

begin

for stmt in statements do

stmt(context);

end;

A (compiled) code that implements a block is again an anonymous method. This one receives the context (passed in by the anonymous method implementing the function) and simply passes the same context to all statements in the block.

This continues on and on. Most of the code is pretty dull and predictable. For example, this is the method which generates code for an if statement.

function CodegenIfStatement(const condition: TExpression; const thenBlock,

elseBlock: TStatement): TStatement;

begin

Result :=

procedure (var context: TContext)

begin

if condition(context) <> 0 then

thenBlock(context)

else

elseBlock(context);

end;

Things get interesting once we want to compile a term. A term can represent either an (integer) constant, a parameter (called variable in the codegen as in some future variables may get supported) or a function call.

function TSimpleDSLCodegen.CompileTerm(const astTerm: IASTTerm; var codeTerm: TExpression): boolean;

var

termConst : IASTTermConstant;

termFuncCall: IASTTermFunctionCall;

termVar : IASTTermVariable;

begin

Result := true;

if Supports(astTerm, IASTTermConstant, termConst) then

codeTerm := CodegenConstant(termConst.Value)

else if Supports(astTerm, IASTTermVariable, termVar) then

codeTerm := CodegenVariable(termVar.VariableIdx)

else if Supports(astTerm, IASTTermFunctionCall, termFuncCall) then

Result := CompileFunctionCall(termFuncCall, codeTerm)

else

Result := SetError('*** Unexpected term');

end;

Compiling a constant is trivial. We just need a function returning this contant.

function CodegenConstant(value: integer): TExpression;

begin

Result :=

function (var context: TContext): integer

begin

Result := value;

end;

Accessing a parameter is just a bit harder. AST contains the index of this parameter and we just have to apply it to the Parameters part of the context.

function CodegenVariable(varIndex: integer): TExpression;

begin

Result :=

function (var context: TContext): integer

begin

Result := context.Params[varIndex];

end;

Generating a function call is a bit more complicated. It must set up array of parameters that will be passed to the function call, find the correct function via the execution context and then call that function.

function CodegenFunctionCall(funcIndex: integer; const params: TFuncCallParams): TExpression;

begin

Result :=

function (var context: TContext): integer

var

funcParams: TParameters;

iParam : Integer;

begin

SetLength(funcParams, Length(params));

for iParam := Low(params) to High(params) do

funcParams[iParam] := params[iParam](context);

Result := context.Exec.Functions[funcIndex](context.Exec, funcParams);

end;

In the end, compiler produces a big anonymous method which uses another anonymous methods internally and which, when called, returns a result.

For example, this minimal program ...

inc(i) { return i+1 }

... generates something like the following monstrosity. In reality the code is even weirder as it has to handle captured variables.

function (execContext: PExecContext; const params: TParameters): integer

var

context: TContext;

begin

context.Exec := execContext;

context.Params := params;

context.Result := 0;

(procedure (var context: TContext)

var

stmt: TStatement;

begin

for stmt in [

procedure (var context: TContext)

begin

context.Result :=

(function (var context: TContext): integer

begin

Result :=

(function (var context: TContext): integer

begin

Result := context.Params[0];

end)(context)

(function (var context: TContext): integer

begin

Result := 1;

end)(context);

end

]

stmt(context);

end)(context);

Result := context.Result;

end;

It certainly isn't appropriate for the faint of heart but - hey! - you don't have to look into the compiled code (unless you are debugging the compiler, of course).

You may wonder about the speed of such code. Not very fast, I must admit. I'll give you more specific numbers in the next instalment in this series which will describe an interpreter for this language.

The Delphi Geek

Sunday, November 05, 2017

Writing a Simple DSL Compiler with Delphi [7. AST Compiler]

No comments:

Post a Comment