Please note that this article describes an initial implementation of the compiler. If you want to browse the code while reading the article, make sure that you have switched to branch dsl_v1.
In my toy compiler framework, a compiler (or codegen as it is called internally), is a piece of code that implements the ISimpleDSLCodegen interface. This interface exposes only one function, Generate, which takes an abstract syntax tree and converts it into an object implementing an ISimpleDSLProgram interface which allows you to call any function in a compiled program by name.
type
  TParameters 
= 
TArray; 
  
TFunctionCall 
= 
reference 
to 
function 
(const
parameters:
TParameters):
integer;
  
ISimpleDSLProgram 
= 
interface 
['{2B93BEE7-EF20-41F4-B599-4C28131D6655}']
    
function 
Call(const 
functionName: 
string; 
const 
params:
TParameters;
      var
return:
integer):
boolean;
  
end;
  
ISimpleDSLCodegen 
= 
interface 
['{C359C174-E324-4709-86EF-EE61AFE3B1FD}']
    
function 
Generate(const
ast:
ISimpleDSLAST;
      
var runnable: ISimpleDSLProgram): boolean;
var runnable: ISimpleDSLProgram): boolean;
  
end;
Default compiler is implemented by the TSimpleDSLCodegen class in unit SimpleDSLCompiler.Compiler. The methods in this class mostly deal with reading and understanding the AST while the actual code is created by methods in unit SimpleDSLCompiler.Compiler.Codegen.
This compiler creates a program which is an instance of the TSimpleDSLProgram class (also stored in SimpleDSLCompiler.Compiler).
The functioning of the compiler is very similar to the compiler presented in Intermezzo - with one critical difference. Expressions in my toy language can use function parameters as terms. Because of that, expression evaluator has to have access to the parameters of the current function.
The story starts in TSimpleDSLCodegen.Generate which for each function in the tree firstly compiles the function body (CompileBlock) and secondly generates the function wrapper for that body (CodegenFunction).
function
TSimpleDSLCodegen.Generate(const
ast:
ISimpleDSLAST;
var
runnable:
  
ISimpleDSLProgram): 
boolean;
var
  
block     
:
TStatement;
  
i         
:
integer;
  
runnableInt: 
ISimpleDSLProgramEx;
begin
  
Result 
:= 
false; 
//to keep compiler happy
  
FAST 
:= 
ast;
  
runnable 
:= 
TSimpleDSLProgram.Create;
  
runnableInt 
:= 
runnable 
as 
ISimpleDSLProgramEx;
  
for 
i 
:= 
0 
to 
ast.Functions.Count 
- 
1 
do 
begin
    
if 
not 
CompileBlock(ast.Functions[i].Body,
block)
then
      
Exit;
    
runnableInt.DeclareFunction(i,
ast.Functions[i].Name,
      CodegenFunction(block));
  
end;
  
Result 
:= 
true;
end; 
type
  
PExecContext 
= 
^TExecContext;
  
TExecContext 
= 
record
    
Functions: 
TArray; 
  
end;
TParameters = TArray
function
CodegenFunction(const
block:
TStatement):
TFunction;
begin
  
Result 
:=
    
function 
(execContext:
PExecContext;
const
params:
TParameters):
integer
    
var
      
context: 
TContext;
    
begin
      
context.Exec
:=
execContext;
      
context.Params
:=
params;
      
context.Result
:=
0;
      
block(context);
      
Result 
:= 
context.Result;
    
end;
end;
Moving one level down ... Function TSimpleDSLCodegen.CompileBlock compiles each statement in the block by calling CompileStatement and then calls CodegenBlock to wrap compiled statements in a block.
function
CodegenBlock(const
statements:
TStatements):
TStatement;
begin
  
Result 
:=
    
procedure 
(var
context:
TContext)
    
var
      
stmt: 
TStatement;
    
begin
      
for 
stmt 
in 
statements 
do
        
stmt(context);
    
end;
end;
This continues on and on. Most of the code is pretty dull and predictable. For example, this is the method which generates code for an if statement.
function
CodegenIfStatement(const
condition:
TExpression;
const
thenBlock,
  
elseBlock: 
TStatement): 
TStatement;
begin
  
Result 
:=
    
procedure 
(var
context:
TContext)
    
begin
      
if 
condition(context)
<>
0
then
        
thenBlock(context)
      
else
        
elseBlock(context);
    
end;
end;
Things get interesting once we want to compile a term. A term can represent either an (integer) constant, a parameter (called variable in the codegen as in some future variables may get supported) or a function call.
function
TSimpleDSLCodegen.CompileTerm(const
astTerm:
IASTTerm;
  var
codeTerm: TExpression): 
boolean;
var
  
termConst   
:
IASTTermConstant;
  
termFuncCall: 
IASTTermFunctionCall;
  
termVar     
:
IASTTermVariable;
begin
  
Result 
:= 
true;
  
if 
Supports(astTerm, 
IASTTermConstant, 
termConst) 
then
    
codeTerm 
:= 
CodegenConstant(termConst.Value)
  
else 
if 
Supports(astTerm, 
IASTTermVariable, 
termVar) 
then
    
codeTerm 
:= 
CodegenVariable(termVar.VariableIdx)
  
else 
if 
Supports(astTerm, 
IASTTermFunctionCall, 
termFuncCall)
then
    
Result 
:= 
CompileFunctionCall(termFuncCall,
codeTerm)
  
else
    
Result 
:= 
SetError('*** 
Unexpected term');
end;
function
CodegenConstant(value:
integer):
TExpression;
begin
  
Result 
:=
    
function 
(var
context:
TContext):
integer
    
begin
      
Result 
:= 
value;
    
end;
end;
function
CodegenVariable(varIndex:
integer):
TExpression;
begin
  
Result 
:=
    
function 
(var
context:
TContext):
integer
    
begin
      
Result 
:= 
context.Params[varIndex];
    
end;
end;
function
CodegenFunctionCall(funcIndex:
integer;
  const
params:
TFuncCallParams): TExpression;
begin
  
Result 
:=
    
function 
(var
context:
TContext):
integer
    
var
      
funcParams:
TParameters;
      
iParam   
: 
Integer;
    
begin
      
SetLength(funcParams,
Length(params));
      
for 
iParam 
:= 
Low(params)
to
High(params)
do
        
funcParams[iParam]
:=
params[iParam](context);
      
Result 
:= 
context.Exec.Functions[funcIndex](context.Exec,
funcParams);
    
end;
end;
For example, this minimal program ...
inc(i) { return i+1 }
... generates something like the following monstrosity. In reality the code is even weirder as it has to handle captured variables.
function
(execContext:
PExecContext;
const
params:
TParameters):
integer
var
  
context: 
TContext;
begin
  
context.Exec 
:= 
execContext;
  
context.Params 
:= 
params;
  
context.Result 
:= 
0;
 (procedure 
(var
context:
TContext)
  
var
    
stmt: 
TStatement;
  
begin
    
for 
stmt 
in 
[
               
procedure
(var
context:
TContext)
               
begin
                 
context.Result
:=  
                  
(function
(var
context:
TContext):
integer
                   
begin
                     
Result
:=  
                      
(function
(var
context:
TContext):
integer
                       
begin
                         
Result
:=
context.Params[0];
                       
end)(context)  
                       
+  
                      
(function
(var
context:
TContext):
integer
                       
begin
                         
Result
:=
1;
                       
end)(context);
                   
end)(context);
               
end
               
] 
    
do
      
stmt(context);
  
end)(context);
  
Result 
:= 
context.Result;
end;
It certainly isn't appropriate for the faint of heart but - hey! - you don't have to look into the compiled code (unless you are debugging the compiler, of course).
You may wonder about the speed of such code. Not very fast, I must admit. I'll give you more specific numbers in the next instalment in this series which will describe an interpreter for this language.
No comments:
Post a Comment