The result of all that is a program introduction.dpr, a self-contained console program which contains a complete (almost trivial) language together with the full documentation, written in a Literate Programming style. Simply put - you can read it from top to bottom like a story.
As an intermezzo and to simplify my explanation of the compiler, I'm posting the whole program here, reformatted as a blog post.
This program presents a gentle introduction into the 'compiler-compiler' topic. It is written in a Literate Programming manner and is intended to be read as a story from top to bottom.
program
introduction;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  
System.SysUtils,
  
System.Classes,
  
System.Character,
  
System.Generics.Collections;
number1 + number2 + ... + numberN
All numbers are non-negative integers, the only operator is addition, overflows are ignored.
Formally, we can describe our programs with the following grammar.
S → Term
Term → number
Term → Term '+' Term
White space will be ignored by the parser and is therefore not part of the grammar.
We will start with a very simple AST that will store the parsed version of the program.
type
  
TTerm 
= 
class 
abstract
  
end;
  
TAST 
= 
TTerm;
A constant, as it could be expected, contains an integer value.
We are not consistent here - the language only allows positive integers but AST is more open and allows negative integers. We'll just ignore that.
  TConstant 
= 
class(TTerm)
  strict private
    
FValue: 
integer;
  
public
    
constructor
Create(AValue:
integer);
    
property 
Value:
integer
read
FValue
write
FValue;
  
end;An addition is a binary operation which operates on two terms (left and right side).
  TAddition 
= 
class(TTerm)
  strict private
    
FTerm1: 
TTerm;
    
FTerm2: 
TTerm;
  
public
    
constructor
Create(ATerm1,
ATerm2:
TTerm);
    
destructor 
Destroy; 
override;
    
property 
Term1:
TTerm
read
FTerm1
write
FTerm1;
    
property 
Term2:
TTerm
read
FTerm2
write
FTerm2;
  
end;
constructor
TConstant.Create(AValue:
integer);
begin
  
inherited 
Create;
  
FValue 
:= 
AValue;
end;
constructor
TAddition.Create(ATerm1,
ATerm2:
TTerm);
begin
  
inherited 
Create;
  
FTerm1 
:= 
ATerm1;
  
FTerm2 
:= 
ATerm2;
end;A TAddition object takes ownership of its children.
destructor
TAddition.Destroy;
begin
  
FreeAndNil(FTerm1);
  
FreeAndNil(FTerm2);
  
inherited;
end;
function
CreateAST(const
values:
TArray): 
TAST;
var
  
iValue: 
integer;
begin
  
if 
Length(values) 
= 
0 
then
    
Exit(nil);
  Result 
:= 
TConstant.Create(values[High(values)]);
  
for 
iValue 
:= 
High(values) 
- 
1 
downto 
Low(values) 
do
    
Result 
:= 
TAddition.Create(TConstant.Create(values[iValue]),
Result);
end;
TAddition
Term1 = TConstant
Value = 1
Term2 = TAddition
Term1 = TConstant
Value = 2
Term2 = TConstant
Value = 3
Let's make this into a test.
First, some helpers which test and cast at the same time.
function
IsConstant(term:
TTerm;
out
add:
TConstant):
boolean;
begin
  
Result 
:= 
term 
is 
TConstant;
  
if 
Result 
then
    
add 
:= 
TConstant(term);
end;
function
IsAddition(term:
TTerm;
out
add:
TAddition):
boolean;
begin
  
Result 
:= 
term 
is 
TAddition;
  
if 
Result 
then
    
add 
:= 
TAddition(term);
end;
procedure
TestCreateAST;
var
  
add1  
:
TAddition;
  
add2  
:
TAddition;
  
ast   
:
TAST;
  
const1: 
TConstant;
  
const2: 
TConstant;
  
const3: 
TConstant;
begin
  
ast 
:= 
CreateAST([1, 
2, 
3]);
  
try
    
if 
assigned(ast)
       
and 
IsAddition(ast,
add1)
       
and 
IsConstant(add1.Term1,
const1)
and
(const1.Value
=
1)
       
and 
IsAddition(add1.Term2,
add2)
       
and 
IsConstant(add2.Term1,
const2)
and
(const2.Value
=
2)
       
and 
IsConstant(add2.Term2,
const3)
and
(const3.Value
=
3)
    
then
      
// everything is fine
    
else
      
raise 
Exception.Create('CreateAST 
is not working correctly!');
  
finally 
FreeAndNil(ast); 
end;
end;
Our 'language' has only two tokens: a 'number' and an 'addition'. Whitespace is not important and will be ignored in the tokenizer (lexer). All unrecognized characters are returned as a token 'unknown'.
type
  
TTokenKind 
= 
(tkNumber, 
tkAddition, 
tkUnknown);
tkNumber accepts a \d+
tkAddition accepts \+
\s+ is skipped
tkUnknown accepts anything else: [^\d\+\s]
Tokenizer and parser only need the following information:
- Input string.
 - Current position.
 
A `TStringStream` class wraps all that so we'll just reuse it.
  TParserState 
= 
TStringStream;
This implementation is very simple but also extremely unoptimized.
function
GetToken(state:
TParserState;
var
token:
TTokenKind;
  var
value:
string):
boolean;
var
  
nextChar: 
string;
  
position: 
int64;
begin
  
repeat
    
nextChar 
:= 
state.ReadString(1);
    
Result 
:= 
(nextChar
<>
'');
    
// Ignore whitespace
  
until 
(not 
Result) 
or 
(not 
nextChar[1].IsWhiteSpace);
  
if 
Result 
then 
begin
    
value 
:= 
nextChar[1];
    
// Addition
    
if 
value 
= 
'+' 
then
      
token 
:= 
tkAddition
    
// Number
    
else 
if 
value[1].IsNumber
then
begin
      
token 
:= 
tkNumber;
      
repeat
        
position 
:= 
state.Position;
        
nextChar 
:= 
state.ReadString(1);
        
// End of stream, stop
        
if 
nextChar 
= 
'' 
then
         
break 
//repeat
        
// Another number, append
        
else 
if 
nextChar[1].IsNumber
then
         
value 
:= 
value 
+ 
nextChar[1]
        
// Read too far, retract
        
else 
begin
         
state.Position
:=
position;
         
break; 
//repeat
        
end;
      
until 
false;
    
end
    
// Unexpected input
    
else
      
token 
:= 
tkUnknown;
  
end;
end;
ExpectFail(state) calls GetToken and expects it to return False
procedure
ExpectFail(state:
TParserState);
var
  
token: 
TTokenKind;
  
value: 
string;
begin
  
if 
GetToken(state, 
token, 
value) 
then
    
raise 
Exception.Create('ExpectFail 
failed');
end;
Expect(State, token, value) calls GetNextToken and expects it to return True and the same token/value as passed in the parameters.
procedure
Expect(state:
TParserState;
expectedToken:
TTokenKind;
  expectedValue:
string);
var
  
token: 
TTokenKind;
  
value: 
string;
begin
  
if 
not 
GetToken(state, 
token, 
value) 
then
    
raise 
Exception.Create('Expect 
failed')
  
else 
if 
token 
<> 
expectedToken 
then
    
raise 
Exception.CreateFmt(            'Expect 
encountered invalid token kind (%d, expected %d)',
            [Ord(token),
Ord(expectedToken)])
  
else 
if 
value 
<> 
expectedValue 
then
    
raise 
Exception.CreateFmt(            'Expect 
encountered invalid value (%s, expected %s)',
            [value,
expectedValue])
end;
procedure
TestGetToken;
var
  
state: 
TParserState;
begin
  
state 
:= 
TParserState.Create('');
  
ExpectFail(state);
  
FreeAndNil(state);
  
state 
:= 
TParserState.Create('1');
  
Expect(state, 
tkNumber, 
'1');
  
ExpectFail(state);
  
FreeAndNil(state);
  
state 
:= 
TParserState.Create('1+22 333 Ab');
  
Expect(state, 
tkNumber, 
'1');
  
Expect(state, 
tkAddition, 
'+');
  
Expect(state, 
tkNumber, 
'22');
  
Expect(state, 
tkNumber, 
'333');
  
Expect(state, 
tkUnknown, 
'A');
  
Expect(state, 
tkUnknown, 
'b');
  
ExpectFail(state);
  
FreeAndNil(state);
end;
If a program is valid, it will create an AST for the program, return it in the `ast` parameter, and set result to True.
If a program is not valid, `ast` will be nil and result will be False.
Empty input is not accepted.
function
Parse(const
prog:
string;
var
ast:
TAST):
boolean;
var
  
accept 
: 
TTokenKind;
  
numbers: 
TList; 
  
state  
:
TParserState;
  
token  
:
TTokenKind;
  
value  
:
string;
beginWe can easily see that the above grammar generates exactly the following sequence of tokens:
tkNumber (tkAddition tkNumber)*
(The proof is left out as an excercise for the reader.)
The code will check the syntax and extract all numbers in an TArray
At the end it will pass this array to the CreateAST function to create the AST.
  
ast 
:= 
nil;
  
Result 
:= 
false;
  
state 
:= 
TParserState.Create(prog);
  
try
    
numbers 
:= 
TList.Create; 
    
try
      
accept 
:= 
tkNumber;
      
while 
GetToken(state,
token,
value)
do
begin
        
if 
token 
<> 
accept
then
         
Exit;
        
if 
accept 
= 
tkNumber
then
begin
         
numbers.Add(StrToInt(value));
         
accept 
:= 
tkAddition;
        
end
        
else
         
accept 
:= 
tkNumber;
      
end;
      
if 
accept 
= 
tkNumber
then
        
// Last token in the program was 
tkAddition, which is not allowed.
        
Exit;
      
if 
numbers.Count
>
0
then
begin
        
ast 
:= 
CreateAST(numbers.ToArray);
        
Result 
:= 
true;
      
end;
    
finally 
FreeAndNil(numbers);
end;
  
finally 
FreeAndNil(state); 
end;
end;
procedure
TestParse;
var
  
add1  
:
TAddition;
  
add2  
:
TAddition;
  
ast   
:
TAST;
  
const1: 
TConstant;
  
const2: 
TConstant;
  
const3: 
TConstant;
begin
  
if 
not 
Parse('1+2 + 3', 
ast) 
then
    
raise 
Exception.Create('Parser 
failed');
  
try
    
if 
assigned(ast)
       
and 
IsAddition(ast,
add1)
       
and 
IsConstant(add1.Term1,
const1)
and
(const1.Value
=
1)
       
and 
IsAddition(add1.Term2,
add2)
       
and 
IsConstant(add2.Term1,
const2)
and
(const2.Value
=
2)
       
and 
IsConstant(add2.Term2,
const3)
and
(const3.Value
=
3)
    
then
      
// everything is fine
    
else
      
raise 
Exception.Create('CreateAST 
is not working correctly!');
  
finally 
FreeAndNil(ast); 
end;
  
if 
Parse('1+2 +', 
ast) 
then 
begin
    
if 
assigned(ast)
then
      
raise 
Exception.Create('Invalid 
program resulted in an AST!)')
    
else
      
raise 
Exception.Create('Invalid 
program compiled into an empty AST!');
  
end;
end;
function
InterpretAST(ast:
TAST):
integer;
var
  
add1  
:
TAddition;
  
const1: 
TConstant;
begin
  
if 
not 
assigned(ast) 
then
    
raise 
Exception.Create('Result 
is undefined!');
  
// Alternatively, we could use Nullable as result, 
// with Nullable.Null as a default value.
// with Nullable.Null as a
  
if 
IsConstant(ast, 
const1) 
then
    
Result 
:= 
const1.Value
  
else 
if 
IsAddition(ast, 
add1) 
then
    
Result 
:= 
InterpretAST(add1.Term1)
+
InterpretAST(add1.Term2)
  
else
    
raise 
Exception.Create('Internal 
error. Unknown AST element: ' 
+      ast.ClassName);
end;
procedure
TestInterpretAST;
  
procedure 
Test(const 
testName: 
string; 
const 
values: 
TArray;     expectedResult:
integer);
  
var
    
ast      
: 
TAST;
    
calcResult:
integer;
  
begin
    
ast 
:= 
CreateAST(values);
    
if 
not 
assigned(ast)
then
      
raise 
Exception.CreateFmt('Compilation 
failed in test %s', 
[testName]);
    
try
      
calcResult 
:= 
InterpretAST(ast);
      
if 
calcResult 
<>
expectedResult
then
        
raise 
Exception.CreateFmt(
               
'Evaluation failed in 
test %s. ' +
                'Calculated result %d <> expected result %d',
               
[testName,
calcResult,
expectedResult]);
    
finally
      
FreeAndNil(ast);
    
end;
  
end;
begin
  
Test('1', 
[42], 
42);
  
Test('2', 
[1, 
2, 
3], 
6);
  
Test('3', 
[2, 
-2, 
3, 
-3], 
0);
end;
- Change each 'constant' node into an anonymous function that returns the value of that node.
 - Change each 'summation' node into an anonymous function that returns the sum of two parameters.
 - The first is an anonymous function which calculates the value of the left term and
 - the second is an anonymous function which calculates the value of the right term.
 
function
MakeConstant(value:
integer):
TFunc; 
begin
  
Result 
:=
    
function: 
integer
    
begin
      
Result 
:= 
value;
    
end;
end;
function
MakeAddition(const
term1,
term2:
TFunc): 
TFunc; 
begin
  
Result 
:=
    
function: 
integer
    
begin
      
Result 
:= 
term1()
+
term2();
    
end;
end;The important point here is that neither MakeConstant nor MakeAddition does any calculation. They merely set up an anonymous method and return a reference to it, which is more or less the same as creating an object and returning an interface to it, but with the added value of variable capturing.
BTW, as our "language" just calculates integer expressions that always return an integer, a 'function returning an integer' or TFunc
To 'compile' an AST we have to use recursion as we need to create a child-calculating anonymous functions before we can use them (as a parameter) to create an anonymous function calculating the parent node.
function
CompileAST(ast:
TTerm):
TFunc; 
var
  
add1: 
TAddition;
  
const1: 
TConstant;
begin
  
if 
IsConstant(ast, 
const1) 
then
    
// this node represents a 
constant
    
Result 
:= 
MakeConstant(const1.Value)
  
else 
if 
IsAddition(ast, 
add1) 
then
    
// this node represent an 
expression
    
Result 
:= 
MakeAddition(CompileAST(add1.Term1),
CompileAST(add1.Term2))
  
else
    
raise 
Exception.Create('Internal 
error. Unknown AST element: ' 
+      ast.ClassName);
end;
(*
function:
integer
begin
  Result 
:=
    (function: 
integer
     begin
       Result 
:= 
1;
     end)()
    +
    (function: 
integer
     begin
       Result 
:=
         (function: 
integer
          begin
            Result
:=
2;
          end)()
         +
         (function: 
integer
          begin
            Result
:=
3;
          end)();
     end)();
end;
*)
It is hard to verify if generated anonymous function is in correct form, but we can execute it for some number of test cases and hope that everything is ok ;)
procedure
TestCompileAST;
  
procedure 
Test(const 
testName: 
string; 
const 
prog: 
string; 
expectedResult: 
integer);
  
var
    
add1     
: 
TAddition;
    
ast      
: 
TAST;
    
calcResult:
integer;
    
code     
: 
TFunc; 
    
const1   
: 
TConstant;
  
begin
    
if 
not 
(Parse(prog,
ast)
and
assigned(ast))
then
      
raise 
Exception.CreateFmt('Parser 
failed in test %s', 
[testName]);
    
try
      
code 
:= 
CompileAST(ast);
      
if 
not 
assigned(code)
then
        
raise 
Exception.CreateFmt('Compilation 
failed in test %s', 
[testName]);
Changing AST now should not affect the compiled code.
      
if 
(IsAddition(ast,
add1)
and
IsConstant(add1.Term1,
const1))
        
or 
IsConstant(ast,
const1)
      
then
        
const1.Value
:=
const1.Value
+
1
      
else
        
raise 
Exception.CreateFmt('Unexpected 
AST format in test %s',         [testName]);
      
calcResult 
:= 
code();
//execute the compiled code
      
if 
calcResult 
<>
expectedResult
then
        
raise 
Exception.CreateFmt(
               
'Evaluation failed in 
test %s. ' +
'Codegen result %d <> expected result %d',
'Codegen result %d <> expected result %d',
               
[testName,
calcResult,
expectedResult]);
    
finally
      
FreeAndNil(ast);
    
end;
  
end;
begin
  
Test('1', 
'42', 
42);
  
Test('2', 
'1 + 2 + 3', 
6);
  
Test('3', 
'2 + 2 +3+3', 
10);
end;
procedure
RunREPL;
var
  
ast 
: 
TAST;
  
prog: 
string;
begin
  
repeat
    
Write('Enter an expression 
(empty line exits): ');
    
Readln(prog);
    
if 
prog 
= 
'' 
then
      
break;
    
if 
not 
Parse(prog,
ast)
then
      
Writeln('Syntax is not valid')
    
else
      
Writeln('Result is: ',
CompileAST(ast)());
  
until 
false;
end;
begin
  
tryRun all unit tests to verify program correctness.
    
Writeln('Running AST creation 
tests ...');
    
TestCreateAST;
    
Writeln('Running tokenizer tests 
...');
    
TestGetToken;
    
Writeln('Running parser test 
...');
    
TestParse;
    
Writeln('Running AST interpreter 
tests ...');
    
TestInterpretAST;
    
Writeln('Running AST compilation 
tests ...');
    
TestCompileAST;
    
RunREPL;
  
except
    
on 
E: 
Exception 
do 
begin
      
Writeln(E.ClassName,
': ',
E.Message);
      
Readln;
    
end;
  
end;
end.
I have changed a little to run in XE.
ReplyDeleteIt's simple but great.
Good job.