Saturday, December 13, 2014

Attribute-based Command Line Parsing

Oh, command line parsing, the old enemy of mine!

In ye olden days, when I was learning Pascal, I did some programming on VAX/VMS systems, where you could (and should) leave the job of command line parsing to the DCL – the command line interpreter (something like CMD.EXE on Windows). You just wrote a definition file (see example on the right, found on the web) and DCL did the rest.

Ever since those days I have hated having to parse the command line programmatically. Sure, there were parsing libraries, and I wrote a few of them myself, but I was never happy with them. There was always too much code to write.

Then, while I was looking for good examples of using attributes in Delphi, it suddenly occurred to me that I could use them for command line parsing, too! And that’s how GpCommandLineParser was born.

Before continuing, I should mention that I have based it (concept-wise, implementation is all my own) on the CommandParser unit, which comes with Delphi and can be found in Samples\Object Pascal\Database\dbExpress\Utils. This is a very nice but mostly unknown command line parser which I can definitely recommend.

Hello, Command Line

Let’s start with a simple Hello, world program. A configurable Hello, world – you can pass the message to be written on the command line and you can also tell the program how many times to print out the message.

program hello;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  GpCommandLineParser;

type
  TCommandLine = class
  strict private
    FRepetitions: integer;
    FText: string;
  public
    property Repetitions: integer read FRepetitions write FRepetitions;
    property Text: string read FText write FText;
  end;

var
  cl: TCommandLine;
  i: integer;

begin
  try
    cl := TCommandLine.Create;
    try
      if not CommandLineParser.Parse(cl) then
        Writeln('Invalid command line')
      else
        for i := 1 to cl.Repetitions do
          Writeln(cl.Text);
    finally FreeAndNil(cl); end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  if DebugHook <> 0 then
    Readln;
end.

What can we see from this example?



  1. To parse command line parameters, you should declare a class with public properties. Each property represents one parameter. (There are no attributes in this example. I’ll introduce them later.)

  2. To parse the command line, you should create an instance of this class and pass it to CommandLineParser.Parse. Command line parameters will be copied to fields in the definition instance and you can access them via properties.

A simple test is, of course, in order.




Like every Hello, world program, it is too simplistic to be useful in real life. Off the top of my head, I can list four improvements.



  1. Repetitions should default to 1 if the command line parameter is missing.

  2. The Text parameter should be required. Currently, you can run the program without parameters and nothing happens (because fields in the definition object are initialized to 0 and ‘’).

  3. It would be nice if we could use shorter names (for example /rep instead of /repetitions).

  4. The error message (shown when invalid parameters are entered) is not very helpful. The supported syntax should be displayed instead.

Enter Attributes


To change the behaviour of the command line parser, you just add attributes to properties. The example below sets the default for Repetitions to 1 (CLPDefault), provides the user with a way to shorten the repetitions parameter name (CLPLongName), makes Text required (CLPRequired) and adds descriptions (CLPDescription).

[CLPDefault('1'), CLPLongName('repetitions', 'rep'),
 CLPDescription('Number of repetitions')]
property Repetitions: integer read FRepetitions write FRepetitions;

[CLPRequired, CLPDescription('Message to display')]
property Text: string read FText write FText;



Descriptions are used by the Usage function, which generates a human-readable description of the command line parameters and returns it as an array of strings.

if not CommandLineParser.Parse(cl) then
  for s in CommandLineParser.Usage do
    Writeln(s)
else
  for i := 1 to cl.Repetitions do
    Writeln(cl.Text);



There’s also a way to provide Unix-style single letter parameters by adding the CLPName attribute.

[CLPDefault('1'), CLPName('r'), CLPLongName('repetitions', 'rep'),
 CLPDescription('Number of repetitions')]
property Repetitions: integer read FRepetitions write FRepetitions;

[CLPRequired, CLPName('t'), CLPDescription('Message to display')]
property Text: string read FText write FText;



Parameters can be introduced by /, - and --. Values can be separated from parameter names by : or =. When a short (single letter) name is used, the separator is optional: the value may follow the name directly, and providing a separator anyway is not an error. Just like a honey badger, the parser doesn’t care; you can throw all of that at it and mix and match separators as you want.
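To try the different spellings without retyping command lines, you can feed them to the Parse overload that accepts a string (it is listed in the parser overview further down). The program below is only a sketch: the ParseAndShow helper is made up for this illustration, and single-word values are used so that nothing is assumed about how quoted text is handled.

```pascal
program separators;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  GpCommandLineParser;

type
  // Same definition class as above, without the descriptions.
  TCommandLine = class
  strict private
    FRepetitions: integer;
    FText: string;
  public
    [CLPDefault('1'), CLPName('r'), CLPLongName('repetitions', 'rep')]
    property Repetitions: integer read FRepetitions write FRepetitions;

    [CLPRequired, CLPName('t')]
    property Text: string read FText write FText;
  end;

// Hypothetical helper: parses one command line string and reports the outcome.
procedure ParseAndShow(const commandLine: string);
var
  cl: TCommandLine;
begin
  cl := TCommandLine.Create;
  try
    if CommandLineParser.Parse(commandLine, cl) then
      Writeln(commandLine, ' -> ', cl.Repetitions, ' x ', cl.Text)
    else
      Writeln(commandLine, ' -> ', CommandLineParser.ErrorInfo.Text);
  finally FreeAndNil(cl); end;
end;

begin
  ParseAndShow('/repetitions:3 /t:Hello');  // long name, ':' separator
  ParseAndShow('--rep=3 -tHello');          // shortened long name, no separator after -t
  ParseAndShow('-r3 -t=Hello');             // short names, mixed styles
end.
```

If the parser behaves as described above, all three calls should report three repetitions of Hello.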


File Name Handling


The command line parser also supports a concept of positional parameters – that is, parameters that are not provided in a switch form (-switch, --switch or /switch). [For example, if you type dcc32 hello.dpr /b in a command line, hello.dpr is a positional parameter and /b is a command switch.] Positional parameters are introduced by the CLPPosition attribute.

type
  TCommandLine = class
  strict private
    FSourceFile: string;
    FDestFile: string;
    FEncrypt: boolean;
  public
    [CLPPosition(1), CLPRequired, CLPDescription('Source file')]
    property SourceFile: string read FSourceFile write FSourceFile;

    [CLPPosition(2), CLPRequired, CLPDescription('Destination file')]
    property DestFile: string read FDestFile write FDestFile;

    [CLPDescription('Encrypt file')]
    property Encrypt: boolean read FEncrypt write FEncrypt;
  end;

The example above also shows support for boolean switches. Boolean switches are False by default and are set to True if the user passes the switch name on the command line.

// COnOff is a boolean-indexed array of strings declared elsewhere in the program (not shown)
cl := TCommandLine.Create;
try
  if not CommandLineParser.Parse(cl) then
    for s in CommandLineParser.Usage do
      Writeln(s)
  else
    Writeln('Copying ', cl.SourceFile, ' to ', cl.DestFile,
      '. Encryption is ', COnOff[cl.Encrypt], '.');
finally FreeAndNil(cl); end;

BTW, integer, string, and boolean are the only supported types. If you want to accept floating point numbers, dates, times, or other more complex types, you should declare the parameter as string and do your own interpretation of the result in the code.
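As an illustration of "do your own interpretation", here is a minimal sketch that converts a string parameter to a floating point number. The TryParseFloat helper and the hard-coded '2.5' value are made up for this example; in a real program the string would come from a string property of the definition class. Forcing '.' as the decimal separator is a design choice of the sketch, so that the same command line works regardless of regional settings.

```pascal
program floatparam;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: converts a string to a double, always treating '.'
// as the decimal separator regardless of the system locale.
function TryParseFloat(const value: string; out number: double): boolean;
var
  fs: TFormatSettings;
begin
  fs := TFormatSettings.Create;  // start from the current locale settings
  fs.DecimalSeparator := '.';
  Result := TryStrToFloat(value, number, fs);
end;

var
  factor: double;

begin
  // '2.5' stands in for a value read from a string property after parsing.
  if TryParseFloat('2.5', factor) then
    Writeln('Factor = ', factor:0:2)
  else
    Writeln('Not a valid number');
end.
```

If the conversion fails, you can report the problem to the user in the same way as a parser error, for example by printing the Usage output.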


Let’s see copyfile in action.




As you can see, switches can be provided anywhere in the command line and in any order (ok, that’s not visible in the example above) while positional parameters must always be provided in the correct order (hence the name positional).


Overview – Attributes


The examples above demonstrate most of the GpCommandLineParser functionality, but there are still some attributes which they didn’t cover. This part of the article documents all supported attributes.


CLPNameAttribute
Specifies short (one letter) name for the switch.


CLPLongNameAttribute
Specifies long name for the switch. If not set, property name is used for long name.
A short form of the long name can also be provided which must match the beginning of the long form. In this case, parser will accept shortened versions of the long name, but no shorter than the short form.
Example: if 'longName' = 'autotest' and 'shortForm' = 'auto' then the parser will accept 'auto', 'autot', 'autote', 'autotes' and 'autotest', but not 'aut', 'au' and 'a'.
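
The matching rule can be sketched in a few lines. The code below is not taken from the parser source; MatchesLongName is a hypothetical helper written just to illustrate the rule, it only covers the case where a short form is given, and the case-insensitive comparison is an assumption of this sketch.

```pascal
program shortform;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns True if 'name' is a prefix of 'longName'
// that is at least as long as 'shortForm'.
// Assumes that 'shortForm' itself matches the beginning of 'longName'.
function MatchesLongName(const name, longName, shortForm: string): boolean;
begin
  Result := (Length(name) >= Length(shortForm))
        and (Length(name) <= Length(longName))
        and SameText(name, Copy(longName, 1, Length(name)));  // case-insensitive (assumption)
end;

begin
  Writeln(MatchesLongName('auto', 'autotest', 'auto'));       // TRUE
  Writeln(MatchesLongName('autote', 'autotest', 'auto'));     // TRUE
  Writeln(MatchesLongName('aut', 'autotest', 'auto'));        // FALSE, shorter than the short form
  Writeln(MatchesLongName('autotests', 'autotest', 'auto'));  // FALSE, longer than the long name
end.
```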


CLPDefaultAttribute
Specifies default value which will be used if switch is not found on the command line. This attribute always takes a string parameter and is supported only for string and integer properties.


CLPDescriptionAttribute
Provides switch description, used for the Usage function.


CLPRequiredAttribute
When present, specifies that the switch is required.


CLPPositionAttribute
Specifies position of a positional (unnamed) switch. First positional switch has position 1.


CLPPositionRestAttribute
Specifies property that will receive a #13-delimited list of all positional parameters for which switch definitions don't exist.
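
A #13-delimited string is easy to take apart again. The sketch below uses the Split string helper, which is available in Delphi XE3 and newer; the 'rest' variable and the file names in it are made up and stand in for a string property marked with CLPPositionRest.

```pascal
program restparams;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  rest: string;
  s: string;

begin
  // Stand-in for the value the parser would store in the CLPPositionRest property.
  rest := 'a.txt'#13'b.txt'#13'c.txt';

  // Split on the #13 delimiter and process each positional parameter separately.
  for s in rest.Split([#13]) do
    Writeln(s);
end.
```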


Overview – Parser


For parsing you can use either the global singleton (CommandLineParser) or create another instance of the parser (CreateCommandLineParser). When parsing a command line from a background thread, the latter approach is recommended.

function CommandLineParser: IGpCommandLineParser;
function CreateCommandLineParser: IGpCommandLineParser;

A parser implements a very simple interface with two Parse functions (one parses a command line, another an arbitrary string), the Usage function which generates a description of the command line interface, and the ErrorInfo property which returns a detailed description in case of invalid command line parameters.

type
  TCLPErrorInfo = record
    IsError   : boolean;
    Kind      : TCLPErrorKind;
    Detailed  : TCLPErrorDetailed;
    Position  : integer;
    SwitchName: string;
    Text      : string;
  end;

  IGpCommandLineParser = interface ['{C9B729D4-3706-46DB-A8A2-1E07E04F497B}']
    function GetErrorInfo: TCLPErrorInfo;
  //
    function Usage: TArray<string>;
    function Parse(commandData: TObject): boolean; overload;
    function Parse(const commandLine: string; commandData: TObject): boolean; overload;
    property ErrorInfo: TCLPErrorInfo read GetErrorInfo;
  end;
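
Using a private parser instance instead of the singleton might look like the sketch below. It relies only on the functions declared above; the TCommandLine class with its Verbose switch and the ParseWithOwnParser procedure are made up for this illustration. Because the parser is a local interface variable, nothing is shared with other threads.

```pascal
program ownparser;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  GpCommandLineParser;

type
  // Hypothetical definition class used only for this illustration.
  TCommandLine = class
  strict private
    FVerbose: boolean;
  public
    [CLPDescription('Verbose output')]
    property Verbose: boolean read FVerbose write FVerbose;
  end;

// Hypothetical helper: parses 'commandLine' with its own parser instance.
procedure ParseWithOwnParser(const commandLine: string);
var
  parser: IGpCommandLineParser;
  cl: TCommandLine;
  s: string;
begin
  parser := CreateCommandLineParser;  // private instance, not the global singleton
  cl := TCommandLine.Create;
  try
    if parser.Parse(commandLine, cl) then
      Writeln('Verbose: ', cl.Verbose)
    else begin
      // Error details and usage come from the same private instance.
      Writeln(parser.ErrorInfo.Text);
      for s in parser.Usage do
        Writeln(s);
    end;
  finally FreeAndNil(cl); end;
end;

begin
  ParseWithOwnParser('/verbose');
end.
```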

Error Handling


There are two kinds of errors that can be returned from the Parse functions. Definition errors are returned as exceptions and parameter errors are returned as a False result. Both provide error information in the ErrorInfo property.


Definition errors are programmer errors. They occur when an invalid combination of attributes is applied to a definition class (for example, specifying both CLPPosition and CLPLongName on the same property). Usually there is no need to handle this error: if an exception occurs, you should fix the definition class, and once it is correctly defined, the exception cannot occur anymore.


Parameter errors are user errors, for example when a required parameter is missing from the command line. They should be reported and handled in code.


The fragment below (taken from a demo program which comes with the GpCommandLineParser) demonstrates how to handle both kinds of errors.

try
  parsed := CommandLineParser.Parse(inpCommandLine.Text, cl);
except
  on E: ECLPConfigurationError do begin
    lbLog.Items.Add('*** Configuration error ***');
    lbLog.Items.Add(Format('%s, position = %d, name = %s',
      [E.ErrorInfo.Text, E.ErrorInfo.Position, E.ErrorInfo.SwitchName]));
    Exit;
  end;
end;

if not parsed then begin
  lbLog.Items.Add(Format('%s, position = %d, name = %s',
    [CommandLineParser.ErrorInfo.Text, CommandLineParser.ErrorInfo.Position,
     CommandLineParser.ErrorInfo.SwitchName]));
  lbLog.Items.Add('');
  lbLog.Items.AddStrings(CommandLineParser.Usage);
end
else
  // ...

Download


The unit can be downloaded from the Google Code archive. The simplest way is to check out the entire GpDelphiUnits repository. Alternatively, you can just download the GpCommandLineParser source.


The parser is distributed as free software. The license is very simple: Free for personal and commercial use. No rights reserved.


The code, as it stands now, should be OS-agnostic. If you find any problems related to parsing or to execution on any operating system supported by Delphi, please let me know.

4 comments:

  1. Thanks for the work, there is another command line parser that is configured with anon methods when you register command line switch.

    https://github.com/VSoftTechnologies/VSoft.CommandLineParser

    1. There are many more, I presume :) Thanks for pointing this one out, I didn't know about it.

    2. Under XE4, the compiler frequently emit strange compilation errors when compiling VSoft.CommandLineParser, I guess it must has something to do with the generics language feature it uses. So next time I'll try gabr's implementation :)

  2. Anonymous, 08:45

    A very neat piece of software for sure. Would it be possible to add some CLPIgnore attribute that skips a property? It would allow to do calculations inside the class based on the command line parameters
