Friday, April 03, 2015

Runtime SQL Query Builder

I don’t want to use long SQL strings in the code.

Really.

Firstly, it is a pain to write long multiline strings in Object Pascal. (Embarcadero, are you listening? Can we please get multiline strings in Delphi? Please?)

Secondly, I’m very OCD when it comes to compiler watching my every step. I absolutely hate runtime errors and I want every problem to be detected during the compilation. That includes typos in database field names and changes attributed to the entropy (i.e. changes in the database structure).

On a tangent, that’s why we are creating and managing tables and databases in the code. We have all fields defined as constants, all SQL statements created in runtime and everything stored under version control.Upgrading a database to a version required by the program is therefore a very simple operation. (Downgrading is still a problem. Downgrading is hard.)

Thirdly, concatenating query strings together when you are dynamically building a query string based on the conditions in the program is a mess.

That’s why long years ago I created a simple runtime query builder. It was dormant (in the “good enough for me” state) for a long time but recently I required more functionality from it and I extended it further. It is now in a state which may be useful for somebody else but me so I’m releasing it as an open source under a BSD license.

For impatient: Get GpSQLBuilder on GitHub.

The query builder is implemented as an interface (IGpSQLBuilder). You call the global CreateGpSQLBuilder function to create a new instance. Then you call various SQL-like functions on that interface to set up the query. At the end you call the AsString method to convert the query into a string.

A simple, self-contained example:

const
DBField_Deleted = 'dbi_deleted';
DBField_ID = 'dbi_id';
DBT_SBF = 'sub_files';
var
query: IGpSQLBuilder;
begin
query := CreateGpSQLBuilder
.Select(DBField_ID).From(DBT_SBF)
.Where([DBField_ID, '>= 0'])
.AndE([DBField_Deleted, '= 0']);
ShowMessage(query.AsString);
end
;

image


The query builder is smart enough to put spaces and parenthesis where they are required and to convert numbers into strings if you provide the data as an open array (see Where and AndE above).



BTW, AndE means “and expression” and is used because I don’t like the &And approach too much. &And is, however, also implemented as an alias if you prefer to write the code that way.


A more complicated example below (taken straight from my code) shows that you don’t have to set up the complete query in one call and that you can return to any section at the later time. (For example, a Where is partially set first, then OrderBy and then Where is updated if some condition is met.)

query := CreateGpSQLBuilder.Select;
if FMaxSelect > 0 then
query.First(FMaxSelect).Skip(0);
query.From(DBT_PLY);
if not (qoPlaylistOnly in Options) then begin
query
.LeftJoin(DBT_SBF)
.&On([DBSBF_PLY_ID,
'=', DBPLY_DBField_ID])
.LeftJoin(DBT_STL)
.&On([DBSBF_STL_ID,
'=', DBSTL_DBField_ID]);
end;
query
.Where([DBPLY_DBField_VerID, '>= 0'])
.OrderBy(DBPLY_DATETIME);
if FCutOldest then
query.Desc;
query.OrderBy
.Column(DBPLY_CHN_ID)
.Column(DBT_PLY, CDBField_ID);
if not (qoSearchInDeleted in Options) then
query.Where
.&And([DBPLY_DBField_Deleted, '= 0'
]);

Another real-life example:

query := CreateGpSQLBuilder
.Select(DBPLY_AUTOMATION_ID)
.From(DBT_PLY)
.Where([DBPLY_AUTOMATION_ID, '<> '''''])
.GroupBy(DBPLY_AUTOMATION_ID)
.Having(['Count(DISTINCT', CDBField_ID, ') > 1'])
.OrderBy(DBPLY_AUTOMATION_ID);

if FLastAutomation <> '' then
query.Where.AndE([DBPLY_AUTOMATION_ID, '>', ':',
CParamAutomationID]);

You can also clear a whole section and rebuild it from scratch if you need two very similar queries. (Create one, convert it to a string, change the part that needs to be changed, convert that one to a string too.)

strQueryCount := query.AsString;

query.Select.Clear;
query
.Select
.Column(CAliasPly1, CDBField_ID)
.AsAlias(
CAliasPlyID)
.Column(CAliasPly1, DBPLY_AUTOMATION_ID)
.AsAlias(C
AliasPlyAutoID)
.OrderBy
.Column(CAliasPly1,
CDBField_ID).Desc;


And the last example:

query := CreateGpSQLBuilder
.Select
.Column(['DISTINCT', DBPLY_DBField_ID])
.&Case
.When([DBSBF_DBField_CreatedTime, '< ''2010-01-01'''])
.&
Then('*')
.&Else('')
.&End
.From(DBT_PLY)
.LeftJoin(DBT_SBF)
.&
On([DBSBF_PLY_ID, '=', DBPLY_DBField_ID])
.OrderBy(DBPLY_DBField_ID);

For now there’s no documentation. It will be probably added as a set of unit tests.


I’ll be happy of any contribution or feedback.

36 comments:

  1. IMO, this doesn't add clarity to the SQL code. Also, it becomes tedious to yank out the SQL code, paste into f.x. SSMS and tweak or debug the query, then add it back into the code with changes.
    As an academic experiment, fun - but for writing maintainable code - not the direction I would choose.
    Add parameter substitution, table aliasing, sub queries, and functions such as ISNULL or CONVERT - and you will have a bit of a mess.

    ReplyDelete
    Replies
    1. I presume it all depends on the kind of SQL one is writing and on the personality of the programmer. To each its own.

      Delete
    2. Unfortunately, until we get LINQ-like support in the language we are all condemn to this kind of tricks and ad-hoc code, thus reinventing the wheel over and over again (this is the kind of stuff we prefer to have control over, because they are fun to write but complex to use unless you wrote it in the first place.)

      BTW, I have SQL all over my code and, like Primož, database creation and upgrades are handled in code.

      Delete
    3. C Johnson18:02

      At least it doesn't leave clear SQL statements in your binary code that could easily be read or modified by a third party.

      Delete
    4. There are numerous methods to avoid clear SQL in code if that is a requirement. Compressed and encrypted in a component or resource string, f.x.

      Delete
    5. @Leus - We have 450+ discrete select expressions, in addition to the standard view selects which as constructed from the list and element classes, and a dozen or so dynamically built queries. The db has 150+ tables, 230+ views, 420+ procedures, nearly 100 functions, 20+ triggers and a couple of .net assemblies. All changes are made using SSMS or ApexSQL Edit, and synchronized out from a staging db using ApexSQL Diff.
      Checking for dependencies would have been a nightmare if it wasn't for unit testing.
      If I'd felt that this construct would improve how we use in-code SQL - I would have used it.

      Delete
    6. I'm with Lars on this. You're turning ugly but standard SQL strings into some awful mess of proprietary code that is both bigger and more complex than the original. Database code should look like database code, and especially it should be portable. Can you paste this into a query optimising tool? Can you paste it into an iPhone app? Or a Node.js app?

      To be better the resulting code needs to be simpler or smaller, ideally both. OPFs sometimes manage a bit of it, and OPF+LINQ can be really good, but all too often you end up going "just give me the SQL so I can see the error/optimise it". At least your builder allows that, LINQ as a rule does not (and that's usually where the problems start).

      My general rule is that if the code does not embody business rules is should be in the database as a pre-optimised view or stored procedure. Especially if it just encodes structure ... DeleteInvoice(InvoiceId) is a stored procedure that copies the record into an archive table then hard deletes it. Or maybe later if will just set a deleted flag. Either way, that code lives in the database. Ditto views, especially views across several tables. Then the code is just "SELECT ... FROM view_whatever".

      For more complex stuff I've taken to table-valued stored procedures, because there are a couple of places in our setup where the SQL query plans that come out of the naive SELECT are unfortunate. Even realising that that is the case with this kind of obfuscated SQL can be difficult... you just end up with blocks of code that take a long time to run.

      Delete
    7. This comment has been removed by the author.

      Delete
    8. This comment has been removed by the author.

      Delete
    9. It appears my browser for some reason reposted my two previous comments, so I removed them.

      Delete
  2. Anonymous17:18

    There are zero reasons to use it. It might be readable by you, as you are the author, but it's a gibberish for others. I can imagine somebody looking at 100 pages of code like this trying to figure out what the resulting query is. Please, stop doing this. Really.

    ReplyDelete
    Replies
    1. Don't use it then. Nobody is forcing you.

      Delete
    2. There might be no for now, but there will be when the engine is capable of escaping names based on how certain DBMS does it (configurable). You might say simply use names that don't clash with SQL keywords, but I prefer having my names as readable as possible.

      Delete
    3. >There are zero reasons to use it.

      There are lots of reasons to use it, and things like this are quite popular outside of Delphi Island (especially among web developers). It offers you the power of programmatically generated SQL.

      Here's an example from the tutorial for a similar library for another language called SQLAlchemy:

      ---------
      Suppose you’re writing a search function, which receives criterion and then must construct a select from it. To accomplish this, upon each criterion encountered, you apply “generative” criterion to an existing select() construct with new elements, one at a time. We start with a basic select() constructed with the shortcut method available on the users table:

      >>> query = users.select()

      We encounter search criterion of “name=’jack’”. So we apply WHERE criterion stating such:

      >>> query = query.where(users.c.name=='jack')

      Next, we encounter that they’d like the results in descending order by full name. We apply ORDER BY, using an extra modifier desc:

      >>> query = query.order_by(users.c.fullname.desc())

      We also come across that they’d like only users who have an address at MSN. A quick way to tack this on is by using an EXISTS clause, which we correlate to the users table in the enclosing SELECT:

      >>> from sqlalchemy.sql import exists
      >>> query = query.where(
      exists([addresses.c.id],
      and_(addresses.c.user_id==users.c.id, addresses.c.email_address.like('%@msn.com'))
      ).correlate(users))

      And finally, the application also wants to see the listing of email addresses at once; so to save queries, we outerjoin the addresses table (using an outer join so that users with no addresses come back as well; since we’re programmatic, we might not have kept track that we used an EXISTS clause against the addresses table too...). Additionally, since the users and addresses table both have a column named id, let’s isolate their names from each other in the COLUMNS clause by using labels:

      >>> query = query.column(addresses).select_from(users.outerjoin(addresses)).apply_labels()

      Let’s bake for .0001 seconds and see what rises:

      >>> conn.execute(query).fetchall()
      SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, addresses.email_address AS addresses_email_address
      FROM users LEFT OUTER JOIN addresses ON users.id = addresses.user_id
      WHERE users.name = ? AND (EXISTS (SELECT addresses.id
      FROM addresses
      WHERE addresses.user_id = users.id AND addresses.email_address LIKE ?)) ORDER BY users.fullname DESC
      ('jack', '%@msn.com')


      The generative approach is about starting small, adding one thing at a time, to arrive with a full statement.
      --------------------------

      How else are you going to generate that massive SQL statement in a simple fashion?

      > It might be readable by you, as you are the author, but it's a gibberish for others.
      It looks very SQL-like to me. And because it can be done iteratively, it can be a lot easier to read it step by step that one giant mess of an SQL statement.



      Delete
  3. Very creative and flexible. There are times I like to use Query components that contain a big SQL string, but there are times I am building SQL in code, and this would be a nice way to handle that.

    I can't help but think it would be useful to have a tool that would generate and object representation of some tables to use with this, instead of declaring the Constanta manually.

    ReplyDelete
    Replies
    1. In similar libraries for other languages, "table reflection" is used to generate that object representation. The problem is that Delphi doesn't have operator overloading, so if you wanted to be able to use operators directly in the expression generation; e.g. "where(users.age > 7)" you're going to need to use records with methods. Now that I think about it, Delphi's static typing is also a hindrance as with dynamic typing these objects can be built on the fly. I guess you'd be left with generating the code, perhaps a unit, which would need to be imported. :-(

      Delete
  4. Buy Embarcadero DBPower studio :-)

    Lot of support there.

    But, maybe Embarcadero should add a hook to that in Delphi, like they do in ER/Studio where you can declare DB Powerstudio as default SQL editor, like in stored procedures etc.

    ReplyDelete
    Replies
    1. DBPower studio is ridiculously expensive unless you're already paying huge fees for Oracle or MS-SQL, and the functionality outside those platforms is limited anyway. I've just spent a few days looking at tools at DBPower was $4500 for nearly as much functionality as Navicat at $180. Except that Navicat will run under WINE so I don't also need a separate Windows VM with all the licensing nonsense that Microsoft bundle with their OS.

      Delete
    2. $4500?!? I could get a commercial version of the PostgreSQL database, Enterprise DB, for about that much!

      Navicat's a very popular program and I've come across several people recommending it (funnily enough, one was a Delphi fan site that I don't think realized they were an EMBT competitor).

      DBeaver is a nice, cross-platform database tool that's open source (GPL2) and free which you might want to look at; it may suit your needs.

      You use WINE? Do you run OS X or Linux?

      Delete
  5. Anonymous20:39

    I like it. Yes, it might see a bit messy, but having the ability to dynamically add fields, where clauses, etc is very helpful. Thanks for sharing! As an aside, I'm not sure why someone would post to tell you to stop doing this. That was bizarre.

    ReplyDelete
  6. Some times ago I used a delphi preprocessor that allowed me to write strings soanning several lines with a veri nice and simple syntax. But I can't remember the name of that plug in. Can someone please help me ?

    ReplyDelete
  7. >Embarcadero, are you listening? Can we please get multiline strings in Delphi? Please?

    Probably not. I asked Allen Bauer about that a while back, and he said that the relevant portion of the lexer dates back to Turbo Pascal days and can't handle string literals longer than 255 characters, and would have to be completely re-engineered to fix it.

    ReplyDelete
    Replies
    1. $%&#$*@@^! I wrote some test code that parsed HTML not too long ago and I ran into that limit when I couldn't paste the HTML snippet straight into the IDE and had to join strings together, which I thought was ridiculous. I couldn't conceive of why that limit existed; at least Bauer provided an honest (depressing) answer. I think there are far too many things in modern Delphi that actually date back to Turbo Pascal days. Oddly enough, TP had a function to turn a date or datetime into a record which Delphi does not, which means in some ways Turbo Pascal had a more structured way of working with date and time than modern Delphi!

      There's one thing that this answer doesn't explain though. Upon discovering the 255-character limit I got frustrated and said, "Fine! I'll do it in FreePascal!" I installed and fired up FreePascal, and Lazarus *had the same limit*. :-( What were they thinking?

      Delete
  8. Hard make this totally universal, but maybe some Abstract component thingy you and/or DAC-vendor could write to get also some validity Checking, always and/or in Debug build.

    Connection to the DB could do the Checkup is those Tables and Fields actually there, and maybe other validity checks also...

    Need more caffeine to think straight, but this just came into this idea and thought to just write it up, so I do not forget :D

    ReplyDelete
  9. Anonymous05:23

    I like it and I will use it.
    In my code I always use constants for table/field names. When I do any little change in my DB I can change one constant name an detect all the parts in my code that are related to that element.
    Currently I use:
    'Select '+Tbl1Field1+', '+Tbl1Field2+' from '+Table1Name+';' etc...
    I definitely prefer to use your code.

    Many thanks for share it.

    ReplyDelete
  10. I like it too

    You can use a class, replace const

    type
    [TTable('UNI_FON')]
    TUNI_FON = class(TRTTI_Class)
    private
    { Private declarations }
    FUNI_SIGLA: TField;
    FUNI_DESCRICAO: TField;
    public
    { Public declarations }
    class function _UNI_SIGLA: String; static;
    class function _UNI_DESCRICAO: String; static;

    [TTableField('UNI_SIGLA', tkString)]
    property UNI_SIGLA: TField read FUNI_SIGLA write FUNI_SIGLA;
    [TTableField('UNI_DESCRICAO', tkString)]
    property UNI_DESCRICAO: TField read FUNI_DESCRICAO write FUNI_DESCRICAO;
    end;

    ...

    procedure TTestGpSQLBuilder.TestIsaquePinheiro;
    const
    CExpected = 'SELECT UNI_SIGLA, UNI_DESCRICAO FROM UNI_FON ORDER BY UNI_SIGLA';
    begin
    with TUNI_FON do
    begin
    SQL.Select
    .Column(_UNI_SIGLA)
    .Column(_UNI_DESCRICAO)
    .From('UNI_FON')
    .OrderBy(_UNI_SIGLA);
    end;
    Assert.AreEqual(CExpected, SQL.AsString);
    end;


    this class can be created with rtti it from your database.

    ReplyDelete
  11. > longer than 255 characters, and would have to be completely re-engineered to fix it.

    no, it would not. DCC32 can perfectly create several KB size constants, if you split it like 'xxxx' + 'yyyy' + 'zzzz'

    Thus a simple pre-processor could split the multiline to lesser chunks and feed it to the legacy TP-times parser

    ReplyDelete
  12. I have some questions, if you don't mind me asking. 1) What does AST stand for, in symbols like IGpSQLAST? 2) Is there a better way to 'construct' a sentence like FIELD = 'Value' instead of 'Field = ''Value''' or [ 'FIELD', '=', 'Value' ]?

    ReplyDelete
    Replies
    1. 1) Abstract Syntax Tree.

      2) At the moment there's no better way but I'm open to suggestions.

      Delete
    2. Regarding (2), if instead of string constants you were using records-with-methods for the fields couldn't you use operator overloading and write those expressions directly? The cost would be some initial setup mapping the objects to their corresponding fields.

      Delete
  13. gabr, I love it. Sadly, in the Delphi community, many of the remaining developers prefer the hard way to do things. :-( You mention that you can iteratively form the query and that's really the key advantage here.

    A library for another language called SQLAlchemy has both an ORM and programmatically generated SQL (from which the ORM is built). This is from its tutorial:

    >We’ve now gained the ability to construct very sophisticated statements. We can use all kinds of
    >operators, table constructs, text, joins, and aliases. The point of all of this, as mentioned earlier, is
    >not that it’s an “easier” or “better” way to write SQL than just writing a SQL statement yourself; the
    >point is that it’s better for writing programmatically generated SQL which can be morphed and
    >adapted as needed in automated scenarios.
    >
    >To support this, the select() construct we’ve been working with supports piecemeal construction, in
    >addition to the “all at once” method we’ve been doing. Suppose you’re writing a search function,
    >which receives criterion and then must construct a select from it. To accomplish this, upon each
    >criterion encountered, you apply “generative” criterion to an existing select() construct with new
    >elements, one at a time....
    >The generative approach is about starting small, adding one thing at a time, to arrive with a full
    >statement.

    All in all, great job, as a library like this was on my (long) list of enterprise-class functionality missing from the Delphi ecosystem.

    ReplyDelete
  14. Hello Primoz,

    I'm trying to create a generic "Upsert" query.


    Insert Into MyTable(KeyCol, Data)
    Select 'key', 'Data'
    From ( Select 1 As Value ) As Z
    Where Not Exists (
    Select 1
    From MyTable As T1
    Where T1.KeyCol = 'key'
    );

    My problem is that I don't know how to build such queries with gpSQLBuilder.

    ReplyDelete
    Replies
    1. I added few small enhancements so now you can write:

      query :=
      SQL.Concat([
      CreateGpSQLBuilder
      .Insert
      .Into('MyTable')
      .Column('KeyCol')
      .Column('Data'),
      CreateGpSQLBuilder
      .Select
      .Column('key')
      .Column('Data')
      .From(CreateGpSQLBuilder.Select('1').&As('Value'))
      .&As('Z')
      .Where(SQL.Not(SQL.Exists(
      CreateGpSQLBuilder
      .Select('1')
      .From('MyTable').&As('T1')
      .Where(['T1.KeyCol', '=', SQL.Q('key')]))))
      ]);

      Delete
    2. re: ".Where(['T1.KeyCol', '=', SQL.Q('key')])"

      Can T1. here be not a magic constant but a property of the builder, it would pick from .From or from .&As ?

      Delete
    3. I don't see how that could be done.

      Delete