Database migration in LibreOffice: bug fixes and more

In my 4th blog post about database migration I’ll explain most of the bugs or missing features which were discovered after publishing the migration work, and the solution for them. After that I’ll show some opportunities for further improvement. My previous post can be found here.

One of the most critical missing chunk was the migration of views and relationships. For that I had to improve the SQL parser to interpret statements like “create view..”. Some of the View definitions are still not working at the time of writing this post. The reason of this is that there are a bunch of database specific functions and keywords, like “concat” or “weekday”, which have to be handled specially for each database. For example Firebird’s dialect uses two vertical lines “||” instead of “concat”.

After that I fixed some bugs related to the migration of specific types, and I added a few types which I generously forgot to implement before. That’s how I fixed the migration of numeric and time/date columns. I also added the “REAL” type, which is something like “FLOAT”. In some databases REAL can store floating-point number of 4 bytes, while FLOAT is somewhat bigger. In case of a HyperSQL database, REAL, FLOAT and DOUBLE should be all mapped to the C++ double
type.

There were also some general bugs which I can’t classify to a category, but they are still interesting. One of my favorite bug was that the parser had not been able to parse multi-column primary keys. The “create table” statements had specified the primary key right after the column definition. The solution was a more general approach, where the primary key is defined after all the column definitions. Another bug which was fun to fix was the migration of default values for columns, which was entirely missing. For that I had to improve the parser to interpret the “DEFAULT ” part of the SQL statement.

And now I’d like to explain what could be done to make the whole migration library work better. There is an SQL parser implemented in LibreOffice called SQLParser in module connectivity. I decided not to use this parser in the migration library, because I had problems with it.

This SQL parser can parse the statements in a tree structure. Each element of a node can be reached by the method “getChild(int idx)”, where idx is the index of the element. The problem with this approach is that you can find method-calls all over the code with magic numbers. For example if you’d like to get the referenced column name of a node representing a foreign key definition, you may call getChild(7). The real pain comes when you realise that you want to modify the tree structure, because a DBMS figured out a new keyword somewhere in the middle. Then all the magic numbers after that node have to be incremented. Instead of this, a wrapper should be created around SQLParser, which translates the index to an associative name. This task could take a while.

With the above modification of SQLParser the parser could be improved and could be used to parse HSQLDB statements and compose other dialect’s statements. That would be a more general and maintainable solution.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s