Final steps of LibreOffice database migration

It is my third post about implementing HSQL migration inside odb documents in connection with a TDF tender. The previous posts were about creating the schema import, and the import of binary data.

Today I will write about the final steps which has to be made in order to
toggle off experimental feature from Firebird and use HSQLDB migration by default. This process also involves the creation of some unit tests, and
modifying some others.

First, I’d like to mention that there were already some unit tests, which tested the HSQLDB driver. These tests can be easily reused, since the test cases use the sdbc interface which makes the underlying driver transparent. It means that these tests – after enabling database migration – will be useful not only for testing the sdbc driver, but also to test the migration.

Besides that, I added two CppUnitTest for testing the migration indirectly. One of them tests the sql parser which should get rid of HSQLDB specific parts of the statement. These test cases get SQL strings as an input, and assures that the output of the parser makes sense in the actual sdbc driver’s context.

There were some special cases to cover up. For example HSQLDB’s VARCHAR doesn’t necessarily need a parameter during the creation of the table. This would cause an error with Firebird, so in that case a default size parameter has to be added.

The other test suite aims the binary import by using an example odb file containing a HSQLDB database. The test asserts that the values in each cell of the example database are the same after the migration.

At this point, I had some trouble with the DATE, TIME and DATETIME values.
HSQLDB interprets an incoming input of one of these types as local time. That means if you insert 06.04.2018 00:00 into the database – either with direct sql or in the graphical table view in LibreOffice – it will check your timezone, and convert the time to GMT (in case of CACHED tables, as it is in case of an embedded HSQLDB document). If I created the database from Budapest with time zone GMT+2, the actual value stored will be the elapsed milliseconds since 1970 until 05.04.2018 22:00. This issue is tracked in Bugzilla here.

There is no problem with that when you read out your date with the same time zone. But If I want to send my odb file to a friend in China, all the values will be changed. An easier way to reproduce this is to run LO as follows:

DATE=$(TZ=Asia/Taipei date +”%F %H:%M”) libreoffice –base

This anomaly comes up when migrating the database too. Firebird (by default) doesn’t care about time zones, it is the user’s responsibility to interpret the values the correct way. It seems to me that this would be the correct behaviour in case of an embedded database, since these files can be moved easily.

In the current state, after migrating the database the time related columns will be slightly different. But this is intentional, since the migrated values are represented in UTC.

Next step was to flip migration to default. After that a bunch of unit tests
failed related to HSQLDB. After adding the firebird_sdbc component, there were still some failing tests, because they contained HSQLDB specific statements. They has to be simply replaced with Firebird specific stuff. For example there is no such thing as “drop table myTable if exists” in Firebird. You can use PSQL to check the existence of myTable instead.

The current state of flipping to default can be found on gerrit:

https://gerrit.libreoffice.org/52008/

Advertisements

DBMS migration in LibreOffice: HSQLDB binary import

Last time I wrote a blog post about the challenges of some Firebird bugs in LibreOffice, and the main concept of implementing the schema import of HyperSQL databases inside a Base document.

Since lately, I continued working on the tender published by TDF with Collabora. Next step was to create a way of importing table rows from a HSQL database. Base creates tables as CACHED tables – meaning in HSQL terminology that data is stored on disk, and pulled up to memory in demand – which implies that the data is stored in a binary file near the schema description.

To interpret the binary file, we get some help from the “script” file, which contains metadata for the database. For each table there is a statement which pairs a set of numbers to them (SET TABLE … INDEX … ). These numbers are file pointers to the root element of AVL trees. Each node of the tree stores a row.

In order to read a row from the table, it is necessary to know the ordered set of its column types, because the row does not contain this information. Fortunately, the column types are known from the schema import.

The next step is to read the data of each column in a row. The data is stored continuously in the file and each type has its own format. For example, the VARCHAR type is stored as follows: The first 4 bytes can be interpreted as an (unsigned) number, which tells us the size – number of unicode characters – of the string. The next array of bytes contains the string itself.

Finally, we have to adopt the data with the new DBMS. For that I use the sdbc layer’s PreparedStatement. By calling the appropriate setXXX method, and executing the statement, the data is finally migrated to the new database.

For those, who were enthusiastic enough to read this post so far, this is the related commit on gerrit:
https://gerrit.libreoffice.org/#/c/50536/

The job is not done yet. There are some type interpretations which are not implemented yet (e.g. Date/Time, Numeric/Decimal). Also I intend to write unit tests for the whole library, and I expect some upcoming problems while testing it with different databases.

DBMS migration in LibreOffice: Firebird and HSQLDB schema import

TDF has published tenders to improve LibreOffice in some areas, including its database management program, Base. The goal here is to implement the import of HSQLDB databases without Java Runtime Environment, in order to facilitate the change-over to
Firebird. In this post I would like to explain what steps are made at Collabora so far to achieve this goal.

Before dealing with the compatibility of the old HSQLDB-based odf files, an improvement had to be made on the Firebird driver. Within this context I implemented those types which could be created with the HSQLDB driver, but not with Firebird. Those Hyper SQL types are:

  • LONGVARCHAR, implemented as a BLOB holding text in Firebird context;
  • BINARY (fix) and VARBINARY. These types can be created as a CHAR/VARCHAR
    field with a special character set called OCTETS. This character set indicates
    that it stores binary format, and it is needed to indicate that no character
    set or collation makes sense here.
  • LONGVARBINARY. This type is responsible for holding images. As images can be
    really big, I implemented this as a subtype of BLOB.

Besides that, to prevent the changing of the DBMS back end being a setback, there were some bugs with high priority to fix. For example the impossibility to copy integer values from Calc to Base (tdf#70425) is fixed by allowing the driver to set numerical values even with the setString method, when it is called on a column with numerical type (int, smallint, etc.).

After that, in order to prepare the removal of the legacy java-based DBMS, it is needed to create a way to import the database schema and the binary data stored in older odf files without Java. Technically it means a new library in module “dbaccess”, which does all the import, and which is used in the dba (core) library when creating connection.

Splitting it to subtasks, I decided to implement the schema import first. The database schema is zipped into the odf file in a text file called “script”. This file consists of DDL statements in a strict format. The new library reads these SQL statements line-by-line, transforms it a little, and then executes them using the sdbc driver. A modification is needed in statements which contain column definitions (since the types typically differ in different systems), and there are some HSQLDB specific keywords too (e.g. CACHED tables).

The migration can be enabled using DBACCESS_HSQL_MIGRATION=1 when starting
LibreOffice. Currently, as the binary import is not ready yet, the result of the migration is some empty tables mostly.

Next step will be to implement the binary import and create unit tests for all.
the changes I have made.

Tmux to improve productivity

Sometimes I like to waste my free time with learning something new about the tools I use. If I have more time I like experimenting with tools I’ve never used before. This time I got to know an application called tmux and I want to show you why you should use it if you work using the terminal regularly.

Tmux is a terminal multiplexer which means that you can have several windows and panes in one screen. Mainly that’s it, but I’ll show you how useful it can be.

Before I used tmux, I always had several terminals running separately (several konsole applications to be more accurate, as I use KDE). There are several reasons why it is less effective than having only one screen.

Firstly and most importantly tmux will provide you a more stable layout of the terminal instances. During the work one thing that can improve your productivity is to make sure that you know exactly which terminal is responsible for the task you want to do next, and which command you should use to access it directly.

That is something you can try with several terminal screens too. You can say for example that you’ll have a terminal window on the left side for git, grepping stuff, system commands, etc. And one on the right side for editing source code. The problem is that OS windows are too flexible for that and you may end up using a lot of terminal screens which are in chaotic positions and you might search for the wanted screen with pushing Alt+Tab madly after you find the good one at last.

With tmux things getting much easier. The relative position of each panes are fixed (you can change it, but you probably won’t), and the windows can be named so it makes sure you won’t use it for something else. They can also be accessed easily, because the windows are identified with a number that can be used as a shortcut to get it in focus.

I’ll show you my current setup as an example:

  1. First window has two panes: one for gdb, one for looking around in code or in git history. This is a good combo, because most of the time I’d like to look at the code while debugging.
  2. The second window is for editing source code. There is only one pane where I run vim automatically on startup with LibreOffice root directory as current directory. I create panes in vim itself, because it allows me to copy-paste text between the buffers.
  3. Third is for building the code and to handle versions with git. I like separating this logic from poking around with git diff, git show, etc., because those commands come up in a different time for a different purpose. That’s why I don’t use all the git commands in the same window.
  4. It is for system commands only. As the windows have own history, it is a good way to easily access commonly used commands like “sudo pacman -S …” or alsamixer or whatever you tend to use.

That way (after some time of practice) I will always know I have to push Ctrl+a+2 to edit something, which is much faster than browsing in OS windows.

There are a lot of other benefits of using tmux. I only highlighted the one which was the most important for me.

LibreOffice conference 2017

This time LibreOffice conference was held in Rome

This year’s LibreOffice conference was held in Rome. I’d already been in a LibO conference last year, so this time I had a guess on what to expect. This time, I went as a software engineer of Collabora Productivity, which made the conference more interesting. I spent six days in Rome, in which there were three conference days, one day for Collabora meetup and team-building and I had two days for sightseeing too (the first and the last day).

About Rome

I arrived to Leonardo da Vinci International Airport (which is called simply Fiumicino by the Italians) on Monday afternoon. The airport is near to the sea, so I had to take a train to go to the city center. The ticket is quite expensive (14), but the fun fact is that if you don’t validate your ticket with the quite unnoticeable ticket validating machine, you have to pay an extra €50.

After I had arrived to the train station and I had taken over the accommodation I went sightseeing. The first thing I wanted to see was the Colosseum, which I found enormous and fascinating from the first glance. It was interesting to stand on a building which I’ve heard a lot about but I’ve never seen it with my own eyes. One thing which was a bit annoying was the countless litter thrown away at the foot of the building and actually all over the streets of Rome.

After that I wanted to visit the Forum Romanum. Then, I headed to the fountain of Trevi through piazza Venezia, the Church of the Gesú (which is often said to be the first church built in baroque style) and the Pantheon. I found an unreasonable number of tourists in piazza Trevi (piazza means square). I know that it’s famous, and I agree that it’s a beautiful fountain, but even so… I couldn’t get to the edge of the fountain because of the crowd.

dav

Tuesday

It was time to meet the colleges, not through the screen of my computer this time. We played airsoft, so I had finally the opportunity to shoot the unsuspicious colleges in the back. 🙂

After that, we had a pretty good lunch in a nice Italian restaurant, and had a team-meeting with brainstorming in the apartment.

Wednesday

First day of the conference. The conference building was in the hearth of Rome, in an unbelievably fancy museum, university or something in between.

I had a talk in the evening with the title “Quantum computing with VBA”. It is still a bit stressful for me to present for an international audience which is probably full of developers more experienced than me. However, the talk seemed to be welcomed with great enthusiasm and it was successful after all. I think it is an important thing to practice public speech, especially by software development, where sometimes the key of success is mostly about communication skills.

DWave-Quantum-approved

Thursday

In the second day the most interesting thing was the hack night after the conference. This event is about programming in groups, while you can eat sandwiches and drink beer. For me, it meant a great opportunity to see how others approach a LibreOffice-related problem, how they use their computer, and how they set up their environment. I learnt a bunch of useful stuff in just one night.

Friday

Last day of the conference. I listened to some fascinating talks, e.g. “Cursing Compilers” from Stephann Bergmann, and “Brute force clang plugins” from Noel Grandin (I’ve always loved thoughts about compilers, without any explanation for that). After that I had a little talk in the Lightning Talks session about JODConverter, which is a java based document format converter program that now can use Collabora Online for converting the documents. Finally, we got to know that the next conference will be in Tirana, Albania. My day ended with visiting the Castel Sant’Angelo and the Vatican.

dav

I had a really good time in Rome, and I am waiting for the next conference in Albania. There are a lot of bugs to fix until that of course.