VS code IDE integration for LibreOffice

Successfully submitting a very first patch to LibreOffice can be quite challenging. Not only because you have to deal with the build dependencies first, but because the project doesn’t come with a default/suggested development environment. Even though it can be annoying, it is actually a good thing, because you can use your favorite. In this blog post I want to show you how to use VS code for LibreOffice hacking:

      1. Build LibreOffice.. (make check)
      2. Create compile_commands.json with the following command:
        make vim-ide-integration
      3. Install VS code, but I guess you did it already.
      4. Install the C++ extension.
      5. File -> Preferences -> Settings -> Workspace Settings -> Extensions -> C/C++. Find a link to your settings.json file.
      6. Insert the following line:
"C_Cpp.default.compileCommands":"${workspaceFolder}/compile_commands.json"

The compile_commands.json file helps the IDE find all the include paths/dependencies. Luckily, VS code can use the same structure as the YouCompleteMe code-completion engine for vim. The above line in settings.json specifies the location of the file relative to the workspace folder.

After this the semantic code completion features (code completion, Goto Definition, …) should work. In order to get debug working, you can follow this description. For handy tricks you should take a look at this and this.

Disclaimer: I couldn’t configure the debugger which symbols to load. It doesn’t affect the code completion features, but for the debugging – unless you know how to restrict symbol loading (then tell me in a comment) – you’ll need more than 8 GB of RAM.

The new MySQL driver in LibreOffice

Base – the database editing program of LibreOffice – offers several features when it comes to external database connection. One interesting feature is that Base lets you connect to an external database not only with ODBC and JDBC but with native connectors too. Using the native connector instead of a standard like JDBC sometimes has a positive impact on performance.

Considering that MySQL is one of the most used database management systems worldwide it is clear that the support of connecting to a MySQL database is an unavoidable part of Base. Currently there are three ways supported to connect to a MySQL database: with ODBC, JDBC and using a native connector. However, the last option was only available as an extension, and it is the part of the core project only since November 2018.

Why was it only an extension at the first place? The native connection was implemented by using the C++ connector of MySQL, which is licensed under GPL. Because of that it could not be put into the core project. So what is the solution then?

Because of the above mentioned problem we, at Collabora decided to use the connector of MariaDB instead. MariaDB is a fork of MySQL which has the same database structure and indexes as MySQL. It has a C API which can be used to connect to MySQL and MariaDB databases too. What is more, the connector is LGPL licensed, which means that it can be used in the core project too. So the next task was to use this API to implement the sdbc driver.

Some parts of the implementation is pretty straightforward. It is pretty easy to create a connection and pass options through the function “mysql_options”. It was also clear to me how to pass things like user and password to the DBMS.

There were a few things though which were a bit more challenging. The XDatabaseMetadata interface for example – which is responsible for providing information about the current state and capabilities of the database – could not be implemented with only a few function calls. In order to implement them, I had to construct SQL queries for most of the methods. Database information are available in the schema called “INFORAMTION_SCHEMA”. That task took some time since the interface has more than a hundred methods which had to be implemented.

Another challenge was a bug about parallel execution of result sets. The C API does not support the usage of two result sets simultaneously, but the sdbc standard supports it. The solution was to store the result of a query and free the mysql resources afterwards. That way a new result set can be fetched with the C API while the previously fetched data is stored in memory.

Having the native MySQL connector in the core project has several advantages. First, it is easier for the user to install it. It is bundled with the core project, there is nothing to do there. Also, it improves maintainability, because it does not have to be maintained separately as an extension. For example the automatically triggered clang plugins do not run on extensions. It can easily happen With an extension that after a few releases it is not usable anymore, because of the lack of maintenance.

Besides that, in order to improve quality and maintainability I created a test suite to test the implementation of the sdbc driver of a MySQL database. This test suite does not run automatically with each build though. In order to test it, a running MySQL or MariaDB server is needed, so only manual execution of the test suite is supported. The test suite can be triggered with the same method as the other tests, but you have to declare an environment variable as well (CONNECTIVITY_TEST_MYSQL_DRIVER). The variable should store the URL of an available MySQL or MariaDB server. It should contain a user, a password, and a schema, which can be used to test the functionality of the driver.

Although a lot of things work now, there might be some bugs hidden. Feel free to test the driver if you like, file a bug and please CC me in Bugzilla if you found something interesting.

You can also take a look at my talk on FOSDEM about the MySQL connector and other improvements of Base.

Database migration in LibreOffice: bug fixes and more

In my 4th blog post about database migration I’ll explain most of the bugs or missing features which were discovered after publishing the migration work, and the solution for them. After that I’ll show some opportunities for further improvement. My previous post can be found here.

One of the most critical missing chunk was the migration of views and relationships. For that I had to improve the SQL parser to interpret statements like “create view..”. Some of the View definitions are still not working at the time of writing this post. The reason of this is that there are a bunch of database specific functions and keywords, like “concat” or “weekday”, which have to be handled specially for each database. For example Firebird’s dialect uses two vertical lines “||” instead of “concat”.

After that I fixed some bugs related to the migration of specific types, and I added a few types which I generously forgot to implement before. That’s how I fixed the migration of numeric and time/date columns. I also added the “REAL” type, which is something like “FLOAT”. In some databases REAL can store floating-point number of 4 bytes, while FLOAT is somewhat bigger. In case of a HyperSQL database, REAL, FLOAT and DOUBLE should be all mapped to the C++ double
type.

There were also some general bugs which I can’t classify to a category, but they are still interesting. One of my favorite bug was that the parser had not been able to parse multi-column primary keys. The “create table” statements had specified the primary key right after the column definition. The solution was a more general approach, where the primary key is defined after all the column definitions. Another bug which was fun to fix was the migration of default values for columns, which was entirely missing. For that I had to improve the parser to interpret the “DEFAULT ” part of the SQL statement.

And now I’d like to explain what could be done to make the whole migration library work better. There is an SQL parser implemented in LibreOffice called SQLParser in module connectivity. I decided not to use this parser in the migration library, because I had problems with it.

This SQL parser can parse the statements in a tree structure. Each element of a node can be reached by the method “getChild(int idx)”, where idx is the index of the element. The problem with this approach is that you can find method-calls all over the code with magic numbers. For example if you’d like to get the referenced column name of a node representing a foreign key definition, you may call getChild(7). The real pain comes when you realise that you want to modify the tree structure, because a DBMS figured out a new keyword somewhere in the middle. Then all the magic numbers after that node have to be incremented. Instead of this, a wrapper should be created around SQLParser, which translates the index to an associative name. This task could take a while.

With the above modification of SQLParser the parser could be improved and could be used to parse HSQLDB statements and compose other dialect’s statements. That would be a more general and maintainable solution.

Final steps of LibreOffice database migration

It is my third post about implementing HSQL migration inside odb documents in connection with a TDF tender. The previous posts were about creating the schema import, and the import of binary data.

Today I will write about the final steps which has to be made in order to
toggle off experimental feature from Firebird and use HSQLDB migration by default. This process also involves the creation of some unit tests, and
modifying some others.

First, I’d like to mention that there were already some unit tests, which tested the HSQLDB driver. These tests can be easily reused, since the test cases use the sdbc interface which makes the underlying driver transparent. It means that these tests – after enabling database migration – will be useful not only for testing the sdbc driver, but also to test the migration.

Besides that, I added two CppUnitTest for testing the migration indirectly. One of them tests the sql parser which should get rid of HSQLDB specific parts of the statement. These test cases get SQL strings as an input, and assures that the output of the parser makes sense in the actual sdbc driver’s context.

There were some special cases to cover up. For example HSQLDB’s VARCHAR doesn’t necessarily need a parameter during the creation of the table. This would cause an error with Firebird, so in that case a default size parameter has to be added.

The other test suite aims the binary import by using an example odb file containing a HSQLDB database. The test asserts that the values in each cell of the example database are the same after the migration.

At this point, I had some trouble with the DATE, TIME and DATETIME values.
HSQLDB interprets an incoming input of one of these types as local time. That means if you insert 06.04.2018 00:00 into the database – either with direct sql or in the graphical table view in LibreOffice – it will check your timezone, and convert the time to GMT (in case of CACHED tables, as it is in case of an embedded HSQLDB document). If I created the database from Budapest with time zone GMT+2, the actual value stored will be the elapsed milliseconds since 1970 until 05.04.2018 22:00. This issue is tracked in Bugzilla here.

There is no problem with that when you read out your date with the same time zone. But If I want to send my odb file to a friend in China, all the values will be changed. An easier way to reproduce this is to run LO as follows:

DATE=$(TZ=Asia/Taipei date +”%F %H:%M”) libreoffice –base

This anomaly comes up when migrating the database too. Firebird (by default) doesn’t care about time zones, it is the user’s responsibility to interpret the values the correct way. It seems to me that this would be the correct behaviour in case of an embedded database, since these files can be moved easily.

In the current state, after migrating the database the time related columns will be slightly different. But this is intentional, since the migrated values are represented in UTC.

Next step was to flip migration to default. After that a bunch of unit tests
failed related to HSQLDB. After adding the firebird_sdbc component, there were still some failing tests, because they contained HSQLDB specific statements. They has to be simply replaced with Firebird specific stuff. For example there is no such thing as “drop table myTable if exists” in Firebird. You can use PSQL to check the existence of myTable instead.

The current state of flipping to default can be found on gerrit:

https://gerrit.libreoffice.org/52008/

DBMS migration in LibreOffice: HSQLDB binary import

Last time I wrote a blog post about the challenges of some Firebird bugs in LibreOffice, and the main concept of implementing the schema import of HyperSQL databases inside a Base document.

Since lately, I continued working on the tender published by TDF with Collabora. Next step was to create a way of importing table rows from a HSQL database. Base creates tables as CACHED tables – meaning in HSQL terminology that data is stored on disk, and pulled up to memory in demand – which implies that the data is stored in a binary file near the schema description.

To interpret the binary file, we get some help from the “script” file, which contains metadata for the database. For each table there is a statement which pairs a set of numbers to them (SET TABLE … INDEX … ). These numbers are file pointers to the root element of AVL trees. Each node of the tree stores a row.

In order to read a row from the table, it is necessary to know the ordered set of its column types, because the row does not contain this information. Fortunately, the column types are known from the schema import.

The next step is to read the data of each column in a row. The data is stored continuously in the file and each type has its own format. For example, the VARCHAR type is stored as follows: The first 4 bytes can be interpreted as an (unsigned) number, which tells us the size – number of unicode characters – of the string. The next array of bytes contains the string itself.

Finally, we have to adopt the data with the new DBMS. For that I use the sdbc layer’s PreparedStatement. By calling the appropriate setXXX method, and executing the statement, the data is finally migrated to the new database.

For those, who were enthusiastic enough to read this post so far, this is the related commit on gerrit:
https://gerrit.libreoffice.org/#/c/50536/

The job is not done yet. There are some type interpretations which are not implemented yet (e.g. Date/Time, Numeric/Decimal). Also I intend to write unit tests for the whole library, and I expect some upcoming problems while testing it with different databases.

DBMS migration in LibreOffice: Firebird and HSQLDB schema import

TDF has published tenders to improve LibreOffice in some areas, including its database management program, Base. The goal here is to implement the import of HSQLDB databases without Java Runtime Environment, in order to facilitate the change-over to
Firebird. In this post I would like to explain what steps are made at Collabora so far to achieve this goal.

Before dealing with the compatibility of the old HSQLDB-based odf files, an improvement had to be made on the Firebird driver. Within this context I implemented those types which could be created with the HSQLDB driver, but not with Firebird. Those Hyper SQL types are:

  • LONGVARCHAR, implemented as a BLOB holding text in Firebird context;
  • BINARY (fix) and VARBINARY. These types can be created as a CHAR/VARCHAR
    field with a special character set called OCTETS. This character set indicates
    that it stores binary format, and it is needed to indicate that no character
    set or collation makes sense here.
  • LONGVARBINARY. This type is responsible for holding images. As images can be
    really big, I implemented this as a subtype of BLOB.

Besides that, to prevent the changing of the DBMS back end being a setback, there were some bugs with high priority to fix. For example the impossibility to copy integer values from Calc to Base (tdf#70425) is fixed by allowing the driver to set numerical values even with the setString method, when it is called on a column with numerical type (int, smallint, etc.).

After that, in order to prepare the removal of the legacy java-based DBMS, it is needed to create a way to import the database schema and the binary data stored in older odf files without Java. Technically it means a new library in module “dbaccess”, which does all the import, and which is used in the dba (core) library when creating connection.

Splitting it to subtasks, I decided to implement the schema import first. The database schema is zipped into the odf file in a text file called “script”. This file consists of DDL statements in a strict format. The new library reads these SQL statements line-by-line, transforms it a little, and then executes them using the sdbc driver. A modification is needed in statements which contain column definitions (since the types typically differ in different systems), and there are some HSQLDB specific keywords too (e.g. CACHED tables).

The migration can be enabled using DBACCESS_HSQL_MIGRATION=1 when starting
LibreOffice. Currently, as the binary import is not ready yet, the result of the migration is some empty tables mostly.

Next step will be to implement the binary import and create unit tests for all.
the changes I have made.

LibreOffice conference 2017

This time LibreOffice conference was held in Rome

This year’s LibreOffice conference was held in Rome. I’d already been in a LibO conference last year, so this time I had a guess on what to expect. This time, I went as a software engineer of Collabora Productivity, which made the conference more interesting. I spent six days in Rome, in which there were three conference days, one day for Collabora meetup and team-building and I had two days for sightseeing too (the first and the last day).

About Rome

I arrived to Leonardo da Vinci International Airport (which is called simply Fiumicino by the Italians) on Monday afternoon. The airport is near to the sea, so I had to take a train to go to the city center. The ticket is quite expensive (14), but the fun fact is that if you don’t validate your ticket with the quite unnoticeable ticket validating machine, you have to pay an extra €50.

After I had arrived to the train station and I had taken over the accommodation I went sightseeing. The first thing I wanted to see was the Colosseum, which I found enormous and fascinating from the first glance. It was interesting to stand on a building which I’ve heard a lot about but I’ve never seen it with my own eyes. One thing which was a bit annoying was the countless litter thrown away at the foot of the building and actually all over the streets of Rome.

After that I wanted to visit the Forum Romanum. Then, I headed to the fountain of Trevi through piazza Venezia, the Church of the Gesú (which is often said to be the first church built in baroque style) and the Pantheon. I found an unreasonable number of tourists in piazza Trevi (piazza means square). I know that it’s famous, and I agree that it’s a beautiful fountain, but even so… I couldn’t get to the edge of the fountain because of the crowd.

dav

Tuesday

It was time to meet the colleges, not through the screen of my computer this time. We played airsoft, so I had finally the opportunity to shoot the unsuspicious colleges in the back. 🙂

After that, we had a pretty good lunch in a nice Italian restaurant, and had a team-meeting with brainstorming in the apartment.

Wednesday

First day of the conference. The conference building was in the hearth of Rome, in an unbelievably fancy museum, university or something in between.

I had a talk in the evening with the title “Quantum computing with VBA”. It is still a bit stressful for me to present for an international audience which is probably full of developers more experienced than me. However, the talk seemed to be welcomed with great enthusiasm and it was successful after all. I think it is an important thing to practice public speech, especially by software development, where sometimes the key of success is mostly about communication skills.

DWave-Quantum-approved

Thursday

In the second day the most interesting thing was the hack night after the conference. This event is about programming in groups, while you can eat sandwiches and drink beer. For me, it meant a great opportunity to see how others approach a LibreOffice-related problem, how they use their computer, and how they set up their environment. I learnt a bunch of useful stuff in just one night.

Friday

Last day of the conference. I listened to some fascinating talks, e.g. “Cursing Compilers” from Stephann Bergmann, and “Brute force clang plugins” from Noel Grandin (I’ve always loved thoughts about compilers, without any explanation for that). After that I had a little talk in the Lightning Talks session about JODConverter, which is a java based document format converter program that now can use Collabora Online for converting the documents. Finally, we got to know that the next conference will be in Tirana, Albania. My day ended with visiting the Castel Sant’Angelo and the Vatican.

dav

I had a really good time in Rome, and I am waiting for the next conference in Albania. There are a lot of bugs to fix until that of course.