
Thursday, May 11, 2017

Distributing application specific databases with Sparkl


The challenge


Did I say that Sparkl is amazing? Probably a few times :)

When we're developing operational applications, utilities, or even dashboards, there's a common requirement that often comes into play: the ability to have a storage engine available.

Sparkl is itself a plugin that runs on the Pentaho BA server (check this older blog post for the Sparkl fundamentals). The BA server needs to be connected to a database, and is usually connected to several.

However, this poses a problem if we're distributing our Sparkl app, as we would need to know the database configuration in advance so we can connect to it. On top of that, we'd need to initialize the database, create tables, and so on.

Francesco Corti took a great approach to this on the AAAR plugin, a Pentaho-Alfresco integration, where he built a configuration screen for all the necessary dependencies.

Embedding a database


But wouldn't it be cool if we could remove that dependency and somehow have a database that would just work (tm)?

Not only would it be cool, it actually works like a charm. We did exactly that when we built the SDR sample form that was part of the Pentaho 5.2 release.

What we did was use the H2 database in embedded mode; just by playing with the connection string, we can create a database on the fly that's persisted to a file. Obviously this is hardly something that should be used to store huge amounts of data. I'm giving you the tool; use it wisely.

So, how does it work? The trick, like I mentioned, is in the connection string. All we need to do is create a connection in PDI with the following connection string:

jdbc:h2:${cpk.solution.system.dir}/${cnp_db};DB_CLOSE_DELAY=-1;AUTO_SERVER=TRUE;INIT=RUNSCRIPT FROM ${cpk.plugin.dir}/resources/ddl/cnp_db.ddl
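The pattern behind that string, a file-backed embedded database that is created on first connect and initialized from a script, can be sketched with Python's built-in sqlite3 module. This is only an analogy for what H2's INIT=RUNSCRIPT does, not the actual H2/PDI setup; the function name and paths are made up:

```python
import os
import sqlite3

def connect_embedded(db_path, init_script):
    """Open (and create, if needed) a file-backed embedded database,
    then run an idempotent init script, roughly what H2 does with
    INIT=RUNSCRIPT FROM ... in the connection string."""
    first_time = not os.path.exists(db_path)
    conn = sqlite3.connect(db_path)       # the file is created on first connect
    with open(init_script) as f:
        conn.executescript(f.read())      # safe to re-run if the DDL uses IF NOT EXISTS
    return conn, first_time
```

Every subsequent connect reuses the same file and re-runs the (idempotent) init script, so the caller never has to know whether the database already existed.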

This uses some of the CPK magic variables. CPK, which stands for Community Plugin Kickstarter, is the library that powers Sparkl. At runtime it passes in some variables with system environment information, which will be very useful for this connection. Using this connection string we can define a new database connection in Spoon:

New connection in PDI


The available variables can be seen in the transformation properties of the Sparkl endpoints (when the UI is used to create them).

CPK Variables

Here you see the available variables. By default they are "commented" out (a Sparkl convention); to use them, all you need to do is un-comment them. Their meaning is described in the description field. Obviously, the standard parameters can also be used, and here you can see that I defined our database name as _cnp_db. There's another Sparkl convention here: this parameter, which holds the database name, starts with an underscore, which means it's internal only and can't be passed in from endpoint calls.

Initializing the database


We're building applications, utilities, and tools; while it's cool to have access to a database connection, it's kind of pointless to have an empty database. That's what the following instruction in the connection string's INIT section is for:

INIT=RUNSCRIPT FROM ${cpk.plugin.dir}/resources/ddl/cnp_db.ddl

I literally have in my Sparkl app a cnp_db.ddl file with the initialization script to be run at startup. And it's simply a set of create X if not exists instructions. My script currently reads:

create table if not exists notifications (
    id INT PRIMARY KEY AUTO_INCREMENT,
    eventtype VARCHAR(64) NOT NULL,
    author VARCHAR(1024) NOT NULL,
    rcpt VARCHAR,
    title VARCHAR(2048),
    message VARCHAR,
    style VARCHAR(64) NOT NULL,
    link VARCHAR
    );
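Because every statement is a create ... if not exists, the script is idempotent and safe to run on every connect. A small sketch of that behavior, using Python's sqlite3 instead of H2 and a cut-down, hypothetical version of the table above:

```python
import sqlite3

DDL = """
create table if not exists notifications (
    id INTEGER PRIMARY KEY,
    eventtype VARCHAR(64) NOT NULL,
    message VARCHAR
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executescript(DDL)  # second run is a no-op thanks to IF NOT EXISTS
conn.execute("insert into notifications (eventtype, message) values (?, ?)",
             ("info", "hello"))
rows = conn.execute("select eventtype, message from notifications").fetchall()
print(rows)  # [('info', 'hello')]
```

If the DDL used a plain create table instead, the second run would fail with a "table already exists" error, which is exactly why the if not exists form matters for an init script that runs on every startup.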

You can complement this with whatever SQL instructions are needed to make sure the system is properly initialized.

Developing the KTR from within spoon

 

As you won't be surprised to hear, those variables are only populated when the transformation or job is executed from the BA server. It's CPK / Sparkl that's responsible for injecting them; Spoon has no idea what CPK is. But we still need to develop and test the transformation in our environment.

So what we'll do is simply define in kettle.properties the values that would otherwise be injected by CPK. Here's the snippet from my kettle.properties (I'm working on a plugin called cnp, but more on that later):

cpk.plugin.id = cnp
cpk.plugin.dir = /home/pedro/tex/pentaho/project-sugar/solution/system/cnp
cpk.plugin.system.dir = /home/pedro/tex/pentaho/project-sugar/solution/system/cnp/system
cpk.solution.system.dir = /home/pedro/tex/pentaho/project-sugar/solution/system
cpk.webapp.dir = /home/pedro/tex/pentaho/target-dist/server/webapps/pentaho/
cpk.session.username = admin
cpk.session.roles = Administrator,Authenticated
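Kettle's ${...} substitution is what makes the same transformation work in both environments: in Spoon the values come from kettle.properties, on the server they come from CPK. A tiny sketch of that substitution in Python, assuming a properties dict mirroring the snippet above (the expand function is made up for illustration, not a Kettle API):

```python
import re

props = {
    "cpk.plugin.dir": "/home/pedro/tex/pentaho/project-sugar/solution/system/cnp",
    "cpk.solution.system.dir": "/home/pedro/tex/pentaho/project-sugar/solution/system",
}

def expand(text, props):
    """Replace ${name} placeholders, leaving unknown names untouched,
    roughly the way Kettle resolves variables in a connection string."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  text)

print(expand("${cpk.plugin.dir}/resources/ddl/cnp_db.ddl", props))
# /home/pedro/tex/pentaho/project-sugar/solution/system/cnp/resources/ddl/cnp_db.ddl
```

Whichever side supplies the values, the transformation itself never changes, which is the whole point of the trick.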

Now we can work as we usually do, and the behaviour will be the same whether we're running from within Spoon or from the server.

Exploring the database contents


Both during development and for some extra debugging, we need access to the database contents. PDI offers some database exploration abilities, but I don't think that's a replacement for a proper SQL client.

But this is exactly why the AUTO_SERVER=TRUE flag is in the connection string: it allows other clients to connect to the same database at the same time. I've been a SQuirreL user for a long time, and to use it all I needed to do was use the same connection string.

Configuring SQuirreL to use our database

The only care we need to take is to use the expanded path, as SQuirreL has no idea what Kettle variables are. So my connection string is:

jdbc:h2:file:/home/pedro/tex/pentaho/project-sugar/solution/system/.cnp_dbx;DB_CLOSE_DELAY=-1;AUTO_SERVER=TRUE;INIT=RUNSCRIPT FROM /home/pedro/tex/pentaho/project-sugar/solution/system/cnp/resources/ddl/cnp_db.ddl

This allows me to use this database as if it were a regular server, from within Spoon, the BA server, or SQuirreL.

Putting it all together

 

In the end I was able to achieve what I wanted - a plugin that includes a storage layer and... just works.

Plugin using H2 embedded database


Cheers!


-pedro
