As mentioned elsewhere, I’ve been setting up some infrastructure for a project I’m working on using Galera Cluster for MariaDB (see https://mariadb.com/kb/en/mariadb/what-is-mariadb-galera-cluster/ for more info on this), and also setting up a centralised log collection and analytics platform using the ELK stack from elastic.co.

The application that collects and processes the logs is called logstash, and it can process logs in a variety of formats, many of which are predefined, such as Linux syslog and Apache access or error logs. Where a predefined format isn’t available, there is a ‘scripting language’ so that you can write your own parsing. It quickly became apparent, though, that there is no existing logstash filter for processing MySQL and MariaDB / Galera logs.

Generally this would be fairly straightforward, but we’re using Galera Cluster for MariaDB (the well-known fork of the MySQL database), and the MySQL logs turn out to be in a strange format. I’ve spent some time recently putting together a definition which can now handle both standard MySQL and MariaDB logs, can also parse out the messages for the Galera cluster held in the same log file, and passes the data on for indexing in the ELK backend, Elasticsearch.

MySQL / MariaDB Log Format

Below you can see an extract from a sample logfile from MariaDB & Galera.
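(The original excerpt hasn’t survived in this copy of the post; the lines below are an illustrative reconstruction of a typical MariaDB/Galera error log, not the author’s actual file.)

```
151104  9:15:34 [Note] WSREP: Read nil XID from storage engines, skipping position init
151104  9:15:34 [Note] WSREP: wsrep_load(): loading provider library 'none'
151104 10:21:08 [Note] InnoDB: The InnoDB memory heap is disabled
151104 10:21:09 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt.
151104 10:21:10 [Warning] Checking table:   './mysql/user'
Version: '5.5.46-MariaDB-wsrep'  socket: '/var/lib/mysql/mysql.sock'  port: 3306
```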

As you can see, most but not all lines start with a timestamp – but the format of the timestamp changes depending on the time of day, with times before 10AM having only a single digit for the hour. Some of the log entries have a severity level in square brackets, such as [ERROR] or [Note], and some entries even span multiple lines.

All in all, not a straightforward format to deal with, but I’ve created the filter below which deals with it in logstash.

Logstash Filter

When the remote machines collect the log entries and send them to logstash, they specify the type of entries being sent. In this case, I’ve set the type to MySQL, and we check for this in line 3 of the filter.
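The filter listing itself hasn’t survived in this copy of the post, so the fragments that follow are a sketch reconstructed from the description (line numbers won’t match the original). The outer type check looks something like this:

```
filter {
  # Only apply this pipeline to events the shippers tagged as MySQL
  if [type] == "mysql" {
    # multiline, timestamp, severity and WSREP stages go here
  }
}
```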

Once we’ve identified we’re working with MySQL/MariaDB logs, the first step is to deal with the entries that span multiple lines (lines 4-8), checking whether each line begins with a timestamp and, if not, bundling it with the previous line.
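A sketch of that stage, using the Logstash 1.x multiline filter plugin (current releases do the same job with the multiline codec on the input); it sits inside the type conditional:

```
# Any line that does NOT start with a YYMMDD timestamp is a
# continuation, so fold it into the previous event
multiline {
  pattern => "^\d{6} +\d{1,2}:\d{2}:\d{2}"
  negate  => true
  what    => "previous"
}
```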

Lines 11-15 then parse out the timestamp and create a new field called mysql_time. Importantly, we must remember to process the message as a multi-line field, which is what the (?m) modifier passed to the grok pattern in line 12 does. We then remove the time and date from the log entry (stored in the message field) and update the message field with the remaining text.
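A sketch of that stage (not the author’s exact filter); the date and time capture names are my own choices, while mysql_time comes from the description above:

```
grok {
  # (?m) lets the pattern match across the folded multi-line message;
  # everything after the timestamp replaces the message field
  match     => { "message" => "(?m)^%{NUMBER:date} +%{NOTSPACE:time} %{GREEDYDATA:message}" }
  overwrite => [ "message" ]
  add_field => { "mysql_time" => "%{date} %{time}" }
}
```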

Next, in lines 18-23, we look to see if the entry contains a severity code delimited by square brackets and, if so, strip it out of the log entry.
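A sketch of the severity handling; the mysql_severity field name is my own, not necessarily the one the original filter used:

```
# If the entry starts with a [SEVERITY] tag, capture it and strip
# it from the message
if [message] =~ /^\[/ {
  grok {
    match     => { "message" => "(?m)^\[%{DATA:mysql_severity}\] %{GREEDYDATA:message}" }
    overwrite => [ "message" ]
  }
}
```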

Finally, we look to see if the remaining text in the log entry starts with WSREP: which signifies that the message is coming from the Galera cluster controller. If so, we change the log entry type from MySQL to Galera, before stripping out WSREP: from the message and passing the remaining data as the final log entry.
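That final step could look something like this (again a sketch following the description, not the original listing):

```
# WSREP: messages come from the Galera cluster controller, so retype
# the event and drop the prefix from the message
if [message] =~ /^WSREP:/ {
  mutate {
    replace => { "type" => "galera" }
    gsub    => [ "message", "^WSREP:\s*", "" ]
  }
}
```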

Hopefully, this should be useful to anyone needing to parse MySQL or MariaDB log entries for logstash, especially if the database is being used in conjunction with Galera.