Example MySQL to Gmail script

This is a usage example for the MySQL to HTML table combined with the BI mailer. We will make use of the two previous scripts to easily create a scheduled email sending from a MySQL database through a Gmail address with minimal effort.

The idea behind it was to be able to create a new email with only the base building blocks: the MySQL query for data source, subject/recipients as envelope/destination.

The advantage of this combination is the necessity of only 3(!) lines of code for a new email, allowing you to leverage email as an easy and flexible distribution channel for BI data.

In the example below we also add a greeting along with a relevant subject line, which raises our grand total to 4 easily readable lines of code to send an email from MySQL to Gmail. The connection string I used for the query is an open MySQL database with DNA info (this should go in your sql_to_html file).

The email contains a random sample of DNA shortened to 64 characters length. It takes a MySQL query and sends an HTML table via Gmail.

Result:
MySQL to Gmail result

While this example is Gmail specific, it can be easily changed to use any email service. Keep in mind that even when using Gmail you should be able to reach any other email, and if you choose to avoid Gmail altogether, you can just change the bimailer.py to use any other email server.

MySQL query to HTML table with Python

For easy email reporting, I like to send plain HTML tables. An easy implementation is to take the query output and HTML-ize it.

The script below does just that – it connects to MySQL, sends a query, and turns the result into an HTML table. This table can further be used to be sent with the bimailer, or in any other way.

The MySQL connector for Python can be found on the MySQL official website here.

Scheduled emails with Python for BI

As part of reporting in business intelligence, scheduled emails are a very useful tool. They allow you to send various daily summaries, or attach larger reports.

For these purposes I put together a Python BI-mailer class to allow for easy management of all things you might normally require to send, such as text, summary/KPI html tables, embedded images or charts, or attach larger reports.

I will put the code here, and in the next articles I will give examples of how to use this easy Python emailer to accomplish daily reporting chores.

With this script you are able to:
send an email
optionally with text or multiple tables in body
optionally with attachments
optionally use the attachments as embedded images.

 

MySQL Federated database creation

MySQL does not support entire federated databases, but it does support federated tables  – as long as the option is enabled in my.ini or my.cnf.

To create an entire federated DB, in a non manual, and easy to maintain way (refresh table definition if source table structure changes), we can use the following procedure.

First, the caveats:

1. Federated tables do not use indices in the usual way, so you may need to create a table locally and index before expecting any performance.

2. When using a client such as Heidi, that does show table status, the connection between your DB and the source DB of the fed table remains open. An issue can arise if the number of tables is significant, while max connections on source db is low. To get around this, periodically use flush tables on the destination DB to close the connections. There are alternatives to this, but this is the simplest workaround (may not be the best).

Personally, to get around this issue I scheduled a MySQL event that flushes every federated table (not FLUSH TABLES, but FLUSH TABLE TABLENAME;FLUSH TABLE TABLENAME2), to avoid blocking other queries. FLUSH TABLES flushes all tables, so it has to wait for other queries to finish, and any new queries have to wait for it to finish, causing no possibility of concurrent queries.

1. Get a nodata mysqldump in the same way as you get a normal dump, but with -d switch for no data

 

2. Convert the engine part of the tables into federated+connection (here done with Python)

 

3. Load up your new dump

 

You can run this periodically or on demand to refresh the table definitions if the source definitions change

Stream gzip with Ruby

Here is a quick overview on how to stream gzip with Ruby: For the purpose of an OS-agnostic script to gzip a SQL dump, I needed a Ruby script. Unfortunately, for large files Ruby would return an error if the file size is too large to fit in the RAM. The error looks as below:

So I decided to do a sort of stream, combining some basic examples online. As I was not able to find something ready made, I combined file streaming with gzipping.

The script below does a SQL dump with date, gzips it in chunks with Ruby, and deletes the non-zipped file. (It is written for a Windows server, so change the system commands if moving to Linux.)

Additionally I wanted to check performance, so I also added timing.

Stream gzip with Ruby

There you go, an easy way to stream gzip with Ruby. Enjoy!

Normalizing codes, mySQL extracting alphanumeric functions

Wanted to share an sql function I end up using quite frequently when dealing with user input data – normalization to alphanumeric. It can also be modified into a function to extract numeric values. If you do modify it to that, make sure you keep returns varchar for the cases of extracting numeric values beginning with 0. You can also add other characters, such as the minus sign, if required.

Example use cases:

Matching producer codes:

Normalizing/matching  phone numbers:

 

function:

 

KPI emails with google charts url parameters

I recently created a job that calculates some KPIs and writes a txt file that gets picked up as an email body for daily delivery. It was a few lines of text, with the date and kpi value, like the example below

2014-01-16  New Customers: 7    Sold_Items: 15
2014-01-15   New Customers: 6    Sold_Items: 12
2014-01-14   New Customers: 5    Sold_Items: 6
2014-01-13   New Customers: 4    Sold_Items: 8
2014-01-12   New Customers: 3     Sold_Items: 8
2014-01-11   New Customers: 2     Sold_Items: 4
2014-01-10   New Customers: 1     Sold_Items: 3

It was the easiest approach to just concatenate date, customers, items coming from my data source with some strings, but of course it was hard to read. There were several such blocks of text in it, so while the data was there, it was hard to see a trend.

the sql for it went something like:

SELECT concat_ws(' ', date, ' New Customers:', users, ' Sold_Items:', items)
from kpi_table
ORDER BY date DESC
LIMIT 7
INTO OUTFILE 'C:\\folder\\email_kpi.txt' FIELDS TERMINATED BY ';';

Easy, but by far not the best way to deliver an email report.

To properly display this data with the idea in mind that the recipient will have to read and understand it, it must be represented differently.

Best ways are a visual aid, along with a structured format to allow the user to work with the data, so I decided for a chart and a table. Luckily, google charts offers a simple way that allows you to put parameters in an url, which will then return the chart. So I had a go at it, and came up with below.
Note the url is not secure; on the other ahnd, this is not a problem, as you can only access the specific url with the data you put in it. In other words, if you don’t have the data in the first place, you can’t access it. It’s not dynamic, so there’s no possibility of future leaks.

 

Google Image chart for emailing, with parameters in url
Google Image chart for emailing, with parameters in url

As for the table, a simple html table will do – it allows you to copy paste straight into spreadsheets and keep the formatting, and the simpler the better. Table formatting may vary, it will normally look pretty plain, but some email clients support inline css, so if you have the time, feel free to experiment.

date New Customers Sold Items
2014-01-26 15 30
2014-01-25 20 40
2014-01-24 49 80
2014-01-23 45 73
2014-01-22 22 40
2014-01-21 21 50
2014-01-20 18 33

What about the code to create this every day?

I use plain mysql that generates the two lines as text

 

 

for the table:

for the chart:

For the google image chart it’s a bit more work to get the vertical limit and the axes to update according to your maximum values, but otherwise pretty straightforward. The easiest way to figure out what you need is in the google playground in the link below. Alternatively you can just change the parameters in the url of the google chart above

https://developers.google.com/chart/image/docs/chart_playground?

My parameters were as follows:

cht=lc
chs=650x300
chd=t:30,40,80,73,40,50,33
15,20,49,45,22,21,18
chds=0,100
chxt=x,y
chxs=0,022222,11,0,lt
1,011111,10,1,lt
chdl=Items_Sold
New_Customers
chxl=0:
2014-01-20
2014-01-21
2014-01-22
2014-01-23
2014-01-24
2014-01-25
2014-01-26
1:
0
20
40
60
80
100
chco=044444,099999
chg=16.67,20
chm=N,004040,0,-1,11
N,005050,1,-1,11
chtt=Daily_Sales

You end up with just one line for each, perfect for shooting into a html email. Easy to read, looks good.

Easy html email
Easy html email

and the final result:

google charts email with html table
google charts email with html table

Stay tuned for more!

 

Cleaning noisy data: Postgresql Mode and Modal Share aggregate function

Calculating mode and modal share on postgresql has never been easier than just running these functions.

Some background:
I needed to calculate the aggregate mode and the frequency of that mode to be able to draw some conclusions about the stability of some particularly noisy data. Sample standard deviation can be very misleading for measuring data noise, as it’s a mix of amplitude of variation and frequency.

Imagine measuring the best temperature to drink wine based on what people would report.

Some would say for a particular wine the temperature would be 10 celsius, while many others will say 11-12 degrees.  As we are talking about user input, someone may jokingly recommend drinking it at -20 Celsius or at 120 Celsius. The data cannot be trusted.

So let’s take the array recommended_temperature [8,10,10,12,11,10,10,12,10,11,-20,200,10,10]

Finding the right temperature to serve the wine at from this array can be tricky. Data is noisy and not to be trusted.

Total Numbers: 14

Mean (Average): 21.71429

Standard deviation:51.95624

Variance(Standard deviation):2699.45055

Do any of the above numbers even come close? No. You would not serve the wine at an average of 22 Celsius, and the standard deviation is high, but that doesn’t tell you if the values are spread, or if an outlier is messing everything up.

What about the mode? Mode in our case is 10 (the most common value)
Sounds reasonable, wine tastes nice at 10 degrees. But how sure are we that this is the right temperature?
We need more info, mainly modal share (number of values in the mode/all values)

For that, I put together in PostgreSQL a mode_count function, to allow you to find how many values you have in the mode, and possibly calculate a % of all values.
so now we have Mode:10;Modal_share 0.5 (7/14)
We can say with 50% certainty that the appropriate temperature to serve the wine at, based on user recommendations, is 10. Add in the high standard deviation, and you will easily detect the fact that there are strong outliers present in the data, giving you even more confidence towards your mode value.

The function from postgres manual for mode: 

and the custom mode count function:

They are aggregate functions, so call them as:

Hope you enjoy!

Installing a lightweight PHP dev server

I have written before about creating a live chart based on Google Charts API. This post is for those who are new at this, or non technical.

If you just wish to try out your PHP and use it locally, I recommend EasyPHP. It is a dev server, so you cannot publish your pages to the web, however I recommend you do not create a webserver from a personal PC anyway, unless you are willing to lose all your local data. You can get the devserver here.

Download and install it, and remember where you install it. If you are using windows, make sure that your windows user has the right to access and modify files at the installation path (Would recommend installing it outside of c://program files)

 

Changing the default port to avoid clashes with other software:

The default port for your new dev server will be port 80. A lot of other software (even skype) can use that port by default, so I would change it right away before any clashes occur. In the install folder, under binaries/conf_files folder  you will find a file called httpd.conf . Open the file in a text editor such as notepad, and look for the text  ‘Listen 127.0.0.1:80’ and ‘ServerName 127.0.0.1:80’ . Here we can see the port  that we have to change (:80) at the end of the IP. Personally, I like to start from 8079 and go down, as some software has 8080 as default. Change the port to 8079, so now the ip will look like 127.0.0.1:8079 .

‘Uploading’ files:

Every web server has a root folder for its files, in the case of EasyPhp it will be installfolder\data\localweb . You can create folders there to place your PHP files, and then access them through your browser. For example, I have a ‘charts’ folder, with the file chart.php inside.

I would access by typing 127.0.0.1:8079/charts/chart.php in my address bar. If you want to troubleshoot your file, you can view the source by accessing ‘view-source:127.0.0.1:8079/charts/chart.php’

EasyPHP
Accessing your PHP file

If you want to make this content accessible online, I bid you not to do so from your local computer. But if you really need to, you can install XAMPP web server. Easyphp is for local access only. I only use it to create files and test them locally, and once they are done I move them to my hosted web server. A host is cheap, so don’t put your personal computer at risk. Besides, most home internet service providers will give you a very limited bandwidth for uploading, so when somebody tries to access your locally hosted webpage, it will load incredibly slow.

Have fun with your PHP files!

Google Charts API – from your SQL database to a live chart with no coding skills

While data will most often be used as a tool to assist in making decisions, it sometimes also serves as a motivator. Live dashboards can allow your client to react quickly to any changes in their KPIs, or can assist a sales team in seeing how close they are to their daily targets.

Stick them in fullscreen mode on a wall display or a TV, and they will be there to offer this transparency non stop.

There are quite a few pieces of software out there that allow you to create live, automatically updating, automatically refreshing dashboards, but most of them cost more than it’s worth, and some are quite glitchy. I will not go into these tools now, but will rather offer a very fast and easy alternative.

When looking for this, my first requirement was that the refresh is easy to accomplish and smooth. HTML offers that, so I wanted something that can be wrapped in HTML.

HTML also offers the advantage of being flexible in regards to where you put it: as standalone on a dedicated web page, or embedded in an admin frontend or dashboard panel.

Google has a nice free Javascript API for creating charts, compatible with data pulled from almost any source. You can find more details about the look and feel of it here.

My final result was this, for a daily chart updated every 3 minutes:

Daily Updating Chart
Daily Updating Chart

 

Google does not provide any connectors for MySql or other common Sql, and most often that’s where live data will be.

I tried looking around for some resources that will pull the sql data into the chart, but my search was not very fruitful. Google API wants the data in a very specific format, so I took the liberty of writing a little PHP to do the job.

If you are not interested in how this works, just scroll all the way down and copy paste the full code into Notepad or TextEdit and save it as PHP. If you want to edit the code, I use Notepad++ as it highlights the operators.

1. First part: connecting to the database

 

2. Second part: transforming the data in the right format. We do this with an echo.

 

 

3. Final part: creating the final PHP with the Google API. You can find all the documentation for Google Charts API here

Note the refresh is set to 180 seconds with meta http-equiv="refresh" content="180" The php is run, query pulls data, and only after that is successful, the chart refreshes, so the update will be very smooth to the eye.

 

4.  Running the PHP file

To run the PHP you need a dev server, a web server, or someone who can upload it somewhere for you. I will cover how to install a lightweight dev server in a separate post. I recommend EasyPHP dev server if this is all you will ever do with it, alternatively XAMPP is a better option allowing you to publish to web as well. Guide for installing and configuring easyphp here.

 

5. Result:

Month_example

 

So I hope you enjoyed this guide, and that you will find this useful. One final note: Google API has all kinds of charts, and it’s easy to switch between types. You can change this chart to a column chart by changing just one word in the html. Just give it a shot 🙂