Example MySQL to Gmail script

This is a usage example for the MySQL to HTML table combined with the BI mailer. We will make use of the two previous scripts to easily create a scheduled email sending from a MySQL database through a Gmail address with minimal effort.

The idea behind it was to be able to create a new email with only the base building blocks: the MySQL query for data source, subject/recipients as envelope/destination.

The advantage of this combination is the necessity of only 3(!) lines of code for a new email, allowing you to leverage email as an easy and flexible distribution channel for BI data.

In the example below we also add a greeting along with a relevant subject line, which raises our grand total to 4 easily readable lines of code to send an email from MySQL to Gmail. The connection string I used for the query is an open MySQL database with DNA info (this should go in your sql_to_html file).

The email contains a random sample of DNA shortened to 64 characters length. It takes a MySQL query and sends an HTML table via Gmail.

Result:
MySQL to Gmail result

While this example is Gmail specific, it can be easily changed to use any email service. Keep in mind that even when using Gmail you should be able to reach any other email, and if you choose to avoid Gmail altogether, you can just change the bimailer.py to use any other email server.

MySQL query to HTML table with Python

For easy email reporting, I like to send plain HTML tables. An easy implementation is to take the query output and HTML-ize it.

The script below does just that – it connects to MySQL, sends a query, and turns the result into an HTML table. This table can further be used to be sent with the bimailer, or in any other way.

The MySQL connector for Python can be found on the MySQL official website here.

Scheduled emails with Python for BI

As part of reporting in business intelligence, scheduled emails are a very useful tool. They allow you to send various daily summaries, or attach larger reports.

For these purposes I put together a Python BI-mailer class to allow for easy management of all things you might normally require to send, such as text, summary/KPI html tables, embedded images or charts, or attach larger reports.

I will put the code here, and in the next articles I will give examples of how to use this easy Python emailer to accomplish daily reporting chores.

With this script you are able to:
send an email
optionally with text or multiple tables in body
optionally with attachments
optionally use the attachments as embedded images.

 

MySQL Federated database creation

MySQL does not support entire federated databases, but it does support federated tables  – as long as the option is enabled in my.ini or my.cnf.

To create an entire federated DB, in a non manual, and easy to maintain way (refresh table definition if source table structure changes), we can use the following procedure.

First, the caveats:

1. Federated tables do not use indices in the usual way, so you may need to create a table locally and index before expecting any performance.

2. When using a client such as Heidi, that does show table status, the connection between your DB and the source DB of the fed table remains open. An issue can arise if the number of tables is significant, while max connections on source db is low. To get around this, periodically use flush tables on the destination DB to close the connections. There are alternatives to this, but this is the simplest workaround (may not be the best).

Personally, to get around this issue I scheduled a MySQL event that flushes every federated table (not FLUSH TABLES, but FLUSH TABLE TABLENAME;FLUSH TABLE TABLENAME2), to avoid blocking other queries. FLUSH TABLES flushes all tables, so it has to wait for other queries to finish, and any new queries have to wait for it to finish, causing no possibility of concurrent queries.

1. Get a nodata mysqldump in the same way as you get a normal dump, but with -d switch for no data

 

2. Convert the engine part of the tables into federated+connection (here done with Python)

 

3. Load up your new dump

 

You can run this periodically or on demand to refresh the table definitions if the source definitions change

MySQL Rank() function

UPDATE: On July 18 2018 the MySql server team announced the introduction of window functions. The article was written before this, offering an workaround to gain some similar functionality. I updated this article now with the latest and best practice way to do this correctly, but sill leave the old article for reference.

AS of Mysql 8, you can now use window functions such as  rownum() OVER(PARTITION BY country) . Interestingly, you can just reference your partition by an alias, and define the partition in a separate window clause section (like where clause), allowing reuse:

 

Happy window aggregating!

OLD ARTICLE FROM 2014 BELOW:

A quick and dirty custom MySQL Rank() function to emulate rank from PostgreSQL into mySQL. Same disclaimers as the custom rownum() function. In this case, I am using it to rank decimals, but feel free to change the input type.

 

to invoke, use rank(column to rank),such as below:

output:

MySQL rownum() row number function

UPDATE: On July 18 2018 the MySql server team announced the introduction of window functions. The article was written before this, offering an workaround to gain some similar functionality. I updated this article now with the latest and best practice way to do this correctly, but sill leave the old article for reference.

AS of Mysql 8, you can now use window functions such as  rownum() OVER(PARTITION BY country) . Interestingly, you can just reference your partition by an alias, and define the partition in a separate window clause section (like where clause), allowing reuse:

 

Happy window aggregating!

 

OLD ARTICLE FROM 2014 BELOW:

MySQL does not have a built in row number function, which makes it an ugly hack to get the row number –  an ugly hack that needs to be repeated over and over, every time you need it.

I have taken the liberty of writing a rownum() function, that is just as bad, but at least it will keep your code clean.

DISCLAIMER:

  1. does not work on older versions of mySQL
  2. It is based on using variables that only reset when time changes. If the function is invoked in 2 separate places in the same microsecond(1 millionth of a second), the output will not be as expected. The variable will not reset, the output will not start counting from 1.
  3. If you run the function more than once in the same query, it will increment every time it is run, so possibly multiple times per row.

For the purposes of BI, I find the associated risks, when used as intended, non existent.
For older versions of mySQL server, you can try finding another creative way to reset the counter for the row number.

 

Row number function

 

to invoke it, use rownum(), such as below

Good luck!

Get yesterday’s date in a windows batch file – correct version

Hi all,

If you use a windows server and get yesterday’s date for something like a download, you might probably use this popular script that can be found here and there. Unfortunately, the script is somewhat broken when dealing with getting the last day of the month. The logic is solid, but somewhere between the idea and implementation, something was lost.

see below the working, correct version:

 

 

Stream gzip with Ruby

Here is a quick overview on how to stream gzip with Ruby: For the purpose of an OS-agnostic script to gzip a SQL dump, I needed a Ruby script. Unfortunately, for large files Ruby would return an error if the file size is too large to fit in the RAM. The error looks as below:

So I decided to do a sort of stream, combining some basic examples online. As I was not able to find something ready made, I combined file streaming with gzipping.

The script below does a SQL dump with date, gzips it in chunks with Ruby, and deletes the non-zipped file. (It is written for a Windows server, so change the system commands if moving to Linux.)

Additionally I wanted to check performance, so I also added timing.

Stream gzip with Ruby

There you go, an easy way to stream gzip with Ruby. Enjoy!

Using Cost Revenue Ratio to measure Campaign Efficiency

The Cost Revenue Ratio (CRR), also known as efficiency ratio, is a way of comparing expenses to revenue. While the CRR is mainly used in banking businesses, it is also an interesting key performance indicator for performance marketers.

The calculation is very straightforward: Just divide the spendings of your campaign by revenue of that campaign. The result will be a percentage, telling you how much you have spent to generate one euro/dollar of revenue, meaning: the lower the percentage, the more efficient at creating revenue. If your result is larger than 100%, your costs are higher than the created revenue – optimization of the campaign is urgently needed!

CRR = Spend/Revenue
CRR formula

To use this metric for measuring your Google Adwords campaigns, make sure to assign a value to each conversion action. This will enable you to use the “Total conv. Value” column in your campaign overview, you will need it to calculate the CRR.

The result could look like in the following example:

extract campaign report

Here you see an extract from a campaign performance report with the Cost Revenue Ratio in the last column. Cost of the respective campaign gets divided by total conversion value. When not having any CRR benchmarks from your financial department, it might be a good idea to use the brand campaign as orientation. This campaign will most likely reach the best cost revenue ratio you can find in your account, so the rest of your campaigns should be somewhere between this and 100%, the closer to the brand campaign benchmark, the better.

Normalizing codes, mySQL extracting alphanumeric functions

Wanted to share an sql function I end up using quite frequently when dealing with user input data – normalization to alphanumeric. It can also be modified into a function to extract numeric values. If you do modify it to that, make sure you keep returns varchar for the cases of extracting numeric values beginning with 0. You can also add other characters, such as the minus sign, if required.

Example use cases:

Matching producer codes:

Normalizing/matching  phone numbers:

 

function: