Querying Google Analytics with Ruby

December 22nd, 2011

A client asked us to show Google Analytics data on one of our sites, so that they would not have to leave our site to get that data. I decided that the best way to do this is to import the requested Google Analytics metrics daily at 3:00 a.m. This way, we query Google Analytics only once per day (rather than on every page request). There is a quota policy on querying Google Analytics data, and this ensures that we don’t come close to filling that quota.

The example below borrows from the Google Analytics curl tester.

There are a few Ruby libraries for accomplishing this task, but my needs were so simple that I preferred to retrieve the XML from Google Analytics and get the metrics from XPath queries.

Google Analytics metrics, request and response, and XML are well-documented.

require 'net/http'
require 'uri'
require 'rexml/document'

http = Net::HTTP.new('www.google.com',443)
http.use_ssl = true

# in order to query Google Analytics, you have to log into Google and acquire an authorization token
path = '/accounts/ClientLogin'
data = {'Email' => 'YOUR_EMAIL',
'Passwd' => 'YOUR_PASSWORD',
'accountType' => 'GOOGLE',
'source' => 'GA-curl-tester',
'service' => 'analytics'}

resp,data = http.post(path,data.to_query)
auth_token = resp.body.split("\n").detect{|x|x=~/^Auth=/}.sub(/^Auth=/,'')

# now use the authorization token get the GA data
data = {'ids' => 'ga:YOUR_ACCOUNT_ID',
'metrics' => 'ga:visitors,ga:pageviews',
'start-date' => '2011-01-01',
'end-date' => Date.yesterday.strftime('%Y-%m-%d')}

path = "/analytics/feeds/data/?#{data.to_query}"
h = {'Authorization' => "GoogleLogin auth=#{auth_token}"}
resp,data = http.get(path,h)
ga_xml_string = resp.body
doc = REXML::Document.new(ga_xml_string)

# you can store these variables in a database or handle them in whatever manner you see fit
ga_visitors=REXML::XPath.first(doc,"//dxp:aggregates/dxp:metric[@name='ga:visitors']").attributes['value']
ga_pageviews=REXML::XPath.first(doc,"//dxp:aggregates/dxp:metric[@name='ga:pageviews']").attributes['value']

Bash script: delta (shows difference between lines of input)

November 17th, 2011

Today, I wanted to take the set:
10
13
15
19

and output the difference between each number and the previous (i.e 13-10, 15-13, 19-15):
3
2
4

What I started with (and didn’t want):

mysql –password=1234 -D stuff -Nse “select myField from myTable”
70636127
70651486
70668927
70678701

What I do want:

mysql –password=1234 -D stuff -Nse “select myField from myTable” | ./delta
15359
17441
9774

I wrote a Bash script (a filter) to do that:

#!/bin/bash
while read line; do
  if [ "$prev" != "" ]; then
    echo $(($line-$prev))
  fi
  prev=$line
done

I did it this way because it’s much cleaner to do it through a filter than it is to do it through the SQL statement. This keeps the SQL statement clean and easy to tweak, and this filter is useful for other purposes such as reading from an input file:

delta < numbers.txt
#this works, too
cat numbers.txt | delta

Send email with multiple attachments from command line

June 27th, 2011

I needed to email a few files to myself from a Linux box. I did not want to use the typical command, which results in two emails:

uuencode file1 | mail -s "your first file" me@mydomain.com
uuencode file2 | mail -s "your second file" me@mydomain.com


I wanted to send them in one email. I tried what’s below, which did not work, because each uuencode command wrote to stdout separately. I ended up with an email only containing second_file.

uuencode file1 first_file && uuencode file2 second_file | mail -s "a bunch of files" me@mydomain.com


Next I tried what’s below, which works, but is just too much code. I prefer one-liners for this sort of thing.

uuencode file1 first_file >> uu.txt
uuencode file2 second_file >> uu.txt
cat uu.txt | mail -s "a bunch of files" me@mydomain.com
rm uu.txt


It turns out that attachments, in an email body, are just concatenated together, so you can chain together the uuencode commands to make an email with multiple attachments.

Notice the parentheses — they’re required, otherwise you’ll only get one file, the last file, as an attachment. The parentheses ensure all of the files are sent to stdout at once (rather than separate instances or processes), and they’re piped to mail as one blob (forgive me if I don’t have my Linux terms correct):

(uuencode file1 first_file && uuencode file2 second_file) | mail -s "a bunch of files" me@mydomain.com


Note that file1 is the name of the file on disk, and first_file is the name of the attachment as it appears in the email

Rotate an image and make it into a sprite

June 3rd, 2011

Today I needed to rotate an image several times and put each rotation it in a sprite. I used ImageMagick’s convert and montage commands. They’re both spectacular tools, and I’ve used less than 1% of their capabilities below. The printf command is naming each file with leading zeroes (e.g. 000.png, 015.png, 030.png, 045.png), and this ensures that they’re named in sequence. Comment out the rm command to see what actually happens.

Here is the image (a rather boring example, I admit):

#!/bin/bash
#make a temp folder
mkdir .tmp

#rotate example-arrow.png across the horizon in 15-degree increments with convert
for i in {0..180..15}
do
  convert -rotate $i example-arrow.png .tmp/$(printf %03d $i).png
  echo rotated $i degrees
done

#make a sprite (a vertical strip) containing all 13 images with montage
montage -tile 1x13 .tmp/*.png sprite.png

#delete the arrows that were shoved into the sprite
rm -rf ./.tmp

And here is the sprite.

Check disk space with threshold, email result, in Linux

May 26th, 2011

A few days ago, my /tmp partition filled up. Oops. It slowly filled up over the course of 11 months. Now, I have a script that emails me when any partition is more than 80% full. I can change the threshold easily.

It uses the df command to print disk usage information, uses sed to trip out the % symbol (not sure that’s necessary — awk seems to treat, say, 25 and 25% equally), then uses awk to output lines that are over the threshold, and then wc sees how many lines awk put out. If awk put out anything, then the threshold was crossed and the output of df will be sent in an email.

#!/bin/bash
threshold=80
if [ "$(df -h | sed 's/%//g' | awk 'NR>1{if($5 > '$threshold') print "WARNING"}' | wc -l)" -gt "0" ]
  then df -h | mail -s "Disk utilization on $(hostname) > $threshold%" me@mydomain.com
fi

Building a Password Regular Expression for Ruby on Rails

April 28th, 2011

Each subdomain on each of our sites has varying, configurable password settings. For example, client A might not care what passwords its users choose. Client B might want a combination of numbers, letters, and symbols with a total length between 8 and 20. I solved this problem by dynamically building the regular expression based on the options chosen by the client.

password_min = 8
password_max = 20
password_min_letters = 0
password_symbols = 2
password_min_numbers = 2

PasswordRegexStubs = {
  :base=>"^.*OPTIONS.*$",
  :length=>"(?=.{MIN_LENGTH,MAX_LENGTH})",
  :symbols=>"(?=.*([@\#\$\%^&\*+=][^@\#\$\%^&\*+=]*){MIN_SYMBOLS})",
  :numbers=>"(?=.*([0-9][^0-9]*){MIN_NUMBERS})",
  :letters=>"(?=.*([a-z,A-Z][^a-z,^A-Z]*){MIN_LETTERS})"
}

options = ""
options << PasswordRegexStubs[:length].gsub('MIN_LENGTH',password_min).gsub('MAX_LENGTH',password_max) if password_min > 0
options << PasswordRegexStubs[:symbols].gsub('MIN_SYMBOLS',password_min_symbols.to_s) if password_min_symbols > 0
options << PasswordRegexStubs[:numbers].gsub('MIN_NUMBERS',password_min_numbers.to_s) if password_min_numbers > 0
options << PasswordRegexStubs[:letters].gsub('MIN_LETTERS',password_min_letters.to_s) if password_min_letters > 0
regex = /#{PasswordRegexStubs[:base].gsub('OPTIONS',options)}/

Now you can use regex however you please…. In Ruby:

'asdf'=~regex
 =>nil

'a$sd3f'=~regex
=>nil

'a$s^d3f2'=~regex
=>0

Or use it in a template or partial for client-side use with Javascript:

  

Speeding up an avg() SQL statement

April 8th, 2011

I came across a slow SQL statement today. Here it is abbreviated. I ended up killing it after 300-something seconds. I want to show the average of my_integer for every day for the past year and a half. There are just over two million rows in the table. The stats table has an index on reported_on,my_integer.

This looks straightforward, right?

select avg(stats.my_integer) as 'Average Integer', reported_on
from stats
where reported_on between '2010-01-01' and '2011-04-08'
group by reported_on

(I killed this after 300 seconds)

This, although it doesn’t seem straightforward, is amazingly fast. Same circumstances apply, all other things are equal — the only index is by reported_on,my_integer. What I’m doing is summing my_integer in one join and counting the rows in another join. In other words, I am retrieving the pieces of the average separately, and I’m averaging them manually.

select
  count_by_day.reported_on `Reported On`,
  sum_by_day.sum/count_by_day.count `Average My Integer`
from (
  select reported_on,
  sum(stats.my_integer) as sum
  from stats
  where reported_on between '2010-01-01' and '2011-04-08'
  and my_integer > 0
  group by reported_on ) sum_by_day
left join (
    select reported_on,
    count(1) as count,
    from stats
    where reported_on between '2010-01-01' and '2011-04-08'
    group by reported_on ) count_by_day
on (count_by_day.reported_on = sum_by_day.reported_on)

462 rows in set (0.49 sec)

ActionMailer from, to, and bcc in Rails 3.0 vs Rails 2.0 and 1.0

March 2nd, 2011

In Rails 1.x and 2.x, if you needed to do something advanced with the recipients of an email you could do something like what’s below. I don’t have a straightforward example of why you would want to do this, but it’s something that I did in a delayed-email-delivery plug-in which worked in Rails 1.x and 2.x but needed a little fixing when I moved to Rails 3.x.

  mail = Mailer.create_foo_email
  mail.bcc "John Stanfield <noreply@johnstanfield.com>"
  from = mail.from_addrs.first
  puts "hello from #{from.name} #{from.address}"
  bccs=mail.bcc_addrs
  bccs.each do |bcc|
    puts "hello to #{bcc.name} #{bcc.address}"
  end

However, in Rails 3.x, TMail (and TMail::Address) is no longer used. Rails 3.x uses Mail (and Mail::Address) which behaves differently. To accomplish what’s above, you have to do this:

  mail = Mailer.create_foo_email
  mail.bcc "John Stanfield <noreply@johnstanfield.com>"
  from = Mail::Address.new(mail.header[:from].decoded).name
  puts "hello from #{from.name} #{from.address}"
  bccs = Mail::AddressList.new(mail.header[:bcc].decoded)
  bccs.addresses.each do |bcc|
    puts "hello to #{bcc.name} #{bcc.address}"
  end

Decrypting Data Encrypted in Java

January 17th, 2011

Recently a client had me do some data decryption. They use Java, I use Ruby.

Not being a Java developer, I can’t say for sure… But it looks like Java returns printable strings rather than unprintable garbage when encrypting (but maybe it was something the client was doing). Ruby prefers unadulterated garbage. It took a call to the client to figure out why I couldn’t decrypt their string using their supplied key. I tried base-64 decoding and a bunch of other stuff. Turns out I had to take their key and string and decode each set of two characters as the hex representation of the garbage.

In other words, “1234″ is not “1234″ but is instead 0×12 followed by 0×34.

This ended up working.

  # string and key are hex encoded -- 89f8 is 0x89 and 0xf8
  string=s2h("12345678")
  key=s2h("90abcdef")

  aes = OpenSSL::Cipher::Cipher.new('AES-128-ECB')
  aes.decrypt
  aes.key = key
  render :text=>aes.update(string)+aes.final

  # convert "1234" into 0x12 and 0x34
  # convert each of those to integer using base-16
  # return the character represented by that integer
  def s2h(s)
    hex=''
    (s.size/2).times do |t|
      h=s[(t*2)..((t*2)+1)]
      hex<<h.to_i(16).chr
    end
    hex
  end

Converting MySQL Output to CSV

January 5th, 2011

I needed an easy way to take MySQL output and turn it into a CSV file. Below is perhaps an acceptable example of what CSV output should look like (notice the escaped double-quotes in the nickname). I’m sure there are other gotchas to watch out for, but I’m working with data in a specific format that doesn’t include odd things like line breaks and/or funky characters.

"First Name","Last Name","Nickname"
"John","Stanfield","""the"" owner of this domain"

OK, so the first attempt (making MySQL write the file) was messy. For various reasons, MySQL will not write to an existing file, so the file must be a new file.

F=`mktempfile`
rm $F
mysql
  --password=xxx
  --user=yyy
  --database=zzz
  --execute="
    select first_name
      ,last_name
      ,replace(nick_name,'\"','\"\"') nick_name
    into outfile '$F'
    fields terminated by ','
      optionally enclosed by '\"'
      escaped by ''
    lines terminated by '\n'
    from my_table"
# now you can do whatever you want with $F

What I finally settled on was using mysql to execute the query, then have sed format it for me, then redirect the output to a file:

F=`mktempfile`
mysql
  --password=xxx
  --user=yyy
  --database=zzz
  --batch
  --skip-column-names
  --execute="
    select first_name
      ,last_name
      ,replace(nick_name,'\"','\"\"') nick_name
    from my_table"
 | sed -e 's/\t/","/g'
      -e 's/^/"/'
        -e 's/$/"/g' > $F
# now you can do whatever you want with $F

The sed command turns this:
field1<tab>field2<tab>field3
into this:
field1″,”field2″,”field3
into this:
“field1″,”field2″,”field3
into this:
“field1″,”field2″,”field3″