Google Analytics Opt Out? Not really.

Google just announced it is offering users the ability to opt out of Google Analytics tracking. But what does that mean for your privacy?

Here's the announcement, and here's the download link for the opt-out add-on.

Let me start with a full disclosure. In a past life I worked for a small-time analytics company, where I architected a web-statistics system from scratch: tracking JS, scaling servers, database analysis, the whole bit. In fact, Google only recently caught up to the combinatorics approach I created a long time ago, though they don't do it well. That said, this post is about privacy. I've always taken the tack that when you browse the web you're handing out a lot of information about your computer, web browser and browsing habits, and I'm OK with that. I'm aware of what I disclose and I'm happy to share it. However, with these Google announcements and recent developments in the EU, it's clear that some people have a real problem with Google collecting this much information.

Now. Opt-out. Is it really? The short answer is no. The long answer is: well, sort of.

The first thing to note is that the Google Analytics blocker does not stop your browser from talking to google-analytics.com. Period. What it does, under certain scenarios, is limit a couple of pieces of JS-collected information. It may also have internal effects at Google; however, none of that matters once the data reaches their servers in the first place.

So let's dissect.

I'm going to use two websites for analysis. The first is my corporate site: it hasn't been updated in a long time and uses the old-school urchin.js method for analytics collection. The second is a site Google Analytics advertises as a client, the Huffington Post.

First, my corporate site. Google Analytics Opt Out does NOT LIMIT any information that would otherwise have been sent. Because my site uses the urchin.js method, the following data is still collected.

Note: Test rig: Firefox 3.6.3, Live HTTP Headers, Ubuntu 9.10, 64-bit architecture, en-US locale. Google Analytics Opt Out installed and enabled.

After clicking through to my site from a Google search, the browser requests the following URL:

http://www.google-analytics.com/__utm.gif?utmwv=1.3&utmn=1128239546&utmc...1680x1050&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmcr=1&utmdt=StormTide%20Digital%20Studios%20Inc&utmhn=www.stormtide.ca&utmhid=845063553&utmr=http://www.google.ca/search?q=stormtide&ie=utf-8&oe=utf-8&aq=t&rls=com.u...&utmp=/&utmac=UA-2054892-1&utmcc=__utma%3D98697564.1245834129.1274809649.1274809649.1274809649.1%3B%2B__utmz%3D98697564.1274810870.1.2.utmccn%3D(organic)%7Cutmcsr%3Dgoogle%7Cutmctr%3Dstormtide%7Cutmcmd%3Dorganic%3B%2B

In addition to the data in the URL, the following is gleaned from the HTTP headers (an extract; other headers were sent but aren't relevant here):

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100407 Ubuntu/9.10 (karmic) Firefox/3.6.3
Accept-Language: en-us,en;q=0.5
Referer: http://www.stormtide.ca/

Additionally, my IP address is determined from the TCP/IP connection used to access the server.

To summarize: with the Google opt-out enabled, Google still collects:

Screen Resolution: 1680x1050
Screen Depth: 24-bit
Language: en-us
Google Analytics Account: UA-2054892-1
Page Title: StormTide Digital Studios Inc.
Domain: www.stormtide.ca
ORIGINAL REFERRER: http://www.google.ca/search?q=stormtide&ie=utf-8&oe=utf-8&aq=t&rls=com.u...
Referrer: http://www.stormtide.ca/
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100407 Ubuntu/9.10 (karmic) Firefox/3.6.3
IP Address: 0.0.0.0 (Redacted for privacy)
IP derived information: ISP, Approximate Location, Country, and potential tie-in to internal databases by IP.
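Most of that summary can be read straight off the __utm.gif query string. Below is a minimal sketch of decoding it. The sample URL is a trimmed, properly encoded rebuild using only values from the capture above; the screen-resolution key (utmsr) was elided in the capture, so its name here is an assumption based on the urchin GIF parameter naming.

```python
from urllib.parse import urlsplit, parse_qs

# Trimmed, re-encoded rebuild of the captured request. Values come from the
# capture; the "utmsr" key is assumed because it was elided ("...") there.
SAMPLE = (
    "http://www.google-analytics.com/__utm.gif?utmwv=1.3"
    "&utmsr=1680x1050&utmsc=24-bit&utmul=en-us&utmje=1"
    "&utmfl=10.0%20r45&utmdt=StormTide%20Digital%20Studios%20Inc"
    "&utmhn=www.stormtide.ca"
    "&utmr=http%3A%2F%2Fwww.google.ca%2Fsearch%3Fq%3Dstormtide"
    "&utmp=%2F&utmac=UA-2054892-1"
)

# Human-readable labels for the urchin beacon parameters of interest.
LABELS = {
    "utmsr": "Screen Resolution",
    "utmsc": "Screen Depth",
    "utmul": "Language",
    "utmje": "Java Enabled",
    "utmfl": "Flash Version",
    "utmdt": "Page Title",
    "utmhn": "Domain",
    "utmr": "Original Referrer",
    "utmp": "Page",
    "utmac": "Google Analytics Account",
}


def decode_beacon(url):
    """Return {label: value} for the known parameters in a __utm.gif URL."""
    query = parse_qs(urlsplit(url).query)  # also percent-decodes values
    return {LABELS[k]: v[0] for k, v in query.items() if k in LABELS}


if __name__ == "__main__":
    for label, value in decode_beacon(SAMPLE).items():
        print(f"{label}: {value}")
```

Nothing here needs the browser's cooperation: the opt-out add-on never sees this request, because urchin.js builds and sends it on its own.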

So... really, no opt-out at all. It might not be passing the data through to my analytics account (I haven't verified that either). So what happens when it encounters a ga.js analytics script, the method the add-on purports to block? The Huffington Post uses ga.js, so let's see what information I send on a page load. Scenario: Google search for "huffington post", then click the link to their site.

A request is made for ga.js

This includes the information:

http://www.google-analytics.com/ga.js
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100407 Ubuntu/9.10 (karmic) Firefox/3.6.3
Accept-Language: en-us,en;q=0.5
Referer: http://www.huffingtonpost.com/

Plus my IP address.
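Even though this request only fetches a script, its headers alone already say which page I'm reading. A minimal sketch of what a server could pull from just the three captured headers (the IP address would come from the connection, not the headers, so it isn't modeled here):

```python
# The three headers captured for the ga.js request above.
RAW_HEADERS = """\
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100407 Ubuntu/9.10 (karmic) Firefox/3.6.3
Accept-Language: en-us,en;q=0.5
Referer: http://www.huffingtonpost.com/
"""


def what_leaks(raw):
    """Parse 'Name: value' header lines and report what each one reveals."""
    headers = {}
    for line in raw.strip().splitlines():
        # partition splits on the first colon only, so "rv:1.9.2.3" survives
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "page_viewed": headers.get("referer"),
        "browser": headers.get("user-agent"),
        # first language tag, ignoring the quality weights
        "language": headers.get("accept-language", "").split(",")[0],
    }


if __name__ == "__main__":
    for key, value in what_leaks(RAW_HEADERS).items():
        print(f"{key}: {value}")
```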

So a lot of data appears to be missing; the add-on is working, right? Let's examine further. They still get the URL from the Referer header on the script request, so they know what I'm looking at, though not the original referrer from Google. They don't collect the screen resolution or bit depth. Now, the Huffington Post, you'll notice, is on the Google Analytics homepage; they're proud of the integration for this site. Let's widen the scope to include data sent to other Google tracking scripts on the page. The most notable is googleadservices.com.

It requests the following URL:

http://www.googleadservices.com/pagead/conversion/1026439070/?random=127...1050&u_w=1680&u_ah=1050&u_aw=1680&u_cd=24&u_his=4&u_tz=-420&u_java=true&u_nplug=7&u_nmime=78&url=http%3A//www.huffingtonpost.com/

There's that pesky screen resolution again: 1680x1050 at 24-bit depth. They also collect u_tz=-420 (timezone?) and u_java=true, indicating that I have Java support.
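Decoding the visible part of the googleadservices request shows how much of the blocked data comes back through this side door. A minimal sketch, using only the parameters that weren't elided in the capture. The meanings are assumptions: u_tz is read as a UTC offset in minutes (so -420 would be UTC-07:00), and u_his, u_nplug and u_nmime are guessed to be history length, plugin count and MIME-type count.

```python
from urllib.parse import urlsplit, parse_qs

# Visible (non-elided) parameters from the captured googleadservices request.
SAMPLE = (
    "http://www.googleadservices.com/pagead/conversion/1026439070/"
    "?u_w=1680&u_ah=1050&u_aw=1680&u_cd=24&u_his=4&u_tz=-420"
    "&u_java=true&u_nplug=7&u_nmime=78"
    "&url=http%3A//www.huffingtonpost.com/"
)


def fingerprint(url):
    """Summarize the browser details carried by the u_* parameters."""
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    # ASSUMPTION: u_tz is the UTC offset in minutes, negative = west of UTC.
    tz = int(q["u_tz"])
    hours, minutes = divmod(abs(tz), 60)
    sign = "+" if tz >= 0 else "-"
    return {
        "available_screen": f"{q['u_aw']}x{q['u_ah']}",
        "color_depth_bits": int(q["u_cd"]),
        "utc_offset": f"UTC{sign}{hours:02d}:{minutes:02d}",
        "java": q["u_java"] == "true",
        "history_length": int(q["u_his"]),   # assumed meaning
        "plugin_count": int(q["u_nplug"]),   # assumed meaning
        "mime_type_count": int(q["u_nmime"]),  # assumed meaning
        "page": q["url"],
    }


if __name__ == "__main__":
    for key, value in fingerprint(SAMPLE).items():
        print(f"{key}: {value}")
```

Even under these rough interpretations, screen size, colour depth, a plausible time zone and the exact page all arrive at Google in a single request.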

So now we're missing the Google Analytics account ID (the UA-12345). But Google has a web crawler, and by matching the page URL in the Referer header and/or the googleadservices url= parameter, you can easily match the page to the account. The data is there. So let's assume that if they want it, it's easily gotten. The same goes for the page title.
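The matching step is just a lookup keyed on the page URL. A minimal sketch, assuming a hypothetical crawl index that maps each crawled page to the account ID found in its HTML; "UA-12345" is a placeholder, not the Huffington Post's real account, and this is not a claim about how Google actually stores crawl data.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL crawl index: page URL -> analytics account seen in its markup.
CRAWL_INDEX = {"http://www.huffingtonpost.com/": "UA-12345"}


def normalize(url):
    """Lowercase scheme and host and default an empty path to '/'."""
    parts = urlsplit(url)
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"


def recover_account(observed_url, index):
    """Look up the account for a page URL seen in a Referer or url= value."""
    normalized = {normalize(k): v for k, v in index.items()}
    return normalized.get(normalize(observed_url))


if __name__ == "__main__":
    referer = "http://www.huffingtonpost.com/"          # from the ga.js request
    adservices_url = "http://www.huffingtonpost.com/"   # from its url= parameter
    print(recover_account(referer, CRAWL_INDEX))
    print(recover_account(adservices_url, CRAWL_INDEX))
```

Either source of the page URL is enough to resolve the account; the page title could be joined the same way.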

What's missing? Well, the original referrer: my Google search. It does appear that installing the ga.js blocker stops that information from going to Google directly. However, our friends at the Huffington Post happily collect it anyway, through a request to vertical-stats.huffingtonpost.com:

http://vertical-stats.huffingtonpost.com/?-1&http%3A//www.google.ca/search%3Fq%3Dhuffinton+post%26ie%3Dutf-8%26oe%3Dutf-8%26aq...&a51be5357b88e8
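The search terms are sitting in that request as a percent-encoded URL. A minimal sketch of recovering them, using a sample trimmed to the non-elided part of the capture. The assumption that the referrer is the query item beginning with "http" is inferred from this one request, not from any documented format.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Captured request, trimmed to the part before the "..." elision.
SAMPLE = (
    "http://vertical-stats.huffingtonpost.com/?-1&"
    "http%3A//www.google.ca/search%3Fq%3Dhuffinton+post%26ie%3Dutf-8%26oe%3Dutf-8"
)


def search_terms(url):
    """Return the Google 'q' terms embedded as a referrer, or None."""
    for item in urlsplit(url).query.split("&"):
        # ASSUMPTION: the referrer is the item that starts with "http".
        if item.startswith("http"):
            referrer = unquote(item)  # keeps '+' literal for parse_qs below
            terms = parse_qs(urlsplit(referrer).query).get("q")
            return terms[0] if terms else None
    return None


if __name__ == "__main__":
    print(search_terms(SAMPLE))
```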

So, in the end, does the Google Analytics opt-out prevent any data from getting to Google? Maybe. The original referrer seems to be left out, but otherwise there's still a ton of data being sent to Google. Do they treat it differently internally if you've opted out? Most likely. But does that matter? And have they said they don't store or archive that data for their own internal use, even if they don't pass it to the analytics subscriber?

Now, I'm not against web stats. I use Google Analytics on my sites, I've designed stats systems, and I even do hostname analysis on this site and other properties I run. That said, I'm not out there giving people the impression that they've opted out when they're still sending me a lot of data.

What Google could have done with this add-on was completely stop any requests to google-analytics.com, googleadservices.com and whatever other data-collection mechanisms they employ across the web. But then, that would actually break the statistics systems for people who want to opt out...
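A real opt-out would work at the request level instead. A minimal sketch of the check such a blocker might apply before any request leaves the browser; the domain list holds only the two domains named in this post and is an assumption, not a complete list of Google's collection endpoints, and this is not how the actual add-on works.

```python
from urllib.parse import urlsplit

# Only the domains discussed in this post; a real blocker would need more.
TRACKING_DOMAINS = ("google-analytics.com", "googleadservices.com")


def should_block(url):
    """True if the URL's host is a tracking domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    return any(host == d or host.endswith("." + d) for d in TRACKING_DOMAINS)


if __name__ == "__main__":
    for url in (
        "http://www.google-analytics.com/ga.js",
        "http://www.googleadservices.com/pagead/conversion/1026439070/",
        "http://www.huffingtonpost.com/",
    ):
        print(url, "BLOCK" if should_block(url) else "allow")
```

Matching on a "." boundary keeps lookalike hosts such as notgoogle-analytics.com from being caught by a plain suffix test.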

$0.02