Google Analytics API - Choice of Metric Affects Dimension Values Returned?

Good morning. I've seen this behavior in the Google Analytics API, which as a SQL guy I find bizarre. I'd like to get a list of all values for adContent, so I query ga:adContent and (because I must also select a metric, for no well-defined reason) ga:organicSearches. It's in the same group (Campaign), so maybe it'll perform better back on the server.

I get one row: adContent is "(not set)", organicSearches is 516,674. Huh, I guess adContent isn't being used. But the marketing department swears that it is, and has produced some convincing screenshots.

Later on, I arbitrarily change the metric to ga:transactions. In the universe I woke up in, this should have absolutely no impact on anything, except the actual value returned in that column. Instead, I get zillions of rows, with plausible values for ga:adContent. The value for ga:transactions is sometimes zero, so it's not the case that GA was filtering for "metric > 0".

There are no filters in my query. I did not change the date range between these two variants. Can anyone tell me what's going on? I expect the above queries to translate to something like this, which should return exactly the same number of rows:

SELECT adContent, SUM(organicSearches)
FROM Campaign
WHERE Date BETWEEN X AND Y
GROUP BY adContent

SELECT adContent, SUM(transactions)
FROM Campaign INNER JOIN ECommerce ON <something>
WHERE Date BETWEEN X AND Y
GROUP BY adContent

I realize that GA probably isn't using an ordinary RDBMS on the back end, but surely 1 + 1 still equals 2 in any database!


By definition, ga:organicSearches will almost never have any matches for ga:adContent (edge cases aside). ga:adContent is for the content of an advertisement, whereas ga:organicSearches is for organic search result visits within a session (like if you use Google multiple times within the same session to try to find something specific on a site). Don't use ga:organicSearches for anything besides trying to measure that particular phenomenon.

Try not to use an SQL mindset here; Google Analytics doesn't use SQL on the backend, so the notions you have of traditional relationships aren't applicable. IIRC, they use a few things, among them a BigTable variant, which is a NoSQL-type database.
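To make the pairing issue above concrete, here is a toy sketch in Python. It is not how Google Analytics actually stores or queries data; the session records, the `report` helper, and the rule that a session only contributes when it carries the requested metric are all assumptions made up for illustration. It just shows how a dimension's value list can depend on which metric you pair it with.

```python
from collections import defaultdict

# Invented sessions (assumption): each one records an adContent value and
# only the metrics that happened to be collected alongside it.
SESSIONS = [
    {"adContent": "banner_a", "transactions": 1},
    {"adContent": "banner_a", "transactions": 0},
    {"adContent": "banner_b", "transactions": 0},
    {"adContent": "(not set)", "organicSearches": 3},
    {"adContent": "(not set)", "organicSearches": 2, "transactions": 1},
]


def report(sessions, dimension, metric):
    """Sum `metric` per `dimension` value, skipping sessions without the metric.

    Hypothetical helper: the "skip if the metric is absent" rule is the
    assumption that produces the behavior seen in the question.
    """
    totals = defaultdict(int)
    for session in sessions:
        if metric in session:
            totals[session[dimension]] += session[metric]
    return dict(totals)


organic = report(SESSIONS, "adContent", "organicSearches")
transactions = report(SESSIONS, "adContent", "transactions")
```

In this model `organic` has a single "(not set)" row, while `transactions` has a row per ad, some of them zero, which mirrors what the question describes.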

From Google's 2006 paper on BigTable:

We briefly describe two of the tables used by Google Analytics. The raw click table (~200 TB) maintains a row for each end-user session. The row name is a tuple containing the website’s name and the time at which the session was created. This schema ensures that sessions that visit the same web site are contiguous, and that they are sorted chronologically. This table compresses to 14% of its original size. The summary table (~20 TB) contains various predefined summaries for each website. This table is generated from the raw click table by periodically scheduled MapReduce jobs. Each MapReduce job extracts recent session data from the raw click table. The overall system’s throughput is limited by the throughput of GFS. This table compresses to 29% of its original size.
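The row-name scheme in that quote can be sketched in a few lines of Python. This only illustrates the ordering property (sorting on a (website, session time) tuple keeps each site's sessions together and in time order); the site names and timestamps are invented, and a plain sorted list stands in for the real storage layer.

```python
from datetime import datetime

# Invented row names (assumption): (website name, session creation time).
row_names = [
    ("example.org", datetime(2006, 1, 2, 9, 0)),
    ("example.com", datetime(2006, 1, 3, 8, 0)),
    ("example.org", datetime(2006, 1, 1, 12, 0)),
    ("example.com", datetime(2006, 1, 1, 7, 30)),
]

# Tuples compare element by element, so sorting orders by website first
# and by creation time within each website.
sorted_rows = sorted(row_names)

# All sessions for one site now form a single contiguous, chronological run.
org_sessions = [created for site, created in sorted_rows if site == "example.org"]
```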

If you want a lowest-common-denominator metric, one you can pair with a dimension to list all of its values, use ga:pageviews.
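As a rough sketch of that suggestion, the query from the question could be rebuilt with ga:pageviews as the metric. The parameter names (`ids`, `start-date`, `end-date`, `dimensions`, `metrics`), the placeholder profile ID, and the dates below are assumptions for illustration; check them against the API version you are actually calling. The `build_query` helper is hypothetical too.

```python
from urllib.parse import urlencode


def build_query(profile_id, start_date, end_date, dimension, metric="ga:pageviews"):
    """Return a URL query string for a one-dimension, one-metric report.

    Hypothetical helper: the parameter names are assumed, not taken from
    the original post.
    """
    params = {
        "ids": profile_id,
        "start-date": start_date,
        "end-date": end_date,
        "dimensions": dimension,
        "metrics": metric,
    }
    # safe=":" keeps the "ga:" prefixes readable instead of encoding them as %3A.
    return urlencode(params, safe=":")


# Placeholder profile ID and dates (assumptions).
query = build_query("ga:XXXXXXXX", "2011-01-01", "2011-01-31", "ga:adContent")
```

Keeping the metric as a defaulted argument makes it easy to swap ga:transactions back in and compare row counts for the same date range.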
