Facebook Linter / Open Graph cuts off the URL path
I've been scouring the web and StackOverflow for an answer, but I've found no case that exactly applies to my situation. I'm using Facebook Linter to debug the way FB is scraping my meta tags. If I use it on a simple About page, it picks up everything fine, particularly the og:url meta tag.
See: http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Felectionstats.com%2Fabout%2Fprivacy_policy
The trouble starts when I scrape my normal content pages. Although I've triple-checked that my tags are well formed, the FB Linter cuts the path off the URL, so it reports that the og:url tag contains only the domain name, electionstats.com/.
See: http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Felectionstats.com%2Fsearch%2Fyear_from%3A2010%2Fyear_to%3A2010%2Foffice_id%3A6
The og:url tag that is actually on the page looks like this:
I am skeptical that it is an issue with FB caching the pages, because on my About pages I have made quick code changes that change the meta tag output, then re-run the same page through the Linter, and the Linter shows these quick changes, without fail, every time. But for some reason, when I try dozens of different URL combinations on the main content pages (the /search/ pages), I always get a cut-off URL and consequently only meta fields from my homepage.
I had even theorized that FB ignores any URL that looks like a "search" page, so I re-routed the URL and the title tag to use "explore" instead of "search", but this did nothing -- the path still got chopped off.
Oy, this is embarrassing.
I have code at the beginning of each page request that detects if the user's browser accepts cookies; if not, it kicks the user back to the homepage. The Facebook web crawler, like other web crawlers, does not use cookies. Thus, it kept ending up back on the homepage and reading the homepage's og/meta tags. The greater unintended consequence of my code was that it kicks out ALL web crawlers trying to get a sense of my website, including Google's.
The fix: skip the cookie-handling check if the user agent string matches part of the UA sent by common web crawlers, e.g. http://www.cult-f.net/detect-crawlers-with-php/
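For anyone who wants the gist without following the link, here is a minimal sketch of that check. It is written in Python purely for illustration (the linked article suggests my site's stack is PHP, and this is not my actual code). The function names and the pattern list are hypothetical and far from exhaustive; `facebookexternalhit` and `Googlebot` are substrings that those crawlers do send in their user agent strings.

```python
import re

# Hypothetical, non-exhaustive list of substrings that appear in the
# user agent strings of common crawlers. Extend it for your own needs.
CRAWLER_UA_PATTERN = re.compile(
    r"facebookexternalhit|googlebot|bingbot|slurp|duckduckbot|baiduspider|yandex",
    re.IGNORECASE,
)


def is_crawler(user_agent):
    """Return True if the user agent string looks like a known crawler."""
    return bool(user_agent) and CRAWLER_UA_PATTERN.search(user_agent) is not None


def should_redirect_to_homepage(user_agent, has_cookies):
    """Decide whether the cookie check should bounce this request.

    Crawlers do not send cookies, so they are exempt from the check;
    ordinary browsers without cookies are still redirected.
    """
    if is_crawler(user_agent):
        return False
    return not has_cookies
```

Called at the top of each request, `should_redirect_to_homepage(ua, has_cookies)` would let the Facebook scraper through to the real page while keeping the original behaviour for cookie-less browsers. Keep in mind that user agent strings can be spoofed, so treat this as a convenience check and not as a security measure.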