
How to find out the language (not programming!) of a web page in C#

Say I open a website in Chrome and it's in Russian. Chrome tells me it's in Russian and offers to translate it for me. How can I find out the language of a web page using C#? I'd love to find out the actual language, such as English, Spanish, Russian, etc.


You could try parsing the <meta http-equiv="language" content="ru" /> and <meta http-equiv="content-language" content="ru" /> tags in the head of a page.

These tags are not present on every page, though.
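
As a rough illustration of the meta-tag approach, here is a minimal sketch that pulls the declared language out of raw HTML with regular expressions. The class and method names (`MetaLanguage`, `FromHtml`) are made up for this example, and a regex is only a stand-in for a real HTML parser, so treat it as a starting point rather than a robust implementation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetaLanguage
{
    // Matches any <meta ...> tag and captures its attribute text.
    private static readonly Regex MetaTag = new Regex(
        @"<meta\b([^>]*)>", RegexOptions.IgnoreCase);

    // Returns the declared language code (e.g. "ru"), or null if no
    // http-equiv="language" / "content-language" meta tag is found.
    public static string FromHtml(string html)
    {
        foreach (Match tag in MetaTag.Matches(html))
        {
            string attrs = tag.Groups[1].Value;
            string name = Attribute(attrs, "http-equiv");
            if (name == null) continue;

            name = name.ToLowerInvariant();
            if (name == "content-language" || name == "language")
            {
                string content = Attribute(attrs, "content");
                if (!string.IsNullOrWhiteSpace(content))
                    return content.Trim();
            }
        }
        return null;
    }

    // Reads one quoted attribute value out of a tag's attribute text.
    private static string Attribute(string attrs, string attrName)
    {
        Match m = Regex.Match(
            attrs,
            @"(?<![\w-])" + Regex.Escape(attrName) + @"\s*=\s*[""']([^""']*)[""']",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling `MetaLanguage.FromHtml(html)` on a page that contains one of the tags above returns its `content` value; a `null` result means you have to fall back to detecting the language from the text itself.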

I think that when these tags are missing, Google does some kind of "word lookup" against an internal database to try to determine the most probable language of the page.
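
To make the "word lookup" idea concrete, here is a toy sketch that counts how many words of a text appear in small per-language lists of common words and picks the language with the most hits. This is not how Google actually does it (that is unknown to me). The word lists are tiny, hand-picked samples and the `StopWordGuesser` name is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordGuesser
{
    // Tiny illustrative word lists; a real detector would need far more data.
    private static readonly Dictionary<string, HashSet<string>> StopWords =
        new Dictionary<string, HashSet<string>>
        {
            ["en"] = new HashSet<string> { "the", "and", "is", "of", "to", "in", "that", "it" },
            ["es"] = new HashSet<string> { "el", "la", "los", "y", "es", "de", "que", "un" },
            ["ru"] = new HashSet<string> { "и", "в", "не", "на", "что", "это", "с", "как" },
        };

    // Returns the code whose word list matches the most words,
    // or null if no word matches any list.
    public static string Guess(string text)
    {
        string[] words = text.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' },
            StringSplitOptions.RemoveEmptyEntries);

        string best = null;
        int bestHits = 0;
        foreach (var entry in StopWords)
        {
            int hits = words.Count(w => entry.Value.Contains(w));
            if (hits > bestHits)
            {
                bestHits = hits;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

Short or mixed-language texts will easily fool something this simple, which is one reason to prefer an existing detection service or library.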

Edit

You could also use the SOAP API of Bing to detect the language.

An example from their site:

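// TranslatorService is the namespace of the service reference (proxy) generated for the SOAP endpoint.
var client = new TranslatorService.LanguageServiceClient();
var result = client.Detect(
    "myAppId", // placeholder: use your own Bing AppId here
    "I have no idea what this language may be");

Console.WriteLine("The detected language friendly code is: " + result);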

Just extract some text (e.g. with HTML Agility Pack) from the HTML page you want to detect from and pass it to the SOAP function.
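
If you don't want to pull in a parser just for this, the extraction step can be approximated with the standard library. The sketch below is a crude regex-based stand-in for HTML Agility Pack, not its API: it drops `script`/`style` blocks, strips the remaining tags, decodes entities, and truncates the result. `HtmlText.Extract` and the `maxLength` default are names and values chosen for this example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Returns up to maxLength characters of visible-ish text from an HTML string.
    public static string Extract(string html, int maxLength = 1000)
    {
        // Drop script and style elements together with their contents.
        string text = Regex.Replace(
            html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace the remaining tags with spaces.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Decode entities such as &amp; and collapse runs of whitespace.
        text = WebUtility.HtmlDecode(text);
        text = Regex.Replace(text, @"\s+", " ").Trim();

        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}
```

The returned string is what you would pass as the second argument to `client.Detect(...)`. A sample of the page is normally enough, so there is no need to send all of the text.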


Use Google's API: send some (or all?) of the text from the page to the API to detect the language.

For a .NET library, see the answer to this question.
