How to determine whether a web page has RSS or not in C#
I have a task to do.
I need to download a web page and to see if the page contains any RSS feeds.
I know how to download a web page to a string using the HTTP APIs in C#, but how can I determine whether the page string contains any RSS feeds?
Thanks
Jack
I expect you would have to load the page into a DOM (XmlDocument, XDocument or HtmlDocument) and check for any nodes like:
<link rel="alternate" type="application/atom+xml" ...
In XPath this should be something like "/html/head/link[@rel='alternate' and @type='application/atom+xml']" (RSS feeds are advertised the same way, with type='application/rss+xml'), then look at @title and @href.
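As a rough sketch of that check, the following uses XDocument and only works when the page is well-formed XML (XHTML). The class name FeedLinkFinder and the method FindFeeds are made up for illustration, and accepting both the RSS and Atom MIME types is an assumption beyond the Atom-only example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class FeedLinkFinder
{
    // MIME types commonly used to advertise feeds.
    private static readonly string[] FeedTypes =
    {
        "application/rss+xml",
        "application/atom+xml"
    };

    // Returns (title, href) pairs for each feed link found directly under <head>.
    // Returns an empty list when the markup cannot be parsed as XML.
    public static List<(string Title, string Href)> FindFeeds(string xhtml)
    {
        var feeds = new List<(string Title, string Href)>();
        XDocument doc;
        try
        {
            doc = XDocument.Parse(xhtml);
        }
        catch (XmlException)
        {
            return feeds; // not well-formed XML, nothing to inspect
        }

        // Compare local names so the XHTML namespace does not matter.
        var links = doc.Descendants()
            .Where(e => e.Name.LocalName == "head")
            .Elements()
            .Where(e => e.Name.LocalName == "link");

        foreach (var link in links)
        {
            string rel = (string)link.Attribute("rel") ?? "";
            string type = (string)link.Attribute("type") ?? "";
            if (rel.Equals("alternate", StringComparison.OrdinalIgnoreCase) &&
                FeedTypes.Contains(type.Trim(), StringComparer.OrdinalIgnoreCase))
            {
                feeds.Add(((string)link.Attribute("title") ?? "",
                           (string)link.Attribute("href") ?? ""));
            }
        }
        return feeds;
    }
}
```

An empty result means either that no feed link was found or that the page was not parseable as XML, which is common for real-world HTML.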
Instead of loading the HTML into an XmlDocument (which may not be possible if the page isn't valid XHTML), try the HTML Agility Pack. It gives you XmlDocument-like syntax but also works with malformed HTML.
But generally, you would look for that link tag in the page's head.
Use a regular expression to check the HTML for the link tag.
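A minimal sketch of the regex route, assuming attribute values are quoted. It finds each link tag first and then reads its attributes separately, so the order of rel, type and href does not matter. FeedLinkRegex and FindFeedHrefs are hypothetical names, and the patterns are illustrative, not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FeedLinkRegex
{
    // Matches a whole <link ...> tag.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches name="value" or name='value' pairs inside a tag.
    // Unquoted attribute values are not handled.
    private static readonly Regex Attr =
        new Regex(@"([\w-]+)\s*=\s*(?:""([^""]*)""|'([^']*)')");

    public static List<string> FindFeedHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in Attr.Matches(tag.Value))
            {
                // Group 2 holds a double-quoted value, group 3 a single-quoted one.
                attrs[a.Groups[1].Value] =
                    a.Groups[2].Success ? a.Groups[2].Value : a.Groups[3].Value;
            }

            attrs.TryGetValue("rel", out var rel);
            attrs.TryGetValue("type", out var type);

            bool isFeedType =
                string.Equals(type, "application/rss+xml", StringComparison.OrdinalIgnoreCase) ||
                string.Equals(type, "application/atom+xml", StringComparison.OrdinalIgnoreCase);

            if (string.Equals(rel, "alternate", StringComparison.OrdinalIgnoreCase) &&
                isFeedType &&
                attrs.TryGetValue("href", out var href))
            {
                hrefs.Add(href);
            }
        }
        return hrefs;
    }
}
```

This is quick to write but will also match link tags inside comments or scripts, so a real HTML parser is the safer choice when accuracy matters.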
An exhaustive approach would be to crawl each href link on the page and examine the response's Content-Type and the presence of rss or atom tags.
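The per-link check could look roughly like the sketch below. It only classifies a response that has already been downloaded; the fetching and link extraction are left out. FeedSniffer and LooksLikeFeed are hypothetical names, and treating a root element named rss or feed as a feed is an assumption about what "rss or atom tags" means here.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class FeedSniffer
{
    // contentType: the raw Content-Type header value (may be null).
    // body: the response body as a string.
    public static bool LooksLikeFeed(string contentType, string body)
    {
        // Strip parameters such as "; charset=utf-8".
        string mediaType = (contentType ?? "").Split(';')[0].Trim().ToLowerInvariant();
        if (mediaType == "application/rss+xml" || mediaType == "application/atom+xml")
        {
            return true;
        }

        // Feeds are often served as text/xml or application/xml,
        // so fall back to inspecting the root element of the body.
        try
        {
            var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
            using (var reader = XmlReader.Create(new StringReader(body ?? ""), settings))
            {
                reader.MoveToContent(); // advance to the root element
                string root = reader.LocalName;
                return root == "rss" || root == "feed";
            }
        }
        catch (XmlException)
        {
            return false; // body is not XML at all
        }
    }
}
```

Requesting every link on a page is slow and can put load on the target site, so this is probably best kept as a fallback for pages that do not advertise a feed in their head.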