HTML source does not show all visible data
If you go here:
http://whois.domaintools.com/iconplc.com
and view the source, why can't you see the registrant data in the HTML source?
Is it at all possible to get this data through the HTML source?
This stuff is not in the HTML source:
Registrant:
ICON Clinical Research
212 Church Road
North Wales, PA 19454
US
Domain Name: ICONPLC.COM
Administrative Contact, Technical Contact:
ICON Clinical Research
212 Church Road
North Wales, PA 19454
US
215-616-3359 fax: 123 123 1234
Record expires on 08-Sep-2019.
Record created on 12-Dec-2007.
Domain servers in listed order:
UDNS1.ULTRADNS.NET
UDNS2.ULTRADNS.NET
Even after I save the webpage as .html, I am still unable to find the email address.
You can use the Selenium C# client driver to write code that checks for this CSS locator: css=div.whois_record. You can then write code to scrape every element under that particular div. The email address on the page is an image, so you would have to save it as one.
If you look at the source, they have linked to an AJAX application. My guess is that they pull the data down after the HTML has loaded, so the information won't be visible when you look at the source.
Here is a link talking about how to scrape ajax sites:
How do you scrape AJAX pages?
Looks like the page is put together with AJAX. Firebug in Firefox, or Developer tools in IE should help you get to it.
Because it is generated with JavaScript. Grep the source for whois_data
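A rough sketch of that "grep the source" step in Python, for a copy of the page you have already saved. The helper name `grep_source` and the sample markup are made up for illustration; only the `whois_data` marker comes from this answer, and the real page layout may differ.

```python
from typing import List, Tuple

# Hypothetical stand-in for the saved page source; the real markup may differ.
SAMPLE_SOURCE = r"""<html><body>
<div id="whois_data"></div>
<script>ajaxUpdate("3","<div class=\'whois_record\'>Registrant:<br/>ICON Clinical Research</div>")</script>
</body></html>"""


def grep_source(source: str, needle: str, context: int = 40) -> List[Tuple[int, str]]:
    """Return (line number, surrounding snippet) for each line containing needle."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        pos = line.find(needle)
        if pos != -1:
            # Keep a little text on either side so the match is easy to eyeball.
            start = max(0, pos - context)
            end = pos + len(needle) + context
            hits.append((lineno, line[start:end]))
    return hits


if __name__ == "__main__":
    for lineno, snippet in grep_source(SAMPLE_SOURCE, "whois_data"):
        print(lineno, snippet)
```

Searching for `whois_record` the same way would point you at the script line that carries the actual registrant text.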
I have the Chrome browser, and it shows the content you want, but not in the same format. It looks like this:
ajaxUpdate("3","Registrant:ICON Clinical Research
212 Church Road
North Wales, PA 19454
US
Domain Name: ICONPLC.COM
Administrative Contact, Technical Contact:
ICON Clinical Research
212 Church Road
North Wales, PA 19454
US
215-616-3359 fax: 123 123 1234
Record expires on 08-Sep-2019.
Record created on 12-Dec-2007.
Domain servers in listed order:
UDNS1.ULTRADNS.NET
UDNS2.ULTRADNS.NET")
I just looked at the source and the text you mention is there, the only difference being that it has &nbsp;s instead of spaces.
<div class=\'whois_record\'>Registrant:<br/>ICON Clinical Research<br/> 212 Church Road<br/> North Wales, PA 19454<br/> US<br/><br/> Domain Name: ICONPLC.COM<br/><br/> Administrative Contact, Technical Contact:<br/> ICON Clinical Research etc.
Also, as already mentioned, extra text can always be added to a page at a later time by client-side scripts.
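If the record really is sitting inside a script call like the one quoted above, something along these lines could pull it back out as plain text. This is only a sketch: the function name `extract_whois_text` is hypothetical, and it assumes the div is closed with `</div>`, that the quotes around the class name may be backslash-escaped, and that lines are separated by `<br/>` with `&nbsp;` padding.

```python
import html
import re
from typing import Optional

# Hypothetical sample modelled on the snippet quoted in this answer.
SAMPLE_SCRIPT = (
    r"""ajaxUpdate("3","<div class=\'whois_record\'>Registrant:<br/>"""
    r"""ICON Clinical Research<br/>&nbsp;212 Church Road<br/>"""
    r"""&nbsp;North Wales, PA 19454<br/>&nbsp;US<br/><br/>"""
    r"""&nbsp;Domain Name: ICONPLC.COM</div>")"""
)


def extract_whois_text(source: str) -> Optional[str]:
    """Return the whois record as plain text, or None if the div is not found."""
    # The quotes around the class name may or may not be backslash-escaped.
    match = re.search(r"<div class=\\?'whois_record\\?'>(.*?)</div>", source, re.DOTALL)
    if match is None:
        return None
    body = match.group(1)
    body = re.sub(r"<br\s*/?>", "\n", body)  # <br/> becomes a line break
    body = re.sub(r"<[^>]+>", "", body)      # drop any other tags
    # Decode entities such as &nbsp;, then turn non-breaking spaces into spaces.
    body = html.unescape(body).replace("\xa0", " ")
    lines = [line.strip() for line in body.splitlines()]
    return "\n".join(lines).strip()


if __name__ == "__main__":
    print(extract_whois_text(SAMPLE_SCRIPT))
```

This only helps when the data is embedded in the page you downloaded. If the script fetches it in a separate request after the page loads, you would need to request that URL as well or drive a real browser.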