Getting the text off a webpage (not the HTML source)
How would I put the contents of a webpage into a string?
It would be the same thing as pressing Ctrl+A, copying, and pasting it.
Is there a way to do this programmatically without SendKeys?
I do not want to look at the HTML source at all; I just want to copy the text on the site.
I've done a fair bit of screen scraping for applications and have found SgmlReader invaluable: https://github.com/MindTouch/SGMLReader
There is some sample code on that page. The version below adds a function that downloads a page and returns only its text, which is exactly what you want:
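'' Requires a reference to the SgmlReader library (Sgml namespace).
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Xml

Module WebText

    '' Parses (possibly malformed) HTML into an XmlDocument using SgmlReader.
    Function FromHtml(ByVal reader As TextReader) As XmlDocument
        '' set up SgmlReader
        Dim sgmlReader As New Sgml.SgmlReader()
        sgmlReader.DocType = "HTML"
        '' drops whitespace-only nodes, so text from adjacent elements can run together
        sgmlReader.WhitespaceHandling = WhitespaceHandling.None
        sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower
        sgmlReader.InputStream = reader

        '' create the document; XmlResolver = Nothing stops it fetching external DTDs
        Dim doc As New XmlDocument()
        doc.PreserveWhitespace = True
        doc.XmlResolver = Nothing
        doc.Load(sgmlReader)
        Return doc
    End Function

    '' Downloads a page and returns only its text content (no tags).
    Function LoadWebText(ByVal URL As String) As String
        Using objWebClient As New WebClient()
            '' the response bytes are decoded as UTF-8 regardless of the page's declared charset
            Dim html As String = Encoding.UTF8.GetString(objWebClient.DownloadData(URL))
            Dim xml As XmlDocument = FromHtml(New StringReader(html))
            '' InnerText concatenates every text node, including <script> and <style> contents
            Return xml.InnerText
        End Using
    End Function

End Module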
Here is some code that loads yahoo.com through the Microsoft Internet Controls COM library and prints the page's text.
Create a new project in Visual Studio, open the Add Reference dialog, click the COM tab and add Microsoft Internet Controls.
Then paste the code below into a function.
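'' Requires Option Strict Off, because Document is late-bound.
Dim MyBrowser As New SHDocVw.InternetExplorer
MyBrowser.Navigate("http://www.yahoo.com/")

'' Busy can still be False immediately after Navigate, so also wait for ReadyState to reach COMPLETE
Do While MyBrowser.Busy OrElse MyBrowser.ReadyState <> SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE
    System.Threading.Thread.Sleep(100)
Loop

'' innerText is the rendered text of the page, not its HTML source
Debug.Print(MyBrowser.Document.body.innerText)
MyBrowser.Quit() '' close the hidden Internet Explorer instance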
If you just want to quickly copy all of a page's text to the clipboard, you can use a bookmark that runs JavaScript (a bookmarklet). Instead of giving the bookmark a URL, give it the following code (%26%26 is the URL-encoded form of &&):
javascript:void function(){document.addEventListener("copy",function(t){t.preventDefault(),t.clipboardData%26%26t.clipboardData.setData("text/plain",document.body.innerText)}),document.execCommand("copy")}();
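For readability, here is an unminified sketch of the bookmarklet's approach. It uses the standard "copy" event fired by document.execCommand("copy"); copyPageText is a hypothetical name, and unlike the one-liner this version removes its listener after the first copy so later Ctrl+C presses behave normally. Note that execCommand is deprecated, although browsers still support it for this purpose.

```javascript
// Unminified sketch of the bookmarklet; copyPageText is a hypothetical name.
function copyPageText(doc) {
  // Intercept the next "copy" event and replace its payload with the page text.
  doc.addEventListener("copy", function onCopy(event) {
    // Stop the browser from copying the current selection instead.
    event.preventDefault();
    if (event.clipboardData) {
      // innerText is the rendered text (what Ctrl+A, Ctrl+C gives), not the HTML source.
      event.clipboardData.setData("text/plain", doc.body.innerText);
    }
    // Only hijack this one copy; later copies behave normally.
    doc.removeEventListener("copy", onCopy);
  });
  // Fire a "copy" event; returns false if the browser refuses the command.
  return doc.execCommand("copy");
}
```

In a bookmarklet or the browser console you would call it as copyPageText(document).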
Yes. Take a look at Searcharoo:
http://www.searcharoo.net/