Getting the text off a webpage (not the HTML source)

How would I put the contents of a webpage into a string?

It would be the same thing as hitting Ctrl+A, copying, and pasting the result.

Is there a way to do this programmatically without SendKeys?

I do not want to look at the HTML source at all; I just want to copy the text on the site.


I've done a fair bit of screen scraping for applications and have found this to be invaluable: https://github.com/MindTouch/SGMLReader

There is a bit of sample code on that page, but I've added a bit extra here that should return what you want:

Imports System.Xml
Imports System.IO
Imports System.Net
Imports System.Text

Function FromHtml(ByVal reader As TextReader) As XmlDocument
    ' Set up SgmlReader to parse the input as HTML, lower-casing element names
    Dim sgmlReader As Sgml.SgmlReader = New Sgml.SgmlReader()
    sgmlReader.DocType = "HTML"
    sgmlReader.WhitespaceHandling = WhitespaceHandling.None
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower
    sgmlReader.InputStream = reader
    ' Load the parsed HTML into an XmlDocument without resolving external resources
    Dim doc As XmlDocument = New XmlDocument()
    doc.PreserveWhitespace = True
    doc.XmlResolver = Nothing
    doc.Load(sgmlReader)
    Return doc
End Function

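Function LoadWebText(ByVal URL As String) As String
    ' Download the raw bytes and decode them as UTF-8 (assumes the page is UTF-8 encoded)
    Using objWebClient As New WebClient()
        Dim html As String = Encoding.UTF8.GetString(objWebClient.DownloadData(URL))
        Using reader As New StringReader(html)
            ' InnerText concatenates every text node in the document,
            ' which may include the contents of script and style elements
            Return FromHtml(reader).InnerText
        End Using
    End Using
End Function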


Here is some code to load yahoo.com through Microsoft's Internet controls and print the text.

Create a new project in Visual Studio, go to the Add Reference page, click the COM tab and add Microsoft Internet Controls.

Then paste the code below in a function.

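Dim MyBrowser As New SHDocVw.InternetExplorer

MyBrowser.Navigate("http://www.yahoo.com/")

' Wait until the browser is idle and the page has finished loading
Do While MyBrowser.Busy OrElse MyBrowser.ReadyState <> SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE
    System.Threading.Thread.Sleep(100)
Loop

' Late-bound call into the DOM, so Option Strict must be Off
Debug.Print(MyBrowser.Document.body.innerText)

' Close the Internet Explorer instance started above
MyBrowser.Quit()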


If you just want to quickly copy all the content to the clipboard, you can use a bookmarklet (a bookmark that runs JavaScript). Instead of creating the bookmark with a URL, use the following as its address:

javascript:void function(){document.addEventListener("copy",function(t){t.preventDefault(),t.clipboardData%26%26t.clipboardData.setData("text/plain",document.body.innerText)}),document.execCommand("copy")}();
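
The one-liner is hard to read, so here is a rough, un-minified sketch of the same idea. It assumes the standard "copy" clipboard event and document.execCommand("copy"), which is marked deprecated and may not work in every browser. The function name copyPageText and its doc parameter are made up for illustration; the handler removal at the end is an addition so that later manual copies on the page are not hijacked.

```javascript
// Put the page's rendered text on the clipboard.
// `doc` is a hypothetical parameter: pass the global `document` in a browser.
function copyPageText(doc) {
  function onCopy(event) {
    // Replace whatever would have been copied with the visible text of the page.
    event.preventDefault();
    if (event.clipboardData) {
      event.clipboardData.setData("text/plain", doc.body.innerText);
    }
  }

  doc.addEventListener("copy", onCopy);
  try {
    // Triggers the "copy" event; returns false if the browser refuses the command.
    return doc.execCommand("copy");
  } finally {
    // Remove the handler so normal copying on the page keeps working afterwards.
    doc.removeEventListener("copy", onCopy);
  }
}
```

In a browser console this would be called as copyPageText(document). Like the bookmarklet, it only sees the text the browser has rendered into document.body, not the HTML source.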


Yes. Check this out.

http://www.searcharoo.net/
