HTTP Builder/Groovy - get source text _and_ XmlSlurper output?
I am reading the HTTPBuilder GET documentation here: http://groovy.codehaus.org/modules/http-builder/doc/get.html
I can get either of the following:
(i) XmlSlurper output, as parsed by NekoHTML, using:
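import groovyx.net.http.HTTPBuilder
def http = new HTTPBuilder('http://www.google.com')
// no contentType given, so the HTML response is parsed by NekoHTML into an XmlSlurper GPathResult
def html = http.get( path : '/search', query : [q:'Groovy'] )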
(ii) the raw text, using:
import static groovyx.net.http.ContentType.TEXT
http.get( path : '/search',
          contentType : TEXT,
          query : [q:'Groovy'] ) { resp, reader ->
    println "response status: ${resp.statusLine}"
    println 'Headers: -----------'
    resp.headers.each { h ->
        println " ${h.name} : ${h.value}"
    }
    println 'Response data: -----'
    System.out << reader   // copy the unparsed response body to stdout
    println '\n--------------------'
}
I am having some trouble, and would like to get BOTH (i) and (ii) so that I can debug my XmlSlurper code against the actual HTML I am receiving.
Any suggestions on how I might go about doing this?
I can easily create an XmlSlurper and feed it the relevant string with the parseText(String) method, or a Reader with the parse(Reader) method, but I cannot seem to get the NekoHTML processing step right.
Any hints?
Thank you! Misha
OK, here it is.
I figured it out from: http://groovy.codehaus.org/Testing+Web+Applications
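def html = http.get(uri: 'http://www.google.com', contentType: groovyx.net.http.ContentType.TEXT) { resp, reader ->
    def s = reader.text                  // read the raw HTML once, as a String
    new File('temp.html') << s           // dump it for inspection (note: << appends to an existing file)
    // parse the same String with NekoHTML; the closure's result is what http.get returns
    new XmlSlurper(new org.cyberneko.html.parsers.SAXParser()).parseText(s)
}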
Thank you! Misha
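Since the question asks for both views at once, one variant is to have the closure return the raw text together with the parsed tree instead of writing a file. The sketch below pulls that into a hypothetical helper, `rawAndParsed`, and exercises it against a `StringReader` standing in for the HTTP response. It uses the plain `XmlSlurper` on well-formed markup so it runs without NekoHTML, and it assumes Groovy 3 or later, where `XmlSlurper` lives in `groovy.xml`.

```groovy
import groovy.xml.XmlSlurper   // Groovy 3+; on Groovy 2.x this class is groovy.util.XmlSlurper

// Hypothetical helper: consume the Reader exactly once, keep the raw text,
// and parse that same String, so both views describe the identical response.
Map rawAndParsed(Reader reader, XmlSlurper slurper) {
    String raw = reader.text        // GDK Reader.getText(): reads everything, then closes the reader
    def doc = slurper.parseText(raw)
    [raw: raw, doc: doc]
}

// Offline stand-in for the HTTP response body (well-formed, so the plain
// XmlSlurper can parse it without NekoHTML).
def body = '<html><body><p id="greeting">hi</p></body></html>'
def result = rawAndParsed(new StringReader(body), new XmlSlurper())

println result.raw                 // the exact text that was parsed
println result.doc.body.p.text()   // navigate the parsed tree with GPath
```

Inside a real request the call would look like `http.get(uri: 'http://www.google.com', contentType: groovyx.net.http.ContentType.TEXT) { resp, reader -> rawAndParsed(reader, new XmlSlurper(new org.cyberneko.html.parsers.SAXParser())) }`, giving `raw` for inspection and `doc` for your GPath code. Keep in mind that NekoHTML upper-cases element names by default, so paths against its output look like `doc.BODY.P` rather than `doc.body.p`.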
Rather than dumping the response to a file first, you could achieve the same with the following implementation, which uses reader.readLines():
def html = http.get(uri: 'http://www.google.com', contentType: groovyx.net.http.ContentType.TEXT) { resp, reader ->
    // join with '\n' so line breaks (which may be the only separator between a tag name and its attributes) survive
    String response = reader.readLines().join('\n')
    new XmlSlurper(new org.cyberneko.html.parsers.SAXParser()).parseText(response)
}
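A quick offline check of why the separator matters when rejoining `readLines()` output: with no separator, `join()` concatenates the lines directly, so HTML in which a line break separates a tag name from its attribute gets mangled. Passing `'\n'` to `join`, or reading the whole body with `reader.text`, keeps the markup intact. The sample string is made up for illustration.

```groovy
// Made-up HTML in which a line break is the only separator between the tag name and its attribute.
def html = '<a\nhref="/x">link</a>'

def joinedBare    = new StringReader(html).readLines().join()      // lines glued together: tag and attribute merge
def joinedNewline = new StringReader(html).readLines().join('\n')  // line breaks restored
def wholeText     = new StringReader(html).text                    // read in one go, nothing split or rejoined

println joinedBare
println joinedNewline == html
println wholeText == html
```

Note that `join('\n')` still drops a trailing newline and turns `\r\n` into `\n`, which does not matter for parsing; `reader.text` returns the body unchanged.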