How to search non-strict HTML with Java?
I have a service that connects to a remote site and searches for some elements in the HTML. The incoming data is about 100-200 KB, but parsing it with string operations is very slow. I'd like suggestions for a fast framework. Any ideas?
1) If you can afford about 1 MB of memory to parse the HTML into a DOM tree, you can use a tolerant HTML parser (NekoHTML, for example).
2) Otherwise, extract the data using regular expressions. This will be faster and need less memory, but you'll have to come up with good expressions, and you won't be able to extract more sophisticated structural information.
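To illustrate option 2, here is a minimal sketch using only java.util.regex from the standard library. It assumes the elements you want are link targets (quoted href attributes on a tags). That is my guess for illustration, since the question doesn't say which elements are needed, and the class and method names (LinkExtractor, extractLinks) are made up. The pattern is compiled once and reused, which matters when you scan many pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {

    // Compiled once and reused. Matches <a ... href="..."> or <a ... href='...'>,
    // case-insensitively. Group 1 is the quote character, group 2 the URL.
    // Assumption: attribute values are quoted; unquoted values are not matched.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the given HTML, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Deliberately sloppy HTML: mixed case, unclosed tags, unquoted class attribute.
        String html = "<html><body><A HREF='a.html'>one<p>"
                + "<a class=x href=\"b.html\">two</body>";
        System.out.println(extractLinks(html));
    }
}
```

This never builds a tree, so it doesn't care whether tags are closed or nested properly. The flip side is what the answer already says: anything structural (for example, "the third cell of the second table") gets awkward quickly with this approach.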
You can give TagSoup a try.