Query pertaining to Jericho HTML Parser
I want to use the parser to determine the position of a specific H2 (or H3/H4/H5/H6) text within the body text. By position I mean the number of 'words' that occur before this particular H2 (or H3/H4/H5/H6) phrase. Also, if a phrase occurs in both an H2 and an H4 (for example), how do I ensure that I get the correct position for each of them?
There's no easy way to handle such a non-standard requirement.
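You have to find the heading tags, extract the text inside them, remember each tag's position, then take everything before that position and count the 'words'. Because each count is tied to a tag's own position rather than to a search for its text, a phrase that appears in both an H2 and an H4 gets a separate, correct count for each. It's not rocket science, but it can take some time to code and test.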
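A minimal, dependency-free sketch of those steps follows. It is an illustration under assumptions, not Jericho code: a regular expression stands in for the parser, the input is assumed to be the body content only, and the class and method names (`HeadingPositions`, `find`) are hypothetical. With Jericho itself, the heading elements and their start offsets should come from `Source.getAllElements(HTMLElementName.H2)` and each element's `getBegin()`, which also copes with malformed markup that a regex will not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeadingPositions {
    /** One heading occurrence: tag name, its text, and how many words precede it. */
    public record HeadingPosition(String tag, String text, int wordsBefore) {}

    // Matches <h2>...</h2> through <h6>...</h6>; \1 forces the closing tag to match the opening one.
    private static final Pattern HEADING = Pattern.compile(
            "<(h[2-6])\\b[^>]*>(.*?)</\\1\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Crude tag stripper (assumption: no comments, scripts or '>' inside attribute values).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Replaces tags with spaces and counts whitespace-separated words. */
    static int countWords(String html) {
        String text = TAG.matcher(html).replaceAll(" ").trim();
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }

    /** Returns every H2-H6 heading in document order with its word position. */
    public static List<HeadingPosition> find(String bodyHtml) {
        List<HeadingPosition> result = new ArrayList<>();
        Matcher m = HEADING.matcher(bodyHtml);
        while (m.find()) {
            // Each match has its own start offset, so identical text in an h2 and an h4
            // yields two independent entries rather than one ambiguous text search.
            String before = bodyHtml.substring(0, m.start());
            String text = TAG.matcher(m.group(2)).replaceAll(" ").trim().replaceAll("\\s+", " ");
            result.add(new HeadingPosition(m.group(1).toLowerCase(), text, countWords(before)));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>One two three</p><h2>Intro text</h2><p>four five</p><h4>Intro text</h4>";
        for (HeadingPosition p : find(html)) {
            System.out.println(p);
        }
    }
}
```

In this sketch the words of earlier headings count toward the position of later ones, and HTML entities such as `&nbsp;` are not decoded; both are choices you may want to change depending on what "words before" should mean in your case.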