Parsing or extracting data for input in a database

I have the following text file:

VERDICT: 
MR. FOREMAN:  Guilty.        
THE COURT:  Accused and, you have been found guilty on the charges as you have heard the Foreman for the jury say.  You are remanded.  I have requested a probation report and you are remanded until sentencing, until the Court receives the probation report. 
THE COURT:  Mr. Foreman and members of the jury, on behalf of the administration of justice   
THE CLERK:  Joh Doe the jury have found you guilty.  Have you anything to say before Her Ladyship, the Judge, proceeds to sentence you?                      
SENTENCE:
THE COURT:  John Doe.

I would like to use keywords such as verdict, foreman, court, clerk, and sentence as tags to enter this information in a database. Please tell me how I can extract these words to create tags and form an XML document to place in a database. I have been searching for regex and data-extraction approaches, but I have not found anything yet.


Do you have a list of expected tags?

  • If yes, what part is not clear?
    • Extract all relevant strings from the input text (using any parser; you haven't mentioned a language, so I can't give examples).
    • Apply regexes that contain the allowed tags, and if one matches, add the tag.
    • PS: If you have too many tags and/or too much data to deal with, applying one regex per tag to each input string may not be the most performant approach.
  • If no, then I suppose you're expected to assume some words are tags and add them. I don't like the idea (usually I would expect the user to think and give me the tags he wants to mark his inputs with), but one way I can think of is to make a list of words you do NOT want to be used as tags (e.g. "and", "or", "I", "we", ...), remove all these words using a regex replace, take remaining word
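
For the "yes" case, the steps above could be sketched roughly as follows in Python (the question names no language, so this choice is an assumption). The `ALLOWED_TAGS` mapping, the `transcript` root element, and the `transcript_to_xml` helper are hypothetical names made up for illustration. The sketch also assumes every label starts a line and is followed by a colon, as in the sample text. It folds all labels into a single alternation rather than running one regex per tag.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical allowed-tag list: label as it appears in the text -> XML element name.
ALLOWED_TAGS = {
    "VERDICT": "verdict",
    "MR. FOREMAN": "foreman",
    "THE COURT": "court",
    "THE CLERK": "clerk",
    "SENTENCE": "sentence",
}

# One alternation over all labels, so each line is scanned once
# instead of once per tag.
LABEL_RE = re.compile(
    r"^(" + "|".join(re.escape(label) for label in ALLOWED_TAGS) + r"):\s*(.*)$"
)


def transcript_to_xml(text):
    """Turn 'LABEL: text' lines into child elements of a <transcript> root."""
    root = ET.Element("transcript")
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LABEL_RE.match(line)
        if match:
            # A known label starts a new element; the rest of the line is its text.
            current = ET.SubElement(root, ALLOWED_TAGS[match.group(1)])
            current.text = match.group(2).strip()
        elif current is not None:
            # A line without a known label is treated as a continuation
            # of the previous element (an assumption about the file layout).
            current.text = (current.text + " " + line).strip()
    return root


sample = """VERDICT:
MR. FOREMAN:  Guilty.
THE COURT:  You are remanded.
SENTENCE:
THE COURT:  John Doe.
"""

root = transcript_to_xml(sample)
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```

The resulting XML string can then be stored in the database or parsed further. Building the elements with `xml.etree.ElementTree` rather than concatenating strings means characters such as `&` or `<` in the transcript are escaped automatically.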