Is OpenNLP unable to identify dates of the format "January 10th, 2009"?

OpenNLP (in Java) fails to identify dates of the form "January 10th, 2010" or "January 10, 2010". When I replaced every "," in the text with an empty string "" before running the OpenNLP tokenizer, dates of the form "January 10, 2010" were identified correctly. I then tried replacing "th," with ",", but that did not work. How can I make sure that dates of both forms are identified by OpenNLP?

Thanks in advance


For an explanation of date finding and date formats, this newer post works well. It explains that the models are statistical, so they recognize a date from the context of the tokens around it.

For the "th" case above, as the comment says: if you want to remove both the "th" and the ",", you have to apply both replacements or, better yet, do a single replacement of "th," with an empty string.
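
The single-replacement idea could be sketched roughly as follows. This is only an illustration using the standard Java regex classes, not anything provided by OpenNLP: the class name `DateNormalizer`, the method `normalize`, and the pattern itself are assumptions made for this example. The pattern is restricted to a one- or two-digit day number so that a plain replacement of "th," does not also damage words such as "worth,".

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenNLP): strips the ordinal suffix
// and the trailing comma from day numbers before the text is tokenized.
public class DateNormalizer {

    // A 1-2 digit day number, an optional ordinal suffix (st, nd, rd, th),
    // then a comma that is followed by whitespace. The whitespace lookahead
    // keeps numbers such as "1,000" untouched.
    private static final Pattern DAY_WITH_COMMA =
            Pattern.compile("\\b(\\d{1,2})(?:st|nd|rd|th)?,(?=\\s)");

    // Returns the text with "10th," and "10," both reduced to "10".
    public static String normalize(String text) {
        return DAY_WITH_COMMA.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("We met on January 10th, 2010 in Boston."));
        System.out.println(normalize("We met on January 10, 2010 in Boston."));
    }
}
```

With this sketch, both "January 10th, 2010" and "January 10, 2010" become "January 10 2010", which is the form the question reports as working. The normalized string would then be passed to the OpenNLP tokenizer as before. Whether the model then finds the date still depends on the surrounding tokens.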
