Is OpenNLP unable to identify dates of the format "January 10th, 2009"?
OpenNLP (in Java) is unable to identify dates of the format "January 10th, 2010" or "January 10, 2010". I replaced every ',' in the text with an empty string "" before running the OpenNLP tokenizer, and dates of the form "January 10, 2010" are then identified correctly. So I tried replacing "th," with ",", but that did not work. How can I make sure that dates in both of the above forms are identified by OpenNLP?
Thanks in advance
For an explanation of date finding and formats, this newer post works well. It explains that the model is statistical, so it recognizes a date from the context of the tokens around it rather than by matching a fixed pattern.
For the "th" case above, as the comment says, replacing "th," with "," just gives you "January 10, 2010" again, which still contains the comma. If you want to remove both the "th" and the ",", you have to apply both replacements, or better yet do a single replacement of "th," with an empty string.
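As a rough sketch of that preprocessing step (not an OpenNLP feature), something like the following could be run on the text before it goes to the tokenizer. The class name `DateNormalizer` and the method `normalize` are made up for illustration, and the regular expressions are an assumption about which inputs matter: they strip the ordinal suffixes "st", "nd", "rd" and "th" after a digit, and drop a comma only when it sits between a number and a four-digit year.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: cleans up date-like text
// before it is handed to the tokenizer and name finder.
class DateNormalizer {

    // Ordinal suffix directly after a digit, e.g. the "th" in "10th".
    private static final Pattern ORDINAL_SUFFIX =
            Pattern.compile("(?<=\\d)(?:st|nd|rd|th)\\b");

    // Comma between a number and a four-digit year, e.g. "10, 2010".
    // A comma inside a number such as "1,000" is left alone.
    private static final Pattern DAY_YEAR_COMMA =
            Pattern.compile("(?<=\\d),(?=\\s+\\d{4}\\b)");

    static String normalize(String text) {
        String withoutSuffix = ORDINAL_SUFFIX.matcher(text).replaceAll("");
        return DAY_YEAR_COMMA.matcher(withoutSuffix).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("January 10th, 2010"));
        System.out.println(normalize("January 10, 2010"));
    }
}
```

Both inputs in `main` come out as "January 10 2010", the comma-free form that the question reports as being identified. Whether that form is recognized in your own text still depends on the model and the surrounding tokens.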