How to get POS tagging using Stanford Parser
I'm using the Stanford Parser to get the dependency relations between pairs of words, but I also need the POS tag of each word. However, ParseDemo.java only outputs the parse tree. I need each word's tag, like this:
My/PRP$ dog/NN also/RB likes/VBZ eating/VBG bananas/NNS ./.
not like this:
(ROOT
(S
(NP (PRP$ My) (NN dog))
(ADVP (RB also))
(VP (VBZ likes)
(S
(VP (VBG eating)
(S
(ADJP (NNS bananas))))))
(. .)))
Can anyone help? Thanks a lot.
If you're mainly interested in manipulating the tags in a program, and don't need the TreePrint
functionality, you can just get the tagged words as a List:
LexicalizedParser lp =
LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
Tree parse = lp.apply(Arrays.asList(sent));
List<TaggedWord> taggedWords = parse.taggedYield(); // one TaggedWord (word + tag) per token
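If all you have is the bracketed tree as text (for example, parser output saved to a file) rather than a Tree object, the word/tag pairs can still be recovered with plain Java. This is only a rough sketch: the class and method names are made up for illustration, it is not part of the Stanford API, and it assumes every leaf is printed as "(TAG word)" with literal parentheses in the text escaped (e.g. -LRB-).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Stanford class.
final class TreeTagExtractor {
    // A preterminal looks like "(TAG word)": two tokens with no nested parentheses.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^\\s()]+)\\s+([^\\s()]+)\\)");

    /** Returns the leaves of a bracketed tree as "word/TAG" tokens joined by spaces. */
    static String toWordsAndTags(String bracketedTree) {
        List<String> out = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(bracketedTree);
        while (m.find()) {
            // group(1) is the tag, group(2) is the word
            out.add(m.group(2) + "/" + m.group(1));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (. .)))";
        System.out.println(toWordsAndTags(tree)); // prints: My/PRP$ dog/NN also/RB ./.
    }
}
```

When you do have the Tree object, taggedYield() is the more direct route; the regex is only a fallback for text you cannot re-parse.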
When running edu.stanford.nlp.parser.lexparser.LexicalizedParser on the command line, you want to use:
-outputFormat "wordsAndTags"
Programmatically, use the TreePrint class constructed with formatString="wordsAndTags" and call printTree, like this:
TreePrint posPrinter = new TreePrint("wordsAndTags", yourPrintWriter);
posPrinter.printTree(yourLexParser.getBestParse());
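The wordsAndTags output is a single line of word/TAG tokens, like the example in the question. If another program then has to read that line back, something along these lines should work. It is a minimal sketch in plain Java: WordsAndTagsReader and read are hypothetical names, not part of the parser, and it assumes tokens are separated by whitespace and the tag follows the last slash.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Stanford class.
final class WordsAndTagsReader {
    /** Splits "word/TAG word/TAG ..." into (word, tag) pairs. */
    static List<Map.Entry<String, String>> read(String line) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // Split on the LAST slash so a slash inside the word does not break the pair.
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip empty tokens and tokens without a word or tag
            }
            pairs.add(new SimpleImmutableEntry<>(
                    token.substring(0, slash), token.substring(slash + 1)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> p : read("My/PRP$ dog/NN also/RB")) {
            System.out.println(p.getKey() + " -> " + p.getValue());
        }
    }
}
```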
String[] sent = { "This", "is", "an", "easy", "sentence", "." };
List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
Tree parse = lp.apply(rawWords);
ArrayList<TaggedWord> ar = parse.taggedYield();
System.out.println(ar);
The earlier answer is a bit outdated, so I decided to add my own. With Stanford Parser version 3.6.0 (Maven dependencies):
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-parser</artifactId>
<version>3.6.0</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.6.0</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.6.0</version>
<classifier>models</classifier>
</dependency>
private static MaxentTagger tagger = new MaxentTagger(MaxentTagger.DEFAULT_JAR_PATH);
public String getTaggedString(String someString) {
String taggedString = tagger.tagString(someString);
return taggedString;
}
This returns I_PRP claim_VBP the_DT rights_NNS
for the input 'I claim the rights'.
So if you want to detect verbs in a phrase using Java and the Stanford Parser, you can do this:
public boolean containsVerb(String someString) {
String taggedString = tagger.tagString(someString);
String[] tokens = taggedString.split(" ");
for (String tok : tokens){
int sep = tok.lastIndexOf('_'); // the tag follows the last underscore; avoids an index error on untagged tokens
if (sep >= 0 && tok.startsWith("VB", sep + 1)){
return true;
}
}
return false;
}
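If you need the verbs themselves rather than a yes/no answer, the same idea extends to collecting every word whose tag starts with a given prefix. A small sketch, assuming the word_TAG format shown above; TagFilter and wordsWithTagPrefix are names I made up, and the method takes the already-tagged string, so it runs without the tagger on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Stanford class.
final class TagFilter {
    /** Returns the words in a "word_TAG word_TAG ..." string whose tag starts with prefix. */
    static List<String> wordsWithTagPrefix(String tagged, String prefix) {
        List<String> words = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // The tag follows the last underscore, so words containing '_' still work.
            int sep = token.lastIndexOf('_');
            if (sep > 0 && token.startsWith(prefix, sep + 1)) {
                words.add(token.substring(0, sep));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String tagged = "I_PRP claim_VBP the_DT rights_NNS";
        System.out.println(wordsWithTagPrefix(tagged, "VB")); // prints: [claim]
    }
}
```

Passing "NN" instead of "VB" would pick out the nouns, since Penn Treebank noun tags (NN, NNS, NNP, NNPS) share that prefix.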