
Use of Stanford Parser in Web Service

I need to use the Stanford Parser in a web service. Since SentenceParser loads a large model, I will make sure it is a singleton, but in that case is it thread-safe? (No, according to http://nlp.stanford.edu/software/parser-faq.shtml.) How else could this be done efficiently? One option is locking the object while it is in use.

Any idea how the people at Stanford are doing this for http://nlp.stanford.edu:8080/parser/ ?


If contention is not a factor, locking (synchronization) would be one option, as you mentioned, and it might be good enough.
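For concreteness, here is a minimal sketch of the locking approach, assuming the SentenceParser wrapper mentioned in the question with a parse(String) method returning a String (both are assumptions, not the actual Stanford API):

```java
// Singleton holder that serializes access to the one shared parser instance.
// SentenceParser, its constructor argument, and parse(String) are assumed
// from the question; substitute your own wrapper around the Stanford Parser.
public class SynchronizedParser {
    private static final SentenceParser PARSER =
            new SentenceParser("englishPCFG.ser.gz"); // hypothetical model path

    public static synchronized String parse(String sentence) {
        // Only one request thread parses at a time; the others block here.
        return PARSER.parse(sentence);
    }
}
```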

If there is contention, however, I see three general options.

(1) instantiating it every time

Just instantiate the parser as a local variable every time you perform parsing. Local variables are trivially thread-safe. The instantiation is not free, of course, but it may be acceptable depending on the specific situation.
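A rough sketch of the per-request approach, again assuming the hypothetical SentenceParser wrapper with a parse(String) method:

```java
// Each request builds its own parser, so there is no shared mutable state
// to protect. The model-loading cost is paid on every call.
public String parseRequest(String sentence) {
    SentenceParser parser = new SentenceParser("englishPCFG.ser.gz"); // local, never shared
    return parser.parse(sentence);
}
```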

(2) using threadlocals

If instantiation turns out to be costly, consider using thread-locals. Each thread retains its own copy of the parser, and that instance is reused within the thread. Thread-locals are not without problems, however. First, a thread-local instance may not be garbage collected until it is set to null or the holding thread goes away, so there is a memory concern if there are too many of them. Second, beware of reuse: if the parser is stateful, you need to clean up and restore its initial state so that subsequent use of the thread-local instance does not suffer from side effects of previous use.
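A minimal sketch of the thread-local approach, with the same assumed SentenceParser wrapper:

```java
public class ParserHolder {
    // One parser per thread, created lazily on the thread's first use.
    private static final ThreadLocal<SentenceParser> PARSER =
            ThreadLocal.withInitial(() -> new SentenceParser("englishPCFG.ser.gz"));

    public static String parse(String sentence) {
        SentenceParser parser = PARSER.get();
        // If the parser keeps state between calls, reset it here before reuse.
        return parser.parse(sentence);
    }
}
```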

(3) pooling

Pooling is generally no longer recommended, but if the objects are truly large and you need a hard limit on the number of live instances, then an object pool might be the best option.
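A simple pool can be built on a blocking queue from java.util.concurrent; the sketch below again assumes the hypothetical SentenceParser wrapper and caps the number of parser instances:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ParserPool {
    private final BlockingQueue<SentenceParser> pool;

    public ParserPool(int size) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new SentenceParser("englishPCFG.ser.gz")); // assumed wrapper
        }
    }

    public String parse(String sentence) throws InterruptedException {
        SentenceParser parser = pool.take();   // blocks when all instances are busy
        try {
            return parser.parse(sentence);
        } finally {
            pool.put(parser);                  // always return the instance to the pool
        }
    }
}
```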


I don't know how the people at Stanford have implemented their service, but I would build such a service on a messaging framework, such as http://www.rabbitmq.com/. Your front-end service receives documents and uses a message queue to communicate (store documents and retrieve results) with several workers that do the NLP parsing. After finishing processing, a worker puts its result on a queue that the front-end service consumes. This architecture lets you dynamically add workers under high load, which matters because NLP parsing takes time, up to several seconds per document.
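A worker in such a setup could look roughly like the sketch below, using the RabbitMQ Java client (amqp-client 5.x). The queue names and the SentenceParser wrapper with its parse(String) method are assumptions for illustration, not how the Stanford service actually works:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;

public class ParserWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Hypothetical queue names: documents in, parse results out.
        channel.queueDeclare("parse_requests", true, false, false, null);
        channel.queueDeclare("parse_results", true, false, false, null);
        channel.basicQos(1); // hand this worker one unacknowledged document at a time

        // One parser per worker process; you scale by starting more workers,
        // not more threads sharing the same instance.
        SentenceParser parser = new SentenceParser("englishPCFG.ser.gz"); // assumed wrapper

        DeliverCallback onDocument = (consumerTag, delivery) -> {
            String document = new String(delivery.getBody(), StandardCharsets.UTF_8);
            String result = parser.parse(document); // assumed wrapper API
            channel.basicPublish("", "parse_results", null,
                    result.getBytes(StandardCharsets.UTF_8));
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("parse_requests", false, onDocument, consumerTag -> {});
    }
}
```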
