How to write these patterns?

1) [ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ] to/TO issue/VB [ new/JJ debt/N$ obligations/NNS ] of/IN [ any/DT kind/NN ] [ the/DT Treasury/NNP ] said/VBD...

How do we get DT$, VBZ, RB, DT, NN..., i.e. the part between '/' and the following space?

2) This is the tagset for the Brown corpus. Is there a pattern that covers all the tags listed at this link: http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html

Can 1) and 2) be combined into one pattern?
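One way to approach 2) is to list the tags explicitly and join them into a single alternation. The sketch below is in Python; the regex syntax it uses should carry over to PCRE-style engines, but it has not been tested with TPerlRegEx. `KNOWN_TAGS` and `build_tag_pattern` are hypothetical names, and the list holds only the tags that appear in the sample sentence, not the full Brown tagset from the linked page.

```python
import re

# Hypothetical subset: only the tags that occur in the sample sentence above.
# The full tag list from the linked Brown tagset page would go here instead.
KNOWN_TAGS = ["DT$", "NN", "NNS", "NNP", "VBZ", "VBD", "VB",
              "RB", "TO", "JJ", "IN", "DT"]


def build_tag_pattern(tags):
    """Compile a regex that matches '/TAG' only when TAG is in the list."""
    # Longest first, so that e.g. "NNS" is tried before "NN".
    alternatives = sorted(set(tags), key=len, reverse=True)
    # re.escape turns "$" into "\$", so it is matched literally.
    body = "|".join(re.escape(t) for t in alternatives)
    # Group 1 is the tag; it must be followed by whitespace or end of string.
    return re.compile(r"/(" + body + r")(?=\s|$)")


pattern = build_tag_pattern(KNOWN_TAGS)
print(pattern.findall("[ the/DT$ government/NN ] has/VBZ n't/RB"))
```

Tags that are not in the list (for example `N$` in `debt/N$`) are simply skipped, which is the main difference from a generic "uppercase letters plus optional $" pattern.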

We are new to regex, so any help is appreciated. Thank you very much.

Edit: 1) We want to extract the part between / and the following space. For example, the line below is a section from a tagged corpus; we want to extract only the tags, not the words/tokens. A tag consists of uppercase letters, optionally followed by $, as shown below. Is the question clear now? The tag rule is:

one or more uppercase letters, optionally followed by $

[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ]...

How can we write a pattern that extracts only DT$, NN, VBZ, RB, DT, NN...?

In other words, we want the part between / and the following space.
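Taking "the part between / and the following space" literally, a minimal Python sketch is to capture every run of non-whitespace characters after a slash. `tags_after_slash` is a hypothetical helper name; unlike a tag-specific pattern, this version makes no assumption about what a tag looks like.

```python
import re


def tags_after_slash(text):
    """Return every run of non-space characters that follows a '/'."""
    # \S+ stops at the first whitespace, so each match is the text
    # between '/' and the next space (or the end of the string).
    return re.findall(r"/(\S+)", text)


sample = "[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ]"
print(tags_after_slash(sample))
```

A word that itself contains a slash would give a wrong result, since everything after its first slash would be captured as the "tag".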

We are using the TPerlRegEx wrapper, which supports most functions and patterns. The regex may be something like /\w+|$, but we are not sure.

We are not sure whether we have made this clear.


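I think you should use this: "/[A-Z]+\$?\ " (without the quotes, of course). Note that this also matches the slash and the trailing space; to get only the tag, put it in a capture group, "/([A-Z]+\$?)\ ", and read group 1.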
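A quick Python check of the pattern from the answer, with the tag in a capture group so that `findall` returns it without the `/` and the space. One adjustment is not in the original answer: the trailing `\ ` is replaced by the lookahead `(?=\s|$)`, so the last tag in a string (here `VBD`) is still found when no space follows it. `TAG_RE` is just an illustrative name.

```python
import re

# The answer's pattern with the tag itself in capture group 1.
# Adjustment (assumption): "\ " replaced by a lookahead, so a tag at the
# very end of the string, with no space after it, is still matched.
TAG_RE = re.compile(r"/([A-Z]+\$?)(?=\s|$)")

sample = ("[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ] "
          "to/TO issue/VB [ new/JJ debt/N$ obligations/NNS ] of/IN "
          "[ any/DT kind/NN ] [ the/DT Treasury/NNP ] said/VBD")

# With one group in the pattern, findall returns only the group's text.
tags = TAG_RE.findall(sample)
print(tags)
```

Because the lookahead does not consume the space, consecutive tags separated by a single space are all found in one `findall` pass.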
