Perl: How can I split these texts to extract the required info? [closed]

2023-03-11 18:32 Q&A author:

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center. Closed 11 years ago.

EDITED/Shortened VERSION

I have two texts, which come from two files that I have to loop through (you can ignore my variables). Here is a sample from each:

Tagged:

5.4_CD Passive_NNP Processes_NNP of_IN Membrane_NNP Transport_NNP 85_CD We_PRP have_VBP examined_VBN membrane_NN structure_NN and_CC how_WRB it_PRP is_VBZ used_VBN to_TO perform_VB one_CD membrane_NN function_NN :_: the_DT binding_JJ of_IN one_CD cell_NN to_TO another_DT ._.

Desired output:

5.4 Passive Processes of Membrane Transport 85 We have examined membrane stru....

Parsed:

   Parsing [sent. 1 len. 31]:
        nsubj(85-7, Processes-3)
        nn(Transport-6, Membrane-5)
        prep_of(Processes-3, Transport-6)
        nsubj(examined-10, We-8)
        nsubjpass(used-17, it-15)
        xsubj(perform-19, it-15)
        conj_and(examined-10, used-17)
        xcomp(used-17, perform-19)
        dobj(perform-19, function-22)
        prep_of(binding-25, cell-28) <- refer to this for examples below

Desired output:

the sentence number (i.e. sent. 1)
the grammar function (i.e. prep_of)
the first dependency word (i.e. binding)
the second dependency word (i.e. cell)

QUESTION

How can I split/substitute these to get my desired output, so that each extracted word keeps a word boundary at its beginning and end (i.e. =~ /\bword\b/ should still match)?

THANKS a lot for taking your time to read this! Any advice is appreciated!

Well, I have difficulty understanding even your revised question. Since I skipped your earlier questions because I could not tell what you wanted, I thought I would share a better way to explain the problem. You would be well advised to skip the background material and just break the problem down into:

my @subsentences = ("5.4_CD Passive_NNP Processes_NNP","85_CD We_PRP have_VBP examined_VBN membrane_NN");
foreach my $sub (@subsentences) {
  # splits on the tags only, so the space before each following word stays attached
  my @final = split(/_\S+/,$sub);
  print join(",",@final)."\n";
}

Expected output: ("5.4", "Passive", "Processes") and ("85", "We", "have", "examined", "membrane").

The sad thing is, I cannot even tell if my guess about what you might mean in this ONE example is correct (might you have meant @subsentence = qw(5.4_CD Passive_NNP Processes_NNP) instead? or something else?). Repeat for each example. Assuming I guessed correctly, the regex you want in this example is:

@finalsentence = split(/_\S+(?:\s+|$)/,$subsentences[$j]);

Or the equally valid(?)

@finalsentence = grep(s/_\S+//||1,split(/\s+/,$subsentences[$j]));
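To compare the two approaches, here is a minimal sketch that runs both on a shortened copy of the tagged sample from the question. The variable names ($tagged, @by_split, @by_map) are mine, not the asker's, and the second variant uses map with a copy of each token instead of grep, which I find easier to read. It assumes every token has the form word_TAG and that the word itself contains no underscore.

use strict;
use warnings;

# Shortened copy of the tagged sample from the question.
my $tagged = '5.4_CD Passive_NNP Processes_NNP of_IN Membrane_NNP Transport_NNP 85_CD We_PRP';

# Variant 1: split on each "_TAG" together with the whitespace (or end of string) after it.
my @by_split = split /_\S+(?:\s+|$)/, $tagged;

# Variant 2: split on whitespace first, then strip the "_TAG" suffix from a copy of each token.
my @by_map = map { (my $word = $_) =~ s/_\S+$//; $word } split /\s+/, $tagged;

print join(' ', @by_split), "\n";
print join(' ', @by_map), "\n";

Both lines should print 5.4 Passive Processes of Membrane Transport 85 We. If a word can itself contain an underscore, both patterns would cut it at the first underscore, so the regex would need tightening for that case.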

I think we have discovered that the actual question he wanted to ask was:

@subs = qw(5.4_CD Passive_NNP Processes_NNP);
Expected output: qw(5.4 Passive Processes)

If my revised understanding is correct, the following will do what you want:

my @subs = qw(5.4_CD Passive_NNP Processes_NNP);
my @final = @subs;
# strip the "_TAG" suffix from each element of the copy, in place
s/_\S+// for @final;
print join(",",@final)."\n";
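The answer above only covers the tagged text. For the parsed text, a similar regex approach might work. The following is a rough sketch, assuming each dependency line looks like relation(word-N, word-N) and each sentence header contains "[sent. N". The sample lines are copied from the question; the names @lines, $sent and @records are made up for illustration.

use strict;
use warnings;

# A few lines copied from the "Parsed" sample in the question.
my @lines = (
    '   Parsing [sent. 1 len. 31]:',
    '        nsubj(85-7, Processes-3)',
    '        prep_of(binding-25, cell-28)',
);

my $sent;       # number of the sentence currently being read
my @records;    # one [sentence, relation, first word, second word] per dependency

for my $line (@lines) {
    if ($line =~ /\[sent\.\s*(\d+)/) {
        # header line: remember the sentence number
        $sent = $1;
    }
    elsif ($line =~ /^\s*(\w+)\((\S+)-\d+,\s*(\S+)-\d+\)/) {
        # dependency line: relation, then the two words without their "-N" index
        push @records, [$sent, $1, $2, $3];
    }
}

print join("\t", @$_), "\n" for @records;

For the last sample line this yields sentence 1, relation prep_of, first word binding, second word cell. The captured words carry no index or punctuation, so a later =~ /\bword\b/ test against them should behave as the question asks.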
