
Perl: How can I split these texts to extract the required info? [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center. Closed 11 years ago.

EDITED/Shortened VERSION

I have two texts, which come from two files that I have to loop through (you can ignore my variables). Here is a sample from each:

Tagged:

5.4_CD Passive_NNP Processes_NNP of_IN Membrane_NNP Transport_NNP 85_CD We_PRP have_VBP examined_VBN membrane_NN structure_NN and_CC how_WRB it_PRP is_VBZ used_VBN to_TO perform_VB one_CD membrane_NN function_NN :_: the_DT binding_JJ of_IN one_CD cell_NN to_TO another_DT ._.

Desired output:

5.4 Passive Processes of Membrane Transport 85 We have examined membrane stru....
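A minimal sketch of this tag stripping for a whole tagged line. It assumes each token is a word followed by exactly one _TAG suffix whose tag contains no underscore, which holds for the Penn Treebank tags in the sample. The helper name strip_tags is hypothetical, and the /r modifier needs Perl 5.14 or later.

```perl
use 5.014;    # the /r substitution modifier needs Perl 5.14 or later
use strict;
use warnings;

# Hypothetical helper: drop the trailing _TAG from each whitespace-separated token.
sub strip_tags {
    my ($tagged) = @_;
    # Only the last underscore and what follows it are removed, so a word that
    # itself contains an underscore keeps everything before its tag.
    my @words = map { s/_[^_]+\z//r } split ' ', $tagged;
    return join ' ', @words;
}

my $tagged = '5.4_CD Passive_NNP Processes_NNP of_IN Membrane_NNP Transport_NNP 85_CD We_PRP have_VBP';
my $plain  = strip_tags($tagged);
print "$plain\n";    # prints "5.4 Passive Processes of Membrane Transport 85 We have"
```

Anchoring on the last underscore, rather than the first as _\S+ does, is a design choice. It also turns punctuation tokens such as :_: and ._. into : and . without any special handling.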

Parsed:

   Parsing [sent. 1 len. 31]:
        nsubj(85-7, Processes-3)
        nn(Transport-6, Membrane-5)
        prep_of(Processes-3, Transport-6)
        nsubj(examined-10, We-8)
        nsubjpass(used-17, it-15)
        xsubj(perform-19, it-15)
        conj_and(examined-10, used-17)
        xcomp(used-17, perform-19)
        dobj(perform-19, function-22)
        prep_of(binding-25, cell-28) <- refer to this for examples below

Desired output:

  • the sentence number (e.g. sent. 1)
  • the grammatical function (e.g. prep_of)
  • the first dependency word (e.g. binding)
  • the second dependency word (e.g. cell)
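A sketch of extracting these four fields from the parsed output. It assumes every dependency line has the form relation(governor-index, dependent-index) and that a "Parsing [sent. N ...]" header precedes each sentence's dependencies. Any line not in exactly that form is skipped. The sample lines are copied from the question, and the names @lines and @records are hypothetical.

```perl
use strict;
use warnings;

# Sample input copied from the question; in practice these lines come from the parsed file.
my @lines = (
    '   Parsing [sent. 1 len. 31]:',
    '        nsubj(85-7, Processes-3)',
    '        prep_of(Processes-3, Transport-6)',
    '        prep_of(binding-25, cell-28) <- refer to this for examples below',
);

my $sent;       # current sentence number, updated on each "Parsing [sent. N" header
my @records;    # one [sentence, relation, first word, second word] entry per dependency

for my $line (@lines) {
    if ($line =~ /^\s*Parsing \[sent\. (\d+)/) {
        $sent = $1;
        next;
    }
    # relation(word-index, word-index); the non-greedy (.+?) lets a hyphenated
    # word such as "well-known-4" keep its internal hyphen.
    if ($line =~ /^\s*(\w+)\((.+?)-\d+,\s*(.+?)-\d+\)/) {
        push @records, [ $sent, $1, $2, $3 ];
    }
}

for my $r (@records) {
    print join("\t", "sent. $r->[0]", @{$r}[1 .. 3]), "\n";
}
```

Because the captured words exclude the -index suffix, each one can then be matched with \bword\b directly.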

QUESTION

How can I split/substitute these to get my desired output, so that each word keeps a word boundary at its beginning and end (i.e. =~ /\bword\b/ should match)?

THANKS a lot for taking the time to read this! Any advice is appreciated!
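One detail behind the \bword\b requirement: in Perl regexes the underscore is a word character (\w matches letters, digits and _). So \bmembrane\b does not match inside the tagged token membrane_NN, because there is no boundary between "e" and "_". It only matches once the tag has been removed. A small sketch, using a fragment of the sample sentence:

```perl
use strict;
use warnings;

my $tagged = 'examined_VBN membrane_NN structure_NN';
(my $plain = $tagged) =~ s/_\S+//g;    # strip every _TAG suffix

# "_" counts as a word character, so there is no \b between "membrane" and "_NN".
my $in_tagged = $tagged =~ /\bmembrane\b/ ? 1 : 0;
my $in_plain  = $plain  =~ /\bmembrane\b/ ? 1 : 0;

print "tagged: $in_tagged, plain: $in_plain\n";    # prints "tagged: 0, plain: 1"
```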


Well, I have difficulty understanding even your revised question. Since I have skipped your earlier questions because I could not tell what you wanted, I thought I would offer some advice on explaining the problem more clearly. You would be well advised to skip the background material and break the problem down into a minimal input and its expected output, like this:

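use strict;
use warnings;

my @subsentences = ("5.4_CD Passive_NNP Processes_NNP", "85_CD We_PRP have_VBP examined_VBN membrane_NN");
foreach my $sub (@subsentences) {
  # Splitting only on the tag leaves a leading space on every word after the first.
  my @final = split(/_\S+/, $sub);
  print join(",", @final) . "\n";
}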

Expected output: ("5.4", "Passive", "Processes") and ("85", "We", "have", "examined", "membrane").

The sad thing is, I cannot even tell whether my guess about what you mean in this ONE example is correct (might you have meant @subsentence = qw(5.4_CD Passive_NNP Processes_NNP) instead? Or something else?). Repeat this breakdown for each of your examples. Assuming I guessed correctly, the regex you want in this example is:

@finalsentence = split(/_\S+(?:\s+|$)/, $subsentences[$j]);

Or the equally valid(?)

@finalsentence = grep(s/_\S+// || 1, split(/\s+/, $subsentences[$j]));
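A quick way to check that the two one-liners above agree is to run them on the first sample string in place of $subsentences[$j]. The third variant, map with s///r, is not from this answer. It is a non-destructive alternative that needs Perl 5.14 or later and avoids relying on grep to modify $_ in place.

```perl
use 5.014;    # for the /r modifier in the third variant (also enables strict)
use warnings;

my $sub = '5.4_CD Passive_NNP Processes_NNP';

# 1. Split on the tag plus any following whitespace or the end of the string.
my @by_split = split /_\S+(?:\s+|$)/, $sub;

# 2. Split on whitespace, then strip the tag from each token in place;
#    "|| 1" keeps a token even when the substitution made no change.
my @by_grep = grep { s/_\S+// || 1 } split /\s+/, $sub;

# 3. Non-destructive variant: s///r returns a modified copy of each token.
my @by_map = map { s/_\S+//r } split ' ', $sub;

print join(',', @by_split), "\n";
print join(',', @by_grep), "\n";
print join(',', @by_map), "\n";
```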

I think we have discovered that the question he actually wanted to ask was:

@subs = qw(5.4_CD Passive_NNP Processes_NNP);
Expected output: qw(5.4 Passive Processes)

If my revised understanding is correct, the following will do what you want:

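use strict;
use warnings;

my @subs = qw(5.4_CD Passive_NNP Processes_NNP);
my @final = @subs;                # copy, so @subs keeps the tagged words
s/_\S+// for @final;              # strip the _TAG suffix from each element in place
print join(",", @final) . "\n";   # prints 5.4,Passive,Processes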