
How do I convert text into XML using Perl?

The input text file contains the following:

....    
    ponies B-pro        
    were I-pro        
    used I-pro    
    A O        
    report O        
    of O    
    indirect B-cd        
    were O
    . O    
...

The desired output XML file:

<sen> 
 <base id="pro">
  <w id="1">ponies</w>
  <w id="2">were</w>
  <w id="3">used</w>
 </base>A report of 
 <base id="cd">indirect</base> were 
</sen>

I want to make an XML file by reading the text file. "B-" marks the beginning of a base tag, "I-" marks a word included inside that tag, and "O" means the word is outside any base tag, so it appears only inside the <sen> tag.

I tried the following code:

#!/usr/local/bin/perl -w    
open(my $f, "input.txt") or die "Can't";    
open(my $o, ">output.xml") or die "Can't";    
my $c;   

sub read_line {
  my $fh = shift;
  if ($fh and my $line = <$fh>) {
    chomp($line);
    my @words = split(/\t/, $line);
    my $word  = $words[0];
    my $group = $words[1];
    if ($word eq ".") {
      return;
    }
    else {
      if ($group ne 'O') {
        my @b = split(/\-/, $group);
        if ($b[0] eq 'B') {
          my $e = "<e id=\"";
          $e .= $b[1] . "\">";
          $e .= $word . "</e>";
          return $e;
        }
        if ($b[0] eq 'I') {
          my $w = "<w id=\"";
          $w .= $c . "\">";
          $w .= $word . "</w>";
          $c++;
          return $w;
        }
      }
      else {
        $c = 2;
        return $word;
      }
    }
  }
  return;
}

sub get_text() {
  my $txt = "";
  my $r = read_line($f);
  while ($r) {
    if ($r =~ m/[[:punct:]]/) {
      chop($txt);
      $txt .= " " . $r . " ";
    }
    else {
      $txt .= $r . " ";
    }
    $r = read_line($f);
  }
  chop($txt);
  return "<sen>" . $txt . ".</sen>";
}

Instead, I am getting this output:

<sen> 
 <base id="pro"> ponies </base>
  <w id="2">were</w>
  <w id="3">were</w>
 A report of 
 <base id="cd">indirect</base> were 
</sen>

I really need help.

Thanks


Writing XML "by hand" will only get you in trouble. Use a module from CPAN.

In your case, I would first put the data into a proper Perl data structure (perhaps a hash containing some arrays, or something similar) and then use a module (e.g. XML::Simple, for starters) to write it to a file.
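As a rough illustration of the "data structure first" step, here is a minimal sketch in core Perl. The function name parse_chunks and the chunk layout (a list of hashes with an id and a words array) are my own assumptions, not something from the question; it also assumes each line is a word followed by whitespace and a tag.

```perl
use strict;
use warnings;

# Hypothetical helper: group "word tag" lines into chunks.
# A B-/I- run becomes { id => 'pro', words => [...] };
# a run of O words becomes { id => undef, words => [...] }.
sub parse_chunks {
    my @lines = @_;
    my @chunks;
    for my $line (@lines) {
        my ($word, $tag) = split ' ', $line;   # ' ' also strips leading whitespace
        next unless defined $tag;              # skip blank or malformed lines
        if ($tag =~ /^B-(.+)$/) {
            # B- always opens a new tagged chunk
            push @chunks, { id => $1, words => [$word] };
        }
        elsif ($tag =~ /^I-/ && @chunks && defined $chunks[-1]{id}) {
            # I- continues the tagged chunk that is currently open
            push @{ $chunks[-1]{words} }, $word;
        }
        else {
            # O word (or a stray I- with no open chunk): plain text
            push @chunks, { id => undef, words => [] }
                unless @chunks && !defined $chunks[-1]{id};
            push @{ $chunks[-1]{words} }, $word;
        }
    }
    return @chunks;
}

my @chunks = parse_chunks(
    "ponies B-pro", "were I-pro", "used I-pro",
    "A O", "report O", "of O",
    "indirect B-cd", "were O", ". O",
);

for my $chunk (@chunks) {
    my $label = defined $chunk->{id} ? $chunk->{id} : 'text';
    print "$label: @{ $chunk->{words} }\n";
}
```

Once the words are grouped like this, the numbering of the <w> elements no longer depends on a global counter, which looks like the source of the misplaced tags in the question.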


As Javs said, you want to use a module rather than do this by hand. For your purposes, since you have mixed content, I recommend XML::LibXML. Here is an example I made to test that you can indeed do mixed content like you've got:

use XML::LibXML;

my $doc = XML::LibXML::Document->new();

my $root = $doc->createElement('html');
$doc->setDocumentElement($root);
my $body = $doc->createElement('body');
$root->appendChild($body);

my $link = $doc->createElement('a');
$link->setAttribute('href', 'http://google.com');
$link->appendText('Google');
$body->appendChild($link);

$body->appendText('Inline Text');

print $doc->toString;
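The same calls can be applied to the B-/I-/O input from the question. The following is only a sketch under a few assumptions: the function name build_sentence is made up for illustration, each line is taken to be a word followed by whitespace and a tag, and every word inside a base element is wrapped in <w>, even when the base holds a single word (the desired output in the question leaves that case unwrapped).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: build a <sen> document from "word tag" lines.
sub build_sentence {
    my @lines = @_;
    my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $sen = $doc->createElement('sen');
    $doc->setDocumentElement($sen);

    my ($base, $count);   # currently open <base> element and its word counter
    for my $line (@lines) {
        my ($word, $tag) = split ' ', $line;
        next unless defined $tag;              # skip blank or malformed lines
        if ($tag =~ /^B-(.+)$/) {
            # B- opens a new <base id="..."> and restarts the numbering
            $base = $doc->createElement('base');
            $base->setAttribute('id', $1);
            $sen->appendChild($base);
            $count = 0;
        }
        elsif ($tag !~ /^I-/ || !$base) {
            # O word (or stray I-): close any open base, emit plain text
            undef $base;
            $sen->appendText("$word ");
            next;
        }
        # B- and I- words both end up as numbered <w> children of the base
        my $w = $doc->createElement('w');
        $w->setAttribute('id', ++$count);
        $w->appendText($word);
        $base->appendChild($w);
    }
    return $doc;
}

my $doc = build_sentence(
    "ponies B-pro", "were I-pro", "used I-pro",
    "A O", "report O", "of O",
    "indirect B-cd", "were O", ". O",
);
print $doc->toString;
```

Because the <w> elements are appended to the open base node rather than concatenated as strings, they cannot end up after the closing </base> tag as they do in the question's output.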