How do I edit an XML file with Perl?
I have a movie collection catalogue with local links to folders and files for easy access. I recently reorganized my entire hard disk, so I need to update the links, and I'm trying to do that automatically with Perl.
I can export the data to an XML file and import it again. I can extract the new filepaths with File::Find, but I'm stuck on two problems. I have no idea how to connect the $title from the new filepath with the corresponding $title from the XML file. I'm also dealing with such files for the first time and don't know how to proceed with the replacement process. Here is what I've done so far:
use strict;
use warnings;
use File::Basename;
use File::Find;
use File::Spec;
use XML::Simple;
use Data::Dumper;
my $dir_target = 'D:/Movies/';
my %titles_locations = ();
find(\&file_handler, $dir_target);
sub file_handler {
/\.iso$/ or return;
my $fn = $File::Find::name;
$fn =~ s/\//\\/g;
$fn =~ /(.*\\)(.*)/;
my $path = $1;
my $filename = $2;
my $title = (File::Spec->splitdir($fn))[2];
$title =~ s/(.*?)\s\(\d+\)$/$1/;
$title =~ s/~/:/;
$title =~ s/`/?/;
my $link_local = '<link><description>Folder</description><url>'.$path.'</url><urltype>Movie</urltype></link><link><description>'.$filename.'</description><url>'.$fn.'</url><urltype>Movie</urltype></link>' unless $title eq '';
$titles_locations{$title} = {'filename'=>$filename, 'path'=>$path };
}
my $xml_in = XMLin('somepath/test.xml', ForceArray => 1, KeepRoot => 1);
my $title = {'key1' => 'title', 'key2' => 'links'};
foreach my $link (keys %$title) {
}
print Data::Dumper->Dump([$title]);
my $xml_out = XMLout($xml_in, OutputFile => 'somepath/test_out.xml', KeepRoot=>1);
And here is a snippet of the data I need to edit. IMDB and DVD Empire links should be left untouched. Local links should be replaced if they exist and inserted otherwise. I'm willing to complete the code myself but need some direction on how to proceed. Thanks.
<title>$title</title>
.......
<links>
<link>
<description>IMDB</description>
<url>http://www.imdb.com/title/VARIABLE</url>
<urltype>URL</urltype>
</link>
<link>
<description>DVD Empire</description>
<url>http://www.dvdempire.com/VARIABLE</url>
<urltype>URL</urltype>
</link>
<link>
<description>Folder</description>
<url>OLD_FOLDERPATH</url>
<urltype>Movie</urltype>
</link>
<link>
<description>OLD_FILENAME</description>
<url>OLD_FILENAMEPATH</url>
<urltype>Movie</urltype>
</link>
</links>
Get rid of XML::Simple and use XML::Twig which is made just for this sort of task. The traversal and element operations are built into Twig. There is a lot less to think about when Twig does most of the work.
As far as connecting old paths to new paths, there's not much to go on with the data that you have. If the files kept the same names and only moved to different folders, you could match the old and new paths by filename, provided the filenames are unique. Here's everything except getting all of the new paths to populate %new_paths:
#!perl
use File::Basename qw(basename);
use XML::Twig;
my %new_paths = (
# filename => new_path
...
);
my $twig = XML::Twig->new(
twig_handlers =>
{
link => \&rewrite_link,
},
pretty_print => 'indented',
);
$twig->parse( *DATA );
$twig->flush;
sub rewrite_link
{
my( $link ) = $_;
return unless $link->field( 'urltype' ) eq 'Movie';
# this is from the old file
my $basename = basename( $link->field( 'url' ) );
unless( exists $new_paths{ $basename } )
{
warn "Didn't find a new location for $basename!\n";
return;
}
$link->first_child( 'url' )->set_text( $new_paths{ $basename } );
}
__END__
<titles>
<entry>
<title>$title</title>
<links>
<link>
<description>IMDB</description>
<url>http://www.imdb.com/title/VARIABLE</url>
<urltype>URL</urltype>
</link>
<link>
<description>DVD Empire</description>
<url>http://www.dvdempire.com/VARIABLE</url>
<urltype>URL</urltype>
</link>
<link>
<description>Folder</description>
<url>OLD_FOLDERPATH</url>
<urltype>Movie</urltype>
</link>
<link>
<description>OLD_FILENAME</description>
<url>OLD_FILENAMEPATH</url>
<urltype>Movie</urltype>
</link>
</links>
</entry>
</titles>
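The %new_paths hash left unfilled above could be populated with File::Find, along these lines. This is only a sketch under a few assumptions: collect_new_paths is a hypothetical helper name, only .iso files are collected (as in the question), file names are assumed to be unique across the tree, and paths are converted to backslashes the way the question's code does.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of the answer above): map each .iso file
# name found under $root to its full path, using backslashes as the
# question's code does.
sub collect_new_paths {
    my ($root) = @_;
    my %new_paths;

    find(
        sub {
            # Inside the callback $_ is the bare file name.
            return unless -f $_ && /\.iso\z/i;

            if ( exists $new_paths{$_} ) {
                warn "Duplicate file name $_, keeping the first one found\n";
                return;
            }

            ( my $path = $File::Find::name ) =~ tr{/}{\\};
            $new_paths{$_} = $path;
        },
        $root
    );

    return %new_paths;
}

# 'D:/Movies/' is the directory from the question; skipped if absent.
my $root      = 'D:/Movies/';
my %new_paths = -d $root ? collect_new_paths($root) : ();
printf "%s => %s\n", $_, $new_paths{$_} for sort keys %new_paths;
```

Keying on the bare file name matches the basename() lookup in rewrite_link. If two movies can share a file name, a different key (for example the title folder) would be needed.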
I'll provide a plausible approach - please comment if you'd like it fleshed out more.
- Declare a hash my %titles_locations = (); at the beginning.
- You should move your XML handling out of sub a (and please call it something readable, like sub file_handler :)
- What the file handler should do is:
  - Build the $title and $link_local as you do now.
  - Store them in the %titles_locations hash, with $title being the key and the value a hashref containing {'filename'=>$filename, 'path'=>$path }
- Now, in your code, after calling find(), you will call XMLin.
- $xml_in should become an array of hashrefs (or a hashref mapping your "root" key to an array of hashrefs). Each hashref in the array will represent 1 title.
- After that, you will loop over that arrayref of titles.
- Each element (call it $title) of the arrayref will be a hashref with 2 keys, "title" and "links".
- From the value of the "title" key, find the new path and filename in the %titles_locations hash.
- The value of the "links" key will be a hashref mapping "link" to an array of hashrefs. I won't bother detailing the data structure here, but it's trivial to see it by printing Data::Dumper->Dump([$title]);
- You will then loop over those link hashrefs. For each of them (call it $link):
  - If $link->{urltype} ne "Movie", leave it alone (next;)
  - If $link->{description} eq "Folder", replace the $link->{url} value with the new path you found in the %titles_locations hash.
  - Else, it's a file: replace the $link->{url} value with the new filepath you found in the %titles_locations hash.
  - Maybe add some error handling if $title is not in the %titles_locations hash.
- After all the looping is done, simply take your $xml_in (which now contains the updated info) and pass it to XMLout()
- DONE
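The steps above can be sketched roughly as follows. It is an illustration under several assumptions, not a finished solution: update_local_links is a hypothetical helper name, the root element is assumed to be <titles> with one <entry> per movie (the question does not show the root), and the sample data is made up. Note that with ForceArray => 1 every value is wrapped in an array reference, so the fields are read as $link->{urltype}[0] rather than $link->{urltype}. Inserting missing local links is not covered here.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: rewrite the local ("Movie") links in a structure
# produced by XMLin(..., ForceArray => 1, KeepRoot => 1).
# Assumes a <titles> root holding <entry> elements.
sub update_local_links {
    my ( $xml_in, $titles_locations ) = @_;

    my $root = $xml_in->{titles};
    $root = $root->[0] if ref $root eq 'ARRAY';

    for my $entry ( @{ $root->{entry} || [] } ) {
        my $name = $entry->{title}[0];
        my $new  = defined $name ? $titles_locations->{$name} : undef;
        unless ($new) {
            warn "No new location found for a title, skipping it\n";
            next;
        }

        for my $link ( @{ $entry->{links}[0]{link} || [] } ) {
            # IMDB, DVD Empire and other non-local links stay untouched.
            next unless ( $link->{urltype}[0] // '' ) eq 'Movie';

            if ( ( $link->{description}[0] // '' ) eq 'Folder' ) {
                $link->{url}[0] = $new->{path};
            }
            else {
                # File link: the description holds the file name.
                $link->{description}[0] = $new->{filename};
                $link->{url}[0]         = $new->{path} . $new->{filename};
            }
        }
    }
    return $xml_in;
}

# Made-up sample data, for illustration only.
my %titles_locations = (
    'Some Movie' =>
      { filename => 'movie.iso', path => 'D:/Movies/Some Movie (1999)/' },
);

my $sample = <<'XML';
<titles>
  <entry>
    <title>Some Movie</title>
    <links>
      <link><description>IMDB</description><url>http://www.imdb.com/title/VARIABLE</url><urltype>URL</urltype></link>
      <link><description>Folder</description><url>OLD_FOLDERPATH</url><urltype>Movie</urltype></link>
      <link><description>OLD_FILENAME</description><url>OLD_FILENAMEPATH</url><urltype>Movie</urltype></link>
    </links>
  </entry>
</titles>
XML

my $xml_in = XMLin( $sample, ForceArray => 1, KeepRoot => 1, KeyAttr => [] );
update_local_links( $xml_in, \%titles_locations );
print XMLout( $xml_in, KeepRoot => 1 );
```

KeyAttr => [] is passed so that XML::Simple does not fold any lists into hashes, which keeps the structure predictable for the loops.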