Mapping names to filenames in Perl
I am confused now. Here is my problem: I have a text file in this format:
Tom //name
Washington
account.txt //filename
Gary //NAME
New York
accountbalance.png //filename
Mary //name
New Jersey
Michelle //NAME
Larry //NAME
Charles //NAME
Washington
Real.cpp //FILENAME
.
. (and so on; it is a large file)
I want to extract each name and its corresponding filename. For example, Charles is the name of the person who worked on Real.cpp.
I think I need to
- use a while loop
- use two if statements within it (one to extract the name, the other to extract the filename)
- end the while loop
Problem faced: I get names and filenames that do not correspond to each other, because the text file has no uniform one-to-one relation between them. I want the name to be the key and the filename to be the value, and to store these in a hash. How can I resolve this? I am confused; please give me suggestions.
If names always have //name following them, filenames always have //filename following them, and the name just before a filename is the one to associate with it, it is fairly simple:
#!/usr/bin/perl
use strict;
use warnings;
my $key;
my %name_to_filename;
while (<DATA>) {
    #only pay attention to lines that have //name or //filename
    #and save off the part before //name or //filename and which type it was
    next unless my ($name, $type) = m{(.*?)\s+//(name|filename)}i;
    if ($type =~ /^name$/i) {
        $key = $name; #remember the last name seen
        next;
    }
    $name_to_filename{$key} = $name;
}
use Data::Dumper;
print Dumper \%name_to_filename;
__DATA__
Tom //name
Washington
account.txt //filename
Gary //NAME
New York
accountbalance.png //filename
Mary //name
New Jersey
Michelle //NAME
Larry //NAME
Charles //NAME
Washington
Real.cpp //FILENAME
Keep three variables: Line_1, Line_2, and Current_line. Reading the first two lines initializes Line_1 and Line_2. From the third line on, check whether the current line is a filename. If it is, store it in the hash as hash{filename} = name,city. If not, copy Line_2 to Line_1 and Current_line to Line_2. Repeat this in a loop until the whole file has been read.
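The sliding-window idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sub name map_files_by_window is made up, and filename lines are assumed to carry the //filename marker shown in the question (in any letter case). The two "previous line" variables are filled by the same shifting step that later slides the window, so no separate initialization is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of filename => "name,city".
sub map_files_by_window {
    my @lines = @_;
    my ( $line_1, $line_2 );    # the two most recent non-file lines
    my %info_for;

    for my $current_line (@lines) {
        # Assumption: filename lines end with a //filename marker.
        my $is_file = $current_line =~ m{//filename\s*$}i;

        # Work on a copy with the newline and any //marker removed.
        my $text = $current_line;
        chomp $text;
        $text =~ s{\s*//\w+\s*$}{};

        if ($is_file) {
            # Store only when both preceding lines are available.
            $info_for{$text} = "$line_1,$line_2"
                if defined $line_1 and defined $line_2;
            ( $line_1, $line_2 ) = ( undef, undef );    # start a fresh window
        }
        else {
            # Slide the window: Line_2 -> Line_1, Current_line -> Line_2.
            ( $line_1, $line_2 ) = ( $line_2, $text );
        }
    }
    return \%info_for;
}

my $info_for = map_files_by_window(
    'Tom //name', 'Washington', 'account.txt //filename',
    'Larry //NAME', 'Charles //NAME', 'Washington', 'Real.cpp //FILENAME',
);
print "$_ => $info_for->{$_}\n" for sort keys %$info_for;
# prints:
# Real.cpp => Charles,Washington
# account.txt => Tom,Washington
```

Note that this keeps only the last name before each filename, so names such as Larry that are not immediately followed by a city and a file are dropped.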
You want to map names to a file name, and the data shows that you get a list of names followed by a file name. So you're going to need to store up keys until you know what to store them with.
Additionally, since you didn't say anything about state names, I expect you want to ignore those. So we need a way to tell them apart. Fortunately, the states are a well-defined set, and can be put into a lookup table.
Then, we need a way to distinguish names from filenames. From what you show, I'm going with the following pattern: at least one word character, then a single dot, then at least one word character for the extension.
That tells me whether we're on a file line, at which point we can resolve the value of the pending names.
@ARGV = '/path/to/file';
my %state_hash
= ( Alabama => 1, Alaska => 1, Arizona => 1, ...
, 'New Hampshire' => 1, ..., Wyoming => 1
);
my ( @pending_names, %file_for );
while ( <> ) {
    # Extract non-spaces at the beginning of the line,
    # potentially separated by one-and-only-one space
    my ( $name_or_file ) = m/^((?:\S+[ ]?)+)/;
    # skip lines with nothing to extract, and skip state names
    next if !defined $name_or_file or exists $state_hash{ $name_or_file };
    # if the extracted value fits the file pattern
    if ( $name_or_file =~ m/^\w+\.\w+$/ ) {
        # store the name-file combination for each pending name
        $file_for{ $_ } = $name_or_file foreach @pending_names;
        # they are not pending anymore, so clear them.
        @pending_names = ();
    }
    else {
        # store up pending names
        push @pending_names, $name_or_file;
    }
}
What you didn't ask about is whether, this being a "large file", a name is likely to repeat. If a name appears more than once, each new assignment clobbers the value you saved the previous time.
This can be remedied by push-ing onto the hash slot rather than simply assigning to it, like so:
push @{ $file_for{ $_ } }, $name_or_file foreach @pending_names;
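To see what pushing onto the hash slot buys you, here is a small self-contained sketch. The sub name collect_files_per_name and the sample lines are made up for illustration, and the input is assumed to contain only names and filenames (no states), so that the repeated name Tom ends up with both of his files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes lines holding only names and filenames and
# returns a hash ref of name => [ filenames ].
sub collect_files_per_name {
    my @lines = @_;
    my ( @pending_names, %files_for );

    for my $line (@lines) {
        if ( $line =~ m/^\w+\.\w+$/ ) {
            # A file line: append it to the list of every pending name.
            push @{ $files_for{$_} }, $line foreach @pending_names;
            @pending_names = ();
        }
        else {
            # Anything else is treated as a name that is still pending.
            push @pending_names, $line;
        }
    }
    return \%files_for;
}

my $files_for = collect_files_per_name(
    'Tom', 'account.txt', 'Gary', 'Tom', 'Real.cpp',
);
print "$_: @{ $files_for->{$_} }\n" for sort keys %$files_for;
# prints:
# Gary: Real.cpp
# Tom: account.txt Real.cpp
```

Each hash value is now an array reference, so any code that reads the hash has to dereference it, as the print line does.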
This version uses a hash named %is_city to skip lines that look like cities and assumes that a name containing a . is a filename. Both of these assumptions are bad, though. For instance, my name contains a period, and a name like Madison can belong to either a city or a person.
#!/usr/bin/perl
use strict;
use warnings;
my %is_city = map { $_ => 1 } (
    "Washington", "New York", "New Jersey",
);
my $key;
my %name_to_filename;
while (my $name = <DATA>) {
    chomp $name;
    next if $is_city{$name};
    if ($name =~ /[.]/) {
        $name_to_filename{$key} = $name;
        next;
    }
    $key = $name;
}
use Data::Dumper;
print Dumper \%name_to_filename;
__DATA__
Tom
Washington
account.txt
Gary
New York
accountbalance.png
Mary
New Jersey
Michelle
Larry
Charles
Washington
Real.cpp
This assumes that all filenames have a . in them, and that filenames are the only lines that do. It also assumes that the list of cities and states is too large for a complete list to be feasible.
#! /usr/bin/env perl
use strict;
use warnings;
my @state_city_or_person;
my %files;
while(<>){
    chomp;
    if( index($_,'.') >= 0 ){
        push @{ $files{$_} }, @state_city_or_person;
        @state_city_or_person = ();
    }else{
        push @state_city_or_person, $_;
    }
}
use YAML;
print Dump \%files;
---
Real.cpp:
  - Mary
  - New Jersey
  - Michelle
  - Larry
  - Charles
  - Washington
account.txt:
  - Tom
  - Washington
accountbalance.png:
  - Gary
  - New York
You will still have to go through and remove any extraneous data, like cities, and states, but this should help you to get it into an actual parse-able format.
It would be helpful if there was some sort of structure to the data to start with.
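One possible follow-up step for removing the extraneous data is sketched below. It is only an illustration: the sub name names_to_files is made up, and %is_place is an assumed, hand-maintained lookup covering just the places that actually turn up in the data (not a complete list of cities and states). It inverts the filename => entries structure into name => filenames while skipping known places.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: inverts filename => [entries] into name => [filenames],
# skipping every entry found in the lookup of known places.
sub names_to_files {
    my ( $files, $is_place ) = @_;
    my %files_for;
    for my $file ( sort keys %$files ) {
        for my $entry ( @{ $files->{$file} } ) {
            next if $is_place->{$entry};    # drop cities and states
            push @{ $files_for{$entry} }, $file;
        }
    }
    return \%files_for;
}

# Assumption: only the places seen in the sample data are listed here.
my %is_place = map { $_ => 1 } ( 'Washington', 'New York', 'New Jersey' );

my %files = (
    'account.txt' => [ 'Tom', 'Washington' ],
    'Real.cpp'    => [ 'Mary', 'New Jersey', 'Michelle', 'Larry', 'Charles', 'Washington' ],
);

my $files_for = names_to_files( \%files, \%is_place );
print "$_ => @{ $files_for->{$_} }\n" for sort keys %$files_for;
```

Entries that are ambiguous between a person and a place still need a manual decision; the lookup only handles the clear cases.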