Using perl to parse a file and insert specific values into a database
Disclaimer: I'm a newbie at scripting in Perl, and this is partially a learning exercise (but still a project for work). I also have a much stronger grasp of shell scripting, so my examples will likely be written in that mindset (but I would like to create them in Perl). Sorry in advance for my verbosity; I want to make sure I am at least marginally clear in getting my point across.
I have a text file (a reference guide) that is a Word document converted to text, then switched from Windows to UNIX format in Notepad++. The file is uniform in that each section has the same fields/formatting/tables.
What I plan to do, in basic terms, is grab each section, keyed by its unique batch job name, and place all of the values into a database (or maybe just an Excel file) so that the fields for each job can be searched/edited much more easily than in the Word file, and possibly create a web interface later on.
So what I want to do is grab each section by doing something like:
sed -n '/job_name_1_regex/,/job_name_2_regex/p' file.txt
--how would this be formatted within a perl script?
(grab the section in total, then break it down further from there)
To read the file in the script I have open FORMAT_FILE, 'test_format.txt';
and then use foreach $line (<FORMAT_FILE>)
to parse the file line by line. --is there a better way?
My next problem is that, since I converted from a Word doc with tables, a table which looks like this in Word:
Table Heading 1      Table Heading 2
Heading 1/Value 1    Heading 2/Value 1
Heading 1/Value 2    Heading 2/Value 2
looks like this in the text file:
Table Heading 1
Table Heading 2
Heading 1/Value 1
Heading 1/Value 2
Heading 2/Value 1
Heading 2/Value 2
So I want to have "Heading 1" and "Heading 2" as column names and then put the respective values there. I'm just not sure how to get the values in relation to the headings from the text file. The values for Heading 1 always start at the line number of Heading 1 plus 2 (Heading 1, Heading 2, then the values for Heading 1). I know this can be done in awk/sed pretty easily; I'm just not sure how to address it inside a Perl script.
For this I was thinking of using arrays, something like:
my @heading1 = ($value1, $value2, etc.)
my @heading2 = ($value1, $value2, etc.)
I just need to be able to associate the correct values and headings together. So that heading1 = the line after heading2 (where the values start). Like saying (in shell):
x=$(grep -n "Heading 1" file.txt | cut -d":" -f1) #gets the line that "Heading 1" is on in the file
(( x = x+2 )) #adds 2 to the line (where the values will start)
#print values from file.txt from the line where they start to the
#last one (I'll figure that out at some point before this)
sed -n "$x,$last_line_of_values p" file.txt
This is super-hacked together for the moment, to try to elaborate what I want to do...let me know if it clears it up a little...
---/EDIT---
After I have all the right values and such, linking it up to a database may be an issue as well; I haven't started looking at the way Perl interacts with DBs yet.
Sorry if this is a bit unclear; it's still not fully formed in my head.
use strict;
use warnings;
use DBI;

my $driver   = "mysql";  # Database driver type
my $database = "test";   # Database name
my $user     = "";       # Database user name
my $password = "";       # Database user password

# Connect using a DSN built from the driver and database name
my $dbh = DBI->connect(
    "dbi:${driver}:${database}",
    $user, $password,
    {
        RaiseError => 1,
        PrintError => 1,
    }
) or die $DBI::errstr;

# "your_table" is a placeholder; substitute the real table name
my $sth = $dbh->prepare("
    INSERT INTO your_table
        (col1, col2)
    VALUES (?, ?)
") or die $dbh->errstr;

my $intable = 0;
open my $file, '<', "file.txt" or die "can't open file $!";
while (<$file>) {
    if (/job_name_1_regex/../job_name_2_regex/) {  # job 1 section
        $intable = 1 if /Table Heading 1/;         # table start
        if ($intable) {
            my $next_line = <$file>;               # heading 2 line
            chomp; chomp $next_line;
            $sth->execute($_, $next_line) or die $dbh->errstr;
        }
    }
}
close $file or die "can't close file $!";
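The range operator used in the loop above is the Perl counterpart of sed's /start/,/end/ address. Here is a minimal sketch of it on its own, without the database part. The job names and sample lines are made up for illustration, and it assumes the section should stop just before the next job's name:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the converted reference guide
my @lines = (
    "intro text",
    "JOB_ALPHA",
    "alpha description",
    "alpha schedule",
    "JOB_BETA",
    "beta description",
);

my @alpha;
for my $line (@lines) {
    # In scalar context, ".." is true from the line matching the left
    # pattern through the line matching the right pattern.
    if (my $seq = ($line =~ /^JOB_ALPHA$/) .. ($line =~ /^JOB_BETA$/)) {
        # The last line of the range gets "E0" appended to its sequence
        # number; skip it so the next job's name is not included.
        push @alpha, $line unless $seq =~ /E0$/;
    }
}

print "$_\n" for @alpha;
```

Dropping the "E0" check gives the same inclusive behaviour as the sed command in the question.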
Several things in this post... First, the basic "best practices":
Use modern Perl. Start your scripts with
use strict; use warnings;
Don't use global filehandles; use lexical filehandles (declare them in a variable).
Always check the return value of "open":
open my $file, "/some/file" or die "can't open file : $!";
Then, about pattern matching: I don't understand your example at all, but I suppose you want something like:
foreach my $line ( <$file> ) {
    if ( $line =~ /regexp1/ ) {
        # do something...
    }
}
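On the "is there a better way?" part of the question: a while loop reads one line per iteration, whereas foreach over <$file> first loads the whole file into a list. A small sketch of the while form with a lexical filehandle follows. The sample text and the /^JOB_/ pattern are assumptions for illustration, and the in-memory filehandle is only there so the snippet runs without a real file:

```perl
use strict;
use warnings;

# Hypothetical content; with a real file, pass its path instead of \$text
my $text = "JOB_ALPHA\nowner: ops\nJOB_BETA\nowner: dev\n";

open my $fh, '<', \$text or die "can't open in-memory file: $!";

my @job_names;
# Read one line at a time instead of slurping the whole file
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^JOB_/) {
        push @job_names, $line;
    }
}
close $fh or die "can't close file: $!";

print "found job: $_\n" for @job_names;
```

For a reference guide of modest size either loop works; the while form just avoids holding every line in memory at once.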
Edit: about the table, I suppose the best thing is to build two arrays, one for each column. If I understand correctly, when reading the file you need to split the line and put one part in the @col1 array and the second part in the @col2 array. The clear and easy way is to use two temporary variables:
my ( $val1, $val2 ) = split /\s+/, $line;
push @col1, $val1;
push @col2, $val2;
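The split above assumes both values sit on the same line. If, as the question describes, the text file lists the two headings and then all values for the first column followed by all values for the second, the columns can be separated by position instead. A rough sketch, assuming the lines of one table have already been isolated and both columns hold the same number of values (the @table contents are just the question's placeholders):

```perl
use strict;
use warnings;

# Lines of one table, in the order they appear in the converted text file
my @table = (
    "Table Heading 1",
    "Table Heading 2",
    "Heading 1/Value 1",
    "Heading 1/Value 2",
    "Heading 2/Value 1",
    "Heading 2/Value 2",
);

# The first two lines are the headings; everything after is values
my ($head1, $head2, @values) = @table;
die "expected an even number of values" if @values % 2;

# First half belongs to column 1, second half to column 2
my $half = @values / 2;
my @col1 = @values[0 .. $half - 1];
my @col2 = @values[$half .. $#values];

# Pair the values up again, one array ref per original table row
my @rows = map { [ $col1[$_], $col2[$_] ] } 0 .. $#col1;

print "$head1 | $head2\n";
print join(" | ", @$_), "\n" for @rows;
```

Each element of @rows can then be handed to something like $sth->execute(@$row) to insert one table row at a time.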