put .txt files into a hash and compare with an array of words using perl [closed]

2023-01-25 14:07

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist

Closed 9 years ago.

I have a folder of .txt files that I want to store in a hash. I then want to compare each file against an array of specific words, counting how many times each of those words occurs.

Note that I use \p{Alpha} because, strictly speaking, a word is made of alphabetic characters. You can tweak the regex to allow digits, to require an alpha at the beginning, or whatever else you're likely to need.

Note also that for files consisting of one word per line, the regex is overkill and you should omit it. Just chomp the line and store $_.
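For the one-word-per-line case, a minimal sketch might look like this. The load_word_list name and the in-memory sample list are made up for illustration; in practice the filehandle would be opened on a real word-list file.

```perl
use strict;
use warnings;

# Hypothetical helper: read one word per line from a filehandle and
# return a hash of word => 0, ready to be used as counters.
sub load_word_list {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';    # skip blank lines
        $count{$line} = 0;
    }
    return %count;
}

# Illustrative data only: an in-memory "file" with one word per line.
my $sample = "kill\nassault\ndrug\n";
open( my $fh, '<', \$sample ) or die "can't open in-memory file: $!";
my %wanted = load_word_list($fh);
close($fh);

print join( ", ", sort keys %wanted ), "\n";    # prints: assault, drug, kill
```

No regex is involved here: each line is taken as a word exactly as written, so any case folding or trimming would have to be added separately.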

use 5.010; # for say
use strict;
use warnings;
use Data::Dumper; # for the dump at the end

my ( %hash );

sub load_words { 
    @hash{ @_ } = ( 0 ) x @_; return; 
}

sub count_words {
    $hash{$_}++ foreach grep { exists $hash{$_} } @_;
}


my $word_regex
    = qr{ (                # start a capture
            \p{Alpha}+     # any sequence of one or more alpha characters
            (?:            # begin grouping of
                ['-]         # allow hyphenated words and contractions
                \p{Alpha}+   # which must be followed by an alpha
            )*             # any number of times
            (?: (?<=s)')?  # case for plural possessives (ht: tchrist)
          )                # end capture
        }x;

# load @ARGV to do <> processing
@ARGV = qw( list of files I take words from );
while ( <> ) {
    load_words( m/$word_regex/g );
}
@ARGV = qw( list of files where I count words );
while ( <> ) { 
    count_words( m/$word_regex/g );
}

# take a look at the hash
say Data::Dumper->Dump( [ \%hash ], [ '*hash' ] );
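As a quick check of what the word regex captures, here is a small sketch run against a made-up sentence. The pattern is the same one as in the listing, just written compactly; the sample text is an assumption for illustration.

```perl
use strict;
use warnings;

# Same pattern as in the listing above, without the inline comments.
my $word_regex = qr{ ( \p{Alpha}+ (?: ['-] \p{Alpha}+ )* (?: (?<=s)')? ) }x;

# Made-up sample text with a hyphenated word, a plural possessive,
# a contraction and a number.
my $text  = "The well-known dogs' owner doesn't own 2 cats.";
my @words = $text =~ m/$word_regex/g;

print join( "|", @words ), "\n";
# prints: The|well-known|dogs'|owner|doesn't|own|cats
```

The digit is skipped because \p{Alpha} does not match it, and the trailing apostrophe in "dogs'" is kept only because it follows an "s".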

Not going to write the code for you, but you could do something like:

Loop over all the files (see glob()).
Loop over all the words in each file (maybe with a regular expression or split()).
Check each word against a hash of wanted words. If it's there, increment a "counter" hash value like so: $hash{ $word }++. Alternatively, you could store all the words in a hash and then grab the ones you want afterwards.

Or ... there are many ways to do it.

If your files are huge, you will have to do it another way.
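The steps outlined in this answer could be sketched roughly as follows. This is only one possible reading of them: the count_wanted_words helper, the temporary directory and the two sample files are assumptions made so the example runs on its own, and words are compared in lower case.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: count how often each wanted word appears in the
# given files. Returns a hash of word => count.
sub count_wanted_words {
    my ( $wanted, @files ) = @_;
    my %count = map { lc($_) => 0 } @$wanted;
    for my $file (@files) {
        open( my $fh, '<', $file ) or die "can't open $file: $!";
        while ( my $line = <$fh> ) {    # one line at a time, never the whole file
            for my $word ( split /\W+/, lc $line ) {
                $count{$word}++ if exists $count{$word};
            }
        }
        close($fh);
    }
    return %count;
}

# Demo setup (assumption): two small .txt files in a temporary directory.
my $dir     = tempdir( CLEANUP => 1 );
my %content = (
    'a.txt' => "Drug charges followed the assault.\n",
    'b.txt' => "No drug was found; a second drug test was ordered.\n",
);
for my $name ( keys %content ) {
    open( my $out, '>', "$dir/$name" ) or die "can't write $dir/$name: $!";
    print {$out} $content{$name};
    close($out);
}

my %count = count_wanted_words( [qw(drug assault kill)], glob("$dir/*.txt") );
printf "%-8s %d\n", $_, $count{$_} for sort keys %count;
```

Reading line by line keeps memory use small regardless of file size. Note that glob() splits its argument on whitespace, so a directory name containing spaces would need different handling.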

So I got it done using an array of the specific words I wanted to find... HAPPY DAYS :-)

#!/usr/bin/perl
use strict;
use warnings;
my @words;

# the leading space keeps a pattern from matching inside a longer word
my @triggers=(" [kK]ill"," [Aa]ssault", " [rR]ap[ie]"," [dD]rug");
my %hash;

sub count_words {
    print "\n";
}

my $word_regex
    = qr{ (                # start a capture
            \p{Alpha}+     # any sequence of one or more alpha characters
            (?:            # begin grouping of
                ['-]         # allow hyphenated words and contractions
                \p{Alpha}+   # which must be followed by an alpha
            )*             # any number of times
          )                # end capture
        }x;

my @files;
my $dirname = "/home/directory";
opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
while (defined(my $file = readdir($dh))) {
     next unless -f "$dirname/$file";   # skips ".", ".." and subdirectories
     push @files, "$dirname/$file";
}
closedir($dh);
my @interestingfiles;

foreach my $file (@files){

    open(my $fh, '<', $file) or die "can't open $file: $!";

    while (my $line = <$fh>){
        foreach my $trigger (@triggers){
           # no /g needed: we only test whether the line matches
           if($line =~ /$trigger/){
              # a file is listed once per matching line and trigger
              push @interestingfiles, "$file\n";
           }
        }
    }
   close($fh);
}
print @interestingfiles;
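If per-word counts are wanted as well (the original question asked for them), one possible variation is sketched below. It is not the code above: it swaps the leading-space patterns for a single case-insensitive regex with a \b word boundary, and it runs on made-up in-memory lines instead of files. The count_triggers name is hypothetical.

```perl
use strict;
use warnings;

# One case-insensitive pattern in place of the four [kK]-style patterns.
my $trigger_re = qr/\b(kill|assault|rap[ie]|drug)/i;

# Hypothetical helper: count trigger hits in a list of lines.
sub count_triggers {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # /g in a while loop walks through every match on the line
        while ( $line =~ /$trigger_re/g ) {
            $count{ lc $1 }++;
        }
    }
    return %count;
}

# Made-up sample lines, for illustration only.
my @lines = (
    "Drug dealers and drug users were questioned.",
    "The assault was reported on Monday.",
);
my %count = count_triggers(@lines);
print "$_: $count{$_}\n" for sort keys %count;
```

Here /g is useful because the match sits in a while loop, where each iteration resumes after the previous hit; in a plain if test it only leaves a stale match position behind.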
