What are some basic rules of thumb to achieve better performance in Perl scripts? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 12 years ago.

I was wondering if you had any basic tips for improving the performance of a Perl script (like memoizing deterministic functions)?


  1. Install Devel::NYTProf
  2. run your script with it: perl -d:NYTProf some_perl.pl
  3. convert the output file into a nice report: nytprofhtml -f nytprof.out
  4. open the report in a web browser: firefox nytprof/index.html
  5. look for whatever is taking the most time
  6. determine if that is the right algorithm for the job (e.g. are you using an O(n²) algorithm where an O(n) algorithm would also work?)
  7. extract the slow code to a separate script that can be run by itself
  8. write a second (or more) version of the slow code
  9. use Benchmark to compare them (remember to use data that you expect to see in the wild; some algorithms do great for small numbers of items, but are terrible when the number of items increases)
  10. when you have proven that you have faster code, modify the original and go back to step 2 until your code is fast enough or you can't improve it any more
  11. if it is still too slow, ask how to do X faster here (be as detailed about X as you can be)
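Steps 8 and 9 might look something like the following minimal sketch, which uses the core Benchmark module to compare an O(n²) and an O(n) de-duplication routine. The data set and the names uniq_scan and uniq_hash are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data set: 1_000 integers with many repeats.
my @items = map { $_ % 100 } 1 .. 1_000;

# Version 1: O(n^2) de-duplication, scanning the result list for each item.
sub uniq_scan {
    my @seen;
    for my $item (@_) {
        push @seen, $item unless grep { $_ == $item } @seen;
    }
    return @seen;
}

# Version 2: O(n) de-duplication using a hash to remember what was seen.
sub uniq_hash {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    scan => sub { my @u = uniq_scan(@items) },
    hash => sub { my @u = uniq_hash(@items) },
});
```

Swap in data that resembles what the real script sees, since the ranking can change with input size.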


You should profile first. Rules are made to be broken.

Some very basic tips:

  • Avoid slurping. There are occasions where slurping more than a line is justified, but slurping whole files everywhere is going to exact a price.
  • Avoid passing large lists to subroutines.
  • If you have multi-level hashes or arrays, avoid repeatedly dereferencing multiple levels. Store a reference to the deepest applicable level in a lexical variable and use that.
  • Declare your variables in the smallest scope possible. I am not sure if this is an optimization in and of itself, but frequently will allow you to see more clearly, and that is key to improving performance.
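As a rough illustration of the third tip, the sketch below sums the same nested list twice: once dereferencing every level on each iteration, and once through a lexical reference to the deepest level. The %stats structure and its keys are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical nested structure: host -> service -> list of response times.
my %stats = (
    web01 => { http => { times => [ 12, 15, 11, 30 ] } },
);

# Repeated multi-level dereference inside the loop.
my $total_slow = 0;
for my $i (0 .. $#{ $stats{web01}{http}{times} }) {
    $total_slow += $stats{web01}{http}{times}[$i];
}

# Fetch the deepest level once, then work through the lexical reference.
my $times      = $stats{web01}{http}{times};
my $total_fast = 0;
$total_fast += $_ for @$times;

print "$total_slow $total_fast\n";
```

Whether the difference matters depends on how hot the loop is, so profile before restructuring code this way.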


The basic tips for improving the performance of Perl scripts consist of general approaches that apply everywhere, and some very Perl-specific things.

Things like:

  • measure, don't guess, and target expensive areas only. Your ROI will be much higher.
  • cache unchanging expensive results if you're going to use them more than once.
  • don't do something manually when there's a built-in function that will do it for you. Built-ins run as compiled C code, while your hand-written loop runs as interpreted Perl.
  • use CPAN, there's a module for everything, usually written by people who know Perl better than us mere mortals.
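Caching the result of a deterministic function can be sketched with the core Memoize module. The fib function and the $calls counter here are hypothetical, used only to show how many times the real function body runs.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real function body runs

# Hypothetical expensive, deterministic function (naive Fibonacci).
sub fib {
    my ($n) = @_;
    $calls++;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

# Replace fib() with a caching wrapper; the recursive calls are cached too.
memoize('fib');

my $result = fib(25);
print "fib(25) = $result after $calls real calls\n";
```

Memoizing only pays off when the function is called repeatedly with the same arguments and has no side effects.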

This is not something that's easy to distil into a short pithy answer.

I will give one piece of advice. The best way to improve the performance of your code is to post specific bits of it here on SO and wait for the likes of brian d foy to find it :-)


I recently found out that profiling is very useful, and that Devel::NYTProf is very good at it. Profiling stops you from optimizing unnecessarily and helps you fix the biggest bottlenecks first.

Of course, having some knowledge of what not to do helps, but I guess that's not very Perl-specific.

A list of common Perl performance gotchas would be nice though :)

Update:

Here is one: glob() sorts its results unless you tell it otherwise :|

Don't copy large amounts of data around; use references (e.g. use scalar references to pass large data without copying).
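Both points can be sketched in a few lines. The subroutine names below are hypothetical. The unsorted glob uses bsd_glob with the GLOB_NOSORT flag from the core File::Glob module, which is one way of telling glob not to sort.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_NOSORT);

# Passing the array itself flattens it into @_, and the callee copies it.
sub sum_copy {
    my @values = @_;          # copies every element
    my $sum = 0;
    $sum += $_ for @values;
    return $sum;
}

# Passing a reference copies a single scalar, however large the array is.
sub sum_ref {
    my ($values) = @_;
    my $sum = 0;
    $sum += $_ for @$values;
    return $sum;
}

my @big = (1 .. 100_000);
print sum_copy(@big), " ", sum_ref(\@big), "\n";

# GLOB_NOSORT skips the sort when the order of the names is irrelevant.
my @files = bsd_glob('*', GLOB_NOSORT);
print scalar(@files), " entries in the current directory\n";
```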

Duplicate of: How can I speed up my Perl program?


Many good suggestions here already, here are a few more:

  • Avoid repeatedly packing and unpacking arrays, especially when they are large
  • pack and unpack are very fast, but split is sometimes faster
  • Inline small portions of code rather than breaking everything into subroutines (you can easily go too far with this, so profile to find the hot spots).
  • If you want an array of aliases, know your data is mutable, or promise not to change the values, using sub{\@_}->(returns_a_list()) is around 40% faster than [returns_a_list()] (you don't have to inline the subroutine; I usually call it sub cap {\@_}, short for capture).
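A small sketch of the cap idiom from the last bullet follows. The returns_a_list function is a stand-in for any list-returning call, and the second half shows the aliasing behaviour that the caveat about changing values refers to.

```perl
use strict;
use warnings;

sub returns_a_list { return (1 .. 5) }   # hypothetical list-returning function

# Capture the argument list as an array reference without copying elements.
sub cap { \@_ }

my $copied   = [ returns_a_list() ];     # copies each value into a new array
my $captured = cap(returns_a_list());    # reuses the values already in @_

print "@$copied | @$captured\n";

# Because @_ aliases its arguments, writes through the captured array
# reach the original variables.
my ($x, $y) = (10, 20);
my $aliases = cap($x, $y);
$aliases->[0] = 99;
print "x is now $x\n";
```

The speed figure quoted above is the answer author's own measurement, so benchmark on your own data before relying on it.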

And if you really, really need to improve the speed of method calls on certain objects, you can use closure based objects. Here is an excerpt from my module List::Gen discussing the curse method of creating closure objects:

  • curse HASHREF PACKAGE

    many of the functions in this package utilize closure objects to avoid the speed penalty of dereferencing fields in their object during each access. curse is similar to bless for these objects and while a blessing makes a reference into a member of an existing package, a curse conjures a new package to do the reference's bidding

    package Closure::Object;
        sub new {
            my ($class, $name, $value) = @_;
            curse {
                get  => sub {$value},
                set  => sub {$value = $_[1]},
                name => sub {$name},
            } => $class
        }
    

    Closure::Object is functionally equivalent to the following normal perl object, but with faster method calls since there are no hash lookups or other dereferences (around 40-50% faster for short getter/setter type methods)

    package Normal::Object;
        sub new {
            my ($class, $name, $value) = @_;
            bless {
                name  => $name,
                value => $value,
            } => $class
        }
        sub get  {$_[0]{value}}
        sub set  {$_[0]{value} = $_[1]}
        sub name {$_[0]{name}}
    

    the trade off is in creation time / memory, since any good curse requires drawing at least a few pentagrams in the blood of an innocent package. the returned object is blessed into the conjured package, which inherits from the provided PACKAGE.

    when fast just isn't fast enough, since most cursed methods don't need to be passed their object, the fastest way to call the method is:

    my $obj = Closure::Object->new('tim', 3);
    my $set = $obj->{set};                  # fetch the closure
         # or $obj->can('set')
    
    
    $set->(undef, $_) for 1 .. 1_000_000;   # call without first arg
    

    which is around 70% faster than pre-caching a method from a normal object for short getter/setter methods.
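The underlying idea can be approximated in plain Perl without List::Gen. The sketch below is not the curse function: it conjures no package and blesses nothing. It only shows a hash of closures over lexical state, called the way the excerpt describes. The name new_closure_object is hypothetical.

```perl
use strict;
use warnings;

# Plain-Perl approximation: a hash of closures over lexical state.
# This is NOT List::Gen's curse(); no package is conjured or blessed here.
sub new_closure_object {
    my ($name, $value) = @_;
    return {
        get  => sub { $value },
        set  => sub { $value = $_[1] },
        name => sub { $name },
    };
}

my $obj = new_closure_object('tim', 3);
my $set = $obj->{set};          # fetch the closure once
$set->(undef, $_) for 1 .. 10;  # first argument stands in for the object
print $obj->{name}->(), " = ", $obj->{get}->(), "\n";
```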


This question reminds me of the advice of Tim Bunce, maintainer of Devel::NYTProf.

To summarize, don't do it unless you really have to.

My favorite quote from his presentation:

“The First Rule of Program Optimization: Don't do it.

The Second Rule of Program Optimization (for experts only!): Don't do it yet.”

-Michael A. Jackson

[emphasis added]


Here are a few specific things that haven't been mentioned yet:

  • use tr/// instead of s/// wherever possible
  • use index() to match exact substrings instead of a regex
  • never ever use $`, $& and $'. These are the dreaded regex match variables that slow all regexes. There are alternatives that do not impose a penalty, such as the /p modifier with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} (Perl 5.10 and later).
  • Revenge of the match vars: when using the English module, exclude match variable aliasing. That is, write use English '-no_match_vars'; instead of just use English;

More info can be found in perlop, perlfunc and perlvar.
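The first three tips can be sketched as follows. The sample string is arbitrary, and the /p modifier with ${^MATCH} is one penalty-free alternative to $& (available from Perl 5.10), not something named in the answer itself.

```perl
use strict;
use warnings;

my $text = "the quick brown fox";

# tr/// maps single characters without invoking the regex engine.
(my $via_tr = $text) =~ tr/a-z/A-Z/;
(my $via_s  = $text) =~ s/([a-z])/\U$1/g;

# tr/// also counts characters: it returns the number of matches.
my $spaces = ($text =~ tr/ //);

# index() finds a fixed substring; -1 means "not found".
my $pos = index($text, 'brown');

# /p plus ${^MATCH} avoids $& (requires Perl 5.10 or later).
my $matched = '';
if ($text =~ /qu\w+/p) {
    $matched = ${^MATCH};
}

print "$via_tr\n$via_s\n$spaces $pos $matched\n";
```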
