Does a line profiler for code require a parse tree and is that sufficient?
I am trying to determine what is necessary to write a line profiler for a language, like those available for Python and Matlab.
A naive way to interpret "line profiler" is to assume that one can insert time logging around every line, but the definition of a line depends on how the parser handles whitespace, and that is only the first problem. It seems that one needs to use the parse tree and insert timings around individual nodes.
Is this conclusion correct? Does a line profiler require the parse tree, and is that all that is needed (beyond time logging)?
Update 1: Offering a bounty on this because the question is still unresolved.
Update 2: Here is a link to a well-known Python line profiler in case it is helpful for answering this question. I have not yet been able to make heads or tails of its behavior relative to parsing. I'm afraid that the code for the Matlab profiler is not accessible.
Also note that one could say that manually decorating the input code would eliminate a need for a parse tree, but that's not an automatic profiler.
Update 3: Although this question is language agnostic, this arose because I am thinking of creating such a tool for R (unless it exists and I haven't found it).
Update 4: Regarding the use of a line profiler versus a call-stack profiler: this post about using a call-stack profiler (Rprof() in this case) exemplifies why it can be painful to work with the call stack rather than analyze things directly via a line profiler.
I'd say that yes, you require a parse tree (and the source): how else would you know what constitutes a "line" and a valid statement?
A practical simplification, though, might be a "statement profiler" instead of a "line profiler".
In R, the parse tree is readily available via body(theFunction), so it should be fairly easy to insert measuring code around each statement (sketched below). With some more work you can insert it around a group of statements that belong to the same line.
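To make that concrete, here is a minimal sketch of statement-level instrumentation; instrument(), its prof environment, and the dot-prefixed temporaries are illustrative names, not an existing package API. It rewrites the body so that each top-level statement is wrapped in Sys.time() calls, accumulating elapsed wall-clock seconds per statement:

## Sketch only: wrap every top-level statement of fn's body with timing code.
instrument <- function(fn) {
  b <- body(fn)
  stopifnot(identical(b[[1L]], as.name("{")))      # assumes a braced body
  prof <- new.env()
  prof$elapsed <- numeric(length(b) - 1L)          # one slot per statement
  for (i in seq_along(b)[-1L]) {
    stmt <- b[[i]]
    idx  <- i - 1L
    b[[i]] <- bquote({
      .t0  <- Sys.time()
      .res <- .(stmt)                              # run the original statement
      prof$elapsed[.(idx)] <- prof$elapsed[.(idx)] +
        as.numeric(Sys.time() - .t0, units = "secs")
      .res                                         # preserve the statement's value
    })
  }
  body(fn) <- b
  wrapper <- new.env(parent = environment(fn))     # make `prof` visible inside fn
  wrapper$prof <- prof
  environment(fn) <- wrapper
  list(fn = fn, elapsed = function() prof$elapsed)
}

With the example function defined below, p <- instrument(f); p$fn(10); p$elapsed() yields one wall-clock timing slot per top-level statement (the .t0/.res temporaries could shadow user variables, which is acceptable for a sketch); mapping those slots back onto lines is where the srcref information described next comes in.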
In R, the body of a function loaded from a file typically also has a srcref attribute that lists the source location for each "line" (actually each statement):
Here's a sample function (put in "example.R"):
f <- function(x, y=3)
{
 a <- 0; a <- 1   # Two statements on one line
 a <- (x + 1) *   # One statement on two lines
        (y + 2)

 a <- "foo
bar"              # One string on two lines
}
Then in R:
source("example.R")
dput(attr(body(f), "srcref"))
Which prints this line/column information:
list(structure(c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L), srcfile = <environment>, class = "srcref"),
structure(c(3L, 2L, 3L, 7L, 9L, 14L, 3L, 3L), srcfile = <environment>, class = "srcref"),
structure(c(3L, 10L, 3L, 15L, 17L, 22L, 3L, 3L), srcfile = <environment>, class = "srcref"),
structure(c(4L, 2L, 5L, 15L, 9L, 15L, 4L, 5L), srcfile = <environment>, class = "srcref"),
structure(c(7L, 2L, 8L, 6L, 9L, 20L, 7L, 8L), srcfile = <environment>, class = "srcref"))
As you can "see" (the last two numbers in each structure are the begin/end lines), the expressions a <- 0 and a <- 1 map to the same line...
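For completeness, here is a small sketch (stmt_lines() is an illustrative name, not an existing function) of how that srcref data can be read back to map each top-level statement to its first and last source line, which is what a profiler would need to aggregate per-statement timings onto lines:

## Sketch only: statement-to-line mapping from the srcref attribute
## (requires source references to be kept, as they are for source()d files).
stmt_lines <- function(fn) {
  refs <- attr(body(fn), "srcref")
  stopifnot(!is.null(refs))
  exprs <- as.list(body(fn))[-1L]        # drop the `{`; refs[[1]] covers it too
  data.frame(
    statement  = vapply(exprs, function(e) paste(deparse(e), collapse = " "), ""),
    first_line = vapply(refs[-1L], function(r) r[[1L]], integer(1)),
    last_line  = vapply(refs[-1L], function(r) r[[3L]], integer(1))
  )
}

After source("example.R"), stmt_lines(f) shows a <- 0 and a <- 1 sharing line 3, while the multi-line expressions span lines 4-5 and 7-8.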
Good luck!
It sounds like what you mean by a line profiler is something that measures the time spent within each line (i.e. by instrumenting). I hope what you mean by time is wall-clock time, because in real, good-sized software, if you only look at CPU time you're going to miss a lot.
Another way to do it is stack-sampling on wall-clock time, as in the Zoom and LTProf profilers. Since every line of a stack sample can be localized to a line of code using only a map or pdb file, in the same way as debuggers do, there is no need to parse or modify the source.
The percent of time taken by a line of code is simply the percent of stack samples containing it. Since you are working at the line level, there is no need to distinguish between exclusive (self) time and inclusive time. This is because the line's percent of time active is what matters, whether or not it is a call to another function, a call to a blind system function, or just a call to microcode.
The advantage of looking at percents, instead of absolute times, is you don't need to worry about the app being slowed down, either by the sampling itself, or by competition with other processes, because those things don't affect the percents very much.
Also, you don't have to worry about recursion. If a line of code is in a recursive function and appears more than once in a sample, that's OK; it still counts as only one sample containing the line. The reason that's OK is that if the line could somehow be made to take no time (such as by removing it), that sample would not have occurred, so the samples containing it would be removed from the sample set, and the program's total time would decrease by the same fraction as the samples removed. That holds irrespective of recursion.
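As a toy illustration of that arithmetic (hand-made samples, not the behavior of Zoom or LTProf): if each wall-clock sample is a character vector of "file:line" locations, a line's cost is the share of samples containing it at least once, and unique() is all it takes to make recursion harmless:

## Sketch only: per-line percentages from stack samples.
line_percents <- function(samples) {
  hits <- table(unlist(lapply(samples, unique)))   # count each sample at most once per line
  sort(100 * hits / length(samples), decreasing = TRUE)
}

samples <- list(
  c("foo.R:10", "foo.R:3"),
  c("foo.R:10", "foo.R:7", "foo.R:10"),            # foo.R:10 recurses within this sample
  c("bar.R:2")
)
line_percents(samples)                             # foo.R:10 is in 2 of 3 samples, ~67%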
You also don't need to count how many times a line of code is executed, because the number that matters for locating code you should optimize is the percent of time it's active.
Here's more explanation of these issues.