GCC -O2 with -march / -ftree-vectorize
I am trying out several compiler switches against a program that performs sobel kernel convolution on two images( 2000Hx3000W and 6800Hx8500W ). There are some observations that I am not able to interprete, following are the data - compiler flags and time taken in secs (please focus on the last column, as it signifies convolution on Y axis 开发者_开发知识库for the larger image):
O2-march=barcelona                  0.1483326   0.833264    1.6018882   28.6711242
O2-ftree-vectorize                  0.1462104   0.847973    1.506708    26.628592
O2                                  0.1468406   0.8368156   1.5999718   20.61377564
O2-ftree-vectorize-march=barcelona  0.1441898   0.827366    1.4687354   15.2572644
I expected -O2-march=barcelona to be moderately better, considering the machine I am running on is AMD barcelona. Any ideas as to why -O2 is better than -O2 -march?
About -ftree-vectorize, it should be able to run instructions in parallel since my loop is dependence free. But then, -O2-ftree-vectorize-march=barcelona is the best of the lot, when individually there are reasonable differences in timing.
It would be great if I could understand this behavior.
Regards,
Sayan 
         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论