Plotting statistical data, preferably using gnuplot

I have a number of numerical datasets that I've computed averages, medians, deviations, minima and maxima from, and I'd like to visualize them (on Linux, without X, to an image file).

I've seen gnuplot's functionality for plotting error bars, but I have a couple of problems with it. First, my datasets are packed rather tightly together along the X axis, and gnuplot's error bars take up a bit too much horizontal space. Second, I've only seen it plot a minimum and a maximum together with a data point in between (presumably for an average), and I see no good way of fitting the median and deviation into that model.

Does anyone know of a way to get around those problems in gnuplot? Or is there, perhaps, a better program than gnuplot altogether?


I use R's lattice package for graphing statistical data.

You might take a look at the R Graph Gallery for sample scripts that render error bars ("confidence intervals").

You don't need X to produce graphs. Open a postscript device before printing the lattice plot, and close it afterwards:

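library(lattice)                  # provides barchart() and the other trellis plots

mtxf.ps <- "myBarchart.ps"
postscript(mtxf.ps,
           width = 6,             # inches
           height = 4,
           paper = 'special',     # page size is taken from width/height
           horizontal = FALSE)    # portrait orientation
mtx.p <- barchart(...)            # build the lattice plot object
print(mtx.p)                      # lattice plots are only drawn when printed
dev.off()                         # close the device and finish the file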

That snippet of code prints the barchart to a PostScript file, which you can convert with ImageMagick to PNG or other formats:

$ convert -density 200 myBarchart.ps myBarchart.png

R is a bit of a weird language, but expressive once you figure out its quirks. This is a pretty good introductory book on R, and this is a pretty good book about how to use lattice in different scenarios.


I think that you should not put median and average into the same plot, because they are fundamentally different.

A standard way to display median statistics is the "box-and-whisker"-plot, which shows minimum, first quartile, median, third quartile, and maximum. In order to get that in gnuplot, you have to do several passes using multiplot:

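set multiplot
set style fill empty
# Pass 1: box from Q1 to Q3, whiskers to min/max, plus a thick median line.
# (Set xrange and yrange explicitly if the two passes do not line up.)
set boxwidth 0.4
plot "data" using 1:3:2:6:5 with candlesticks lt -1 lw 5 notitle,\
     '' using 1:4:4:4:4 with candlesticks lt -1 lw 10 notitle
# Pass 2: narrower flat "boxes" at min and max act as whisker end caps.
set boxwidth 0.3
plot "data" using 1:2:2:2:2 with candlesticks lt -1 lw 5 notitle,\
     '' using 1:6:6:6:6 with candlesticks lt -1 lw 5 notitle
unset multiplot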

This assumes that your "data" file has the columns x-value, minimum, first quartile, median, third quartile, maximum in that order. Of course, you can play around with the boxwidths, line types (lt) and line widths (lw) to get what you need.
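If you start from raw samples rather than precomputed statistics, something has to produce those six columns. Below is a minimal sketch in bash and awk, assuming a hypothetical input file with one "x value" pair per line. The quartiles use linear interpolation between order statistics (the same rule as R's default quantile()), which is only one of several common conventions, so your numbers may differ slightly from other tools.

```shell
#!/bin/bash
# Hypothetical helper: per-x five-number summary (min, Q1, median, Q3, max)
# for the candlesticks commands. Input: one "x value" pair per line.
five_number_summary() {
  # Sort by x, then numerically by value, so each group arrives sorted.
  LC_ALL=C sort -k1,1g -k2,2g "$1" | awk '
    # Quantile by linear interpolation between order statistics
    # (R type 7) over the sorted array v[1..n].
    function q(p,    h, lo) {
      h = p * (n - 1) + 1
      lo = int(h)
      if (lo >= n) return v[n]
      return v[lo] + (h - lo) * (v[lo + 1] - v[lo])
    }
    # Print one row: x min Q1 median Q3 max
    function flush() {
      if (n > 0)
        printf "%s %.10g %.10g %.10g %.10g %.10g\n", x, v[1], q(0.25), q(0.5), q(0.75), v[n]
    }
    # A new x value starts a new group
    NR == 1 || $1 != x { flush(); x = $1; n = 0 }
    { v[++n] = $2 }
    END { flush() }'
}
```

For example, five_number_summary samples.dat > data writes a file in the column order the candlesticks commands above expect.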

If you want the average with standard deviation instead, you can use the standard errorbars; the width of their end caps is controlled with set bars rather than boxwidth.
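A similar sketch for the mean/standard-deviation variant, again assuming a hypothetical file of "x value" pairs. It writes "x mean sd" rows, the three-column form that using 1:2:3 with errorbars reads as x, y and y-delta. It uses Welford's one-pass update and the sample (n-1) standard deviation; a group with a single sample gets a deviation of 0.

```shell
#!/bin/bash
# Hypothetical helper: per-x mean and sample standard deviation,
# written as "x mean sd" rows for "using 1:2:3 with errorbars".
# Input: one "x value" pair per line.
mean_sd() {
  # Sorting by x brings all samples of a group together.
  LC_ALL=C sort -k1,1g "$1" | awk '
    function flush(    sd) {
      if (n == 0) return
      # Sample standard deviation (n - 1 denominator); 0 for a single sample
      sd = (n > 1) ? sqrt(m2 / (n - 1)) : 0
      printf "%s %.10g %.10g\n", x, mean, sd
    }
    # A new x value starts a new group
    NR == 1 || $1 != x { flush(); x = $1; n = 0; mean = 0; m2 = 0 }
    {
      # Welford update: running mean and sum of squared deviations
      n++
      d = $2 - mean
      mean += d / n
      m2 += d * ($2 - mean)
    }
    END { flush() }'
}
```

The one-pass update avoids the cancellation problems of the naive sum-of-squares formula when the values are large relative to their spread.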

To set the output type, use set terminal. I would recommend using a vector format to avoid pixelization. You can see which terminals are available in your installation by typing help set terminal at the gnuplot prompt. Also, pass the butt option to the terminal, if available (it prevents lines from "overshooting" their end points).
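As an illustration of a vector terminal, here is a sketch that renders "x mean sd" rows to SVG without X, using the svg terminal's butt option together with set bars small to keep the error bars narrow. The function name and file names are hypothetical, and the svg terminal has to be present in your gnuplot build (check with help set terminal).

```shell
#!/bin/bash
# Hypothetical helper: render "x mean sd" rows to an SVG file (no X needed).
# $1: input data file, $2: output SVG path
render_svg() {
  gnuplot <<EOF
# Vector output; butt line ends keep lines from overshooting
set terminal svg size 800,600 butt
set output "$2"
# Zero-width caps keep tightly packed error bars from overlapping
set bars small
plot "$1" using 1:2:3 with errorbars notitle
# Closing the output finishes the file
set output
EOF
}
```

For example, render_svg stats.dat stats.svg writes stats.svg in the current directory.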


In addition to Svante's answer, you can use

set bars small 

to remove the 'x-part' of the error bars (the horizontal caps at their ends). For example,

#!/bin/bash
# Columns: x, y, ylow, yhigh
echo "1 2 2.0 2.4
2 4 3.9 4.5
3 1.4 0.1 1.5
4 2.9 2.2 4" > "data.dat"
gnuplot<<EOF
set term png small; set output "data.png"
set xrange [0:5]; set yrange [0:5]
# zero-width caps: draw only the vertical part of each error bar
set bars small
plot "./data.dat" using 1:2:3:4 with errorbars
set output ; set term pop
EOF

Tom


You may look at MathGL: it is a GPL plotting library that can plot from the console (no X needed), and it has a larger set of graph types than gnuplot, including ones for 2- and 3-ranged data.
