开发者

Plotting Histogram: How can I do it from scratch using data stored in a database?

I have some data stored in a database like this:

TableName: faults Table:

+------------+--------------+
| fault_type | total        |
+------------+--------------+
|    1       |            1 开发者_如何转开发| 
|    2       |            3 | 
|    3       |            8 | 
|    4       |            2 | 
.............................

How am I supposed to get a histogram plot starting from this table?


The solution below assumes that you have MySQL, Python and GNUPlot. The specific details can be fine tuned if necessary. Posting it so that it could be a baseline for other peers.

Step #1: Decide the type of graph.

If it is a frequency plot of some kind, then a simple SQL query should do the trick:

select total, count(total) from faults GROUP BY total;

If you need to specify bin sizes, then proceed to the next step.

Step #2: Make sure you are able to connect to MySQL using Python. You can use the MySQLdb import to do this.

After that, the python code to generate data for a histogram plot is the following (this was written precisely in 5 minutes so it is very crude):

import MySQLdb

def DumpHistogramData(databaseHost, databaseName, databaseUsername, databasePassword, dataTableName, binsTableName, binSize, histogramDataFilename):
    #Open a file for writing into
    output = open("./" + histogramDataFilename, "w")

    #Connect to the database
    db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
    cursor = db.cursor()

    #Form the query
    sql = """select b.*, count(*) as total 
            FROM """ + binsTableName + """ b 
            LEFT OUTER JOIN """ + dataTableName + """ a 
            ON a.total between b.min AND b.max 
            group by b.min;"""
    cursor.execute(sql)

    #Get the result and print it into a file for further processing
    count = 0;
    while True:
        results = cursor.fetchmany(10000)
        if not results:
            break
        for result in results:
            #print >> output, str(result[0]) + "-" + str(result[1]) + "\t" + str(result[2])
    db.close()

def PrepareHistogramBins(databaseHost, databaseName, databaseUsername, databasePassword, binsTableName, maxValue, totalBins):

    #Connect to the database    
    db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
    cursor = db.cursor()

    #Check if the table was already created
    sql = """DROP TABLE IF EXISTS """ + binsTableName
    cursor.execute(sql)

    #Create the table
    sql = """CREATE TABLE """ + binsTableName + """(min int(11), max int(11));"""
    cursor.execute(sql)

    #Calculate the bin size
    binSize = maxValue/totalBins

    #Generate the bin sizes
    for i in range(0, maxValue, binSize):
        if i is 0:
            min = i
            max = i+binSize
        else:
            min = i+1
            max = i+binSize
        sql = """INSERT INTO """ + binsTableName + """(min, max) VALUES(""" + str(min) + """, """ + str(max) + """);"""
        cursor.execute(sql)
    db.close()
    return binSize

binSize = PrepareHistogramBins("localhost", "testing", "root", "", "bins", 5000, 100)
DumpHistogramData("localhost", "testing", "root", "", "faults", "bins", binSize, "histogram")

Step #3: Use GNUPlot to generate the histogram. You can use the following script as a starting point (generates an eps image file):

set terminal postscript eps color lw 2 "Helvetica" 20
set output "output.eps"
set xlabel "XLABEL"
set ylabel "YLABEL"
set title "TITLE"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set key autotitle columnheader
set xtics rotate by -45
plot "input" using 1:2 with linespoints ls 1

Save the above script into some arbitrary file say, sample.script. Proceed to the next step.

Step #4: Use gnuplot with the above input script to generate an eps file

gnuplot sample.script

Nothing complicated but I figured a couple of bits from this code can be reused. Again, like I said, it is not perfect but you can get the job done :)

Credits:

  • Ofri Raviv (for helping me out with the MySQL query in this post: Getting data for histogram plot)

  • Myself (for writing the python and gnuplot script :D)


This blog article may help you! It talks about statistic using gnuplot and plot the result into histogram.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜