Plotting Histogram: How can I do it from scratch using data stored in a database?
I have some data stored in a database like this:
TableName: faults Table:
+------------+--------------+
| fault_type | total |
+------------+--------------+
| 1 | 1 开发者_如何转开发|
| 2 | 3 |
| 3 | 8 |
| 4 | 2 |
.............................
How am I supposed to get a histogram plot starting from this table?
The solution below assumes that you have MySQL, Python and GNUPlot. The specific details can be fine tuned if necessary. Posting it so that it could be a baseline for other peers.
Step #1: Decide the type of graph.
If it is a frequency plot of some kind, then a simple SQL query should do the trick:
select total, count(total) from faults GROUP BY total;
If you need to specify bin sizes, then proceed to the next step.
Step #2: Make sure you are able to connect to MySQL using Python. You can use the MySQLdb import to do this.
After that, the python code to generate data for a histogram plot is the following (this was written precisely in 5 minutes so it is very crude):
import MySQLdb
def DumpHistogramData(databaseHost, databaseName, databaseUsername, databasePassword, dataTableName, binsTableName, binSize, histogramDataFilename):
#Open a file for writing into
output = open("./" + histogramDataFilename, "w")
#Connect to the database
db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
cursor = db.cursor()
#Form the query
sql = """select b.*, count(*) as total
FROM """ + binsTableName + """ b
LEFT OUTER JOIN """ + dataTableName + """ a
ON a.total between b.min AND b.max
group by b.min;"""
cursor.execute(sql)
#Get the result and print it into a file for further processing
count = 0;
while True:
results = cursor.fetchmany(10000)
if not results:
break
for result in results:
#print >> output, str(result[0]) + "-" + str(result[1]) + "\t" + str(result[2])
db.close()
def PrepareHistogramBins(databaseHost, databaseName, databaseUsername, databasePassword, binsTableName, maxValue, totalBins):
#Connect to the database
db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
cursor = db.cursor()
#Check if the table was already created
sql = """DROP TABLE IF EXISTS """ + binsTableName
cursor.execute(sql)
#Create the table
sql = """CREATE TABLE """ + binsTableName + """(min int(11), max int(11));"""
cursor.execute(sql)
#Calculate the bin size
binSize = maxValue/totalBins
#Generate the bin sizes
for i in range(0, maxValue, binSize):
if i is 0:
min = i
max = i+binSize
else:
min = i+1
max = i+binSize
sql = """INSERT INTO """ + binsTableName + """(min, max) VALUES(""" + str(min) + """, """ + str(max) + """);"""
cursor.execute(sql)
db.close()
return binSize
binSize = PrepareHistogramBins("localhost", "testing", "root", "", "bins", 5000, 100)
DumpHistogramData("localhost", "testing", "root", "", "faults", "bins", binSize, "histogram")
Step #3: Use GNUPlot to generate the histogram. You can use the following script as a starting point (generates an eps image file):
set terminal postscript eps color lw 2 "Helvetica" 20
set output "output.eps"
set xlabel "XLABEL"
set ylabel "YLABEL"
set title "TITLE"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set key autotitle columnheader
set xtics rotate by -45
plot "input" using 1:2 with linespoints ls 1
Save the above script into some arbitrary file say, sample.script. Proceed to the next step.
Step #4: Use gnuplot with the above input script to generate an eps file
gnuplot sample.script
Nothing complicated but I figured a couple of bits from this code can be reused. Again, like I said, it is not perfect but you can get the job done :)
Credits:
Ofri Raviv (for helping me out with the MySQL query in this post: Getting data for histogram plot)
Myself (for writing the python and gnuplot script :D)
This blog article may help you! It talks about statistic using gnuplot and plot the result into histogram.
精彩评论