Calculating statistics directly from a CSV file

I have a transaction log file in CSV format that I want to use to run statistics. The log has the following fields:

date:  Time/date stamp
salesperson:  The username of the person who closed the sale
promo:  Sum total of items in the sale that were promotions
amount:  Grand total of the sale

I'd like to get the following statistics:

salesperson:  The username of the salesperson being analyzed
minAmount:  The smallest grand total of this salesperson's transactions
avgAmount:  The mean grand total
maxAmount:  The largest grand total
minPromo:  The smallest promo amount by the salesperson
avgPromo:  The mean promo amount

I'm tempted to build a database structure, import this file, write SQL, and pull out the stats. I don't need anything more from this data than these stats. Is there an easier way? I'm hoping some bash script could make this easy.


TxtSushi does this:

tssql -table trans transactions.csv \
'select
    salesperson,
    min(as_real(amount)) as minAmount,
    avg(as_real(amount)) as avgAmount,
    max(as_real(amount)) as maxAmount,
    min(as_real(promo)) as minPromo,
    avg(as_real(promo)) as avgPromo
from trans
group by salesperson'

I have a bunch of example scripts showing how to use it.

Edit: fixed syntax


Could also bang out an awk script to do it. It's only CSV with a few variables.
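A sketch of what that awk script might look like, computing all five statistics in a single pass. The sample data is made up here, and the field order (date, salesperson, promo, amount, with a header row) is assumed from the question; it also assumes no quoted fields containing commas:

```shell
# Sample data matching the question's layout (hypothetical values)
cat > transactions.csv <<'EOF'
date,salesperson,promo,amount
2010-01-01,alice,1,100.00
2010-01-02,alice,3,50.00
2010-01-03,bob,0,200.00
EOF

# One pass: track per-salesperson min/max plus running sums and counts,
# then emit the averages in the END block. NR > 1 skips the header.
awk -F, 'NR > 1 {
    sp = $2; pr = $3 + 0; amt = $4 + 0
    if (!(sp in count)) { minAmt[sp] = maxAmt[sp] = amt; minPromo[sp] = pr }
    if (amt < minAmt[sp])   minAmt[sp]   = amt
    if (amt > maxAmt[sp])   maxAmt[sp]   = amt
    if (pr  < minPromo[sp]) minPromo[sp] = pr
    sumAmt[sp] += amt; sumPromo[sp] += pr; count[sp]++
}
END {
    print "salesperson,minAmount,avgAmount,maxAmount,minPromo,avgPromo"
    for (sp in count)
        printf "%s,%.2f,%.2f,%.2f,%d,%.2f\n",
            sp, minAmt[sp], sumAmt[sp]/count[sp], maxAmt[sp],
            minPromo[sp], sumPromo[sp]/count[sp]
}' transactions.csv | tee stats.csv
```

The `+ 0` coercions force numeric comparison even if a field has stray whitespace; output row order follows awk's unspecified array iteration order, so pipe through `sort` if you need it stable.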


You can loop through the lines in the CSV and use bash script variables to hold your min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).

Here are some useful snippets for working with CSV files in bash.

If your data might be quoted (e.g. because a field contains a comma), processing with bash, sed, etc. becomes much more complex.
