Calculating statistics directly from a CSV file

I have a transaction log file in CSV format that I want to use to run statistics. The log has the following fields:

date:  Time/date stamp
salesperson:  The username of the person who closed the sale
promo:  Sum total of items in the sale that were promotions
amount:  Grand total of the sale

I'd like to get the following statistics:

salesperson:  The username of the salesperson being analyzed
minAmount:  The smallest grand total of this salesperson's transactions
avgAmount:  The mean grand total
maxAmount:  The largest grand total
minPromo:  The smallest promo amount by the salesperson
avgPromo:  The mean promo amount

I'm tempted to build a database structure, import this file, write SQL, and pull out the stats. I don't need anything more from this data than these stats. Is there an easier way? I'm hoping some bash script could make this easy.


TxtSushi does this:

tssql -table trans transactions.csv \
'select
    salesperson,
    min(as_real(amount)) as minAmount,
    avg(as_real(amount)) as avgAmount,
    max(as_real(amount)) as maxAmount,
    min(as_real(promo)) as minPromo,
    avg(as_real(promo)) as avgPromo
from trans
group by salesperson'

I have a bunch of example scripts showing how to use it.

Edit: fixed syntax


Could also bang out an awk script to do it. It's only CSV with a few variables.
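A sketch of what that awk script might look like, computing all five statistics in a single pass. The sample data is made up here, and the field order (date, salesperson, promo, amount, with a header row) is assumed from the question; it also assumes no quoted fields containing commas:

```shell
# Sample data matching the question's layout (hypothetical values)
cat > transactions.csv <<'EOF'
date,salesperson,promo,amount
2010-01-01,alice,1,100.00
2010-01-02,alice,3,50.00
2010-01-03,bob,0,200.00
EOF

# One pass: track per-salesperson min/max plus running sums and counts,
# then emit the averages in the END block. NR > 1 skips the header.
awk -F, 'NR > 1 {
    sp = $2; pr = $3 + 0; amt = $4 + 0
    if (!(sp in count)) { minAmt[sp] = maxAmt[sp] = amt; minPromo[sp] = pr }
    if (amt < minAmt[sp])   minAmt[sp]   = amt
    if (amt > maxAmt[sp])   maxAmt[sp]   = amt
    if (pr  < minPromo[sp]) minPromo[sp] = pr
    sumAmt[sp] += amt; sumPromo[sp] += pr; count[sp]++
}
END {
    print "salesperson,minAmount,avgAmount,maxAmount,minPromo,avgPromo"
    for (sp in count)
        printf "%s,%.2f,%.2f,%.2f,%d,%.2f\n",
            sp, minAmt[sp], sumAmt[sp]/count[sp], maxAmt[sp],
            minPromo[sp], sumPromo[sp]/count[sp]
}' transactions.csv | tee stats.csv
```

The `+ 0` coercions force numeric comparison even if a field has stray whitespace; output row order follows awk's unspecified array iteration order, so pipe through `sort` if you need it stable.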


You can loop through the lines in the CSV and use bash script variables to hold your min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).

Here are some useful snippets for working with CSV files in bash.

If your data might be quoted (e.g. because a field contains a comma), processing with bash, sed, etc. becomes much more complex.
