Manipulating data text file with bash command?

I was given a text file called stock.txt with the following content:

pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8

I need to use a bash script to produce this output:

Total amount for drinks: 10
Total amount for snacks: 14
Total amount for fruits: 11
Total of everything: 35

My gut tells me I will need to use sed, group, grep and something else.

Where should I start?


I would break the exercise down into steps

Step 1: Read the file one line at a time

while read -r line
do
    # do something with $line
    echo "$line"
done < stock.txt

Step 2: Pattern match (drinks, snacks, fruits) and do some simple arithmetic. This step requires tokenizing each line, which I'll leave as an exercise for you to figure out.

if [[ "$line" =~ "drinks" ]]
then
    echo "matched drinks"
    .
    .
    .
fi 
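
One possible way to do the tokenizing that Step 2 leaves open is plain parameter expansion. This is only a sketch of one approach, not the only one. The variable names (rest, cate, price, drinks) are made up for illustration, and the sample data is inlined so the snippet runs on its own; with the real file you would end the loop with done < stock.txt instead.

```shell
#!/usr/bin/env bash
# Sketch: split each "name;category;amount" line with parameter expansion.
drinks=0
while read -r line
do
    rest=${line#*;}      # drop the name, e.g. "drinks;3"
    cate=${rest%%;*}     # keep the category, e.g. "drinks"
    price=${rest#*;}     # keep the amount, e.g. "3"
    if [[ "$cate" == "drinks" ]]
    then
        drinks=$(( drinks + price ))
    fi
done <<'EOF'
pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8
EOF
echo "Total amount for drinks: $drinks"
```

Comparing the extracted category with == is a little stricter than matching the whole line with =~, since a product that happened to be named "drinks" would not be counted by mistake.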


Pure Bash. A nice application for an associative array:

declare -A category                  # associative array (bash 4+): category -> subtotal
while IFS=';' read -r name cate price ; do   # IFS applies to this read only
  ((category[$cate]+=price))
done < stock.txt

sum=0
for cate in "${!category[@]}"; do     # loop over the keys
  printf "Total amount for %s: %d\n" "$cate" "${category[$cate]}"
  ((sum+=${category[$cate]}))
done

printf "Total of everything: %d\n" "$sum"


There is a short description of processing comma-separated files in bash here:

http://www.cyberciti.biz/faq/unix-linux-bash-read-comma-separated-cvsfile/

You could do something similar. Just change IFS from comma to semicolon.
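
A minimal sketch of that idea, assuming the three-column layout from the question. The variable names are my own, and the data is inlined with a here-document so the snippet is self-contained; for the real file, end the loop with done < stock.txt.

```shell
#!/usr/bin/env bash
# Sketch: putting IFS=';' in front of read changes the separator
# for that one command only, so nothing has to be restored afterwards.
total=0
while IFS=';' read -r name cate price
do
    printf '%s is in %s, amount %s\n' "$name" "$cate" "$price"
    total=$(( total + price ))
done <<'EOF'
pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8
EOF
echo "Total of everything: $total"
```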

Oh yeah, and a general hint for learning bash: man is your friend. Use it to see the manual pages for most commands and utilities.

Example: man read shows the manual page for the read command (read is a bash builtin, so help read works too). On most systems the page opens in less, so you exit the manual by pressing q (it may sound funny, but it took me a while to figure that out).


The easy way to do this is using a hash table, which is supported directly by bash 4.x and of course can be found in awk and perl. If you don't have a hash table then you need to loop twice: once to collect the unique values of the second column, once to total.
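
For comparison, here is roughly what the hash-table approach could look like in awk, driven from a shell script. Treat it as a sketch: the names stock, report, total and sum are mine, and the data is held in a variable only so the snippet runs without stock.txt.

```shell
#!/usr/bin/env bash
# Sketch: awk arrays are hash tables keyed by string.
stock='pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8'

report=$(printf '%s\n' "$stock" | awk -F';' '
    { total[$2] += $3; sum += $3 }      # per-category subtotal and grand total
    END {
        for (k in total)                # iteration order is unspecified
            printf "Total amount for %s: %d\n", k, total[k]
        printf "Total of everything: %d\n", sum
    }')
echo "$report"
```

With the real file you would drop the variable and run awk directly on stock.txt. Note that the category lines may come out in any order.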

There are many ways to do this. Here's a fun one which doesn't use awk, sed or perl. The only external utilities I've used here are cut, sort and uniq. You could even replace cut with a little more effort. In fact, lines 5-9 could have been written more easily with grep (grep $kind stock.txt), but I avoided that to show off the power of bash.

for kind in $(cut -d\; -f 2 stock.txt | sort | uniq) ; do
    total=0
    while read -r d ; do
        total=$(( total+d ))
    done < <(
        while read -r line ; do
            [[ $line =~ $kind ]] && echo "$line"
        done < stock.txt | cut -d\; -f3
    )

    echo "Total amount for $kind: $total"
done

We lose the strict ordering of your original output here. An exercise for you might be to find a way not to do that.
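
If you want to check your answer to that exercise, one possible approach is to remember each category the first time it shows up. This sketch assumes bash 4 or later (for declare -A); the array names total and order are hypothetical, and the data is inlined so it runs as-is.

```shell
#!/usr/bin/env bash
# Sketch: keep categories in first-seen order next to their subtotals.
declare -A total=()     # category -> running subtotal
order=()                # categories in the order they first appear
sum=0
while IFS=';' read -r name cate price
do
    # ${total[$cate]+x} expands to "x" only if the key already exists
    [[ -z ${total[$cate]+x} ]] && order+=("$cate")
    total[$cate]=$(( ${total[$cate]:-0} + price ))
    sum=$(( sum + price ))
done <<'EOF'
pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8
EOF

for cate in "${order[@]}"
do
    echo "Total amount for $cate: ${total[$cate]}"
done
echo "Total of everything: $sum"
```

This reads the file once and prints drinks, snacks, fruits in the order they first occur in the input, which is the order the question asked for.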

Discussion: The first line contains a command substitution (run in a sub-shell) with a simple pipeline using cut. We read the second field from the stock.txt file, with fields delimited by ;, written \; here so the shell does not interpret it. The result is a newline-separated list of values from stock.txt. This is piped to sort, then uniq. This performs our "grouping" step, since the pipeline outputs an alphabetical list of items from the second column but lists each item only once, no matter how many times it appeared in the input file.

Also on the first line is a typical for loop: For each item resulting from the sub-shell we loop once, storing the value of the item in the variable kind. This is the other half of the grouping step, making sure that each "Total" output line occurs once.

On the second line total is initialized to zero so that it always resets whenever a new group is started.

The third line begins the 'totaling' loop, in which we sum the amounts for the current kind. Here we declare that read will store one line of stdin in the variable d on each iteration of the loop.

On the fourth line the totaling actually occurs: using shell arithmetic we add the value in d to the value in total.

Line five ends the while loop and then describes its input. We use shell input redirection via < to specify that the input to the loop, and thus to the read command, comes from a file. We then use process substitution to specify that the file will actually be the results of a command.
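
One practical reason to feed the loop this way rather than through a pipe: in bash's default configuration (lastpipe not enabled), a loop on the receiving end of a pipe runs in a sub-shell, so changes to total would be lost when it ends. A small sketch with made-up numbers and variable names shows the difference:

```shell
#!/usr/bin/env bash
# Sketch: pipe versus process substitution, default bash settings assumed.
total=0
printf '%s\n' 3 7 | while read -r d ; do
    total=$(( total + d ))
done
piped_total=$total          # the loop ran in a sub-shell, total is unchanged

total=0
while read -r d ; do
    total=$(( total + d ))
done < <(printf '%s\n' 3 7)
procsub_total=$total        # the loop ran in the current shell

echo "pipe: $piped_total, process substitution: $procsub_total"
```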

On the sixth line the command that will feed the while-read loop begins. It is itself another while-read loop, this time reading into the variable line. On the seventh line the test is performed via a conditional construct. Here we use [[ for its =~ operator, which is a regular-expression matching operator. We are testing whether $line matches our current $kind; the match succeeds if $kind appears anywhere in the line.

On the eighth line we end the inner while-read loop and specify that its input comes from the stock.txt file, then we pipe the output of the entire loop, which by now is simply all lines matching $kind, to cut and instruct it to show only the third field, which is the numeric field. On line nine we then end the process substitution command, the output of which is a newline-delimited list of numbers from lines which were of the group specified by kind.

Given that the total is now known and the kind is known it is a simple matter to print the results to the screen.
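
For reference, the grep shortcut mentioned earlier might look something like this. It is a sketch under a few assumptions: stock_file and report are names I made up, the sample data is written to a temporary file so the snippet does not depend on stock.txt, and the pattern is wrapped in semicolons so that only the category column can match.

```shell
#!/usr/bin/env bash
# Sketch: the same grouping loop, with grep doing the per-category filter.
stock_file=$(mktemp)            # temporary stand-in for stock.txt
cat > "$stock_file" <<'EOF'
pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8
EOF

report=""
for kind in $(cut -d';' -f2 "$stock_file" | sort | uniq) ; do
    total=0
    while read -r d ; do
        total=$(( total + d ))
    done < <(grep ";$kind;" "$stock_file" | cut -d';' -f3)
    report+="Total amount for $kind: $total"$'\n'
done
rm -f "$stock_file"
printf '%s' "$report"
```

As with the original loop, the categories come out in alphabetical order because of sort.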


The below answer is OP's. As it was edited in the question itself and OP hasn't come back for 6 years, I am editing out the answer from the question and posting it as wiki here.


My answer: to get the total price, I use this:

...
PRICE=0
IFS=";"     # use ; as the field separator
while read name cate price
do
    let PRICE=PRICE+$price
done < stock.txt
echo $PRICE

When I echo it, the result is 35, which is correct. Now I will move on to using awk to get the per-category results.

Whole Solution:

Thanks guys, I managed to do it myself. Here is my code:

#!/bin/bash
INPUT=stock.txt
PRICE=0
DRINKS=0
SNACKS=0
FRUITS=0
old_IFS=$IFS      # save the field separator
IFS=";"           # use ; as the field separator
while read name cate price
do
    if [ "$cate" = "drinks" ]; then
        let DRINKS=DRINKS+$price
    fi

    if [ "$cate" = "snacks" ]; then
        let SNACKS=SNACKS+$price
    fi

    if [ "$cate" = "fruits" ]; then
        let FRUITS=FRUITS+$price
    fi

    # Total
    let PRICE=PRICE+$price
done < "$INPUT"

echo -e "Drinks: " $DRINKS
echo -e "Snacks: " $SNACKS
echo -e "Fruits: " $FRUITS
echo -e "Price " $PRICE
IFS=$old_IFS      # restore the original field separator