Parallel distinct word count in Go
Jakob Østergaard presented this challenge:
Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.
How can we meet this challenge with parallel programming (preferably in Go, but a description in English will suffice)?
There are several possibilities, but I guess you mean "efficiently"?
The general idea would be to split the text into manageable chunks, pile those chunks into a queue, and have multiple consumers handle the chunks.
This looks like a typical Map/Reduce application to me:
              _ Worker _
             /          \
            /            \
    Splitter --- Worker --- Aggregator
            \            /
             \_ Worker _/
Ideally the "multiple" queues would be a single one with multiple consumers, so that if one worker slows down, the whole process does not slow down as much.
I would also use a signal from the Splitter to the Workers to let them know the input has been fully consumed and they can start sending their results to the Aggregator.
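The pipeline described above might be sketched in Go with channels: the splitter feeds a single shared chunk channel, each worker builds a local word set, and closing the channel serves as the "input fully consumed" signal. This is a minimal sketch under some assumptions: the function name `distinctWords` is made up for illustration, and chunking is done at line boundaries so that no word straddles two chunks.

    package main

    import (
    	"bufio"
    	"fmt"
    	"io"
    	"os"
    	"runtime"
    	"strings"
    	"sync"
    )

    // distinctWords counts distinct whitespace-separated words in r,
    // fanning line-sized chunks out to several workers over one shared queue.
    func distinctWords(r io.Reader) int {
    	chunks := make(chan string)             // single queue, multiple consumers
    	partials := make(chan map[string]bool)  // per-worker results to the aggregator

    	// Workers: each builds a local set of words from the chunks it receives.
    	var wg sync.WaitGroup
    	for i := 0; i < runtime.NumCPU(); i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			local := make(map[string]bool)
    			for chunk := range chunks {
    				for _, w := range strings.Fields(chunk) {
    					local[w] = true
    				}
    			}
    			partials <- local
    		}()
    	}

    	// Splitter: read the input line by line and queue each line.
    	go func() {
    		sc := bufio.NewScanner(r)
    		for sc.Scan() {
    			chunks <- sc.Text()
    		}
    		close(chunks) // signals the workers that input is fully consumed
    	}()

    	// Close the results channel once every worker has finished.
    	go func() {
    		wg.Wait()
    		close(partials)
    	}()

    	// Aggregator: merge the local sets into a global one.
    	global := make(map[string]bool)
    	for local := range partials {
    		for w := range local {
    			global[w] = true
    		}
    	}
    	return len(global)
    }

    func main() {
    	fmt.Println(distinctWords(os.Stdin))
    }

Note that the aggregation step is still sequential, so for a trivial per-word computation like this the channel traffic can easily cost more than it saves.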
Here's the solution, in Go, to Jakob Østergaard's problem.
/*
The problem: Write a program that reads text from
standard-input, and returns (prints) the total
number of distinct words found in the text.
C versus C++, Jakob Østergaard, February 24th, 2004
http://unthought.net/c++/c_vs_c++.html
*/
package main

import (
	"bufio"
	"fmt"
	"os"
	"unicode"
)

func main() {
	rdr := bufio.NewReader(os.Stdin)
	words := make(map[string]bool, 1024*1024) // set of distinct words seen so far
	word := make([]rune, 0, 256)              // runes of the word being built
	for {
		r, n, _ := rdr.ReadRune() // n == 0 means end of input
		if unicode.IsSpace(r) || n == 0 {
			if len(word) > 0 {
				words[string(word)] = true
				word = word[:0] // reset the buffer, keeping its capacity
			}
			if n == 0 {
				break
			}
		} else {
			word = append(word, r)
		}
	}
	fmt.Println(len(words))
}
It's naive to add the phrase "parallel programming" to this and most other problems and expect some magical improvement. Reading input sequentially from a stream and performing only trivial computation on it offers no significant opportunity for parallelism.