开发者

How can I find the most common number in an input?

This is a very abstracted explanation of what I'm doing:

Say I have a text file full of numbers separated by newlines. Right now, I take these numbers and put them in开发者_运维知识库 a map<int, int>, where the key is the number, and the value is the frequency.

My end goal is a list of numbers sorted by frequency. How can I go about doing this? Note that a frequency can obviously show up more than once. The way I was thinking was to make a struct containing both a number and its frequency, and define < so that it never returns equality. I would then iterate through the map, putting all the elements into that struct and then into a set.


Once you've built the frequency map, copy its pairs to a std::vector<std::pair<int, int> > then std::sort the latter with the 3-args version of std::sort, which takes the comparator as the third arg; as a comparator, you can use one that compares the .second fields of the pairs first, and the .first ones (if you want) only to disambiguate the ordering of pairs whose .second fields are equal (but I don't think you really need the last bit since you don't care about the ordering for cases with equal frequency, do you?).


If this is simply an operation you wish to do on it's own (rather than a component you want to use), the most practical way for me would be to do something like this:

sort <filename> | uniq -c | sed 's/^[ \t]*//' | sort -rn

Of course it doesn't scale well compared to a linear algorithm, but it performs reasonably well for every day tasks, and it has the added advantage that you don't need to reinvent the wheel yourself.


EDIT: sort" sorts your file ; "uniq" groups all consecutive lines that are the same into a pair [frequency] [item] pair, for instance

12
12
24
25
25
25

becomes

    2 12
    1 24
    3 25

then I remove the trailing space with "sed" (if this doesn't work, it is because there is sometimes an issue with the way the tab character is inputted), then I sort the output according to frequency (the "n" option asks for a numerical sort, instead of lexicographic; the "r" option asks for a reverse sorting).

If you want to select another field for the sorting (say, because the number you are actually interested in counting the frequency are the THIRD field of a tab delimited line), then you can use "sort"'s selection feature:

sort -t'\t' -k3 <filename> | ...

This says that your input is a tab-delimited list of fields, and that you want to sort according to the third field.


EDIT 2: here is the line to do this according to the fourth field (and to remove all other fields)

cut -d'\t' -f4 <filename> | sort | uniq -c | sed 's/^[ \t]*//' | sort -rn


Iterate over the std:pair of the std::map and insert them into a std::priority_queue. Use a comparison function from the value.

Once you have your priority queue you can iterate over it.


In year 2022, the approach would be different. We can take advantage of new language features. And new containers.

Here especially the std::unordered_map or the std::mulitiset.

Then the task will be much simpler and the resulting code will be shorter.

Please see:

#include <iostream>
#include <fstream>
#include <string>
#include <filesystem>
#include <unordered_map>
#include <set>
#include <type_traits>
#include <utility>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<int, std::size_t>;

// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Sorter = std::multiset<Pair, Comp>;
// ------------------------------------------------------------

int main() {

    // Open the source file and check, if it could be opened
    if (std::ifstream inFileStream{ "r:\\file.txt" }; inFileStream) {

        // Here we will count the extensions of the file names
        Counter counter{};

        // Read all values from file and count them
        for (Pair::first_type i{}; inFileStream >> i; counter[i]++)
            ;

        // Sort them 
        Sorter sorter(counter.begin(), counter.end());

        // Show result to the user. 
        for (const auto& [value, count] : sorter) std::cout << value << " --> " << count << '\n';
    } 
    else {
        // file could not be opened
        std::cerr << "\n\n*** Error: Input file could not be opened\n\n";
    }
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜