Composability of STL algorithms
The STL algorithms are a pretty useful thing in C++. But one thing that kind of irks me is that they seem to lack composability.
For example, let's say I have a vector<pair<int, int>>
and want to transform that to a vector<int>
containing only the second
member of the pair. That's simple enough:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;
std::transform(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return p.second; });
Or maybe I want to filter the vector
for only those pairs whose first
member is even. Also pretty simple:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> result;
std::copy_if(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return (p.first % 2) == 0; });
But what if I want to do both? There is no transform_if
algorithm, and using both transform
and copy_if
seems to require allocating a temporary vector
to hold the intermediate result开发者_JS百科:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> temp;
std::vector<int> result;
std::copy_if(values.begin(), values.end(), std::back_inserter(temp),
[] (std::pair<int, int> p) { return (p.first % 2) == 0; });
std::transform(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return p.second; });
This seems rather wasteful to me. The only way I can think of to avoid the temporary vector is to abandon transform
and copy_if
and simply use for_each
(or a regular for loop, whichever suits your fancy):
std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;
std::for_each(values.begin(), values.end(),
[&result] (std::pair<int, int> p)
{ if( (p.first % 2) == 0 ) result.push_back(p.second); });
Am I missing something here? Is there a good way to compose two existing STL algorithms into a new one without needing temporary storage?
You're right. You can use Boost.Range adaptors to achieve composition.
I think the problem is unfortunately structural
- C++ uses two iterators to represent a sequence
- C++ functions are single-valued
so you cannot chain them because a function cannot return "a sequence".
An option would have been to use single-object sequences instead (like the range approach from boost). This way you could have combined the result of one processing as the input of another... (one object -> one object).
In the standard C++ library instead the processing is (two objects -> one object) and it's clear that this cannot be chained without naming the temporary object.
Back in 2000, the problem was already noted. Gary Powell and Martin Weiser came up with a "view" concept, and coined the name "View Template Library". It didn't take off then but the idea makes sense. A "view" adaptor essentially applies an on-the-fly transform. For instance, it can adapt the value_type
.
The concept probably should be readdressed now we have C++0x. We've made quite some progress in generic programming since 2000.
For example, let's use the vector<pair<int, int>>
to vector<int>
example. That could be quite simple:
std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, [](std::pair<int, int> p) { return p.first });
std::vector<int> result(view.begin(), view.end());
Or, using the boost::bind
techniques, even simpler:
std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, &std::pair<int, int>::first);
std::vector<int> result(view.begin(), view.end());
Since C++20 you can use std::ranges::copy
together with the range adaptors std::views::filter
and std::views::values
from the Ranges library as follows:
int main() {
std::vector<std::pair<int, int>> values = { {1,2}, {4,5}, {6,7}, {9,10} };
std::vector<int> result;
auto even = [](const auto& p) { return (p.first % 2) == 0; };
std::ranges::copy(values | std::views::filter(even) | std::views::values,
std::back_inserter(result));
for (int i : result)
std::cout << i << std::endl;
return 0;
}
Output:
5
7
In the solution above, no temporary vector is created for an intermediate result, because the view adaptors create ranges that don't contain elements. These ranges are just views over the input vector, but with a customized iteration behavior.
Code on Wandbox
Not sure if this is still active, but... A new light wait header only lib that does what you describe. Doc talks about lazy evaluation and com compossible generators.
Doc snippet:
- Read in up to 10 integers from a file "test.txt".
- filter for the even numbers, square them and sum their values.
int total = lz::read<int>(ifstream("test.txt")) | lz::limit(10) |
lz::filter([](int i) { return i % 2 == 0; }) |
lz::map([](int i) { return i * i; }) | lz::sum();
you can split that line up into multiple expressions.
auto numbers = lz::read<int>(ifstream("test.txt")) | lz::limit(10);
auto evenFilter = numbers | lz::filter([](int i) { return i % 2 == 0; });
auto squares = evenFilter | lz::map([](int i) { return i * i; });
int total = squares | lz::sum();
- Even though this expression is split over multiple variable assignments, it is not any less efficient.
- Each intermediate variable simply describes a unit of code to be executed. All held in stack.
https://github.com/SaadAttieh/lazyCode
精彩评论