How to filter characters from a string with C++/Boost
This seems like such a basic question, so I apologize if it's already been answered somewhere (my searching didn't turn up anything).
I just want to filter a string object so that it contains only alphanumeric and space characters.
Here's what I tried:
#include "boost/algorithm/string/erase.hpp"
#include "boost/algorithm/string/classification.hpp"
std::wstring oldStr = L"Bla=bla =&*\nSampleSampleSample ";
std::wstring newStr = boost::erase_all_copy(oldStr, !(boost::is_alnum() ||
boost::is_space()));
But the compiler is not at all happy with that. It seems that I can only put a string in the second argument of erase_all_copy, and not this is_alnum() stuff.
Is there some obvious solution I'm missing here?
With the std algorithms and Boost.Bind:
std::wstring s = ...
std::wstring new_s;
std::locale loc;
std::remove_copy_if(s.begin(), s.end(), std::back_inserter(new_s),
!(boost::bind(&std::isalnum<wchar_t>, _1, loc)||
boost::bind(&std::isspace<wchar_t>, _1, loc)
));
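If C++11 is available, the same idea can be sketched with a lambda instead of Boost.Bind. This is only an illustration under that assumption: the helper name keep_alnum_and_space is hypothetical, and it relies on the locale-aware std::isalnum and std::isspace overloads from <locale>.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper (not from the original answer): returns a copy of s
// containing only alphanumeric and whitespace characters for the given locale.
std::wstring keep_alnum_and_space(const std::wstring& s,
                                  const std::locale& loc = std::locale())
{
    std::wstring out;
    out.reserve(s.size());
    // remove_copy_if copies every element for which the predicate is false,
    // so the predicate describes the characters to drop.
    std::remove_copy_if(s.begin(), s.end(), std::back_inserter(out),
        [&loc](wchar_t c) {
            return !(std::isalnum(c, loc) || std::isspace(c, loc));
        });
    return out;
}
```

Note that std::isspace also accepts tabs and newlines, so a call such as keep_alnum_and_space(L"Bla=bla =&*\nSample ") keeps the embedded newline.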
It's been years since I've used boost, but perhaps you could use erase_all_regex_copy() instead of erase_all_copy()? It might be a bit of a performance hit, but it may be your only choice aside from iterating over each element and checking manually. If you're not familiar with regular expressions, the expression you'd use in this case would be something like "[^a-zA-Z0-9 ]+".
For completeness' sake, some sample code:
#include "boost/regex.hpp"
#include "boost/algorithm/string/regex.hpp"
// A std::wstring needs wide literals and the wide-character regex type.
std::wstring oldStr = L"Bla=bla =&*\nSampleSampleSample ";
std::wstring newStr = boost::erase_all_regex_copy(oldStr, boost::wregex(L"[^a-zA-Z0-9 ]+"));
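For readers without Boost, roughly the same thing can be sketched with the C++11 <regex> header. This swaps in std::regex_replace for erase_all_regex_copy, so it is a stand-in rather than the Boost call itself, and the function name strip_non_alnum_space is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: erases every run of characters that is not an ASCII
// letter, an ASCII digit, or a literal space.
std::wstring strip_non_alnum_space(const std::wstring& s)
{
    // Built once and reused, since constructing a regex is relatively costly.
    static const std::wregex unwanted(L"[^a-zA-Z0-9 ]+");
    // Replacing each match with an empty string removes it.
    return std::regex_replace(s, unwanted, L"");
}
```

Unlike the predicate-based approach, this character class is ASCII-only and treats tabs and newlines as unwanted, because the only whitespace it lists is the space character.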
For those who want something ready to use, here are ANSI and Unicode functions based on @Éric Malenfant's answer:
std::string CleanString(const std::string& Input)
{
std::string clean_string;
std::locale loc;
try {
std::remove_copy_if(Input.begin(), Input.end(), std::back_inserter(clean_string),
!(boost::bind(&std::isalnum<unsigned char>, _1, loc) || boost::bind(&std::isspace<unsigned char>, _1, loc)
));
}
catch (const std::bad_alloc& e) {
std::cout << "Allocation failed: " << e.what() << '\n';
}
return clean_string;
}
std::wstring CleanString(const std::wstring& Input)
{
std::wstring clean_string;
std::locale loc;
try {
std::remove_copy_if(Input.begin(), Input.end(), std::back_inserter(clean_string),
!(boost::bind(&std::isalnum<wchar_t>, _1, loc) ||
boost::bind(&std::isspace<wchar_t>, _1, loc)
));
} catch (const std::bad_alloc& e) {
std::cout << "Allocation failed: " << e.what() << '\n';
}
return clean_string;
}
Online Demo: https://wandbox.org/permlink/MFTwXV4ZCi9nsdlC
Full test code for Linux:
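#include <iostream>
#include <algorithm>
#include <iterator>   // std::back_inserter
#include <locale>     // std::locale, std::isalnum / std::isspace templates
#include <string>
#include <boost/bind.hpp>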
// Note: on Linux we use char and not unsigned char, because the standard
// library only guarantees std::ctype facets for char and wchar_t.
std::string CleanString(const std::string& Input)
{
std::string clean_string;
std::locale loc;
try {
std::remove_copy_if(Input.begin(), Input.end(), std::back_inserter(clean_string),
!(boost::bind(&std::isalnum<char>, _1, loc) || boost::bind(&std::isspace<char>, _1, loc)
));
}
catch (const std::bad_alloc& e) {
std::cout << "Allocation failed: " << e.what() << '\n';
}
catch (...)
{
}
return clean_string;
}
std::wstring CleanString(const std::wstring& Input)
{
std::wstring clean_string;
std::locale loc;
try {
std::remove_copy_if(Input.begin(), Input.end(), std::back_inserter(clean_string),
!(boost::bind(&std::isalnum<wchar_t>, _1, loc) ||
boost::bind(&std::isspace<wchar_t>, _1, loc)
));
}
catch (const std::bad_alloc& e) {
std::cout << "Allocation failed: " << e.what() << '\n';
}
catch (...)
{
}
return clean_string;
}
int main()
{
std::string test_1 = "Bla=bla =&*\n Sample Sample Sample !$%^&*@~";
std::string new_test_1 = CleanString(test_1);
if (!new_test_1.empty())
{
std::cout << "ANSI: " << new_test_1 << std::endl;
}
std::wstring test_uc_1 = L"!$%^&*@~ test &*";
std::wstring new_test_uc_1 = CleanString(test_uc_1);
if (!new_test_uc_1.empty())
{
std::wcout << L"UNICODE: " << new_test_uc_1 << std::endl;
}
return 0;
}
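The two overloads above differ only in their character type, so they could probably be folded into one function template. The following is a minimal sketch assuming C++11, with no Boost dependency; the name clean_string_generic is hypothetical and it is not part of the tested code above.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical generic variant: works for std::string and std::wstring alike.
template <typename CharT>
std::basic_string<CharT> clean_string_generic(
    const std::basic_string<CharT>& input,
    const std::locale& loc = std::locale())
{
    std::basic_string<CharT> out;
    out.reserve(input.size());
    // copy_if keeps the characters for which the predicate is true.
    std::copy_if(input.begin(), input.end(), std::back_inserter(out),
        [&loc](CharT c) {
            return std::isalnum(c, loc) || std::isspace(c, loc);
        });
    return out;
}
```

It would be called as clean_string_generic(std::string("a=b c!")) or clean_string_generic(std::wstring(L"a=b c!")); passing an explicit string object lets the compiler deduce CharT.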
Thanks to Éric Malenfant.