What is the nicest way to parse this in C++?
In my program, I have a list of "server addresses" in the following format:
host[:port]
The brackets here indicate that the port is optional.
host can be a hostname, an IPv4 address, or an IPv6 address (possibly in "bracket-enclosed" notation).
port, if present, can be a numeric port number or a service string (like "http" or "ssh").
If port is present and host is an IPv6 address, host must be in "bracket-enclosed" notation (example: [::1]).
Here are some valid examples:
localhost
localhost:11211
127.0.0.1:http
[::1]:11211
::1
[::1]
And an invalid example:
::1:80 // Invalid: is this the IPv6 address ::1:80 with a default port, or the IPv6 address ::1 with port 80?
::1:http // This is not ambiguous, but for simplicity's sake, let's consider it forbidden as well.
My goal is to separate such entries into two parts (obviously host and port). I don't care if either the host or the port is invalid, as long as neither contains a non-bracket-enclosed : (290.234.34.34.5 is OK for host; it will be rejected in a later step). I just want to separate the two parts or, if there is no port part, to know that somehow.
I tried to do something with std::stringstream, but everything I came up with seems hacky and not really elegant.
How would you do this in C++?
I don't mind answers in C, but C++ is preferred. Any Boost solution is welcome as well.
Thank you.
Have you looked at boost::spirit? It might be overkill for your task, though.
Here's a simple class that uses boost::xpressive to verify the type of IP address; you can then parse the rest to get the results.
Usage:
const std::string ip_address_str = "127.0.0.1:3282";
IpAddress ip_address = IpAddress::Parse(ip_address_str);
std::cout<<"Input String: "<<ip_address_str<<std::endl;
std::cout<<"Address Type: "<<IpAddress::TypeToString(ip_address.getType())<<std::endl;
if (ip_address.getType() != IpAddress::Unknown)
{
std::cout<<"Host Address: "<<ip_address.getHostAddress()<<std::endl;
if (ip_address.getPortNumber() != 0)
{
std::cout<<"Port Number: "<<ip_address.getPortNumber()<<std::endl;
}
}
The header file of the class, IpAddress.h
#pragma once
#ifndef __IpAddress_H__
#define __IpAddress_H__
#include <string>
class IpAddress
{
public:
enum Type
{
Unknown,
IpV4,
IpV6
};
~IpAddress(void);
/**
* \brief Gets the host address part of the IP address.
* \author Abi
* \date 02/06/2010
* \return The host address part of the IP address.
**/
const std::string& getHostAddress() const;
/**
* \brief Gets the port number part of the address if any.
* \author Abi
* \date 02/06/2010
* \return The port number.
**/
unsigned short getPortNumber() const;
/**
* \brief Gets the type of the IP address.
* \author Abi
* \date 02/06/2010
* \return The type.
**/
IpAddress::Type getType() const;
/**
* \fn static IpAddress Parse(const std::string& ip_address_str)
*
* \brief Parses a given string to an IP address.
* \author Abi
* \date 02/06/2010
* \param ip_address_str The ip address string to be parsed.
* \return Returns the parsed IP address. If the IP address is
* invalid then the IpAddress instance returned will have its
* type set to IpAddress::Unknown
**/
static IpAddress Parse(const std::string& ip_address_str);
/**
* \brief Converts the given type to string.
* \author Abi
* \date 02/06/2010
* \param address_type Type of the address to be converted to string.
* \return String form of the given address type.
**/
static std::string TypeToString(IpAddress::Type address_type);
private:
IpAddress(void);
Type m_type;
std::string m_hostAddress;
unsigned short m_portNumber;
};
#endif // __IpAddress_H__
The source file for the class, IpAddress.cpp
#include "IpAddress.h"
#include <cstdlib> // atoi
#include <boost/xpressive/xpressive.hpp>
namespace bxp = boost::xpressive;
static const std::string RegExIpV4_IpFormatHost = "^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$";
static const std::string RegExIpV4_StringHost = "^[A-Za-z0-9]+(\\:[0-9]+)?$";
IpAddress::IpAddress(void)
:m_type(Unknown)
,m_portNumber(0)
{
}
IpAddress::~IpAddress(void)
{
}
IpAddress IpAddress::Parse( const std::string& ip_address_str )
{
IpAddress ipaddress;
bxp::sregex ip_regex = bxp::sregex::compile(RegExIpV4_IpFormatHost);
bxp::sregex str_regex = bxp::sregex::compile(RegExIpV4_StringHost);
bxp::smatch match;
if (bxp::regex_match(ip_address_str, match, ip_regex) || bxp::regex_match(ip_address_str, match, str_regex))
{
ipaddress.m_type = IpV4;
// Anything before the last ':' (if any) is the host address
std::string::size_type colon_index = ip_address_str.find_last_of(':');
if (std::string::npos == colon_index)
{
ipaddress.m_portNumber = 0;
ipaddress.m_hostAddress = ip_address_str;
}else{
ipaddress.m_hostAddress = ip_address_str.substr(0, colon_index);
ipaddress.m_portNumber = atoi(ip_address_str.substr(colon_index+1).c_str());
}
}
return ipaddress;
}
std::string IpAddress::TypeToString( Type address_type )
{
std::string result = "Unknown";
switch(address_type)
{
case IpV4:
result = "IP Address Version 4";
break;
case IpV6:
result = "IP Address Version 6";
break;
}
return result;
}
const std::string& IpAddress::getHostAddress() const
{
return m_hostAddress;
}
unsigned short IpAddress::getPortNumber() const
{
return m_portNumber;
}
IpAddress::Type IpAddress::getType() const
{
return m_type;
}
I have only set the rules for IPv4 because I don't know the proper format for IPv6, but I'm pretty sure it's not hard to implement. Boost.Xpressive is a purely template-based (header-only) solution and hence does not require any .lib files to be linked into your exe, which I believe is a plus.
By the way just to break down the format of regex in a nutshell...
^ = start of string
$ = end of string
[] = a set of characters, any one of which can appear at that position
[0-9] = any single-digit between 0 and 9
[0-9]+ = one or more digits between 0 and 9
The '.' has a special meaning in regex (it matches any character), but since our format has literal dots in the IP address, we need to specify that we want a '.' between the digits by using '\.'. And since a C++ string literal needs an escape sequence for '\', we have to write "\\."
? = optional component
So, in short, "^[0-9]+$" is a regex that matches an integer.
"^[0-9]+\.$" means an integer that ends with a '.'
"^[0-9]+\.[0-9]?$" is an integer that ends with a '.', optionally followed by a single digit.
For an integer or a real number, the regex would be "^[0-9]+(\.[0-9]*)?$".
The regex for an integer of 2 or 3 digits is "^[0-9]{2,3}$".
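The small patterns above can be checked with a few lines of code. This is only a sketch: it uses std::regex from the standard library instead of Boost.Xpressive (an assumption made to keep the example free of dependencies; these particular patterns are written the same way in both), and matches is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
// std::regex_match only succeeds on a full match, so the ^ and $ anchors
// are redundant here, but they are harmless and kept to mirror the text.
bool matches(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

With this helper, matches("123", "^[0-9]+$") is true, while matches("1234", "^[0-9]{2,3}$") is false because the input has four digits.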
Now to break down the format of the ip address:
"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$"
This is equivalent to "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(\:[0-9]{1,5})?$" once the C++ escapes are removed, which means:
[start of string][1-3 digits].[1-3 digits].[1-3 digits].[1 or more digits]<:[1-5 digits]>[end of string]
where [] are mandatory and <> are optional.
The second regex is simpler than this. It's just an alphanumeric value followed by an optional colon and port number.
By the way, if you would like to test out RegEx you can use this site.
Edit: I failed to notice that the port can optionally be a service name such as http instead of a number. For that you can change the expression to:
"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:([0-9]{1,5}|http|ftp|smtp))?$"
This accepts formats like:
127.0.0.1
127.0.0.1:3282
127.0.0.1:http
217.0.0.1:ftp
18.123.2.1:smtp
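As a rough illustration, the extended expression can also do the splitting itself if the host and the port are put in capture groups. The sketch below is not the class from this answer: it uses std::regex rather than Boost.Xpressive, and split_ipv4 is a hypothetical name. The list of service names is the same fixed list as in the pattern above, so anything else (for example ssh) is rejected.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: splits "a.b.c.d[:port-or-service]" into {host, port}.
// Returns two empty strings when the input does not match the pattern.
std::pair<std::string, std::string> split_ipv4(const std::string& input) {
    // Group 1 is the dotted host; group 2 is the optional port or service.
    static const std::regex pattern(
        "^([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+)"
        "(?::([0-9]{1,5}|http|ftp|smtp))?$");
    std::smatch match;
    if (!std::regex_match(input, match, pattern)) {
        return {};
    }
    // An unmatched optional group yields an empty string.
    return {match[1].str(), match[2].str()};
}
```

For "217.0.0.1:ftp" this gives the host "217.0.0.1" and the port "ftp"; for "127.0.0.1" the port part is empty.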
I'm late to the party, but I was googling for just how to do this. Spirit and C++ have grown up a lot, so let me add a 2021 take:
Live On Compiler Explorer
#include <fmt/ranges.h>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <string>
#include <string_view>
#include <tuple>
auto parse_server_address(std::string_view address_spec,
std::string_view default_service = "https")
{
using namespace boost::spirit::x3;
auto service = ':' >> +~char_(":") >> eoi;
auto host = '[' >> *~char_(']') >> ']' // e.g. for IPV6
| raw[*("::" | (char_ - service))];
std::tuple<std::string, std::string> result;
parse(begin(address_spec), end(address_spec),
expect[host >> (service | attr(default_service))], result);
return result;
}
int main() {
for (auto input : {
"localhost",
"localhost:11211",
"127.0.0.1:http",
"[::1]:11211",
"::1", "[::1]",
"::1:80", // Invalid: Is this the IPv6 address ::1:80 and a default
// port, or the IPv6 address ::1 and the port 80 ?
"::1:http", // This is not ambigous, but for simplicity sake, let's
// consider this is forbidden as well.
})
{
// auto [host, svc] = parse_server_address(input);
fmt::print("'{}' -> {}\n", input, parse_server_address(input));
}
}
Printing
'localhost' -> ("localhost", "https")
'localhost:11211' -> ("localhost", "11211")
'127.0.0.1:http' -> ("127.0.0.1", "http")
'[::1]:11211' -> ("::1", "11211")
'::1' -> ("::1", "https")
'[::1]' -> ("::1", "https")
'::1:80' -> ("::1", "80")
'::1:http' -> ("::1", "http")
BONUS
Validating/resolving the addresses. The parsing is 100% unchanged, just using Asio to resolve the results, also validating them:
#include <boost/asio.hpp>
#include <iostream>
#include <iomanip>
using boost::asio::ip::tcp;
using boost::asio::system_executor;
using boost::system::error_code;
int main() {
tcp::resolver r(system_executor{});
error_code ec;
for (auto input : {
"localhost",
"localhost:11211",
"127.0.0.1:http",
"[::1]:11211",
"::1", "[::1]",
"::1:80", // Invalid: Is this the IPv6 address ::1:80 and a default
// port, or the IPv6 address ::1 and the port 80 ?
"::1:http", // This is not ambigous, but for simplicity sake, let's
// consider this is forbidden as well.
"stackexchange.com",
"unknown-host.xyz",
})
{
auto [host, svc] = parse_server_address(input);
for (auto&& endpoint : r.resolve({host, svc}, ec)) {
std::cout << input << " -> " << endpoint.endpoint() << "\n";
}
if (ec.failed()) {
std::cout << input << " -> unresolved: " << ec.message() << "\n";
}
}
}
Prints (with limited network access, Live On Wandbox and Coliru: http://coliru.stacked-crooked.com/a/497d8091b40c9f2d)
localhost -> 127.0.0.1:443
localhost:11211 -> 127.0.0.1:11211
127.0.0.1:http -> 127.0.0.1:80
[::1]:11211 -> [::1]:11211
::1 -> [::1]:443
[::1] -> [::1]:443
::1:80 -> [::1]:80
::1:http -> [::1]:80
stackexchange.com -> 151.101.129.69:443
stackexchange.com -> 151.101.1.69:443
stackexchange.com -> 151.101.65.69:443
stackexchange.com -> 151.101.193.69:443
unknown-host.xyz -> unresolved: Host not found (authoritative)
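// Needs <algorithm> and <string>.
std::string host, port;
std::string example("[::1]:22");
if (example[0] == '[')
{
    // Bracket-enclosed host: take what lies between '[' and ']'.
    std::string::iterator splitEnd =
        std::find(example.begin() + 1, example.end(), ']');
    host.assign(example.begin() + 1, splitEnd);
    if (splitEnd != example.end()) splitEnd++;
    if (splitEnd != example.end() && *splitEnd == ':')
        port.assign(splitEnd + 1, example.end()); // skip the ':'
}
else if (std::count(example.begin(), example.end(), ':') > 1)
{
    // Several ':' and no brackets: a bare IPv6 address, so no port.
    host = example;
}
else
{
    // base() points just past the last ':', or at begin() if there is none.
    std::string::iterator splitPoint =
        std::find(example.rbegin(), example.rend(), ':').base();
    if (splitPoint == example.begin())
        host = example;
    else
    {
        host.assign(example.begin(), splitPoint - 1); // drop the ':'
        port.assign(splitPoint, example.end());
    }
}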
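The non-bracketed branch depends on how reverse_iterator::base() behaves: for a reverse iterator that refers to the last ':', base() refers to the character just after it, and when nothing is found it equals begin(). A minimal sketch of that behaviour, with after_last_colon as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the character that follows the last ':' in
// `text`, or 0 when `text` contains no ':' at all.
std::size_t after_last_colon(const std::string& text) {
    // Searching from the back finds the last ':'; base() converts the
    // reverse iterator into a forward iterator one position past the match.
    std::string::const_iterator it =
        std::find(text.rbegin(), text.rend(), ':').base();
    return static_cast<std::size_t>(it - text.begin());
}
```

For "localhost:11211" the result is 10, so the ':' itself sits at index 9: the host ends before index 9 and the port starts at index 10.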
As mentioned, Boost.Spirit.Qi could handle this.
As mentioned, it's overkill (really).
const std::string line = /**/;
if (line.empty()) return;
std::string host, port;
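if (line[0] == '[') // IP V6 detected
{
    const size_t pos = line.find(']');
    if (pos == std::string::npos) return; // Error handling ?
    host = line.substr(1, pos-1);
    // Only read a port if ":port" really follows the ']'; substr(pos+2)
    // would throw std::out_of_range for an input such as "[::1]".
    if (pos + 1 < line.size() && line[pos+1] == ':')
        port = line.substr(pos+2);
}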
else if (std::count(line.begin(), line.end(), ':') > 1) // IP V6 without port
{
host = line;
}
else // IP V4
{
const size_t pos = line.find(':');
host = line.substr(0, pos);
if (pos != std::string::npos)
port = line.substr(pos+1);
}
I really don't think this warrants a parsing library; it might not gain in readability because of the overloaded use of :.
Now, my solution is certainly not flawless; one could, for example, wonder about its efficiency. But I really think it's sufficient, and at least you won't lose the next maintainer, because from experience Qi expressions can be anything but clear!
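For reuse, the same branching could be wrapped in a function that also reports malformed input. The following is only a sketch under a few assumptions: it needs C++17 for std::optional, the names HostPort and split_host_port are made up for illustration, and, like the code above, it takes an unbracketed string with several colons (including ::1:80) whole as the host instead of rejecting it.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type: port is empty when no port was given.
struct HostPort {
    std::string host;
    std::string port;
};

// Hypothetical helper: returns std::nullopt for malformed input such as
// "[::1" or "[::1]x", otherwise the host and the (possibly empty) port.
std::optional<HostPort> split_host_port(const std::string& line) {
    if (line.empty()) return std::nullopt;
    if (line[0] == '[') {  // bracket-enclosed host, e.g. "[::1]:22"
        const std::size_t pos = line.find(']');
        if (pos == std::string::npos) return std::nullopt;
        HostPort result{line.substr(1, pos - 1), ""};
        if (pos + 1 == line.size()) return result;      // nothing after ']'
        if (line[pos + 1] != ':') return std::nullopt;  // junk after ']'
        result.port = line.substr(pos + 2);
        return result;
    }
    if (std::count(line.begin(), line.end(), ':') > 1) {
        return HostPort{line, ""};  // bare IPv6, taken whole as the host
    }
    const std::size_t pos = line.find(':');
    if (pos == std::string::npos) return HostPort{line, ""};
    return HostPort{line.substr(0, pos), line.substr(pos + 1)};
}
```

A caller can then test the returned optional first and check port.empty() to find out whether a port was present.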
#pragma once
#ifndef ENDPOINT_HPP
#define ENDPOINT_HPP
#include <string>
using std::string;
struct Endpoint {
string
Host,
Port;
enum : char {
V4,
V6
} Type = V4;
Endpoint(const string& text) { // defined in-class, so implicitly inline
bind(text);
}
private:
void bind(const string& text) { // no compiler-specific calling convention
if (text.empty())
return;
auto host { text };
string::size_type bias = 0;
constexpr auto NONE = string::npos;
while (true) {
bias = host.find_first_of(" \n\r\t", bias);
if (bias == NONE)
break;
host.erase(bias, 1);
}
if (host.empty())
return;
auto port { host };
bias = host.find(']');
if (bias != NONE) {
host.erase(bias);
const auto skip = host.find('['); // search the whitespace-stripped copy, not text
if (skip == NONE)
return;
host.erase(0, skip + 1);
Type = V6;
++bias;
}
else {
bias = host.find(':');
if (bias == NONE)
port.clear();
else {
const auto next = bias + 1;
if (host.length() == next)
return;
if (host[next] == ':') {
port.clear();
Type = V6;
}
else if (! bias)
host.clear();
else
host.erase(bias);
}
}
if (! port.empty())
Port = port.erase(0, bias + 1);
if (! host.empty())
Host = host;
}
};
#endif // ENDPOINT_HPP
If you are getting the host and port as a string (or, in C++, an array of characters), you can get the length of the string, loop to the end of it looking for a single colon by itself, and then split the string into two parts at that location.
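// s is the std::string holding "host[:port]"
std::string::size_type splitpoint = std::string::npos; // npos: no port found
for (std::string::size_type i = 1; i < s.length(); i++) {
    // A colon "by itself" has no ':' directly before or after it.
    const bool colon_after = i + 1 < s.length() && s[i+1] == ':';
    if (s[i] == ':' && s[i-1] != ':' && !colon_after) {
        splitpoint = i;
    }
}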
Just a suggestion. I'm sure there is a more efficient way, but I hope this helps. Gale