What's the easiest way to parse a string in C?

2022-12-29 23:54 问答作者：

I have to parse this string in C:

XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n

And be able to get the 207.46.106.118 part and 1863 part (the first ip address).

I know I could go char by char and eventually find my way through it, but what's the easiest way to get this information, given tha开发者_运维问答t the IP address in the string could change to a different format (with less digits)?

You can use sscanf() from the C standard lib. Here's an example of how to get the ip and port as strings, assuming the part in front of the address is constant:

#include <stdio.h>

int main(void)
{
    const char *input = "XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n";

    const char *format = "XFR 3 NS %15[0-9.]:%5[0-9]";
    char ip[16] = { 0 };  // ip4 addresses have max len 15
    char port[6] = { 0 }; // port numbers are 16bit, ie 5 digits max

    if(sscanf(input, format, ip, port) != 2)
        puts("parsing failed");
    else printf("ip = %s\nport = %s\n", ip, port);

    return 0;
}

The important parts of the format strings are the scanset patterns %15[0-9.] and %5[0-9], which will match a string of at most 15 characters composed of digits or dots (ie ip addresses won't be checked for well-formedness) and a string of at most 5 digits respectively (which means invalid port numbers above 2^16 - 1 will slip through).

Depends on what defines the format of the document. In this case, it may be as simple as tokenizing the string and looking through the tokens for what you want. Simply use strtok and split on spaces to grab the 207.46.106.118:1863 and then you can tokenize that again (or simply scan for the : manually) to get the proper components.

You could use strtok to tokenize breaking on space, or you could use one of the scanf family to pull out data as well.

There is a big caveat in all of this though, these are functions that are notorious for security and mishandling bad input. YMMV.

Loop through until you get the first '.', and loop back until you find ' '. The loop forward until you find ':', building sub-strings every time you meet '.' or ':'. You can check the number of substrings and their lengths as simple error checking. Then loop until you find a ' ' and you have the 1863 part.

This would be robust if the beginning of the string doesn't vary much. And also very easy. You could make it even simpler if the string always begins with "XFR 3 NS ".

In this case, strok() is of trivial use and would be my choice. For safety, you might count the ':' in your string and proceed if there is exactly one ':'.

If the strings to be parsed are well-formatted then I'd go with Daniel and Ukko's suggestion to use strtok().

A word of warning though: strtok() modifies the string that it parses. Not always what you want.

This may be overkill, since you said you didn't want to use a regex library, but the re2c program will give you regex parsing without the library: it generates the DFSM for a regular expression as C code. The regexps are specified in comments embedded in C code.

And what seems like overkill now may become a comfort to you later should you have to parse the rest of the string; it is a lot easier to modify a few regexps to adjust or add new syntax than to modify a bunch of ad hoc tokenizing code. And it makes the structure of what you are parsing a lot clearer in your code.

继续阅读：c parsing

What's the easiest way to parse a string in C?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？