Good C string library [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 3 years ago.

The community reviewed whether to reopen this question 10 months ago and left it closed:

Original close reason(s) were not resolved

I recently got inspired to start a project I've been wanting to code for a while. I want to do it in C, because memory handling is key to this application. I have been searching for a good implementation of strings in C, since I know that writing one myself could lead to some messy buffer overflows, and I expect to be dealing with a fairly large number of strings.

I found this article which gives details on each, but they each seem like they have a good amount of cons going for them (don't get me wrong, this article is EXTREMELY helpful, but it still worries me that even if I were to choose one of those, I wouldn't be using the best I can get). I also don't know how up to date the article is, hence my current plea.

What I'm looking for is something that can hold a large number of characters and simplifies searching through the string. If it lets me tokenize the string in any way, even better. It should also have pretty good I/O performance. Printing and formatted printing are not a top priority. I know I shouldn't expect a library to do all the work for me, but I was wondering whether there is a well-documented string library out there that could save me some time and work.

Any help is greatly appreciated. Thanks in advance!

EDIT: I was asked about the license I prefer. Any sort of open source license will do, but preferably GPL (v2 or v3).

EDIT2: I found the Better String (bstring) library and it looks pretty good: good documentation, a small yet versatile set of functions, and easy to mix with C strings. Does anyone have any good or bad stories about it? The only downside I've read about is that it lacks Unicode support (again, I've only read about this and haven't run into it myself yet), but everything else seems pretty good.

EDIT3: Also, preferably it should be pure C.


It's an old question, I hope you have already found a useful one. In case you didn't, please check out the Simple Dynamic String library on github. I copy&paste the author's description here:

SDS is a string library for C designed to augment the limited libc string handling functionalities by adding heap allocated strings that are:

  • Simpler to use.
  • Binary safe.
  • Computationally more efficient.
  • But yet... Compatible with normal C string functions.

This is achieved using an alternative design in which instead of using a C structure to represent a string, we use a binary prefix that is stored before the actual pointer to the string that is returned by SDS to the user.

+--------+-------------------------------+-----------+
| Header | Binary safe C alike string... | Null term |
+--------+-------------------------------+-----------+
         |
         `-> Pointer returned to the user.

Because the metadata is stored as a prefix before the returned pointer, and because every SDS string implicitly adds a null terminator at the end regardless of its actual content, SDS strings work well together with C strings, and the user is free to use them interchangeably with functions that only read the string.
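
The layout described above can be illustrated with a minimal sketch. This is not the SDS source or its API: the names `pstr_hdr`, `pstr_new`, `pstr_len`, and `pstr_free` are hypothetical, and the header here holds only a length, which is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
typedef struct {
    size_t len;   /* bytes in use, not counting the terminator */
    char   buf[]; /* flexible array member: the string itself */
} pstr_hdr;

/* Copy `len` bytes from `init` into a new prefixed string.
   Returns a pointer to the characters (not to the header), or NULL. */
char *pstr_new(const void *init, size_t len)
{
    if (len > SIZE_MAX - sizeof(pstr_hdr) - 1) return NULL; /* size overflow */
    pstr_hdr *h = malloc(sizeof *h + len + 1);
    if (h == NULL) return NULL;
    h->len = len;
    if (len > 0) memcpy(h->buf, init, len);
    h->buf[len] = '\0'; /* always terminated, whatever the content */
    return h->buf;
}

/* O(1) length: step back from the user pointer to the header. */
size_t pstr_len(const char *s)
{
    const pstr_hdr *h = (const pstr_hdr *)(s - offsetof(pstr_hdr, buf));
    return h->len;
}

/* Free through the header address, since that is what malloc returned. */
void pstr_free(char *s)
{
    if (s != NULL) free(s - offsetof(pstr_hdr, buf));
}
```

A pointer returned by `pstr_new` can be passed to read-only functions such as `strlen` or `printf("%s", ...)`, while `pstr_len` reports the full stored length even when the data contains embedded zero bytes.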


I would suggest not using any library aside from malloc, free, strlen, memcpy, and snprintf. These functions give you all of the tools for powerful, safe, and efficient string processing in C. Just stay away from strcpy, strcat, strncpy, and strncat, all of which tend to lead to inefficiency and exploitable bugs.
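
As a rough illustration of that approach, here is a sketch of two helpers built only from `malloc`, `strlen`, `memcpy`, and `snprintf`. The function names `concat2` and `format_pair` are made up for this example, and the caller is assumed to `free` the results.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated concatenation of a and b, or NULL on failure.
   Lengths are measured once, so nothing rescans for the terminator. */
char *concat2(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    if (lb >= SIZE_MAX - la) return NULL; /* la + lb + 1 would overflow */
    char *out = malloc(la + lb + 1);
    if (out == NULL) return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* the +1 copies b's terminator too */
    return out;
}

/* Format "key=value" into a buffer sized by a first, measuring snprintf call. */
char *format_pair(const char *key, int value)
{
    int n = snprintf(NULL, 0, "%s=%d", key, value); /* length needed */
    if (n < 0) return NULL;
    char *out = malloc((size_t)n + 1);
    if (out == NULL) return NULL;
    snprintf(out, (size_t)n + 1, "%s=%d", key, value);
    return out;
}
```

Because every buffer is sized from a measured length before anything is written, there is no fixed-size array to overflow.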

Since you mentioned searching, whatever choice of library you make, strchr and strstr are almost certainly going to be what you want to use. strspn and strcspn can also be useful.
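
A sketch of how those standard functions might be combined for tokenizing and searching without modifying the input. The helper names `next_token` and `count_occurrences` are hypothetical and not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the next token at or after *cursor and store its
   length in *len. Advances *cursor past the token. Returns NULL when no
   tokens remain. The input string is never written to. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *p = *cursor + strspn(*cursor, delims); /* skip delimiters */
    if (*p == '\0') {
        *cursor = p;
        return NULL;
    }
    *len = strcspn(p, delims); /* run of non-delimiter characters */
    *cursor = p + *len;
    return p;
}

/* Count non-overlapping occurrences of needle in haystack using strstr. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t n = 0, step = strlen(needle);
    if (step == 0) return 0; /* avoid looping forever on an empty needle */
    for (const char *p = haystack; (p = strstr(p, needle)) != NULL; p += step)
        n++;
    return n;
}
```

Unlike `strtok`, this variant keeps its state in the caller's cursor and reports tokens as pointer plus length, so it also works on string literals and other read-only data.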


If you really want to get it right from the beginning, you should look at ICU, i.e. Unicode support, unless you are sure your strings will never hold anything but plain 7-bit ASCII. Searching, regular expressions, and tokenization are all in there.

Of course, going C++ would make things much easier, but even then my recommendation of ICU would stand.


Please check milkstrings.
Sample code :

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atoi */
/* The milkstrings header, which declares tXt and the txt* functions,
   must also be included. */

int main(int argc, char *argv[]) {
  tXt s = "123,456,789";
  s = txtReplace(s, "123", "321");             // replace 123 by 321
  int num = atoi(txtEat(&s, ','));             // pick the first number
  printf("num = %d s = %s \n", num, s);
  s = txtPrintf("%s,%d", s, num);              // printf into a new string
  printf("num = %d s = %s \n", num, s);
  s = txtConcat(s, "<-->", txtFlip(s), NULL);  // concatenate some strings
  num = txtPos(s, "987");                      // find position of substring
  printf("num = %d s = %s \n", num, s);
  if (txtAnyError()) {                         // check for errors
    printf("%s\n", txtLastError());
    return 1;
  }
  return 0;
}


I also found a need for an external C string library, as I find the <string.h> functions very inefficient, for example:

  • strcat() can be very expensive in performance, as it has to find the '\0' char each time you concatenate a string
  • strlen() is expensive, as again, it has to find the '\0' char instead of just reading a maintained length variable
  • The char array is of course not dynamic and can cause very dangerous bugs (a crash on segmentation fault can be the good scenario when you overflow your buffer)

The solution should be a library that provides not only functions but also a struct that wraps the string and stores important fields such as the length and the buffer size.
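
To make that concrete, here is a minimal sketch of such a struct with an append function. It is not taken from any of the libraries listed below: the names `strbuf`, `strbuf_append`, and `strbuf_free` are hypothetical, and the doubling growth policy starting at 16 bytes is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string. Start from a zeroed struct: strbuf sb = {0}; */
typedef struct {
    char  *data; /* NUL-terminated once anything has been appended */
    size_t len;  /* bytes in use, not counting the terminator */
    size_t cap;  /* bytes allocated, including room for the terminator */
} strbuf;

/* Append n bytes from src (src must be a valid pointer).
   Returns 0 on success, -1 on overflow or allocation failure. */
int strbuf_append(strbuf *sb, const char *src, size_t n)
{
    if (n >= SIZE_MAX - sb->len) return -1; /* len + n + 1 would overflow */
    size_t need = sb->len + n + 1;
    if (need > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2; /* geometric growth keeps appends amortized O(1) */
        }
        char *p = realloc(sb->data, cap); /* realloc(NULL, n) acts as malloc */
        if (p == NULL) return -1;         /* old buffer is still valid */
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, src, n); /* no scan for '\0': len is known */
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}

/* Release the buffer and reset the struct to its empty state. */
void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

Keeping `len` in the struct addresses the `strcat()` and `strlen()` costs mentioned above, and keeping `cap` lets the append function grow the buffer instead of writing past its end.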

I looked for such libraries over the web and found the following:

  1. GLib String library (should be best standard solution) - https://developer.gnome.org/glib/stable/glib-Strings.html
  2. http://locklessinc.com/articles/dynamic_cstrings/
  3. http://bstring.sourceforge.net/

Enjoy


I faced this problem recently: I needed to append to a string holding millions of characters. I ended up writing my own.

It is simply a C array of characters, encapsulated in a class that keeps track of array size and number of allocated bytes.

In the benchmark below, it is roughly 10 times faster than SDS and std::string. The code is at https://github.com/pedro-vicente/table-string

Benchmarks

For Visual Studio 2015, x86 debug build:

| API                   | Seconds |
| --------------------- | ------- |
| SDS                   | 19      |
| std::string           | 11      |
| std::string (reserve) | 9       |
| table_str_t           | 1       |

clock_gettime_t timer;
const size_t nbr = 1000 * 1000 * 10;
const char* s = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
size_t len = strlen(s);
timer.start();
table_str_t table(nbr *len);
for (size_t idx = 0; idx < nbr; ++idx)
{
  table.add(s, len);
}
timer.now("end table");
timer.stop();

EDIT: Maximum performance is achieved by allocating the whole string up front (the constructor's size parameter). If only a fraction of the total size is preallocated, performance drops. Example with 100 allocations:

std::string benchmark append string of size 33, 10000000 times
end str:        11.0 seconds    11.0 total
std::string reserve benchmark append string of size 33, 10000000 times
end str reserve:        10.0 seconds    10.0 total
table string benchmark with pre-allocation of 330000000 elements
end table:      1.0 seconds     1.0 total
table string benchmark with pre-allocation of ONLY 3300000 elements, allocation is MADE 100 times...patience...
end table:      9.0 seconds     9.0 total