TCL regexp example
I want to get a word in a string which starts with abc_ or with xyz_ by writing a regexp. Here is my script:
[regexp -nocase -- {.*\s+(abc_|xyz_\S+)\s+.*} $str all necessaryStr]
So if I apply the above regexp to str1 and str2, I want to get "xyz_hello" from $str1 and "abc_bye" from $str2:
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"
But my regexp does not work. My questions are:
1) What is wrong with my regexp? 2) Is it good to find words starting with some predefined prefixes with a regexp, or is it better to use string functions (string match or similar)?
It is not clear from your question what constitutes a word. Are further underscores permitted? Are digits permitted? What about "words" that consist of just the prefix, e.g. "abc_" or "xyz_"?
Making the conservative assumptions (based on your examples) that you are expecting only letters from the English alphabet, at least one further character after the prefix, and that you don't care about case, you can simplify your regexp:
[regexp -nocase -- {\m(abc_|xyz_)[a-zA-Z]+} $str match]
This will set match to the matching word (\m anchors the match at the start of a word). You can replace the contents of the square brackets if your definition of a word differs from my assumptions.
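A minimal sketch of that regexp applied to the two strings from the question, plus a string that contains only a bare prefix. The proc name findPrefixedWord is hypothetical, used only to make the check reusable; it assumes a reasonably recent Tcl (8.5 or later).

```tcl
# Hypothetical helper: return the first word starting with abc_ or xyz_
# (followed by at least one letter), or an empty string if there is none.
proc findPrefixedWord {str} {
    # \m = start of a word; [a-zA-Z]+ requires at least one letter after the prefix
    if {[regexp -nocase -- {\m(abc_|xyz_)[a-zA-Z]+} $str match]} {
        return $match
    }
    return ""
}

# Sample strings from the question
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

puts [findPrefixedWord $str1]              ;# prints xyz_hello
puts [findPrefixedWord $str2]              ;# prints abc_bye
puts "<[findPrefixedWord {only abc_ here}]>" ;# prints <> (bare prefix is not a word here)
```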
Your second question about whether to prefer regexp to string functions will depend upon context, and could lead into subjective territory.
Some things to consider:
- Does performance really matter? Unless you are doing the search in a tight loop, or searching very long strings, I suspect any performance difference will not be relevant. Wait until you have a performance issue, then profile your application to see where the bottleneck is, then you can test alternative implementations.
- Convenience is going to depend upon the preference of the programmer(s) who have to write and maintain the code. Do they love/hate using regexps?
- Using a regexp is likely to offer more flexibility, but it can be at the cost of readability.
My recommendation would be to use whichever you are most comfortable with. Write a good set of unit tests for your code, then optimise later only if you have identified a bottleneck there during profiling.
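To compare the two approaches on your own data, here is a rough sketch of a string-function version next to a regexp version, timed with Tcl's built-in time command. Both proc names are hypothetical, and splitting on whitespace with split is an assumption about what counts as a word; treat the timings only as a starting point for profiling, not as a verdict.

```tcl
# Hypothetical string-function version: split on whitespace and glob-match each token.
proc findPrefixedWordStr {str} {
    foreach token [split $str] {
        # Glob patterns: abc_* also accepts a bare "abc_" or "abc_123",
        # which the [a-z]+ regexp below would reject.
        if {[string match -nocase {abc_*} $token] || [string match -nocase {xyz_*} $token]} {
            return $token
        }
    }
    return ""
}

# Hypothetical regexp version for comparison.
proc findPrefixedWordRe {str} {
    if {[regexp -nocase -- {\y(?:abc_|xyz_)[a-z]+} $str match]} {
        return $match
    }
    return ""
}

set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"
puts [findPrefixedWordStr $str2]   ;# prints abc_bye
puts [findPrefixedWordRe $str2]    ;# prints abc_bye

# [time script count] reports the average microseconds per iteration.
puts "string match: [time {findPrefixedWordStr $str2} 10000]"
puts "regexp:       [time {findPrefixedWordRe $str2} 10000]"
```

Note that the two versions are not strictly equivalent: for "only abc_ here" the string version returns "abc_" while the regexp version returns an empty string, and a token such as "abc_bye," keeps its trailing comma in the string version.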
On the basis of what you've written, you seem to be looking for words beginning with abc_ or xyz_ (in any case) and having just letters after that. A good first attempt at matching this is this:
regexp -nocase -- {\y(?:abc_|xyz_)[a-z]+} $str match
The special features of this are:
- \y means this only matches at a word boundary (theoretically at a word end too, but we follow it by a letter in all cases, so in practice it only matches at a word start).
- (?:…) is grouping without capturing.
- Greedy matching means we'll get the whole word (assuming a word just means letters from the ASCII range of Unicode).
Consider using \w or \S instead of [a-z], but these do change the semantics of what's matched: \w will give you roughly the symbols usually allowed in program identifiers, and \S will give you any non-space characters.
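To make the difference between [a-z], \w and \S concrete, here is a small sketch on a made-up sample string (the string is an assumption, chosen to contain a digit, a trailing "!" and an extra underscore). It uses -all -inline so that every match is returned; because (?:…) captures nothing, the result list contains only whole matches.

```tcl
set sample "foo abc_bye2! xyz_hello_world bar"

# Letters only: stops at the digit and at the second underscore
puts [regexp -all -inline -nocase -- {\y(?:abc_|xyz_)[a-z]+} $sample]
# prints: abc_bye xyz_hello

# \w = letters, digits and underscore (identifier-like)
puts [regexp -all -inline -nocase -- {\y(?:abc_|xyz_)\w+} $sample]
# prints: abc_bye2 xyz_hello_world

# \S = anything that is not whitespace, so punctuation is included
puts [regexp -all -inline -nocase -- {\y(?:abc_|xyz_)\S+} $sample]
# prints: abc_bye2! xyz_hello_world
```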
I have fixed it: [regexp -nocase -- {.*\s+((abc_|xyz_)\S+)\s+.*} $str all necessaryStr ]
But I would still like to know whether a regexp is the best solution or whether string functions are better (faster, more convenient, more flexible).
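A quick check of the original and the fixed regexp on the two strings from the question. In the original, | has the lowest precedence, so the group means "abc_" OR "xyz_\S+"; a bare "abc_" must then be followed by whitespace, which is why str2 does not match at all. The sketch also shows a limitation that remains in the fixed version (an observation from testing this pattern, not something stated above): it needs whitespace on both sides of the word, so a word at the very start or end of the string is missed.

```tcl
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

# Original: parsed as (abc_) | (xyz_\S+)
set original {.*\s+(abc_|xyz_\S+)\s+.*}
# Fixed: the prefix alternation is grouped on its own, then \S+ applies to both
set fixed {.*\s+((abc_|xyz_)\S+)\s+.*}

foreach str [list $str1 $str2] {
    # regexp leaves the variables untouched on failure, so reset them first
    set necessaryStr ""
    set ok [regexp -nocase -- $original $str all necessaryStr]
    puts "original: $ok '$necessaryStr'"
    set necessaryStr ""
    set ok [regexp -nocase -- $fixed $str all necessaryStr]
    puts "fixed:    $ok '$necessaryStr'"
}
# prints:
# original: 1 'xyz_hello'
# fixed:    1 'xyz_hello'
# original: 0 ''
# fixed:    1 'abc_bye'

# No whitespace before the word, so the fixed pattern fails here
puts [regexp -nocase -- $fixed "abc_first word" all necessaryStr]   ;# prints 0
```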