TCL regexp example

2023-03-16 11:28

I want to get a word in a string which starts with abc_ or xyz_ by writing a regexp. Here is my script:

[regexp -nocase -- {.*\s+(abc_|xyz_\S+)\s+.*} $str all necessaryStr]

Applying this regexp to str1 and str2, I want to get "xyz_hello" from $str1 and "abc_bye" from $str2.

set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

But my regexp does not work. My questions are:

1) What is wrong with my regexp?
2) Is it good practice to find words starting with predefined prefixes using regexp, or is it better to use string functions (string match or similar)?

It is not clear from your question what constitutes a word. Are further underscores permitted? Are digits permitted? What about "words" that consist of just the prefix, e.g. "abc_" or "xyz_"?

Making the conservative assumptions (based on your examples) that you are expecting only letters from the English alphabet, at least one further character, and you don't care about case, you can simplify your regexp:

[regexp -nocase -- {\m(abc_|xyz_)[a-zA-Z]+} $str match]

This will set match to the matching word. You can replace the contents of the square brackets if your definition of a word differs from my assumptions.
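As a quick check, here is a minimal sketch that applies that pattern to the two strings from the question. The proc name findPrefixed is hypothetical and only wraps the regexp call for illustration.

```tcl
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

# Hypothetical helper: return the first word starting with abc_ or xyz_
# followed by at least one letter, or an empty string if nothing matches.
proc findPrefixed {str} {
    # regexp returns 1 on a match and stores the matched text in "match"
    if {[regexp -nocase -- {\m(abc_|xyz_)[a-zA-Z]+} $str match]} {
        return $match
    }
    return ""
}

puts [findPrefixed $str1]  ;# xyz_hello
puts [findPrefixed $str2]  ;# abc_bye
```

Because -nocase is given, [a-zA-Z] could also be written as [a-z] without changing the result.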

Your second question about whether to prefer regexp to string functions will depend upon context, and could lead into subjective territory.

Some things to consider:

Does performance really matter? Unless you are doing the search in a tight loop, or searching very long strings, I suspect any performance difference will not be relevant. Wait until you have a performance issue, then profile your application to see where the bottleneck is, then you can test alternative implementations.
Convenience is going to depend upon the preference of the programmer(s) who have to write and maintain the code. Do they love/hate using regexps?
Using a regexp is likely to offer more flexibility, but it can be at the cost of readability.

My recommendation would be to use whichever you are most comfortable with. Write a good set of unit tests for your code, then optimise later only if you have identified a bottleneck there during profiling.
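For comparison, a string-function version might look like the sketch below. It assumes a "word" is anything delimited by whitespace (closer to \S+ than to letters only), and the proc name firstPrefixedWord is hypothetical.

```tcl
# Hypothetical helper: return the first whitespace-separated word that
# starts with abc_ or xyz_ (ignoring case) and has at least one more
# character after the prefix, or an empty string if there is none.
proc firstPrefixedWord {str} {
    foreach word [split $str] {
        # In glob patterns, ? matches exactly one character and * any run
        foreach pattern {abc_?* xyz_?*} {
            if {[string match -nocase $pattern $word]} {
                return $word
            }
        }
    }
    return ""
}

puts [firstPrefixedWord "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"]  ;# xyz_hello
puts [firstPrefixedWord "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"]  ;# abc_bye
```

This is longer than the single regexp call, but each step is explicit, which some maintainers may find easier to read.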

On the basis of what you've written, you seem to be looking for words beginning with abc_ or xyz_ (in any case) and having just letters after that. A good first attempt at matching this is:

regexp -nocase -- {\y(?:abc_|xyz_)[a-z]+} $str match

The special features of this are:

\y means this only matches at a word boundary: word start, and theoretically word end too, but we follow it by a letter in all cases!
(?:…) is grouping without capturing
Greedy matching means we'll get the whole word (assuming a word just means letters from the ASCII range of Unicode). Consider using \w or \S instead of [a-z], but these do change the semantics of what's matched (\w will give you roughly the characters usually allowed in program identifiers, and \S will give you non-spaces).
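To illustrate how the three character classes differ, here is a small sketch run against a made-up test string (the string and the variable names are assumptions for the example, not from the question).

```tcl
# Made-up input: the prefixed word is followed by a digit and a comma
set str "foo abc_bye2,now bar"

regexp -nocase -- {\y(?:abc_|xyz_)[a-z]+} $str lettersOnly
regexp -nocase -- {\y(?:abc_|xyz_)\w+} $str wordChars
regexp -nocase -- {\y(?:abc_|xyz_)\S+} $str nonSpaces

puts $lettersOnly  ;# abc_bye       (stops at the digit)
puts $wordChars    ;# abc_bye2      (letters, digits, underscore)
puts $nonSpaces    ;# abc_bye2,now  (everything up to the next space)
```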

I have fixed it: [regexp -nocase -- {.*\s+((abc_|xyz_)\S+)\s+.*} $str all necessaryStr ]
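The difference between the two patterns comes down to alternation precedence: in (abc_|xyz_\S+) the | separates the whole branches abc_ and xyz_\S+, so the abc_ branch has no \S+ after it and must be followed directly by whitespace. A minimal sketch comparing both patterns on str2 (the variable names original and fixed are only for illustration):

```tcl
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

set original {.*\s+(abc_|xyz_\S+)\s+.*}
set fixed    {.*\s+((abc_|xyz_)\S+)\s+.*}

# The original pattern needs whitespace right after "abc_", so it fails here
set originalMatched [regexp -nocase -- $original $str2 all necessaryStr]
puts $originalMatched  ;# 0

# The fixed pattern applies \S+ to both prefixes
set fixedMatched [regexp -nocase -- $fixed $str2 all necessaryStr]
puts $fixedMatched     ;# 1
puts $necessaryStr     ;# abc_bye
```

Note that both patterns require whitespace on each side of the word, so a matching word at the very start or end of the string would not be found.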

But I would still like to know whether regexp is the best solution or whether string functions are better (faster, more convenient, more flexible).
