get started with regular expression [closed]

2023-01-26 07:22 问答作者：

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Closed 9 years ago.

Improve this question

I’m always afraid whenever I see any regular expression. I think it very difficult to understand. But fear is not the solution. I’ve d开发者_如何学JAVAecided to start learning regex, so can someone advise me how I can just start? And if there’s is any easy tutorial?

☝ Getting Started with /Regexes/

Regular expressions are a form of declarative programming. If you are used to imperative, functional, or object-oriented programming, then they are a very different way of thinking. It’s a rules-based approach with subtle backtracking issues. I daresay a background in Prolog might actually do you some good with these, which certainly isn’t something I commonly advise.

Normally I would just have people play around with the grep command from their shell, then advance to using regexes for searching and replacing in their editor.

But I’m guessing you aren’t coming from a Unix background, because if you were, you would have come across regexes all over, from the very most basic grep command to pattern-matching in the vi or emacs editors. You can look at the grep manpage by typing

% man grep

on your BSD, Linux, Apple, or Sun systems — just to name a few.

☹ ¡ʇɟoƨoɹɔᴉƜ ʇnoqɐ əɯ ʞƨɐ ʇ ̦uop əƨɐəld ʇƨnɾ ☹

☟ (?: Book Learnin’? )

If you ran into regular expresions at school or university, it was probably in the context of automata theory. They come up when discussing regular languages. If you have suffered through such classes, you may remember that regular expressions are the user-friendly face to messy finite automata. What they probably did not teach you, however, is that outside of the ivory tower, the regular expressions people actually use to in the real world are far, far behind "regular" in the rarefied, theoretical, and highly irregular sense of that otherwise commonplace word. This means that the modern regular expressions — call them patterns if you prefer — can do much more than the traditional regular expressions taught in computer science classes. There just isn’t any REGULAR left in modern regular expressions outside the classroom, but this is a good thing.

I say “modern”, but in fact regular expressions haven’t been regular since Ken Thompson first put back references into his backtracking NFA, back when he was famousluy proving NFA–DFA equivalence. So unless you actually are using a DFA engine, it might be best to just forget any book-learnin’ nonsense about REGULARness of regexes. It just doesn’t apply to the way we really use them every day in the real world.

Modern regular expressions allow for much more than just back references though, as you will find once you delve into them. They’re their own wonderful world, even if that world is a bit surreal at times. They can let you substitute for pages and pages of code in just one line. They can also make you lose hair over their crazy behavior. Sometimes they make your computer seem like it’s hung, because it’s actually working very hard in a race between it and the heat-death of the universe in some awful O(2ⁿ) algorithm, or even worse. It can easily be much worse, actually. That’s what having this sort of power in your hands can do. There are no training wheel or slow lane. Regexes are a power tool par excellence.

/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/

⁠ ⁠ ⁠

Just one more thing before I give you a big list of helpful references. As I’ve already said today elsewhere, regexes do not have to be ugly, and they do not have to be hard. REMEMBER: If you create ugly regexes, it is only a reflection on you, not on them.

That’s absolutely no excuse for creating regexes that are hard to read. Oh, there’s plenty like that out there all right, but they shouldn’t be and they needn’t be. Even though regexes are (for the most part( a form of declarative programming, all the software engineering techniques that one uses in other forms of programming ̲s̲t̲i̲l̲l̲ ̲a̲p̲p̲l̲y̲ ̲h̲e̲r̲e̲!

A regex should never look like a dense row of punctuation that’s impossible to decipher. Any language would be a disaster if you removed all the alphabetical identifiers, removed all whitespace and indentation, removed all comments, and removed every last trace of top-down programming. So of course they look like cr@p if you do that. Don’t do that!

So use all of those basic tools, including aesthetically pleasing code layout, careful problem decomposition, named subroutines, decoupling the declaration from the execution (including ordering!), unit testing, plus all the rest, whenever you’re creating regexes. These are all critical steps in making your patterns maintainable.

It’s one thing to write /(.)\1/, but quite another to write something like mǁ☕⅋⚣⁑™∞¶⌘℈‽#♬❧ǁ. Those are regexes from the Dark Ages: don’t just reject them: burn them at the stake! It’s programming, after all, not line-noise or golf!

☞ Regex References

The Wikipedia page on regular expressions is a decent enough overview.
IBM has a nice introduction to regexes in their Speaking Unix series.
Russ Cox has a very nice list of classic regular expressions references. You might want to check out the original Version 8 regular expressions, here found in a Perl manpage, but these were the original, most basic patterns that everybody grew up with back in olden days.
Mastering Regular Expressions from O’Reilly, by Jeffrey Friedl.
Jan Goyvaerts’s regular-expressions.info site and his Regular Expression Cookbook, also from O’Reilly.
I’m a native speaker of Perl, so let me say four words about it. Chapter 5 of the Perl Cookbook and Chapter 6 of Programming Perl, both somewhat embarrassingly by yours truly et alios, also from O’Reilly, are devoted to regular expressions in Perl. Perl was the language that originated most regex features found in modern regular expressions, and it continues to lead the pack. Perl’s Unicode support for regexes is especially rich and remarkably simple to use — in comparison with other languages’. You can download all the code examples from those two books from the O’Reilly site, or see the next item. The perldoc.org site has quite a bit on pattern matching, including the perlre and perluniprops manpages, just to take a couple of starting points.
Apropos the Perl Cookbook, the PLEAC project has reïmplemented the Perl Cookbook code in a dizzying number of diverse languages, including ada, common lisp, groovy, guile, haskell, java, merd, ocaml, php, pike, python, rexx, ruby, and tcl. If you look at what each language does for their equivalent of PCB’s regex chapter, you will learn a tremendously huge amount about how that language deals with regular expressions. It’s a marvellous resource and quite an eye-opener, even if some up the solutions are, um, supoptimal.
Java Regular Expressions by Mehran Habibi from Apress. It’s certainly better than trying to figure anything out by reading Sun’s documentation on the Pattern class. Java is probably the worst possible language for learning regexes in; it is very clumsy and often completely stupid. I speak from painful personal experience, not from ignorance, and I am hardly alone in this appraisal. If you have to use a JVM language, I recommend Groovy or perhaps Scala. Unfortunately, both are based on the standard Java pattern matching classes, so share their inadequacies.
If you need Unicode and you’re using Java or C⁺⁺ instead of Perl, then I recommend looking into the ICU library. They handle Unicode in Java much better than Sun does, but it still feels too much like assembler for my tastes. Perl and Java appear to have the best support for Unicode and multiple encodings. Java is still kinda warty, but other languages often have this even worse. Be warned that languages with regexes bolted on the site are always clumsier to use them in than those that don’t.
If you’re using C, then I would probably skip over the system-supplied regex library and jump right into PCRE by Phil Hazel. A bonus is that PCRE can be built to handle Unicode reasonably well. It is also the basic regex library used by several other languages and tools, including PHP.

regular-expressions.info is a gold-mine of information and tutorials about regular expressions. From beginner to expert, there's not much out there that is better than this site when it comes to the study of regular expressions.

regular-expressions.info has a good tutorial here

http://www.regular-expressions.info/tutorial.html

Regular expressions in itself might not achieve any utility, unless combined in with either a text manipulation operations using some kind of scripting tool(sed/awk) or a programming language like Perl or so. Try to install Regex Buddy. Nice standalone tool which can let you use regular expressions, on some files you may point it to.

So yes you can learn about some basic info mentioning their structure, syntax, semantics, if I may call so, but try to read the regular expressions tutorials in - Perl, Vim,... and do some example string/text manipulation in those contexts, programatically

-AD.

While learning at: regular-expressions.info, the Regular Expressions Cheat Sheet (V2) is something you definitely want to have.

http://www.gskinner.com/RegExr/ exists both as an online version and as an AIR application.

The cool thing about this app (besides that it work like a charm) is that you can save your expressions or share them with the community right from the app.

Say you need an e-mail regex you can just search for e-mail and you will get back a rated list of expressions.

Another helpful feature is the interpretation of your expressions into human readable form. This makes it easier to learn and master.

For the tutorial part this article is very easy consume.

This book saved my ass when I was starting out with awk and sed.

继续阅读：regex

get started with regular expression [closed]

☝ Getting Started with /Regexes/

☟ (?: Book Learnin’? )

/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/

☞ Regex References

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

☝ Getting Started with /Regexes/

☟ (?: Book Learnin’? )

/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/

☞ Regex References

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？