get started with regular expression [closed]
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this questionI’m always afraid whenever I see any regular expression. I think it very difficult to understand. But fear is not the solution. I’ve d开发者_如何学JAVAecided to start learning regex, so can someone advise me how I can just start? And if there’s is any easy tutorial?
☝ Getting Started with /Regexes/
Regular expressions are a form of declarative programming. If you are used to imperative, functional, or object-oriented programming, then they are a very different way of thinking. It’s a rules-based approach with subtle backtracking issues. I daresay a background in Prolog might actually do you some good with these, which certainly isn’t something I commonly advise.
Normally I would just have people play around with the grep
command from their shell, then advance to using regexes for searching and replacing in their editor.
But I’m guessing you aren’t coming from a Unix background, because if you were, you would have come across regexes all over, from the very most basic grep
command to pattern-matching in the vi
or emacs
editors. You can look at the grep
manpage by typing
% man grep
on your BSD, Linux, Apple, or Sun systems — just to name a few.
☹ ¡ʇɟoƨoɹɔᴉƜ ʇnoqɐ əɯ ʞƨɐ ʇ ̦uop əƨɐəld ʇƨnɾ ☹
☟ (?: Book Learnin’? )
If you ran into regular expresions at school or university, it was probably in the context of automata theory. They come up when discussing regular languages. If you have suffered through such classes, you may remember that regular expressions are the user-friendly face to messy finite automata. What they probably did not teach you, however, is that outside of the ivory tower, the regular expressions people actually use to in the real world are far, far behind "regular" in the rarefied, theoretical, and highly irregular sense of that otherwise commonplace word. This means that the modern regular expressions — call them patterns if you prefer — can do much more than the traditional regular expressions taught in computer science classes. There just isn’t any REGULAR left in modern regular expressions outside the classroom, but this is a good thing.
I say “modern”, but in fact regular expressions haven’t been regular since Ken Thompson first put back references into his backtracking NFA, back when he was famousluy proving NFA–DFA equivalence. So unless you actually are using a DFA engine, it might be best to just forget any book-learnin’ nonsense about REGULARness of regexes. It just doesn’t apply to the way we really use them every day in the real world.
Modern regular expressions allow for much more than just back references though, as you will find once you delve into them. They’re their own wonderful world, even if that world is a bit surreal at times. They can let you substitute for pages and pages of code in just one line. They can also make you lose hair over their crazy behavior. Sometimes they make your computer seem like it’s hung, because it’s actually working very hard in a race between it and the heat-death of the universe in some awful O(2ⁿ) algorithm, or even worse. It can easily be much worse, actually. That’s what having this sort of power in your hands can do. There are no training wheel or slow lane. Regexes are a power tool par excellence.
/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/
Just one more thing before I give you a big list of helpful references. As I’ve already said today elsewhere, regexes do not have to be ugly, and they do not have to be hard. REMEMBER: If you create ugly regexes, it is only a reflection on you, not on them.
That’s absolutely no excuse for creating regexes that are hard to read. Oh, there’s plenty like that out there all right, but they shouldn’t be and they needn’t be. Even though regexes are (for the most part( a form of declarative programming, all the software engineering techniques that one uses in other forms of programming ̲s̲t̲i̲l̲l̲ ̲a̲p̲p̲l̲y̲ ̲h̲e̲r̲e̲!
A regex should never look like a dense row of punctuation that’s impossible to decipher. Any language would be a disaster if you removed all the alphabetical identifiers, removed all whitespace and indentation, removed all comments, and removed every last trace of top-down programming. So of course they look like cr@p if you do that. Don’t do that!
So use all of those basic tools, including aesthetically pleasing code layout, careful problem decomposition, named subroutines, decoupling the declaration from the execution (including ordering!), unit testing, plus all the rest, whenever you’re creating regexes. These are all critical steps in making your patterns maintainable.
It’s one thing to write /(.)\1/
, but quite another to write something like mǁ☕⅋⚣⁑™∞¶⌘℈‽#♬❧ǁ
. Those are regexes from the Dark Ages: don’t just reject them: burn them at the stake! It’s programming, after all, not line-noise or golf!
☞ Regex References
The Wikipedia page on regular expressions is a decent enough overview.
IBM has a nice introduction to regexes in their Speaking Unix series.
Russ Cox has a very nice list of classic regular expressions references. You might want to check out the original Version 8 regular expressions, here found in a Perl manpage, but these were the original, most basic patterns that everybody grew up with back in olden days.
Mastering Regular Expressions from O’Reilly, by Jeffrey Friedl.
Jan Goyvaerts’s regular-expressions.info site and his Regular Expression Cookbook, also from O’Reilly.
I’m a native speaker of Perl, so let me say four words about it. Chapter 5 of the Perl Cookbook and Chapter 6 of Programming Perl, both somewhat embarrassingly by yours truly et alios, also from O’Reilly, are devoted to regular expressions in Perl. Perl was the language that originated most regex features found in modern regular expressions, and it continues to lead the pack. Perl’s Unicode support for regexes is especially rich and remarkably simple to use — in comparison with other languages’. You can download all the code examples from those two books from the O’Reilly site, or see the next item. The perldoc.org site has quite a bit on pattern matching, including the perlre and perluniprops manpages, just to take a couple of starting points.
Apropos the Perl Cookbook, the PLEAC project has reïmplemented the Perl Cookbook code in a dizzying number of diverse languages, including ada, common lisp, groovy, guile, haskell, java, merd, ocaml, php, pike, python, rexx, ruby, and tcl. If you look at what each language does for their equivalent of PCB’s regex chapter, you will learn a tremendously huge amount about how that language deals with regular expressions. It’s a marvellous resource and quite an eye-opener, even if some up the solutions are, um, supoptimal.
Java Regular Expressions by Mehran Habibi from Apress. It’s certainly better than trying to figure anything out by reading Sun’s documentation on the Pattern class. Java is probably the worst possible language for learning regexes in; it is very clumsy and often completely stupid. I speak from painful personal experience, not from ignorance, and I am hardly alone in this appraisal. If you have to use a JVM language, I recommend Groovy or perhaps Scala. Unfortunately, both are based on the standard Java pattern matching classes, so share their inadequacies.
If you need Unicode and you’re using Java or C⁺⁺ instead of Perl, then I recommend looking into the ICU library. They handle Unicode in Java much better than Sun does, but it still feels too much like assembler for my tastes. Perl and Java appear to have the best support for Unicode and multiple encodings. Java is still kinda warty, but other languages often have this even worse. Be warned that languages with regexes bolted on the site are always clumsier to use them in than those that don’t.
If you’re using C, then I would probably skip over the system-supplied regex library and jump right into PCRE by Phil Hazel. A bonus is that PCRE can be built to handle Unicode reasonably well. It is also the basic regex library used by several other languages and tools, including PHP.
regular-expressions.info is a gold-mine of information and tutorials about regular expressions. From beginner to expert, there's not much out there that is better than this site when it comes to the study of regular expressions.
regular-expressions.info has a good tutorial here
http://www.regular-expressions.info/tutorial.html
Regular expressions in itself might not achieve any utility, unless combined in with either a text manipulation operations using some kind of scripting tool(sed/awk) or a programming language like Perl or so. Try to install Regex Buddy. Nice standalone tool which can let you use regular expressions, on some files you may point it to.
So yes you can learn about some basic info mentioning their structure, syntax, semantics, if I may call so, but try to read the regular expressions tutorials in - Perl, Vim,... and do some example string/text manipulation in those contexts, programatically
-AD.
While learning at: regular-expressions.info, the Regular Expressions Cheat Sheet (V2) is something you definitely want to have.
http://www.gskinner.com/RegExr/ exists both as an online version and as an AIR application.
The cool thing about this app (besides that it work like a charm) is that you can save your expressions or share them with the community right from the app.
Say you need an e-mail regex you can just search for e-mail and you will get back a rated list of expressions.
Another helpful feature is the interpretation of your expressions into human readable form. This makes it easier to learn and master.
For the tutorial part this article is very easy consume.
This book saved my ass when I was starting out with awk and sed.
精彩评论