Please explain some Javascript Regular Expressions

I'm learning Javascript via an online tutorial, but nowhere on that website or any other I googled for was the jumble of symbols explained that makes up a regular expression.

Check if all numbers: /^[0-9]+$/

Check if all letters: /^[a-zA-Z]+$/

And the hardest one:

Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/

What do all the slashes and dollar signs and brackets mean? Please explain.

(By the way, what languages are required to create a flexible website? I know a bit of Javascript and wanna learn jQuery and PHP. Anything else needed?)

Thanks.


There are already a number of good sites that explain regular expressions, so I'll just dive a bit into how each of the specific examples you gave translates.

Check if all numbers: ^ anchors the match to the start of the text; without it, a match could begin anywhere. [0-9] is a character class that matches any one of the digits 0 through 9. The + after the character class means "one or more". The closing $ anchors the match to the end of the text, so the match must run to the end of the input. Put together, the regular expression accepts only strings made up of one or more digits. The anchors are important: without them it would also match something like "foo123bar".
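To make the effect of the anchors concrete, here is a minimal sketch comparing the anchored pattern with an unanchored one. The helper names isAllDigits and containsDigits are made up for this illustration; they are not part of any library.

```javascript
// Hypothetical helper: true only when the whole string is one or more digits.
function isAllDigits(text) {
  return /^[0-9]+$/.test(text);
}

// Same character class without anchors: a match anywhere is enough.
function containsDigits(text) {
  return /[0-9]+/.test(text);
}

console.log(isAllDigits("123"));          // true
console.log(isAllDigits("foo123bar"));    // false
console.log(isAllDigits(""));             // false, + needs at least one digit
console.log(containsDigits("foo123bar")); // true
```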

Check if all letters: Pretty much the same as above, but with a different character class. In this example the character class [a-zA-Z] matches any lowercase or uppercase ASCII letter.

The last one isn't actually any more difficult than the other two, it's just longer. This answer is getting quite long, so I'll just explain the new symbols. A \w in a character class matches word characters (these are defined per regex implementation; in JavaScript they are a-z, A-Z, 0-9 and _). The backslash before the @ escapes it so that it is read as a literal @; JavaScript does not treat @ specially, so the escape is harmless but unnecessary. Inside the brackets of [\w-.+], the . and + are literal characters, so that class also accepts a period or a plus sign. Outside brackets, a period matches any single character (except a line terminator), which is what the lone . before the final character class does. The last part of the regex ({2,4}) is an interval expression: it matches a minimum of 2 and a maximum of 4 of the thing that precedes it.
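A short sketch of how the period behaves inside and outside a character class. The variable names and sample strings are only illustrative.

```javascript
// Inside brackets, "." and "+" are ordinary characters.
const literalDotOrPlus = /^[.+]+$/;
console.log(literalDotOrPlus.test("..+")); // true
console.log(literalDotOrPlus.test("abc")); // false

// Outside brackets, an unescaped "." matches any character
// except a line terminator.
const anyMiddle = /^a.c$/;
console.log(anyMiddle.test("a.c")); // true
console.log(anyMiddle.test("a%c")); // true

// Escaping the period makes it literal again.
const dotMiddle = /^a\.c$/;
console.log(dotMiddle.test("a.c")); // true
console.log(dotMiddle.test("a%c")); // false
```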

Hope you got something out of the above.


There is an awesome explanation of regular expressions at http://www.regular-expressions.info/ including notes on language and implementation specifics.


Let me explain:

Check if all numbers: /^[0-9]+$/

So, the first thing we see is the "/" at the beginning and the end. This is a delimiter, and only serves to mark the beginning and end of the regular expression.

Next, we have a "^", which means the beginning of the string. [0-9] means a digit from 0 to 9. + is a quantifier that modifies the term in front of it; in this case it means one or more of that term, so one or more digits from 0-9.

Finally, we end with "$", which is the opposite of "^" and means the end of the string. Put that all together and it makes sure that between the start and end of the string there is nothing but one or more digits from 0-9.

Check if all letters: /^[a-zA-Z]+$/

We notice this is very similar, but instead of checking for numbers 0-9, it checks for letters a-z (lowercase) and A-Z (uppercase).

And the hardest one:

Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/

"\w" matches a word character, meaning a letter, a digit or an underscore. Because the "-", "." and "+" sit inside the square brackets, they are taken literally, so the part before the @ can be any mix of word characters, hyphens, periods and plus signs.

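The new thing here is escape characters. Symbols that have a special meaning must be escaped by placing a backslash in front of them to be matched literally. That is the idea behind "\@", which looks directly for the symbol "@" (in JavaScript "@" has no special meaning, so the backslash is optional here).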

Now it looks for letters, digits, periods and hyphens, and then a period. This one seems incorrect: the period should be escaped too. It will still work, since an unescaped period matches any character, but it also accepts characters other than a period. Numbers inside {} say how many times the previous term may repeat, so there should be 2-4 characters from [a-zA-z0-9] (this part is the website's top-level domain, such as com, ca, or info). Note there's another error here: [a-zA-z0-9] should be [a-zA-Z0-9] (capital Z).
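The capital-Z typo is more than cosmetic. The range A-z runs from code point 65 to 122, which also covers the six characters [ \ ] ^ _ and `. A quick sketch, with variable names chosen just for this example:

```javascript
// The range A-z spans code points 65 through 122,
// so it also accepts [ \ ] ^ _ and the backtick.
const sloppy = /^[a-zA-z]+$/;
const strict = /^[a-zA-Z]+$/;

for (const sample of ["com", "c_m", "c^m"]) {
  console.log(sample, sloppy.test(sample), strict.test(sample));
}
// com true true
// c_m true false
// c^m true false
```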

Oh, and check out that site listed above, it is a great set of tutorials too.


Regular expressions are a complex beast and, as already pointed out, there are quite a few guides you can find on Google and go read.

To answer the OP's questions:

Check if all numbers: /^[0-9]+$/

Regexps here are all delimited with //, much like strings are quoted with '' or "".

^ means start of string or line (depending on what options you have about multiline matching)

[...] are called character classes. Anything in [] is a list of single characters that may match at that position, in this case 0-9. The minus sign has the special meaning of "all characters between". So [0-9] means "one of 0123456789".

+ means "1 or more" of the preceding match (in this case [0-9]), so one or more digits

$ means end of string/line match.

So in summary: match any string that contains only digits, i.e. '0123a' will not match, as [0-9]+ fails to match the a before $.
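The "string or line" distinction mentioned for ^ and $ is controlled in JavaScript by the m (multiline) flag. A minimal sketch with a made-up two-line input:

```javascript
const text = "abc\n123";

// Without the m flag, ^ and $ only match at the very start
// and very end of the whole input.
console.log(/^[0-9]+$/.test(text));  // false

// With the m flag, ^ and $ also match next to line terminators,
// so the second line "123" satisfies the pattern on its own.
console.log(/^[0-9]+$/m.test(text)); // true
```

For validating a whole input value you would normally leave the m flag off, so that a single all-digit line cannot make a multi-line input pass.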

Check if all letters: /^[a-zA-Z]+$/

Hopefully [A-Za-z] makes sense now (A-Z = ABCDEF...XYZ and a-z = abcdef...xyz)

Validate Email: /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/

Not all regexp parsers know the \w sequence. JavaScript, Java and Perl do support it, as far as I know.

I have already covered /^ at the beginning. For this [] match we are looking for \w, -, . and +. I think that regexp is incorrect: either the minus sign should be escaped with \ or it should be at the end of the [] (i.e. [\w+.-]). But that is an aside; they are basically attempting to allow anything from a-z, A-Z, 0-9, _, -, . and +, so fred.smith-foo+wee@mymail.com will match but fred.smith%foo+wee@mymail.com won't (the % is not matched by [\w.+-]).
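Here is a rough sketch of why the hyphen position matters. JavaScript's legacy parsing happens to tolerate [\w-.+], but the same class is rejected once the u (unicode) flag is set, and a hyphen between two plain characters silently becomes a range. Variable names are illustrative only.

```javascript
// Hyphen last: unambiguous, it can only be a literal.
const safeClass = /^[\w.+-]+$/;
console.log(safeClass.test("fred.smith-foo+wee")); // true
console.log(safeClass.test("fred.smith%foo+wee")); // false

// A hyphen between two single characters forms a range instead.
// [+-.] is the range from "+" to ".", which also contains ",".
console.log(/^[+-.]$/.test(",")); // true

// The original class is accepted by legacy parsing,
// but the u flag turns it into a syntax error.
let rejected = false;
try {
  new RegExp("[\\w-.+]", "u");
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```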

\@ is the literal at sign (it is escaped because Perl expands @ as an array variable reference; JavaScript does not need the backslash)

[a-zA-Z0-9.-]+ is almost the same as [\w.-]+ (\w would also allow _). It is very much like the user part of the match, but does not match +. So this matches foo.com and google.co, but not my+foo.com or my***domain.co.

. means match any one character. This again is incorrect, as fred@foo%com will match, since . matches %*^%$£! etc. This should have been written as \.

The last character class [a-zA-z0-9]{2,4} looks for 2, 3 or 4 of the a-zA-Z0-9 specified in the character class (much like + means "1 or more", {2,4} means at least 2 with a maximum of 4 of the preceding match). So 'foo' matches, '11' matches, '11111' does not match and 'information' does not.

The "tweaked" regexp should be:

/^[\w.+-]+\@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4}$/
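As a rough check, the original and tweaked patterns can be run side by side. The tweaked version below also uses the capital-Z correction for the last character class and drops the optional backslash before @; the sample addresses are made up.

```javascript
const original = /^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/;
const tweaked = /^[\w.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4}$/;

// A normal-looking address passes both patterns.
console.log(original.test("fred.smith-foo+wee@mymail.com")); // true
console.log(tweaked.test("fred.smith-foo+wee@mymail.com"));  // true

// The unescaped "." in the original lets "%" stand in for the dot.
console.log(original.test("fred@foo%com")); // true
console.log(tweaked.test("fred@foo%com"));  // false

// Even the tweaked pattern rejects top-level domains longer than 4 characters.
console.log(tweaked.test("fred@example.museum")); // false
```

The last line shows why a pattern like this is only a loose sanity check and not real email validation.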


I'm not doing a tutorial on regexes; that's been done really well already. But here is what your expressions mean.

/^<something>$/ String begins, has something in the middle, and then immediately ends.

  • /^foo$/.test('foo'); // true
  • /^foo$/.test('fool'); // false
  • /^foo$/.test('afoo'); // false

+ One or more of something:

  • /a+/.test('cot');//false
  • /a+/.test('cat');//true
  • /a+/.test('caaaaaaaaaaaat');//true

[<something>] Match any one character found between the brackets (this includes ranges like 0-9, a-z, and A-Z, as well as special codes like \w for 0-9a-zA-Z_).

  • /^[0-9]+/.test('f00')//false
  • /^[0-9]+/.test('000')//true

{x,y} between X and Y occurrences

  • /^[0-9]{1,2}$/.test('12');// true
  • /^[0-9]{1,2}$/.test('1');// true
  • /^[0-9]{1,2}$/.test('d');// false
  • /^[0-9]{1,2}$/.test('124');// false

So, that should cover everything, but for good measure:

/^[\w-.+]+\@[a-zA-Z0-9.-]+.[a-zA-z0-9]{2,4}$/
Begins with at least one character from \w, -, +, or ., followed by an @, followed by at least one character in the set a-zA-Z0-9.-, followed by one character of anything (. means anything; they meant \.), followed by 2-4 characters of a-zA-z0-9, and then the string ends.

As a side note, this regular expression to check emails is not only dated, but it is very, very, very incorrect.
