Finding Line Beginning using Regular expression in Notepad++

I want to strip all the jQuery "done" attributes from the divs in a 4000-line HTML file. For example:

<DIV class=menu done27="1" done26="0"
done9="1" done8="0" done7="1"
done6="0" done4="20">

should be replaced with:

<DIV class=menu>

In my tests, this regular expression gets part of the way there:

[ ^]done[0-9]+="[0-9]+"

I'm using Notepad++ 5.6.8 Unicode with a file encoded in ANSI, and I put this regex in the "Find what" field. It only replaces the 5 occurrences that start with a space; it misses the 2 occurrences that start at the beginning of a line.

How can I construct a regex to remove all the attributes of an HTML element starting with a keyword?
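
For what it's worth, here is a minimal Python sketch of why the pattern above misses the line-initial occurrences. It uses Python's `re` module as a stand-in, not the Notepad++ 5.6.8 engine, and the alternation pattern is only an illustration of the idea, not something that version of Notepad++ is known to accept. The point is that inside a character class, `^` is a literal caret rather than a line anchor:

```python
import re

SAMPLE = (
    '<DIV class=menu done27="1" done26="0"\n'
    'done9="1" done8="0" done7="1"\n'
    'done6="0" done4="20">'
)

# Inside a character class, "^" is not an anchor: "[ ^]" matches a space
# or a literal caret character, never the start of a line.
CLASS_PATTERN = re.compile(r'[ ^]done[0-9]+="[0-9]+"')

# An alternation outside the class, with re.MULTILINE so that "^" also
# matches right after each newline, covers both cases.
ANCHOR_PATTERN = re.compile(r'(?:^|[ ])done[0-9]+="[0-9]+"', re.MULTILINE)

class_hits = CLASS_PATTERN.findall(SAMPLE)
anchor_hits = ANCHOR_PATTERN.findall(SAMPLE)
print(len(class_hits), len(anchor_hits))
```

On the sample div, the character-class version finds the 5 space-prefixed attributes and the alternation finds all 7.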


Extended Replace "\n" with "LINEBREAK "

Thanks a lot to all for these timely replies. Following your advice, here's what I did:

  • "Notepad++ > View > Show Symbol > Show End Of Line" shows "CR+LF" at each line end.
  • "Notepad++ > Search > Find", "Search mode" = "Normal": made sure that "Find what" = "LINEBREAK" finds nothing.
  • "Search mode" = "Extended": "Find what" = "\n\r" only finds the double breaks (CR + LF followed by a blank line); "\n \r" finds nothing; yet "\n" finds exactly all the line breaks, and only them.
  • Saving my "Towncar.htm" test file as "Towncar_02.htm" (also encoded in ANSI)
  • Under "Extended", replaced all "\n" with "LINEBREAK " (notice the trailing space)
  • Under "Regular expression", replaced each occurrence of:

     done[0-9]*="[0-9]*"
    

(Be careful to check that there is A LEADING SPACE before "done"
and that there is NO TRAILING SPACE! See below.)

with an empty string

  • Under "Extended", replaced each occurrence of "LINEBREAK" with "\n" (no trailing space this time after "LINEBREAK"!)
  • Checked that the resulting "Towncar.htm" file (after a little cosmetic reformatting) looked OK and pretty, and that after a refresh, it still rendered the same as the "Towncar_02.htm" backup.
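
The steps above can be sketched in Python roughly as follows. This is only an illustration of the same placeholder trick, not what Notepad++ does internally; the function name `strip_done_attributes` is made up for the example, and it assumes the word "LINEBREAK" does not already occur in the file (which is what the "Normal" search above was checking):

```python
import re

PLACEHOLDER = "LINEBREAK"  # marker from the post; assumed absent from the file

SAMPLE = (
    '<DIV class=menu done27="1" done26="0"\r\n'
    'done9="1" done8="0" done7="1"\r\n'
    'done6="0" done4="20">'
)


def strip_done_attributes(html: str) -> str:
    """Hypothetical helper mirroring the three replace passes."""
    # Pass 1: swap each line feed for the marker plus a trailing space, so
    # an attribute that started a line is now preceded by a space. The "\r"
    # of a CR+LF pair is left where it is.
    flat = html.replace("\n", PLACEHOLDER + " ")
    # Pass 2: delete every ' doneNN="NN"' (note the leading space).
    flat = re.sub(r' done[0-9]*="[0-9]*"', "", flat)
    # Pass 3: turn the markers back into line feeds (no trailing space here).
    return flat.replace(PLACEHOLDER, "\n")


result = strip_done_attributes(SAMPLE)
print(repr(result))
```

Note that the space added in pass 1 is not removed in pass 3, so any continuation line that did not begin with a "done" attribute ends up with one extra leading space, and lines that held only "done" attributes are left empty.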

Reminders and notes:

  • This forum apparently works well in Chrome 4, but with some browsers (e.g. IE6 and other discontinued ones), under some circumstances, it produces display artifacts. So be careful:
  • even if the forum doesn't show it in your browser, there is a leading space at the beginning of the regex (the " done..." regular expression above), so that it only replaces strings starting with " done", space included. This makes it even less likely to alter any other strings containing "undone" or "methadone" or the like
  • likewise, even if the forum shows one in your browser, there is no trailing space at the end of the regex!
  • in the regex, [0-9] matches one and only one occurrence of any decimal digit (a character in the 0-9 range); in other words, it matches « 0 » or « 1 » or « 9 » etc., but NOT « 01 » or « 835 » or « » (the empty string) or anything else.
  • * (asterisk) matches the previous item zero or more times (here, [0-9]* matches the empty string or any string made up exclusively of digits)
  • likewise, + (plus sign) matches the previous item one or more times (here, [0-9]+ matches any string, at least 1 character long, made up exclusively of digits)
    Ref: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions#Notepad.2B.2B_regex_syntax
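
A quick way to check the three notes on [0-9], * and + is a small Python sketch like the one below. It relies on Python's `re.fullmatch`, not on Notepad++, but these particular constructs behave the same way in both; the helper name `fully_matches` is just for this example:

```python
import re


def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None


# [0-9] on its own: exactly one digit.
one_digit = [fully_matches(r"[0-9]", s) for s in ("9", "01", "")]

# [0-9]*: zero or more digits, so the empty string is accepted.
star = [fully_matches(r"[0-9]*", s) for s in ("", "835", "8a")]

# [0-9]+: one or more digits, so the empty string is rejected.
plus = [fully_matches(r"[0-9]+", s) for s in ("", "835", "8a")]

print(one_digit, star, plus)
```

One practical consequence: ` done[0-9]*="[0-9]*"` would also delete a bare ` done=""`, whereas the `+` version requires at least one digit in both places.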


I like Notepad++ too but the regexing is really a pain. If you insist on using Notepad++ try this:

  • First find out which newline characters are being used in your document (View>Show Symbol>Show End Of Line)
  • Delete those line breaks by replacing them with a single space (Search and replace; CR is \r, LF is \n. Be sure to tick the "Extended" search mode)
  • Regex-replace done[0-9][0-9]*=\"[0-9][0-9]*\" with the empty string (be sure to put a single space before the regex)

Voila! Not very nice n clean but it works ;o)

After that, if you want it human-readable again, you could use the HTMLTidy functions.
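
In case it helps to see the idea outside Notepad++, here is a rough Python sketch of the same two passes. The function name `collapse_and_strip` is invented for the example, and the first `re.sub` simply treats CR+LF, a lone LF and a lone CR alike instead of checking which one the file uses:

```python
import re

SAMPLE = (
    '<DIV class=menu done27="1" done26="0"\r\n'
    'done9="1" done8="0" done7="1"\r\n'
    'done6="0" done4="20">'
)


def collapse_and_strip(html: str) -> str:
    """Hypothetical helper: flatten line breaks, then drop done attributes."""
    # Pass 1: replace each line break (CR+LF, lone LF or lone CR) with one
    # space, so every attribute is now preceded by a space.
    flat = re.sub(r"\r\n|\n|\r", " ", html)
    # Pass 2: remove ' doneN...="N..."' including the single leading space.
    return re.sub(r' done[0-9][0-9]*="[0-9][0-9]*"', "", flat)


cleaned = collapse_and_strip(SAMPLE)
print(cleaned)
```

The whole input ends up on a single line, which is why a tidy-up pass afterwards is useful.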


You almost had it! Unfortunately, the complete solution in Notepad++ would have to be a 3 step process.

  1. Do a regex search/replace, searching for \<done[0-9]+="[0-9]+"[ ]* and leaving the replace field empty, so that everything that matches is simply deleted. (In Notepad++'s regular expression syntax, \< represents the "beginning of a word".)

  2. Select the portion of text affected by your previous search/replace. You don't want to select the entirety of your document, because we're going to...

  3. Strip newlines. Hit Ctrl-F to bring up the Search/Replace dialog again and this time select "Extended" search mode, instead of "Regular expression". Depending on the format of your document you are going to want to search for either \n or \r\n. The replacement field should, again, be empty. Also, make sure that the "In Selection" checkbox is checked.

Click "Replace All" and you're done!
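
As a rough illustration of the three steps, here is a Python sketch. Python's `re` has no `\<`, so `\b` (a word boundary) stands in for "beginning of a word" here; that substitution and the function name `strip_then_join` are assumptions of this example, not Notepad++ behaviour. Passing in only the affected fragment plays the role of the "In Selection" checkbox:

```python
import re

SAMPLE = (
    '<DIV class=menu done27="1" done26="0"\r\n'
    'done9="1" done8="0" done7="1"\r\n'
    'done6="0" done4="20">'
)


def strip_then_join(fragment: str) -> str:
    """Hypothetical helper: delete done attributes, then remove line breaks."""
    # Step 1: delete each attribute plus any spaces that follow it.
    # "\b" keeps "undone5" and similar words from matching.
    stripped = re.sub(r'\bdone[0-9]+="[0-9]+"[ ]*', "", fragment)
    # Steps 2-3: only the selected fragment was passed in, so removing its
    # line breaks (CR+LF first, then any remaining LF) is safe.
    return stripped.replace("\r\n", "").replace("\n", "")


joined = strip_then_join(SAMPLE)
print(repr(joined))
```

On the sample div this leaves one space before the closing `>`, since the space in front of the first attribute is not part of the match.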


A simple way is:

  1. Go to "Search" > "Replace"
  2. Input "\n" in "Find what"
  3. Input your string in "Replace with"
  4. Select "Extended" in "Search Mode"
  5. Click "Replace All"

It will plug your string at the beginning of each line except the first line.


I'm afraid Notepad++ regex cannot do that.

Notepad++ uses the Scintilla regex engine, which works line by line, so a multiline search/replace cannot be done.

Note that \r and \n are never matched because in Scintilla, regular expression searches are made line per line (stripped of end-of-line chars).

Quoted from http://www.scintilla.org/SciTERegEx.html
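
To make the consequence concrete, here is a small Python sketch that imitates a line-per-line search. It is a simulation of the behaviour described in the quote, not Scintilla's actual code, and `per_line_search` is a made-up name:

```python
import re

SAMPLE = (
    '<DIV class=menu done27="1" done26="0"\r\n'
    'done9="1" done8="0" done7="1"\r\n'
    'done6="0" done4="20">'
)


def per_line_search(pattern: str, text: str) -> list:
    """Search each line separately, with end-of-line characters stripped."""
    hits = []
    # splitlines() drops the "\r\n" terminators, so the pattern never sees
    # an end-of-line character and no match can span two lines.
    for number, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(pattern, line):
            hits.append((number, match.group(0)))
    return hits


newline_hits = per_line_search(r"\n", SAMPLE)
crossing_hits = per_line_search(r'"\r?\ndone', SAMPLE)
attribute_hits = per_line_search(r'done[0-9]+="[0-9]+"', SAMPLE)
print(newline_hits, crossing_hits, len(attribute_hits))
```

A pattern confined to one line still finds every attribute; only patterns that need to see or cross a line break come back empty.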
