Regular expression to remove block enclosed in double angle brackets
I want to keep only the text from ALLS WELL THAT ENDS WELL through Florentine. That means cutting off all of the license text.
How do I write the expression?
<<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS
PROVIDED BY PROJECT GUTENBERG ETEXT OF ILLINOIS BENEDICTINE COLLEGE
WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE
DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS
PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED
COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY
SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>>
ALLS WELL THAT ENDS WELL
by William Shakespeare
Dramatis Personae
KING OF FRANCE
THE DUKE OF FLORENCE
BERTRAM, Count of Rousillon
LAFEU, an old lord
PAROLLES, a follower of Bertram
TWO FRENCH LORDS, serving with Bertram
STEWARD, Servant to the Countess of Rousillon
LAVACHE, a clown and Servant to the Countess of Rousillon
A PAGE, Servant to the Countess of Rousillon
COUNTESS OF ROUSILLON, mother to Bertram
HELENA, a gentlewoman protected by the Countess
A WIDOW OF FLORENCE.
DIANA, daughter to the Widow
VIOLENTA, neighbour and friend to the Widow
MARIANA, neighbour and friend to the Widow
Lords, Officers, Soldiers, etc., French and Florentine
<<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS
PROVIDED BY PROJECT GUTENBERG ETEXT OF ILLINOIS BENEDICTINE COLLEGE
WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE
DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS
PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED
COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY
SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>>
In this particular case, and by that I mean "for this particular input only", you could match it with:
>>([^<]+)<<
Make sure that whatever implementation you are using supports matching in the middle of a string, rather than just at the beginning. The parentheses represent the capture group.
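If you need to get the text that is NOT in that group (the license blocks themselves), you could use:
([^>]+>>)[^<]+(<<[^>]+>>)
Now the two capture groups hold that text: the license block before the play and the one after it. Note that inside a character class a dot is literal, so [.]+ would only match a run of periods.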
Edit: since you're using Java, make sure you use Matcher.find() rather than Matcher.matches() in the first case.
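To make the find() versus matches() point concrete, here is a minimal sketch of the first pattern in Java. The class name BetweenLicenses, the method name extract, and the shortened sample string are assumptions made for illustration, and the trim() call is an extra that simply drops the line breaks next to the brackets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class BetweenLicenses {
    // Text between a closing ">>" and the next opening "<<".
    private static final Pattern BETWEEN = Pattern.compile(">>([^<]+)<<");

    // Returns the captured text, or null if the pattern does not occur.
    static String extract(String input) {
        Matcher m = BETWEEN.matcher(input);
        // find() looks for a match anywhere in the input. matches() would
        // require the entire input to match, which fails here because the
        // input starts with "<<", not ">>".
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real file contents.
        String sample = "<<LICENSE LINE ONE\nLICENSE LINE TWO>>\n"
                + "ALLS WELL THAT ENDS WELL\n"
                + "Lords, Officers, Soldiers, etc., French and Florentine\n"
                + "<<LICENSE LINE ONE\nLICENSE LINE TWO>>";
        System.out.println(extract(sample));
    }
}
```

As with the pattern itself, this only works while the play text contains no < character.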
It might be more flexible to just get rid of every block enclosed in <<...>>, wherever it appears.
In Perl (the /s modifier lets . match the line breaks inside each block):
$string =~ s/<<.*?>>//gs;
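Since the question is about Java, a rough Java counterpart of that substitution might look like the sketch below. The class name LicenseStripper and the method name strip are made up for this example. The (?s) flag turns on DOTALL mode so that . also matches line breaks, which matters because each license block spans several lines.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class LicenseStripper {
    // (?s) enables DOTALL; ".*?" is reluctant, so each match ends at the
    // first ">>" after its "<<" instead of running on to the last one.
    private static final Pattern BLOCK = Pattern.compile("(?s)<<.*?>>");

    // Removes every <<...>> block and trims surrounding whitespace.
    static String strip(String input) {
        return BLOCK.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real file contents.
        String sample = "<<LICENSE LINE ONE\nLICENSE LINE TWO>>\n"
                + "ALLS WELL THAT ENDS WELL\n"
                + "by William Shakespeare\n"
                + "<<LICENSE LINE ONE\nLICENSE LINE TWO>>";
        System.out.println(strip(sample));
    }
}
```

Unlike the first pattern, this does not depend on the play text being free of < characters, and it handles any number of license blocks in the file.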