RegEx in CFML to match a whole word in uppercase followed by a line feed
I've been struggling with this all day, as regular expressions aren’t my favourite topic.
I’m trying to find when the following happens:
A complete word that is in uppercase,
followed by a space,
followed by a line feed,
followed by another space,
followed by another word that starts with an uppercase letter.
While testing I found that if I defined what the capital letter should be (in this case S):
[A-Z][A-Z]+ \n S
It would match, however if I change it to something like
[A-Z][A-Z]+ \n [A-Z]
It now picks up any text that contains a line feed, regardless of whether it is preceded by an uppercase word.
Am I missing something obvious?
Below is some sample text I’m using (hopefully it pastes ok without losing its line feeds). I’m trying to find the headings (in uppercase) so that I can make some changes to them.
People who have a disability that would prevent them from performing required basic life support skills are advised that they will not be able to achieve the unit of competency.
ENROLLING IN FIRST AID UNITS OF COMPETENCY
If you are seeking to enrol in a First Aid unit of competency e.g. HLTFA301B Apply first aid, you are advised that to complete the unit you must be able to perform basic life support skills, for example control bleeding and perform cardiopulmonary resuscitation (CPR). If you have a disability that would prevent you from performing required basic life support skills you are advised that you will not be able to achieve the unit of competency.
REQUIREMENTS AND ADVICE FOR STUDENTS PARTICIPATING IN WORK PLACEMENT
Some or all of the following advice will apply to you, depending on your course and the type of organisation where you will be undertaking work placement.
Cheers Mark
There are two primary problems. First, the heading lines contain spaces and possibly other characters, so [A-Z] alone is not enough: at a minimum you need to add a space to the set, as in [A-Z ]. If the headings can contain other characters such as numbers or punctuation, add those to the set as well. Second, as karora mentioned, you need to allow for variations in the line breaks.
Here is an example that also uses a positive lookahead, so the line break and the start of the next line must be present but are kept out of the result. You can then probably use the array of matches directly in the next step of your code.
<cfset matches = reMatch(" [A-Z ]+(?= \r?\n [A-Z])", teststring) />
<cfdump var="#matches#" />
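To try that pattern in isolation, a minimal sketch might look like the following. The sample string is made up for illustration and the variable name lf is an assumption, not part of the original code. The line feed is built with chr(10) because CFML string literals do not interpret \n as an escape; only the regex engine does.

```cfml
<!--- Hypothetical test data: a heading followed by space, line feed, space --->
<cfset lf = chr(10) />
<cfset teststring = "unit of competency. " & lf
    & " ENROLLING IN FIRST AID UNITS OF COMPETENCY " & lf
    & " If you are seeking to enrol in a First Aid unit" />

<!--- The lookahead checks for " LF " plus a capital letter without consuming it --->
<cfset matches = reMatch(" [A-Z ]+(?= \r?\n [A-Z])", teststring) />

<!--- Each match keeps the leading space from the pattern, so trim() it before use --->
<cfloop array="#matches#" index="heading">
    <cfoutput>[#trim(heading)#]<br /></cfoutput>
</cfloop>
```

Because the pattern starts with a literal space, each match comes back with that space attached, which is why the loop trims it before output.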
When you are matching a line break, keep in mind that it may (or may not) be preceded by a carriage return, especially in text files from Windows.
So you might want something like:
"[ ][A-Z]+\r?\n[A-Z]"
Make sure you don't leave stray spaces in your regex, because these will very likely be treated as literal spaces. I've enclosed the (only) space in the expression above in [ ] to make it clearer that it's part of the regex, and I've enclosed the whole regex in " characters because you probably want that. The [ ] around that space should not be needed, though.
The ? following an item means "0 or 1 of the preceding", so in this case we want a \n optionally preceded by a \r.
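As a rough illustration of the optional carriage return, the sketch below runs the same kind of pattern against a Unix-style and a Windows-style line ending. The variable names and sample strings are hypothetical, and the pattern is simplified from the one above so the \r? part is easier to see.

```cfml
<!--- Hypothetical inputs: same text, different line endings --->
<cfset unixText = "HEADING" & chr(10) & "Next line" />
<cfset windowsText = "HEADING" & chr(13) & chr(10) & "Next line" />

<!--- \r? lets the carriage return be present or absent before \n --->
<cfset pattern = "[A-Z]+\r?\n[A-Z]" />

<!--- reFind returns the position of the first match, or 0 if there is none --->
<cfoutput>
    Unix: #reFind(pattern, unixText)#<br />
    Windows: #reFind(pattern, windowsText)#
</cfoutput>
```

Both calls should return a non-zero position. Without the \r? the Windows-style string would fail to match, because the carriage return sits between the uppercase word and the line feed.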