Regex: match all lines that don't end with ,0 or ,1
I have a malformed CSV file which has two columns: Text,Value
The value is either 1 or 0, but some lines are malformed and span two lines:
1. "This line is fine, but there are some that are not like this",0
2. "Another good line",1
4. "Oh, I'm so bad!!
5. I spanned two lines!",0
6. "Why did you break me? FileHelpers can't read two lines!!",1
Lines 4 and 5 are supposed to be one line, but the CSV file I got is broken and they span two lines. This causes the FileHelpers engine to fail while reading the CSV file.
I have two CSV files with about 3000 lines each and I will only need to fix them once. I want to use Notepad++ to find all the lines that do not end in ,0 or ,1. What kind of regex can I use for that? Or maybe two regular expressions, one for the ,0 case and the other for the ,1 case.
Update:
Dan's answer works without the comma, [^01]$ instead of ,[^01]$, but it only matches lines that do not end with 0 or 1. It works sufficiently well in my case, but it does skip lines that are broken and still happen to end with 0 or 1. I don't know how the other answer would work:
Something like the below is what I would use in Notepad++
[^,][^01]$
Here are the steps I took:
Use ([^,][^01])$ to match the lines and replace with \1{marked}
Then switch to extended mode and replace {marked}\r\n with an empty string to get a single line.
The expression you would use is
([^,].|,[^01])$
But unfortunately, Notepad++ does not support alternation (the | operator). [1]
You can match the broken lines with these two expressions then:
[^,].$
,[^01]$
Except, of course, if the "Text" part itself ends in ,0 or ,1. :-)
[1] http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Unsupported_Regex_Operators
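In an engine that does support alternation, the combined expression can be tried directly. Here is a minimal Python sketch, assuming the file has already been split into lines. The name find_broken_lines is made up for illustration, and the sample rows are the ones from the question:

```python
import re

# Combined expression from above: the last two characters are either
# "anything but a comma, then any character" or
# "a comma, then something other than 0 or 1".
BROKEN = re.compile(r"([^,].|,[^01])$")

def find_broken_lines(lines):
    """Return 1-based positions of lines that do not end in ,0 or ,1."""
    return [i for i, line in enumerate(lines, start=1) if BROKEN.search(line)]

sample = [
    '"This line is fine, but there are some that are not like this",0',
    '"Another good line",1',
    '"Oh, I\'m so bad!!',
    'I spanned two lines!",0',
    '"Why did you break me? FileHelpers can\'t read two lines!!",1',
]

print(find_broken_lines(sample))  # only the third row is flagged
```

Note that the expression needs two characters to look at, so empty or one-character lines are not flagged by this sketch.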
,[^01]$
Make sure regex mode is on.
General considerations
In general, to match a line that does not end with a specific pattern, you may use
^(?!.*pattern$).*$
where ^ matches the start of a line, (?!.*pattern$) is a negative lookahead that fails the match if there are 0 or more chars other than line break chars, as many as possible (.*), followed with pattern at the end of the line ($), and the .*$ actually matches the line.
To remove a line that does not end with some pattern together with a line break at the end, use
^(?!.*pattern$).*\R?
where \R? is an optional line break sequence.
In case of several fixed strings, you may use
^(?!.*(?:pattern|pattern2|patternN)$).*\R?
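The lookahead form can also be checked outside the editor. The following is a rough Python sketch, not a drop-in for Notepad++: it uses \n? where the expressions above use \R?, so it assumes the text has plain \n line endings, and the function names are hypothetical:

```python
import re

def _not_ending_with(endings):
    # Build ^(?!.*(?:e1|e2|...)$) from a list of fixed strings.
    alternation = "|".join(re.escape(e) for e in endings)
    return rf"^(?!.*(?:{alternation})$)"

def lines_not_ending_with(text, endings):
    """Return every line that ends with none of the given fixed strings."""
    pattern = re.compile(_not_ending_with(endings) + r".*$", re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]

def remove_lines_not_ending_with(text, endings):
    """Delete those lines together with their trailing line break."""
    pattern = re.compile(_not_ending_with(endings) + r".*\n?", re.MULTILINE)
    return pattern.sub("", text)

text = "a,0\nb\nc,1\nd,2"
print(lines_not_ending_with(text, [",0", ",1"]))
print(remove_lines_not_ending_with(text, [",0", ",1"]))
```

re.escape is applied because the endings are meant as fixed strings; drop it if real sub-patterns are wanted.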
If there is just one or two fixed strings to check at the end of the line, you may use a bit quicker regex like
^.*$(?<!a)(?<!bcd)
that will match any line that ends with neither a nor bcd. For example, for lines not ending with 1 or 0:
^.*$(?<!1)(?<!0)
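A small Python sketch of the chained-lookbehind form, just to illustrate that each lookbehind has its own fixed width, so endings of different lengths can be combined. The helper name and the sample text are invented for the example:

```python
import re

# One negative lookbehind per fixed ending, placed after the end-of-line anchor.
NOT_A_OR_BCD = re.compile(r"^.*$(?<!a)(?<!bcd)", re.MULTILINE)

def lines_not_ending_with_a_or_bcd(text):
    """Return the lines that end with neither 'a' nor 'bcd'."""
    return [m.group(0) for m in NOT_A_OR_BCD.finditer(text)]

text = "xa\nxbcd\nxyz\nbc"
print(lines_not_ending_with_a_or_bcd(text))
```

Here "xa" and "xbcd" are skipped, while "xyz" and "bc" are returned.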
Current problem solution
So, for the current issue, to match a line not ending with 1 or 0, you may use
^(?!.*[01]$).*$ # without the line break
^(?!.*[01]$).*$\R? # with the line break
Or,
^.*(?<![01])$ # without the line break
^.*(?<![01])$\R? # with the line break
To remove or replace the line break at the end of a line that does not end with a specific pattern, you may use
(?<![01])$\R?
Replace with either an empty string (to remove the line break) or with any other delimiter string or character.
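For a one-off fix outside the editor, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the text uses plain \n line endings (for example, a file opened in text mode), the function name join_broken_lines is invented, and the sample rows are abbreviated from the question:

```python
import re

def join_broken_lines(text, delimiter=""):
    """Drop (or replace) every newline that is not preceded by 0 or 1."""
    # The lookbehind checks only the single character before the newline.
    return re.sub(r"(?<![01])\n", delimiter, text)

sample = (
    '"Another good line",1\n'
    '"Oh, I\'m so bad!!\n'
    'I spanned two lines!",0\n'
    '"Why did you break me?",1\n'
)

fixed = join_broken_lines(sample)
print(fixed)
```

The limitation from the question still applies: a broken line whose first half happens to end with 0 or 1 is left as it is.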