Regex match all lines that don't end with ,0 and ,1

I have a malformed CSV file which has two columns: Text,Value

The value is either 1 or 0, but some lines are malformed and span two lines:

1. "This line is fine, but there are some that are not like this",0
2. "Another good line",1
4. "Oh, I'm so bad!!
5. I spanned two lines!",0
6. "Why did you break me? FileHelpers can't read two lines!!",1

Lines 4 and 5 are supposed to be one line, but the CSV file I got is broken and they span two lines. This causes the FileHelpers engine to fail while reading the CSV file.

I have two CSV files with about 3000 lines each, and I only need to fix them once. I want to use Notepad++ to find all the lines that do not end in ,0 or ,1. What kind of regex can I use for that? Or maybe two regular expressions, one for the ,0 case and the other for the ,1 case.

Update:

Dan's answer works without the comma, i.e. [^01]$ instead of ,[^01]$, but it only matches lines that do not end with 0 or 1. It works well enough in my case, but it skips lines that are broken and still happen to end with 0 or 1.


I don't know how the other answer would work.

Something like the below is what I would use in Notepad++

[^,][^01]$

Here are the steps I did:

Use ([^,][^01])$ to match the broken lines and replace with \1{marked}

Then switch to extended mode and replace {marked}\r\n with an empty string to join the two halves into a single line.



The expression you would use is

([^,].|,[^01])$

Unfortunately, Notepad++ (at the time of writing) does not support alternation (the | operator). [1] You can match the broken lines with these two expressions instead:

[^,].$
,[^01]$

Except, of course, if the "Text" part does end in ,0 or ,1 itself. :-)
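Outside Notepad++, in an engine that does support alternation, the combined expression can be tried directly. The following is only a minimal sketch in Python: the sample text is modelled on the lines in the question, and the names SAMPLE, BROKEN_END and find_broken_lines are made up for illustration.

```python
import re

# Sample data modelled on the question; lines 3 and 4 are one broken record.
SAMPLE = "\n".join([
    '"This line is fine, but there are some that are not like this",0',
    '"Another good line",1',
    '"Oh, I\'m so bad!!',
    'I spanned two lines!",0',
    '"Why did you break me? FileHelpers can\'t read two lines!!",1',
])

# Same expression as above: the last two characters are neither ",0" nor ",1".
BROKEN_END = re.compile(r"([^,].|,[^01])$")


def find_broken_lines(text):
    """Return (1-based line number, line) pairs for lines not ending in ,0 or ,1."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if BROKEN_END.search(line)
    ]


broken = find_broken_lines(SAMPLE)
for number, line in broken:
    print(number, line)  # prints: 3 "Oh, I'm so bad!!
```

Note that the expression needs at least two characters before the end of the line, so empty or one-character lines are not reported.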

[1] http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Unsupported_Regex_Operators


,[^01]$

Make sure regex mode is on.


General considerations

In general, to match a line that does not end with a specific pattern, you may use

^(?!.*pattern$).*$

where ^ matches the start of a line, (?!.*pattern$) is a negative lookahead that fails the match if there are 0 or more chars other than line break chars, as many as possible (.*), followed by pattern at the end of the line ($), and the .*$ actually matches the line.

To remove a line that does not end with some pattern together with a line break at the end, use

^(?!.*pattern$).*\R?

where \R? is an optional line break sequence.

In case of several fixed strings, you may use

^(?!.*(?:pattern|pattern2|patternN)$).*\R?
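The negative lookahead form is not specific to Notepad++. As a rough sketch, assuming Python's re module, the helper below (lines_not_ending_with is a hypothetical name) builds the same kind of expression from a list of fixed endings. Python's re does not support \R, so this version only reports the matching lines and leaves line breaks alone.

```python
import re


def lines_not_ending_with(text, endings):
    """Return every line of text that ends with none of the given fixed strings."""
    # Escape each ending so it is treated as a literal string, not as regex syntax.
    alternatives = "|".join(re.escape(ending) for ending in endings)
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    pattern = re.compile(rf"^(?!.*(?:{alternatives})$).*$", re.MULTILINE)
    return pattern.findall(text)


text = "keep,0\nbroken\nkeep,1\nalso broken,2"
result = lines_not_ending_with(text, [",0", ",1"])
print(result)  # ['broken', 'also broken,2']
```

The sample text uses plain \n line endings. With \r\n endings, the \r counts as the last character of each line in Python, so the text would need to be normalised first.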

If there are just one or two fixed strings to check at the end of the line, you may use a slightly faster regex such as

^.*$(?<!a)(?<!bcd)

that will match any line that ends with neither a nor bcd. For lines that end with neither 1 nor 0, that is

^.*$(?<!1)(?<!0)
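The lookbehind variant can be sketched the same way. This assumes Python's re module, where each lookbehind must have a fixed width, which holds for fixed strings; the helper name lines_not_ending_with_lookbehind is invented for this example.

```python
import re


def lines_not_ending_with_lookbehind(text, endings):
    """Return lines ending with none of the fixed strings, using one lookbehind per string."""
    # One negative lookbehind per ending, all checked at the end-of-line position.
    lookbehinds = "".join(f"(?<!{re.escape(ending)})" for ending in endings)
    pattern = re.compile("^.*$" + lookbehinds, re.MULTILINE)
    return pattern.findall(text)


general = lines_not_ending_with_lookbehind("xa\nxbcd\nxyz\nbc", ["a", "bcd"])
print(general)  # ['xyz', 'bc']

zero_one = lines_not_ending_with_lookbehind("a,0\nb\nc1", ["0", "1"])
print(zero_one)  # ['b']
```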

Current problem solution

So, for the current issue, to match a line not ending with 1 or 0, you may use

^(?!.*[01]$).*$    # without the line break
^(?!.*[01]$).*$\R? # with the line break

Or,

^.*(?<![01])$    # without the line break
^.*(?<![01])$\R? # with the line break

To remove or replace the line break on a line that does not end with 0 or 1, you may use

(?<![01])$\R?

Replace with either an empty string (to remove the line break) or with any other delimiter string or character.
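For a one-off fix outside the editor, the same replacement can be approximated in Python. This is a sketch under stated assumptions, not a drop-in equivalent: Python's re has no \R, so \r?\n stands in for it, and \r is added to the lookbehind class so that the \n of a \r\n pair after a good line is not matched on its own. The function name join_broken_lines is hypothetical.

```python
import re

# A line break whose preceding character is not 0 or 1.
# \r is excluded too, otherwise the "\n" in "0\r\n" would match by itself.
BREAK_AFTER_BROKEN_LINE = re.compile(r"(?<![01\r])\r?\n")


def join_broken_lines(text, delimiter=""):
    """Replace the line break after every line that does not end in 0 or 1."""
    # A function is used as the replacement so the delimiter is inserted literally.
    return BREAK_AFTER_BROKEN_LINE.sub(lambda _match: delimiter, text)


sample = (
    '"Another good line",1\n'
    '"Oh, I\'m so bad!!\n'
    'I spanned two lines!",0\n'
    '"Why did you break me?",1\n'
)
fixed = join_broken_lines(sample)
print(fixed)
```

As with the Notepad++ expression, a broken line that happens to end in 0 or 1 is left untouched, and blank lines are merged into the line that follows them.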
