JS regex to split by line
How do you split a long piece of text into separate lines? Why does this return line1 twice?
/^(.*?)$/mg.exec('line1\r\nline2\r\n');
["line1", "line1"]
I turned on the multi-line modifier to make ^ and $ match the beginning and end of lines. I also turned on the global modifier to capture all lines.
I wish to use a regex split and not String.split because I'll be dealing with both Linux \n and Windows \r\n line endings.
arrayOfLines = lineString.match(/[^\r\n]+/g);
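As Tim said, line1 is both the entire match and the capture. It appears regex.exec(string) returns after finding the first match even when the global modifier is set (with g, each further call resumes from the regex's lastIndex), whereas string.match(regex) honours global and returns every match at once.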
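To make the difference concrete, here is a minimal sketch comparing a single exec() call, an exec() loop, and match() on the question's input. The variable names (`text`, `single`, `viaExec`, `viaMatch`) are only illustrative.

```javascript
const text = 'line1\r\nline2\r\n';

// One exec() call yields one match: [whole match, capturing group 1].
const single = /^(.*?)$/mg.exec(text);
console.log(single[0], single[1]);

// With the g flag, exec() continues from re.lastIndex on each call,
// so collecting every match takes a loop.
const re = /[^\r\n]+/g;
const viaExec = [];
let m;
while ((m = re.exec(text)) !== null) {
  viaExec.push(m[0]);
}
console.log(viaExec);

// match() with the g flag returns all whole matches in one call.
const viaMatch = text.match(/[^\r\n]+/g);
console.log(viaMatch);
```

The loop uses /[^\r\n]+/g rather than the question's pattern because that pattern can also match the empty string, and an exec() loop over zero-length matches needs extra care to keep advancing.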
Use
result = subject.split(/\r?\n/);
Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
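As a quick illustration of the split(/\r?\n/) suggestion in this answer, the sketch below runs it on input that mixes Windows and Linux endings. The sample string and the `withoutTrailing` helper variable are made up for the example.

```javascript
// Mixed \r\n and \n endings, with a line break at the very end.
const subject = 'line1\r\nline2\nline3\r\n';

// \r?\n matches either ending, so both kinds are treated as separators.
const result = subject.split(/\r?\n/);
console.log(result);

// A trailing line break leaves an empty string as the last element;
// drop it if that is not wanted.
const withoutTrailing =
  result[result.length - 1] === '' ? result.slice(0, -1) : result;
console.log(withoutTrailing);
```

Note that a lone \r (old Mac style) is not treated as a separator by this pattern.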
I am assuming the following constitute newlines:
- \r followed by \n
- \n followed by \r
- \n present alone
- \r present alone
Use
var re = /\r\n|\n\r|\n|\r/g;
arrayOfLines = lineString.replace(re, "\n").split("\n");
for an array of all lines, including the empty ones.
Or use
arrayOfLines = lineString.match(/[^\r\n]+/g);
for an array of non-empty lines.
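The two options can be compared side by side. This is a small sketch on an invented sample string that contains all four newline forms listed above; `allLines` and `nonEmptyLines` are illustrative names.

```javascript
// Sample with \r\n, \n\r, two \n in a row (an empty line), and a lone \r.
const lineString = 'a\r\nb\n\rc\n\nd\re';

// Option 1: normalise every newline form to \n, then split.
// Two-character forms come first in the alternation so they are
// replaced as one unit rather than as two separate newlines.
const re = /\r\n|\n\r|\n|\r/g;
const allLines = lineString.replace(re, '\n').split('\n');
console.log(allLines);

// Option 2: collect runs of non-newline characters.
// match() returns null when nothing matches, so fall back to [].
const nonEmptyLines = lineString.match(/[^\r\n]+/g) || [];
console.log(nonEmptyLines);
```

The `|| []` fallback matters for input such as an empty string, where match() with the g flag returns null instead of an empty array.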
Even simpler regex that handles all line-ending combinations, even mixed in the same file, and drops empty lines inside the text (a leading or trailing line break still produces an empty first or last element):
var lines = text.split(/[\r\n]+/g);
With whitespace trimming:
var lines = text.trim().split(/\s*[\r\n]+\s*/g);
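To show what these two variants return, here is a short sketch on a made-up input that starts and ends with a line break and has padded lines. The names `collapsed` and `trimmed` are only for illustration; the g flag is left off because split() separates at every match with or without it.

```javascript
// Leading newline, padded lines, a blank line in the middle, trailing newline.
const text = '\n  first  \r\n\r\n second\n';

// Runs of \r and \n act as a single separator, so the blank line in the
// middle disappears, but the leading and trailing breaks leave '' entries.
const collapsed = text.split(/[\r\n]+/);
console.log(collapsed);

// trim() removes the outer whitespace (line breaks included), and the
// \s* on each side of the separator absorbs per-line padding.
const trimmed = text.trim().split(/\s*[\r\n]+\s*/);
console.log(trimmed);
```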
Unicode Compliant Line Splitting
Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)
I don't understand why the negative look-ahead part ((?!\r\n)) is necessary, but that is what is suggested in the Unicode document.
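For what it's worth, a quick experiment is sketched below. It runs splitLines next to a hypothetical variant without the look-ahead (`splitLinesNoLookahead`, not from the Unicode document) on an invented sample. The two agree here, presumably because JavaScript tries the \r\n alternative first at each position; that is an observation about this sample only, not a claim about other inputs or regex engines.

```javascript
// The function from the answer, unchanged.
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/);

// Hypothetical variant for comparison: same pattern minus the look-ahead.
const splitLinesNoLookahead = s => s.split(/\r\n|[\n-\r\x85\u2028\u2029]/);

// Sample covering \r\n, lone \r, U+2028, U+0085, an empty line (\n\n),
// vertical tab (\x0B) and form feed (\x0C); [\n-\r] spans \x0A to \x0D.
const sample = 'a\r\nb\rc\u2028d\x85e\n\nf\x0Bg\x0Ch';

const withLookahead = splitLines(sample);
const withoutLookahead = splitLinesNoLookahead(sample);

console.log(withLookahead);
console.log(JSON.stringify(withLookahead) === JSON.stringify(withoutLookahead));
```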