Regex: match whole line except first string and #comment lines
I tried (\s|\t).*[\b\w*\s\b]
, this one is almost ok but I want also except lines with #.
#Name Type Allowable values
#==================开发者_C百科======== ========= ========================================
_absolute-path-base-uri String -
add-xml-decl Boolean y/n, yes/no, t/f, true/false, 1/0
As @anubhava said in his answer, it looks you just need to check for #
at the beginning of the line. The regex for that is simple, but the mechanics of applying the regex varies wildly, so it would help if we knew which regex flavor/tool you're using (e.g. PHP, .NET, Notepad++, EditPad Pro, etc.). Here's a JavaScript version:
/^[^#].*$/mg
Notice the modifiers: m
("multiline") allows ^
and $
to match at line boundaries, and g
("global") allows you to find all the matches, not just the first one.
Now let's look at your regex. [\b\w*\s\b]
is a character class that matches a word character (\w
), a whitespace character (\s
), an asterisk (*
), or a backspace (\b
). In other words, both *
and \b
lose their special meanings when the appear in a character class.
\s
matches any whitespace character including \t
, so (\s|\t)
is needlessly redundant, and may not be needed at all. What it's actually doing in your case is matching the newline before each matched line. There's no need for that when you can use ^
in multiline mode. If you want to allow for horizontal whitespace (i.e., spaces and tabs) before the #
, you can do this:
/^(?![ \t]*#).*$/mg
(?![ \t]*#)
is a negative lookahead; it means "from this position, it is impossible to match zero or more tabs or spaces followed by #
". Coming right after the ^
line anchor as it does, "this position" means the beginning of a line.
Try this:
^[A-z0-9_-]+\s+(.+)$
Assuming your first string will consist of only letters, numbers, underscores or hyphens, the first part will match that. Then we match whitespace, and then capture the rest. However, this is all dependent on the regular expression engine being used. Is this using language support for regexes, a specific editor, or a certain library? Which one? There isn't a standard: each regex engine works slightly differently.
Try this:
^[^#].*?(\s|\t)(?<Group>.*)$
After a match is found, the Group
group will contain your string.
I would use this regex. In English, this says "First character is not a pound sign (#), then non-white space to match the first 'word', then white space, then match the whole line.
^[^#]\S*\s+(.+)$
Can I suggest another approach though? It looks like there are tabs between each field in the text, so why not just read the text line-by-line and split by tab into an array?
Here is an example in C# (untested):
using(StreamReader sr = new StreamReader("C:\\Path\\to\\file.txt"))
{
string line = sr.ReadLine();
while(!sr.EndOfStream)
{
//skip the comment lines
if(line.StartsWith("#"))
continue;
string[] fields = line.Split(new string[] {"\t"}, StringSplitOptions.RemoveEmptyEntries);
//now fields[0] contains the Name field
//fields[1] contains the Type field
//fields[2] contains the Allowable Values field
line = sr.ReadLine();
}
}
Try this code in php:
<?php
$s="#Name Type Allowable values
#========================== ========= ========================================
_absolute-path-base-uri String -
add-xml-decl Boolean y/n, yes/no, t/f, true/false, 1/0 ";
$a = explode("\n", $s);
foreach($a as $str) {
preg_match('~^[^#].*$~', $str, $m);
var_dump($m);
}
?>
OUTPUT
array(0) {
}
array(0) {
}
array(1) {
[0]=>
string(79) "_absolute-path-base-uri String - "
}
array(1) {
[0]=>
string(77) "add-xml-decl Boolean y/n, yes/no, t/f, true/false, 1/0 "
}
Code is pretty simple, it just ignores matching #
at the start of a line thus ingoring those lines completely.
精彩评论