php regex guitar tab (tabs or tablature, a type of music notation)
I am writing a guitar tab to RTTTL (Ring Tone Text Transfer Language) converter in PHP. To prepare a guitar tab for RTTTL conversion, I first strip out all comments (comments start with #- and end with -#). I then have a few lines that set the tempo, note the tuning, and define multiple instruments (Tempo 120\nDefine Guitar 1\nDefine Bass 1, etc.); these are stripped out of the tab and set aside for later use.
Now I essentially have nothing left except the guitar tabs. Each tab is prefixed with its instrument name, matching one of the instrument names defined earlier.
Sometimes we have tabs for two separate instruments that are linked because they are to be played together, e.g. a guitar and a bass guitar playing together.
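A minimal sketch of that preprocessing step, assuming PHP 7.1+ and that each Tempo/Define directive sits on its own line. The function name preprocessTab and the shape of the returned array are hypothetical, and tuning lines are left out because their syntax is not shown here:

```php
<?php
// Hypothetical helper: the name and the return shape are assumptions.
function preprocessTab(string $raw): array
{
    // Remove "#- ... -#" comments. The s flag lets "." span newlines,
    // and the lazy ".*?" stops at the first closing "-#".
    $text = preg_replace('/#-.*?-#/s', '', $raw);

    $directives = ['tempo' => null, 'define' => []];
    $body = [];
    foreach (preg_split('/\r?\n/', $text) as $line) {
        if (preg_match('/^\s*Tempo\s+(\d+)\s*$/i', $line, $m)) {
            $directives['tempo'] = (int) $m[1];      // e.g. "Tempo 120"
        } elseif (preg_match('/^\s*Define\s+(.+?)\s*$/i', $line, $m)) {
            $directives['define'][] = $m[1];         // e.g. "Define Guitar 1"
        } else {
            $body[] = $line;                         // everything else is tab text
        }
    }
    return [$directives, implode("\n", $body)];
}

$raw = "#- intro riff -#\nTempo 120\nDefine Guitar 1\nDefine Bass 1\n|Guitar 1\ne|--3--|\n";
[$directives, $tab] = preprocessTab($raw);
print_r($directives);
echo $tab;
```

After this step $tab holds only the tab lines, and $directives keeps the tempo and instrument definitions for later use.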
Example 1, Standard Guitar Tab:
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
Example 2, Conjunction Tab:
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
|
|
|Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|
I have considered other methods of identifying the tabs with no solid results. I am hoping that someone who does regular expressions could help me find a way to identify a single guitar tab and if possible also be able to match a tab with multiple instruments linked together.
Once the tabs are in an array I will go through them one line at a time and convert them into rtttl lines (exploded at each new line "\n").
I do not want to separate the guitar tabs in the document via explode "\n\n" or something similar, because that does not identify the guitar tabs themselves; it only identifies the space between them.
I have been messing with this for about a week now and this is the only major hold up I have. Everything else is fairly simple.
So far I have tried many variations of the regex pattern. Here is one of the most recent test samples:
<?php
$t = "
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
|
|
|Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|
";
preg_match_all("/^.*?(\\|).*?(\\|)/is",$t,$p);
print_r($p);
?>
It is also worth noting that inside the tabs, where the dashes and numbers are, you may also have any variation of letters, numbers and punctuation. The beginning of each line marks the tuning of each string with one of the following (case insensitive): a, a#, b, c, c#, d, d#, e, f, f#, g or g#.
Thanks in advance for any help with this difficult problem.
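To make that line shape concrete, here is a small sketch of a pattern for a single tab line. It assumes the content between the two pipes may be anything and that the line ends with a closing pipe; the constant TAB_LINE and the function parseTabLine are hypothetical names:

```php
<?php
// Assumed line shape: tuning label (a-g, optional #), "|", arbitrary content, closing "|".
const TAB_LINE = '/^([a-g]#?)\|(.*)\|\s*$/i';

function parseTabLine(string $line): ?array
{
    if (!preg_match(TAB_LINE, $line, $m)) {
        return null; // not a tab line (instrument header, blank line, ...)
    }
    return ['tuning' => $m[1], 'content' => $m[2]];
}

var_dump(parseTabLine('e|----3--h5--|'));  // tuning "e", content "----3--h5--"
var_dump(parseTabLine('F#|--0--2p0--|'));  // tuning "F#", content "--0--2p0--"
var_dump(parseTabLine('|Guitar 1'));       // NULL
```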
I really like this question :-P. I had fun figuring this one out.
Here's what I got:
<?php
$t = <<<EOD
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|

|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
|
|
|Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|
EOD;
GetTabs($t);

function GetTabs($tabString) {
    $tabs = array();
    $tabcount = 0;
    $instrumentcount = 0;
    $tabline = 0;
    //Split on "\n" or "\r\n" so no stray "\r" is left on the rows
    $tabStringArray = preg_split('/\r?\n/', $tabString);
    foreach ($tabStringArray as $tabStringRow) {
        if (preg_match('/^(?<tuning>[A-Ga-g]#?)\|(?<tabline>[0-9-]+)\|/', $tabStringRow)) {
            //Matches a tab line (tuning letter with optional #)
            //The tabline group can be expanded with characters for hammer on's, pull off's and whatnot
            $tabs[$tabcount][$instrumentcount-1][$tabline] = $tabStringRow;
            $tabline++;
            continue;
        }
        if (preg_match('/^\s*\|\s*$/', $tabStringRow)) {
            //Matches ' |' (leading whitespace optional)
            //Continuation of tab do nothing
            continue;
        }
        if (preg_match('/^\s*\|(?<instrument>[A-Za-z0-9\s]+)/', $tabStringRow, $matches)) {
            //Matches an instrument line ' |Guitar 1'
            $tabs[$tabcount][$instrumentcount]['instrumentname'] = trim($matches['instrument']);
            $instrumentcount++;
            $tabline = 0;
            continue;
        }
        if (preg_match('/^\s*$/', $tabStringRow)) {
            //Matches empty line
            //new tab
            $tabcount++;
            $instrumentcount = 0;
            continue;
        }
    }
    print_r($tabs);
}
?>
The function is commented, so it should not be hard to read.
This outputs:
Array
(
[0] => Array
(
[0] => Array
(
[instrumentname] => Guitar 1
[0] => e|--------------3-------------------3------------|
[1] => B|------------3---3---------------3---3----------|
[2] => G|----------0-------0-----------0-------0--------|
[3] => D|--------0-----------0-------0-----------0------|
[4] => A|------2---------------2---2---------------2----|
[5] => E|----3-------------------3-------------------3--|
)
)
[1] => Array
(
[0] => Array
(
[instrumentname] => Guitar 1
[0] => e|--------------3-------------------3------------|
[1] => B|------------3---3---------------3---3----------|
[2] => G|----------0-------0-----------0-------0--------|
[3] => D|--------0-----------0-------0-----------0------|
[4] => A|------2---------------2---2---------------2----|
[5] => E|----3-------------------3-------------------3--|
)
[1] => Array
(
[instrumentname] => Bass 1
[0] => G|----------0-------0-----------0-------0--------|
[1] => D|--------2-----------2-------2-----------2------|
[2] => A|------3---------------3---3---------------3----|
[3] => E|----3-------------------3-------------------3--|
)
)
)
<?php
$t = <<<EOD
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
|Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
|
|
|Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|
EOD;
$t = preg_replace('/\r\n?/', "\n", $t); //normalize line endings
$te = explode("\n", $t);
$out = array();
$cur_inst = "";
$trim = false;
$lastlines = array();
$i = 0;
foreach ($te as $line) {
    //instrument header such as "|Guitar 1" (leading whitespace optional)
    if (preg_match("/^\\s*\\|(\\w+ \\d+)\$/", $line, $matches)) {
        if ($matches[1] == $cur_inst) {
            //same instrument again: append its lines to the previous block
            $trim = true;
        }
        else {
            $out[$i++] = $line;
            $trim = false;
            $lastlines = array();
            $cur_inst = $matches[1];
        }
    }
    //blank line or a lone "|" connector; consecutive connectors collapse into one
    elseif (empty($line) || preg_match("/^\\s*\\|\$/", $line)) {
        if (!preg_match("/^\\s*\\|\$/", end($out)))
            $out[$i++] = $line;
    }
    //tab line: single-letter tuning, "|", then the rest of the line
    elseif (preg_match("/^([a-zA-Z])\\|(.*)\$/", $line, $matches)) {
        if ($trim) {
            if (array_key_exists($matches[1], $lastlines)) {
                $oldi = $lastlines[$matches[1]];
                $out[$oldi] = rtrim($out[$oldi], "|") . $matches[2];
            }
            else {
                die("unexpected line: $line");
            }
        }
        else {
            $lastlines[$matches[1]] = $i;
            $out[$i++] = $matches[0];
        }
    }
    else {
        die("unexpected line: $line");
    }
}
$t = implode(PHP_EOL, $out);
echo $t;
This gives:
|Guitar 1
e|--------------3-------------------3--------------------------3-------------------3------------|
B|------------3---3---------------3---3----------------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0------------------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0--------------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----------2---------------2---2---------------2----|
E|----3-------------------3-------------------3------3-------------------3-------------------3--|
|
|Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|
If you prefer, you can iterate over the $out array.
I'm not entirely sure what exactly you mean, but if you want to separate tabs by instrument, try this:
^[^|\r\n]+\|([^|\r\n]+)$\r?\n # match the line that contains the instrument name
# and capture this in backreference 1
( # capture the block of lines that follows
(?: # repeat this for each line
^[^|\r\n]+ # everything up to the first |
\| # |
[^|\r\n]+ # everything up to the next |
\| # |
\r?\n # newline
)+ # at least once
) # end capture
In PHP:
preg_match_all('/^[^|\r\n]+\|([^|\r\n]+)$\r?\n((?:^[^|\r\n]+\|[^|\r\n]+\|\r?\n)+)/im', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
Each match will be of the form
|Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|
and everything else between those blocks will be ignored.
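As a rough usage sketch, the matches can be turned into one array per instrument block. The sample input is an assumption: as written, the pattern needs at least one character (here a space) before the pipe on the instrument line, and the last tab line must end with a newline. PREG_SET_ORDER is used so that each match arrives as its own array:

```php
<?php
$pattern = '/^[^|\r\n]+\|([^|\r\n]+)$\r?\n((?:^[^|\r\n]+\|[^|\r\n]+\|\r?\n)+)/im';

// Assumed sample input: instrument lines carry a leading space,
// and the text ends with a newline, as the pattern requires.
$subject = " |Guitar 1\ne|--3--|\nB|--0--|\n |\n |Bass 1\nG|--0--|\nD|--2--|\n";

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$blocks = [];
foreach ($matches as $m) {
    $blocks[] = [
        'instrument' => trim($m[1]),                 // backreference 1
        'lines'      => explode("\n", rtrim($m[2])), // backreference 2, one entry per string
    ];
}
print_r($blocks);
```

The lone " |" connector lines fall between matches and are skipped, so linked instruments would still need to be grouped in a separate pass.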
The ^ in your regex will prevent the /s switch from doing what you want.
Also, preg_match_all is going to return a lot of duplicate "matches" because you are using ( ) grouping. If you plan to use preg_match_all() on a file with multiple tabs, isolating real matches might be difficult with those duplicates.
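Both points can be checked on a reduced sample (the three-line string below is made up for the demonstration). Without the m modifier, ^ only matches at the very start of the subject, so the pattern can match at most once; each pair of parentheses also adds another array to the result. The second pattern is just one possible alternative, not the only way to write it:

```php
<?php
$t = "e|--3--|\nB|--0--|\nG|--2--|\n";

// Pattern from the question: ^ without the m flag anchors only at the
// start of the subject, so only the first line can ever match.
preg_match_all('/^.*?(\|).*?(\|)/is', $t, $withoutM);

// With the m flag, ^ matches at the start of every line. There are no
// capturing groups, so the result holds only the full matches in index 0.
preg_match_all('/^[a-g]#?\|[^|\r\n]*\|/im', $t, $withM);

echo count($withoutM[0]), "\n"; // 1 (only the first line matched)
echo count($withoutM), "\n";    // 3 (full match plus two captured pipes)
echo count($withM[0]), "\n";    // 3 (one match per line)
echo count($withM), "\n";       // 1 (full matches only)
```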