Matching the best regex
In following table, how can I match 'FG2-4G4T5
' so that the first returned row is the best match (which would be 'FG2-4G
')?
part_match
^FG2|^FF2|^FF3-S|^FF4-GR
^FG2-4G
^FG2
FG2-RGST
The following query doesn't work in this case because '^FG2|^FF2|^FF3-S|^FF4-GR
' has greater l开发者_开发知识库ength:
SELECT * FROM parts WHERE 'FG2-4G4T5' REGEXP part_match ORDER BY length(part_match) DEC
Any help will be greatly appreciated. Thanks in advance.
Sometimes the K.I.S.S. solution is best. Keep another column that lists the actual length of text matched by the pattern?
part_match part_match_length
======================================
^FG2|^FF2|^FF3|^FF4 3
^FG2-4G 6
^FG2 3
and thus...
SELECT * FROM parts
WHERE 'FG2-4G4T5' REGEXP part_match
ORDER BY part_match_length DEC
Technically, the second column doesn't even have to necessarily be length - just some kind of indication of how good a match a given pattern provides.
Assuming your part_match is ALWAYS of the form ^xxxx and optionally EQUAL-SIZED ^yyyy|^zzzz|...
SELECT * FROM parts
WHERE 'FG2-4G4T5' REGEXP part_match
ORDER BY instr(concat(part_match,'|'),'|') DEC
This will give you the longest matching prefix.
To solve the inclusion of FULL matches, take one off the length of part matches, so,
SELECT * FROM parts
WHERE 'FG2-4G4T5' REGEXP part_match
ORDER BY CASE WHEN part_match LIKE '^%' THEN -1 else 0 end +
instr(concat(part_match,'|'),'|') DEC
To solve REGEX involving multiple sections of unequal length, e.g. ^YYYY|^Z|ABC
, you will first need to find a function that breaks that into multiple rows
part_match | single_part
^YYYY|^Z|ABC ^YYYY
^YYYY|^Z|ABC ^Z
^YYYY|^Z|ABC ABC
Then REGEXP by single_part referencing back the part_match in the answer. This is non-trivial but can be achieved with a Numbers table and judicious use of SUBSTR and INSTR.
Create a numbers table (run this once):
DROP PROCEDURE IF EXISTS CreateNumbersTable;
delimiter //
CREATE PROCEDURE CreateNumbersTable()
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ''
BEGIN
drop table if exists Numbers;
create table Numbers (N int primary key);
SET @x := 0;
REPEAT
insert into Numbers values (@x);
SET @x := @x + 1;
UNTIL @x > 999 END REPEAT;
END//
delimiter ;
CALL CreateNumbersTable;
DROP PROCEDURE CreateNumbersTable;
Then you can use this query
select p.*
# ,substr(p.part_match, N.N+1, locate('|', concat(p.part_match,'|'), N.N+2) -N.N -1)
# ,length(substr(p.part_match, N.N+1, locate('|', concat(p.part_match,'|'), N.N+2) -N.N -1))
from parts p
inner join numbers N on N.N between 0 and length(p.part_match)
and ((N.N = 0) or (substr(p.part_match, N.N, 1) = '|'))
WHERE 'FG2-RGST' REGEXP p.part_match
and 'FG2-RGST' REGEXP substr(p.part_match, N.N+1, locate('|', concat(p.part_match,'|'), N.N+2) -N.N -1)
order by length(substr(p.part_match, N.N+1, locate('|', concat(p.part_match,'|'), N.N+2) -N.N -1)) DESC
Uncomment lines 2 and 3 to see the parts of part_match it matches on.
精彩评论