Return prefix of string using regular expression where stripped string sometimes contains '/'
I'm trying to return a prefix of a string. My related question is here, but I've run into a new problem:
How to return the string prefix from regexp
Basically, I have strings like
23430-BL 23430BZ 23430BK/BL
The extensions I'm trying to remove are
strip_ext = BK/BL|BZ|BL
The regular expression I'm using to get the string without the extension is
prefix = sample_data[/(.*[^-])-?(?:#{strip_ext})/,1]
This is returning
23430 23430 23430-BK
In theory, I understand that the regexp finds the BL match, and for some reason selects that as the match over the BK/BL. But is there a way to get the regexp to find BK/BL rather than BL?
Unfortunately, there isn't always a dash before the part that I am looking to strip.
I added the original strip_ext list as an example, and thought it would make it easy to understand. An actual strip_ext list looks like this and changes based on the sample data provided, so unfortunately it isn't as easy as Mu's answer below.
Make the first quantifier ungreedy.
See it here on Regexr
The ? causes the .*? to match as little as possible, so the capture group stops at the first position where one of the extensions can match.
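The pattern itself isn't reproduced in this answer, so here is a minimal sketch of what an ungreedy first quantifier could look like, assuming the asker's pattern with only `.*` changed to `.*?`. The `samples` and `prefixes` names are just for illustration; `strip_ext` is the list from the question.

```ruby
# Suffix list and sample strings as given in the question.
strip_ext = 'BK/BL|BZ|BL'
samples   = %w[23430-BL 23430BZ 23430BK/BL]

# Assumed pattern: same as the question's, but with a lazy .*? so the
# capture ends at the earliest point where one of the suffixes can match.
pattern = /(.*?[^-])-?(?:#{strip_ext})/

prefixes = samples.map { |s| s[pattern, 1] }
p prefixes
```

Because this pattern is not anchored, it would also cut at a suffix that shows up in the middle of a string. If that can happen in your data, adding `\z` after the group should restrict the match to the end of the string.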
You could mix a negative look-behind into your BL alternative. Adding (?<!BK\/) indicates that you want to match BL except when it is preceded by BK/.
A quick test:
>> %w{23430-BL 23430GR 23430BK/BL}.map { |s| s[/(.*[^-])-?(?:BK\/BL|BZ|(?<!BK\/)BL)/,1] }
=> ["23430", nil, "23430"]
Your sample output doesn't match your input, though. Is "GR" a typo in your inputs, or is "BZ" a typo in your regex?
Given that your patterns are not fixed, you could bypass regular expressions completely and fall back on simple string wrangling. Here's a better example of what I mentioned in my comment:
require 'set'
# The suffix list that you get from somewhere.
suffixes = [ 'BK/BL', 'BZ', 'BL' ]
# We want to do a couple things at once here. For each suffix, we
# want both the suffix and the suffix with a leading '-' attached,
# the `map` and `flatten` stuff does that. Then we group them by
# length to get a hash like:
# { 2 => ['BZ','BL'], 3 => ['-BZ', '-BL'], 5 => ['BK/BL'], ... }
by_length = suffixes.map { |suffix| [suffix, '-' + suffix] }.flatten.group_by(&:length)
# Now we reorganize our suffixes into sets with the set of longest
# suffixes first and the set of shortest suffixes last. The result
# will be:
# [#<Set: {"-BK/BL"}>, #<Set: {"BK/BL"}>, #<Set: {"-BZ", "-BL"}>, #<Set: {"BZ", "BL"}>]
sets = by_length.keys.sort { |a,b| b <=> a }.map { |k| Set.new(by_length[k]) }
# Then we can just spin through sets, pull off the suffix of the
# appropriate length from the string, and see if it is in our set.
# If it is then chop the suffix off the string, do whatever is to be
# done with chopped string, and break out for the next string.
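%w{ 23430-BL 23430BZ 23430BK/BL }.each do |string|
  sets.each do |suffixes|
    len = suffixes.first.length
    sfx = string[string.length - len, len]
    # Longest suffixes are checked first, so 'BK/BL' wins over 'BL'.
    if suffixes.include?(sfx)
      puts string[0 .. -(len + 1)]
      break
    end
  end
end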
That's just an "off the top of my head" illustration of the algorithm.
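If you want the same longest-suffix-first idea as a reusable method, something like the following might work. It is only a sketch: `strip_suffix` is a hypothetical helper name, it swaps the sets for a sorted list checked with `String#end_with?`, and it assumes a string with no known suffix should come back unchanged.

```ruby
# Hypothetical helper: remove the longest matching suffix, with or
# without a leading '-'. Returns the string unchanged if nothing matches.
def strip_suffix(string, suffixes)
  # Build both variants of every suffix and try the longest ones first,
  # so 'BK/BL' is checked before 'BL'.
  candidates = suffixes.flat_map { |s| ["-#{s}", s] }.sort_by { |s| -s.length }
  match = candidates.find { |s| string.end_with?(s) }
  match ? string[0...-match.length] : string
end

suffixes = ['BK/BL', 'BZ', 'BL']
results = %w[23430-BL 23430BZ 23430BK/BL 23430GR].map { |s| strip_suffix(s, suffixes) }
p results
```

Sorting on every call is wasteful for a large list; in that case you would probably build the candidate list once and pass it in.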