Finding the longest border of a string
First, let me tell you what the border of a string is,
let x = "abacab"
let y = "ababab"
The border of a string is a substrin开发者_开发问答g which is both a proper prefix and proper suffix of the string — "proper" meaning that the whole string does not count as a substring. The longest border of x
is "ab". The longest border of y
is "abab" (the prefix and suffix can overlap).
Another example:
In string "abcde hgrab abcde", then "abcde" is a prefix as well as suffix. Thus it is also the longest border of the string above.How can I find the longest border of a string?
Finding the "border of a string" is what the prefix function (also known as failure function) of Knuth-Morris-Pratt algorithm do. Implementation in c++ (a bit changed version of this code):
int longestBorder(const string& s) {
int len = s.length();
vector<int> prefixFunc(len);
prefixFunc[0] = 0;
int curBorderLen = 0;
for (int i = 1; i < len; ++i) {
while (curBorderLen > 0 && s[curBorderLen] != s[i])
curBorderLen = prefixFunc[curBorderLen - 1];
if (s[curBorderLen] == s[i])
++curBorderLen;
prefixFunc[i] = curBorderLen;
}
return prefixFunc[len-1];
}
Runnable version: http://ideone.com/hTW8FL
The complexity of this algorithm is O(n)
.
Here's a Java implementation, based on the assumption that borders are proper substrings. (Otherwise the longest border is simply the string length.)
public static int findLongestBorder(String s) {
int len = s.length();
for (int i = len - 1; i > 0; i--) {
String prefix = s.substring(0, i);
String suffix = s.substring(len - i, len);
if (prefix.equals(suffix)) {
return i;
}
}
return 0;
}
This could be optimized a bit by starting with the string's character array and then comparing individual characters, but the idea behind the algorithm is clearer the way I wrote it.
Here is a JS solution with commentary that uses the prefix function that DAIe mentioned:
function getPrefixBorders(string) {
// This will contain the border length for each
// prefix in ascending order by prefix length.
var borderLengthByPrefix = [0];
// This is the length of the border on the current prefix.
var curBorderLength = 0;
// Loop from the 2nd character to the last.
for (var i = 1; i < string.length; i++) {
// As long as a border exists but the character
// after it doesn't match the current character,
while (curBorderLength > 0 && string[curBorderLength] !== string[i])
// set the border length to the length of the current border's border.
curBorderLength = borderLengthByPrefix[curBorderLength - 1];
// If the characters do match,
if (string[curBorderLength] === string[i])
// the new border is 1 character longer.
curBorderLength++;
// Note the border length of the current prefix.
borderLengthByPrefix[i] = curBorderLength;
}
return borderLengthByPrefix;
}
It returns the longest border lengths of every prefix in a string (which is a lot more than asked for, but it does so in linear time). So to get the length of the longest border in the full string:
var string = "ababab";
var borderLengthsByPrefix = getPrefixBorders(); // [0,0,1,2,3,4]
var stringBorderLength = borderLengthsByPrefix[borderLengthsByPrefix.length - 1];
Another great resource for understanding how this works is this video (and the one before it) on Coursera.
To get the length of the longest border, do this:
def get_border_size(astr):
border = 0
for i in range(len(astr)):
if astr[:i] == astr[-i:]:
border = i
return border
To get the longest border itself, this:
def get_border(astr):
border = 0
for i in range(len(astr)):
if astr[:i] == astr[-i:]:
border = astr[:i]
return border
I've made a solution using Python3
(works also with Python2
), using Counter
from collections
module and max()
.
Here is my solution:
from collections import Counter
def get_seq(a):
data = []
for k in range(1, len(a)):
data.append(a[:k])
data.append(a[k:])
return Counter(data)
def get_max_sublist(a):
bb = [k for k in a.items() if k[1] > 1]
try:
k, j = max(bb, key= lambda x: len(x[0]))
n, _ = max(a.items(), key= lambda x: x[1])
except ValueError:
return None
else:
return k if j > 1 else n
seq = ["abacab", "ababab", "abxyab", "abxyba", "abxyzf", "bacab"]
for k in seq:
j = get_seq(k)
print("longest border of {} is: {}".format(k, get_max_sublist(j)))
Output:
longest border of abacab is: ab
longest border of ababab is: abab
longest border of abxyab is: ab
longest border of abxyba is: a
longest border of abxyzf is: None
longest border of bacab is: b
This simple solution with a single loop works just fine:
function findLongestBorder($s){
$a = 0;
$b = 1;
$n = strlen($s);
while($b<$n){
if($s[$a]==$s[$b]){
$a++;
}else{
$b-= $a;
$a = 0;
}
$b++;
}
return substr($s,0,$a);
}
Example:
echo findLongestBorder("abacab")."\n";
echo findLongestBorder("ababab")."\n";
echo findLongestBorder("abcde hgrab abcde")."\n";
echo findLongestBorder("bacab")."\n";
echo findLongestBorder("abacababac")."\n";
Output:
ab
abab
abcde
b
abac
See https://eval.in/812640
I've been using a lot of javascript lately so I did it with Javascript:
function findBorder() {
var givenString = document.getElementById("string").value;
var length = givenString.length;
var y = length;
var answer;
var subS1;
var subS2;
for (var x = 0; x < length; x++ ){
subS1 = givenString.substring(0, x);
subS2 = givenString.substring(y);
if(subS2 === subS1){
answer = subS1;
}
y--;
}
document.getElementById("answer").innerHTML = answer.toString();
}
<h1>put the string in here</h1>
<input type="text" id="string" />
<button id="goButton" onclick="findBorder()">GO</button>
<h3 id="answer"></h3>
Here is the code Implementation in c to find the longest border of the string
#include<stdio.h>
#include<string.h>
#include <stdlib.h>
int main()
{
char str[]="abcdefabcanabcabccabcdef";
int n,i,j,l,k,count,max=0;
l=strlen(str);
for(i=0;i<(l/2);i++)
{ j=l-1-i;
k=0;
count=0;
while(k<=i&&j<l)
{
if(str[k]==str[j])
count++;
k++;
j++;
}
if(count==(k))
{
if(isasubstring(str,k,l-(2*(k))))
if(max<count)
max=count;
}
}
return 0;
}
FUNCTION: isasubstring that finds the maximum width and the pattern of the border from the string.
isasubstring(char *a,int s,int n)
{
int i,j;
char *temp;
char *pattern=malloc(sizeof(char)*(s+1));
char *input =malloc(sizeof(char)*(n+1));
memcpy(pattern,a,s);
pattern[s]='\0';
j=0;
for(i=s;i<=s+n-1;i++)
{
input[j]=a[i];
j++;
}
input[j]='\0';
printf("The string between the border :%s\n The longest border is: %s\n",input,pattern);
temp=strstr(input,pattern);
if(temp)
return 1;
else
return 0;
}
The output of the program as below: //When the input is abcdefabcanabcabccabcdef
The string between the border :abcanabcabcc
The longest border is: abcdef
Implementation in Perl,Using the regular expression match
use strict;
use warnings;
while(<STDIN>)
{
if ( /^([a-zA-z]+).*\1$/)
{
print "Longest Border : $1\n";
}
else
{
print "No border in the pattern as suffix and prefix\n";
}
}
This program gets the standard input as string and find for the pattern.
^ - beginning of the line
$ - end of the line
([a-zA-z]+) - Grouping the pattern which holds in $1 or \1 variable
.* - Match any characters in between the borders.
If you are talking about character arrays, I think you want the following. This is based on the border being the first and last character of a string. Your examples aren't clear as to what a border is. You need to more clearly define what a border is.
x = abcde
border = { x[0], x[length(x)-1) }
and if you need length
length(z) {
return sizeof(z) / (sizeof(z[0])
精彩评论