Remove almost duplicate values from an array in PHP
I have an array where the values are duplicated, but not exactly. In the example below, somestring stands for a value like 'abcd-abcd-123' and someOTHERstring223 for a value like 'abcsd--adsf_12ds':
Array
(
[0] => somestring
[1] => somestring-(don't know the delimiter)core
[2] => somestring_(don't know the delimiter)-(don't know the delimiter)somethingelse
[3] => someOTHERstring223
[4] => someOTHERstring223_junkstring
[5] => someOTHERstring223OTHERSTRING-somethingNEW
)
The result I want is:
somestring
someOTHERstring223
I only want the shortest values, because somestring, somestring-(don't know the delimiter)core and somestring_(don't know the delimiter)-(don't know the delimiter)somethingelse count as the same value: they all start with somestring.
Sorry, everybody, I didn't ask the correct question the first time.
I came up with the following answer, but I don't know if it is the most efficient:
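$coLL = array('somestring', "somestring-(don't know the delimiter)core", "somestring_(don't know the delimiter)-(don't know the delimiter)somethingelse",
    "someOTHERstring223", 'someOTHERstring223_junkstring', 'someOTHERstring223OTHERSTRING-somethingNEW');
$coLL2 = $coLL; // working copy; matching values are removed from here
foreach ($coLL as $coLLK => $coLLV) {
    $flength = strlen($coLLV);
    foreach ($coLL2 as $coLL2K => $coLL2V) {
        // strcmp() < 0 means $coLLV sorts before $coLL2V alphabetically;
        // it does not check that $coLL2V starts with $coLLV.
        if (strcmp($coLLV, $coLL2V) < 0) {
            // Only drop values that are more than 3 characters longer.
            if (strlen($coLL2V) - $flength > 3) {
                unset($coLL2[$coLL2K]);
            }
        }
    }
}
print_r($coLL2); // the remaining (shortest) values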
I added the limiter if(strlen($coLL2V)-$flength > 3) because values like somestring1, somestring12 or somestring123 may come up; they are unique and should not be treated as matches of somestring.
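For comparison, here is a minimal sketch of the same length rule written as an explicit prefix check. The loop above uses strcmp(), which compares alphabetical order rather than testing whether one string starts with another. The helpers isVariantOf() and removeVariants() are hypothetical names, the threshold of 3 mirrors the > 3 limiter, and the scalar type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helper: $long counts as a variant of $short only if it starts
// with $short AND is more than $minExtra characters longer (the "> 3" rule).
function isVariantOf(string $short, string $long, int $minExtra = 3): bool
{
    return strncmp($long, $short, strlen($short)) === 0
        && strlen($long) - strlen($short) > $minExtra;
}

// Keep only the values that are not a variant of some other value in the list.
function removeVariants(array $values, int $minExtra = 3): array
{
    $result = array();
    foreach ($values as $candidate) {
        foreach ($values as $other) {
            if ($other !== $candidate && isVariantOf($other, $candidate, $minExtra)) {
                continue 2; // $candidate is a longer variant of $other: drop it
            }
        }
        $result[] = $candidate;
    }
    return $result;
}

$coLL = array('somestring', 'somestring-core', 'somestring_x-somethingelse',
              'somestring1', 'someOTHERstring223', 'someOTHERstring223_junkstring');
print_r(removeVariants($coLL));
```

With this sample, somestring1 survives because it is only one character longer than somestring.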
Thanks everybody for your answers.
This should do it:
<?php
$array = array('apple','apple-core','apple-core-something','orange','orange-core','orange-core-someting');
$result = array();
foreach ($array as $entry) {
$entry = explode('-',$entry);
if (!in_array($entry[0],$result)) {
$result[] = $entry[0];
}
}
print_r($result);
?>
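The explode() approach above only splits on -, so a value like somestring_(don't know the delimiter) from the question would keep its _ suffix. A minimal variation, assuming the only delimiters are - and _, splits on both with preg_split() and a character class:

```php
<?php
// Variation on the explode() answer: split on either '-' or '_' (an assumption
// about which delimiters occur) and keep each distinct first segment once.
$array = array('somestring', 'somestring-core', 'somestring_x-somethingelse',
               'someOTHERstring223', 'someOTHERstring223_junkstring');
$result = array();
foreach ($array as $entry) {
    $parts = preg_split('/[-_]/', $entry); // at least one element for a string subject
    $first = $parts[0];
    if (!in_array($first, $result, true)) { // strict comparison avoids numeric-string surprises
        $result[] = $first;
    }
}
print_r($result);
```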
The other answers all assume that - or some other token delimits your shortest string. To do what you want without any delimiters, you could use something like this code:
$yourArray = Array(
0 => "apple",
1 => "apple-core",
2 => "apple-core-something",
3 => "orange",
4 => "orange-dot",
5 => "orange-dot-something",
) ;
$resultArray = Array() ;
foreach($yourArray as $test) {
if(strlen($test)==0) continue(1) ; // Drop 0 length items.
foreach($resultArray as $rkey => $rval) {
if(strpos($test, $rval)===0) { // If $test starts with $rval
continue(2) ; // Continue outer foreach
} elseif(strpos($rval, $test)===0) { // If $rval starts with $test
unset($resultArray[$rkey]) ; // No longer shortest unique
continue(1) ; // Continue inner foreach (and add $test)
}
}
$resultArray[] = $test ;
}
var_dump($resultArray) ;
// array(2) {
// [0]=>
// string(5) "apple"
// [1]=>
// string(6) "orange"
// }
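A possible alternative to the nested loop above, sketched under the same no-delimiter assumption: sort the values by length first, so that any value that could be a prefix of another is already in the result when the longer value is examined. This avoids the unset() bookkeeping. shortestPrefixes() is a hypothetical name:

```php
<?php
// Sort by length so every possible prefix of a value is kept before the value
// itself is examined; then keep a value only if no kept value is its prefix.
function shortestPrefixes(array $values): array
{
    $values = array_values(array_filter($values, 'strlen')); // drop empty strings
    usort($values, function ($a, $b) { return strlen($a) - strlen($b); });
    $kept = array();
    foreach ($values as $value) {
        foreach ($kept as $prefix) {
            if (strncmp($value, $prefix, strlen($prefix)) === 0) {
                continue 2; // $value starts with an already kept value
            }
        }
        $kept[] = $value;
    }
    return $kept;
}

print_r(shortestPrefixes(array('apple', 'apple-core', 'apple-core-something',
                               'orange', 'orange-dot', 'orange-dot-something')));
```

The result comes back ordered by length rather than in the original order, and exact duplicates are collapsed as well.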
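$store = array();
foreach ($data as $fruit) {
    $parts = explode('-', $fruit); // array_shift() needs a variable, not a function result
    $store[] = $parts[0];          // keep the part before the first '-'
}
$store = array_values(array_unique($store)); // drop repeated prefixes
print_r($store);
Here $data is the array you posted above.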
To solve your problem, divide it into two steps:
- Normalize each value so that it only contains the part you want to check for exact duplicates (see the strtok documentation).
- Remove the duplicates from the array (see the array_unique documentation).
Demo:
function normalize($v)
{
return strtok($v, '-_');
}
$normalized = array_map('normalize', $data);
$unique = array_unique($normalized);
Result:
array(3) {
[0]=>
string(10) "somestring"
[3]=>
string(18) "someOTHERstring223"
[5]=>
string(29) "someOTHERstring223OTHERSTRING"
}
In effect you build a hash (a comparison key) for each entry in the list. The hash represents the value the original is compared by. Then you make the hashes unique (and in the end you only want the hashes).
What you need is a hash function that fits your needs. In the example above, the hash function is normalize.
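To make the hash idea concrete, the sketch below groups each original value under its hash, using the same strtok()-based normalize(). The sample $data is a shortened stand-in for the question's array. One caveat: PHP converts decimal integer-string array keys such as "123" to integers, so array_keys() could return ints for purely numeric hashes:

```php
<?php
// Map every value to its comparison key (the "hash") and group the originals
// by that key. normalize() mirrors the strtok() call from the answer.
function normalize($v)
{
    return strtok($v, '-_');
}

$data = array('somestring', 'somestring-core', 'somestring_x-somethingelse',
              'someOTHERstring223', 'someOTHERstring223_junkstring');
$groups = array();
foreach ($data as $value) {
    $groups[normalize($value)][] = $value; // same key => treated as duplicates
}
print_r(array_keys($groups)); // the unique keys are the result
print_r($groups);             // shows which originals collapsed into each key
```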
If the outcome does not suit your needs, you need to adapt the hash function. I chose strtok because it seemed suitable for your original case. However, if finding the delimiter gets more complicated, you might use a regular expression to specify the delimiter, for example with preg_split or preg_replace.
However, to make use of a regular expression you must know what your delimiter is, because basically the strategy is to trim each string down to build the hash. Without a well-specified delimiter, all that is left is trial and error.
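A minimal sketch of the regular-expression variant, assuming the delimiters are - and _ (adjust the character class if yours differ). normalizeRegex() is a hypothetical name; it cuts off everything from the first delimiter onward with preg_replace():

```php
<?php
// Regex-based hash function: remove the first '-' or '_' and everything after it.
// The character class [-_] is an assumption about which delimiters occur.
function normalizeRegex(string $v): string
{
    return preg_replace('/[-_].*$/s', '', $v);
}

$data = array('somestring', 'somestring-core', 'somestring_x-somethingelse',
              'someOTHERstring223', 'someOTHERstring223OTHERSTRING-somethingNEW');
$unique = array_values(array_unique(array_map('normalizeRegex', $data)));
print_r($unique);
```

One difference from strtok(): strtok() skips leading delimiters, so '-abc' gives 'abc', while this regex gives an empty string for it.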
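foreach ($a as $k => $v) {
    // Compare $v only with the elements before it in the array.
    foreach ($a as $k2 => $v2) {
        if ($k2 === $k)
            break;
        // $v is a prefix of an earlier element: drop that earlier element and
        // keep scanning, since $v may be a prefix of several earlier elements.
        if ($v === substr($v2, 0, strlen($v))) {
            unset($a[$k2]);
            continue;
        }
        // An earlier element is a prefix of $v: drop $v itself.
        if ($v2 === substr($v, 0, strlen($v2))) {
            unset($a[$k]);
            break;
        }
    }
}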
Note: my solution only drops the elements for which another element in the array is an exact prefix. Your updated question has no general solution, since you have to know what the delimiters are.