How can I remove duplicate words in a string with Python?
For example:
string1 = "calvin klein design dress calvin klein"
How can I remove the second occurrences of "calvin" and "klein"?
The result should look like
string2 = "calvin klein design dress"
Only the later duplicates should be removed, and the order of the words should not be changed!
string1 = "calvin klein design dress calvin klein"
words = string1.split()
print(" ".join(sorted(set(words), key=words.index)))
This sorts the set of all the (unique) words in your string by the word's index in the original list of words.
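To see why this works: set(words) collects the unique words in arbitrary order, and sorted(..., key=words.index) puts them back in order of first occurrence. A quick check:

```python
string1 = "calvin klein design dress calvin klein"
words = string1.split()

unique = set(words)                        # unique words, arbitrary order
ordered = sorted(unique, key=words.index)  # reorder by first occurrence
print(" ".join(ordered))  # calvin klein design dress
```

Note that words.index rescans the list for every word, so this is quadratic in the number of words; for long strings a single-pass set-based approach is faster.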
def unique_list(l):
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist

a = "calvin klein design dress calvin klein"
a = ' '.join(unique_list(a.split()))
In Python 2.7+, you could use collections.OrderedDict
for this:
from collections import OrderedDict
s = "calvin klein design dress calvin klein"
print ' '.join(OrderedDict((w,w) for w in s.split()).keys())
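For reference, the same OrderedDict idea in Python 3 syntax (only the print changes):

```python
from collections import OrderedDict

s = "calvin klein design dress calvin klein"
# OrderedDict keeps keys in insertion order, so first occurrences win
print(' '.join(OrderedDict((w, w) for w in s.split()).keys()))
# calvin klein design dress
```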
Cut and paste from the itertools recipes
from itertools import ifilterfalse  # Python 2; the Python 3 recipe uses itertools.filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
I really wish they would go ahead and make a module out of those recipes. I'd very much like to be able to do from itertools_recipes import unique_everseen
instead of cutting and pasting every time I need something.
Use like this:
def unique_words(string, ignore_case=False):
    key = None
    if ignore_case:
        key = str.lower
    return " ".join(unique_everseen(string.split(), key=key))
string2 = unique_words(string1)
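For Python 3, here is a self-contained sketch of the same recipe (ifilterfalse was renamed itertools.filterfalse in Python 3), including the ignore_case path:

```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element

def unique_words(string, ignore_case=False):
    key = str.lower if ignore_case else None
    return " ".join(unique_everseen(string.split(), key=key))

print(unique_words("Calvin klein CALVIN Klein design", ignore_case=True))
# Calvin klein design
```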
string2 = ' '.join(set(string1.split()))
Explanation:
.split()
- a string method that splits a string into a list (with no arguments it splits on whitespace)
set()
- an unordered collection type that excludes duplicates
'separator'.join(list)
- joins the elements of the list into a string, with 'separator' between elements
Note that because a set is unordered, this does not preserve the original word order.
string = 'calvin klein design dress calvin klein'

def uniquify(string):
    output = []
    seen = set()
    for word in string.split():
        if word not in seen:
            output.append(word)
            seen.add(word)
    return ' '.join(output)

print(uniquify(string))
You can use a set to keep track of already processed words.
words = set()
result = ''
for word in string1.split():
    if word not in words:
        result = result + word + ' '
        words.add(word)
print(result)
Several answers are pretty close to this but haven't quite ended up where I did:
def uniques(your_string):
    seen = set()
    return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)
Of course, if you want it a tiny bit cleaner or faster, we can refactor a bit:
def uniques(your_string):
    words = your_string.split()
    seen = set()
    seen_add = seen.add
    def add(x):
        seen_add(x)
        return x
    return ' '.join(add(i) for i in words if i not in seen)
I think the second version is about as performant as you can get in a small amount of code. (More code could be used to do all the work in a single scan across the input string but for most workloads, this should be sufficient.)
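A quick sanity check plus micro-benchmark for the two variants (a sketch using timeit; absolute numbers will vary by machine):

```python
import timeit

def uniques_genexp(s):
    # version 1: seen.add(i) returns None, so `or i` evaluates to the word
    seen = set()
    return ' '.join(seen.add(i) or i for i in s.split() if i not in seen)

def uniques_helper(s):
    # version 2: local alias plus helper function to cut attribute lookups
    words = s.split()
    seen = set()
    seen_add = seen.add
    def add(x):
        seen_add(x)
        return x
    return ' '.join(add(i) for i in words if i not in seen)

text = "calvin klein design dress calvin klein " * 1000
assert uniques_genexp(text) == uniques_helper(text) == "calvin klein design dress"

for fn in (uniques_genexp, uniques_helper):
    print(fn.__name__, timeit.timeit(lambda: fn(text), number=100))
```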
Question: Remove the duplicates in a string
from collections import OrderedDict

a = "Gina Gini Gini Protijayi"
aa = OrderedDict.fromkeys(a.split())
print(' '.join(aa))
# output => Gina Gini Protijayi
You can use the numpy function np.unique; it is conventional to import numpy with an alias:
import numpy as np
To remove duplicates from an array:
no_duplicates_array = np.unique(your_array)
For your case, if you want the result as a string:
no_duplicates_string = ' '.join(np.unique(your_string.split()))
Note that np.unique returns the unique values sorted, so the original word order is not preserved.
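Be aware that np.unique returns its result sorted, so this does not preserve the original word order; a quick sketch:

```python
import numpy as np

s = "calvin klein design dress calvin klein"
# np.unique sorts its output, losing the original word order
print(' '.join(np.unique(s.split())))
# calvin design dress klein  <- alphabetical, not original order
```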
Answers 1 and 2 work perfectly:
s = "the sky is blue very blue"
s = s.lower()
slist = s.split()
print(" ".join(sorted(set(slist), key=slist.index)))
You can remove duplicate or repeated words from a text file or string using the following code (all_words, lemmatize_sentence, and new_data are assumed to be defined elsewhere):

import nltk
from nltk.tokenize import word_tokenize
from collections import Counter

for lines in all_words:
    line = ''.join(lines.lower())
    new_data1 = ' '.join(lemmatize_sentence(line))
    new_data2 = word_tokenize(new_data1)
    new_data3 = nltk.pos_tag(new_data2)
    # the code below removes the repeated words
    for i in range(0, len(new_data3)):
        new_data3[i] = "".join(new_data3[i])
    UniqW = Counter(new_data3)
    new_data5 = " ".join(UniqW.keys())
    print(new_data5)
    new_data.append(new_data5)

print(new_data)

P.S. - Adjust the indentation as required. Hope this helps!
Without using the split function (will help in interviews)
def unique_words2(a):
    words = []
    spaces = ' '
    length = len(a)
    i = 0
    while i < length:
        if a[i] not in spaces:
            word_start = i
            while i < length and a[i] not in spaces:
                i += 1
            words.append(a[word_start:i])
        i += 1
    words_stack = []
    for val in words:
        if val not in words_stack:  # these three lines can be replaced by: [words_stack.append(val) for val in words if val not in words_stack]
            words_stack.append(val)
    print(' '.join(words_stack))  # or return, your choice

unique_words2('calvin klein design dress calvin klein')
Initializing the list:
listA = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
print("Given list : ", listA)
Using set() and split():
res = [set(sub.split('-')) for sub in listA]
Result:
print("List after duplicate removal :", res)
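Note that this produces a list of sets, one per item, so it deduplicates only within each hyphen-separated element rather than across the whole list:

```python
listA = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
res = [set(sub.split('-')) for sub in listA]
print(res)  # [{'xy'}, {'pq', 'qr'}, {'xp'}, {'dd', 'ee'}] (set order may vary)
```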
To remove duplicate words from a sentence while preserving their order, you can use the dict.fromkeys method (plain dicts preserve insertion order in Python 3.7+).
string1 = "calvin klein design dress calvin klein"
words = string1.split()
result = " ".join(list(dict.fromkeys(words)))
print(result)
You can do that by building the set of the string's words, which is a mathematical object containing no repeated elements by definition, and then restoring the original order by sorting on each word's first index before joining the words back into a string:
def remove_duplicate_words(string):
    x = string.split()
    x = sorted(set(x), key=x.index)
    return ' '.join(x)