Unique sorting in Python

2023-03-17 01:19

I am quite new to Python. I want to get the unique strings from file.txt. I have some data like this:

Tempranillo     Rioja_%28wine%29%23Wine_regions
Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29
Marsanne        California_%28wine%29
Carm%C3%A9n%C3%A8re     Wines_of_Chile
Carm%C3%A9n%C3%A8re     Washington_%28U.S._state%29
Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29

So, I have tried with the following code:

import re
import string
import urllib

for line in open('file.txt', 'r').readlines():
    # split the line on whitespace into its two columns
    relation = string.split(line)

    # decode the percent-escapes, then turn underscores into spaces
    dom = urllib.unquote(relation[0])
    dom = dom.replace('_', ' ')

    rang = urllib.unquote(relation[1])
    rang = rang.replace('_', ' ')

How do I proceed from here? I need the unique co-occurrences of (dom, rang) in this format:

Tempranillo     Rioja (wine) Wine regions
Marsanne        California (wine)

Any kind of help will be greatly appreciated. Thanks!

To filter out duplicate lines from the file, do this:

with open("file.txt") as f:
    unique_lines = set(f)

I would recommend using urllib2 (Python 2 only); a functional style also works well for string processing like this:

import urllib2

def process_item(x):
    return urllib2.unquote(x).replace('_', ' ')

def process_line(line):
    return tuple(process_item(i) for i in line.split())

with open('file.txt', 'r') as infile:
    unique_wines = set(process_line(l) for l in infile)

for dom, rang in sorted(unique_wines):
    print dom, ':', rang
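
The code above targets Python 2 (urllib2 and the print statement). As a rough sketch of the same idea in Python 3, assuming unquote is taken from urllib.parse and using an in-memory SAMPLE string (a stand-in for file.txt, not part of the original answer), it might look like this:

```python
from urllib.parse import unquote

# Stand-in for the contents of file.txt (copied from the question).
SAMPLE = """\
Tempranillo     Rioja_%28wine%29%23Wine_regions
Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29
Marsanne        California_%28wine%29
Carm%C3%A9n%C3%A8re     Wines_of_Chile
Carm%C3%A9n%C3%A8re     Washington_%28U.S._state%29
Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29
"""


def process_item(x):
    # Decode percent-escapes (UTF-8 by default), then replace underscores.
    return unquote(x).replace('_', ' ')


def process_line(line):
    # One tuple per line; tuples are hashable, so they can live in a set.
    return tuple(process_item(i) for i in line.split())


def unique_pairs(lines):
    # Skip blank lines, drop duplicates with a set, return a sorted list.
    return sorted({process_line(l) for l in lines if l.strip()})


pairs = unique_pairs(SAMPLE.splitlines())
for dom, rang in pairs:
    print(dom, ':', rang)
```

Note that %23 decodes to '#', so the first region keeps a '#' between "Rioja (wine)" and "Wine regions"; if a space is wanted there, an extra .replace('#', ' ') in process_item would be one way to get it.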

Well, if I understand you correctly:

Put this before you open the file:

wines = {}

Put this at the last lines inside the loop:

# if this wine is not yet a key in the wines dictionary
if dom not in wines:
    # create a set for it (sets, unlike lists, discard duplicates)
    wines[dom] = set()
wines[dom].add(rang)  # add the region; the set handles duplicates

Put this after the loop:

# Prints every unique (wine, region) pair, grouped by wine
for dom in wines:
    for rang in wines[dom]:
        print("{0}\t{1}".format(dom, rang))
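
The dictionary-of-sets pattern above can also be written with collections.defaultdict, which creates the empty set on first access. This is only a sketch of that variant; the pairs list is made-up sample input standing in for the (dom, rang) values produced inside the loop:

```python
from collections import defaultdict

# Hypothetical, already-decoded (dom, rang) pairs, including one duplicate.
pairs = [
    ("Marsanne", "California (wine)"),
    ("Carménère", "Wines of Chile"),
    ("Carménère", "Washington (U.S. state)"),
    ("Marsanne", "California (wine)"),
]

# Missing keys get a fresh empty set automatically.
wines = defaultdict(set)
for dom, rang in pairs:
    wines[dom].add(rang)

# Sorting both levels gives a stable, reproducible listing.
for dom in sorted(wines):
    for rang in sorted(wines[dom]):
        print("{0}\t{1}".format(dom, rang))
```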

As a note

Another poster suggested this:

with open("file.txt") as f:
    unique_lines = set(f)

That is the simplest solution, provided duplicate lines are exactly identical, with no stray whitespace on any line. Please try that suggestion first.
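
If the lines may differ only in whitespace, one possible workaround is to compare the split fields instead of the raw lines. The helper below is a hypothetical sketch (unique_normalized is not from the original answers); it also keeps the first-seen order, which a plain set does not:

```python
def unique_normalized(lines):
    """Return the unique lines as tuples of fields, in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        # split() with no arguments ignores leading, trailing and repeated
        # whitespace, so "a   b\n" and "a b" produce the same key.
        key = tuple(line.split())
        if key and key not in seen:  # skip blank lines and repeats
            seen.add(key)
            result.append(key)
    return result


rows = unique_normalized(["a   b\n", "a b", "", "c d\n"])
for row in rows:
    print("\t".join(row))
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.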

Check out set and frozenset; that should get you started.

>>> from collections import Counter

>>> wines = """
... a b
... a b
... c d
... """.strip()

>>> lines = wines.splitlines()

>>> Counter(lines)
Counter({'a b': 2, 'c d': 1})
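
The keys of a Counter are the distinct lines, so it can serve as a set that also records how often each line occurred. A small sketch along those lines, reusing the toy data from the session above (the variable names are illustrative only):

```python
from collections import Counter

lines = ["a b", "a b", "c d"]
counts = Counter(lines)

# Iterating over a Counter yields its keys, i.e. the unique lines.
unique_sorted = sorted(counts)

# Lines that appeared more than once in the input.
duplicates = [line for line, n in counts.items() if n > 1]

print(unique_sorted)
print(duplicates)
```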
