Unique sorting in Python

I am quite new to Python. I want to get the unique lines from a file, file.txt. I have some data like this:

Tempranillo     Rioja_%28wine%29%23Wine_regions
Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29
Marsanne        California_%28wine%29
Carm%C3%A9n%C3%A8re     Wines_of_Chile
Carm%C3%A9n%C3%A8re     Washington_%28U.S._state%29
Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29

So far, I have tried the following code:

import urllib  # Python 2; on Python 3 unquote lives in urllib.parse

for line in open('file.txt', 'r'):
    # Each line holds two whitespace-separated, URL-encoded fields.
    relation = line.split()

    # Decode the percent-escapes, then turn underscores into spaces.
    dom = urllib.unquote(relation[0])
    dom = dom.replace('_', ' ')

    rang = urllib.unquote(relation[1])
    rang = rang.replace('_', ' ')

How should I proceed from here? I need the unique co-occurrences of (dom, rang) in this format:

Tempranillo     Rioja (wine) Wine regions
Marsanne        California (wine)

Any kind of help will be greatly appreciated. Thanks!


To filter out duplicate lines from the file, do this:

with open("file.txt") as f:
    unique_lines = set(f)


I would recommend using urllib2 (Python 2), and a functional style works well for string processing like this:

import urllib2

def process_item(x):
    return urllib2.unquote(x).replace('_', ' ')

def process_line(line):
    return tuple(process_item(i) for i in line.split())

with open('file.txt', 'r') as infile:
    unique_wines = set(process_line(l) for l in infile)

for dom, rang in sorted(unique_wines):
    print dom, ':', rang
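
On Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.parse.unquote. This is only a sketch under a few assumptions: SAMPLE_LINES is a stand-in for the file (the rows are copied from the question), and replacing '#' with a space is a guess based on the output format the question asks for.

```python
from urllib.parse import unquote

# Rows copied from the question; this list stands in for iterating over the file.
SAMPLE_LINES = [
    "Tempranillo     Rioja_%28wine%29%23Wine_regions\n",
    "Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29\n",
    "Marsanne        California_%28wine%29\n",
    "Carm%C3%A9n%C3%A8re     Wines_of_Chile\n",
    "Carm%C3%A9n%C3%A8re     Washington_%28U.S._state%29\n",
    "Gr%C3%BCner_Veltliner       Czech_Republic_%28wine%29\n",
]

def process_item(x):
    # Decode percent-escapes, then turn '_' into a space.
    # Replacing '#' as well is an assumption taken from the desired output.
    return unquote(x).replace('_', ' ').replace('#', ' ')

def process_line(line):
    return tuple(process_item(i) for i in line.split())

# A set of tuples keeps each (dom, rang) pair only once; blank lines are skipped.
unique_wines = {process_line(l) for l in SAMPLE_LINES if l.strip()}

for dom, rang in sorted(unique_wines):
    print(dom, ':', rang)
```

To read from the real file instead, iterate over the open file object in place of SAMPLE_LINES.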


Well, if I understand you correctly:

Put this before you open the file:

wines = {}

Put this at the last lines inside the loop:

# if the wine (dom) is not yet a key in the wines dictionary
if dom not in wines:
    # create a set for it (sets, unlike lists, discard duplicates)
    wines[dom] = set()
# add the region; the set takes care of duplicates
wines[dom].add(rang)

Put this after the loop:

# Prints every unique (wine, region) pair, grouped by wine
for dom in wines:
    for rang in wines[dom]:
        print("{0}\t{1}".format(dom, rang))
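
The same dictionary-of-sets idea can also be sketched with collections.defaultdict, which removes the need for the membership check. The pairs list here is made-up sample data standing in for the decoded (dom, rang) values produced inside the loop.

```python
from collections import defaultdict

# Hypothetical, already-decoded (dom, rang) pairs standing in for the file loop.
pairs = [
    ("Tempranillo", "Rioja (wine)"),
    ("Carménère", "Wines of Chile"),
    ("Carménère", "Washington (U.S. state)"),
    ("Tempranillo", "Rioja (wine)"),  # duplicate; the set drops it
]

# Missing keys get an empty set automatically.
wines = defaultdict(set)
for dom, rang in pairs:
    wines[dom].add(rang)

# Sort both levels so the output order is stable.
for dom in sorted(wines):
    for rang in sorted(wines[dom]):
        print("{0}\t{1}".format(dom, rang))
```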

As a note, another poster suggested this:

with open("file.txt") as f:
    unique_lines = set(f)

That is the best solution if no line has extra whitespace, because the set compares the raw lines exactly. Please try that suggestion first.
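
If some lines might differ only in spacing, one option is to normalize each line before adding it to the set. A minimal sketch, using an in-memory list (raw_lines, invented for illustration) in place of the file:

```python
# Made-up sample input: the first two rows differ only in whitespace.
raw_lines = [
    "Marsanne        California_%28wine%29\n",
    "Marsanne   California_%28wine%29  \n",
    "Tempranillo     Rioja_%28wine%29\n",
    "\n",  # blank line, skipped below
]

# Split on any whitespace and rejoin with a single tab,
# so lines with the same fields compare equal.
unique_lines = {"\t".join(line.split()) for line in raw_lines if line.strip()}

for line in sorted(unique_lines):
    print(line)
```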


Check out set and frozenset; that should get you started.


>>> from collections import Counter

>>> wines = """
... a b
... a b
... c d
... """.strip()

>>> lines = wines.splitlines()

>>> Counter(lines)
Counter({'a b': 2, 'c d': 1})
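
Since Counter is a dict subclass, its keys are already the unique lines, so the counts can simply be ignored when only uniqueness matters. A small sketch, assuming Python 3.7 or later (where dicts keep insertion order) and the same toy input as above:

```python
from collections import Counter

lines = ["a b", "a b", "c d"]

counts = Counter(lines)

# The keys are the distinct lines, in first-seen order on Python 3.7+.
unique = list(counts)
print(unique)

# The counts are still available if duplicates need to be reported.
duplicates = [line for line, n in counts.items() if n > 1]
print(duplicates)
```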