Unique sorting in Python
I am quite new to Python. I want to get the unique strings from file.txt. I have some data like so:
Tempranillo Rioja_%28wine%29%23Wine_regions
Gr%C3%BCner_Veltliner Czech_Republic_%28wine%29
Marsanne California_%28wine%29
Carm%C3%A9n%C3%A8re Wines_of_Chile
Carm%C3%A9n%C3%A8re Washington_%28U.S._state%29
Gr%C3%BCner_Veltliner Czech_Republic_%28wine%29
So, I have tried with the following code:
import re
import string
import urllib
for line in open('file.txt', 'r').readlines():
    left, right = string.split(line)
    relation = string.split(line)
    dom = relation[0]
    rang = relation[1]
    dom = urllib.unquote(relation[0])
    dom = dom.replace('_', ' ')
    rang = urllib.unquote(relation[1])
    rang = rang.replace('_', ' ')
How do I proceed further? I need to get the unique co-occurrences of (dom, rang) in this format:
Tempranillo Rioja (wine) Wine regions
Marsanne California (wine)
Any kind of help will be greatly appreciated. Thanks!
To filter out duplicate lines from the file, do this:
with open("file.txt") as f:
    unique_lines = set(f)
I would recommend using urllib2, and a functional style works well for string processing like this:
import urllib2
def process_item(x):
    # Percent-decode the field and turn underscores into spaces
    return urllib2.unquote(x).replace('_', ' ')
def process_line(line):
    # One tuple per line, so equal lines compare (and hash) equal
    return tuple(process_item(i) for i in line.split())
with open('t.txt', 'r') as infile:
    unique_wines = set(process_line(l) for l in infile)
for dom, rang in sorted(unique_wines):
    print dom, ':', rang
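For readers on Python 3, where `unquote` lives in `urllib.parse` and `print` is a function, the same idea might look roughly like the sketch below. The `unique_pairs` helper, the in-memory `sample` list, and the extra replacement of `#` with a space (guessed from the desired output in the question) are assumptions of this sketch, not part of the answer above.

```python
from urllib.parse import unquote

def process_item(item):
    """Percent-decode one field, then turn '_' and '#' into spaces.

    Replacing '#' is an assumption based on the output the question asks for.
    """
    return unquote(item).replace('_', ' ').replace('#', ' ')

def process_line(line):
    """Split a line on whitespace and decode every field."""
    return tuple(process_item(field) for field in line.split())

def unique_pairs(lines):
    """Return the sorted, de-duplicated tuples, skipping blank lines."""
    return sorted({process_line(line) for line in lines if line.strip()})

# Rows copied from the question; a real run would pass an open file instead.
sample = [
    "Tempranillo Rioja_%28wine%29%23Wine_regions\n",
    "Gr%C3%BCner_Veltliner Czech_Republic_%28wine%29\n",
    "Marsanne California_%28wine%29\n",
    "Gr%C3%BCner_Veltliner Czech_Republic_%28wine%29\n",
]

for dom, rang in unique_pairs(sample):
    print(dom, ':', rang)
```

With a real file, `unique_pairs(open_file)` should work the same way, since iterating over a file object yields its lines.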
Well, if I understand you correctly:
Put this before you open the file:
wines = {}
Put this as the last lines inside the loop:
    # If this dom is not yet a key in the wines dictionary,
    # create a set for it (sets, unlike lists, discard duplicates)
    if dom not in wines:
        wines[dom] = set()
    wines[dom].add(rang)  # add the rang; the set handles dupes
Put this after the loop:
# Print every unique (dom, rang) pair, grouped by dom
for dom in wines:
    for rang in wines[dom]:
        print("{0}\t{1}".format(dom, rang))
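Assembled into one piece, the three fragments above could look something like the following Python 3 sketch. It swaps the explicit `if dom not in wines` check for `collections.defaultdict(set)`, which is a substitution on my part; the `group_by_dom` function name and the in-memory `sample` rows are also hypothetical, used here so the snippet runs without a file.

```python
from collections import defaultdict
from urllib.parse import unquote

def group_by_dom(lines):
    """Map each decoded dom to the set of decoded rang values seen with it."""
    wines = defaultdict(set)  # missing keys start out as an empty set
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        dom, rang = (unquote(f).replace('_', ' ') for f in fields)
        wines[dom].add(rang)  # the set silently ignores duplicates
    return wines

# Rows taken from the question, with one duplicate added for illustration.
sample = [
    "Carm%C3%A9n%C3%A8re Wines_of_Chile",
    "Carm%C3%A9n%C3%A8re Washington_%28U.S._state%29",
    "Marsanne California_%28wine%29",
    "Marsanne California_%28wine%29",
]

wines = group_by_dom(sample)
for dom in sorted(wines):
    for rang in sorted(wines[dom]):
        print("{0}\t{1}".format(dom, rang))
```

Sorting both levels is optional; it only makes the printed order deterministic, since sets have no guaranteed order.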
As a note, another poster suggested this:
with open("file.txt") as f:
    unique_lines = set(f)
That is the best solution if no line has extra whitespace. Please try that suggestion first.
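If stray whitespace is a concern, one possible workaround is to normalize each line before putting it in the set. The sketch below is only an illustration: `unique_stripped_lines` is a made-up helper name, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import io

def unique_stripped_lines(lines):
    """Collapse runs of whitespace in each line, drop blanks, return unique lines."""
    normalized = (" ".join(line.split()) for line in lines)
    return {line for line in normalized if line}

# io.StringIO stands in for open("file.txt"). The three rows differ only in
# spacing and in the missing final newline, so plain set(f) would keep all three.
fake_file = io.StringIO(
    "Marsanne California_%28wine%29\n"
    "Marsanne  California_%28wine%29 \n"
    "Marsanne California_%28wine%29"
)
unique_lines = unique_stripped_lines(fake_file)
print(unique_lines)
```

The missing trailing newline on the last line of a file is a common reason why `set(f)` reports an apparent duplicate.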
Check out set and frozenset; that should get you started.
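As a rough illustration of the difference between the two, here is a small sketch with made-up pairs (the variable names `deduped` and `unordered` are just for this example):

```python
pairs = [
    ("Marsanne", "California (wine)"),
    ("Marsanne", "California (wine)"),
    ("Tempranillo", "Rioja (wine)"),
]

# A set keeps one copy of each hashable item, so the duplicate tuple disappears.
deduped = set(pairs)

# frozenset is the immutable, hashable variant, so it can itself be stored in
# a set. Wrapping each pair in one makes the comparison ignore field order.
reversed_pair = ("California (wine)", "Marsanne")
unordered = {frozenset(p) for p in pairs + [reversed_pair]}

print(len(deduped), len(unordered))
```

For this question a plain `set` of tuples is probably enough; `frozenset` would only matter if (dom, rang) and (rang, dom) should count as the same co-occurrence.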
>>> from collections import Counter
>>> wines = """
... a b
... a b
... c d
... """.strip()
>>> lines = wines.splitlines()
>>> Counter(lines)
Counter({'a b': 2, 'c d': 1})