How to create an English-language dictionary application with Python (Django)?
I would like to create an online dictionary application by using python (or with django).
It will be similar to http://dictionary.reference.com/.
PS: The dictionary is not stored in a database. It is stored in a text file or a gzip file. Free English dictionary files can be downloaded from this URL: dicts.info/dictionaries.php.
The simplest free dictionary files are in this format:
word1 explanation for word1
word2 explanation for word2
There are some other formats as well, but all are stored in either a text file or a text.gz file.
My questions are:
(1) Are there any existing open source Python packages, modules, or applications that implement this functionality that I can use or study?
(2) If the answer to the first question is no, which algorithm should I follow to create such a web application? Can I simply use the built-in Python dictionary object for this job, so that the key is the English word and the value is the explanation? Is this OK in terms of performance? Or do I have to create my own tree object to speed up the search? Or is there an existing package that handles this job properly?
Thank you very much.
You might want to check out http://www.nltk.org/ You could get lots of words and their definitions without having to worry about the implementation details of a database. If you're new to all this stuff, at the very least it would be useful to get you up and going, and then when you've got a working version, start putting in a database.
Here's a quick snippet of how to get all the available meanings of "dog" from that package:
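from nltk.corpus import wordnet

# requires the WordNet corpus, e.g. via nltk.download('wordnet')
for word_meaning in wordnet.synsets('dog'):
    # definition() is a method in NLTK 3
    print(word_meaning.definition())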
I'm not sure what functionality you are talking about. If you mean 'searching keywords from a dictionary that is recorded in your database', then a Python dictionary is not a practical solution, as you would have to deserialize your whole database in order to make a search.
You should rather look at the Django 'search' applications. A lot of people advise using Haystack (see the question "What's the best Django search app?"), and you can use this search engine to look for a keyword in your database.
If you don't want to support sophisticated searches, then you could also query for an exact keyword in your database
DictEntry.objects.get(keyword='something').definition
I guess it all depends on the level of sophistication you want to achieve, but there can be extremely simple solutions.
EDIT :
If the dictionaries come from files, then it's hard to say, because you have plenty of options.
If the file is small, you could indeed deserialize it to a dictionary when starting the server, and then always search in the same instance (so you wouldn't have to deserialize again for each request).
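As a rough sketch of that "load once at startup" idea, assuming the simple "word explanation" line format from the question: the function name load_dictionary and the lower-casing of keys are my own choices for illustration, not something the dictionary files require.

```python
import gzip


def load_dictionary(path):
    """Read 'word explanation' lines into a dict, from plain text or .gz."""
    # Hypothetical helper: pick the opener from the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    entries = {}
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Split on the first run of whitespace: the word, then the rest.
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                word, explanation = parts
                # Assumption: lookups should be case-insensitive.
                entries[word.lower()] = explanation
    return entries
```

Calling this once at module level (for example in the module that holds your views) keeps a single dict in memory for all requests. Multi-word headwords would need a different separator than whitespace.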
If the files are really big, you could consider migrating them to your database.
1) First create your Django models, so you know what data you need, the names of your fields, etc. For example:
from django.db import models

class DictEntry(models.Model):
    keyword = models.CharField(max_length=100)
    definition = models.CharField(max_length=100)
2) It seems that some of the files on the link you gave are in CSV format (it also seems that you can get them in XML). With the csv module from the standard library, you can read these files into Python.
3) Then, with the json or yaml Python libraries, you dump the data back out in a different format (JSON or YAML), as described in the Django documentation on providing initial data for models. And like magic, your initial data is ready!
PS: The good thing with Python is that if you google 'python json' you will find the official docs, because a library for reading and writing JSON is part of the standard library. The same goes for XML and CSV.
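Steps 2) and 3) could look roughly like the sketch below. It assumes a two-column CSV (word, definition) and the DictEntry model from step 1. The function name csv_to_fixture and the app label "dictionary" are made up for the example, so replace them with your own.

```python
import csv
import json


def csv_to_fixture(csv_path, fixture_path, model_label="dictionary.dictentry"):
    """Convert 'word,definition' CSV rows into a Django JSON fixture."""
    records = []
    with open(csv_path, newline="", encoding="utf-8") as source:
        for pk, row in enumerate(csv.reader(source), start=1):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            records.append({
                # "app_label.modelname"; the app label here is hypothetical
                "model": model_label,
                "pk": pk,
                "fields": {"keyword": row[0], "definition": row[1]},
            })
    with open(fixture_path, "w", encoding="utf-8") as target:
        json.dump(records, target, indent=2)
    return len(records)
```

The resulting file can then be loaded with manage.py loaddata. Note that definitions longer than the model's max_length would need a larger limit or a TextField.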
A dictionary should be pretty small (by IT standards).
For performance, make sure that the dictionary is built in the module namespace:
Good:
# build the dictionary once, when the module is imported
english_dict = dict()
with open(dict_file) as f:  # dict_file: path to your dictionary file
    for line in f:
        # however you process the file:
        word, definition = line.rstrip('\n').split(',', 1)
        # put it in the dictionary
        english_dict[word] = definition

def get_definition(word):
    # could also be written as english_dict.get(word, 'no definition')
    if word in english_dict:
        return english_dict[word]
    else:
        return 'no definition'
Bad:
def get_definition(word):
    # builds the dictionary again on every call
    english_dict = dict()
    with open(dict_file) as f:
        for line in f:
            # however you process the file:
            word_in_file, definition = line.rstrip('\n').split(',', 1)
            # put it in the dictionary
            english_dict[word_in_file] = definition
    if word in english_dict:
        return english_dict[word]
    else:
        return 'no definition'
Or you could use pickle to save the dictionary (so it's faster to read in), or put it all in a database. It's up to you.
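For the pickle option, a minimal sketch might look like this. The function names and the cache path argument are placeholders I picked for the example.

```python
import pickle


def save_dictionary(entries, cache_path):
    """Write the already-built dict to disk in binary pickle format."""
    with open(cache_path, "wb") as handle:
        pickle.dump(entries, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_cached_dictionary(cache_path):
    """Read the dict back without re-parsing the original text file."""
    with open(cache_path, "rb") as handle:
        return pickle.load(handle)
```

You would build the dict from the text file once, call save_dictionary, and then use load_cached_dictionary at server startup. Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.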
# import the pandas module
import pandas as pd
# read the CSV file (read_csv accepts a local path or a URL)
data = pd.read_csv("yourfilename.csv")
# drop rows that contain null values to avoid errors
data.dropna(inplace=True)
# convert to a dict (keyed by column name, then by row index)
data_dict = data.to_dict()
# display
print(data_dict)