Python program to search for specific strings in hash values (coding help)
Trying to write a code that searches hash values for specific string's (input by user) and returns the hash if searchquery is present in that line.
Doing this to kind of just learn python a bit more, but it could be a real world application used by an HR department to search a .csv resume database for specific words in each resume.
I'd like this program to look through a .csv file that has three entries per line (id#;applicant name;resume text)
I set it up so that it creates a hash, then created a string for the resume text hash entry, and am trying to use the .find() function to return the entire hash for each instance.
What i'd like is if the word "gpa" is used as a search query and it is found in s['resumetext'] for three applicants(rows in .csv file), it prints the id, name, and res开发者_如何学运维ume for every row that has it.(All three applicants)
As it is right now, my program prints the first row in the .csv file(print resume['id'], resume['name'], resume['resumetext']) no matter what the searchquery is, whether it's in the resumetext or not.
lastly, are there better ways to doing this, by searching word documents, pdf's and .txt files in a folder for specific words using python (i've just started reading about the re module and am wondering if this may be the route, rather than putting everything in a .csv file.)
def find_details(id2find):
resumes_f=open("resume_data.csv")
for each_line in resumes_f:
s={}
(s['id'], s['name'], s['resumetext']) = each_line.split(";")
resumetext = str(s['resumetext'])
if resumetext.find(id2find):
return(s)
else:
print "No data matches your search query. Please try again"
searchquery = raw_input("please enter your search term")
resume = find_details(searchquery)
if resume:
print resume['id'], resume['name'], resume['resumetext']
The line
resumetext = str(s['resumetext'])
is redundant, because s['resumetext']
is already a string (since it comes as one of the results from a .split
call). So, you can merge this line and the next into
if id2find in s['resumetext']: ...
Your following else
is misaligned -- with it placed like that, you'll print the message over and over again. You want to place it after the for
loop (and the else
isn't needed, though it would work), so I'd suggest:
for each_line in resumes_f:
s = dict(zip('id name resumetext'.split(), each_line.split(";"))
if id2find in s['resumetext']:
return(s)
print "No data matches your search query. Please try again"
I've also shown an alternative way to build dict s
, although yours is fine too.
What @Justin Peel said. Also to be more pythonic I would say change
if resumetext.find(id2find) != -1:
to if id2find in resumetext:
A few more changes: you might want to lower case the comparison and user input so it matches GPA, gpa, Gpa, etc. You can do this by doing searchquery = raw_input("please enter your search term").lower()
and resumetext = s['resumetext'].lower()
. You'll note I removed the explicit cast around s['resumetext'] as it's not needed.
One change that I recommend for your code is changing
if resumetext.find(id2find):
to
if resumetext.find(id2find) != -1:
because find() returns -1 if id2find wasn't in resumetext. Otherwise, it returns the index where id2find is first found in resumetext, which could be 0. As @Personman commented, this would give you the false positive because -1 is interpreted as True in Python.
I think that problem has something to do with the fact that find_details() only returns the first entry for which the search string is found in resumetext. It might be good to make find_details() into a generator instead and then you could iterate over it and print the found records out one by one.
精彩评论