Use type() information to cast values stored as strings
In my application I generate a number of values (three columns, of type int, str and datetime; see the example below) and store them in a flat file as comma-separated strings. I also store a second file containing the types of the values (see below). How can I use this information to cast the values from the flat file to the correct data types in Python? Is it possible or do I need to do some other stuff?
Data file:
#id,value,date
1,a,2011-09-13 15:00:00
2,b,2011-09-13 15:10:00
3,c,2011-09-13 15:20:00
4,d,2011-09-13 15:30:00
Type file:
id,<type 'int'>
value,<type 'str'>
date,<type 'datetime.datetime'>
As I understand it, you have already parsed the file and now just need to get the right type. So let's say id_, type_ and value are three strings that contain the values from the file. (Note that type_ should contain 'int', for example, not "<type 'int'>".)
import importlib

def convert(value, type_):
    try:
        # Check if it's a builtin type ('__builtin__' is the Python 2 name;
        # on Python 3 the module is called 'builtins')
        module = importlib.import_module('__builtin__')
        cls = getattr(module, type_)
    except AttributeError:
        # if not, separate module and class
        module, type_ = type_.rsplit(".", 1)
        module = importlib.import_module(module)
        cls = getattr(module, type_)
    return cls(value)
Then you can use it like this:
value = convert("5", "int")
Unfortunately this doesn't work for datetime, as it cannot simply be initialized from its string representation.
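One way around the datetime limitation, as a rough sketch: keep the type file in the `<type '...'>` form from the question, pull the type name out with a regular expression, and look it up in a table of converter functions instead of calling the class directly. The names `CONVERTERS`, `parse_type_name`, `load_types` and `convert_value` are made up for illustration, and the date format is assumed from the sample data.

```python
import datetime
import re

# Hypothetical lookup table: type name -> function that parses a string.
CONVERTERS = {
    "int": int,
    "str": str,
    "datetime.datetime": lambda text: datetime.datetime.strptime(
        text, "%Y-%m-%d %H:%M:%S"),  # format assumed from the sample data
}

def parse_type_name(type_repr):
    """Extract 'int' from "<type 'int'>" (Python 3 writes "<class 'int'>")."""
    match = re.match(r"<(?:type|class) '([\w.]+)'>$", type_repr.strip())
    if match is None:
        raise ValueError("unrecognised type string: %r" % type_repr)
    return match.group(1)

def load_types(lines):
    """Map column name -> type name from lines such as "id,<type 'int'>"."""
    types = {}
    for line in lines:
        name, type_repr = line.strip().split(",", 1)
        types[name] = parse_type_name(type_repr)
    return types

def convert_value(text, type_name):
    """Convert one string using the converter registered for type_name."""
    return CONVERTERS[type_name](text)
```

Any type that is not in the table raises a KeyError, so a new column type has to be registered explicitly rather than guessed.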
Your types file can be simpler:
id=int
value=str
date=datetime.datetime
Then in your main program you can do this:
import datetime

def convert_datetime(text):
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

data_types = {'int': int, 'str': str, 'datetime.datetime': convert_datetime}

fields = {}
for line in open('example_types.txt').readlines():
    key, val = line.strip().split('=')
    fields[key] = val

data_file = open('actual_data.txt')
field_info = data_file.readline().strip('#\n ').split(',')
values = []  # store it all here for now
for line in data_file.readlines():
    row = []
    for i, element in enumerate(line.strip().split(',')):
        element_type = fields[field_info[i]]  # will get 'int', 'str', or 'datetime.datetime'
        convert = data_types[element_type]
        row.append(convert(element))
    values.append(row)

# to show it working...
for row in values:
    print(row)
Follow these steps:
- Read the file line by line; for each line, do the following steps.
- Split the line using split() with , as the separator.
- Cast the first element of the list (from step 2) as an int. Keep the second element as a string. Parse the third value (e.g. using slices) and make a datetime object from it.
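The steps above can be sketched roughly as follows. Note that this uses `datetime.datetime.strptime` for the third field rather than slicing, and the date format is assumed from the sample data; `parse_line` and `parse_lines` are illustrative names.

```python
import datetime

def parse_line(line):
    """Split one data line and cast each field by position."""
    id_text, value, date_text = line.strip().split(",")
    # Format string assumed from the sample rows in the question.
    date = datetime.datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S")
    return int(id_text), value, date

def parse_lines(lines):
    """Parse an iterable of lines, skipping blanks and '#' header lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(parse_line(line))
    return rows
```

Since a file object is an iterable of lines, something like `parse_lines(open('actual_data.txt'))` should work with the data file from the question.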
I had to deal with a similar situation in a recent program that had to convert many fields. I used a list of tuples, where one element of each tuple was the conversion function to use. Sometimes it was int or float; sometimes it was a simple lambda; and sometimes it was the name of a function defined elsewhere.
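A minimal sketch of that idea applied to the columns in the question. The `COLUMNS` list and `convert_row` are hypothetical names, not the code from my program, and the date format is assumed from the sample data.

```python
import datetime

def parse_date(text):
    # A function defined elsewhere, referenced by name in the list below.
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# Hypothetical column specification: (column name, conversion function).
COLUMNS = [
    ("id", int),                     # a builtin type used as the converter
    ("value", lambda s: s.strip()),  # a simple lambda
    ("date", parse_date),            # a named function
]

def convert_row(fields):
    """Apply each column's converter to the matching field, in order."""
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    return {name: func(text) for (name, func), text in zip(COLUMNS, fields)}
```

Adding a column then means adding one tuple to the list; the loop that applies the converters does not change.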
Instead of having a separate "type" file, take your list of tuples of (id, value, date) and just pickle it.
Or you'll have to solve the problem of storing your string-to-type converters as text (in your "type" file), which might be a fun problem to solve, but if you're just trying to get something done, go with pickle or cPickle.
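A small sketch of the pickle route, assuming the rows are already held in memory as tuples with real int, str and datetime values. `save_rows` and `load_rows` are illustrative names, and the file path is whatever you choose.

```python
import datetime
import pickle

def save_rows(rows, path):
    """Write the list of (id, value, date) tuples to a binary file."""
    with open(path, "wb") as out:
        pickle.dump(rows, out)

def load_rows(path):
    """Read the list back; the types come back exactly as they were saved."""
    with open(path, "rb") as src:
        return pickle.load(src)

rows = [
    (1, "a", datetime.datetime(2011, 9, 13, 15, 0, 0)),
    (2, "b", datetime.datetime(2011, 9, 13, 15, 10, 0)),
]
```

The trade-off is that the file is no longer human-readable text, and pickle files should only be loaded when they come from a source you trust.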
First, you cannot write a "universal" or "smart" conversion that magically handles anything.
Second, trying to summarize a string-to-data conversion in anything other than code never seems to work out well. So rather than write a string that names the conversion, just write the conversion.
Finally, trying to write a configuration file in a domain-specific language is silly. Just write Python code. It's not much more complicated than trying to parse some configuration file.
Is it possible or do I need to do some other stuff?
Don't waste time trying to create a "type file" that's not simply Python. It doesn't help. It is simpler to write the conversion as a Python function. You can import that function as if it was your "type file".
import datetime

def convert(row):
    return dict(
        id=int(row['id']),
        value=str(row['value']),
        date=datetime.datetime.strptime(row['date'], "%Y-%m-%d %H:%M:%S"),
    )
That's all you have in your "type file".
Now you can read (and process) your input like this.
from type_file import convert
import csv

with open("date", "rb") as source:
    rdr = csv.DictReader(source)
    for row in rdr:
        useful_row = convert(row)
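On Python 3 the reading loop above needs two small changes: the csv module expects a file opened in text mode with `newline=""` rather than `"rb"`, and the sample header in the question starts with `#`, so the first key would otherwise be `'#id'`. A hedged sketch, with `read_rows` as an illustrative name:

```python
import csv

def read_rows(source, convert):
    """Read CSV rows from an open text file and convert each one.

    `convert` is any function that takes a dict of strings, such as the
    one in the "type file" above.
    """
    rdr = csv.DictReader(source)
    # The sample header is "#id,value,date": drop the leading '#'
    # so the first key is 'id' rather than '#id'.
    rdr.fieldnames = [name.lstrip("#") for name in (rdr.fieldnames or [])]
    return [convert(row) for row in rdr]
```

It would be called along the lines of `with open("actual_data.txt", newline="") as source: rows = read_rows(source, convert)`, where the file name is taken from an earlier answer and is only an example.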
In many cases I do not know the number of columns or the data type before runtime.
This means you are doomed.
You must have an actual definition of the file content or you cannot do any processing.
"id","value","other value"
1,23507,3
You don't know if "23507" should be an integer, a string, a postal code, a floating-point number (with the period omitted), a duration (in days or seconds) or some other more complex thing. You can't hope and you can't guess.
After getting a definition, you need to write an explicit conversion function based on the actual definition.
After writing the conversion, you need to (a) test the conversion with a simple unit test, and (b) test the data to be sure it really converts.
Then you can process the file.
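For step (a), a simple unit test might look something like this. It repeats the `convert` function from the "type file" above so that it runs on its own; the test class and method names are just illustrative.

```python
import datetime
import unittest

def convert(row):
    """Same shape as the conversion function in the "type file" above."""
    return dict(
        id=int(row["id"]),
        value=str(row["value"]),
        date=datetime.datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"),
    )

class ConvertTest(unittest.TestCase):
    def test_good_row(self):
        row = {"id": "1", "value": "a", "date": "2011-09-13 15:00:00"}
        expected = {
            "id": 1,
            "value": "a",
            "date": datetime.datetime(2011, 9, 13, 15, 0, 0),
        }
        self.assertEqual(convert(row), expected)

    def test_bad_id_raises(self):
        # A non-numeric id should fail loudly instead of passing through.
        with self.assertRaises(ValueError):
            convert({"id": "x", "value": "a", "date": "2011-09-13 15:00:00"})

    def test_bad_date_raises(self):
        with self.assertRaises(ValueError):
            convert({"id": "1", "value": "a", "date": "13/09/2011"})
```

Saved in a file, this can be run with `python -m unittest`. Step (b) is then a matter of running the real file through `convert` and seeing which rows raise.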
You might want to look at the xlrd module. If you can load your data into Excel, and it knows what type is associated with each column, xlrd will give you the type when you read the Excel file. Of course, if the data is given to you as a CSV then someone would have to go into the Excel file and change the column types by hand.
Not sure this gets you all the way to where you want to go, but it might help.