Detect data type from XML string using python
I have some XML-tagged strings like the following:
<Processor>AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ 2.31 GHz</Processor>
<ClockSpeed>2.31</ClockSpeed>
<NumberOfCores>2</NumberOfCores>
<InstalledMemory>2.00</InstalledMemory>
<OperatingSystem>Windows 7 Professional</OperatingSystem>
How can I detect the data type automatically using Python? For example, "AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ 2.31 GHz" -> string, "2.31" -> float, and so on.
I need this because I want to create an SQLite table from the XML data, something like:
CREATE table ABC (Processor string, ClockSpeed float ... )
One possibility is to try various types in a precise sequence, defaulting to str if none of them works. E.g.:
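def what_type(s, possible_types=((int, [0]), (float, ()))):
    # Try each converter in order; the first one that accepts s wins.
    for t, xargs in possible_types:
        try:
            t(s, *xargs)  # e.g. int(s, 0) accepts Python literal syntax
        except ValueError:
            pass
        else:
            return t
    return str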
This is particularly advisable, of course, when you want to use exactly the same syntax conventions as Python, e.g., accept '0x7e' as int as well as '126', and so on. If you need different syntax conventions, then you should instead parse the string s yourself, whether via REs or by other means.
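As a minimal sketch of how this could feed the CREATE TABLE statement from the question: the helper create_table_sql and the SQL_TYPES mapping are hypothetical names introduced here, and the sketch assumes the fragment can be wrapped in a dummy root element and that tag names are safe to use as column names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def what_type(s, possible_types=((int, [0]), (float, ()))):
    # Same function as above, repeated so this sketch runs on its own.
    for t, xargs in possible_types:
        try:
            t(s, *xargs)
        except ValueError:
            pass
        else:
            return t
    return str


# Assumed mapping from Python types to SQLite column types.
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def create_table_sql(xml_fragment, table_name):
    # The fragment has no single root, so wrap it in a dummy one.
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    columns = ", ".join(
        "{} {}".format(child.tag, SQL_TYPES[what_type(child.text or "")])
        for child in root
    )
    return "CREATE TABLE {} ({})".format(table_name, columns)


fragment = """
<Processor>AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ 2.31 GHz</Processor>
<ClockSpeed>2.31</ClockSpeed>
<NumberOfCores>2</NumberOfCores>
<InstalledMemory>2.00</InstalledMemory>
<OperatingSystem>Windows 7 Professional</OperatingSystem>
"""

sql = create_table_sql(fragment, "ABC")
print(sql)

# Check that SQLite accepts the generated statement.
sqlite3.connect(":memory:").execute(sql)
```

Note that the type is inferred from a single value per tag here; with several records you would probably want to check every value of a column before settling on its type.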
Depending on the kinds of formats you expect, you could use regexes to detect floats and ints, and then assume that anything which can't be parsed into a number is a string, like so:
import re

FLOAT_RE = re.compile(r'^(\d+\.\d*|\d*\.\d+)$')
INT_RE = re.compile(r'^\d+$')

# ... code to get xml value into a variable ...

if FLOAT_RE.match(xml_value):
    value_type = 'float'
elif INT_RE.match(xml_value):
    value_type = 'int'
else:
    value_type = 'string'
This is just a very basic stab at it: these patterns do not handle signs, exponents, or thousands separators, for example. If you expect any of the more complex number formats, you will have to expand the patterns to make this work properly in all cases.
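A sketch of how the patterns could be expanded to accept an optional sign and an exponent; the function name detect_type is hypothetical, and which formats count as numbers is an assumption you would adjust to your own data.

```python
import re

# Optional sign, then digits only.
INT_RE = re.compile(r'^[+-]?\d+$')
# Optional sign, a decimal or whole number, then an optional exponent.
FLOAT_RE = re.compile(r'^[+-]?(\d+\.\d*|\d*\.\d+|\d+)([eE][+-]?\d+)?$')


def detect_type(value):
    value = value.strip()
    # Check int first, because FLOAT_RE also matches plain digits.
    if INT_RE.match(value):
        return 'int'
    if FLOAT_RE.match(value):
        return 'float'
    return 'string'


for sample in ['2', '2.31', '-1.5e3', '1e5', 'Windows 7 Professional']:
    print(sample, '->', detect_type(sample))
```

Anything that matches neither pattern, including a lone '.', still falls through to 'string'.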
BeautifulSoup is a good HTML/XML parser:
http://www.crummy.com/software/BeautifulSoup/
I'm not entirely sure whether it can convert data by type given an XSD/XSL, but it can detect encoding, so that might be a start.