开发者

populating data from xml file to a sqlite database using python

I have a question related to some guidances to solve a problem. I have with me an xml file, I have to populate it into a database system (whatever, it might be sqlite,开发者_开发技巧 mysql) using scripting language: Python.

Does anyone have any idea on how to proceed?

  • Which technologies I need to read further?
  • Which environments I have to install?
  • Any tutorials on the same topic?

I already tried to parse xml using both by tree-based and sax method in other language, but to start with Python, I don't know where to start. I already know how to design the database I need.

Another question, is Python alone possible of executing database ddl queries?


I recommend you study on ElementTree for parsing your XML file into memory (parse it all, then emit it all to a SQL DB, is probably easier, but element-tree also allows incremental operation if your file is huge) -- it's part of the standard Python library as module xml.etree.

I recommend sqlite3 (also in the standard Python library) as the relational DB of choice (if you have a choice), again because it's handy and easy -- the underlying SQLite embedded relational DB is also well documented at its own site. If you need a general tutorial on how Python likes to interface to relational DBs (the "DB-API"), there's a nice one here.

Once you fully understand etree and sqlite3 -- and you don't necessarily need to install anything for either (which is part of their charms;-) -- you're basically all set. (Of course an installation of SQLite itself (commandline and/or GUI tools) so you can look at your DB files and tweak them may be nice, as may graphical XML editors such as oXygen or XMLmindto look at and/or tweak your XML, but neither kind of tool is at all needed, not at all related to using Python rather than other languages for the XML parsing and SQLite writing;-).

And yes, you can perfectly well do CREATE TABLE and other DDL queries from sqlite3 (and any other DB-API compliant Python module, if you choose to use other relational DBs;-).


You can do it all by hand with sqlite3 and xmlstarlet.

  • convert xml to csv
  • import csv to db.

(Maybe it's not your want. but it's handy)

# content of xml file
$ cat artist.xml
<results><artist><id>Ae2300d8b0232c06c</id><name>莎拉 布莱曼(Sarah Brightman)</name><hotSongs><!--freemusic/song/result/Sa3f6b810d7f98646--><song><id>Sa3f6b810d7f98646</id><name>Gloomy Sunday</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>月光女神</album><duration>227.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>Bfef909d41922984a</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/Sbb94fa21258a8b51--><song><id>Sbb94fa21258a8b51</id><name>我和你(2008年奥运会主题曲)(You And Me)</name><artist>刘欢</artist><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>A6682d46bd4bdfc84</artistId><artistId>Ae2300d8b0232c06c</artistId><album>我和你 2008北京奥运会主题歌</album><duration>256.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>Babf3ab3b7ef0e3ed</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/S6a47788bef57ff87--><song><id>S6a47788bef57ff87</id><name>Scarborough Fair</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>月光女神</album><duration>251.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>Bfef909d41922984a</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/Sde1aa68da126ddfe--><song><id>Sde1aa68da126ddfe</id><name>Time To Say Goodbye</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>Symphony: Live In Vienna</album><duration>275.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>B0187937a07b940f7</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/S7483cc8c44e06bbb--><song><id>S7483cc8c44e06bbb</id><name>Amazing Grace</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>真爱永恒 - 冬之歌</album><duration>186.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>B38e85a3056c0381c</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/S2d9683c7230e18de--><song><id>S2d9683c7230e18de</id><name>This Love</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>月光女神</album><duration>371.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>Bfef909d41922984a</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/Scfb4989195b44e57--><song><id>Scfb4989195b44e57</id><name>Here With Me</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>月光女神</album><duration>324.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>true</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>Bfef909d41922984a</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song><!--freemusic/song/result/S7bafc588178b932c--><song><id>S7bafc588178b932c</id><name>La Lune</name><artist>莎拉 布莱曼(Sarah Brightman)</artist><artistId>Ae2300d8b0232c06c</artistId><album>月光女神</album><duration>173.0</duration><canBeDownloaded>true</canBeDownloaded><hasFullLyrics>false</hasFullLyrics><canBeStreamed>true</canBeStreamed><albumId>Bfef909d41922984a</albumId><hasSimilarSongs>true</hasSimilarSongs><hasRecommendation>false</hasRecommendation></song></hotSongs></artist></results>

# convert xml to csv file using xmlstarlet
$ xml sel -t -m '//song' -v 'concat(id, ",", name, ",", duration)' -n artist.xml | sed '$d' >artist.csv

# content of csv file
$ cat artist.csv
Sa3f6b810d7f98646,Gloomy Sunday,227.0
Sbb94fa21258a8b51,我和你(2008年奥运会主题曲)(You And Me),256.0
S6a47788bef57ff87,Scarborough Fair,251.0
Sde1aa68da126ddfe,Time To Say Goodbye,275.0
S7483cc8c44e06bbb,Amazing Grace,186.0
S2d9683c7230e18de,This Love,371.0
Scfb4989195b44e57,Here With Me,324.0
S7bafc588178b932c,La Lune,173.0

# create sqlite database
$ sqlite3 song.db
sqlite> CREATE TABLE song(id, name, duration);

# enter csv mode to import csv file
sqlite> .mode csv
sqlite> .import artist.csv song

# check everything is ok
sqlite> .mode column
sqlite> .header on
sqlite> select * from song;
id                 name           duration
-----------------  -------------  ----------
Sa3f6b810d7f98646  Gloomy Sunday  227.0
Sbb94fa21258a8b51  我和你(200  256.0
S6a47788bef57ff87  Scarborough F  251.0
Sde1aa68da126ddfe  Time To Say G  275.0
S7483cc8c44e06bbb  Amazing Grace  186.0
S2d9683c7230e18de  This Love      371.0
Scfb4989195b44e57  Here With Me   324.0
S7bafc588178b932c  La Lune        173.0


If you are accustomed to DOM (tree) access to xml from other language, you may find useful these standard library modules (and their respective docs):

  • xml.dom
  • xml.dom.minidom

To save tha data to DB, you can use standard module sqlite3 or look for binding to mysql. Or you may wish to use something more abstract, like SQLAlchemy or Django's ORM.


I know it's an old post, but I think it's worth noting for reference to future visitors who need one-time solution: The simplest way I found was to use Libre Office Calc, particularly Data --> XML source feature.

https://help.libreoffice.org/latest/lo/text/scalc/01/xml_source.html

Once imported into ODS, one can save it easily as CSV (i.e. to import to SQLite), but having the Open Document Sheet may itself suffice in many cases user's needs, i.e. using Automatic filter feature to search and filter data. I had no problem processing 15K rows.

If the task has to be done repeatedly, it has to be scripted and Python libraries are a good way to do it. However I had some troubles using xml.etree and in my case LibreOffice was a perfect fit.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜