How do I turn a flat file of data into a queryable data source
I generate files, lets call them .dwrf files, which contain a significant amount of data. Currently we export those to .CSV and the resulting files are large (2GB+). I would like to cut out the export process a开发者_运维知识库nd make the contents of a .dwrf file queryable directly from Excel or other applications.
What I would like to do is write a utility/service - lets call it dwrfMiner - to extract data from the file and pass it on as a datasource and link dwrfMiner to .dwrf files in some way so that Excel recognises it as an external data source.
Any ideas?
While writing an ODBC driver for this is probably overkill, if the format of the files you are working with is known in advance and isn't too hard to translate (it sounds like not considering you are already creating CSVs) then using an ODBC DSN sounds like your best bet.
There are a nice selection of ODBC drivers already built in to Windows (.txt, .csv, .mdb, .xl*, .dbf, Paradox .db, etc etc) and you can obtain other drivers from the web for a lot of common formats.
If the size of the existing format you're exporting to is too onerous (CSV) then the logical point to start is a transformation of your data to something more space-conscious that has ODBC support.
Failing that, your last option is the overkill option (Writing an ODBC driver).
Excel can query external data souces, but beware that Excel (all versions) have hard-limits on the number of rows they can display, per work-book. I think in Excel 2003 the limit is ~65k. It's higher in other versions.
See my question: reporting tool/viewer for large datasets (and I had much less than > 2GB).
I used PHP FlatFile DB to query flat-files in the past
I'd get out gcc and write yourself a full ODBC driver for it. Then you can sit back and use SQL.
You know, if you're bored. ;)
use odbc driver with multithreading
精彩评论