What are some "mainstream" lightweight alternatives to storing files in .csv format? [duplicate]

Posted 2023-02-26 20:32

This question already has answers here: Alternative to CSV? (5 answers) Closed 5 years ago.

I'm on a project which heavily favors the use of .csv files for data storage. I see many issues with using .csv, especially for storing relational data. Parsing .csv is generally a pain, particularly when using ad-hoc column assignments.

I've advocated the use of XML and minimal databases such as SQLite, but I'm looking for "faster, better, cheaper" alternatives.

What are some other, "mainstream" lightweight alternatives to .csv files?

Also, what about CouchDB? How does it compare to SQLite in terms of being lightweight?

EDIT: I missed it. This question has been asked before.

I would argue there is no direct replacement for a CSV file. CSV is a flat file index-oriented format. It doesn't matter if you replace commas with pipes or whatnot. It's the same thing with slightly different rules.

With that being said, I often opt for SQLite when the data is in my control.

Using SQLite consistently means reusing the same tooling everywhere. It can be used as either an ad-hoc store or a relational model, has a 'step up' path to a "standalone" RDBMS, provides a data query language (DQL) "for free" (which is a big plus for me), etc. Unless space is a real issue or there isn't support for the data access, why not? (Modern Firefox also uses SQLite.)
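As a rough illustration of the "relational model plus free querying" point, here is a minimal sketch using Python's standard sqlite3 module. The customer/purchase schema and the sample rows are made up for this example; they are not from the question.

```python
import sqlite3

# Hypothetical schema for illustration: customers and their purchases.
conn = sqlite3.connect(":memory:")  # pass a file path instead for a persistent store
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
    [(1, 9.5), (1, 20.0), (2, 5.0)],
)
conn.commit()


def totals_by_customer(connection):
    """Join the two tables and sum purchases per customer."""
    return connection.execute("""
        SELECT c.name, SUM(p.amount)
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """).fetchall()


totals = totals_by_customer(conn)
print(totals)
```

Doing the same with two .csv files would mean hand-writing the join and the aggregation, which is the kind of ad-hoc parsing the question complains about.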

(There are a number of object databases out there as well, such as DB4O, and even simpler key/value or hierarchical stores. I'm not trying to say SQLite is the only way to obtain relationships in a micro/embedded database.)

One downside compared to, say, XML is that special tooling (sqlite/adapter) is required. XML, while not the most human-friendly format, can be edited just fine in Notepad. Additionally, there is no extra overhead (fragmentation or structure) in XML besides the markup/data itself, and XML is generally quite amenable to compression. There are also many libraries that map an entire object graph to XML (and thus maintain relationships), so that might be a nice feature.
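To make the nesting and compression points concrete, here is a small sketch using only the Python standard library (xml.etree.ElementTree and gzip). The order/item element names and the repeated sample data are assumptions chosen for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical data: one order holding many line items (parent/child relation).
order = ET.Element("order", id="1001")
for sku, qty in [("A-1", 2), ("B-7", 1), ("C-3", 5)] * 50:
    ET.SubElement(order, "item", sku=sku, qty=str(qty))

# Serialize to bytes, then compress; repetitive markup tends to shrink well.
xml_bytes = ET.tostring(order, encoding="utf-8")
compressed = gzip.compress(xml_bytes)
print(len(xml_bytes), "bytes raw,", len(compressed), "bytes gzipped")

# Round trip: the nested structure survives decompression and parsing.
restored = ET.fromstring(gzip.decompress(compressed))
item_count = len(restored.findall("item"))
print(restored.get("id"), item_count)
```

The actual compression ratio depends heavily on the data, so treat the printed sizes as indicative only.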

Other formats like JSON are also out there -- but if the format is opaque then it doesn't really make a difference over XML (it's more of a matter of tooling support).

So... "it depends".

It looks like YAML is relatively small compared to formats such as XML, but slightly more descriptive than JSON (YAML 1.2 is a superset of JSON). It's another candidate I'll consider.

It's all about use-case.

My rule of thumb: use SQLite if there are dependencies or relations between two pieces of data; use CSV (or some other "flat" format) if it's just flat data files. The simplest thing that just works is often the most reliable solution as well.

(Note: Ensure the CSV is well formed. Nobody likes having to hack around bad CSV implementations.)
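For what "well formed" can mean in practice, here is a short sketch with Python's standard csv module, which quotes fields containing commas, quotes, or newlines. The sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical rows containing the usual troublemakers:
# an embedded comma, embedded double quotes, and an embedded newline.
rows = [
    ["id", "name", "note"],
    ["1", "Smith, John", 'said "hello"'],
    ["2", "Lee", "line one\nline two"],
]

# Write with the csv module so quoting and escaping are handled for us.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Read it back with the matching reader: the data round-trips unchanged.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# A naive split on commas breaks the first data row into too many fields.
naive = text.splitlines()[1].split(",")
print(len(parsed[1]), "fields via csv.reader,", len(naive), "via str.split")
```

Sticking to a real CSV writer and reader on both ends avoids most of the "bad CSV" hacks mentioned above.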

HDF5 is a good choice for storing large tabular datasets, if you do not require concurrent writes.

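In Python, pandas + PyTables are very easy to use. Example from the pandas documentation (it dates from an older pandas release; Panel was removed in pandas 0.25, so the wp lines only run on earlier versions):

In [258]: import numpy as np
   .....: from numpy.random import randn
   .....: from pandas import DataFrame, HDFStore, Panel, Series, date_range
   .....: 

In [259]: store = HDFStore('store.h5')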

In [260]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty

Objects can be written to the file just like adding key-value pairs to a dict:

In [261]: np.random.seed(1234)

In [262]: index = date_range('1/1/2000', periods=8)

In [263]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [264]: df = DataFrame(randn(8, 3), index=index,
   .....:                columns=['A', 'B', 'C'])
   .....: 

In [265]: wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
   .....:            major_axis=date_range('1/1/2000', periods=5),
   .....:            minor_axis=['A', 'B', 'C', 'D'])
   .....: 

# store.put('s', s) is an equivalent method
In [266]: store['s'] = s

In [267]: store['df'] = df

In [268]: store['wp'] = wp

# the type of stored data
In [269]: store.root.wp._v_attrs.pandas_type
Out[269]: 'wide'

In [270]: store
Out[270]: 
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[8,3])  
/s             series       (shape->[5])    
/wp            wide         (shape->[2,5,4])

XML is designed to be mainstream and relatively "lightweight". JSON is another popular choice, but it is better suited to object modeling than to data storage.

MySQL is a good option if you need relational querying capabilities.
