Suitability of Amazon SimpleDB for large temporal data sets eminating from thousands of separate devices

2023-03-10 21:09 问答作者：

I'm trying to establish whether Amazon SimpleDB is suitable for a subset of data I have.

I have thousands of deployed autonomous sensor devices recording data.

Each sensor device essentially reports a couple of values four times an hour each day, over months and years. I need to keep all of this data for historic statistical analysis. Generally, it is write once, read many times. Server-based applications run regularly to query the data to infer other information.

The rows of data today, in SQL look something like this:

(id, device_id, utc_timestamp, value1, value2)

Our existing MySQL solution is not going to scale up much further, with tens of millions of rows. We query things like "tell me the sum of all the value1 yesterday" or "show me the average of value2 in the last 8 hours". We do this in SQL but can happily change to doing it in code. SimpleDBs "eventual consistency" appears fine for our开发者_如何学Python puposes.

I'm reading up all I can and am about to start experimenting with our AWS account, but it's not clear to me how the various SimpleDB concepts (items, domains, attributes, etc.) relate to our domain.

Is SimpleDB an appropriate vehicle for this and what would a generalised approach be?

PS: We mostly use Python, but this shouldn't matter when considering this at a high level. I'm aware of the boto library at this point.

Edit:

Continuing to search on solutions for this I did come across Stack Overflow question What is the best open source solution for storing time series data? which was useful.

Just following up on this one many months later...

I did actually have the opportunity to speak to Amazon directly about this last summer, and eventually got access to the beta programme for what eventually became DynamoDB, but was not able to talk about it.

I would recommend it for this sort of scenario, where you need a primary key and what might be described as a secondary index/range - eg timestamps. This allows you much greater confidence in search, ie "show me all the data for device X between monday and friday"

We haven't actually moved to this yet for various reasons but do still plan to.

http://aws.amazon.com/dynamodb/

I my opinon, Amazon SimpleDb as well as Microsoft Azure Tables is a fine solution as long as your queries are quite simple. As soon as you trying to do stuff that's absolutely a non-issue on relational databases like aggregates you begin to run into trouble. So if you are going to do some heavy reporting stuff it might get messy.

It sounds like your problem may be best handled by a round-robin database (RRD). An RRD stores time variable data in such a way so that the file size never grows beyond its initial setting. It's extremely cool and very useful for generating graphs and time series information.

I agree with Oliver Weichhold that a cloud based database solution will handle the usecase you described. You can spread your data across multiple SimpleDB domains (like partitions) and stored your data in a way that most of your queries can be executed from a single domain without having to traverse the entire database. Defining your partition strategy will be key to the success of moving towards a cloud based DB. Data set partitioning is talked about here

继续阅读：amazon-simpledb temporal

Suitability of Amazon SimpleDB for large temporal data sets eminating from thousands of separate devices

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？