topics in distributed systems

What do you think is an interesting topic in distributed systems?

I have to pick a topic and present it on Monday. At first I chose to talk about Wuala, but after reading about it, I don't think it's that interesting.

So what is an interesting (new) topic in distributed systems that I can research?

sorry if this is the wrong place to post.


Take for example a database like Cassandra with the following features:

  • Decentralized: Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
  • Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
  • Fault Tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
  • Consistent, Eventually: Cassandra implements an eventually consistent model and includes sophisticated features such as Hinted Handoff and Read Repair to minimize inconsistency windows.
  • Highly Available: Writes and reads offer a tunable ConsistencyLevel, all the way from "writes never fail" to "block for all replicas to be readable," with the quorum level in the middle.
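To make the tunable consistency point concrete, here is a minimal sketch of the usual quorum arithmetic: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The function names and the level table are made up for illustration and are not Cassandra's API; only the level names ONE, QUORUM and ALL are borrowed from it.

```python
def replicas_required(level: str, n: int) -> int:
    """Number of replicas that must respond for a given level (illustrative)."""
    table = {
        "ONE": 1,
        "QUORUM": n // 2 + 1,  # strict majority of the n replicas
        "ALL": n,
    }
    return table[level]


def read_sees_latest_write(n: int, read_level: str, write_level: str) -> bool:
    """True when every read set must share at least one replica with every write set."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return r + w > n


if __name__ == "__main__":
    n = 3
    for write_level in ("ONE", "QUORUM", "ALL"):
        for read_level in ("ONE", "QUORUM", "ALL"):
            ok = read_sees_latest_write(n, read_level, write_level)
            print(f"N={n} W={write_level:6} R={read_level:6} overlap={ok}")
```

The trade-off is the interesting part for a talk: lower levels keep the system available when nodes are down, higher levels narrow the window in which a read can return stale data.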

I think you could hold a semester of lectures on just solving problems encountered creating such a system and/or making it high-performance. As a bonus, the topic is of wide interest (anyone writing applications for the web, basically) and already partly known so you have a good chance to capture the attention of a crowd of developers.


The consensus problem.

  1. The Byzantine Generals Problem in synchronous systems.
  2. The FLP impossibility result for asynchronous systems.
  3. Lamport's effort to find the best possible solution for the asynchronous setting, which led to Paxos.
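If you want something to show on a slide, a rough single-decree Paxos sketch could look like the following. It is an in-memory toy under strong assumptions: no network, no crashes, and the proposer calls the acceptors directly. All class and function names are hypothetical. It only illustrates the two phases (prepare/promise, then accept) and the rule that a proposer must adopt the value of the highest-numbered proposal it hears about.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot this acceptor has promised
    accepted_ballot: int = -1             # ballot of the value accepted so far, if any
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1: promise to ignore lower ballots, report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Try to get a value chosen; returns the chosen value or None on failure."""
    majority = len(acceptors) // 2 + 1
    granted = [(b, v) for ok, b, v in (a.prepare(ballot) for a in acceptors) if ok]
    if len(granted) < majority:
        return None
    # Adopt the value from the highest-numbered accepted proposal, if one exists.
    prior_ballot, prior_value = max(granted, key=lambda p: p[0])
    if prior_ballot >= 0 and prior_value is not None:
        value = prior_value
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= majority else None
```

For example, once `propose(acceptors, 1, "A")` succeeds on three fresh acceptors, a later `propose(acceptors, 2, "B")` still returns `"A"`: the second proposer learns of the earlier value in phase 1 and has to keep it. That is the safety property the protocol is built around.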


Coordinated checkpointing is interesting. To recover from a failure, a system must be returned to a correct state, so distributed systems record and recover their state through checkpointing and logging. With checkpointing, the system records its state from time to time, and when an error occurs it reverts to the last recorded state. A record of the system's global state is also called a distributed snapshot. With coordinated checkpointing, processes write, in sync, records of all input and output since the previous snapshot. The coordination is necessary because without it you get a domino effect: you can't determine what the global state was at any given time, so you keep having to trace events backwards until you reach the system's initial state.
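A very rough sketch of the idea, assuming a coordinator that drives a two-phase checkpoint over in-memory processes. The names are hypothetical, and the sketch ignores in-flight messages entirely, which a real protocol has to handle. Each process first takes a tentative checkpoint; only if all of them succeed does everyone commit, so the committed checkpoints always belong to the same round.

```python
import copy
from typing import Any, Dict, List, Optional


class Process:
    def __init__(self, name: str, state: Dict[str, Any]) -> None:
        self.name = name
        self.state = state
        self.committed: Dict[str, Any] = copy.deepcopy(state)  # last agreed checkpoint
        self.tentative: Optional[Dict[str, Any]] = None

    def take_tentative(self) -> bool:
        # Phase 1: record current state, but do not make it permanent yet.
        self.tentative = copy.deepcopy(self.state)
        return True

    def commit(self) -> None:
        # Phase 2 (success): the tentative checkpoint becomes the recovery point.
        if self.tentative is not None:
            self.committed = self.tentative
            self.tentative = None

    def abort(self) -> None:
        # Phase 2 (failure): discard the tentative checkpoint.
        self.tentative = None

    def rollback(self) -> None:
        self.state = copy.deepcopy(self.committed)


def coordinated_checkpoint(processes: List[Process]) -> bool:
    """Commit a new checkpoint on all processes, or on none of them."""
    results = [p.take_tentative() for p in processes]
    if all(results):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False


def recover(processes: List[Process]) -> None:
    """Roll every process back to the same committed checkpoint round."""
    for p in processes:
        p.rollback()
```

Because every process rolls back to a checkpoint from the same round, recovery takes one step and there is no cascade of rollbacks through older checkpoints.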
