CouchDB - JSON performance in a messenger application
I have a web-based instant messenger application written in PHP. Recently I migrated from MySQL to CouchDB, and I think it was generally a good idea; it's working perfectly so far. I don't need views and such; basically, I just fetch documents by ID.
However, I have some doubts about performance. A conversation between two users is stored in a single document: between A and B there is one document, between C and B another, and so on.
I never delete logs. When a conversation is initiated between two users, I decode the stored document with json_decode and print the recent conversation history to the users. When one of them writes something new, I append the new chat line to the end of the array (obtained from the document), re-encode the array to JSON, and finally update the document.
Am I doing this right? What's the best practice for storing such large arrays in NoSQL databases?
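The read-modify-write cycle described above can be sketched like this (in JavaScript rather than PHP; the document shape and ID are illustrative, and the HTTP GET/PUT steps against CouchDB are only noted as comments):

```javascript
// 1. GET /db/alice_bob -> raw JSON body of the conversation document
const raw = '{"_id":"alice_bob","_rev":"1-abc","messages":["alice: Hi, Bob!"]}';

// 2. Decode the whole document (json_decode in PHP, JSON.parse here)
const doc = JSON.parse(raw);

// 3. Append the new chat line to the end of the array
doc.messages.push('bob: Hi, Alice!');

// 4. Re-encode the whole array and PUT /db/alice_bob (with the current _rev)
const body = JSON.stringify(doc);
```

Note that every new message re-encodes and rewrites the entire conversation, which is the performance concern here.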
I'd model it differently: use a document for each thing said, e.g. {"from":"foo","to":"bar","text":"hey there"}. That way you only ever create new documents, and every document stays very small.
Add a timestamp and then use a view keyed on that timestamp to reconstruct the conversation.
You'll want to use the server's time to ensure proper ordering, so I'd recommend updating via an update handler (http://wiki.apache.org/couchdb/Document_Update_Handlers) and adding the timestamp there.
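A minimal sketch of the two design-document functions this implies, written as plain functions so their shapes can be checked locally (the field names and handler name are assumptions; in CouchDB they would live in a design document under views and updates):

```javascript
// View map function: key each per-message document by its timestamp,
// so querying the view reconstructs the conversation in order.
const mapFn = function (doc) {
  if (doc.timestamp && doc.from && doc.to) {
    emit(doc.timestamp, { from: doc.from, text: doc.text });
  }
};

// Update handler: stamp the server's time on the incoming message,
// so ordering never depends on client clocks.
const addMessage = function (doc, req) {
  var msg = JSON.parse(req.body);
  msg.timestamp = new Date().toISOString(); // server time, not client time
  return [msg, 'stored'];
};
```

A client would POST the message body to the handler's URL and later read the conversation back from the view, optionally restricted with startkey/endkey on the timestamp.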
A workaround is to occasionally bundle old messages into CouchDB attachments. Those will not be visible when you query by document ID.
For example, a conversation document:
{ "_id": "alice_bob"
, "_rev": "123-abcdef"
, "messages":
[ "alice: Hi, Bob!"
, "alice: Are you there?"
, "bob: Yes, what's up?"
, // ... etc.
, "bob: Thanks!"
]
}
For example, if the conversation grows beyond 200 messages, split the array in two:
- the 150 oldest messages ("alice: Hi, Bob!" and the next 149)
- the 50 latest messages (the preceding 49, plus "bob: Thanks!")
Archive the old messages in an attachment, perhaps with a timestamp.
{ "_id": "alice_bob"
, "_rev": "123-abcdef"
, "_attachments":
{ "2011-09-19T17:29:17.293Z.json":
{ "content_type": "application/json"
, "data": /* 150 message JSON string goes here */
}
}
, "messages": [ /* latest 50 messages go here */ ]
}
When you fetch /db/alice_bob you will not get the attachment data; however, you can fetch (and even delete) attachments directly at /db/alice_bob/2011-09-19T17:29:17.293Z.json.
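The archiving step described above can be sketched as a small function (thresholds follow the example; the function name is hypothetical, and the actual attachment upload is a separate PUT, noted only in a comment):

```javascript
// Split a conversation document's messages: keep the latest `keep`
// messages inline, and return the older ones packaged as an attachment
// stub. Returns null when the conversation is still small enough.
function archiveOldMessages(doc, keep = 50, threshold = 200) {
  if (doc.messages.length <= threshold) return null; // nothing to archive

  const old = doc.messages.slice(0, doc.messages.length - keep);
  doc.messages = doc.messages.slice(-keep);

  // The archived chunk would then be PUT as an attachment named after
  // the current time, e.g. /db/alice_bob/2011-09-19T17:29:17.293Z.json
  return {
    name: new Date().toISOString() + '.json',
    content_type: 'application/json',
    data: JSON.stringify(old)
  };
}
```

Run periodically as a background job, this keeps the hot document small while the full history stays retrievable from the attachments.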
Note that this is mostly a workaround. The advantage is that you will not have to change your PHP code at all (archiving messages can be a background process). But in the long term, Robert's technique is more scalable and correct.