开发者

Updating a local sqlite db that is used for local metadata & caching from a service?

I've searched through the site and haven't found a question/answer that quite answer my question, the closest one I found was: Syncing objects between two disparate systems best approach.

Anyway to begun, because there is no RSS feeds available, I'm screen scraping a webpage, hence it does a fetch then it goes through the webpage to scrap out all of the information that I'm interested in and dumps that information into a sqlite database so that I ca开发者_运维技巧n query the information at my leisure without doing repeat fetching from the website.

However I'm also storing various metadata on the data itself that is stored in the sqlite db, such as: have I looked at the data, is the data new/old, bookmark to a chunk of data (Think of it as a collection of unrelated data, and the bookmark is just a pointer to where I am in processing/reading of the said data).

So right now my current problem is trying to figure out how to update the local sqlite database with new data and/or changed data from the website in a manner that is effective and straightforward.

Here's my current idea:

  1. Download the page itself
  2. Create a temporary table for the parsed data to go into
  3. Do a comparison between the official and the temporary table and copy updates and/or new information to the official table

This process seems kind of complicated because I would have to figure out how to determine if the data in the temporary table is new, updated, or unchanged. So I am wondering if there isn't a better approach or if anyone has any suggestion on how to architecture/structure such system?

Edit 1: I'm not sure where to put the additional information, in an comment or as an edit, so I'm going to add it here.

This expands a bit on the metadata in regards of bookmarking, basically the data source can create new data/addition to the current data, so one reason why I was thinking of doing the temporary table idea was so that I would be able to determine if an data source that has been "bookmarked" has any new data or not.


Is it really important to determine if the data in the temporary table is new, updated or unchanged? Do you really need to keep an history of the changes?

NO: don't use the temporary table but just mark as old (timestamp) your old records, don't do updates, and just insert your new data.

YES: your idea seems correct to me but all depends on how much data you need to process each time; i don't think it is feasible with a large amount of data.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜