Data cleansing: tools for user-entered database data

We have a database with some redundant, bad data. For example, some article names differ only in upper/lower case, others have accent problems, others are missing a letter, and so on. The idea is to merge the database records that are actually the same.

Are there any good tools out there that make it easy to clean up a database? Ideally this would not be done automatically but would require user confirmation.


There are quite a few tools out there for data cleansing, and even more companies that offer data cleansing as a service.

I have performed data cleansing for several large corporations, and it is neither an easy task nor as straightforward as it seems. De-duplicating data is also fraught with all sorts of issues that do not become apparent until you have begun the exercise.

IMHO, if your legacy data is in a relatively poor state and you have no in-house expertise in this (quite specialised) area, I'd look into employing a third party to do this for you as they are likely to perform it faster and at a lower total cost than starting from scratch.

If you want to build the in-house skills to do this, I have done a couple of quick Google searches and seen many software packages on offer. You might want to compare their relative strengths for the specific types of data you are looking to cleanse, as some will be better in certain areas than others.
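To get a feel for what the in-house route involves, the kinds of differences mentioned in the question (case, accents, a missing letter) can be flagged for manual review with nothing but the Python standard library. This is only a minimal sketch, not a substitute for a dedicated tool: the function names `normalize` and `candidate_duplicates`, the sample article names and the 0.85 similarity threshold are all assumptions made up for illustration.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Fold case, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def candidate_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, score) pairs that look like the same record.

    Nothing is merged here: the pairs are only suggestions for a person
    to confirm or reject. The threshold is an arbitrary starting value.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    # Most similar pairs first, so the obvious merges are reviewed first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical article names, standing in for a column read from the database.
    articles = ["Café Crème", "cafe creme", "Cafe Crem", "Green Tea"]
    for a, b, score in candidate_duplicates(articles):
        print(f"{score:.2f}  {a!r} <-> {b!r}  (merge? y/n)")
```

Comparing every pair is quadratic in the number of records, so this only works as-is for small tables; on anything larger you would first need to group records (for example by first letter or by normalized prefix), which is one of the things the dedicated packages handle for you.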

Hope this helps, Ollie.
