Practical problems of transforming data into a data warehouse

I need to explain the practical problems that might be encountered when transforming transactional (and other) data from diverse sources into the Data Warehouse. According to my knowledge this is about cleansing and scrubbing data. If anyone knows of any practical problems, please help me. Thanks for your help.


That's a broad topic, but I'll offer a few good starting points.

For starters, think about history. If a transaction updates some data point, do you need to apply that retroactively, or do you need to remember what the value was at any given point in time? For example, suppose you have a monthly report of customers by city, and one of your customers moves. How should the DW reflect that?
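One common way to keep the old value is to version the row instead of overwriting it (Kimball calls this a Type 2 slowly changing dimension). Here is a minimal sketch of that idea; the `CustomerCityRow` class, its field names, and the sample customer are all made up for illustration, and a real warehouse would do this in its dimension tables rather than in a Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerCityRow:
    # Hypothetical dimension row: one row per (customer, period of residence).
    cust_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


def record_move(history: List[CustomerCityRow], cust_id: int,
                new_city: str, moved_on: date) -> None:
    """Close the customer's current row and append a new current row."""
    for row in history:
        if row.cust_id == cust_id and row.valid_to is None:
            if row.city == new_city:
                return  # no change, keep the existing current row
            row.valid_to = moved_on
    history.append(CustomerCityRow(cust_id, new_city, moved_on))


def city_as_of(history: List[CustomerCityRow], cust_id: int,
               as_of: date) -> Optional[str]:
    """Return the city the customer lived in on the given date, if known."""
    for row in history:
        started = row.valid_from <= as_of
        not_ended = row.valid_to is None or as_of < row.valid_to
        if row.cust_id == cust_id and started and not_ended:
            return row.city
    return None


history = [CustomerCityRow(42, "Boston", date(2023, 1, 1))]
record_move(history, 42, "Denver", date(2023, 6, 15))
```

With this layout, a March report can still ask `city_as_of(history, 42, date(2023, 3, 1))` and get the city the customer lived in at the time, while a July report sees the new one. If you only ever need the latest value, overwriting in place is simpler.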

Think about data acceptance. Is every input row a good input? For example, if you're dealing with web data, there are crawlers and spammers that you might not want to count the same as you count user traffic.
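As a rough illustration of an acceptance step, the sketch below splits incoming web rows into accepted and rejected piles based on the user agent. The marker list, the `user_agent` field name, and the decision to reject rows with no user agent are assumptions for the example; real crawler detection is considerably messier.

```python
# Hypothetical substrings that suggest automated traffic; a real list would
# be longer and maintained separately.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_crawler(user_agent):
    """Guess whether a user-agent string belongs to automated traffic."""
    ua = (user_agent or "").lower()
    # Assumption: a missing user agent is treated as suspicious.
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def split_rows(rows):
    """Return (accepted, rejected) so rejected rows can be audited later."""
    accepted, rejected = [], []
    for row in rows:
        if is_probable_crawler(row.get("user_agent")):
            rejected.append(row)
        else:
            accepted.append(row)
    return accepted, rejected


rows = [
    {"url": "/home", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"url": "/home", "user_agent": "Googlebot/2.1"},
    {"url": "/cart", "user_agent": None},
]
accepted, rejected = split_rows(rows)
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to check later whether the filter was too aggressive.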

Think about data synchronization. Do all your inputs use the same keys? Do you know how to translate between them? Does Team A mean the same thing by "cust_id" as Team B does? A project glossary is very helpful here.
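One way to handle mismatched keys is a cross-reference table that maps each (source system, source id) pair to a single warehouse key. The sketch below assumes two hypothetical sources, `billing` and `crm`, with invented ids; rows whose key cannot be translated are set aside instead of being loaded with a guess.

```python
# Hypothetical cross-reference: (source system, source id) -> warehouse key.
KEY_MAP = {
    ("billing", "B-1001"): 1,
    ("crm", "7734"): 1,  # same customer as billing B-1001
    ("crm", "7801"): 2,
}


def to_warehouse_key(source, source_id):
    """Translate a source-specific cust_id, or return None if unknown."""
    return KEY_MAP.get((source, source_id.strip()))


def translate(rows, source):
    """Attach dw_cust_key to each row; collect rows that cannot be mapped."""
    mapped, unmapped = [], []
    for row in rows:
        key = to_warehouse_key(source, row["cust_id"])
        if key is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "dw_cust_key": key})
    return mapped, unmapped


crm_rows = [{"cust_id": "7734 "}, {"cust_id": "9999"}]
mapped, unmapped = translate(crm_rows, "crm")
```

The unmapped pile is usually where the glossary conversation starts: someone has to decide whether `9999` is a new customer or a key nobody documented.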

Think about localization. Are your inputs all in the same time zone? Do they all use the same calendar system? Do you need to handle Unicode?
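For the time zone and Unicode questions, a small normalization step at load time goes a long way. The sketch below assumes the sources send ISO 8601 timestamps with an explicit UTC offset and that the warehouse stores everything in UTC; the function names are made up for the example. It uses only the Python standard library.

```python
import unicodedata
from datetime import datetime, timezone


def to_utc(timestamp_text):
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp_text)
    if parsed.tzinfo is None:
        # Refuse to guess: a timestamp without an offset is ambiguous.
        raise ValueError(f"timestamp has no UTC offset: {timestamp_text!r}")
    return parsed.astimezone(timezone.utc)


def normalize_text(value):
    """Put text into Unicode NFC form so equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", value).strip()


loaded_at = to_utc("2024-03-10T01:30:00-05:00")
name = normalize_text("Cafe\u0301 ")  # 'e' followed by a combining accent
```

Rejecting timestamps that carry no offset is a design choice: it pushes the question "which zone did you mean?" back to the source team instead of burying a wrong guess in the warehouse.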

Think about reporting. Are the data you're capturing able to answer the questions people will ask of the DW? If not, how can you capture data that can?

Think about presentation. Should you be showing customers the same data you're using for internal reporting? Does finance need to see a different slice of the data than marketing?

This really only scratches the surface of the issues that come up on a major DW project. I would refer you to Ralph Kimball's assorted books on data warehousing for a more in-depth discussion of problems and solutions. Hope this helps you get started.


You give the answer in your question.

According to my knowledge this is about cleansing and scrubbing data.

And you are correct. Cleansing data means that you have a company-wide list of clean element attributes, and a mapping that changes the unclean elements into clean elements.
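To make that concrete, here is a minimal sketch with a hypothetical "state" element: a clean list of allowed values plus a mapping from known unclean spellings to clean ones. The values are invented for the example; in practice both tables come out of the company-wide agreement, not out of the code.

```python
# Hypothetical company-wide clean list for one element.
CLEAN_STATES = {"CA", "NY"}

# Hypothetical mapping from unclean spellings seen in source data.
STATE_MAPPING = {
    "calif.": "CA",
    "california": "CA",
    "n.y.": "NY",
    "new york": "NY",
}


def cleanse_state(raw):
    """Return the clean value, or None if the input cannot be mapped."""
    candidate = raw.strip()
    if candidate.upper() in CLEAN_STATES:
        return candidate.upper()
    return STATE_MAPPING.get(candidate.lower())


cleaned = [cleanse_state(value) for value in ["ca", " Calif. ", "New York", "Kalifornia"]]
```

Values that come back as `None` are the ones the mapping does not cover yet, so they need a person to decide what they mean.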

Processing the data against the clean element attributes is a piece of cake compared to creating the company-wide list of clean element attributes.

You have to get people from different departments to agree on what data to warehouse, and to agree on what each element means. This is a difficult sociological problem. It's not a terribly hard technical problem.

Good luck getting your company-wide list of clean element attributes.
