Is there a standard DSL for data integrity validation?

I am faced with CSV files which come from clients and which can contain hundreds of thousands of rows. Is there a DSL (or widely used library in Java or Python) which can efficiently run calculations on this data, applying various rules to issue warnings and errors (user-configurable, of course)?


Can you imagine a DSL that would do this? What would the rules look like?
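To make the question concrete, here is a minimal sketch of what user-configurable rules could look like, expressed as plain Python predicates rather than a separate DSL. All names here (`Rule`, `check_rows`, the two sample rules) are hypothetical, not from any existing library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    level: str                      # "error" or "warning"
    check: Callable[[dict], bool]   # returns True if the row passes

# Example rule set; in practice this would be loaded from user config.
RULES = [
    Rule("amount is numeric", "error",
         lambda row: row["amount"].replace(".", "", 1).isdigit()),
    Rule("email looks valid", "warning",
         lambda row: "@" in row["email"]),
]

def check_rows(rows):
    """Yield (row_number, level, rule_name) for every failed rule."""
    for i, row in enumerate(rows, start=1):
        for rule in RULES:
            if not rule.check(row):
                yield i, rule.level, rule.name
```

A rule here is just a named predicate with a severity, which is roughly the shape most validation DSLs end up encoding.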

Several months ago I worked on such a problem; in the end it turned out to be harder than it first seemed.

  1. The first step was obvious: all rows were parsed and placed into special data structures so I could work with them; the ones with missing fields were thrown out.

  2. Every row had a current "strategy" property and a list of possible strategies (Default action, Ignore, Force, Overwrite, etc.).

  3. At first all rows had their "strategy" property set to "Default".

  4. A row processor checked whether the operation was possible, building a list of errors and warnings.

  5. After processing and analyzing the results, every row that caused problems was given a list of alternative strategies the user could choose from.

So, if there were any problems, the user could change a row's strategy (or simply use "Ignore") and go back to step 4.
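The strategy loop above (steps 3 to 5) can be sketched as follows; the names `Strategy`, `Row`, and `process` are hypothetical illustrations, not from any library.

```python
from enum import Enum

class Strategy(Enum):
    DEFAULT = "Default"
    IGNORE = "Ignore"
    FORCE = "Force"
    OVERWRITE = "Overwrite"

class Row:
    def __init__(self, data):
        self.data = data
        self.strategy = Strategy.DEFAULT   # step 3: everything starts as Default
        self.alternatives = []             # filled in step 5
        self.errors = []

def process(rows, validate):
    """Step 4: run validation; step 5: offer alternatives on failure.

    `validate` is a user-supplied callable returning a list of error
    strings for a row (empty list means the row is fine).
    """
    for row in rows:
        if row.strategy is Strategy.IGNORE:
            row.errors = []                # user opted this row out
            continue
        row.errors = [] if row.strategy is Strategy.FORCE else validate(row)
        if row.errors:
            row.alternatives = [Strategy.IGNORE, Strategy.FORCE,
                                Strategy.OVERWRITE]
    return [r for r in rows if r.errors]   # rows still blocking the import
```

After each pass the UI would show the returned problem rows; the user picks an alternative strategy per row and `process` is run again, matching the "go back to step 4" loop.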

So I'm curious: at which step would such a DSL operate?
