Need suggestion on reading huge XLs with validating the data
I have a requirement where client uploads spread sh开发者_C百科eet containing thousands of rows. different columns of a row have different data type and the data must comply with some validation rules.e.g.
below is a sample file structure:
(Header - Colume_name,Variable_type,field_size,i/p mask,required_field,validation_Text)
(P/N,String,20,none,yes,none)
(qty,Integer,10,none,yes,none)
(Ship_From,String,20,none,yes,none)
(Request_Date,Date,MM/DD/YY ,yes,none)
(Status,String,10,none,yes,Failed OR Qualified)
while reading the xl sheet,I need to validate the data against the above constraints and in case of any error in the data, I need to store the error and inform the customer.
Please let me know the best possible approach maintaining the performance of the system.
Any early responses will be much appreciated.
Thanks, Ashish Gupta
If I understand your question, you would like to read a file of validation rules such as the sample above. You would like to compile the rules such that they would read a large Excel spreadsheet (or is it a CSV file?) and perhaps print out a message for every line that is deemed invalid.
It seems like a two-pass process: 1) Validation and compilation of the validation file and 2) Compilation of the output of pass 1 and applying it to the Excel file.
You could approach the field validation in any of several ways, depending on your skills and inclinations.
- Develop VBA code to read the validation file. Then write a separate macro to validate each line
- Write a parser in your favorite language that reads in the validation file. Add some columns to the read-in Excel spreadsheet with fields such as Column name (e.g., Qty), type (e.g., Integer), required (e.g., true). Then have Excel or OpenOffice highlight invalid lines
- Have lex and yacc generate Java or C++ a parser to scan the validation file and output BNF. Then have another lex and yacc file read in the output from the previous step and have it validate the Excel file.
You indicated POI
on your tag, so I'm thinking that you will want to generate Java code.
Of course, you could also write a one-time program to do all of this meta-compiling and compiling, but this would be a brittle process.
If you have any freedom to specify the validation file, you might want to make it an .XSD file because there are automated tools to make its scanning much simpler. There are tools to determine whether the XML file were valid, as well as compilers that can turn it into Java.
(A thought came to mind when I was reading your validation file. How will you separate one part from another? For example, if you read in P/N, Qty, Request_Date, Ship_From, Status, P/N
, is that one part with two P/N or one complete part and one with several required parts missing?)
My first thought was to have Excel do this validation, as Rajah seems to suggest too. Built-in functionality and/or VBA should be able to handle these requirements. If you need to handle this in Java, I'd go for the XML approach.
Cheers, Wim
I heard about a friend validating his spreadsheets using JBOSS DROOLS: http://www.jboss.org/drools
I have a XML based excel validator built on top of POI. You just need to specify which data you need to validate in excel, the java api does the validation & returns the error message if not valid.
Eg:
<data rowNumber="2" columnNumber="2" dataType="string" > <mandatory errorMessage="Name label is missing">Y <value ignoreCase="true" errorMessage="Name label value is not matching.">Name</value>
The above is a simple validation for a plain text field, it has additional validations too, please let me know if you are intrested?
精彩评论