开发者

Convert fixed length text file into SQL

I need to convert a fixed length text file into a MySQL Table.

My biggest problem is that multiple cells are contained on each line, and this is how the file is sent to me, and the main reason why I want to convert it.

The cells are all of a specific length; however all are included on the one line.

For example the first 3 positions (1 - 3) of a line are the IRT, the next three positions (4 - 6) are the IFTC the next 5 positions (7 - 11) are the FSC, etc.

As the file ca开发者_如何学JAVAn contain up to 300 lines of records, I need an easy way to import it straight into the SQL Tables.

I have been searching the net for hours trying to find a solution, however without comma separation I haven't been able to find a working solution yet.

I would like to code this solution in PHP, if possible as well. And am willing to do the long yards of working out how to use the function required to do this if someone could give me the function name, I don't expect people to write my code out for me.


File:

testfile.txt (4 rows)

AAA11111xx
BBB22222yy
CCC33333zz
DDD 444 aa

Table:

CREATE TABLE TestLoadDataInfile
( a VARCHAR(3)
, b INT(5)
, c CHAR(2)
) CHARSET = latin1;

Code:

LOAD DATA INFILE 'D:\\...\\testfile.txt'
INTO TABLE TestLoadDataInfile
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\r\n' ;

Result:

mysql> SELECT * FROM TestLoadDataInfile ;
+-----+-------+----+
| a   | b     | c  | 
+-----+-------+----+ 
| AAA | 11111 | xx | 
| BBB | 22222 | yy | 
| CCC | 33333 | zz | 
| DDD |   444 | aa | 
+-----+-------+----+ 

The LOAD DATA INFILE documentation is not very good at this point (fixed-size fields). Here's the related parts:

  • If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), a fixed-row (nondelimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. For TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.

LINES TERMINATED BY is still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to ''. In this case, the text file must contain all fields for each row.

Fixed-row format also affects handling of NULL values, as described later. Note that fixed-size format does not work if you are using a multi-byte character set.


NULL handling

With fixed-row format (which is used when FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty), NULL is written as an empty string. Note that this causes both NULL values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.


Some cases are not supported by LOAD DATA INFILE:

  • Fixed-size rows (FIELDS TERMINATED BY and FIELDS ENCLOSED BY both empty) and BLOB or TEXT columns.

  • User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.


You probably won't like it very much, but there really isn't an easy way to do what you're after. A long time ago (circa 1991), I wrote a tool, DBLDFMT (for 'database load format') to deal with such fixed-length, non-delimited files. It is tuned to generating the load format preferred by Informix databases (so it uses a pipe symbol by default to separate the fields, but of course you can tune that with a command line option or an environment variable). It can, however, create delimited data which you can then process more normally, probably using the LOAD DATA INFILE command.

Contact me by email (see my profile) if you want the source code for DBLDFMT. (The current version, 3.17 from 2008, does not have direct support for CSV output. It would not be hard to add it. You can, more or less, achieve the required effect, but it should be a lot easier than it is.)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜