开发者

Struggling with SQL BCP (uniqidentifier, default values on columns...)

EDIT: My only pending issue is c) (True and False on file, bit on database, I can't change neither the file nor the database scheme, there's hundreds of terabytes I can't touch).

The system receives a file (hundreds of thousands of them, actually) with a certain format. Things are: a) First type is a uniqidentifier (more on this later) b) On the database, the table's first 4 values are generated by the database (they are related to dates), meaning that those 4 values are not found on the files (all the rest are -and are in order-, even if it's always their representation as text or they are empty) c) Bit values are represented with a False/True on the file.

So, the issue for 1 is that in the text file I receive as input, the uniqidentifier is using brackets. When I tried to generate the file with the format nul options using the bcp command tool, it would make it a sqlchar with 37 characters (which makes no sense to me, since it would either be 36 or 38).

Row separator is "+++\r\n", column separator is "©®©".

How would I go about generating the format files? I've been stuck with this for some time, I never used bcp before and errors I've got don't really tell much ("Unexpected EOF encountered in BCP data-file")

Am I supposed to specify all the columns in the format file or just the ones I desire to read from the files I get?

Thanks!

NOTE: I can't provide the SQL schema since it's for the company I work for. But it's pretty much: smalldate, tinyint tinyint tinyint (this four are generated by the db), uniqidentifier, chars, chars, more varchars, some bits, more varchars, some nvarchar. ALL values, except for those generated by the d开发者_运维知识库b, accept null.

My current problem is with the skipping the first 4 columns.

http://msdn.microsoft.com/en-us/library/ms179250(v=SQL.105).aspx

I followed that guide but somehow it's not working. Here's the changes (I'm just hard-changing column names to keep privacy of the project, even if it sounds stupid)

This is the one generated with bcp (with format nul -c) -note I put it as link 'cause it's not that short- http://pastebin.com/4UkpPp1n

The second one, which is supposed to do the same but ignoring the first 4 columns is in the next pastebin: http://pastebin.com/Lqj6XSbW

Yet it is not working. The error is "Error = [Microsoft][SQL Native Client]The number of fields provided for bcp operation is less than the number of columns on the server.", which was supposed to be the purpose of all that.

Any help will be greatly appreciated.


I'd create a new table with a CHAR(38) for the GUID. Import your data into this staging table, then translate it with CAST(SUBSTRING(GUID, 2, 36) AS UNIQUEIDENTIFIER) to import the staging data into your permanent table. This approach also works well for dates in odd formats, numbers with currency symbols, or generally any kind of poorly-formatted input.

BCP format files are a little touchy, but fundamentally aren't too complicated. If that part continues to give you trouble, one option is to import the whole row as a single VARCHAR(1000) field, then split it up within SQL - if you're comfortable with SQL text processing that is.

Alternately, if you are familiar with some other programming language, like Perl or C#, you can create a script to pre-process your inputs into a more friendly form, like tab-delimited. If you're not familiar with some other programming language, I suggest you pick one and get started! SQL is a great language, but sometimes you need a different tool; it's not great for text processing.

If you're familiar with C#, here's my code to generate a format file. No one gets to make fun of my Whitestone indentation :P

private static string  CreateFormatFile(string filePath, SqlConnection connection, string tableName, string[] sourceFields, string[] destFields, string fieldDelimiter, string fieldQuote)
    {
    string         formatFilePath = filePath + ".fmt";
    StreamWriter   formatFile     = null;
    SqlDataReader  data           = null;
    try
        {
        // Load the metadata for the destination table, so we can look up fields' ordinal positions
        SqlCommand  command = new SqlCommand("SELECT TOP 0 * FROM " + tableName, connection);
                    data    = command.ExecuteReader(CommandBehavior.SchemaOnly);
        DataTable   schema  = data.GetSchemaTable();

        Dictionary<string, Tuple<int, int>>  metadataByField = new Dictionary<string, Tuple<int, int>>();
        foreach (DataRow row in schema.Rows)
            {
            string  fieldName = (string)row["ColumnName"];
            int     ordinal   = (int)row["ColumnOrdinal"] + 1;
            int     maxLength = (int)row["ColumnSize"];
            metadataByField.Add(fieldName, new Tuple<int, int>(ordinal, maxLength));
            }

        // Begin the file, including its header rows
        formatFile = File.CreateText(formatFilePath);
        formatFile.WriteLine("10.0");
        formatFile.WriteLine(sourceFields.Length);

        // Certain strings need to be escaped to use them in a format file
        string  fieldQuoteEscaped     = fieldQuote     == "\"" ? "\\\"" : fieldQuote;
        string  fieldDelimiterEscaped = fieldDelimiter == "\t" ? "\\t"  : fieldDelimiter;

        // Write a row for each source field, defining its metadata and destination field
        for (int i = 1;  i <= sourceFields.Length;  i++)
            {
            // Each line contains (separated by tabs): the line number, the source type, the prefix length, the field length, the delimiter, the destination field number, the destination field name, and the collation set
            string  prefixLen   = i != 1 || fieldQuote == null ? "0" : fieldQuote.Length.ToString();
            string  fieldLen;
            string  delimiter   = i < sourceFields.Length ? fieldQuoteEscaped + fieldDelimiterEscaped + fieldQuoteEscaped : fieldQuoteEscaped + @"\r\n";
            string  destOrdinal;
            string  destField   = destFields[i - 1];
            string  collation;
            if (destField == null)
                {
                // If a field is not being imported, use ordinal position zero and a placeholder name
                destOrdinal = "0";
                fieldLen    = "32000";
                destField   = "DUMMY";
                collation   = "\"\"";
                }
            else
                {
                Tuple<int, int>  metadata;
                if (metadataByField.TryGetValue(destField, out metadata) == false)  throw new ApplicationException("Could not find field \"" + destField + "\" in table \"" + tableName + "\".");
                destOrdinal = metadata.Item1.ToString();
                fieldLen    = metadata.Item2.ToString();
                collation   = "SQL_Latin1_General_CP1_CI_AS";
                }
            string  line = String.Join("\t", i, "SQLCHAR", prefixLen, fieldLen, '"' + delimiter + '"', destOrdinal, destField, collation);
            formatFile.WriteLine(line);
            }

        return formatFilePath;
        }
    finally
        {
        if (data       != null)  data.Close();
        if (formatFile != null)  formatFile.Close();
        }
    }

There was some reason I didn't use a using block for the data reader at the time.


It seems as if it is not possible for BCP to understand True and False as bit values. It's better to either go with SSIS or first replace the contents of the text (not a good idea to create views or anything like that, it is more overhead).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜