Struggling with SQL BCP (uniqidentifier, default values on columns...)
EDIT: My only pending issue is c) (True and False on file, bit on database, I can't change neither the file nor the database scheme, there's hundreds of terabytes I can't touch).
The system receives a file (hundreds of thousands of them, actually) with a certain format. Things are: a) First type is a uniqidentifier (more on this later) b) On the database, the table's first 4 values are generated by the database (they are related to dates), meaning that those 4 values are not found on the files (all the rest are -and are in order-, even if it's always their representation as text or they are empty) c) Bit values are represented with a False/True on the file.
So, the issue for 1 is that in the text file I receive as input, the uniqidentifier is using brackets. When I tried to generate the file with the format nul options using the bcp command tool, it would make it a sqlchar with 37 characters (which makes no sense to me, since it would either be 36 or 38).
Row separator is "+++\r\n", column separator is "©®©".
How would I go about generating the format files? I've been stuck with this for some time, I never used bcp before and errors I've got don't really tell much ("Unexpected EOF encountered in BCP data-file")
Am I supposed to specify all the columns in the format file or just the ones I desire to read from the files I get?
Thanks!
NOTE: I can't provide the SQL schema since it's for the company I work for. But it's pretty much: smalldate, tinyint tinyint tinyint (this four are generated by the db), uniqidentifier, chars, chars, more varchars, some bits, more varchars, some nvarchar. ALL values, except for those generated by the d开发者_运维知识库b, accept null.
My current problem is with the skipping the first 4 columns.
http://msdn.microsoft.com/en-us/library/ms179250(v=SQL.105).aspx
I followed that guide but somehow it's not working. Here's the changes (I'm just hard-changing column names to keep privacy of the project, even if it sounds stupid)
This is the one generated with bcp (with format nul -c) -note I put it as link 'cause it's not that short- http://pastebin.com/4UkpPp1n
The second one, which is supposed to do the same but ignoring the first 4 columns is in the next pastebin: http://pastebin.com/Lqj6XSbW
Yet it is not working. The error is "Error = [Microsoft][SQL Native Client]The number of fields provided for bcp operation is less than the number of columns on the server.", which was supposed to be the purpose of all that.
Any help will be greatly appreciated.
I'd create a new table with a CHAR(38)
for the GUID. Import your data into this staging table, then translate it with CAST(SUBSTRING(GUID, 2, 36) AS UNIQUEIDENTIFIER)
to import the staging data into your permanent table. This approach also works well for dates in odd formats, numbers with currency symbols, or generally any kind of poorly-formatted input.
BCP format files are a little touchy, but fundamentally aren't too complicated. If that part continues to give you trouble, one option is to import the whole row as a single VARCHAR(1000)
field, then split it up within SQL - if you're comfortable with SQL text processing that is.
Alternately, if you are familiar with some other programming language, like Perl or C#, you can create a script to pre-process your inputs into a more friendly form, like tab-delimited. If you're not familiar with some other programming language, I suggest you pick one and get started! SQL is a great language, but sometimes you need a different tool; it's not great for text processing.
If you're familiar with C#, here's my code to generate a format file. No one gets to make fun of my Whitestone indentation :P
private static string CreateFormatFile(string filePath, SqlConnection connection, string tableName, string[] sourceFields, string[] destFields, string fieldDelimiter, string fieldQuote)
{
string formatFilePath = filePath + ".fmt";
StreamWriter formatFile = null;
SqlDataReader data = null;
try
{
// Load the metadata for the destination table, so we can look up fields' ordinal positions
SqlCommand command = new SqlCommand("SELECT TOP 0 * FROM " + tableName, connection);
data = command.ExecuteReader(CommandBehavior.SchemaOnly);
DataTable schema = data.GetSchemaTable();
Dictionary<string, Tuple<int, int>> metadataByField = new Dictionary<string, Tuple<int, int>>();
foreach (DataRow row in schema.Rows)
{
string fieldName = (string)row["ColumnName"];
int ordinal = (int)row["ColumnOrdinal"] + 1;
int maxLength = (int)row["ColumnSize"];
metadataByField.Add(fieldName, new Tuple<int, int>(ordinal, maxLength));
}
// Begin the file, including its header rows
formatFile = File.CreateText(formatFilePath);
formatFile.WriteLine("10.0");
formatFile.WriteLine(sourceFields.Length);
// Certain strings need to be escaped to use them in a format file
string fieldQuoteEscaped = fieldQuote == "\"" ? "\\\"" : fieldQuote;
string fieldDelimiterEscaped = fieldDelimiter == "\t" ? "\\t" : fieldDelimiter;
// Write a row for each source field, defining its metadata and destination field
for (int i = 1; i <= sourceFields.Length; i++)
{
// Each line contains (separated by tabs): the line number, the source type, the prefix length, the field length, the delimiter, the destination field number, the destination field name, and the collation set
string prefixLen = i != 1 || fieldQuote == null ? "0" : fieldQuote.Length.ToString();
string fieldLen;
string delimiter = i < sourceFields.Length ? fieldQuoteEscaped + fieldDelimiterEscaped + fieldQuoteEscaped : fieldQuoteEscaped + @"\r\n";
string destOrdinal;
string destField = destFields[i - 1];
string collation;
if (destField == null)
{
// If a field is not being imported, use ordinal position zero and a placeholder name
destOrdinal = "0";
fieldLen = "32000";
destField = "DUMMY";
collation = "\"\"";
}
else
{
Tuple<int, int> metadata;
if (metadataByField.TryGetValue(destField, out metadata) == false) throw new ApplicationException("Could not find field \"" + destField + "\" in table \"" + tableName + "\".");
destOrdinal = metadata.Item1.ToString();
fieldLen = metadata.Item2.ToString();
collation = "SQL_Latin1_General_CP1_CI_AS";
}
string line = String.Join("\t", i, "SQLCHAR", prefixLen, fieldLen, '"' + delimiter + '"', destOrdinal, destField, collation);
formatFile.WriteLine(line);
}
return formatFilePath;
}
finally
{
if (data != null) data.Close();
if (formatFile != null) formatFile.Close();
}
}
There was some reason I didn't use a using
block for the data reader at the time.
It seems as if it is not possible for BCP to understand True and False as bit values. It's better to either go with SSIS or first replace the contents of the text (not a good idea to create views or anything like that, it is more overhead).
精彩评论