Tuesday, June 12, 2012

I was wrong: Text qualifiers do not count toward column width in SSIS 2008

I know, what?

The Error

I'm importing from a text qualified, pipe delimited flat file with a the following columns (Name (Datatype, maxlength)):

  • CustomerIdentifier (WSTR, 120)
  • GroupName (WSTR, 120)
  • UserName (WSTR, 100)
  • UserStatus (WSTR, 15)
  • UserType (WSTR, 20)
My package was chugging along, until it failed.

Error: 2012-06-12 09:38:17.73
   Code: 0xC02020A1
   Source: InsertUserRecords _ Pipe Flat File Source [870]
   Description: Data conversion failed. The data conversion for column "UserName" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
End Error
Error: 2012-06-12 09:38:17.73
   Code: 0xC020902A
   Source: InsertUserRecords _ Pipe Flat File Source [870]
   Description: The "output column "UserName" (888)" failed because truncation occurred, and the truncation row disposition on "output column "UserName" (888)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
End Error
Error: 2012-06-12 09:38:17.74
   Code: 0xC0202092
   Source: InsertUserRecords _ Pipe Flat File Source [870]
   Description: An error occurred while processing file "C:\Scratch\DataLoad\User_USERDATA.txt" on data row 25519.
End Error

I highlighted the important bits.

I opened up the offending file and read the offending line (which I sanitized before posting here).

"0DEADBEEFFACE2AADEADBEEFCAFEFACE78BEEF87"|"Group"|"S99S99S99S999S99S99S0S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99S99"|"Frequent"|"2"|

I highlighted the offending column. The text (from the first 'S' to the last '0') is 99 characters. The column width is 100, so it should have fit. I loaded it up in a hex editor and went through every character to see if there were any spurious nonprinting characters. Nope.

The Problem

Out of curiosity, I deleted a character and tried to import it into a test database. That worked.

Let me reiterate: It worked with 98 characters, with a column width of 100. If you count the text qualifiers (the " character on either side of the string), that's 100 total characters.

I put the missing character back and deleted the " characters. I then changed the Flat File Source such that the UserName column was NOT text qualified, and tried again. 99 characters, no quotes, and it worked. I put the " characters back, turned Text Qualified back on, and it failed.

The Conclusion

The only conclusion I can draw from this is that SSIS initializes a column 100 characters wide, attempts to insert the whole string into it (text qualifiers included), and then strips the text qualifiers out. 

The Solution

Here's how I wound up handling the issue gracefully. I set the UserName column width to 102 characters, to allow for 100 characters and the text qualifiers. In the Data Flow task where I handle that Flat File Source, I have a Derived Column transform adding a new column, ShortenedUserName (WSTR, 100) = Subtstring(Username, 1, 100). In theory there should be no instances where UserName.Length > 100 characters once the text qualifiers have been stripped, but I'm not going to take that chance in production. I'm also going to have it write to an error file whenever it encounters this scenario, so we can double-check the data integrity in the database.


Update

It appears this is only the case if the text qualifier is not set on the file level, but Text Qualified is set to True on the column level.

Second Update

I was wrong. Text qualifier was not set in the General tab. Observe:

The value I circled and put arrows toward needs to be set to the actual text qualifier. If it isn't, well, then all hell breaks loose as I described above. 

No comments:

Post a Comment