How padding data solved my SQL problem
I was having an issue linking two database tables together because of the data and datatypes.
I had two database tables that I had to link together on a field employee_id .
One table had the Employee IDs as numbers, so it was stored with a datatype of FLOAT.
The other table had data that would soon have Employee IDs that included non-numeric data and some that would occasionally have a leading zero. I wanted to change the datatype in the first table from FLOAT to VARCHAR to ensure compatibility, but wanted to first verify that my existing query would work.
I ultimately needed to link two tables:
- Table with Employee ID stored as a number (currently, FLOAT)
- Table with Employee ID that had non-numerical data, numerical data, and sometimes the numerical data had leading zeroes
The original query, that would no longer work as intended, included:
SELECT ... FROM tableOne
INNER JOIN tableTwo ON tableOne.id = tableTwo.id
While it did run without an error, it gave less rows than it should have.
Bad data makes the query worthless
My first idea was to convert the two fields to an integer; this way the data that had leading zeros would be removed automatically / converted to a number:
SELECT ... FROM tableOne
INNER JOIN tableTwo ON CAST (tableOne.id AS INT) =
CAST (tableTwo.id AS INT)
This caused an error: Conversion failed when converting the varchar value ‘myDataValueHere’ to data type int .This was expected as my initial concern because I’m introducing non-numerical data into the table.
My next idea was to ensure that both fields I’m linking on were of the same type. I needed to change the one that was FLOAT into a VARCHAR:
SELECT ... FROM tableOne
INNER JOIN tableTwo ON CAST (tableOne.id AS VARCHAR) =
CAST (tableTwo.id AS VARCHAR)
While the query ran without errors, I still didn’t have the right number of rows returned. The reason it didn’t work was because some Employee IDs had leading zeroes and others did not. While the datatypes now match, this left the data intact; i.e. it did not remove the leading zeros. Performing a direct link (INNER JOIN) thus did not retrieve all of the results.
The solution
I ended up needing to “pad” the data with leading zeroes. I knew that there should be at most 6 characters in the Employee ID field. By adding 6 zeros at the beginning of the data, and then using the RIGHT() function to take the right-most 6 characters, I was able to successfully link the data tables:
SELECT ... FROM tableOne
INNER JOIN tableTwo ON CAST (tableOne.id AS VARCHAR) =
RIGHT( '000000' + CAST (tableTwo.id AS VARCHAR), 6)
Epilogue
I ended up linking on the employee’s email address instead, so I don’t have to worry about the various types of data that existed in the Employee ID! 😛