Monday, November 25, 2013

MYSQL using numbers in VARCHAR fields

Recently someone asked me to help out with some coding parts for a startup they where beginning. Some of the things they needed help with where infrastructure related and some where more in data management and some query modeling and optimization on an already existing datamodel done in a MySQL database.

This gave me some options to start exploring again some things in MySQL which is not my standard database to develop in and use as I am more focusing on Oracle databases. Part of the fun was trying to find out why some queries where not working as expected, main reason for this was that the person who originally designed the datamodel had a love for the VARCHAR datatype.

In the database we do have a table named ovsocial_inst which holds an ID column named inst_m_id. For some reason the original datamodel developer created the column as a VARCHAR even though it is only holding numbers. Now some funny effects do happen when you try to sort.

When you execute the following query:

SELECT 
      inst_m_id
FROM 
    ovsocial_inst
ORDER BY 
        inst_m_id

you will get a result something like:

inst_m_id
1
10
2
3
4
5
6
7
8
9
This is somehow a little strange as long as you do not realize that the values of inst_m_id are treated as text. When you consider it to be text everything makes sense and the query is doing exactly what you ask it to do. However, we do not want it to behave in this manner, we do want it to treat the numbers as numbers even though they are stored in a VARCHAR column. To do so in a sort we can use the following query which converts the VARCHAR into an unsigned.

SELECT 
      inst_m_id
FROM 
    ovsocial_inst
ORDER BY
        convert(inst_m_id,unsigned) ASC

Now you will get a result as you expect;

inst_m_id
1
2
3
4
5
6
7
8
9
10

Now imagine another scenario. we know that the table ovsocial_inst will be relative small so to assign a new ID to a record we would like to query the table for the max inst_m_id + 1 and we would like to use that value to insert a new record. When you do not consider the fact that the values are written to the table in a VARCHAR manner this process will work until you have 10 records in your table. Reason for this is that if you have 9 records in your table the highest value (the latest in the sort) is 9. This means that the new ID is 10 (9+1). When we hit 10 records something strange will happen. when we hit 10 records the highest value or at least the latest in the sort will be 9. This results in a new ID of 9+1 instead of 10+1.

When your ID field inst_m_id would have been a proper column for holding numbers you would use a query like the one below;

SELECT 
      (max(inst_m_id)+1) as new_id
FROM 
    ovsocial_inst

however, the above will give the issue as soon as you hit more then 10 records. To prevent this you will need to construct your query like the one below;

SELECT 
      max(convert(inst_m_id,unsigned))+1 as new_id
FROM 
    ovsocial_inst

In this way you will ensure that it keeps working even if you hit the 10 marker.

No comments: