Fun with MySQL – Temporary tables, creating tables, dropping duplicates

Recently, I’ve spent a lot of time in MySQL. It’s been a long time since I’ve done anything in MySQL. In fact, one of the reasons that I like frameworks like Django is that I generally don’t have to do any SQL. Anyway, here are a couple of beginner-level tips for those of you who don’t work with MySQL very often.

It’s nice having a GUI for working with MySQL

Based on the recommendation in the excellent Pentaho Solutions book, I’ve been using SQL Power*Architect as my lightweight tool for accessing MySQL. You can do all sorts of fancy things with it, but I mainly use the Universal SQL Access tool that’s part of Power*Architect. It’s an excellent no-fuss tool that allows you to try out SQL queries. It saves you some typing by allowing you to drag and drop table and field names into the query-builder window and it has a command history, a little bit like the one from the MySQL command-line tool.

Power*Architect can do a whole lot more; from giving you printable maps of your databases to helping forward and reverse engineer database structures. It also works with multiple databases and works on multiple platforms. It runs flawlessly on my mac.

Temporary tables and creating tables from queries

Somehow I missed this on my last excursion into MySQL-land. It’s easy to create temporary or permanent tables from queries. For example, if you have a table called students that looks like this:

ID|Name|Phone
1|Ben|555-1212
2|Dan|555-1213
3|Dell|555-1214
4|Jason|555-1215

it’s easy to duplicate that table, both structure and content in one line of code:

create table students_test select * from students;

Now you have a new table, students_test, that is an exact copy of students. You can now test out your SQL skills on students_test without worrying about damaging the data in the students table. However, once you’re done, you’ll need to remember to issue:

drop table students_test

If  you know that you won’t permanently need the table, you can go one step further and explicitly create a temporary table that will disappear as soon as you disconnect from the server:

create temporary table students_test select * from students;

This table is temporary and will be destroyed as soon as you disconnect from the current session with the server.

For more information on temporary tables and on MySQL in general, the definitive work is Paul Dubois’s MySQL. I have an old edition, so I’m sure that page numbering has changed but look in the index for temporary tables. In my edition, it’s in Chapter 3 – Temporary Tables.

Deleting duplicate rows

In my ongoing struggles with SQL, there are certain recurring problems I can never seem to remember how to solve. One of them is efficient deletion of duplicate (or almost duplicate) rows. Imagine that you have this set of rows for a particular student:

ID DESCR COURSE_ID SECTION TERM CREDITS GRDBASIS
3404 Professionalism Prog 599A 019 5089 0 SUS
3405 Professionalism Prog 599A 019 5089 0 SUS
3406 International Human 690 02A 5089 3 GRD
3407 International Human 690 02A 5089 3 GRD
3408 Corporate Finance 712 08A 5089 3 SUS
3409 Corporate Finance 712 08A 5089 3 SUS
3410 Bankruptcy 716 2A 5089 3 GRD
3411 Bankruptcy 716 2A 5089 3 GRD

It’s not hard to see that something odd happened in the data import. The student record has been duplicated and she’s shown as being enrolled twice in each class. How do you fix this? Well, the first thing I did was to make a copy of the table by using the trick from above. I then located this neat recipe from an old discussion online.  This is from a 2003 article from The Database Journal.

The most elegant solution to this issue was provided in the comments by Paul Swarthout in a comment to the article made six years after the original article was posted. Alter the database to add a unique index:

alter ignore table students_lawcourse
add unique index `unique_index` (descr,course_id,section,term,credits,grdbasis);

This creates an index on the unique combinations of values in the fields that you have specified; note that we left off id, as it is not duplicated. It was probably created as an auto-incrementing primary key at the time the table was created. This is also why a simple select distinct is not sufficient to cull out the duplicate records. Once you have created the index, you’re almost done. Drop the index, as you no longer need it:

drop index `unique_index`;

And there you have it – the table will only have unique records. Obviously, you need to check carefully whether there are any dependencies on the deleted records before you do this – you might wish to practice on a temporary table as described above.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s