|
MySQL Import and Normalization
By Antje Binas-Holz
Last Update:
Saturday, January 10, 2004
Please refer following sides in order to get more information about normalizing:
Database Journal
About
InformIT
Lots of articles and books describe data normalization. Information given there often scare people instead of offering some help. I am not going to dive into another theoretical discussion about different normalization forms and how to achieve them. This article is trying to solve a real world problem people are often faced with instead: How can I import a single Excel like spreadsheet into one or more MySQL tables representing the first normal database form?
Normalization Basics
Data normalization is an approach to design databases in order to make sure data can as easily as possible maintained over time. There are three basic normal forms with appropriate rules:
- First normal form: Eliminate repeating information and create separate tables for related data.
- Second normal form: No non-key attributes depend on a portion of the primary key (all fields in a table are entirely related to a primary key).
- Third Normal Form: No attributes depend on other non-key attributes (See if more fields exist that can be broken down further and that are not depending on a key).
Database purists, or should I say nerds, are trying to normalize data as much as possible, which is nice but sometimes not very helpful because there is another kind of computer users giving them a very hard time:
Spreadsheet Gurus
Most of them are Excel experts. They are trying to solve everything with their believed Microsoft spreadsheet software including data maintenance. I must admit that Excel is indeed a very mighty software covering a lot of business problems. The big disaster of people working with Excel is their approach to organize all data by a single spreadsheet.
The Problem
The problem many application programmers are faced with, at least from what I know, is following: If they ask their customers for data for application they have to develop and maintain, customers answer:
"No problem, I have an Excel spreadsheet!".
Well, I must admit that this is not an ideal situation, but nothing, which could not be solved.
The Solution
Excel has the built in ability to export the spreadsheet into a CSV file.
This is the first important step to do. CSV files are supported by all databases I know including MySQL. If you import that file into MySQL, e.g using MySQL Data Import Utility ImpSQL, you will end up with a flat table containing all necessary data which need to be translated at least into first normal database form. The translation requires some SQL knowledge, but there is another way:
ImpSQL has been extended to be able to create one master table and different related child tables from a single CSV file automatically. If you have an appropriate CSV file you can try it right here. Just type in the name of the CSV file you want to upload to the import utility and the list of columns to be stored into child tables and you will receive the resulting file containing all SQL statements and data in return:
Note: Your data will be converted but not stored at the server in any way. All temporary files necessary for conversion will be immediately deleted after finishing the request.
Sample
Assume you have an Excel spreadsheet containing different hardware with columns Group1, Group2, Prod_ID, Caption, OS and Price.

Its pretty evident that columns Group1, Group2 and OS are candidates for being stored into separate tables because they contain repeating data. Just pointer to appropriate entries should remain within the master table test. In order to convert the flat spreadsheet into the first normal form it has to be saved as CSV first. The resulting file test.csv looks like follows:
Group1;Group2;Prod_ID;Caption;OS;Price
Komponenten;Grafikkarten;N350018;PINE SIS 315E;Win 98/200;29,00
Komponenten;Grafikkarten;N35Z018;XFX GF FX5600 AGP8X;Win 98/NT/;189,00
Komponenten;Grafikkarten;N35Z020;XFX GF FX5200 AGP8X;Win 9x/NT/;79,00
Komponenten;Grafikkarten;N35Z021;XFX GF FX5200 AGP8X;Win 98/200;89,00
Komponenten;Grafikkarten;N35Z026;XFX GEFORCE4 MX440SE AGP-4X;Win 98/200;59,00
Peripherie;Projektoren;M35Z183;PB8220 DLP-PROJECTOR;WIN/MAC;2449,00
Peripherie;Scanner;M35Z200;S2W 5450U;;89,00
Peripherie;Monitore;M35Z203;FP991;WIN/MAC;749,00
Peripherie;Projektoren;M35Z218;PB8230 DLP-Projektor;WIN/MAC;2809,00
Peripherie;Projektoren;M35Z219;PB7200 DLP-PROJECTOR;;2939,00
Peripherie;Projektoren;M35Z220;PB7220 DLP-PROJECTOR;;3339,00
Peripherie;Projektoren;M35Z221;PB2120 DLP-PROJECTOR;;1539,00
Peripherie;Projektoren;M35Z222;PB8120 DLP-PROJECTOR;WIN/MAC;1739,00
Peripherie;Projektoren;M350061;SL705S DLP-PROJECTOR;WIN/MAC;1329,00
Peripherie;Projektoren;M350062;7765PA DLP-PROJECTOR;WIN/MAC;1829,00
Peripherie;Projektoren;M350063;DX550 DLP-PROJECTOR;WIN/MAC;2339,00
Peripherie;Scanner;M350073;S2W 7400UT;Win 98/200;209,00
...
This file can be imported to MySQL using the form above. Select the file using the Browse button and type in the list of columns with redundant information into the Child Columns edit control:

The resulting file downloaded afterwards as test.txt will look like follows:
#
# MySQL-Dump created by SQL Database Import Utility
# HTTP-Host: www.sqldbu.com
# Date: 2004-01-10 15:43:08
# --------------------------------------------------------
#
#
CREATE TABLE test (Group1 INT(10) NOT NULL, Group2 INT(10) ...
INSERT INTO test SET Group1=1,Group2=1,Prod_ID='N350018', ...
INSERT INTO test SET Group1=1,Group2=1,Prod_ID='N35Z018', ...
...
#
# Structure Definition for Child Table 'Group1'
#
CREATE TABLE Group1 (Group1 INT(10) unsigned NOT NULL, ...
#
# Data for Child Table 'Group1'
#
INSERT INTO Group1 SET Group1=1, CAPTION='Komponenten';
INSERT INTO Group1 SET Group1=2, CAPTION='Peripherie';
#
# Structure Definition for Child Table 'Group2'
#
CREATE TABLE Group2 (Group2 INT(10) unsigned NOT NULL, ...
#
# Data for Child Table 'Group2'
#
INSERT INTO Group2 SET Group2=1, CAPTION='Grafikkarten';
INSERT INTO Group2 SET Group2=2, CAPTION='Projektoren';
INSERT INTO Group2 SET Group2=3, CAPTION='Scanner';
INSERT INTO Group2 SET Group2=4, CAPTION='Monitore';
#
# Structure Definition for Child Table 'OS'
#
CREATE TABLE OS (OS INT(10) unsigned NOT NULL, ...
#
# Data for Child Table 'OS'
#
INSERT INTO OS SET OS=1, CAPTION='Win98/2000';
INSERT INTO OS SET OS=2, CAPTION='Win98/NT';
INSERT INTO OS SET OS=3, CAPTION='Win98/NT/XP';
INSERT INTO OS SET OS=4, CAPTION='WIN/MAC';
INSERT INTO OS SET OS=5, CAPTION='N/A';
INSERT INTO OS SET OS=6, CAPTION='Win98/ME/XP';
INSERT INTO OS SET OS=7, CAPTION='Win98/2000/XP';
Just import test.txt and the job is done, first normal form has been created.
Connection based mode
When using the ImpSql utility along with a valid MySQL connection data will be straight imported into the appropriate table. It could be demonstrated here as well but how should you receive the resulting (already imported) MySQL table? You better download the project archive file impsql-10.zip in order to unpack and run ImpSql at your machine. See MySQL Data Import for more detailed information.
{
|