Sometimes I need to import large spreadsheets into MySQL. The easy way would be to assume all fields are varchar, but then the database would lose features such as ordering by a numeric field. The hard way would be to manually determine the type of each field to define the schema.
That doesn’t sound like much fun, so I wrote csv2mysql.py to define the schema automatically: it parses a CSV file, analyzes each column to determine its type, creates a MySQL table with the appropriate field types, and then writes the CSV data to the table.
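One way to detect a column's type is to try progressively looser parsers on every value and fall back to varchar when none of them fits. The sketch below is an assumption, not the code in csv2mysql.py. The helper names `infer_column_type` and `analyze_columns`, the candidate order (integer, double, date, time, then varchar(255)) and the `YYYY-MM-DD` / `HH:MM:SS` formats were chosen to reproduce the types the script reports for the example spreadsheet.

```python
from datetime import datetime

# Candidate types, narrowest first. Each check raises ValueError when a value
# does not fit; the first type that accepts every value in the column wins.
# Order and date/time formats are assumptions, not csv2mysql.py's actual code.
CANDIDATES = [
    ("integer", int),
    ("double", float),
    ("date", lambda v: datetime.strptime(v, "%Y-%m-%d")),
    ("time", lambda v: datetime.strptime(v, "%H:%M:%S")),
]


def infer_column_type(values):
    """Return a MySQL column type that fits every non-empty value."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "varchar(255)"  # nothing to analyze, fall back to text
    for sql_type, check in CANDIDATES:
        try:
            for cell in cells:
                check(cell)
        except ValueError:
            continue  # at least one value failed, try the next (looser) type
        return sql_type
    return "varchar(255)"  # no candidate matched every value


def analyze_columns(rows):
    """Infer one type per column from data rows (header row excluded)."""
    # zip(*rows) transposes rows into columns; ragged rows are truncated.
    return [infer_column_type(column) for column in zip(*rows)]
```

Fed the three data rows of the example spreadsheet, `analyze_columns` returns `['varchar(255)', 'integer', 'double', 'date', 'time']`, the same list the script prints. Empty cells are ignored during detection, so a column with a few blanks can still be typed as numeric.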
Here is an example spreadsheet:
Name   Age  Height  DOB         Active
John   29   180.3   1980-11-20  12:30:20
Sarah  25   174.5   1990-01-01  07:12:32
Peter  45   156.4   1965-05-02  23:09:33
Now run the importing script:
$ python csv2mysql.py --user=root --database=test --table=test test.csv
Importing `test.csv' into MySQL database `test.test'
Analyzing column types ...
['varchar(255)', 'integer', 'double', 'date', 'time']
Inserting rows ...
Committing rows to database ...
Done!
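Once the column types are known, the table definition follows directly. The `describe` output in MySQL shows an extra `id` column with `auto_increment` and lower-cased field names, so this sketch adds such a key and lower-cases the CSV headers. The function names `quote_ident` and `create_table_sql` are hypothetical; the statement they build is standard MySQL DDL.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_table_sql(table, header, types):
    """Build a CREATE TABLE statement from CSV headers and inferred types."""
    # Surrogate primary key, matching the `id ... auto_increment` field.
    columns = [quote_ident("id") + " INT NOT NULL AUTO_INCREMENT PRIMARY KEY"]
    for name, sql_type in zip(header, types):
        # Lower-case names so "DOB" becomes `dob`; columns stay nullable.
        columns.append(quote_ident(name.strip().lower()) + " " + sql_type)
    return "CREATE TABLE {} (\n  {}\n)".format(
        quote_ident(table), ",\n  ".join(columns)
    )
```

The resulting string can be passed to `cursor.execute()` on a MySQL DB-API connection. MySQL accepts `integer` as a synonym for `INT`, which is why the `age` column later appears as `int(11)`.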
And check the results in MySQL:
$ mysql -uroot -p test
mysql> describe test;
+--------+--------------+------+-----+---------+----------------+
| Field  | Type         | Null | Key | Default | Extra          |
+--------+--------------+------+-----+---------+----------------+
| id     | int(11)      | NO   | PRI | NULL    | auto_increment |
| name   | varchar(255) | YES  |     | NULL    |                |
| age    | int(11)      | YES  |     | NULL    |                |
| height | double       | YES  |     | NULL    |                |
| dob    | date         | YES  |     | NULL    |                |
| active | time         | YES  |     | NULL    |                |
+--------+--------------+------+-----+---------+----------------+
6 rows in set (0.01 sec)
mysql> SELECT * FROM test;
+----+-------+------+--------+------------+----------+
| id | name  | age  | height | dob        | active   |
+----+-------+------+--------+------------+----------+
|  1 | John  |   29 |  180.3 | 1980-11-20 | 12:30:20 |
|  2 | Sarah |   25 |  174.5 | 1990-01-01 | 07:12:32 |
|  3 | Peter |   45 |  156.4 | 1965-05-02 | 23:09:33 |
+----+-------+------+--------+------------+----------+
3 rows in set (0.00 sec)
As you can see above, name has been stored as a varchar, age as an int, height as a double, dob as a date, and active as a time.
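The import log prints "Inserting rows ..." followed by "Committing rows to database ...", which suggests the rows are inserted in a batch and committed once. The sketch below assumes that design; `insert_rows` is a hypothetical helper, not the script's code. It uses the standard DB-API `executemany()` with parameter placeholders, so values are never spliced into the SQL string, and it turns empty cells into NULL. The demo uses Python's built-in `sqlite3` in place of a MySQL server purely so it runs anywhere; SQLite accepts MySQL-style backtick quoting.

```python
import sqlite3


def quote_ident(name):
    """Backtick-quote an identifier (MySQL style; SQLite accepts it too)."""
    return "`" + name.replace("`", "``") + "`"


def insert_rows(conn, table, header, rows, placeholder="%s"):
    """Insert every CSV row in one batch, then commit once.

    `conn` is any DB-API connection. `placeholder` is the driver's parameter
    marker: "%s" for MySQL drivers such as mysql-connector, "?" for sqlite3.
    """
    columns = ", ".join(quote_ident(h.strip().lower()) for h in header)
    markers = ", ".join([placeholder] * len(header))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(quote_ident(table), columns, markers)
    # Empty cells become NULL so numeric and date columns store NULL instead of ''.
    params = [tuple(cell if cell.strip() else None for cell in row) for row in rows]
    cursor = conn.cursor()
    cursor.executemany(sql, params)  # values travel as bound parameters
    conn.commit()  # a single commit for the whole batch
    cursor.close()
    return len(params)


if __name__ == "__main__":
    # Demo against an in-memory SQLite database instead of a MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE `test` (`id` INTEGER PRIMARY KEY AUTOINCREMENT,"
                 " `name` TEXT, `age` INTEGER)")
    insert_rows(conn, "test", ["Name", "Age"], [["John", "29"], ["Sarah", "25"]], "?")
    print(conn.execute("SELECT * FROM `test`").fetchall())
```

Committing once at the end keeps the import fast and means a failed row leaves no partial data behind, at the cost of holding the whole batch in memory.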
The source code is available on GitHub.