Sometimes I need to import large spreadsheets into MySQL.
The easy way would be to assume all fields are varchar, but then the database would lose features such as ordering by a numeric field.
The hard way would be to manually determine the type of each field to define the schema.
That doesn’t sound much fun so I created the below solution to automatically define a spreadsheet schema by analyzing determine So to address
csv2mysql.py automatically parses a CSV file, creates MySQL table with appropriate field types, and then writes CSV data to the table.
Here is an example spreadsheet :
Name Age Height DOB Active
John 29 180.3 1980-11-20 12:30:20
Sarah 25 174.5 1990-01-01 07:12:32
Peter 45 156.4 1965-05-02 23:09:33
Now run the importing script:
$ python csv2mysql.py --user = root --database = test --table = test test.csv
Importing ` test.csv' into MySQL database `test.test'
Analyzing column types ...
[ 'varchar(255)' , 'integer' , 'double' , 'date' , 'time' ]
Inserting rows ...
Committing rows to database ...
Done!
And check the results in MySQL:
$ mysql -uroot -p test
mysql> describe test ;
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int( 11) | NO | PRI | NULL | auto_increment |
| name | varchar( 255) | YES | | NULL | |
| age | int( 11) | YES | | NULL | |
| height | double | YES | | NULL | |
| dob | date | YES | | NULL | |
| active | time | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
6 rows in set ( 0.01 sec)
mysql> SELECT * FROM test ;
+----+-------+------+--------+------------+----------+
| id | name | age | height | dob | time |
+----+-------+------+--------+------------+----------+
| 1 | John | 29 | 180.3 | 30-10-1980 | 12:30:20 |
| 2 | Sarah | 25 | 174.5 | 01-01-1990 | 07:12:32 |
| 3 | Peter | 45 | 156.4 | 22-05-1965 | 23:09:33 |
+----+-------+------+--------+------------+----------+
3 rows in set ( 0.00 sec)
As you can see above the name has been stored as a varchar, age as an int, height as a double, dob as a date, and active as a time type.
The source code is available on github .
Please enable JavaScript to view the comments powered by Disqus.