Import large Excel/CSV data into MySQL using PHP

In this post we will try to find out the best/fastest way to upload large Excel/CSV data into a MySQL table. In one of my current projects there is a requirement to upload millions of records from several csv files into a database. The files are uploaded to the server with the help of Ajax-controlled browser/server transactions. On the server the file contents have to be checked and then transferred into an appropriate database table - in my case a MyISAM table.

This is the first time that I have had to deal with such big record numbers on a MySQL database. You have to be very careful while importing data when they come from other sources and are supplied more continuously than in the form of a one-step csv file import.

Although the data structure and table structures for the import are flat, the sheer amount of data makes it necessary to think a little about how to transfer them from file lines to records of a MySQL database table with the help of a PHP server program. I really do not present fundamentally new insights below - a MySQL expert will not find anything new. But others who stumble into a big data import problem for the first time may find some of the hints useful.

1. A simple, but time-consuming standard approach for a csv import with PHP

In the past, when I had to import data from files, I often used a sequence of steps comprising
1. opening the file for reading,
2. using fgetcsv() to read in one line,
3. running my own, problem-specific checks on the identified fields,
4. issuing one INSERT SQL command for each of the file lines (a sketch of this step follows at the end of this section).

To cover the first of these steps I have written my own logic that converts a csv file into a PHP array collection:

function csvToArray($filePath = '', $delimiter = '|', $header = null, $skipLines = 0) {
	$lineNumber = 0;
	$dataList = array();
	// $header must be an array of column names; they become the keys of each record.
	if (($handle = fopen($filePath, 'r')) !== false) {
		while (($row = fgets($handle, 4096)) !== false) {
			// Skip the first $skipLines lines (e.g. a header line).
			if ($lineNumber >= $skipLines) {
				$items = explode($delimiter, $row);
				$record = array();
				for ($index = 0, $m = count($header); $index < $m; $index++) {
					// Only add a column if it exists in the line, keyed by its header name.
					if (isset($items[$index])) {
						$record[$header[$index]] = trim($items[$index]);
					}
				}
				$dataList[] = $record;
			}
			$lineNumber++;
		}
		fclose($handle);
	}
	return $dataList;
}

We need to pass the following four parameters to this function:

$filePath - The path of the file on disk.
$delimiter - The delimiter that separates the fields of a line, e.g. a tab or a comma. Default is '|'.
$header - An array of header/column names. The function returns a list of arrays keyed by these header names.
$skipLines - The number of lines to skip at the top of the file (e.g. 1 to skip a header line).

This is how I call the above function:

csvToArray('/var/www/livescript/arvind.csv', "\t", array("column1", "column2", "column3"), 1);
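The function only covers reading and splitting the file; the per-line INSERT of step 4 above is still missing. A minimal sketch with mysqli prepared statements - the connection values, the table name my_testing_table and the column names are assumptions, not part of the original code - could look like this:

// Assumed connection values - replace them with your own.
$mysqli = new mysqli('localhost', 'db_user', 'db_password', 'my_database');

$rows = csvToArray('/var/www/livescript/arvind.csv', "\t", array("column1", "column2", "column3"), 1);

// Prepare the INSERT once and execute it for every record.
// Table and column names are placeholders - adapt them to your schema.
$stmt = $mysqli->prepare("INSERT INTO my_testing_table (column1, column2, column3) VALUES (?, ?, ?)");
foreach ($rows as $row) {
	// This is the place for your own, problem-specific field checks.
	$stmt->bind_param('sss', $row['column1'], $row['column2'], $row['column3']);
	$stmt->execute();
}
$stmt->close();

For millions of records this row-by-row approach is of course slow - which is exactly the reason to look at LOAD DATA INFILE in the next section.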

2. Use "LOAD DATA INFILE ... "

$strSql = "LOAD DATA LOCAL INFILE 'D:/xampp/htdocs/livescript/testing.csv' INTO TABLE my_testing_table FIELDS TERMINATED BY ',' LINES TERMINATED BY 'n' (field1,field2,field3,field4,....)";
$this->model->ExecuteQuery($strSql);
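The LOCAL keyword means the file is read on the client side, which the client library has to allow. If your database wrapper does not switch this on, a small sketch with plain mysqli (connection values, file path, table and column names are assumptions) would be:

$mysqli = mysqli_init();
// Allow the client to send a local file; the server must also have local_infile enabled.
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);
// Assumed connection values - replace them with your own.
$mysqli->real_connect('localhost', 'db_user', 'db_password', 'my_database');

$strSql = "LOAD DATA LOCAL INFILE '/var/www/livescript/testing.csv'
    INTO TABLE my_testing_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (field1, field2, field3)";
$mysqli->query($strSql);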
If you need a real acceleration factor for loading csv data, I strongly suggest that you use this special feature of the MySQL engine and its related SQL extension in your PHP loader program. See:

http://dev.mysql.com/doc/refman/5.6/en/extensions-to-ansi.html
http://dev.mysql.com/doc/refman/5.6/en/load-data.html

Syntax

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number LINES]
    [(col_name_or_user_var, ...)]
    [SET col_name = expr, ...]

The LOAD DATA INFILE statement reads rows from a text file into a table at very high speed. The file name must be given as a literal string.

LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE.

To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA INFILE. The syntax of the FIELDS and LINES clauses is the same for both statements. Both clauses are optional, but FIELDS must precede LINES if both are specified.
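As a concrete example, a statement for a comma-separated file with quoted fields and one header line could look like this in the PHP loader (file path, table and column names are only placeholders):

$strSql = "LOAD DATA LOCAL INFILE '/var/www/livescript/testing.csv'
    INTO TABLE my_testing_table
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (field1, field2, field3)";
$this->model->ExecuteQuery($strSql);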

The character set indicated by the character_set_database system variable is used to interpret the information in the file. SET NAMES and the setting of character_set_client do not affect interpretation of input. If the contents of the input file use a character set that differs from the default, it is usually preferable to specify the character set of the file by using the CHARACTER SET clause. A character set of binary specifies "no conversion."

LOAD DATA INFILE interprets all fields in the file as having the same character set, regardless of the data types of the columns into which field values are loaded. For proper interpretation of file contents, you must ensure that it was written with the correct character set.

For example, if you write a data file with mysqldump -T or by issuing a SELECT ... INTO OUTFILE statement in mysql, be sure to use a --default-character-set option with mysqldump or mysql so that output is written in the character set to be used when the file is loaded with LOAD DATA INFILE.
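So if the file was written as UTF-8, for instance, the clause can simply be added between the table name and the FIELDS clause (the charset name here is only an example):

$strSql = "LOAD DATA LOCAL INFILE '/var/www/livescript/testing.csv'
    INTO TABLE my_testing_table
    CHARACTER SET utf8
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\\n'
    (field1, field2, field3)";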

If the LOW_PRIORITY keyword is used, insertions are delayed until no other clients are reading from the table. The CONCURRENT keyword allows the use of concurrent inserts. These clauses cannot be specified together.

If the storage engine supports ALTER TABLE ... DISABLE KEYS, indexes are automatically disabled during the execution of LOAD DATA INFILE.
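For a non-empty MyISAM table you can also switch the update of the non-unique indexes off and on manually around the load, which is often faster than updating the indexes row by row (the table name is again only a placeholder):

$this->model->ExecuteQuery("ALTER TABLE my_testing_table DISABLE KEYS");
$this->model->ExecuteQuery($strSql); // the LOAD DATA INFILE statement from above
$this->model->ExecuteQuery("ALTER TABLE my_testing_table ENABLE KEYS");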

That's it!