Thursday, September 13, 2012

Mass inserts, updates: SQLite vs MySQL (update: delayed inserts)

Lessons learned:
  • SQLite performs inserts 8-14 times faster than InnoDB / MyISAM
  • SQLite performs updates 4-8 times faster than InnoDB and as fast as MyISAM
  • SQLite performs selects 2 times faster than InnoDB and 3 times slower than MyISAM
  • SQLite needs 2.6 times less disk space than InnoDB but 1.7 times more than MyISAM
  • Allowing null values or using synchronous=NORMAL makes inserts 5-10 percent faster in SQLite

Using SQLite instead of MySQL can be a great alternative in certain architectures, especially if you can partition the data into several SQLite databases (e.g. one database per user) and limit parallel transactions on each database. Replication, backup and restore can then be done easily over the file system.
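A minimal sketch of such per-user partitioning (the path scheme and the events table are hypothetical, not part of the benchmark):

```php
<?php
// One SQLite file per user: writers never compete across users,
// and backup/restore is a plain file copy.
function user_db($userId) {
    $db = new SQLite3('/tmp/user_' . (int)$userId . '.db');
    $db->exec('CREATE TABLE IF NOT EXISTS events (ts INT, data TEXT)');
    return $db;
}

$db = user_db(42);
$db->exec("INSERT INTO events VALUES (strftime('%s','now'), 'login')");
copy('/tmp/user_42.db', '/tmp/user_42.backup.db'); // "backup" = file copy
```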

The results:
(MySQL 5.6.5 default config without binlog, SQLite 3.7.7, PHP 5.4.5, 2 x 1.4 GHz, disk 5400rpm)

                            insert [s]   sum [s]   update [s]   size [MB]
JSON                              1.84      1.30         2.92        2.96
CSV                               1.97      2.25         3.70        2.57
SQLite (memory)                   2.74      0.12         0.52        0.00
SQLite (memory, not null)         3.00      0.12         0.52        0.00
SQLite (normal sync)              3.06      0.17         1.21        2.88
SQLite                            3.17      0.16         0.96        2.88
SQLite (not null)                 3.40      0.17         1.15        2.88
MySQL (memory)                   21.25      0.04         0.06        3.17
MySQL (memory, not null)         21.28      0.04         0.07        3.17
MySQL (CSV, not null)            23.80      0.17         1.56        2.51
MySQL (MyISAM)                   25.60      0.05         1.02        1.72
MySQL (MyISAM, not null)         25.65      0.05         1.01        1.72
MySQL (InnoDB)                   27.72      0.31         5.65        7.52
MySQL (InnoDB, not null)         27.74      0.31         4.91        7.52
insert / update 200k (1 process)

Using delayed inserts makes MySQL even slower:
                            delayed insert [s]
MySQL (memory, not null)                 28.55
MySQL (memory)                           28.73
MySQL (CSV, not null)                    60.09
MySQL (MyISAM, not null)                 33.44
MySQL (MyISAM)                           32.94
MySQL (InnoDB)                           60.04
MySQL (InnoDB, not null)                 61.62
insert 200k (1 process)
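For reference, the delayed runs only change the insert verb. A hypothetical helper building the statement (INSERT DELAYED queues the row server-side and returns before it is written; it is only truly supported by a few engines such as MyISAM and MEMORY, and was deprecated in later MySQL versions):

```php
<?php
// Hypothetical helper: the DELAYED variant of the benchmark's insert statement.
function delayed_insert_sql($i) {
    return 'INSERT DELAYED INTO foo VALUES (' . $i . ', ' . ($i * 2) . ')';
}

// With a running server, the loop in mysql_test() would become:
//   for ($i = 0; $i < 200000; $i++) $db->query(delayed_insert_sql($i));
```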

                            2 processes                         4 processes
                            insert [s]  sum [s]  update [s]     insert [s]  sum [s]  update [s]
SQLite (memory)                   2.85     0.13       0.55           6.07     0.23       1.10
SQLite (memory, not null)         3.11     0.14       0.54           6.44     0.24       1.09
SQLite (normal sync)              3.21     0.18       1.56           5.60     0.17       1.82
SQLite                            3.39     0.18       1.51           5.94     0.17       1.74
SQLite (not null)                 3.69     0.18       1.61           6.34     0.19       1.36
MySQL (memory)                   32.91     0.06       0.06          61.09     0.04       0.14
MySQL (memory, not null)         32.40     0.04       0.07          60.59     0.05       0.08
MySQL (CSV, not null)            36.00     0.16       1.70          66.84     0.20       4.48
MySQL (MyISAM)                   37.07     0.05       1.09          73.66     0.11       1.45
MySQL (MyISAM, not null)         36.24     0.05       1.08          72.10     0.07       1.53
MySQL (InnoDB)                   40.65     0.30       5.62          83.55     0.37      14.41
MySQL (InnoDB, not null)         41.73     0.32       5.84          88.95     0.60      14.69
insert / update 2 x 200k (2 processes), 4 x 200k (4 processes)
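The multi-process runs can be reproduced by launching the same workload several times in parallel. A minimal sketch using pcntl_fork (an assumption — any process launcher works; the per-worker index mirrors the "use test2..n" comments in the script):

```php
<?php
// Fork $n workers and wait for all of them; each worker gets its own
// index so it can open its own database (test1, test2, ...).
// Requires the pcntl extension (CLI only).
function run_parallel($n, $work) {
    $pids = array();
    for ($p = 1; $p <= $n; $p++) {
        $pid = pcntl_fork();
        if ($pid === 0) { // child: run the workload, then exit
            $work($p);
            exit(0);
        }
        $pids[] = $pid;
    }
    foreach ($pids as $pid) pcntl_waitpid($pid, $status);
}

// e.g. run_parallel(4, function ($p) { sqlite_test("/tmp/test$p.db"); });
```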

Here is the test script:
// CSV
$csv = tempnam('/tmp', 'csv');
$start = microtime(true);
$fp = fopen($csv, 'w');
for ($i=0; $i<200000; $i++) fputcsv($fp, array($i, $i*2));
fclose($fp);
echo 'csv '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($csv, 'r');
$sum = 0;
while (($row = fgetcsv($fp)) !== false) $sum += $row[1];
fclose($fp);
echo ' '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($csv, 'r');
$fp2 = fopen($csv.'2', 'w');
while (($data = fgetcsv($fp)) !== false) {
  fputcsv($fp2, array($data[0], $data[1]+2));
}
fclose($fp);
fclose($fp2);
echo ' '.number_format(microtime(true)-$start, 2)."\n";
// JSON
$json = tempnam('/tmp', 'json');
$start = microtime(true);
$fp = fopen($json, 'w');
for ($i=0; $i<200000; $i++) fwrite($fp, '['.$i.','.($i*2).']'."\n");
fclose($fp);
echo 'json '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($json, 'r');
$sum = 0;
while (($row = json_decode(fgets($fp))) !== null) $sum += $row[1];
fclose($fp);
echo ' '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($json, 'r');
$fp2 = fopen($json.'2', 'w');
while (($data = json_decode(fgets($fp))) !== null) {
  fputs($fp2, '['.$data[0].','.($data[1]+2).']'."\n");
}
fclose($fp);
fclose($fp2);
echo ' '.number_format(microtime(true)-$start, 2)."\n";
// SQLite
// 100,000 was the best transaction size in several runs
sqlite_test(':memory:');
sqlite_test(':memory:', 'not null');
sqlite_test('/tmp/test1a.db', '', 'sync_norm'); // use test2..n for n processes
sqlite_test('/tmp/test1b.db');
sqlite_test('/tmp/test1c.db', 'not null');

function sqlite_test($file, $null='', $opt=false) {
  $db = new SQLite3($file);
  if ($opt) $db->exec('PRAGMA synchronous=NORMAL');
  $db->exec('CREATE TABLE foo (i INT '.$null.', i2 INT '.$null.')');
  $start = microtime(true);
  $db->exec('begin');
  for ($i=0; $i<200000; $i++) {
    $db->exec('INSERT INTO foo VALUES ('.$i.', '.($i*2).')');
    if ($i%100000==0) {
      $db->exec('commit');
      $db->exec('begin');
    }
  }
  $db->exec('commit');
  echo "sqlite $file $null $opt ".number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->query('SELECT sum(i2) FROM foo')->fetchArray();
  echo ' '.number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->exec('UPDATE foo SET i2=i2+2');
  echo ' '.number_format(microtime(true)-$start, 2);
  echo ' '.number_format(@filesize($file)/1048576, 2)."\n";
}
// MySQL
// 30,000 was the best transaction size in several runs
mysql_test('memory', 'not null');
mysql_test('memory');
mysql_test('csv', 'not null');
mysql_test('myisam', 'not null');
mysql_test('myisam');
mysql_test('innodb', 'not null');
mysql_test('innodb');

function mysql_test($e, $null='') {
  $db = new mysqli('127.0.0.1', 'root', '', 'test'); // use test2..n
  $db->query('DROP TABLE IF EXISTS foo');
  $db->query('CREATE TABLE foo (i INT '.$null.',i2 INT '.$null.') ENGINE='.$e);

  $start = microtime(true);
  $db->query('begin');
  for ($i=0; $i<200000; $i++) {
    $db->query('INSERT INTO foo VALUES ('.$i.', '.($i*2).')');
    if ($i%30000==0) {
      $db->query('commit');
      $db->query('begin');
    }
  }
  $db->query('commit');
  echo "mysql $e $null ".number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->query('SELECT sum(i2) FROM foo')->fetch_row();
  echo ' '.number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->query('UPDATE foo SET i2=i2+2');
  echo ' '.number_format(microtime(true)-$start, 2);
  
  $row = $db->query('SHOW TABLE STATUS')->fetch_array();
  if (!$row['Data_length']) $row['Data_length'] = filesize('.../test/foo.CSV');
  echo ' '.number_format($row['Data_length']/1048576, 2)."\n";
}
