Thursday, September 13, 2012

Mass inserts, updates: SQLite vs MySQL (update: delayed inserts)

Lessons learned:
  • SQLite performs inserts 8-14 times faster than InnoDB / MyISAM
  • SQLite performs updates 4-8 times faster than InnoDB and as fast as MyISAM
  • SQLite performs selects 2 times faster than InnoDB and 3 times slower than MyISAM
  • SQLite requires 2.6 times less disk space than InnoDB and 1.7 times more than MyISAM
  • Allowing null values or using synchronous=NORMAL makes inserts 5-10 percent faster in SQLite

Using SQLite instead of MySQL can be a great alternative on certain architectures, especially if you can partition the data into several SQLite databases (e.g. one database per user) and limit parallel transactions on each database. Replication, backup and restore can then be done easily over the file system.
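
As an illustration, here is a minimal sketch of that per-user partitioning idea. The user_db() helper, the file paths and the foo schema are assumptions for the example, not part of the benchmark below:

// Sketch only: one SQLite file per user, so writes for different users never
// contend for the same database lock, and backup is a plain file copy.
function user_db($userId, $dir = '/var/data/users') {
  $db = new SQLite3($dir.'/'.(int)$userId.'.db');
  $db->exec('PRAGMA synchronous=NORMAL');  // optional: fewer fsyncs, faster inserts
  $db->exec('CREATE TABLE IF NOT EXISTS foo (i INT, i2 INT)');
  return $db;
}

$db = user_db(42);
$db->exec('INSERT INTO foo VALUES (1, 2)');

// Backup / replication over the file system: copy the per-user file
// (ideally while no write transaction is open on it).
copy('/var/data/users/42.db', '/var/backup/42.db');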

The results:
(MySQL 5.6.5 default config without binlog, SQLite 3.7.7, PHP 5.4.5, 2 x 1.4 GHz, disk 5400rpm)

                          insert [s]  sum [s]  update [s]  size [MB]
JSON                            1.84     1.30        2.92       2.96
CSV                             1.97     2.25        3.70       2.57
SQLite (memory)                 2.74     0.12        0.52       0.00
SQLite (memory, not null)       3.00     0.12        0.52       0.00
SQLite (normal sync)            3.06     0.17        1.21       2.88
SQLite                          3.17     0.16        0.96       2.88
SQLite (not null)               3.40     0.17        1.15       2.88
MySQL (memory)                 21.25     0.04        0.06       3.17
MySQL (memory, not null)       21.28     0.04        0.07       3.17
MySQL (CSV, not null)          23.80     0.17        1.56       2.51
MySQL (MyISAM)                 25.60     0.05        1.02       1.72
MySQL (MyISAM, not null)       25.65     0.05        1.01       1.72
MySQL (InnoDB)                 27.72     0.31        5.65       7.52
MySQL (InnoDB, not null)       27.74     0.31        4.91       7.52
insert / update 200k (1 process)

Using delayed inserts makes MySQL even slower:
                          delayed insert [s]
MySQL (memory, not null)               28.55
MySQL (memory)                         28.73
MySQL (CSV, not null)                  60.09
MySQL (MyISAM, not null)               33.44
MySQL (MyISAM)                         32.94
MySQL (InnoDB)                         60.04
MySQL (InnoDB, not null)               61.62
insert 200k (1 process)
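
The delayed-insert script itself is not shown here; presumably only the INSERT statement changed. A minimal sketch of such a variant, assuming the INSERT DELAYED syntax (later MySQL versions deprecate DELAYED), with connection parameters matching the test setup:

// Sketch, not the original delayed-insert script: same table as mysql_test(),
// but each row is handed to the server's delayed-insert queue.
$db = new mysqli('127.0.0.1', 'root', '', 'test');
$db->query('DROP TABLE IF EXISTS foo');
$db->query('CREATE TABLE foo (i INT, i2 INT) ENGINE=MyISAM');

$start = microtime(true);
for ($i=0; $i<200000; $i++) {
  // DELAYED hands the row to a server-side queue and returns immediately;
  // how it is handled depends on the storage engine.
  $db->query('INSERT DELAYED INTO foo VALUES ('.$i.', '.($i*2).')');
}
echo 'delayed insert '.number_format(microtime(true)-$start, 2)."\n";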

                          2 processes                       4 processes
                          insert [s]  sum [s]  update [s]   insert [s]  sum [s]  update [s]
SQLite (memory)                 2.85     0.13        0.55          6.07     0.23        1.10
SQLite (memory, not null)       3.11     0.14        0.54          6.44     0.24        1.09
SQLite (normal sync)            3.21     0.18        1.56          5.60     0.17        1.82
SQLite                          3.39     0.18        1.51          5.94     0.17        1.74
SQLite (not null)               3.69     0.18        1.61          6.34     0.19        1.36
MySQL (memory)                 32.91     0.06        0.06         61.09     0.04        0.14
MySQL (memory, not null)       32.40     0.04        0.07         60.59     0.05        0.08
MySQL (CSV, not null)          36.00     0.16        1.70         66.84     0.20        4.48
MySQL (MyISAM)                 37.07     0.05        1.09         73.66     0.11        1.45
MySQL (MyISAM, not null)       36.24     0.05        1.08         72.10     0.07        1.53
MySQL (InnoDB)                 40.65     0.30        5.62         83.55     0.37       14.41
MySQL (InnoDB, not null)       41.73     0.32        5.84         88.95     0.60       14.69
insert / update 2 x 200k (2 processes), 4 x 200k (4 processes)
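
The post does not show how the parallel runs were launched. One plausible way (bench.php here is a hypothetical wrapper around the test functions, not part of the original script) is to start one PHP process per worker, each with its own database / SQLite file, matching the test2..n comments in the script below:

// Sketch: launch n copies of the benchmark, one database per worker.
$workers = 4;
$handles = array();
for ($n = 1; $n <= $workers; $n++) {
  $handles[] = popen('php bench.php '.$n, 'r');   // start workers in parallel
}
foreach ($handles as $h) {
  echo stream_get_contents($h);                   // wait for each worker, print its timings
  pclose($h);
}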

Here is the test script:
// CSV: write 200k rows, read them back to sum the second column, then rewrite the file with updated values
$csv = tempnam('/tmp', 'csv');
$start = microtime(true);
$fp = fopen($csv, 'w');
for ($i=0; $i<200000; $i++) fputcsv($fp, array($i, $i*2));
fclose($fp);
echo 'csv '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($csv, 'r');
$sum = 0;
while (!feof($fp)) $sum += @array_pop(fgetcsv($fp));
fclose($fp);
echo ' '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($csv, 'r');
$fp2 = fopen($csv.'2', 'w');
while (!feof($fp)) {
  $data = fgetcsv($fp);
  fputcsv($fp2, array($data[0], $data[1]+2));
}
fclose($fp);
fclose($fp2);
echo ' '.number_format(microtime(true)-$start, 2)."\n";
// JSON: one JSON array per line; write 200k rows, sum the second column, then rewrite with updated values
$json = tempnam('/tmp', 'json');
$start = microtime(true);
$fp = fopen($json, 'w');
for ($i=0; $i<200000; $i++) fwrite($fp, '['.$i.','.($i*2).']'."\n");
fclose($fp);
echo 'json '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($json, 'r');
$sum = 0;
while (!feof($fp)) $sum += @array_pop(json_decode(fgets($fp)));
fclose($fp);
echo ' '.number_format(microtime(true)-$start, 2);

$start = microtime(true);
$fp = fopen($json, 'r');
$fp2 = fopen($json.'2', 'w');
while (!feof($fp)) {
  $data = json_decode(fgets($fp));
  fputs($fp2, '['.$data[0].','.($data[1]+2).']'."\n");
}
fclose($fp);
fclose($fp2);
echo ' '.number_format(microtime(true)-$start, 2)."\n";
// SQLite: run the insert / sum / update test for each variant
// 100,000 rows per transaction was the best batch size in several runs
sqlite_test(':memory:');
sqlite_test(':memory:', 'not null');
sqlite_test('/tmp/test1a.db', '', 'sync_norm'); // use test2..n for n processes
sqlite_test('/tmp/test1b.db');
sqlite_test('/tmp/test1c.db', 'not null');

function sqlite_test($file, $null='', $opt=false) {
  $db = new SQLite3($file);
  if ($opt) $db->exec('PRAGMA synchronous=NORMAL'); // 'sync_norm' option: fewer fsyncs, faster inserts
  $db->exec('CREATE TABLE foo (i INT '.$null.', i2 INT '.$null.')');
  $start = microtime(true);
  $db->exec('begin');
  for ($i=0; $i<200000; $i++) {
    $db->exec('INSERT INTO foo VALUES ('.$i.', '.($i*2).')');
    if ($i%100000==0) { // commit every 100,000 rows and start a new transaction
      $db->exec('commit');
      $db->exec('begin');
    }
  }
  $db->exec('commit');
  echo "sqlite $file $null $opt ".number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->query('SELECT sum(i2) FROM foo')->fetchArray();
  echo ' '.number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->exec('UPDATE foo SET i2=i2+2');
  echo ' '.number_format(microtime(true)-$start, 2);
  echo ' '.number_format(@filesize($file)/1048576, 2)."\n";
}
// MySQL: run the same insert / sum / update test per storage engine
// 30,000 rows per transaction was the best batch size in several runs
mysql_test('memory', 'not null');
mysql_test('memory');
mysql_test('csv', 'not null');
mysql_test('myisam', 'not null');
mysql_test('myisam');
mysql_test('innodb', 'not null');
mysql_test('innodb');

function mysql_test($e, $null='') {
  $db = new mysqli('127.0.0.1', 'root', '', 'test'); // use test2..n
  $db->query('DROP TABLE IF EXISTS foo'); // IF EXISTS avoids an error on the first run
  $db->query('CREATE TABLE foo (i INT '.$null.',i2 INT '.$null.') ENGINE='.$e);

  $start = microtime(true);
  $db->query('begin');
  for ($i=0; $i<200000; $i++) {
    $db->query('INSERT INTO foo VALUES ('.$i.', '.($i*2).')');
    if ($i%30000==0) { // commit every 30,000 rows and start a new transaction
      $db->query('commit');
      $db->query('begin');
    }
  }
  $db->query('commit');
  echo "mysql $e $null ".number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->query('SELECT sum(i2) FROM foo')->fetch_row();
  echo ' '.number_format(microtime(true)-$start, 2);

  $start = microtime(true);
  $db->query('UPDATE foo SET i2=i2+2');
  echo ' '.number_format(microtime(true)-$start, 2);
  
  $row = $db->query('SHOW TABLE STATUS')->fetch_array();
  if (!$row['Data_length']) $row['Data_length'] = filesize('.../test/foo.CSV');
  echo ' '.number_format($row['Data_length']/1048576, 2)."\n";
}

3 comments:

  1. Normally I would ask the following methodology-related questions of someone conducting performance tests like this:

    Are these claims anywhere near valid when applied to 13-table join selects with correlated subqueries? How about on a system 4x as powerful with a RAID 10 setup? What if you tune various settings and create optimal indexes?

    However, as you have done us the favor of including your source code with your claims, anyone at all serious about databases can easily discover that your testing methodology is flawed on an even more simple basis than that.

    Why use a database if all you're going to do is flat inserts and selects? Why not use a data structure more suited to that? SQL databases are for transactional requests on highly structured data, not for simple storage and retrieval of mundane data.

    Replies
    1. Joins and subqueries are not part of this article.
      With huge tables you normally try to avoid joins and subqueries anyway, and adding indexes makes inserts slower.
      The write rate for inserts is about 1 MB/s, so IO is not relevant here.

      > why not use a data structure more suited to that?

      See the CSV test; the point is that updates are slower with CSV.

  2. Thank you for the test. It's exactly what I had discovered.
    No doubt, MySQL has its area of application, but SQLite is also very nice and in some cases the better choice.
    I've worked with huge tables on MySQL, but now I've migrated to SQLite3. In my case the database doesn't need to serve more than one user at a time; the job is only data matching and converting. I was surprised that SQLite3 performs so much faster than MySQL. Because of these circumstances and the easy handling, I will keep an eye on SQLite.
