Tuesday, June 26, 2012

Things you should not do in PHP (update: references)

Here is a list of things you should not do in PHP. Most of them are pretty obvious, but I've seen them a lot over the years. In most cases, the problems stay hidden until the data grows beyond 10,000 entries. So on a development system, everything is fast and the memory limit is never hit :-)

Suppose we have a table with 100k entries:
$db->query('create table stats (c1 int(11) primary key, c2 varchar(255))');
$db->query('begin');
for ($i=0; $i<100000; $i++) {
  $db->query('insert into stats values ('.$i.','.($i*2).')');
}
$db->query('commit');
Populate a big array instead of streaming results:
$result = $db->query('select * from stats');
$array = $result->fetch_all(); // 35M
// or
while ($row = $result->fetch_assoc()) $array[] = $row; // 35M
// or
while ($row = $result->fetch_array()) $array[] = $row; // 44.5M
// process $array ...

// instead of:
while ($row = $result->fetch_assoc()) { // 0.5M
  // process $row
}
Sum with PHP instead of SQL:
$sum = 0;
foreach ($array as $val) $sum += $val[0]; // 44M, 1.2s

// instead of:
list($sum,) = $db->query('select sum(c1) from stats')->fetch_row(); // 0.2M, 0.1s
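The same comparison can be tried without a MySQL server. The sketch below is only a stand-in under stated assumptions: it uses PDO with an in-memory SQLite database instead of the `$db` object used in this post, and a much smaller table (1,000 rows). It shows that both approaches give the same answer; it does not reproduce the timings or memory figures quoted above.

```php
<?php
// Assumption: PDO with the pdo_sqlite driver is installed.
// The post's own examples use a different $db object.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('create table stats (c1 integer primary key, c2 varchar(255))');

// Fill the table inside one transaction, as in the setup above.
$pdo->beginTransaction();
$insert = $pdo->prepare('insert into stats values (?, ?)');
for ($i = 0; $i < 1000; $i++) {
    $insert->execute(array($i, $i * 2));
}
$pdo->commit();

// Slow way: pull every row into PHP and add the values up there.
$rows = $pdo->query('select * from stats')->fetchAll(PDO::FETCH_NUM);
$phpSum = 0;
foreach ($rows as $row) {
    $phpSum += $row[0];
}

// Better: let the database aggregate and return a single value.
$sqlSum = (int) $pdo->query('select sum(c1) from stats')->fetchColumn();

echo $phpSum, ' ', $sqlSum, "\n"; // prints "499500 499500"
```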
Sort with PHP instead of SQL:
usort($array, function ($a, $b) { return $a[0] - $b[0]; }); // 4.1s
// or
foreach ($array as $key=>$val) $helper[$key] = $val[0];
asort($helper); // 2.2s

// instead of:
$result = $db->query('select * from stats order by c1');
while ($row = $result->fetch_assoc()) { // 1.2s
  // process $row
}
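If sorting in PHP really cannot be avoided, keep in mind that `usort` expects the callback to return an integer that is negative, zero, or positive, not a boolean. A minimal sketch with made-up rows (the data is purely illustrative and not taken from the benchmark):

```php
<?php
// Illustrative rows shaped like numerically indexed result rows: [c1, c2].
$rows = array(array(3, '6'), array(1, '2'), array(2, '4'));

usort($rows, function ($a, $b) {
    // Negative, zero or positive: the contract usort() relies on.
    return $a[0] - $b[0];
});

// Collect the sort keys to show the resulting order.
$sortedKeys = array();
foreach ($rows as $row) {
    $sortedKeys[] = $row[0];
}
echo implode(',', $sortedKeys), "\n"; // prints "1,2,3"
```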
Let's add a second table:
$db->query('create table stats2 (c1 int(11) primary key, c2 varchar(255))');
$db->query('begin');
for ($i=50000; $i<51000; $i++) {
  $db->query('insert into stats2 values ('.$i.','.($i*2).')');
}
$db->query('commit');
Join with PHP instead of SQL (join result contains 1000 entries):
$array = $db->query("select * from stats")->fetch_all();
$array2 = $db->query("select * from stats2")->fetch_all();

foreach ($array as $key=>$val) {
  foreach ($array2 as $key2=>$val2) { // 35.7M, 69s
    if ($val[0] == $val2[0]) { /* do something */ }
  }
}

// instead of:
$result = $db->query('select * from stats a, stats2 b where a.c1=b.c1');
while ($row = $result->fetch_array()) { // 0.5M, 0.015s
  // process $row
}
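A scaled-down version of the SQL join can be run without MySQL. As an assumption for illustration, the sketch uses PDO with an in-memory SQLite database in place of the post's `$db` object, with 1,000 rows in `stats` and 10 rows in `stats2`; the row counts are chosen only to keep it quick.

```php
<?php
// Assumption: PDO with pdo_sqlite stands in for the post's $db object.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('create table stats (c1 integer primary key, c2 varchar(255))');
$pdo->exec('create table stats2 (c1 integer primary key, c2 varchar(255))');

$pdo->beginTransaction();
$ins1 = $pdo->prepare('insert into stats values (?, ?)');
$ins2 = $pdo->prepare('insert into stats2 values (?, ?)');
for ($i = 0; $i < 1000; $i++) {
    $ins1->execute(array($i, $i * 2));
}
for ($i = 500; $i < 510; $i++) {
    $ins2->execute(array($i, $i * 2));
}
$pdo->commit();

// The database matches rows via the primary keys; PHP only sees the result.
$result = $pdo->query('select a.c1, a.c2 from stats a join stats2 b on a.c1 = b.c1');
$matches = 0;
while ($row = $result->fetch(PDO::FETCH_NUM)) {
    $matches++; // process $row here
}
echo $matches, "\n"; // prints "10"
```

The nested PHP loops would need 1,000 x 10 comparisons for the same data, and the gap widens quickly as the tables grow.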
Modify arrays without references (the first loop below uses a reference, the second one does not):
$array = array();
for ($i=0; $i<1000000; $i++) $array[] = $i*2;

$start = microtime(true);
foreach ($array as &$val) $val++;
echo (memory_get_peak_usage(true)/1048576)."\n"; // 80M (32bit), 200M (64bit)
echo (microtime(true)-$start)."\n"; // 0.14s

$start = microtime(true);
foreach ($array as $key=>$val) $array[$key]++;
echo (memory_get_peak_usage(true)/1048576)."\n"; // 161M (32bit), 399M (64bit)
echo (microtime(true)-$start)."\n"; // 0.64s
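One caveat with the reference variant: after the loop, `$val` still refers to the last array element, so it is usually a good idea to `unset($val)` before reusing the name. A small sketch with a three-element array (chosen only for illustration):

```php
<?php
$array = array(1, 2, 3);

// Modify the elements in place through a reference.
foreach ($array as &$val) {
    $val++;
}
unset($val); // break the reference so later writes to $val cannot touch $array

// $val is an ordinary variable again; this does not change the array.
$val = 100;

echo implode(',', $array), "\n"; // prints "2,3,4"
```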
More examples coming ...

All scripts were run on a 1.4 GHz machine with PHP 5.4.0.
