Sunday, August 12, 2012

How to implement a real-life benchmark with PHP

To determine the maximum capacity of a website, Apache ab is often used as a first step. Fetching a single URL over and over again is ideal for caching, so it measures the best case. To measure the worst case for caching, you have to fetch different URLs in random order.

Here is a PHP script to walk randomly on a web page:
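A minimal sketch of such a random walk could look like the following. It is not the original random_crawler.php: the function names extractLinks, fetchUrl and randomWalk are assumptions made for illustration, and the sketch only follows absolute and root-relative links on the same host.

```php
<?php
// Sketch only, not the original script. All function names are hypothetical.

/** Return the unique same-host URLs of all links matched by $xpath. */
function extractLinks(string $html, string $baseUrl, string $xpath = '//a'): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // real-world HTML is rarely valid, so hide parser warnings
    $base = parse_url($baseUrl);
    $root = $base['scheme'] . '://' . $base['host'];
    $links = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty link or in-page anchor
        }
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $root . $href; // root-relative link
        }
        // Keep links on the same host only; other relative forms are ignored here.
        if (strpos($href, $root . '/') === 0) {
            $links[$href] = true;
        }
    }
    return array_keys($links);
}

/** Fetch one URL with cURL and return the body ('' on failure). */
function fetchUrl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '', // accept any compression cURL supports
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}

/** Fetch $limit pages, choosing the next URL at random from the current page. */
function randomWalk(callable $fetch, string $startUrl, int $limit, string $xpath = '//a'): array
{
    $steps = [];
    $url = $startUrl;
    for ($i = 0; $i < $limit; $i++) {
        $start = microtime(true);
        $html = $fetch($url);
        $steps[] = [
            'url'     => $url,
            'bytes'   => strlen($html),
            'seconds' => microtime(true) - $start,
        ];
        $links = extractLinks($html, $url, $xpath);
        // On a dead end, go back to the start page.
        $url = $links ? $links[array_rand($links)] : $startUrl;
        // sleep(1); // optional pause between two requests
    }
    return $steps;
}
```

A call such as randomWalk('fetchUrl', 'http://www.example.com/', 10) would then simulate one user viewing ten pages. Passing the fetch function as a parameter is a design choice of this sketch: it lets you try the walk against a fake in-memory site before pointing it at a real server.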

To approximate the average case for caching and response times, the crawler should follow only the most relevant links, for example by skipping links in headers and footers. This is done by changing the XPath expression in the script:
// fetch all links under <div id="content">...</div>
$xpath = '//div[@id="content"]//a';

// fetch all links under <div id="content"> and <div id="menu">
$xpath = '//div[@id="content" or @id="menu"]//a';
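To see what the two expressions select, here is a small self-contained sketch using PHP's DOMDocument and DOMXPath classes. The sample page and the helper name selectHrefs are made up for illustration; only the two XPath expressions are taken from above.

```php
<?php
// Hypothetical sample page with a header, a menu, a content area and a footer.
$html = <<<HTML
<html><body>
<div id="header"><a href="/login">Login</a></div>
<div id="menu"><a href="/politics">Politics</a></div>
<div id="content"><p><a href="/article-1">Article 1</a> <a href="/article-2">Article 2</a></p></div>
<div id="footer"><a href="/imprint">Imprint</a></div>
</body></html>
HTML;

/** Return the href attributes of all nodes matched by $xpath, in document order. */
function selectHrefs(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // hide warnings about imperfect markup
    $hrefs = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

// Content links only: /article-1, /article-2
print_r(selectHrefs($html, '//div[@id="content"]//a'));

// Menu and content links: /politics, /article-1, /article-2
print_r(selectHrefs($html, '//div[@id="content" or @id="menu"]//a'));
```

The header and footer links never show up, so they no longer influence which pages the benchmark requests.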

To make the benchmark more realistic, you can add a pause between two requests: uncomment "// sleep(1)" at the end of the script.
To find realistic values for $limit (number of pages per user) and $processes (number of concurrent users), consult your favorite analytics tool.
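How $processes could translate into concurrent users is sketched below, assuming the pcntl extension is available (PHP CLI on a Unix-like system). The helper runWorkers is hypothetical and only illustrates the fork-and-wait pattern; it is not taken from the original script.

```php
<?php
// Sketch only; requires the pcntl extension.

/**
 * Fork $processes children, run $worker in each one and wait for all of them.
 * Returns the number of children that exited with status 0.
 */
function runWorkers(int $processes, callable $worker): int
{
    $pids = [];
    for ($i = 0; $i < $processes; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child process: simulate one user, then exit without
            // returning into the parent's code path.
            $worker();
            exit(0);
        }
        $pids[] = $pid; // parent: remember the child
    }

    $succeeded = 0;
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $succeeded++;
        }
    }
    return $succeeded;
}
```

Each worker would fetch $limit pages, so the total number of requests is $limit * $processes. With $limit = 10 and $processes = 10 that gives the 100 requests of the example run.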

Example output:
php random_crawler.php >details.log

Testing http://www.spiegel.de/, 100 requests, 10 processes
#2393 start 10 requests
#2395 start 10 requests
#2396 start 10 requests
#2394 start 10 requests
#2399 start 10 requests
#2397 start 10 requests
#2398 start 10 requests
#2402 start 10 requests
#2401 start 10 requests
#2400 start 10 requests
#2398 end 188/815 KB 3.54s 0.35s/req
#2393 end 176/751 KB 3.78s 0.38s/req
#2396 end 153/562 KB 3.90s 0.39s/req
#2401 end 137/628 KB 4.19s 0.42s/req
#2397 end 149/456 KB 4.89s 0.49s/req
#2399 end 156/525 KB 4.90s 0.49s/req
#2402 end 171/619 KB 5.95s 0.60s/req
#2400 end 127/349 KB 7.40s 0.74s/req
#2394 end 157/465 KB 8.36s 0.84s/req
#2395 end 167/662 KB 10.62s 1.06s/req
(sizes shown as compressed/uncompressed)
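The per-process "end" lines can be condensed into a few overall numbers. The following sketch assumes the line format shown above and that all processes start at roughly the same time, so the slowest process determines the wall-clock duration. The function name summarize is an assumption.

```php
<?php
/**
 * Summarize "end" lines such as "#2398 end 188/815 KB 3.54s 0.35s/req".
 * $requestsPerProcess is the value of $limit used for the run.
 */
function summarize(array $lines, int $requestsPerProcess): array
{
    $seconds = [];
    $compressedKb = 0;
    foreach ($lines as $line) {
        // Capture compressed KB, uncompressed KB and total seconds per process.
        if (preg_match('~^#\d+ end (\d+)/(\d+) KB ([\d.]+)s ~', $line, $m)) {
            $compressedKb += (int) $m[1];
            $seconds[] = (float) $m[3];
        }
    }
    if (!$seconds) {
        throw new InvalidArgumentException('no "end" lines found');
    }
    $requests = count($seconds) * $requestsPerProcess;
    return [
        'requests'                => $requests,
        'compressed_kb'           => $compressedKb,
        // Mean response time over all requests of all processes.
        'avg_seconds_per_request' => array_sum($seconds) / $requests,
        // Throughput estimate: all requests divided by the slowest process' runtime.
        'requests_per_second'     => $requests / max($seconds),
    ];
}

// Three lines copied from the example output above.
$lines = [
    '#2398 end 188/815 KB 3.54s 0.35s/req',
    '#2393 end 176/751 KB 3.78s 0.38s/req',
    '#2395 end 167/662 KB 10.62s 1.06s/req',
];
$summary = summarize($lines, 10);
printf(
    "%d requests, %d KB compressed, %.2fs/req, %.1f req/s\n",
    $summary['requests'],
    $summary['compressed_kb'],
    $summary['avg_seconds_per_request'],
    $summary['requests_per_second']
);
```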
