Friday, February 1, 2013

Analyze log files from several servers in real-time (update: whois, firewall)

First, we set up a machine to analyze the logs:

# install missing packages (e.g. with Ubuntu 12.10; ncat ships with the nmap package)
apt-get install nmap logtop

# create a ramdisk with 1GB for storing the logs
mkdir /ramdisk
mount -t tmpfs -o nosuid,noexec,noatime,size=1G none /ramdisk

# receive logs on port 8080 (append mode, so the file can be truncated safely later)
ncat --ssl -l -k 8080 >> /ramdisk/access.log
# open second terminal
tail -f /ramdisk/access.log | logtop

# clean up the ramdisk from time to time (truncate the log to zero bytes)
: > /ramdisk/access.log
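
logtop keeps a live, continuously updated ranking of identical input lines. For a one-off snapshot of a log that is already on the ramdisk, a rough non-interactive stand-in can be built from standard tools. This is a sketch only; `top_lines` is a hypothetical helper name, not part of logtop:

```shell
#!/bin/sh
# Hypothetical helper (not part of logtop): print the N most frequent
# lines read from stdin with their counts, computed once over finite input.
top_lines() {
  n="${1:-10}"  # number of entries to show (default 10)
  # group identical lines, count each, sort by count descending, keep top n
  sort | uniq -c | sort -rn | head -n "$n"
}

# Example: the five most active client IPs in the current log snapshot
# awk '{print $1}' /ramdisk/access.log | top_lines 5
```

Unlike logtop, this reads its input to the end before printing, so it suits a saved file rather than `tail -f`.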

Second, we set up the web servers:

# install missing packages (ncat ships with the nmap package)
apt-get install nmap

# send logs to analyzer-ip:8080
tail -f /var/log/apache2/access.log | ncat --send-only --ssl <analyzer-ip> 8080
Besides access.log, we can also monitor other log files using different port numbers.
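
Forwarding several logs means one `tail -f | ncat` pipeline per file, each to its own port, plus a matching `ncat -l` listener per port on the analyzer. A minimal sketch, assuming a hypothetical helper `forward_logs` and consecutive port numbers starting at a base port; the `DRY_RUN` variable is also an assumption, added so the port mapping can be checked without opening any connections:

```shell
#!/bin/sh
# Hypothetical helper: forward each log file to its own port on the analyzer,
# using consecutive ports starting at the given base port.
# usage: forward_logs <analyzer-ip> <base-port> <logfile>...
# With DRY_RUN set, only the planned "file -> host:port" mapping is printed.
forward_logs() {
  host="$1"; port="$2"; shift 2
  for log in "$@"; do
    echo "$log -> $host:$port"
    if [ -z "$DRY_RUN" ]; then
      # run in the background so the next file can be started
      tail -f "$log" | ncat --send-only --ssl "$host" "$port" &
    fi
    port=$((port + 1))
  done
}
```

For example, `forward_logs <analyzer-ip> 8080 /var/log/apache2/access.log /var/log/apache2/error.log; wait` would pair with listeners on ports 8080 and 8081 on the analyzer, each appending to its own file on the ramdisk.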

Now we can start watching the requests coming in.

Instead of analyzing each line separately, we can also aggregate all requests by client IP:

tail -f /ramdisk/access.log | awk -Winteractive '{print $1}' | logtop
Or we can aggregate all requests by URL:
tail -f /ramdisk/access.log | awk -Winteractive '{print $7}' | logtop
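
The field numbers come from awk's default whitespace splitting. Assuming the access log uses Apache's `combined` format, `$1` is the client IP, `$7` the requested URL and `$9` the HTTP status code. The sample line below is made up for illustration, and `log_fields` is a hypothetical helper name:

```shell
#!/bin/sh
# Made-up sample line in Apache's combined log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
sample='203.0.113.5 - - [01/Feb/2013:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (iPad)"'

# Hypothetical helper: print the fields used above, plus the status code
log_fields() {
  awk '{print "ip=" $1, "url=" $7, "status=" $9}'
}

echo "$sample" | log_fields
# prints: ip=203.0.113.5 url=/index.html status=200
```

Replacing `$7` with `$9` in the URL pipeline would aggregate requests by status code instead. Malformed requests that Apache logs as a bare `"-"` shift these field positions.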
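Or filter requests by user agent (--line-buffered makes GNU grep pass each line on to logtop immediately instead of waiting for a full output buffer):
# show only iPad
tail -f /ramdisk/access.log | grep --line-buffered iPad | logtop

# hide Google, MSN and Bing crawlers
tail -f /ramdisk/access.log | grep --line-buffered -Ev 'Googlebot|bingbot|msnbot' | logtop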
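
`grep iPad` matches anywhere in the line, so a URL or referer containing "iPad" counts too. To look at the user agent alone, or to aggregate by it, the line can be split on double quotes instead; assuming the combined format, the user agent is then field 6. A sketch with a hypothetical helper name `user_agent` and a made-up sample line:

```shell
#!/bin/sh
# Hypothetical helper: print only the user agent of each combined-format line.
# Splitting on '"' gives: $2 = request line, $4 = referer, $6 = user agent.
user_agent() {
  awk -F'"' '{print $6}'
}

line='203.0.113.5 - - [01/Feb/2013:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (iPad)"'
echo "$line" | user_agent
# prints: Mozilla/5.0 (iPad)
```

In the live pipeline this becomes `tail -f /ramdisk/access.log | awk -Winteractive -F'"' '{print $6}' | logtop`. Lines with escaped quotes inside a field would be split differently.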
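To annotate each client IP with its owner, we write a small PHP script that looks up every new IP once with whois and caches the result:
<?php
// whois.php: print "IP  owner info" for every log line read from stdin
$whois = array();  // cache: IP => formatted output line
$fp = fopen('php://stdin', 'r');
while (($line = fgets($fp)) !== false) {
  list($ip) = explode(' ', $line, 2);  // the client IP is the first field
  if (!isset($whois[$ip])) {
    $who = (string) shell_exec('whois '.escapeshellarg($ip));
    preg_match_all('!(?:descr|orgname|organization|country|owner).*:\s+(.+)!im',
      $who, $m);
    // first two matching values, fewer if whois returned less
    $info = implode(' ', array_map('trim', array_slice($m[1], 0, 2)));
    $whois[$ip] = ' '.str_pad($ip, 15).'  '.$info."\n";
  }
  echo $whois[$ip];
}
fclose($fp);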
and run:
apt-get install whois logtop php5-cli
tail -f /ramdisk/access.log | php whois.php | logtop
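
To check which whois fields get picked up without querying a whois server, the same idea can be tried in the shell on canned output. `whois_owner` is a hypothetical helper that roughly mirrors the PHP regex (first two lines whose key contains descr, orgname, organization, country or owner); it is not an exact port, e.g. it takes the value after the first colon rather than the last one:

```shell
#!/bin/sh
# Hypothetical helper: print the first two whois values whose key contains
# descr, orgname, organization, country or owner (case-insensitive),
# joined by a space. Reads whois output from stdin.
whois_owner() {
  grep -iE '^[^:]*(descr|orgname|organization|country|owner)[^:]*:' |
    head -n 2 |
    sed -E 's/^[^:]*:[[:space:]]*//' |
    paste -sd ' ' -
}

# Made-up whois-style input, so no network access is needed:
printf 'OrgName:        Example Org\nComment:        test\nCountry:        US\n' | whois_owner
# prints: Example Org US
```

With the whois package installed, `whois <ip> | whois_owner` applies it to a real lookup.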

To send uptime messages every 5 seconds, we can use:

# @analyzer
ncat --ssl -l -k 8081 >> /ramdisk/uptime.log
tail -f /ramdisk/uptime.log
# @webserver
while true; do echo -n `hostname`; uptime; sleep 5; done | ncat --send-only ...
# or free disk space, replace uptime with: df -h | grep sda
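
The one-liner above loops forever, which is what the live feed needs. A slightly more general sketch with hypothetical names: `report` prefixes the output of any command with the host name every few seconds, and an optional count makes it stop, which is handy for a quick test before piping into ncat. It uses `uname -n`, which prints the same node name as `hostname` on typical Linux systems:

```shell
#!/bin/sh
# Hypothetical helper: every INTERVAL seconds print "<host> <output of CMD>".
# COUNT = 0 repeats forever; otherwise stop after COUNT lines.
# usage: report <interval> <count> <command> [args...]
report() {
  interval="$1"; count="$2"; shift 2
  i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(uname -n)" "$("$@")"
    i=$((i + 1))
    # skip the sleep after the last line of a finite run
    if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `report 5 0 uptime | ncat --send-only ...` for uptime, or `report 5 0 sh -c 'df -h | grep sda'` for disk space, since a pipeline has to be wrapped in `sh -c`.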

To configure a firewall on Ubuntu, we can use ufw:

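# allow incoming connections on port 80
ufw allow 80/tcp
# allow limited connections on port 22 (deny an IP after 6 connection attempts within 30 seconds)
ufw limit 22/tcp
# start firewall, block all other incoming connections
# (rules added before enabling are applied right away, so new SSH connections still work)
ufw enable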
# show firewall status
ufw status verbose
ufw show listening

# block all new connections from IP 192.168.1.66
ufw insert 1 deny from 192.168.1.66

# remove blocking rule for IP 192.168.1.66
ufw delete deny from 192.168.1.66
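
When the address to block comes from a log pipeline rather than being typed by hand, it is worth checking that it really is an IPv4 address before handing it to ufw. A minimal sketch with hypothetical helper names `is_ipv4` and `block_ip`; it only covers dotted IPv4 (no IPv6, no CIDR ranges), and the ufw call needs root:

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a dotted IPv4 address (0-255 per octet).
is_ipv4() {
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;  # empty, or contains anything but digits and dots
  esac
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  old_ifs=$IFS
  IFS=.
  set -- $1                    # split into the four octets
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Hypothetical wrapper around the ufw insert command shown above (run as root).
block_ip() {
  is_ipv4 "$1" || { echo "not an IPv4 address: $1" >&2; return 1; }
  ufw insert 1 deny from "$1"
}
```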

Note: If your servers are connected by a secure network (e.g. VPN), you can skip --ssl and certificates.

Coming next: ncat with ssl-certificates

3 comments:

  1. This is an awesome resource! Appreciate the examples, like filtering by iPad/user agent. How would you extend this to pull out only errors, for example?

    Do you see value in a third-party solution for tailing logs (http://www.stackify.com/11-ways-to-tail-a-log-file-on-windows-unix/), or prefer the DIY approach?

    1. You can get errors from the error log:
      tail -f /var/log/apache2/error.log
      and filter by certain keywords (grep -E keeps only matching lines):
      tail -f /var/log/apache2/error.log | grep -E 'Warning|Fatal'

      I can't tell whether stackify is better or not, so I'd recommend asking them directly. If you need tail in the browser, you can use a small node.js script:
      https://gist.github.com/thomasbley/4727187

