There are many great tools to analyze your webserver's access logfiles. AWStats is one I am very fond of, as it creates all kinds of nice graphs and shows a lot of information about the visitors of your website(s). Google Analytics is a very popular tool as well.
But sometimes you just want to know which URIs are requested the most, and which IPs have accessed your site the most, perhaps because you suddenly experienced a burst in traffic and want to find out the cause. For this, I've created a simple Python script, which can be downloaded from my GitHub page here.
It's quite easy to use. Simply copy the script to a place on your server and make it executable:
$ chmod +x top10.py
Then feed it an access logfile and pass the '-e' parameter to see how the first log line is split into numbered columns:
$ ./top10.py -f access.log -e
Parsing logfile access.log
Displaying the first access log line:
col: value:
0: 127.0.0.1
1: -
2: -
3: [20/Feb/2012:13:02:19
4: +0100]
5: "GET
6: /
7: HTTP/1.1"
8: 500
9: 613
10: "-"
11: "Mozilla/5.0
12: (X11;
13: Linux
14: x86_64;
15: rv:10.0)
16: Gecko/20100101
17: Firefox/10.0"
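The column numbers in this output are consistent with a plain whitespace split of the log line. The following is only a minimal sketch of that idea, not the actual code of top10.py; the function name `split_columns` is made up for illustration, and the sample line is reassembled from the output above.

```python
def split_columns(line):
    """Split an access log line on whitespace (an assumption about how
    the columns are numbered, not the script's real implementation)."""
    return line.split()


# Sample line reassembled from the '-e' output shown above.
sample = (
    '127.0.0.1 - - [20/Feb/2012:13:02:19 +0100] "GET / HTTP/1.1" 500 613 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0"'
)

columns = split_columns(sample)

# Print each column with its index, similar to the '-e' listing.
for index, value in enumerate(columns):
    print(f"{index}: {value}")
```

Note that a whitespace split cuts quoted fields such as the request and the user agent into several columns, which is why the URI ends up in column 6 rather than inside one "request" column.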
Then run top10.py with the '-u' parameter (the number of the URI column) and the '-i' parameter (the number of the client IP column). In the above example these are columns 6 and 0, so:
$ ./top10.py -f access.log -i 0 -u 6
Parsing logfile access.log
Analyzing 2968 lines: [=================== ] 102%
TOP 10 REQUESTED URLs
+--------+-----------------------------------------+
| Visits | URL                                     |
+--------+-----------------------------------------+
| 143    | /static/img/feed.png                    |
| 142    | /static/img/search.png                  |
| 136    | /static/css/style.css                   |
| 131    | /static/js/syntaxhl/shCore.js           |
| 130    | /static/js/syntaxhl/shBrushBash.js      |
| 130    | /static/css/syntaxhl/shThemeDefault.css |
| 130    | /static/css/syntaxhl/shCore.css         |
| 130    | /static/js/syntaxhl/shBrushPython.js    |
| 123    | /static/img/pier-oud.jpg                |
| 119    | /static/img/seperator.gif               |
+--------+-----------------------------------------+
TOP 10 client IP's
+--------+-----------------+
| Visits | IP              |
+--------+-----------------+
| 2118   | xx.151.232.yy   |
| 710    | xxx.172.124.yyy |
| 71     | ::1             |
| 35     | xx.32.234.yyy   |
| 31     | xxx.124.167.yyy |
| 1      | xx.169.34.yy    |
| 1      | xx.249.179.yyy  |
| 1      | xx.198.63.yyy   |
+--------+-----------------+
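The counting behind such a top 10 can be sketched with `collections.Counter` from the standard library. This is an illustration under the assumption that columns come from a whitespace split; it is not taken from top10.py, and the helper name `top_n` and the sample lines are made up.

```python
from collections import Counter


def top_n(lines, column, n=10):
    """Count the values found in one whitespace-separated column and
    return the n most frequent as (value, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip lines that are too short to contain the requested column.
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts.most_common(n)


# Hypothetical sample lines in the same layout as the log shown earlier.
sample_lines = [
    '10.0.0.1 - - [20/Feb/2012:13:02:19 +0100] "GET /a.css HTTP/1.1" 200 10',
    '10.0.0.1 - - [20/Feb/2012:13:02:20 +0100] "GET /a.css HTTP/1.1" 200 10',
    '10.0.0.2 - - [20/Feb/2012:13:02:21 +0100] "GET /b.png HTTP/1.1" 200 20',
    'garbage',
]

top_uris = top_n(sample_lines, column=6)
top_ips = top_n(sample_lines, column=0, n=1)
print(top_uris)
print(top_ips)
```

For a real logfile, the list could be replaced by an open file object, since iterating over a file yields one line at a time and keeps memory use low.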
I've implemented some more features, such as passing an Apache configuration file so the script can look up the LogFormat nickname, which in turn makes it possible to set a start and end date (useful for very large logfiles). Unfortunately I haven't had much time to thoroughly test these.
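To give an idea of what date filtering involves, here is a rough sketch that parses the timestamp from columns 3 and 4 of the example line and checks it against a start and end date. It does not show how top10.py does this; the names `parse_timestamp` and `in_range` are hypothetical, and the column positions are assumed from the '-e' output above.

```python
from datetime import datetime

# Timestamp layout as seen in the example: [20/Feb/2012:13:02:19 +0100]
# Note: %b (month abbreviation) depends on the locale; English is assumed.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_timestamp(fields, column=3):
    """Join the date and UTC offset columns and parse them."""
    raw = f"{fields[column]} {fields[column + 1]}".strip("[]")
    return datetime.strptime(raw, LOG_TIME_FORMAT)


def in_range(line, start, end, column=3):
    """Return True if the line's timestamp lies between start and end."""
    when = parse_timestamp(line.split(), column)
    return start <= when <= end


sample = '127.0.0.1 - - [20/Feb/2012:13:02:19 +0100] "GET / HTTP/1.1" 500 613'

# Start and end are timezone-aware so they can be compared with log times.
start = datetime.strptime("20/Feb/2012:00:00:00 +0100", LOG_TIME_FORMAT)
end = datetime.strptime("20/Feb/2012:23:59:59 +0100", LOG_TIME_FORMAT)

print(in_range(sample, start, end))
```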