Quickly analyze (Apache) access logfiles
Dec. 27, 2015 | category: scripting, python


There are many great tools to analyze your webserver's access logfiles. AWStats is one I am very fond of, as it creates all kinds of nice graphs and shows a lot of information about the visitors of your website(s). Google Analytics is a very popular tool as well.

But sometimes you just want to know which URIs are requested the most and which IPs have accessed your site the most, perhaps because you suddenly experienced a burst in traffic and want to find out the cause. For this, I've created a simple Python script, which can be downloaded from my GitHub page.

It's quite easy to use. Simply copy the script to a place on your server and make it executable:

 

$ chmod +x top10.py

 

Then feed it an access logfile and pass the '-e' parameter to see how the first log line is split into numbered columns:

 

$ ./top10.py -f access.log -e
Parsing logfile access.log
Displaying the first access log line:

col:	value:

0:	127.0.0.1
1:	-
2:	-
3:	[20/Feb/2012:13:02:19
4:	+0100]
5:	"GET
6:	/
7:	HTTP/1.1"
8:	500
9:	613
10:	"-"
11:	"Mozilla/5.0
12:	(X11;
13:	Linux
14:	x86_64;
15:	rv:10.0)
16:	Gecko/20100101
17:	Firefox/10.0"
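The column numbers above are consistent with simply splitting the line on whitespace. The following is a minimal sketch of that idea, not code taken from top10.py: the function name split_columns is hypothetical, and the sample line is reconstructed from the column values shown above.

```python
# Sample line reconstructed from the column listing above (Apache "combined" style).
line = ('127.0.0.1 - - [20/Feb/2012:13:02:19 +0100] "GET / HTTP/1.1" 500 613 '
        '"-" "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0"')


def split_columns(log_line):
    """Split a log line on whitespace (assumed to be how the columns are numbered)."""
    return log_line.split()


columns = split_columns(line)

# Print each column index next to its value, similar to the '-e' listing.
for index, value in enumerate(columns):
    print(f"{index}:\t{value}")
```

Note that a naive whitespace split breaks quoted fields such as the request and the user agent into several columns, which is why the URI ends up in column 6 rather than in a single "request" column.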

 

Then run top10.py with the '-u' parameter (the URI column) and the '-i' parameter (the client IP column). In the example above these are columns 6 and 0, so:

 

$ ./top10.py -f access.log -i 0 -u 6
Parsing logfile access.log
Analyzing 2968 lines:
[=================== ] 102%

TOP 10 REQUESTED URLs
+--------+-----------------------------------------+
| Visits | URL                                     |
+--------+-----------------------------------------+
|  143   | /static/img/feed.png                    |
|  142   | /static/img/search.png                  |
|  136   | /static/css/style.css                   |
|  131   | /static/js/syntaxhl/shCore.js           |
|  130   | /static/js/syntaxhl/shBrushBash.js      |
|  130   | /static/css/syntaxhl/shThemeDefault.css |
|  130   | /static/css/syntaxhl/shCore.css         |
|  130   | /static/js/syntaxhl/shBrushPython.js    |
|  123   | /static/img/pier-oud.jpg                |
|  119   | /static/img/seperator.gif               |
+--------+-----------------------------------------+

TOP 10 client IP's
+--------+-----------------+
| Visits | IP              |
+--------+-----------------+
|  2118  | xx.151.232.yy   |
|  710   | xxx.172.124.yyy |
|   71   | ::1             |
|   35   | xx.32.234.yyy   |
|   31   | xxx.124.167.yyy |
|   1    | xx.169.34.yy    |
|   1    | xx.249.179.yyy  |
|   1    | xx.198.63.yyy   |
+--------+-----------------+
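For readers curious how such a top 10 can be computed, here is a small sketch using collections.Counter from the standard library. It is an illustration under assumptions, not the actual implementation of top10.py: the function name top_values and the sample log lines (using documentation-range IP addresses) are made up for this example.

```python
from collections import Counter


def top_values(lines, column, limit=10):
    """Count the values in one whitespace-separated column and return the most common."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # skip short or malformed lines
            counts[fields[column]] += 1
    return counts.most_common(limit)


# Made-up sample data, only for illustration.
sample = [
    '192.0.2.1 - - [20/Feb/2012:13:02:19 +0100] "GET / HTTP/1.1" 200 613',
    '192.0.2.1 - - [20/Feb/2012:13:02:20 +0100] "GET /static/css/style.css HTTP/1.1" 200 1024',
    '192.0.2.2 - - [20/Feb/2012:13:02:21 +0100] "GET / HTTP/1.1" 200 613',
]

top_uris = top_values(sample, column=6)  # URI column, as with '-u 6'
top_ips = top_values(sample, column=0)   # client IP column, as with '-i 0'
print(top_uris)
print(top_ips)
```

Reading the file line by line and keeping only the counters in memory means even large logfiles should not need much RAM.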

 

I've implemented some more features, such as pointing the script at an Apache configuration file so it can pick up the LogFormat nickname, which makes it possible to set a start and end date (useful for very large logfiles). Unfortunately, I haven't had much time to thoroughly test these.
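As a rough idea of what date filtering could look like, the sketch below parses the timestamp column from the earlier example (column 3) and checks it against a start and end date. The helper names parse_timestamp and in_range are hypothetical, the timezone offset in column 4 is ignored, and this is not necessarily how top10.py does it.

```python
from datetime import datetime


def parse_timestamp(field):
    """Parse a field such as '[20/Feb/2012:13:02:19' into a naive datetime."""
    return datetime.strptime(field.lstrip("["), "%d/%b/%Y:%H:%M:%S")


def in_range(line, start, end, column=3):
    """Return True when the line's timestamp lies between start and end (inclusive)."""
    fields = line.split()
    if len(fields) <= column:
        return False  # line too short to contain a timestamp
    try:
        stamp = parse_timestamp(fields[column])
    except ValueError:
        return False  # timestamp not in the expected format
    return start <= stamp <= end


line = '127.0.0.1 - - [20/Feb/2012:13:02:19 +0100] "GET / HTTP/1.1" 500 613'
print(in_range(line, datetime(2012, 2, 1), datetime(2012, 3, 1)))
```

Lines that pass such a check could then be counted as usual, and the rest skipped.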


