Simple logfile parsing script
This Perl script takes a logfile and a list of regex tokens and counts the number of times each token appears in the log. This is useful for monitoring how often somebody has logged in, how many mails your server has accepted, the total number of pages served over HTTP, and so on.
At present, this script is used on a medium-load mail server, where it analyses roughly 20000 lines for roughly 30 tokens in under a second. It is run nightly to report on the logs from the previous day. However, the script can be used on any plain-text logfile.
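The core idea can be illustrated with a minimal sketch. This is written in Python, not Perl, and is not the script's actual code. It assumes each log line is counted at most once per token; the real script may count differently. The function name count_tokens and the sample log lines are made up for illustration.

```python
import re

def count_tokens(lines, patterns):
    """Count how many lines match each regex pattern (once per line at most)."""
    compiled = [(pattern, re.compile(pattern)) for pattern in patterns]
    counts = {pattern: 0 for pattern in patterns}
    for line in lines:
        for pattern, regex in compiled:
            if regex.search(line):
                counts[pattern] += 1
    return counts

# Invented sample lines, not real Exim log output.
sample_log = [
    "10:00:00 message A T=remote_smtp",
    "10:00:05 message B T=virtualuser_delivery",
    "10:00:09 message C T=remote_smtp",
]
counts = count_tokens(sample_log, [r"T=remote_smtp", r"T=virtualuser_delivery"])
print(counts)
```

Compiling each regex once up front keeps the inner loop cheap, which matters when every token is tried against every line.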
The script
The script is written in Perl and is very simple. To run it, pass the name of a logfile and the name of a tokens file. You may also pass a title for the report.
Usage: FileStats.pl --file=file-to-search --tokens=file-with-tokens [--title="Report title"]
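For readers who want to reproduce the same command-line interface elsewhere, here is a rough Python sketch using argparse. Only the three option names come from the usage line above; the help texts, the default title of None, and the file names mainlog and exim.tokens are assumptions for illustration.

```python
import argparse

def build_parser():
    """Build a parser mirroring the --file, --tokens and --title options."""
    parser = argparse.ArgumentParser(description="Count regex tokens in a logfile")
    parser.add_argument("--file", required=True, help="logfile to search")
    parser.add_argument("--tokens", required=True, help="file with regex tokens")
    parser.add_argument("--title", default=None, help="optional report title")
    return parser

# Hypothetical invocation; the file names are placeholders.
args = build_parser().parse_args(
    ["--file=mainlog", "--tokens=exim.tokens", "--title=EXIM STATS"]
)
print(args.file, args.tokens, args.title)
```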
The tokens file
The format of the tokens file is also very simple: a regex on the left, the summary on the right. Comments are prefixed with a hash, and a double asterisk on a line of its own forces a blank line in the report.
The pipe | is used to separate the regex from the summary. If you need to use a pipe in your regex, you're out of luck.
H=\(.*\).*rejected|Rejected where HELO didn't actually match rDNS
# A blank line below!
**
R=mail_route T=remote_smtp S=[0-9]+ H=80.xx.xx.xx|Mails sent to xxxx
T=remote_smtp|Number of times remote_smtp transport was used
T=virtualuser_delivery|Mails delivered to virtual users
# The first part matches the -XXXXXX-XX end part of the message ID
-\w{6}-\w{2} Completed|Completed mail
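The format above could be read with something like the following Python sketch. It is not the script's own parser. It assumes that empty lines and lines without a pipe are skipped, and that the split happens at the first pipe; the source does not say how the real script handles those cases. The name parse_tokens and the tuple layout are hypothetical.

```python
def parse_tokens(text):
    """Turn tokens-file text into a list of (kind, regex, summary) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # assumption: empty lines are ignored, like comments
        if line.strip() == "**":
            entries.append(("newline", None, None))  # forces a blank report line
            continue
        regex, sep, summary = line.partition("|")  # split at the first pipe
        if not sep:
            continue  # assumption: lines without a pipe are skipped
        entries.append(("token", regex, summary))
    return entries

# A few lines taken from the example tokens file above.
sample = r"""H=\(.*\).*rejected|Rejected where HELO didn't actually match rDNS
# A blank line below!
**
T=remote_smtp|Number of times remote_smtp transport was used
"""
entries = parse_tokens(sample)
for entry in entries:
    print(entry)
```

Because the pipe is the only separator, the regex side can never contain one, which is the limitation described above.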
Example
Below is a real example report taken from a server running Exim with a fairly trivial tokens file.
EXIM STATS
Page 1
Event description                                          Count
================================================================
Rejected due to DNS lists                                   3913
Rejected due to bad header syntax                             32
Rejected because sender verification failed                   90
Rejected because there was no valid sender                    91
Rejected HELO because it was one of our domains                0
Rejected HELO because it was our interface address             0
Rejected due to bad HELO/EHLO arguments                      703
Refused because of connection limit                            0
Rejected because we're not a relay                          1384
Deferred because host lookup didn't complete                  34
Rejected because of a protocol violation                      99
Total rejected                                              7476
Mails sent to xxxx                                          3175
Number of times remote_smtp transport was used              3373
Mails delivered to virtual users                             461
Completed mail                                              3561
No hostname found for IP                                       0
Mails sent to xxx                                             42
Mails sent to staff@xxxxxxxxxx.co.uk                          16
Mail arriving from a .com address                           1121
Mail arriving from a .co.uk address                          807
Number of times "drugs" appears in logs                        3
Number of times "viagra" appears in logs                       0
Number of virii caught by scanner                            104
Amount of spam mail rejected                                 465
Mails that skipped virus scanning                             63
Mails that skipped spam scanning                             149
Amount of mail rejected because of bad attachment              0
END OF REPORT
Total log lines analysed: 19820
Total regex tokens tried: 28
Todo
At present, the script uses a Perl report format to produce its output. This needs updating to use something newer.
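One possible direction is to build the report with ordinary fixed-width string formatting (in Perl, printf would play the same role). The sketch below shows the idea in Python. It is only an illustration: the function name render_report, the 64-character width and the use of None to mark a blank line are assumptions, not part of the existing script.

```python
def render_report(title, rows, width=64):
    """Render (description, count) rows as a fixed-width text report.

    A row of None stands for a forced blank line.
    """
    out = [
        title,
        f"{'Event description':<{width - 8}}{'Count':>8}",
        "=" * width,
    ]
    for row in rows:
        if row is None:
            out.append("")
        else:
            description, count = row
            out.append(f"{description:<{width - 8}}{count:>8}")
    out.append("END OF REPORT")
    return "\n".join(out)

# Two rows copied from the example report, with a blank line between them.
report = render_report(
    "EXIM STATS",
    [("Rejected due to DNS lists", 3913), None, ("Completed mail", 3561)],
)
print(report)
```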