Simple logfile parsing script

This Perl script takes a logfile and a list of regex tokens and counts how many times each token appears in the log. This is useful for monitoring how often somebody has logged in, how many mails your server has accepted, the total number of pages fetched over HTTP, and so on.

At present, the script runs nightly on a medium-load mail server to report on the previous day's log, analysing about 20,000 lines against 30 tokens in under a second. However, it can be used on any plain-text logfile.
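The core idea can be sketched in a few lines. The example below is a minimal Python sketch, not the author's Perl script. It assumes each token is counted once per matching line rather than once per occurrence within a line, and the function name count_tokens is hypothetical. Compiling every regex once before the loop keeps the cost at roughly lines × tokens cheap searches, which is consistent with the speed quoted above.

```python
import re
from typing import Iterable


def count_tokens(lines: Iterable[str], patterns: list[str]) -> tuple[dict[str, int], int]:
    """Return (number of matching lines per pattern, total number of lines read)."""
    # Compile every regex once up front rather than once per line.
    compiled = [(pattern, re.compile(pattern)) for pattern in patterns]
    counts = {pattern: 0 for pattern in patterns}
    total_lines = 0
    for line in lines:
        total_lines += 1
        for pattern, regex in compiled:
            # search() matches anywhere in the line, like Perl's $line =~ /regex/
            if regex.search(line):
                counts[pattern] += 1
    return counts, total_lines


# Usage: an open file is itself an iterable of lines, e.g.
#   with open("mail.log") as log:
#       counts, n = count_tokens(log, [r"T=remote_smtp", r"Completed"])
```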

The script

The script is written in Perl and is very simple. To run it, pass the name of a logfile and the name of a tokens file; you can also pass an optional title for the report.

Usage: FileStats.pl --file=file-to-search --tokens=file-with-tokens [--title="Report title"]
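If you want to port the tool, the documented command line maps directly onto Python's argparse module, which accepts the --option=value form shown above. This is a hypothetical equivalent, not the original script's option handling; the function name parse_args and the empty default title are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse --file, --tokens and the optional --title, as in the usage line above."""
    parser = argparse.ArgumentParser(description="Count regex tokens in a logfile.")
    parser.add_argument("--file", required=True, help="logfile to search")
    parser.add_argument("--tokens", required=True, help="file of regex|message lines")
    # Assumed default: no title unless one is given.
    parser.add_argument("--title", default="", help="optional report title")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```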

The tokens file

The tokens file format is also very simple: a regex on the left and a summary on the right. Comments are prefixed with a hash (#), and a line containing only a double asterisk (**) forces a blank line in the report.

The pipe (|) separates the regex from the message. If you need a pipe in your regex, you’re out of luck.
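The rules above can be sketched as a small parser. This Python version is an illustration under stated assumptions, not the original code: it splits each line at the first pipe (which is why a regex cannot contain one), skips blank lines and # comments, returns None for a ** line so the report can print a blank row, and raises an error on a line with no pipe. The function name parse_tokens and the error behaviour are hypothetical.

```python
from typing import Optional


def parse_tokens(text: str) -> list[Optional[tuple[str, str]]]:
    """Parse 'regex|message' lines; a None entry marks a '**' blank-line request."""
    entries: list[Optional[tuple[str, str]]] = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped == "**":
            entries.append(None)  # force a blank line in the report
            continue
        # Split at the FIRST pipe: everything after it is the message.
        regex, sep, message = raw.partition("|")
        if not sep:
            raise ValueError(f"missing '|' separator: {raw!r}")
        entries.append((regex, message))
    return entries
```

Splitting at the last pipe instead (str.rpartition) would let a regex use alternation, at the cost of forbidding pipes in the message.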

H=\(.*\).*rejected|Rejected where HELO didn't actually match rDNS

# A blank line below!
**

R=mail_route T=remote_smtp S=[0-9]+ H=80.xx.xx.xx|Mails sent to xxxx
T=remote_smtp|Number of times remote_smtp transport was used
T=virtualuser_delivery|Mails delivered to virtual users
# The first part matches the -XXXXXX-XX end part of the message ID
-\w{6}-\w{2} Completed|Completed mail
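To see what the message-ID token actually matches, here is a quick check with Python's re module. For these particular patterns (\w, {n} repetition, escaped parentheses) its syntax agrees with Perl's. The log lines are invented for illustration, and the message-ID shape (groups of 6, 6 and 2 characters separated by dashes) is an assumption about Exim's classic IDs; the helper name hits is hypothetical.

```python
import re

# Invented Exim-style log lines, for illustration only.
lines = [
    "2004-06-01 03:12:45 1BV2xH-0003Qe-7N Completed",
    "2004-06-01 03:12:46 1BV2xI-0003Qf-8P => user@example.com T=virtualuser_delivery",
]


def hits(regex: str, log_lines: list[str]) -> int:
    """Number of lines in which the regex matches somewhere."""
    return sum(1 for line in log_lines if re.search(regex, line))


for regex, message in [
    (r"-\w{6}-\w{2} Completed", "Completed mail"),
    (r"T=virtualuser_delivery", "Mails delivered to virtual users"),
]:
    print(f"{message}: {hits(regex, lines)}")
```

The first regex only matches the tail of the ID ("-0003Qe-7N Completed"), so the date's dashes earlier in the line do not trigger a false match.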

Example

Below is a real example report taken from a server running Exim with a fairly trivial tokens file.

                           EXIM STATS
                             Page 1

                 Event description                      Count
================================================================
 Rejected due to DNS lists                              3913
 Rejected due to bad header syntax                       32
 Rejected because sender verification failed             90
 Rejected because there was no valid sender              91
 Rejected HELO because it was one of our domains          0
 Rejected HELO because it was our interface address       0
 Rejected due to bad HELO/EHLO arguments                 703
 Refused because of connection limit                      0
 Rejected because we're not a relay                     1384
 Deferred because host lookup didn't complete            34
 Rejected because of a protocol violation                99
 Total rejected                                         7476

 Mails sent to xxxx                                     3175
 Number of times remote_smtp transport was used         3373
 Mails delivered to virtual users                        461
 Completed mail                                         3561
 No hostname found for IP                                 0
 Mails sent to xxx                                       42
 Mails sent to staff@xxxxxxxxxx.co.uk                    16
 Mail arriving from a .com address                      1121
 Mail arriving from a .co.uk address                     807
 Number of times "drugs" appears in logs                  3
 Number of times "viagra" appears in logs                 0

 Number of virii caught by scanner                       104
 Amount of spam mail rejected                            465
 Mails that skipped virus scanning                       63
 Mails that skipped spam scanning                        149
 Amount of mail rejected because of bad attachment        0



END OF REPORT
Total log lines analysed: 19820
Total regex tokens tried: 28

Todo

At present, the script uses Perl's format/write mechanism to output the report. This needs updating to use something newer.
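One way to tackle this, sketched here in Python rather than Perl, is to drop format/write and build each report line with ordinary string formatting. The function name format_report, the 64-column width and the use of None for a ** entry are all assumptions. The output only approximates the example report above (there is no page header), and messages longer than 55 characters simply push the count to the right.

```python
from typing import Optional

WIDTH = 64  # assumed total width, matching the length of the '=' rule


def format_report(title: str, rows: list[Optional[tuple[str, int]]],
                  total_lines: int) -> str:
    """Render (message, count) rows as a fixed-width text report.

    A None row stands for a '**' entry and becomes a blank line.
    """
    out = [
        title.center(WIDTH).rstrip(),
        "",
        f"{'Event description':<{WIDTH - 8}}{'Count':>8}",
        "=" * WIDTH,
    ]
    for row in rows:
        if row is None:
            out.append("")
            continue
        message, count = row
        # 1 leading space + 55-char message column + 8-char right-aligned count
        out.append(f" {message:<{WIDTH - 9}}{count:>8}")
    out += [
        "",
        "END OF REPORT",
        f"Total log lines analysed: {total_lines}",
        f"Total regex tokens tried: {sum(r is not None for r in rows)}",
    ]
    return "\n".join(out)
```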