Let’s imagine that you are using Nagios/Centreon to monitor your systems and you need an advanced tool to scan your logfiles.

Your logfiles use a custom timestamp format and contain several fields separated by spaces. You would like to know how many log lines match a certain string, but search only within a specific field. And it would be nice if the search string could be a regular expression.

Plus, you do not want to search all log entries, only the recent ones: say, those from the last 15 minutes.

To complicate things further, all your machines run Windows, so you don’t have access to any Linux tools.

How do you solve that?

Here’s how: a PowerShell script that does exactly what you need and that you can easily integrate into Nagios via NSClient++.

You can download the script from our repository on GitHub. Below we’ll explain the various parts of the code.

First, let’s have a look at the logfile:

2015-04-09_09:08:04 FOOBAR GVA-1\svc_hypervisor  ERROR2 checkScannedFile, Couldn't create service AA1
2015-04-09_09:08:05 FOOBAR GVA-1\svc_hypervisor  ERROR1 Error in checkData 0x44BC
2015-04-09_09:19:27 FOOVB GVA-1\svc_hypervisor  INFO checkOpenFiles, service exited with fd 1
2015-04-09_09:19:56 FOOQUX GVA-1\svc_hypervisor WARNING checkScannedFile, Couldn't create service AB2
2015-04-09_09:19:58 FOOBAZ GVA-1\svc_hypervisor  ERROR1 Error in checkData 0XF328
2015-04-09_09:20:03 FOOGMX GVA-1\svc_hypervisor ERROR3 Cannot open file /tmp/xyzzy

And here’s the code.

First we check that the user launched the script correctly:

if ($args.Count -ne 4) {
    Write-Output "Usage: .\scanlog.ps1 <logfilename> <regex> <nfield> <nminutes>"
    exit 3
}

Important: this check works when the script runs as a standalone program. NSClient++ may pass additional parameters when calling the script, which would make the check fail. If you experience this, remove these four lines of code, or replace -ne with -lt so that extra arguments are tolerated.
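If you would rather keep some validation than delete the check, one alternative is to declare the four arguments in a simple param() block. This is a sketch, not part of the original script. In a script or function without [Parameter()] or [CmdletBinding()] attributes, PowerShell binds the declared parameters by position and puts any undeclared extra arguments into $args, so an argument appended by the caller does not break validation. The helper name Test-ScanLogArgs is hypothetical; it wraps the param() block in a function only so it can be tried in a console, whereas in scanlog.ps1 the param() block would go at the very top of the file.

```powershell
# Hypothetical helper: in scanlog.ps1 the param() block would sit at the top of the
# script itself, and the typed parameters would replace the $args[...] lookups.
function Test-ScanLogArgs {
    param(
        [string] $logfilename,
        [string] $regex,
        [int] $nfield = -1,     # -1 means "not supplied"
        [int] $nminutes = -1
    )
    # Undeclared extra arguments (if any) end up in $args and are simply ignored
    $valid = (-not [string]::IsNullOrEmpty($logfilename)) -and
             (-not [string]::IsNullOrEmpty($regex)) -and
             ($nfield -ge 0) -and ($nminutes -ge 1)
    return $valid
}

Test-ScanLogArgs test.log 'ERROR.*' 3 60          # all four arguments present
Test-ScanLogArgs test.log 'ERROR.*' 3 60 extra    # an extra argument is tolerated
Test-ScanLogArgs test.log 'ERROR.*'               # missing arguments are detected
```

Note that adding [CmdletBinding()] or [Parameter()] attributes would turn this into an advanced function, which rejects unexpected positional arguments again.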

Next, we define the format of the timestamp in the first field of each line. You will need to change this variable to match exactly the format your logfile uses. The full list of format specifiers is in Microsoft’s .NET documentation on custom date and time format strings.

$DATE_FORMAT = "yyyy-MM-dd_HH:mm:ss"
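To check that $DATE_FORMAT really matches your logfile, you can parse a single timestamp by hand before running the whole script. The sketch below does this for a timestamp taken from the sample log; the second format, for a hypothetical logfile that writes day/month/year, is only an illustration. It passes CultureInfo.InvariantCulture as the format provider: with $null the current culture is used, and since ':' and '/' in a .NET custom format string are culture-specific separators, parsing could otherwise depend on the regional settings of the machine.

```powershell
$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Format used by the sample logfile
$DATE_FORMAT = "yyyy-MM-dd_HH:mm:ss"
$ts = [DateTime]::ParseExact("2015-04-09_09:08:04", $DATE_FORMAT, $invariant)
$ts.ToString("yyyy-MM-dd HH:mm:ss", $invariant)   # 2015-04-09 09:08:04

# Hypothetical alternative format with the day first, e.g. "09/04/2015 09:08:04"
$other = [DateTime]::ParseExact("09/04/2015 09:08:04", "dd/MM/yyyy HH:mm:ss", $invariant)
$other -eq $ts   # both strings describe the same moment
```

If ParseExact throws a FormatException for a line copied from your own logfile, the format string needs adjusting.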

We read the arguments from the command line. The script scans $logfilename and counts the lines whose field number $nfield matches $regex and whose timestamp falls within the last $nminutes minutes.
Note that the first field (containing the timestamp) is number 0.

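$logfilename = $args[0]
$regex       = $args[1]
# Depending on how the script is launched, these may arrive as strings: cast them explicitly
$nfield      = [int]$args[2]
$nminutes    = [int]$args[3]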
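To find the right value for $nfield, it can help to split one line of your logfile the same way the script does and print each field with its index. A quick sketch using the first line of the sample log:

```powershell
$line = "2015-04-09_09:08:04 FOOBAR GVA-1\svc_hypervisor  ERROR2 checkScannedFile, Couldn't create service AA1"

# Same tokenization as the main loop of the script
$token = $line.Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries)

# Print "index: field" for every token
for ($i = 0; $i -lt $token.Count; $i++) {
    "{0}: {1}" -f $i, $token[$i]
}
```

Field 3 holds the severity (ERROR2 here). The free-text message is split on spaces as well, so each of its words becomes a separate field.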

We compute $now_datetime, the current time formatted with $DATE_FORMAT and parsed back (which drops the fractional seconds), and subtract from it a TimeSpan object initialized with $nminutes minutes to obtain $startcheck_datetime, the timestamp from which the check starts. Both variables are DateTime objects.

$now_datetime = [DateTime]::ParseExact((Get-Date).ToString($DATE_FORMAT), $DATE_FORMAT, $null)
$startcheck_datetime = $now_datetime - (New-TimeSpan -Minutes $nminutes)
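The window arithmetic can be tried in isolation by replacing the current time with a fixed value. In this sketch, "now" is a hypothetical 2015-04-09 09:20:00 so that the result is reproducible; with $nminutes set to 15 the window starts at 09:05:00. Calling AddMinutes with a negative value gives the same result as subtracting a TimeSpan.

```powershell
$DATE_FORMAT = "yyyy-MM-dd_HH:mm:ss"
$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Hypothetical fixed "now" instead of Get-Date, so the output is predictable
$now_datetime = [DateTime]::ParseExact("2015-04-09_09:20:00", $DATE_FORMAT, $invariant)
$nminutes = 15

$startcheck_datetime = $now_datetime - (New-TimeSpan -Minutes $nminutes)
$same = $now_datetime.AddMinutes(-$nminutes)   # equivalent computation

$startcheck_datetime.ToString($DATE_FORMAT, $invariant)   # 2015-04-09_09:05:00
```

Note that the window has no upper bound: a line stamped 09:20:03, slightly after this "now", would still be counted, because the script only checks that a timestamp is greater than or equal to $startcheck_datetime.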

We initialize the counter of valid matches and load the logfile into memory.
Beware: Get-Content can be quite slow on large logfiles.

$matchinglines = 0
$log = Get-Content $logfilename
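If Get-Content turns out to be the bottleneck, two common alternatives are sketched here, under the assumption that the rest of the loop stays unchanged. Passing -ReadCount 0 makes Get-Content emit all lines as a single array, which avoids most of its per-line overhead; [System.IO.File]::ReadLines (.NET 4 or later) streams the file one line at a time instead of keeping it all in memory. The temporary file name scanlog-sample.log is hypothetical and only makes the sketch self-contained.

```powershell
# Hypothetical sample file so the sketch runs on its own
$path = Join-Path ([System.IO.Path]::GetTempPath()) "scanlog-sample.log"
Set-Content -Path $path -Value @(
    "2015-04-09_09:08:04 FOOBAR GVA-1\svc_hypervisor  ERROR2 checkScannedFile, Couldn't create service AA1",
    "2015-04-09_09:19:27 FOOVB GVA-1\svc_hypervisor  INFO checkOpenFiles, service exited with fd 1"
)

# Option 1: read all lines at once as a single array
$log = Get-Content -Path $path -ReadCount 0

# Option 2: stream the file line by line.
# ReadLines resolves relative paths against the process working directory,
# not the current PowerShell location, so pass a full path.
$streamed = 0
foreach ($logline in [System.IO.File]::ReadLines($path)) {
    $streamed++   # the main loop body would go here
}

Remove-Item -Path $path
```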

This is the main loop:

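foreach ($logline in $log) {
    # Split on spaces, dropping the empty tokens produced by consecutive spaces
    $token = $logline.Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries)
    if ($token[$nfield] -match $regex) {
        # Parse with the invariant culture so ':' is read literally whatever the regional settings
        $logline_datetime = [DateTime]::ParseExact($token[0], $DATE_FORMAT, [Globalization.CultureInfo]::InvariantCulture)
        if ($logline_datetime -ge $startcheck_datetime) {
            $matchinglines++
        }
    }
}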

Here we split the log line into tokens, using a single space as the separator, and check whether the token in field $nfield matches the regex. If it does, we run a second check: we create a DateTime object from the timestamp of the log line and verify that it is not earlier than $startcheck_datetime. If both checks pass, we increment the counter of valid matches.
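One weakness of this loop: if a line matches the regex but its first field is not a valid timestamp (a continuation line, for instance), ParseExact throws, and with the default error settings the assignment is skipped while the loop goes on. $logline_datetime then still holds the value from the previous matching line, so the malformed line may be counted anyway. The sketch below is a hedged variant that uses DateTime.TryParseExact and skips such lines. The sample data is the log shown earlier plus one hypothetical malformed line, and the window start is fixed at 09:15:00 so the result is reproducible.

```powershell
$DATE_FORMAT = "yyyy-MM-dd_HH:mm:ss"
$invariant = [System.Globalization.CultureInfo]::InvariantCulture
$styles = [System.Globalization.DateTimeStyles]::None
$regex = "ERROR.*"
$nfield = 3
# Hypothetical fixed start of the window
$startcheck_datetime = [DateTime]::ParseExact("2015-04-09_09:15:00", $DATE_FORMAT, $invariant)

# Sample log from above, plus one hypothetical malformed line
$log = @(
    "2015-04-09_09:08:04 FOOBAR GVA-1\svc_hypervisor  ERROR2 checkScannedFile, Couldn't create service AA1",
    "2015-04-09_09:08:05 FOOBAR GVA-1\svc_hypervisor  ERROR1 Error in checkData 0x44BC",
    "2015-04-09_09:19:27 FOOVB GVA-1\svc_hypervisor  INFO checkOpenFiles, service exited with fd 1",
    "2015-04-09_09:19:56 FOOQUX GVA-1\svc_hypervisor WARNING checkScannedFile, Couldn't create service AB2",
    "2015-04-09_09:19:58 FOOBAZ GVA-1\svc_hypervisor  ERROR1 Error in checkData 0XF328",
    "2015-04-09_09:20:03 FOOGMX GVA-1\svc_hypervisor ERROR3 Cannot open file /tmp/xyzzy",
    "garbage FOO BAR ERROR9 this line has no valid timestamp"
)

$matchinglines = 0
foreach ($logline in $log) {
    $token = $logline.Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries)
    # Skip lines that do not even have the requested field
    if ($token.Count -le $nfield) { continue }
    if ($token[$nfield] -notmatch $regex) { continue }
    # TryParseExact returns $false on a bad timestamp instead of throwing,
    # so a stale value from a previous line is never reused
    $logline_datetime = [DateTime]::MinValue
    if (-not [DateTime]::TryParseExact($token[0], $DATE_FORMAT, $invariant, $styles, [ref]$logline_datetime)) { continue }
    if ($logline_datetime -ge $startcheck_datetime) {
        $matchinglines++
    }
}
$matchinglines   # 2 (the ERROR1 line at 09:19:58 and the ERROR3 line at 09:20:03)
```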

The RemoveEmptyEntries option of the Split method ensures that we get only useful, non-empty tokens. Had you split the following line (note the extra spaces) using a single space as the separator and no further options:

2015-04-09_09:19:27 FOOVB  GVA-1\svc_hypervisor    INFO checkOpenFiles, service exited with fd 1

you would discover that FOOVB is the 2nd token, INFO is the 8th, while the 3rd, 5th, 6th, and 7th tokens are an empty string each. That’s probably not what you wanted.
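The difference is easy to see by splitting that line both ways. In this sketch, $plain and $clean are illustrative names; the token numbers printed are 1-based, as in the text above, while the array indexes are 0-based.

```powershell
# The line from the example above, with its extra spaces
$line = "2015-04-09_09:19:27 FOOVB  GVA-1\svc_hypervisor    INFO checkOpenFiles, service exited with fd 1"

# Without options: consecutive spaces produce empty-string tokens
$plain = $line.Split(" ")
# With RemoveEmptyEntries: only non-empty tokens are kept
$clean = $line.Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries)

"Without option: token 2 = '{0}', token 8 = '{1}'" -f $plain[1], $plain[7]
"With option:    token 2 = '{0}', token 4 = '{1}'" -f $clean[1], $clean[3]
```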

Finally, here’s the end of the script, where we print the result to the console and exit.

$output = "Found " + $matchinglines + " lines matching " + $regex + " in field " + $nfield
$output += " on file " + $logfilename + " within the last " + $nminutes + " minutes."
if ($matchinglines -eq 0) {
    $exitcode = 0
}
else {
    $exitcode = 2
}
Write-Output $output
exit $exitcode

The script returns exit code 0 if no line matches, 2 if one or more lines match, and 3 if it was called with the wrong number of arguments. From a PowerShell prompt, you can read the exit code in the $LastExitCode variable. These values already follow the Nagios plugin convention (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN), so any match raises a CRITICAL alert; change them if, for instance, you would rather get a WARNING.
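If you want the check to distinguish WARNING from CRITICAL, one option is to compare the count against two thresholds. The function below is a sketch: its name, Get-ScanLogResult, and the default thresholds of 1 and 5 matches are hypothetical and not part of the original script. It also appends performance data in the Nagios plugin format (label=value;warn;crit after a | character), which monitoring front-ends can use for graphs.

```powershell
# Hypothetical helper: maps a match count to a Nagios exit code and status line
function Get-ScanLogResult {
    param(
        [int] $matchinglines,
        [int] $warn_threshold = 1,   # hypothetical default
        [int] $crit_threshold = 5    # hypothetical default
    )
    if ($matchinglines -ge $crit_threshold) { $code = 2 }      # CRITICAL
    elseif ($matchinglines -ge $warn_threshold) { $code = 1 }  # WARNING
    else { $code = 0 }                                         # OK
    $status = @("OK", "WARNING", "CRITICAL")[$code]
    # Text shown by Nagios, followed by performance data
    $text = "$status - $matchinglines matching lines | matches=$matchinglines;$warn_threshold;$crit_threshold"
    [pscustomobject]@{ ExitCode = $code; Output = $text }
}

(Get-ScanLogResult 2).Output   # WARNING - 2 matching lines | matches=2;1;5
```

At the end of scanlog.ps1, the last lines could then become $result = Get-ScanLogResult $matchinglines, followed by Write-Output $result.Output and exit $result.ExitCode.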

We can now call the script in this way:

.\scanlog.ps1 test.log ERROR.* 3 60

This scans the example logfile shown above for entries with any kind of error (ERROR1, ERROR2, etc.) in the 4th field (index 3), within the last hour only.
