(This Document was Modified from NS_Client readme.html )
Shatter IT. - www.shatterit.com
NC_Net Home Page - www.shatterit.com/NC_Net
Original author - Tony Montibello – amontibello@shatterit.com
Purpose
License
Requirements
History
Features
Installation
Uninstall
Configuration
Plug-in Syntax
CPU Load
Disk Usage
Uptime
Service States
Process States
Client Version
Memory Usage
File Age Usage
Custom Counter
Instances
Event Log Check
WMI Check
Technical Information
Problems
Known Issues
NC_Net has been developed as a drop-in replacement for the program NS_Client. NS_Client was developed to collect performance information from Windows servers and return it to Nagios using the Check_nt client. In addition to the standard metrics, a generic COUNTER function returns the value of any counter maintained by Windows. NC_Net also provides passive checks via the NSCA protocol (you do not need to install NSCA on the Windows client; this is built into NC_Net), remote configuration, and several newer checks (Event Log, WMI, and enumeration of performance counters).
NC_Net is released under the GNU General Public License. The NS_Client software is also released under the GNU General Public License. A copy of the license is included here.
| 12/13/04 | v0.16 | First release - more enhancements still to come. |
| 12/20/04 | v0.17 | Added unique Event ID for each message to event viewer |
| 12/20/04 | v0.18 | Changed myconnect method (this is the main command chooser) from switch case to if/else to eliminate compile warnings. |
| 12/21/04 | v1.00 | Added FileAge - should now be completely compatible with NS_Client |
| 12/23/04 | v1.01 | Modified TCP client properties as per MS KB825724 and added NoDelay to TCPClient |
| 12/30/04 | v1.02 | Modified Check Process to check WMI, then performance counters. |
| 12/30/04 | v1.03 | Modified Check Process to check performance counters, then on failure check WMI |
| 01/03/05 | v1.04 | BUG FIX - MEMUSE not able to handle over 4 GB -- fixed. (Documentation updated) |
| 01/04/05 | v1.05 | BUG FIX - FILEAGE fixed (was passing wrong arg). Whitespace trimmed from ends of tokenized receive buffer. Backup test of Process State fixed. |
| 01/04/05 | v1.06 | Modified Disk Usage to be more efficient. (Checks drive letters up to Z:; disk types checked: Local, Network, Compact, RAM.) Only tested with local disks (type 3). |
| 1/21/05 | v1.07 | Added EVENTLOG check to NC_Net - updated documents - tested OK (needs an updated check_nt or the test console to run) |
| 1/24/05 | v1.08 | BUG FIX - Made sure results send ERROR instead of -1 or Error |
| 1/25/05 | v1.09 | Added WMI check |
| 2/3/05 | v2.10 | Passive checks added (memory issues being looked at) |
| 2/8/05 | v2.14 | Passive checks refined; BUG FIX Performance Counters |
NC_Net is a drop-in replacement for NS_Client. On its own, NS_Client must be used with Check_nt. This is no longer true with the current version of NC_Net: when configured for passive checks, NC_Net does not require Check_nt. Check_nt or the NC_Net test console is required to send active checks or configuration commands to NC_Net remotely. The Check_nt source code (enhanced for NC_Net) is included in the NC_Net install. Compile this file and use it in place of your existing check_nt to get access to the enhancements that NC_Net has to offer. (The Check_nt source code enhanced for NC_Net is also available from the NC_Net home page at www.shatterit.com/NC_Net )
Installing NC_Net onto the Windows Host
Make sure to uninstall older versions of NC_Net first (this is done through Control Panel -> Add/Remove Programs).
Two known issues can occur during installation. First, if you try to install NC_Net without first uninstalling the previous version, the instances of the NC_Net service conflict, leaving NC_Net unable to install, uninstall or run. To fix this you must uninstall manually via regedit; see uninstall.txt for details of the process. Second, when you start NC_Net, the service dialog may tell you that the service has stopped because it probably had nothing to do. This is an issue with the event log settings: the AutoLog property of a service in Dot Net cannot handle log files that are full during startup of the service. To repair it, clear out the event logs.
The installation will create new folders on the system:
Start Menu -->programs--> NC_Net
Program Files --> Shatter IT --> NC_Net
Please note: NC_Net has not been tested or configured to work with versions of Windows other than English (US).
On the Unix machine
Configuration can be changed in many ways with the new version of NC_Net, but to keep it simple, the best way to modify the configuration is through the startup.cfg file before starting NC_Net. This file contains all the settings that can be modified for NC_Net, from changing ports and passwords to turning off particular features of NC_Net. The default startup.cfg comes preconfigured with a setup equivalent to NS_Client and NSCA. To run both passive and active checks using NC_Net, the minimum configuration change is to modify Host_passive and IP_passive, add a few service checks to passive.cfg, and add their corresponding checks to your Nagios configuration files.
Syntax:
check_nt -H HOSTNAME
minutes_range: between 1 and 1440 (24 hours).
warning_percent and critical_percent: thresholds between 1 and 100.
Check_nt send buffer: password&2&interval
- only one interval is sent per TCP request
Check_nt receive buffer: CPU_LOAD%
- only a single number is returned in the receive buffer
NC_Net uses a custom stack class (cpustack.cs) to store and calculate CPU load.
NC_Net saves the value of the CPU load every 5 seconds.
The CPU load value comes from the performance counter:
CounterName = "% Processor Time"; CategoryName = "Processor"; InstanceName = "_Total";
You can check several intervals in one shot, for example the averages for the last 10 minutes, 60 minutes and 24 hours.
Check_nt sends out a separate request for each interval when multiple intervals are chosen.
Check_nt result: CPU Load (10 min. 22%)|10min=22
Check_NT return codes: 0 - OK; 1 - Warning; 2 - Critical
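The sampling scheme described above (one sample every 5 seconds, averages over up to 1440 minutes) can be pictured with a short sketch. This is not the code in cpustack.cs: the class name CpuHistory and its methods are hypothetical, the integer rounding is an assumption, and Python is used only for illustration.

```python
from collections import deque

SAMPLE_SECONDS = 5    # NC_Net stores one CPU load sample every 5 seconds
MAX_MINUTES = 1440    # longest interval check_nt may ask for (24 hours)


class CpuHistory:
    """Hypothetical stand-in for the CPU load store; not the real cpustack.cs."""

    def __init__(self):
        # A bounded deque drops the oldest sample once 24 hours are stored.
        self.samples = deque(maxlen=MAX_MINUTES * 60 // SAMPLE_SECONDS)

    def push(self, percent):
        """Record one CPU load sample (a percentage)."""
        self.samples.append(percent)

    def average(self, minutes):
        """Return the mean load over the last `minutes`, rounded to an integer."""
        if not 1 <= minutes <= MAX_MINUTES:
            raise ValueError("minutes_range must be between 1 and 1440")
        wanted = minutes * 60 // SAMPLE_SECONDS
        recent = list(self.samples)[-wanted:]
        if not recent:
            return 0
        return round(sum(recent) / len(recent))


history = CpuHistory()
for load in [10] * 120 + [30] * 12:   # ten minutes at 10%, then one minute at 30%
    history.push(load)
print(history.average(1))    # averages only the last 12 samples
print(history.average(10))   # averages the last 120 samples
```

A request for several intervals would simply call average once per interval, which mirrors Check_nt sending one request per interval.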
Syntax: check_nt -H HOSTNAME -v USEDDISKSPACE -l drive_letter -w warning_percent -c critical_percent
drive_letter: should be only one character.
warning_percent and critical_percent: thresholds between 1 and 100.
NC_Net uses the WMI database to retrieve the free space and disk size of the logical disk:
WMI scope: "root\\cimv2"; WMI class: "win32_logicalDisk";
drive: the drive letter concatenated with a colon.
WMI query: "SELECT FreeSpace,Size,Name FROM "+WMI Class +" WHERE DriveType = 3 or...'"
Check_nt send buffer: password&4&driveLetter
- driveLetter is only a single logical drive letter (not case sensitive)
Check_nt receive buffer: freespace&size
- the result from NC_Net is only the numbers for free space and disk size from the WMI query
- NC_Net returns -1&-1 to Check_nt if it was unable to perform the query.
The disk check uses the WMI database and queries drive types 3 (Local), 4 (Network), 5 (Compact Disc) and 6 (RAM Disk), skipping drive types 0 (Unknown), 1 (No Root Directory) and 2 (Removable). It has not yet been tested for RAM, Compact Disc or Network drives. It then tests the names of the drives found to see whether one matches the single-letter drive name.
Example: ./check_nt -H 192.168.1.1 -v USEDDISKSPACE -l C -w 80 -c 90
Check_nt result: C:\ - total: 17.10 Gb - used: 11.03 Gb (64%) - free 6.07 Gb (36%)|used=11838660608.00
Check_nt result for -1&-1: Free disk space : Invalid drive
Check_NT return codes: 0 - OK; 1 - Warning; 2 - Critical; 127 - Unknown
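To make the freespace&size reply concrete, here is a minimal sketch of how a client could turn it into a used percentage and a return code. It is an illustration, not the check_nt source. The function names are hypothetical, and two details are assumptions: that thresholds compare with "greater than or equal", and that the -1&-1 reply maps to the 127 (Unknown) code.

```python
def parse_disk_reply(reply):
    """Split a 'freespace&size' reply; return None for the '-1&-1' error reply."""
    free_text, size_text = reply.strip().split("&")
    free, size = float(free_text), float(size_text)
    if free < 0 or size <= 0:
        return None                       # NC_Net could not perform the query
    used = size - free
    return {
        "used_bytes": used,
        "used_percent": used / size * 100,
        "free_percent": free / size * 100,
    }


def disk_state(reply, warning_percent, critical_percent):
    """Map a reply to a code: 0 OK, 1 Warning, 2 Critical, 127 Unknown (assumed)."""
    usage = parse_disk_reply(reply)
    if usage is None:
        return 127
    if usage["used_percent"] >= critical_percent:
        return 2
    if usage["used_percent"] >= warning_percent:
        return 1
    return 0


print(disk_state("4000&10000", 80, 90))   # 60% used
print(disk_state("-1&-1", 80, 90))        # query failed
```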
Syntax: ./check_nt -H HOSTNAME
This plug-in doesn't care about warning or critical values. Only the uptime of the machine is received.
Check_nt send buffer: password&3
- all other arguments are ignored
Check_nt receive buffer: uptime_in_seconds
- the only value returned is the uptime in seconds
The uptime value is actively checked when a request is received and comes from the performance counter:
CounterName = "System Up Time"; CategoryName = "System"; InstanceName = null;
Example: ./check_nt -H 192.168.1.1 -v UPTIME
Check_nt result: System Uptime : 2 day(s) 20 hour(s) 51 minute(s)|uptime=247918
Check_NT return codes: 0 - OK; 1 - Warning; 2 - Critical; 127 - Unknown
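The conversion from the uptime_in_seconds reply to the result line shown above is plain integer arithmetic. The sketch below reproduces it; format_uptime is a hypothetical helper name, not a function from check_nt.

```python
def format_uptime(seconds):
    """Render an uptime in seconds as days, hours and minutes plus perf data."""
    total = int(seconds)
    days, rest = divmod(total, 86400)     # 86400 seconds per day
    hours, rest = divmod(rest, 3600)      # 3600 seconds per hour
    minutes = rest // 60                  # leftover seconds are dropped
    return (f"System Uptime : {days} day(s) {hours} hour(s) "
            f"{minutes} minute(s)|uptime={total}")


print(format_uptime("247918"))
```

For the reply 247918 this yields 2 days, 20 hours and 51 minutes, which agrees with the example result in this section.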
Syntax: check_nt -H HOSTNAME -v SERVICESTATE [-d SHOWALL] -l service_1[,service_2,service_3,...]
service: should be the real name of the service or the displayed name. (Changed from NS_Client)
Put the service name in "quotes" when the descriptive name contains a space.
-d SHOWALL can be specified if you want to see all tested services, including started ones. SHOWALL displays the result of each service in the output.
-d SHOWFAIL is the default and hides running services from the displayed results.
You can specify several services in one request. No blank should appear in the list!
If not all services are running, you get the faulty one(s) and a critical state.
If any of the names is not a service, you get a warning and that name is listed as unknown.
NC_Net uses ServiceProcess.ServiceController.GetServices to get the list of services on the system.
Check_nt send buffer: password&5&SHOW[ALL|FAIL]&Service[&service2]
- The third argument is either showall or showfail. If it is anything else it is ignored (even if it is a service) and showfail is used.
Check_nt receive buffer: return code&Detailed result string
Return codes from NC_Net are:
2 - critical - at least one service reported as not running
1 - warning - no services critical, but at least one service not found in the service list
0 - ok - all services found and running
Example: ./check_nt -H 192.168.1.1 -p 1248 -v SERVICESTATE -d SHOWALL -l LanmanServer,Schedule
Check_NT result: Lanmanserver: Started - Schedule: Started
Check_NT result: All services are running
Check_NT return codes: 0 - OK; 1 - Warning; 2 - Critical; 127 - Unknown
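The return code rules above (critical beats warning, warning beats ok) can be sketched as follows. This is an illustration in Python rather than the C# used by NC_Net. The function service_reply, the status strings other than "Started", the "Unknown" label format and the case-insensitive name lookup are all assumptions made for the example.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def service_reply(requested, installed, show_all=False):
    """Build a 'return code&detail' reply for the requested service names.

    installed maps service name to a status string such as 'Started'.
    """
    known = {name.lower(): status for name, status in installed.items()}
    code = OK
    parts = []
    for name in requested:
        status = known.get(name.lower())
        if status is None:
            code = max(code, WARNING)          # not a service: warning, listed as unknown
            parts.append(f"{name}: Unknown")
        elif status != "Started":
            code = CRITICAL                    # any service not running is critical
            parts.append(f"{name}: {status}")
        elif show_all:
            parts.append(f"{name}: Started")   # running services shown only with SHOWALL
    detail = " - ".join(parts) if parts else "All services are running"
    return f"{code}&{detail}"


services = {"LanmanServer": "Started", "Schedule": "Stopped"}
print(service_reply(["LanmanServer", "Schedule"], services, show_all=True))
```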
Syntax: check_nt -H HOSTNAME -v PROCSTATE [-d SHOWALL] -l process_1[,process_2,process_3,...]
processes: you can find the process name in the Windows NT Task Manager.
-d SHOWALL can be specified if you want to see all tested processes, including running ones. SHOWALL displays the result of each process in the output.
-d SHOWFAIL is the default and hides running processes from the displayed results.
Since the check looks at the running processes listed in Task Manager, make sure to include .exe in the name of the process.
Process names are not case sensitive.
You can specify several processes in one request. No blank should appear in the list!
If not all processes are running, you get the faulty one(s) and a critical state.
NC_Net uses System.Diagnostics.Process.GetProcesses to get the list of processes.
Check_nt send buffer: password&6&SHOW[ALL|FAIL]&Process[&process2]
- The third argument is either showall or showfail. If it is anything else it is ignored (even if it is a process) and showfail is used.
- Process names are not case sensitive but must otherwise match, including the .exe
- idle and system always return true and are not actually checked.
Check_nt receive buffer: return code&Detailed result string
Return codes from NC_Net are:
2 - critical - at least one process not found in the list of running processes
0 - ok - all processes found
Example: ./check_nt -H 192.168.1.1 -v PROCSTATE -l NC_Net,nc_net.exe,NC_Net.exe -d SHOWALL
Check_NT result: NC_Net: not running - nc_net.exe: Running - NC_Net.exe: Running
Syntax: check_nt -H HOSTNAME -v CLIENTVERSION
Check_nt send Buffer: password&1
all other arguments are ignored
Check_nt receive buffer: Version_String
Returns the NC_Net version.
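Every check in this document uses the same request shape: the password, a command number, and optional arguments joined with ampersands. The sketch below shows one way a small test client could build and send such a request. The names COMMANDS, build_request and query are hypothetical, the command numbers are the ones shown in the send buffers in this document, and the 4096-byte read and the absence of a line terminator are assumptions. It is not the check_nt source.

```python
import socket

# Command numbers as they appear in the send buffers in this document.
COMMANDS = {
    "CLIENTVERSION": 1, "CPULOAD": 2, "UPTIME": 3, "USEDDISKSPACE": 4,
    "SERVICESTATE": 5, "PROCSTATE": 6, "MEMUSE": 7, "COUNTER": 8,
    "FILEAGE": 9, "INSTANCES": 10, "EVENTLOG": 11, "WMICHECK": 12,
}


def build_request(password, check, *args):
    """Join password, command number and arguments with '&'."""
    fields = [password, str(COMMANDS[check])] + [str(arg) for arg in args]
    return "&".join(fields)


def query(host, request, port=1248, timeout=10.0):
    """Send one request over TCP and return the reply text (assumed framing)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        return sock.recv(4096).decode("ascii", errors="replace").strip()


print(build_request("secret", "CLIENTVERSION"))
print(build_request("secret", "USEDDISKSPACE", "C"))
```

Calling query("192.168.1.1", build_request("secret", "CLIENTVERSION")) against a running NC_Net host would then be expected to return the version string.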
Syntax: check_nt -H HOSTNAME -v MEMUSE [-w warning_percent] [-c critical_percent]
warning_percent and critical_percent: thresholds between 1 and 100.
NC_Net uses performance counters to retrieve results:
CounterName = "Commit Limit"; CategoryName = "Memory"; InstanceName = null;
CounterName = "Committed Bytes"; CategoryName = "Memory"; InstanceName = null;
Check_nt send buffer: password&7
- all other arguments are ignored
Check_nt receive buffer: Commit Limit&Committed Bytes
- NC_Net returns the Commit Limit value and the Committed Bytes value to Check_nt.
- If there was a problem retrieving these values, NC_Net returns: -1&Could not process memory usage check
Example: ./check_nt -H 192.168.1.1 -p 1248 -v MEMUSE -w 80 -c 90
Check_NT result: Memory usage: total:619.08 Mb - used: 315.64 Mb (51%) - free: 303.44 Mb (49%)|used=303.44
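The two numbers in the receive buffer are enough to derive the totals and percentages shown in the result line. A minimal sketch, assuming both values are byte counts and that "Mb" in the output means 1024 * 1024 bytes; memory_usage is a hypothetical helper, not part of check_nt.

```python
MB = 1024 * 1024   # assumed meaning of "Mb" in the check_nt output


def memory_usage(reply):
    """Turn a 'Commit Limit&Committed Bytes' reply into usage figures."""
    limit_text, committed_text = reply.split("&", 1)
    limit = float(limit_text)
    if limit < 0:
        # Error reply, e.g. "-1&Could not process memory usage check"
        raise ValueError(committed_text)
    committed = float(committed_text)
    return {
        "total_mb": limit / MB,
        "used_mb": committed / MB,
        "free_mb": (limit - committed) / MB,
        "used_percent": committed / limit * 100,
    }


usage = memory_usage("1048576000&524288000")
print(f"total: {usage['total_mb']:.2f} Mb - used: {usage['used_mb']:.2f} Mb "
      f"({usage['used_percent']:.0f}%)")
```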
Check_nt send buffer: None&9&path_to_file
Check_nt receive buffer: (system time - file modification time, in minutes)&(file modification time - 01/01/1970 00:00:00, in seconds)
Wildcards that are OK to use: * or ?
? replaces a single character.
* replaces all characters up to the delimiter (dot).
*.* returns the directory file age.
If more than one file matches the search, the first file returned by the function call is used.
filename_path: the file to check. Don't forget to use \\ for each \ (c:\\autoexec.bat). The filename can contain wildcards (c:\\*.bat). When wildcards are used, the returned date is from the first matching file found, and the files are in no particular order. If no file matches the pattern, the check reports that no file was found, except when the wildcard *.* is used, in which case it returns the last modification time of the directory.
date_format: the string used to format the file's date for display. The date is passed to the strftime function, which uses specific characters to format the date. See a strftime reference for all its options. If the date format is not passed, it defaults to Date: mm/dd/yyyy hh:mm:ss am/pm. This default is compiled into the check_nt source code. If you want a different default, find the following line in check_nt.c:
strftime(description, 50, "Date: %D %I:%M:%S %p", localtime(&rettime));
Change the "Date: %D %I:%M:%S %p" part to what you want as a default and then compile it.
warning and critical: maximum number of minutes since the last update of the file.
Example:
./check_nt -H 192.168.1.1 -p 1248 -v FILEAGE -l "c:\\program files\\nsclient\\pnsclient.exe" -w 1440 -c 2880
./check_nt -H 192.168.1.1 -p 1248 -v FILEAGE -l "c:\\program files\\nsclient\\pnsclient.exe","Date: %d-%m-%Y %I:%M:%S %p" -w 1440 -c 2880
Syntax: check_nt -H HOSTNAME -v COUNTER -l counter_name[,counter_description] [-w warning_percent] [-c critical_percent]
counter_name is the exact description of the Windows counter. It must be enclosed in " and each \ should be doubled.
counter_description is the description displayed in the "Service Information" column. This string is passed to a printf command, so it must follow printf format syntax exactly. Have a look at any printf reference to get the full story about this C function.
Here are some examples:
"Paging file usage is %.2f %%" (the %.2f means that the result is displayed with two decimal digits; the double % represents a literal % in printf syntax).
"%.f %% of the paging file used" (the %.f means that the result is displayed with no decimal digits).
warning_percent and critical_percent: thresholds between 1 and 100. If warning_percent is higher than critical_percent, the check is reversed.
Check_nt send buffer: password&8&Counter_to_check&CounterDescription
- the counter description is used by Check_nt and is not expected or used by NC_Net
Check_nt receive buffer: decimal_counter_value
- a decimal value is returned by NC_Net
- 5 decimal places are returned
Error returns to Check_nt:
-1&No Counter to check (if fewer than three arguments are sent to NC_Net)
-1&Could not process check counter (if the performance counter check failed for any reason)
Details of Check_nt results:
If warning equals critical and is less than the result, then OK
Else if warning equals critical and is greater than or equal to the result, then Critical
Else if the counter could not be found, the result is -1 and the plug-in returns OK (there is no Unknown state for this check)
Else if warning is less than critical and critical is less than or equal to the result, then Critical
Else if warning is less than critical and warning is less than or equal to the result, then Warning
Else if warning is less than critical and the result is less than warning, then OK
Else if critical is less than warning and the result is less than or equal to critical, then Critical
Else if critical is less than warning and the result is less than or equal to warning, then Warning
Else if critical is less than warning and the result is greater than warning, then OK
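The rules above can be read as one small decision function. The sketch below transcribes them in the order listed. It is a reading of this list, not the check_nt source, and the name counter_state is hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def counter_state(result, warning, critical):
    """Apply the COUNTER threshold rules in the order they are listed."""
    if warning == critical:
        # Equal thresholds: only a result above the threshold is OK.
        return OK if result > warning else CRITICAL
    if result == -1:
        return OK                      # counter not found: no Unknown state
    if warning < critical:
        # Normal check: higher values are worse.
        if result >= critical:
            return CRITICAL
        if result >= warning:
            return WARNING
        return OK
    # Reversed check (critical < warning): lower values are worse.
    if result <= critical:
        return CRITICAL
    if result <= warning:
        return WARNING
    return OK


print(counter_state(85, 80, 90))   # between warning and critical
print(counter_state(5, 20, 10))    # reversed thresholds, at or below critical
```

The reversed branch is what makes a check such as "free space percentage" usable: a warning threshold of 20 and a critical threshold of 10 flag low values instead of high ones.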
Example:
./check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Paging File(_Total)\\%% Usage","Paging file usage is %.2f %%" -w 80 -c 90
./check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Process(_Total)\\Thread Count","Thread Count: %.f" -w 600 -c 800
./check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Server\\Server Sessions","Server Sessions: %.f" -w 20 -c 30
Syntax: check_nt -H HOSTNAME -v INSTANCES -l Category_object[,Category_object2]
Category_object is a Windows performance counter category (e.g. Process). If it is two words, it should be enclosed in quotes.
The returned result is a comma-separated list of instances in the selected category.
A comma-separated list of categories to check is accepted.
The old NS_Client called the categories "Perfmon Counter Objects" ("categories" is the term used in the Dot Net documentation).
The purpose of this check is to be run from the command line to determine which instances are available for monitoring, without having to log on to the Windows server to run Perfmon directly.
Check_nt send buffer: password&10&Category&Category
- NC_Net can check multiple categories
- Categories are not case sensitive
Check_nt receive buffer: String of descriptive results.
The basic output is the category name, a colon, and a comma-separated list of instances.
If there are no instances, the list is empty.
Category outputs are delimited with a dash (-) in the output.
If a category does not exist, it is also listed with an empty list.
Output string format:
Cat1: inst1,inst2,inst3 - cat2: inst1,inst2
Example:
./check_nt -H 192.168.1.1 -p 1248 -v INSTANCES -l Process
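Because the output string format is regular, a script can split it back into categories and instances. The sketch below is a hypothetical helper (parse_instances is not part of NC_Net or check_nt) and assumes that neither category names nor instance names contain " - " or ":"; real instance names may break that assumption.

```python
def parse_instances(reply):
    """Split 'Cat1: a,b - cat2: c' into {'Cat1': ['a', 'b'], 'cat2': ['c']}."""
    result = {}
    for block in reply.split(" - "):
        category, _, names = block.partition(":")
        names = names.strip()
        # A missing category or one without instances yields an empty list.
        result[category.strip()] = names.split(",") if names else []
    return result


print(parse_instances("Cat1: inst1,inst2,inst3 - cat2: inst1,inst2"))
```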
Syntax: check_nt -H HOSTNAME -v EVENTLOG -l "Event Log,Event Type, Time Interval min, Evt Source List size, [Event source list], Evt Desc list size [RegExp list], Evt ID list size, [Event ID filter list]"
The command line looks bad at first, but it is not so bad once you look at what it does. The first two parameters are each either a single log or type, or the keyword 'any' (no quotes) to check all logs or types. The third parameter is the time interval in minutes (how far back the event entries should be checked). If you enter 0 it checks all entries in the specified log. These three parameters are applied first to cut the list down before any of the other filters are applied.
The rest of the filters are comma-separated lists, each starting with an integer that gives the mode of the filter and the number of items in it. A 0 means the filter is not used, so the next parameter is the next filter's count. A negative number means exclude the items in the list. A positive number means check only items from the list. I did not see any logical reason to implement both of these at the same time (I am sure someone will have a reason to want it), but it seems reasonable to either exclude special items (negative number), include only listed items (positive number) or not use the filter (zero).
Filters are applied one at a time to the list that was constructed using the log, type and time interval, in the order specified by the command syntax. The first filter is the event source. Since a source needs to be registered in Windows, NC_Net checks this before processing and posts a warning in Event Viewer if the source is not registered. The second filter accepts regular expressions and looks for matches in the message of the event. The last filter is for the event ID. All event IDs must be numbers; a value that is not a valid integer is converted to 0 and reported as a warning to Event Viewer.
Example: to check all events in the last 5 minutes from any source, in any log, of any type, use -l "any,any,5,0,0,0"
To check for event IDs 100 through 105 in the last day, use -l "any,any,1440,0,0,6,100,101,102,103,104,105"
To check for the regular expression for start/stop only in the System log, use -l "System,any,5,0,1,\(stop|start\),0"
or you can use: -l "System,any,5,0,2,stop,start,0"
The regular expressions use the Dot Net RegExp class with two options set: multiline (which lets ^ and $ match per line, instead of the Dot Net default that only matches the beginning and end of the string variable) and ignoring whitespace in the pattern. Let me know of any problems with the syntax of the command.
The output of the event log check is a list of the event IDs of all events that match the search, as well as the message from the last event posted. (The last message is determined by the time written; if multiple entries have the same time written, the entry that was checked first is the one reported.) Using the event index would be bad because the index is not uniform across logs and all the entries are merged into a single list.
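Since the -l argument is easy to get wrong by hand, it can help to generate it. The sketch below builds the parameter string from the rules described above (count first, 0 to skip a filter, negative count to exclude). The function names and the exclude parameter are hypothetical, and items containing commas are not handled.

```python
def filter_fields(items, exclude=False):
    """Return the count followed by the items, or ['0'] when the filter is unused."""
    items = [str(item) for item in items]
    if not items:
        return ["0"]
    count = -len(items) if exclude else len(items)   # negative count means exclude
    return [str(count)] + items


def eventlog_argument(log="any", event_type="any", minutes=0,
                      sources=(), patterns=(), event_ids=(), exclude=()):
    """Build the EVENTLOG -l string; exclude names the filters to negate."""
    fields = [log, event_type, str(minutes)]
    fields += filter_fields(sources, "sources" in exclude)
    fields += filter_fields(patterns, "patterns" in exclude)
    fields += filter_fields(event_ids, "event_ids" in exclude)
    return ",".join(fields)


print(eventlog_argument(minutes=5))
print(eventlog_argument(minutes=1440, event_ids=range(100, 106)))
print(eventlog_argument(log="System", minutes=5, patterns=["stop", "start"]))
```

The three calls reproduce the argument strings from the examples in this section.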
Check_nt send buffer: password&11&long string of parameters
Check_nt receive buffer: Return Code#String of descriptive results.
Socket timeout after 10 seconds
Return code: 2
If timeouts occur frequently, increase the timeout with the -t switch for check_nt.
C# is not always as fast as NS_Client, due to the virtual machine and because most checks gather their information dynamically after the check is invoked.
Syntax: check_nt -H HOSTNAME -v WMICHECK -l "<Namespace>[&<Select> | &<FULL SELECT>][&<Class>][&<condition>]"
Checks WMI for a specific query and returns the result, truncating it at the output limit. The input arguments are separated with ampersands so that commas can be used within the query. WMI uses a SQL-like query language; details of WMI can be found in the WMI development kit or on MSDN.
Check_nt send buffer: Password&12&<Namespace>[&<select>[&<class>[&<where>]]]
Check_nt receive buffer: String - output from the command, or ERROR
Windows Service
NC_Net has been developed using C# and the Dot Net Framework. It is installed as a service.
Every five seconds, NC_Net queries Windows for the CPU load and stores it in a circular stack that keeps the measurements for the last 24 hours. When a check is requested, NC_Net runs the check dynamically. Sometimes this takes longer than desired (in which case the check_nt timeout may need to be increased to 15 seconds). NC_Net has been designed to process only one active check at a time. When multiple active checks are sent to NC_Net, they should be buffered by the TCP socket (I have not tested this; I am assuming it is the default behavior of the TCP_Listener class). More time needs to be invested in cleaning up the functions and global reference variables to allow multiple checks at once.
Have a look at the source code if you want to know more about it. It is documented; contact me if you have any questions about the code. Most of the code was developed from scratch, based on the expected behavior and tests of NS_Client, to maintain compatibility with existing Nagios configurations running the check_nt plug-in. Borrowed code should be labeled with its source. (Please notify me if any code was not appropriately credited.)
NC_Net v1.xx has all the functionality of NS_Client. If anyone has a problem, please report it. All runtime errors should be reported to the application event log, either as a warning for a handled exception or as an error for an unhandled exception. When reporting errors, please include a copy of the event log details for the event. 90% of the code is inside exception handlers to make sure an event log item is written; however, not all events written contain enough data to solve the problem encountered. These exceptions will be upgraded on an as-needed basis.
NC_Net v2.xx (in development) will implement passive checks as well as active checks. The service should be able to manage both at the same time.
NC_Net v3.xx (not started) will be a stable, more final release of version 2 with both active and passive checks, as well as some other tools for configuring and viewing the check results from the active server.
Unix plug in
Check_NT is included in the Nagios Plug-ins and is not packaged with NC_Net. However, I have been planning some updates to Check_nt to allow more functionality and easier inputs to NC_Net (these modifications should not change the current functionality of Check_nt). (Currently not started)
Let us know if any problems are encountered.
All errors from NC_Net will be reported to the Application Log in Event Viewer.
More work is needed on the error reporting. It should list an event in all cases; however, the content of the event is not always very meaningful. The event reporting will be updated gradually as new revisions are released. For the most part, the event reporting updates will focus on fixing current problems.
Bug fix - MEMUSE was unable to process numbers over 4 GB - fixed as of v1.04
Contact us: amontibello@shatterit.com
Site: www.shatterit.com
# Basic Nagios commands for Check_nt
command[check_nt_disk]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v USEDDISKSPACE -l $ARG1$ -w $ARG2$ -c $ARG3$
command[check_nt_cpuload]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v CPULOAD -l $ARG1$
command[check_nt_uptime]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v UPTIME
command[check_nt_clientversion]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v CLIENTVERSION
command[check_nt_process]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v PROCSTATE -l $ARG1$
command[check_nt_service]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l $ARG1$
command[check_nt_memuse]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v MEMUSE -w $ARG1$ -c $ARG2$
command[check_nt_fileage]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v FILEAGE -l $ARG1$ -w $ARG2$ -c $ARG3$
# Custom counters (one per required counter).
command[check_nt_pagingfile]=$USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v COUNTER -l "\\Paging File(_Total)\\%% Usage","Paging File usage is %.2f %%" -w $ARG1$ -c $ARG2$
# Do not run INSTANCES from Nagios -- run it from the prompt to check the instance names of a performance counter
./check_nt -H $HOSTADDRESS$ -p 1248 -v INSTANCES -l "Paging File","Memory"