Using Nagios to monitor the status of services is great for peace of mind and system stability. Previously I set up Airtime to be monitored by a Docker-powered API that checks a stream's dB range. If the dB range is below a threshold, the API considers the stream quiet and starts counting. The count increases every second the stream stays quiet, and Nagios watches it with warning and critical thresholds for the quiet period.

Goal

SAML is a complicated beast with hundreds of points of failure. Having an automated test monitored by Nagios became the goal after the SAML outage in October 2015. I've implemented a SimpleSAMLphp Service Provider that works great with our Shibboleth IdP, but there isn't a way to test the full authentication flow because of CAS.

Problem

CAS is the middleman between the IdP and the browser. CAS provides a consistent interface for handling authentication. Shibboleth asks CAS who is logged in and passes the required information from LDAP to our Service Providers.

When following the authentication flow, there has to be a browser to enter credentials into CAS. This is where PhantomJS comes in. By emulating a browser, it can programmatically follow the SAML redirects and enter test credentials into CAS.

SimpleSAMLphp

After configuring SimpleSAMLphp, I created a simple PHP test page that dumps the user attributes as JSON.

/root/simplesaml-psd/custom-www/test.php

<?php  
  require_once('/var/simplesamlphp/lib/_autoload.php');
  $as = new SimpleSAML_Auth_Simple('default-sp');
  $as->requireAuth(array(
    'saml:idp' => 'https://idptest.psd401.net/idp/shibboleth',
  ));
  $attributes = $as->getAttributes();

  echo json_encode($attributes); 
?>

Simply put, it loads the SimpleSAMLphp API, then requires authentication. Once authentication returns, it echoes all attributes as JSON. The output is similar to this:

{"login_id":["martinb"],"email":["martinb@psd401.net"],"department":["TECH SUPPORT"],"givenName":["BRANDON"],"role":["teacher"],"id":["psd_martinb"],"familyName":["MARTIN"]}

This JSON will be used to test against.
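
As a rough illustration of what "testing against" this JSON can look like, here is a minimal sketch in plain JavaScript. The helper name firstLoginId is hypothetical and not part of the monitor; the sample string is the output shown above.

```javascript
// Sample output copied from the test page above.
var sample = '{"login_id":["martinb"],"email":["martinb@psd401.net"],"department":["TECH SUPPORT"],"givenName":["BRANDON"],"role":["teacher"],"id":["psd_martinb"],"familyName":["MARTIN"]}';

// Hypothetical helper: returns the first login_id value, or null when the
// body is not JSON or the attribute is missing.
function firstLoginId(body) {
  var attributes;
  try {
    attributes = JSON.parse(body);
  } catch (e) {
    return null; // e.g. an HTML login page instead of JSON
  }
  if (attributes === null || typeof attributes !== 'object') {
    return null;
  }
  var values = attributes.login_id;
  if (!Array.isArray(values) || values.length === 0) {
    return null;
  }
  return values[0];
}

console.log(firstLoginId(sample));               // martinb
console.log(firstLoginId('<html>login</html>')); // null
```

Returning null for anything that is not the expected JSON keeps a login page or an error page from being mistaken for a successful authentication.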

PhantomJS

PhantomJS is kinda tricky to work with, which is why you find libraries like CasperJS to handle the flow and asynchronous callbacks. I found a minimal PhantomJS Docker image that installs the dependencies.

/root/phantomjs-samltest/docker-compose.yml

samltest:  
        image: servebox/phantomjs:latest
        container_name: saml_monitor
        ports:
                - "9969:8080"
        volumes:
                - ./scripts:/var/scripts:rw
        command: phantomjs /var/scripts/samltest.js

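Docker Compose mounts the scripts directory, which lives next to docker-compose.yml, into the container as a volume, and runs the samltest.js script inside it on startup. Host port 9969 is mapped to port 8080, where the script's web server listens.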

/root/phantomjs-samltest/scripts/samltest.js

var webserver = require('webserver');  
var server = webserver.create();


var config = {  
        username: 'atest',
        password: 'password'
};

var service = server.listen(8080, function(request, response) {

  response.statusCode = 200;
  logout(request, response);

});


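function logout(request, response) {
        // Reset the step flags (declared further down) so that state from
        // the previous check does not leak into this one.
        atCAS = false;
        atSAML = false;
        finishedSAML = false;

        var page = require('webpage').create();
        page.open("https://simplesaml.psd401.net/simplesaml/module.php/core/authenticate.php?as=default-sp&logout", function(){
                renderPage("https://simplesaml.psd401.net/test.php", request, response);
        });
}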

This first section sets up a web server provided by PhantomJS. This allows Nagios to hit it over HTTP on an interval. Each time the script is hit, it must first log out of SAML, because SimpleSAMLphp remembers the session and would otherwise stay logged in, defeating the purpose of monitoring.

After the logout page is opened, the renderPage function is run.

var atCAS = false;  
var atSAML = false;  
var finishedSAML = false;


function renderPage(url, request, response) {  
  var page = require('webpage').create();
  var redirectURL = null;

  page.onResourceReceived = function(resource) {
    // Report anything other than a 200 or a 302 as an error.
    if (resource.status !== 200 && resource.status !== 302) {
      var error = {
        url: resource.url,
        status: resource.status,
        error: 1
      };
      response.write(JSON.stringify(error));
      response.close();
    }
    // Use the redirect target to work out which step we are on.
    if (resource.redirectURL) {
      redirectURL = resource.redirectURL;
      if (redirectURL.indexOf("https://login.psd401.net") != -1) {
        atCAS = true;
      } else if (redirectURL == "https://simplesaml.psd401.net/test.php") {
        atCAS = false;
        atSAML = true;
        renderPage(redirectURL, request, response);
      } else {
        atCAS = false;
        atSAML = false;
      }
    }
  };

  // Skip analytics, fonts, and SVGs, which PhantomJS gets caught up on.
  page.onResourceRequested = function(requestData, request) {
    if ((/google-analytics\.com/gi).test(requestData['url']) || (/ttf/gi).test(requestData['url']) || (/svg/gi).test(requestData['url'])) {
      request.abort();
    }
  };

  page.open(url, function(status) {
    if (status != 'success') {
      var error = {
        url: url,
        status: status,
        error: 1
      };
      // response.write expects a string, so serialize the error object.
      response.write(JSON.stringify(error));
      response.close();
    }
  });

  // Dispatch on the current step once a page has finished loading.
  page.onLoadFinished = function(status) {
    if (atCAS) {
      logInCAS(page, response);
    }
    if (atSAML) {
      checkSAMLResult(page, response);
    }
    if (finishedSAML) {
      getFinalData(page, response);
    }
  };


}

There are three primary steps that the function needs to keep track of: atCAS, atSAML, and finishedSAML.
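
To make the step tracking easier to see in isolation, here is a minimal sketch of the redirect decision as a pure function. The helper name stepForRedirect and the example URLs in the usage lines are hypothetical; the host and test page URL are the ones used in renderPage. Note that finishedSAML is not derived from a redirect: it is set later, after the intermediate form has been submitted.

```javascript
// Values taken from renderPage above.
var CAS_HOST = 'https://login.psd401.net';
var TEST_PAGE = 'https://simplesaml.psd401.net/test.php';

// Hypothetical helper: maps a redirect target to the step it represents.
function stepForRedirect(redirectURL) {
  if (redirectURL.indexOf(CAS_HOST) !== -1) {
    return 'atCAS';   // CAS wants credentials
  }
  if (redirectURL === TEST_PAGE) {
    return 'atSAML';  // back at the Service Provider
  }
  return 'other';     // any intermediate hop, such as the IdP
}

// The paths below are made up for illustration.
console.log(stepForRedirect('https://login.psd401.net/cas/login'));     // atCAS
console.log(stepForRedirect('https://simplesaml.psd401.net/test.php')); // atSAML
console.log(stepForRedirect('https://idptest.psd401.net/idp/profile')); // other
```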

  • page.onResourceReceived is called whenever a resource is received. We check for status errors and follow redirects.
  • page.onResourceRequested had to be implemented because PhantomJS was getting caught up on Google Analytics, font files, and SVG files. We cancel the requests for these files.
  • page.open is used for more error handling.
  • page.onLoadFinished is where we decide how to evaluate the loaded page, based on the current step.
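
The request filter described in the list above can also be written as a small standalone predicate. This is a sketch of a slightly stricter variant, not the code used in the monitor: it only matches .ttf and .svg as file extensions, whereas the original matches those letters anywhere in the URL. The helper name shouldAbort and the example URLs are hypothetical.

```javascript
// Hypothetical helper: true when a request can be skipped without
// affecting the login flow.
function shouldAbort(url) {
  // Drop the query string and fragment so only the path is inspected.
  var path = url.split(/[?#]/)[0];
  return /google-analytics\.com/i.test(url) || /\.(ttf|svg)$/i.test(path);
}

// Made-up URLs for illustration.
console.log(shouldAbort('https://www.google-analytics.com/analytics.js')); // true
console.log(shouldAbort('https://login.psd401.net/fonts/icons.ttf?v=2'));  // true
console.log(shouldAbort('https://login.psd401.net/cas/login'));            // false
```

Inside page.onResourceRequested this would be called as shouldAbort(requestData.url) before request.abort().
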

function logInCAS(page, response) {
    page.evaluate(function(config){
        document.querySelector("input[name='username']").value = config.username;
        document.querySelector("input[name='password']").value = config.password;
        document.querySelector("form[name='login']").submit();
    }, config);
    atCAS = false;
}

Here we evaluate the page, fill the credentials into the CAS form, and submit it. CAS then redirects us back to the IdP, and finally back to the page we initially requested. One would think this is all we need, and that we could now take the response of the SimpleSAMLphp test page, but unfortunately not...

What I found is that a page appears before SimpleSAMLphp lets the browser through. This page is blank, but has a form that passes a key back to SimpleSAMLphp. JavaScript clicks the form's submit button on load, so in a browser you never even see it, but if you don't have JavaScript enabled, you must click the continue button yourself.

function checkSAMLResult(page, response) {  
    page.evaluate(function(){
        if (document.forms[0]){
            document.forms[0].submit()
        }
    });
    atSAML = false;
    finishedSAML = true;
}

So here we check whether there is a form on the page and, if there is, submit it. Either way, we then set finishedSAML to true.

function getFinalData(page, response) {

    var userData = page.evaluate(function(){
        if (document.forms.length > 0){
            return "";
        }
        try {
            return JSON.parse(document.querySelector("body").innerHTML);
        } catch(e) {
            return "error";
        }
    });

    if (userData != "") {
        if (userData["login_id"]) {
            userData["login_id"] = userData["login_id"][0];
            userData["error"] = 0;
        } else {
            userData = {
                url: page.url,
                status: "Could not parse user",
                error: 1
            }
        }
        response.write(JSON.stringify(userData));
        response.close();
    } else {
        checkSAMLResult(page, response);
    }
}

Here we evaluate the page again. The first step is to make sure there are no forms left on the page. If there are, the evaluation returns an empty string, which sends us back to the previous function.

If there is no form, try to parse the body of the HTML as JSON. If this fails, return the string "error".

Once the evaluation is finished, inspect the result. Replace "login_id" with its 0th element (SimpleSAMLphp nests everything in arrays...) for Nagios to use later, or create an error object that gives some kind of clue.

Write the userData to the response and close it. The output is mostly identical to the SimpleSAMLphp test.php output, but it adds some primitive error handling.
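
The post-processing step can be pulled out into a pure function, which makes it easy to try without PhantomJS. This is only a sketch: the name normalizeResult is hypothetical, and the sample attribute values are made up apart from the atest login.

```javascript
// Hypothetical pure version of the post-processing in getFinalData.
// `parsed` is whatever page.evaluate returned; `url` is the page URL.
function normalizeResult(parsed, url) {
  if (parsed && typeof parsed === 'object' && parsed.login_id) {
    parsed.login_id = parsed.login_id[0]; // unwrap the single-element array
    parsed.error = 0;
    return parsed;
  }
  // Covers the "error" string and JSON without a login_id.
  return { url: url, status: 'Could not parse user', error: 1 };
}

var testPage = 'https://simplesaml.psd401.net/test.php';

console.log(JSON.stringify(normalizeResult({ login_id: ['atest'], role: ['teacher'] }, testPage)));
// {"login_id":"atest","role":["teacher"],"error":0}

console.log(JSON.stringify(normalizeResult('error', testPage)));
// {"url":"https://simplesaml.psd401.net/test.php","status":"Could not parse user","error":1}
```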

Nagios

Using the same Nagios command I used for Airtime, this is the final Nagios service definition:

define service{  
        use                     generic-service
        host_name               vmnocwsdocker01
        service_description     SAML Monitor Testing
        first_notification_delay 15
        check_command           check_json!-H servername -P 9969 -c error,1 --key_equals login_id,"atest" error,0 -m email
        check_interval          30
}

Basically this checks that login_id is atest and that error is 0. If either condition is not met, it raises a warning. I set error,1 to critical but wasn't able to get a CRITICAL response. Here it is in the interface.

Nagios monitoring
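
For reference, the two conditions described above can be sketched in a few lines of JavaScript. This only mirrors the behavior as described in this post; it is not the check_json plugin's code, and the helper name evaluateCheck is hypothetical.

```javascript
// Hypothetical helper mirroring the two conditions described above.
// It does not reproduce the check_json plugin or its full semantics.
function evaluateCheck(result, expectedLogin) {
  var loginMatches = result.login_id === expectedLogin;
  var noError = result.error === 0;
  return (loginMatches && noError) ? 'OK' : 'WARNING';
}

console.log(evaluateCheck({ login_id: 'atest', error: 0 }, 'atest'));
// OK
console.log(evaluateCheck({ url: 'https://simplesaml.psd401.net/test.php', status: 'Could not parse user', error: 1 }, 'atest'));
// WARNING
```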