Monitoring the Health of a Container: Akana System Health Tool

This technical note provides information about the Akana System Health tool and its associated API.


Supported Platforms: 8.2 and later

Table of Contents

  1. Summary
  2. Configuration
  3. Panels
  4. Settings
  5. System Health Tool API
  6. System Health Tool API: Links
  7. Thresholds
  8. Checking Health Statistics
  9. Custom Panel
  10. System Health tool: additional information
  11. What's Next?

Summary

Before API Platform version 8.2, container health statistics were available only visually, through the Monitoring Tool (see Using the Admin Monitoring Tool). Because these statistics were available only through the GUI, there was no way to use the data programmatically for operational purposes. With the introduction of the System Health Tool in version 8.2, you can retrieve these health statistics by making a GET call to any Akana container. In addition, the System Health Tool allows you to configure thresholds for any monitored attribute, so that you can identify values that fall outside normal or expected ranges.

This is a feature of the core platform and can be leveraged by all containers, including Policy Manager, Community Manager, and Envision. A primary use case for this tool is to check the health of the container to determine if it is ready to handle traffic.

The available health statistics are the same as those in the Monitoring Tool, which means that any standard OSGi Monitorable instance in the system can be tracked, including but not limited to:

  • Outgoing HTTP connection pool statistics
  • Incoming HTTP thread pools
  • Database connection pools
  • Container memory usage
  • Usage monitoring queues
  • JMS connections
  • Container configuration state
  • Container lifecycle

Note: The system health statistics reflect real-time information about the container. The System Health Tool does not store this information. You can either view real-time information, or you can have an external monitoring tool call the System Health Tool so that you can store and track the data.


Configuration

All containers for version 8.2 and later have the System Health Tool installed when any of the Policy Manager (PM), Community Manager (CM), or Network Director (ND) features is installed. No additional configuration is required.

You can explore this feature through the Health tab in the Akana Administration Console:

[Screenshot: Health tab]


Panels

A panel groups various types of system health information. Each panel can contain any combination of the types of system health information that you want to monitor.

Depending on the features installed, the Health tab will include a number of pre-configured panels. You can expand them to view detailed monitoring information, as shown below.

[Screenshot: detailed monitoring information]


Settings

The slider in the top right corner lets you change the polling interval. In addition, each panel has a toggle to enable or disable authentication for requests made to the System Health Tool API through the panel's links.

[Screenshot: Health tab settings]


System Health Tool API

You can access the API by sending a GET request to the /admin/health context of any Akana container. For example, if the container's Akana Administration Console URL is http://acme.akana.com:9900/admin, the root of the System Health Tool API is http://acme.akana.com:9900/admin/health, and the System Health measurable is at http://acme.akana.com:9900/admin/health/measurables/akana.system.health.

A sample response is shown below.

System Health API: Response

{
  "links": [
    {
      "rel": "self",
      "href": "http://acme.akana.com:9905/admin/health/"
    },
    {
      "rel": "measurables",
      "href": "http://acme.akana.com:9905/admin/health/measurables"
    },
    {
      "rel": "available",
      "href": "http://acme.akana.com:9905/admin/health/available"
    }
  ],
  "measurables": [
    {
      "id": "akana.system.health",
      "name": "System Health",
      "path": "akana.system.health",
      "state": "NORMAL",
      "childCount": 2,
      "editable": false,
      "options": {
        "enableAuth": false,
        "links": [
          {
            "rel": "self",
            "href": "http://acme.akana.com:9905/admin/health/measurables/akana.system.health/configuration"
          }
        ]
      },
      "links": [
        {
          "rel": "self",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.system.health"
        },
        {
          "rel": "brief",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.system.health?brief=true"
        },
        {
          "rel": "children",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.system.health/children"
        },
        {
          "rel": "options",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.system.health/configuration"
        },
        {
          "rel": "values",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.system.health/values"
        }
      ]
    },
    {
      "id": "akana.service.container.readiness",
      "name": "Service Container Readiness",
      "path": "akana.service.container.readiness",
      "state": "NORMAL",
      "childCount": 6,
      "editable": false,
      "options": {
        "enableAuth": false,
        "links": [
          {
            "rel": "self",
            "href": "http://acme.akana.com:9905/admin/health/measurables/akana.service.container.readiness/configuration"
          }
        ]
      },
      "links": [
        {
          "rel": "self",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.service.container.readiness"
        },
        {
          "rel": "brief",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.service.container.readiness?brief=true"
        },
        {
          "rel": "children",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.service.container.readiness/children"
        },
        {
          "rel": "options",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.service.container.readiness/configuration"
        },
        {
          "rel": "values",
          "href": "http://acme.akana.com:9905/admin/health/measurables/akana.service.container.readiness/values"
        }
      ]
    }
  ]
}

This provides an overview of the available health monitors (called measurables) and a set of links that provide access to more detailed information. Each of the measurables in the response corresponds to a monitoring panel shown on the Akana Administration Console.
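The following is a minimal sketch that assumes Python 3.9+ and uses only the standard library. It fetches the root of the API and maps each measurable's id to its state. HEALTH_ROOT points at the hypothetical example container from this note, and fetch_json and summarize_measurables are illustrative names. The sketch also assumes that authentication is disabled for the panels (enableAuth is false). If you enable it, the request needs credentials, and this sketch does not handle them.

```python
import json
import urllib.request
from typing import Any

# Hypothetical example container; replace with your own host and port.
HEALTH_ROOT = "http://acme.akana.com:9900/admin/health"


def fetch_json(url: str, timeout: float = 10.0) -> dict[str, Any]:
    """GET a health resource and decode its JSON body (no authentication)."""
    # Requesting JSON explicitly is an assumption; the sample responses are JSON.
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_measurables(payload: dict[str, Any]) -> dict[str, str]:
    """Map each measurable id in a root /admin/health response to its state."""
    return {m["id"]: m["state"] for m in payload.get("measurables", [])}
```

For the sample response above, summarize_measurables returns {'akana.system.health': 'NORMAL', 'akana.service.container.readiness': 'NORMAL'}. Calling summarize_measurables(fetch_json(HEALTH_ROOT)) runs the same summary against a live container.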

System Health Tool API: Links

You can follow each of the links in the response to view different health dimensions as well as configuration information.

For example, to get detailed information on the akana.system.health category, perform an HTTP GET on its self link. For a container whose Administration Console is at http://acme.akana.com:9900/admin, that link is http://acme.akana.com:9900/admin/health/measurables/akana.system.health.

Similarly, to view all the available configurable options, fetch http://acme.akana.com:9900/admin/health/available.
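Every resource in the responses carries a links array keyed by rel, so a client can navigate the API without hard-coding paths. The sketch below uses two hypothetical helpers, link_href and measurable_link. They return the href for a given rel, either from any object in a response or from a named measurable in the root /admin/health response.

```python
from typing import Any


def link_href(resource: dict[str, Any], rel: str) -> str:
    """Return the href of the first link with the given rel, or raise KeyError."""
    for link in resource.get("links", []):
        if link.get("rel") == rel:
            return link["href"]
    raise KeyError(f"no link with rel={rel!r}")


def measurable_link(root: dict[str, Any], measurable_id: str, rel: str = "self") -> str:
    """Find a measurable in a root /admin/health response and return one of its links."""
    for measurable in root.get("measurables", []):
        if measurable.get("id") == measurable_id:
            return link_href(measurable, rel)
    raise KeyError(f"no measurable with id={measurable_id!r}")
```

For example, with the sample root response shown earlier, measurable_link(root, "akana.system.health", "brief") returns http://acme.akana.com:9905/admin/health/measurables/akana.system.health?brief=true.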


Thresholds

Each health monitor can have a set of threshold settings. These settings determine the current health status, which is one of three values: NORMAL, WARNING, or FAILURE.

System default health monitors have the threshold ranges predefined. To view or modify the values, go to the Akana Administration Console and click the status icon, as shown below.

[Screenshot: warning threshold]

Thresholds are defined using Apache Commons JEXL (Java Expression Language) syntax. For more information, refer to the JEXL overview on the Apache Commons site. Currently, only a single variable, value, is accessible; it holds the current value of the monitored property.
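The sketch below shows how the three expressions partition a value's range. It rewrites the free.memory.pct thresholds from the Checking Health Statistics response as plain Python predicates (ge becomes >=, lt becomes <). It is not a JEXL evaluator and does not reproduce how Akana evaluates thresholds internally. Raising an error when zero or several states match is a choice made in this sketch, not documented Akana behavior. The names FREE_MEMORY_PCT_THRESHOLDS and classify are hypothetical.

```python
from typing import Callable

Predicate = Callable[[float], bool]

# The sample JEXL expressions for free.memory.pct, rewritten as Python predicates.
FREE_MEMORY_PCT_THRESHOLDS: dict[str, Predicate] = {
    "NORMAL": lambda value: value >= 20,                  # value ge 20
    "WARNING": lambda value: value < 20 and value >= 5,   # (value lt 20) and (value ge 5)
    "FAILURE": lambda value: value < 5,                   # value lt 5
}


def classify(value: float, thresholds: dict[str, Predicate]) -> str:
    """Return the single state whose predicate matches; raise if zero or several match."""
    matches = [state for state, predicate in thresholds.items() if predicate(value)]
    if len(matches) != 1:
        raise ValueError(f"thresholds are ambiguous or incomplete for {value}: {matches}")
    return matches[0]
```

With these ranges, classify(21.0, FREE_MEMORY_PCT_THRESHOLDS) returns "NORMAL", which agrees with the state reported for the sample value of 21.0. Values of 5 and above but below 20 map to "WARNING", and values below 5 map to "FAILURE".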


Checking Health Statistics

You can check the system health information by following the links provided in the health monitor's panel.

For example, you can use this feature to give load balancers an appropriate status based on the health of any aspect of container operation.

[Screenshot: links in a health monitor panel]

Checking Health Statistics: Response

{
  "id":"akana.system.health",
  "name":"System Health",
  "path":"akana.system.health",
  "state":"NORMAL",
  "childCount":2,
  "children":[
    {
      "id":"akana.health.lifecycle",
      "path":"akana.system.health/akana.health.lifecycle",
      "state":"NORMAL",
      "attributes":[
        {
          "type":"2",
          "name":"lifecycle.state",
          "path":"akana.system.health/akana.health.lifecycle/lifecycle.state",
          "description":"The current system lifecycle state",
          "threshold":{
            "normal":"value eq 'STARTED'",
            "failure":"value ne 'STARTED'",
            "links":[
              {
                "rel":"self",
                "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children/akana.health.lifecycle/variables/lifecycle.state/threshold"
              }
            ]
          },
          "state":"NORMAL",
          "value":"STARTED",
          "links":[
            {
              "rel":"self",
              "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children/akana.health.lifecycle/variables/lifecycle.state"
            }
          ]
        }
      ],
      "editable":false,
      "links":[
        {
          "rel":"self",
          "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children/akana.health.lifecycle"
        },
        {
          "rel":"values",
          "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.health.lifecycle/values"
        }
      ]
    },
    {
      "id":"com.soa.vmstats",
      "path":"akana.system.health/com.soa.vmstats",
      "state":"NORMAL",
      "attributes":[
        {
          "type":"1",
          "name":"free.memory.pct",
          "path":"akana.system.health/com.soa.vmstats/free.memory.pct",
          "description":"The amount of available heap memory as a percentage of the max",
          "threshold":{
            "normal":"value ge 20",
            "warning":"(value lt 20) and (value ge 5)",
            "failure":"value lt 5",
            "links":[
              {
                "rel":"self",
                "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children/com.soa.vmstats/variables/free.memory.pct/threshold"
              }
            ]
          },
          "state":"NORMAL",
          "value":21.0,
          "links":[
            {
              "rel":"self",
              "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children/com.soa.vmstats/variables/free.memory.pct"
            }
          ]
        }
      ],
      "editable":false,
      "links":[
        {
          "rel":"self",
          "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children/com.soa.vmstats"
        },
        {
          "rel":"values",
          "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/com.soa.vmstats/values"
        }
      ]
    }
  ],
  "editable":false,
  "options":{
    "enableAuth":true,
    "links":[
      {
        "rel":"self",
        "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/configuration"
      }
    ]
  },
  "links":[
    {
      "rel":"self",
      "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health"
    },
    {
      "rel":"brief",
      "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health?brief=true"
    },
    {
      "rel":"children",
      "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/children"
    },
    {
      "rel":"options",
      "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/configuration"
    },
    {
      "rel":"values",
      "href":"https://rcoaless.apiportal.akana.com:443/admin/health/measurables/akana.system.health/values"
    }
  ]
}
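A monitoring script may need to know which attribute caused a WARNING or FAILURE, not just the top-level state. The sketch below walks the children and attributes arrays of a detailed response, such as the one above, and reports every attribute whose state is not NORMAL. The function names are hypothetical. The sketch assumes a single level of children, as in the sample.

```python
from typing import Any, Iterator


def iter_attributes(measurable: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every attribute of every child monitor in a detailed measurable response."""
    for child in measurable.get("children", []):
        yield from child.get("attributes", [])


def unhealthy_attributes(measurable: dict[str, Any]) -> list[tuple[str, Any, str]]:
    """Return (path, value, state) for each attribute whose state is not NORMAL."""
    return [
        (attr["path"], attr.get("value"), attr.get("state"))
        for attr in iter_attributes(measurable)
        if attr.get("state") != "NORMAL"
    ]
```

For the sample response above, unhealthy_attributes returns an empty list, because both lifecycle.state and free.memory.pct are NORMAL.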

Using query parameters to define the HTTP response status code

You can use an optional set of query parameters to control the HTTP status code returned when the health category is in the NORMAL, WARNING, or FAILURE state. This avoids the need to parse the JSON response content, so that decisions can be made solely on the response code.

The following query parameters are valid for specifying response HTTP status when you run the GET operation on the health category:

  • normal-status
  • warning-status
  • failure-status

In the example below, the GET call to check container readiness sets both the warning status and the failure status to 503. The call returns 503 until the health check reaches the NORMAL state, at which point it returns 200.

GET http://acme.akana.com:9900/admin/health/measurables/akana.service.container.readiness?brief=true&warning-status=503&failure-status=503
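The following is a minimal sketch of a readiness probe built on the request above, using only the Python standard library. It builds the readiness URL with the status-code query parameters and treats 200 as ready, as the example describes. Any other outcome counts as not ready, including the configured 503 or a connection failure. READINESS, readiness_url, and is_ready are hypothetical names, and the sketch assumes authentication is disabled for the panel.

```python
import urllib.parse
import urllib.request

# Hypothetical example container; replace with your own host and port.
READINESS = "http://acme.akana.com:9900/admin/health/measurables/akana.service.container.readiness"


def readiness_url(base: str = READINESS, warning_status: int = 503, failure_status: int = 503) -> str:
    """Build the readiness URL with the warning-status and failure-status parameters."""
    query = urllib.parse.urlencode({
        "brief": "true",
        "warning-status": warning_status,
        "failure-status": failure_status,
    })
    return f"{base}?{query}"


def is_ready(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the container answers 200 (the NORMAL state)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (such as the configured 503), and socket timeouts
        # are all OSError subclasses, so they all count as "not ready".
        return False
```

A load balancer health check or an orchestration script could call is_ready(readiness_url()) and route traffic to the container only when it returns True.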


Custom Panel

To display a custom set of system health statistics, you can create your own panel. To create a new panel, click the plus icon. Once the panel has been created, you can then add your own combination of health monitors.

The sample custom panel below is configured on the Policy Manager container and is called Health Monitoring for ACME. The failure threshold is set to anything below the currently provisioned number of PM APIs.

The threshold below defines a normal condition where there is greater than 20% of capacity remaining in the pool.

[Screenshot: normal threshold]

The threshold below defines a warning condition where between 20% and 5% of capacity remains in the pool.

[Screenshot: warning threshold]

The threshold below defines a failure condition when less than 5% of capacity remains in the pool.

[Screenshot: failure threshold]

To check the status of this custom panel, follow the links provided. In this case, the panel's URL might be http://acme.akana.com:9900/admin/health/measurables/health.monitoring.for.acme.


System Health tool: additional information

Each of the settings in the System Health tool has field-level help. This field-level help is also available in the generated documentation, both in the /docs folder of your installation and online.

The Health Monitor API returns information for a specific container. When reviewing container health, make sure you review the information for the correct container, or for all containers if needed. Not every container reports all health statistics; this depends on the container type. For example, an ND container has no database statistics, because it does not connect to the database.

General recommendations

How you use the System Health tool is up to you. However, the following key items are generally worth monitoring:

  • Monitor CPU and memory at the VM level. Notes on com.soa.vmstats:
    • Memory usage might fluctuate significantly, depending on how the container is used.
    • High memory usage does not by itself indicate a specific issue; it usually means only that the container is under load or processing data.
  • For containers that connect to the database, monitor com.soa.db.stats – available.pool.headroom.pct or available.pool.headroom. These attributes show how much capacity remains in the database connection pool. If the headroom gets too low, the container might run out of database connections in the pool, and errors might start occurring.
  • akana.health.lifecycle – lifecycle.state: indicates whether the container is STARTED.
  • com.soa.container.config.config.stats – configuration.state: make sure that the state is CONFIGURED.
  • com.soa.transport.http.client – pool.headroom or pool.headroom.pct: indicates the connections available in the pool for outbound HTTP traffic from the container. For a Network Director container, this generally means connections to Policy Manager and to the backend services. For Policy Manager and Community Manager containers, the pool also serves outbound HTTP traffic but is usually less of a concern. In either case, if usage approaches the pool maximum, the container might not have connections available for outbound HTTP traffic.
  • soa.jetty.connector – connections.open or connections.open.max: indicates the connections in use for each inbound listener on the container. If the number of open connections is near the configured maximum, there might not be connections available for inbound traffic.
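Two of the recommendations above are exact-value checks (lifecycle.state should be STARTED and configuration.state should be CONFIGURED). The sketch below looks up those attributes by name in a detailed panel response and lists any that are missing or unexpected. It assumes you query a panel to which both attributes have been added. In the sample System Health response, only lifecycle.state is present, so configuration.state would be reported as missing. This is consistent with the note that not every container or panel reports every statistic. EXPECTED_STATES, find_attribute_value, and check_expected_states are hypothetical names.

```python
from typing import Any, Optional

# Expected values taken from the recommendations above.
EXPECTED_STATES = {
    "lifecycle.state": "STARTED",
    "configuration.state": "CONFIGURED",
}


def find_attribute_value(measurable: dict[str, Any], name: str) -> Optional[Any]:
    """Return the value of the first attribute with this name in any child monitor, or None."""
    for child in measurable.get("children", []):
        for attr in child.get("attributes", []):
            if attr.get("name") == name:
                return attr.get("value")
    return None


def check_expected_states(measurable: dict[str, Any]) -> list[str]:
    """List a problem for each recommended attribute that is missing or has an unexpected value."""
    problems = []
    for name, expected in EXPECTED_STATES.items():
        actual = find_attribute_value(measurable, name)
        if actual is None:
            problems.append(f"{name}: not reported by this panel")
        elif actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems
```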


What's Next?

Container health data can be polled and collected into a data store. You can use the collected data to gain valuable operational insight.
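The System Health Tool does not store data, so collecting it is the caller's job. The sketch below writes one timestamped row per attribute of a detailed measurable response into a table. SQLite (Python's built-in sqlite3 module) stands in here for whatever data store you actually use. The table layout and the names open_store and record_sample are hypothetical. Values are stored as text because attributes can hold either strings (such as STARTED) or numbers.

```python
import sqlite3
import time
from typing import Any, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the sample table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS health_sample ("
        " ts REAL NOT NULL, path TEXT NOT NULL, state TEXT, value TEXT)"
    )
    return conn


def record_sample(conn: sqlite3.Connection, measurable: dict[str, Any], ts: Optional[float] = None) -> int:
    """Store one row per attribute of a detailed measurable response; return rows written."""
    ts = time.time() if ts is None else ts
    rows = [
        (ts, attr["path"], attr.get("state"), str(attr.get("value")))
        for child in measurable.get("children", [])
        for attr in child.get("attributes", [])
    ]
    with conn:  # commits all inserts for this sample as one transaction
        conn.executemany("INSERT INTO health_sample VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

A scheduler (for example, cron, or a loop with time.sleep matching your polling interval) could GET a measurable such as akana.system.health and pass the parsed JSON to record_sample. You can then query the table to see trends over time.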
