RHQ, the common services project for infrastructure management

  Design-Agent
Added by John Mazzitelli, last edited by Greg Hinkle on Feb 18, 2008

Agent

Overview

The agent is a Java process that supports local execution of management tasks and provides a single, reliable, and secure flow of data between the enterprise server and managed services. The agent supports and deploys the Plugin Container module in which the AMPS plugins run. It also maintains the security and remote connection contexts with the server. This simplifies the development of the plugin container by abstracting away concerns such as reliability, message priority, and throttling. The server and agent client APIs look like simple Java calls to the service, but are actually carried over a POJO transport that delivers these features. The agent also handles persisted configuration storage and the lifecycle of the Plugin Container. The agent can restart the Plugin Container and reconfigure it as needed to support hot deployment or upgrade. (The agent does not yet support upgrades of itself, but should in the future.)

Agent Features

The agent was designed with the features described below.

Preferences

The configuration file is in XML format, and its properties (called preferences) are persisted on a per-user basis in an OS-specific way (e.g. on Windows, they are stored in the registry; on UNIX, in a directory located under the user's home directory). The configuration preferences will, therefore, survive agent upgrades or re-installs.

You can define different sets of configuration preferences, allowing you to start the agent in different configurations. You do this by setting the agent's -p command line argument to the name of the preferences node, as given in the node element in the configuration XML file under the parent "jboss-on-agent" node.
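As a rough illustration of how a named preferences node could be looked up through the Java Preferences API, here is a minimal sketch. It assumes the agent's nodes live under the user root at "/jboss-on-agent/<preferences name>"; that layout is inferred from the description above rather than taken from the agent's source, and the class and method names are hypothetical.

```java
import java.util.prefs.Preferences;

// Hypothetical helper, not part of the agent: resolves a named preferences node.
class AgentPrefsSketch {
    // Parent node name as given in the text above.
    static final String PARENT_NODE = "jboss-on-agent";

    /** Builds the absolute path of a named preferences node under the parent node. */
    static String nodePath(String prefName) {
        if (prefName == null || prefName.isEmpty() || prefName.contains("/")) {
            throw new IllegalArgumentException("invalid preferences node name: " + prefName);
        }
        return "/" + PARENT_NODE + "/" + prefName;
    }

    /** Reads one preference from the named node (assumed to be under the user root). */
    static String read(String prefName, String key, String defaultValue) {
        Preferences node = Preferences.userRoot().node(nodePath(prefName));
        return node.get(key, defaultValue);
    }

    public static void main(String[] args) {
        // Mirrors "-p <preferences name>"; falls back to the "default" node.
        String prefName = args.length > 0 ? args[0] : "default";
        System.out.println(nodePath(prefName));
        System.out.println(read(prefName, "rhq.agent.client.queue-size", "(not set)"));
    }
}
```

Running it with a different argument selects a different node, which is the same idea as keeping several configuration sets side by side and choosing one with -p.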

The XML file's schema is the one required by the Java Preferences API. The agent configuration schema version (not to be confused with the XML Schema) defines the version level of the configuration. This allows future versions of the agent to change the configuration preference definitions while providing a means to auto-migrate old configuration data (at the old version level) to the new configuration version level. When the agent starts, it checks whether its preferences can be upgraded and, if so, upgrades the configuration to the latest schema. This is helpful when you upgrade the agent: when the new agent starts up, it makes sure any previous configuration is carried over, with new preferences added and obsolete preferences deleted.

A default configuration file ships in the agent's jar and it is documented inline.

When a new agent that has not yet been set up starts, it automatically begins asking configuration setup questions (just as if the "setup" prompt command had been invoked). This happens once; thereafter, the agent uses the configuration preferences as set up by the user. You can set up the agent again by executing the "setup" prompt command. The setup command lets you perform the basic setup (the same one that runs when starting a newly installed agent) or, alternatively, a more advanced setup that lets you fine-tune the agent's settings.

Command Line Options

The following are the command line options you can give to the agent:

-a, --advanced If setup is needed at startup, the advanced setup is run rather than the basic setup.
-c, --config=<filename> Explicitly specifies the configuration file the agent is to use for its configuration; if the preferences node in the file is named something other than "default", you must specify that preferences node name via --pref.
-d, --daemon If specified, keyboard input will not be read by the agent. If an input file is specified via --input, it will be processed.
-D<name>[=<value>] Overrides the configuration preference of the given name with the given value. This also sets the JVM system property with the same name/value pair (which does not necessarily have to map to a valid agent configuration preference).
-h, --help Displays the help text for the command line arguments of the agent.
-i, --input=<filename> Specifies an input script containing agent prompt commands that the agent should execute upon startup; if not specified, keyboard input will be read.
-l, --cleanconfig Clears out any existing configuration and purges the agent's persisted data so the agent starts with a clean slate. The default configuration file will be loaded if --config is not specified. If you only wish to purge the agent's persisted data, without cleaning its configuration settings, use --purgedata.
-n, --nostart If specified, the agent will not be automatically started.
-o, --output=<filename> Specifies an output file that all non-log output will be written to. This does not affect the log messages; those go to the file specified in the log4j.xml configuration.
-p, --pref=<preferences name> Defines the preferences node name that the agent's configuration is known as. This allows the agent to reuse persisted preferences under this preferences name (and is how you can have multiple configuration sets defined and start the agent with one of them).
-s, --setup Forces the agent to ask basic setup questions, even if the agent has already been fully configured. If --advanced is also specified, the advanced setup questions will be asked.
-t, --nonative Forces the agent to disable the native system, even if the agent was configured for it.
-u, --purgedata Purges the agent's persistent inventory and other data files. This does not erase the agent's configuration settings; use --cleanconfig if you wish to clean the agent's configuration along with purging its persisted data.

Prompt Commands

The agent can process simple prompt commands entered either as input from a file (see the --input command line option) or as input from the keyboard (assuming --daemon is not specified as a command line argument). You can see the list of prompt commands accepted by the agent by entering "help" at the prompt. To get detailed help on a particular prompt command, enter "help" followed by the name of that command. The following is the current set of prompt commands that the agent understands:

avail Get availability of inventoried resources
config Manages the agent configuration
download Downloads a file from the RHQ Server
discovery Asks a plugin to run a server scan discovery
dumpspool Shows the entries found in the command spool file
execute Executes an external program
exit Shuts down the agent's communications services and kills the agent
getconfig Displays one, several or all agent configuration preferences
help Shows help for a given command
identify Asks to identify a remote server
inventory Provides information about the current inventory of resources
log Configures some settings for the log messages
metrics Shows the agent metrics
native Accesses native system information
pc Starts and stops the plugin container and all deployed plugins
ping Pings the RHQ Server
piql Executes a PIQL query to search for running processes
plugins Updates the agent plugins with the latest versions from the server
register Registers this agent with the RHQ Server
sender Controls the command sender to start or stop sending commands
setconfig Sets an agent configuration preference
setup Sets up the agent configuration by asking a series of questions
shutdown Shuts down all communications services without killing the agent
start Starts the agent comm services so it can accept remote requests
timer Times how long it takes to execute another prompt command
version Shows the agent version information
quit An alias for 'exit'

RHQ Server Auto-Detection and Polling

The agent can be configured to auto-detect its RHQ Server. It can do this in two different ways:

  1. Multicast detection: using JBoss/Remoting's multicast detection technology, the agent can usually detect the RHQ Server coming online or going offline within a matter of seconds (a time which is configurable). This requires your network to support multicast traffic; if it does not, then you cannot use this method of server auto-detection. The following configuration preferences affect auto-detection using the multicast detector:
    • rhq.agent.server-auto-detection must be set to true in order to enable this feature
    • rhq.communications.multicast-detector.enabled must be set to true in order to enable this feature
    • rhq.communications.multicast-detector.default-time-delay is the number of milliseconds that must pass without hearing from the RHQ Server before the RHQ Server is considered "offline". To quickly detect an RHQ Server going down or coming up, set this to a short time; to reduce the amount of network traffic, set this to a longer time. However, ensure that this value is longer than the server's heartbeat-time-delay; otherwise, unnecessary network traffic will result.
    • rhq.communications.multicast-detector.heartbeat-time-delay is the number of milliseconds that must pass between the agent's own heartbeat messages. This value must be shorter than the RHQ Server's default-time-delay; otherwise, unnecessary network traffic will result.
  2. Server polling: this mechanism polls the RHQ Server periodically to determine if it is online or offline. This method of auto-detection does not require multicast traffic but does require the agent to periodically connect to the RHQ Server and send it a ping command. The following configuration preference affects this "manual" server detection via polling:
    • rhq.agent.client.server-polling-interval-msecs is set to the number of milliseconds that must pass before polling the server. To quickly detect the RHQ Server going down or coming up, set this to a short time; to reduce the amount of network traffic, set this to a longer time. If this value is 0 or less, server polling is disabled.
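The timing rules above can be written down as a small consistency check. The following is a minimal sketch, not agent code: the class and method names are hypothetical, and it only encodes the two stated constraints (each side's default-time-delay should exceed the other side's heartbeat-time-delay) plus the rule that a polling interval of 0 or less disables polling.

```java
// Hypothetical validation helper illustrating the detector timing rules.
class DetectorTimingSketch {
    /** A listener's offline timeout should be longer than the sender's heartbeat interval. */
    static boolean compatible(long listenerDefaultTimeDelayMs, long senderHeartbeatTimeDelayMs) {
        return listenerDefaultTimeDelayMs > senderHeartbeatTimeDelayMs;
    }

    /** Checks the constraint in both directions (agent listens to server, server listens to agent). */
    static boolean pairIsConsistent(long agentDefaultDelayMs, long agentHeartbeatMs,
                                    long serverDefaultDelayMs, long serverHeartbeatMs) {
        return compatible(agentDefaultDelayMs, serverHeartbeatMs)
            && compatible(serverDefaultDelayMs, agentHeartbeatMs);
    }

    /** Server polling is disabled when the interval is 0 or less. */
    static boolean pollingEnabled(long pollingIntervalMs) {
        return pollingIntervalMs > 0;
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not the agent's defaults.
        System.out.println(pairIsConsistent(5000, 1000, 5000, 1000));
        System.out.println(pollingEnabled(0));
    }
}
```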

Typically one or both of these mechanisms are enabled. When the agent can auto-detect the RHQ Server going offline, it has the opportunity to persist commands that are waiting to be sent and to shut down its attempts to send commands. When the RHQ Server comes back up (and is auto-detected), the agent can resume. If, however, both auto-detection features are disabled, then the agent, upon startup, will immediately assume the RHQ Server is online and will allow commands to be sent. If, at some point, the RHQ Server is down, the agent will continually attempt to send it commands and receive "connection refused" errors. If the RHQ Server is down for a long period of time, this will cause the agent log file to grow very large. This is one reason why it is best to have at least one auto-detection mechanism enabled.

Throttling

The agent has several configuration preferences that define its client-side command sender - they limit how many resources it can use and how "fast" it can perform some functions (called throttling). These configuration preferences have two main purposes: 1) to help limit the amount of resources the agent is able to claim for itself and 2) to help avoid flooding the server with a large number of commands, which could put too heavy a load on the RHQ Server and/or starve other agents of the ability to communicate with the RHQ Server. The following configuration preferences define the settings that enable the agent to throttle its outbound messages. Most of these settings should be configured with the other settings in mind. While they do work independently, their effects are usually determined not by their own value but by related values. For example, queue-size should be set to a larger number if the command timeout is lengthened. This is because if commands are given more time to complete, then more commands will be in the queue waiting to be sent. But if max-concurrent is raised, more commands can be dequeued at any one time, so an increase in queue-size may not be needed. As you can see, each of these preferences sets an independent parameter within the agent, but its effect on the agent's behavior as a whole depends on the agent's other preferences.

  • rhq.agent.client.queue-size defines the maximum number of commands the agent can queue up for sending to the RHQ Server. The larger the number, the more memory the agent will be able to use up. Setting this to 0 effectively sets the queue to be unbounded. Be careful when setting this to 0; if the RHQ Server is down for a long period of time, the agent may run out of memory if it attempts to queue up more commands than it has memory for.
  • rhq.agent.client.max-concurrent is the number of messages the agent can send at any one time. The larger the number, the more messages the agent can dequeue (thus freeing up space in the queue for more messages to come in). However, the higher this number is, the more messages will get sent to the server at the same time, which may require the agent to use more CPU cycles.
  • rhq.agent.client.command-timeout-msecs defines the amount of time the agent will wait for the RHQ Server to reply with a response to a command before that command is aborted. The longer this time is, the lower the chance that the agent will abort a command that otherwise would have succeeded (e.g. if the server just needs a lot of time to process the particular command). However, the longer this time is, the more messages have to be queued up and wait before being sent to the server.
  • rhq.agent.client.retry-interval-msecs is the amount of time the agent will wait before attempting to retry a command. Only those commands that are flagged for guaranteed delivery will be retried. Non-guaranteed commands (aka volatile commands) will not be retried and thus this setting will have no effect.
  • rhq.agent.client.send-throttling, if defined, enables send throttling. When this is enabled, only a certain number of commands can be sent before the agent enters a quiet period. During the quiet period, no throttle-able commands are allowed to be sent to the server. The commands can resume after the quiet period ends. Send throttling only affects those commands configured as "throttle-able" - these are typically commands containing metric collection data (i.e. those commands that tend to be sent to the RHQ Server very frequently and in large numbers). Any other commands are not affected by the send throttle. Send throttling helps prevent message storms on the RHQ Server, thus keeping the server from being flooded with incoming messages and preventing agent starvation (that is, not locking out other agents from being able to talk to the RHQ Server). The send-throttling preference defines both the maximum number of commands that can be sent and the length of the quiet period. For example, a preference value of "50:10000" means that after 50 throttle-able commands are sent, a quiet period will commence and last for 10000 milliseconds. After that time expires, 50 more commands can be sent before the next quiet period begins.
  • rhq.agent.client.queue-throttling, if defined, enables queue throttling. This limits the number of commands that can be dequeued in a given amount of time, called the burst period. If more dequeue requests are made during the burst period than allowed, the excess requests will be blocked until the next burst period begins. For example, if this is set to "50:10000", it means that at most 50 commands can be dequeued in any 10000 millisecond interval. If, during a burst period, a 51st dequeue request is made, that request will block until the burst period finishes (at which time a new burst period begins and the request becomes the first of the next 50 allowed dequeue requests). The purpose of queue throttling is not so much to limit the number of requests being sent to the server (although it does have that side effect); it is really to keep the agent from spinning the CPU too much as it attempts to dequeue and send commands as fast as it can. If an agent is using too many CPU cycles, you can throttle the queue, thus (hopefully) reducing the amount of CPU required for the agent to send its commands. Note that if you enable queue throttling, you must take care to ensure your queue-size is large enough (since you are limiting the number of commands that can be dequeued in a specific amount of time, you need to make sure you have enough space in the queue to hold the extra commands that get queued up).
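To make the "50:10000" send-throttling example concrete, here is a minimal sketch of how such a value might be parsed and applied. It is not the agent's implementation: the class name and methods are hypothetical, time is passed in explicitly so the behavior is easy to follow, and it assumes the quiet period starts as soon as the last allowed command has been sent.

```java
// Hypothetical model of send throttling: N throttle-able commands, then a quiet period.
class SendThrottleSketch {
    final int maxCommands;     // commands allowed before a quiet period starts
    final long quietPeriodMs;  // length of the quiet period in milliseconds
    private int sentSinceQuiet = 0;
    private long quietUntilMs = 0; // assumes non-negative timestamps

    SendThrottleSketch(int maxCommands, long quietPeriodMs) {
        this.maxCommands = maxCommands;
        this.quietPeriodMs = quietPeriodMs;
    }

    /** Parses a "max-commands:quiet-period-msecs" value such as "50:10000". */
    static SendThrottleSketch parse(String value) {
        String[] parts = value.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <max>:<msecs> but got: " + value);
        }
        int max = Integer.parseInt(parts[0].trim());
        long ms = Long.parseLong(parts[1].trim());
        if (max <= 0 || ms <= 0) {
            throw new IllegalArgumentException("both numbers must be positive: " + value);
        }
        return new SendThrottleSketch(max, ms);
    }

    /** Returns true if a throttle-able command may be sent at time nowMs. */
    boolean trySend(long nowMs) {
        if (nowMs < quietUntilMs) {
            return false; // still inside the quiet period
        }
        sentSinceQuiet++;
        if (sentSinceQuiet >= maxCommands) {
            // Limit reached: start the quiet period and reset the counter.
            sentSinceQuiet = 0;
            quietUntilMs = nowMs + quietPeriodMs;
        }
        return true;
    }

    public static void main(String[] args) {
        SendThrottleSketch throttle = parse("50:10000");
        System.out.println(throttle.maxCommands + " commands, then quiet for "
            + throttle.quietPeriodMs + " ms");
    }
}
```

Commands that are not throttle-able would simply bypass trySend. Queue throttling uses the same value format but counts dequeue requests per burst period and blocks the excess requests instead of rejecting them.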

Guaranteed Delivery

Some commands that the agent sends to the RHQ Server are not critical in the grand scheme of things. For example, if a ping request fails to make it to the RHQ Server, we don't want to retry it nor do we want to persist that command to ensure it survives an agent shutdown. These commands are called volatile commands. Volatile commands are sent once - if they fail for whatever reason to be successfully processed by the RHQ Server, the failure is logged and the agent drops the command and moves on to the next that it needs to send.

However, there are some commands that must make their way to the RHQ Server, and the agent must ensure the RHQ Server processes them. The agent must guarantee that these commands are delivered - hence these are called guaranteed commands.

While the agent will do its best to guarantee the delivery of guaranteed commands, this guarantee is not 100%. That is to say, rare circumstances may arise that cause a guaranteed command to fail to get delivered (e.g. if the JVM crashes suddenly while in the middle of attempting to send a guaranteed command).

Guaranteed commands are retried every X milliseconds while the agent is alive and actively sending commands to the server (where X is the rhq.agent.client.retry-interval-msecs preference setting). Guaranteed commands also survive agent shutdowns. If an agent shuts down prior to being able to deliver a guaranteed command, that command is persisted to disk in what is called the command spool file. The next time the agent starts up, it will load up commands it has spooled to disk and immediately queue them for sending to the RHQ Server.
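The difference between volatile and guaranteed commands can be sketched as follows. This is a simplified model under stated assumptions, not the agent's sender: the class, the command names, and the transport callback are all hypothetical, and the retry interval and actual disk persistence are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: failed volatile commands are dropped, failed guaranteed ones are kept.
class DeliverySketch {
    static final class Command {
        final String name;
        final boolean guaranteed;

        Command(String name, boolean guaranteed) {
            this.name = name;
            this.guaranteed = guaranteed;
        }
    }

    private final Deque<Command> queue = new ArrayDeque<>();

    void enqueue(Command command) {
        queue.addLast(command);
    }

    /** Tries every queued command once; the transport returns true on success. */
    void sendPass(Predicate<Command> transport) {
        int pending = queue.size();
        for (int i = 0; i < pending; i++) {
            Command command = queue.removeFirst();
            boolean delivered = transport.test(command);
            if (!delivered && command.guaranteed) {
                queue.addLast(command); // kept for a later retry
            }
            // A failed volatile command is simply dropped here (the agent would log it).
        }
    }

    /** On shutdown, whatever is still queued is what would go to the command spool file. */
    List<Command> drainForSpool() {
        List<Command> remaining = new ArrayList<>(queue);
        queue.clear();
        return remaining;
    }

    public static void main(String[] args) {
        DeliverySketch sender = new DeliverySketch();
        sender.enqueue(new Command("ping", false));            // volatile
        sender.enqueue(new Command("inventory-report", true)); // guaranteed
        sender.sendPass(command -> false);                     // simulate the server being down
        for (Command command : sender.drainForSpool()) {
            System.out.println("would spool: " + command.name);
        }
    }
}
```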

There are a couple of preferences that define the behavior of this command spool file:

  • rhq.agent.client.command-spool-file.params defines the parameters for the spool file. The value's format is defined as "max-file-size:purge-percentage". The first number is the size, in bytes, of the maximum file size threshold. If the spool file grows larger than this, a "purge" will be triggered in order to shrink the file. The second number is the purge percentage which indicates how large the file is allowed to be after a purge. This is specified as a percentage of the first parameter - the max file size threshold. For example, if the max file size is 100000 (i.e. 100KB) and the purge percentage is 90, then when the spool file grows larger than 100KB, a purge will be triggered and the file will be compressed to no more than 90% of 100KB - which is 90KB. In effect, 10KB will be freed to allow room for new commands to be spooled. When this occurs, unused space is compressed first and if that doesn't free up enough space, the oldest commands in the spool file will be sacrificed in order to make room for the newer commands.
  • rhq.agent.client.command-spool-file.compressed is a true or false flag. If this flag is true, the commands stored in the spool file will be compressed. This can potentially save about 30%-40% in disk space (give or take); however, it slows down the persistence mechanism considerably. The recommended setting is false unless something on the agent's deployment box warrants saving disk space over persistence performance. The performance hit will only appear when unusual conditions occur, such as shutting down while some guaranteed commands have not been sent yet or when the RHQ Server is down. It will not affect the agent under normal conditions (while running with the RHQ Server up and successfully communicating with the agent). In those unusual/rare conditions, performance degradation may not be as important.
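The arithmetic behind the "max-file-size:purge-percentage" value can be checked with a short sketch. The class below is hypothetical and only reproduces the calculation described above; the accepted range for the percentage (0 to 100) is an assumption, not something the agent is known to enforce.

```java
// Hypothetical parser for "max-file-size:purge-percentage", e.g. "100000:90".
class SpoolParamsSketch {
    final long maxFileSizeBytes;
    final int purgePercentage;

    SpoolParamsSketch(long maxFileSizeBytes, int purgePercentage) {
        this.maxFileSizeBytes = maxFileSizeBytes;
        this.purgePercentage = purgePercentage;
    }

    static SpoolParamsSketch parse(String value) {
        String[] parts = value.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <bytes>:<percent> but got: " + value);
        }
        long size = Long.parseLong(parts[0].trim());
        int percent = Integer.parseInt(parts[1].trim());
        // Assumed limits for illustration only.
        if (size <= 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("values out of range: " + value);
        }
        return new SpoolParamsSketch(size, percent);
    }

    /** A purge is triggered once the file grows past the maximum size threshold. */
    boolean purgeNeeded(long currentFileSizeBytes) {
        return currentFileSizeBytes > maxFileSizeBytes;
    }

    /** Largest size the file may have after a purge. */
    long sizeAfterPurgeBytes() {
        return maxFileSizeBytes * purgePercentage / 100;
    }

    /** Space that a purge makes available for new commands. */
    long bytesFreedByPurge() {
        return maxFileSizeBytes - sizeAfterPurgeBytes();
    }

    public static void main(String[] args) {
        SpoolParamsSketch params = parse("100000:90");
        System.out.println("after purge: " + params.sizeAfterPurgeBytes() + " bytes");
        System.out.println("freed: " + params.bytesFreedByPurge() + " bytes");
    }
}
```

With "100000:90" this gives a post-purge limit of 90000 bytes and 10000 bytes freed, matching the 90KB and 10KB figures in the example above.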

Transports

Both the RHQ Agent and RHQ Server use the same underlying communications services (based on JBoss/Remoting technology). One feature this enables is the ability for the communications layer to use different transports simply by changing configuration preferences. The following configuration preferences define the transports used by the agent:

  • rhq.agent.server.transport defines the transport protocol that the agent will use to talk to the RHQ Server
  • rhq.communications.connector.transport defines the transport that the agent, itself, expects the RHQ Server to use when the server wants to send messages to the agent.

The supported transports are those supported by JBoss/Remoting, which today include: socket (raw and unencrypted socket based transport), sslsocket (encrypted and optionally authenticated SSL transport), servlet, sslservlet, http, and https.

In addition to customizing the transport, you can also provide transport parameters that help define the behavior of the connection using the configured transport.

  • rhq.agent.server.transport-params defines the transport parameters used when connecting to the RHQ Server
  • rhq.communications.connector.transport-params defines what transport parameters the RHQ Server should use when sending messages to the agent

See the JBoss/Remoting documentation for more information on configuring these transport parameters.

Secure Communications - Encryption and Authentication

The communications services used by the RHQ Server and RHQ Agent can secure the network traffic between the two by using SSL in order to encrypt and optionally authenticate the traffic. By simply using a transport that uses SSL, you automatically get encryption. Each RHQ Server and RHQ Agent can be optionally configured with a keystore and/or a truststore. You can configure the RHQ Server to authenticate RHQ Agents, RHQ Agents to authenticate the RHQ Server or both. By setting up the proper certificates in the proper keystores/truststores, you can set up a fully secured network of RHQ Servers and RHQ Agents. You can even define what encryption protocols you want to use to encrypt the network traffic and what algorithms you want to use within your keystores/truststores.

There are two configuration preferences that are important to consider:

  • rhq.communications.connector.security.client-auth-mode defines whether or not the agent's server-side components must authenticate incoming requests (that is, authenticate the RHQ Server's certificate). The client-auth-mode can be set to one of three values:
    1. none means the agent will not attempt to authenticate the RHQ Server's certificate during the SSL handshake. In this case, the agent won't even need a truststore file defined.
    2. want means that only if the RHQ Server sends a certificate will it be authenticated. If the RHQ Server does not have a certificate (thus doesn't provide one during the SSL handshake), this anonymous connection will be accepted by the agent. The agent must have a truststore file containing all its trusted certificates (which must include the RHQ Server's public certificate).
    3. need means that the agent must authenticate the RHQ Server's certificate in order for the incoming requests to be accepted. If the RHQ Server provides an untrusted certificate or if it provides no certificate at all during the SSL handshake, the agent will deny the connection request and not accept any data from that connection. The agent must have a truststore file containing all its trusted certificates (which must include the RHQ Server's public certificate).
  • rhq.agent.client.security.server-auth-mode-enabled defines whether or not the agent's client-side sender components must authenticate the RHQ Server's certificate when it sends outbound requests to the RHQ Server. When the agent initiates communication with the RHQ Server (i.e. when the agent wants to send a command to the server), it must first engage in the SSL handshake, at which time both the agent and server swap certificates. If server-auth-mode-enabled is true, the agent must authenticate/trust the RHQ Server's certificate; otherwise, the agent will refuse to send its command to the server. If this mode is false, the server's certificate is ignored and the agent sends its command regardless of the server's trustworthiness. When this mode is enabled, the agent must have a truststore file containing all its trusted certificates (which must include the RHQ Server's public certificate).
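The two preferences above amount to a small decision table, sketched below. This is an illustration only: the enum and method names are hypothetical, no real SSL handshake is performed, and the handling of an untrusted certificate under "want" (rejected) is inferred from the description rather than stated outright.

```java
// Hypothetical decision table for the two authentication preferences.
class AuthModeSketch {
    enum ClientAuthMode { NONE, WANT, NEED }

    enum PeerCertificate { ABSENT, TRUSTED, UNTRUSTED }

    /** Incoming side: should the agent accept a connection from the RHQ Server? */
    static boolean acceptIncoming(ClientAuthMode mode, PeerCertificate serverCert) {
        switch (mode) {
            case NONE:
                return true; // no authentication is attempted
            case WANT:
                // Anonymous connections are accepted; a presented certificate must be trusted.
                return serverCert != PeerCertificate.UNTRUSTED;
            case NEED:
                return serverCert == PeerCertificate.TRUSTED;
            default:
                throw new AssertionError("unknown mode: " + mode);
        }
    }

    /** Outgoing side: may the agent send its command to the RHQ Server? */
    static boolean allowOutgoing(boolean serverAuthModeEnabled, boolean serverCertTrusted) {
        return !serverAuthModeEnabled || serverCertTrusted;
    }

    public static void main(String[] args) {
        for (ClientAuthMode mode : ClientAuthMode.values()) {
            for (PeerCertificate cert : PeerCertificate.values()) {
                System.out.println(mode + " + " + cert + " -> " + acceptIncoming(mode, cert));
            }
        }
    }
}
```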

If the agent does not yet have a keystore containing its certificate, it will create and self-sign its own. The self-generated keystore file will, by default, be stored in the agent's data directory under the filename "keystore.dat". A keystore file is required if the agent is to engage in any SSL-based transport. If you wish, you can create and assign the agent your own custom certificate stored in your own keystore file. Simply create the keystore file, put it somewhere on the local file system where the agent can access it, and define your keystore configuration preferences accordingly. The same holds true for the truststore files, except that the agent will not create any truststore files. If you wish to enable either client-auth or server-auth, you must provide the trusted certificates to the agent by putting the truststore files where the agent can access them and defining the appropriate truststore configuration preferences.

Native Code with Java Fallback

The RHQ Agent still loads native code to help it with things that only a low-level native layer can perform (such as examining the operating system's process table). However, if the native libraries are not available for your hardware/operating system, the agent will still run and be supported. Your agent will lose some capabilities (such as being able to auto-discover resources via process table scanning), but it will still run and function. In addition, you can manually disable the native layer: if the native libraries exist but for some reason are not working properly, you can disable the native layer to get the agent working again (albeit with the reduced set of capabilities).

Embedded RHQ Agent

The RHQ Agent now has the ability to run embedded inside the RHQ Server. You are not required to run a separate, standalone RHQ Agent on the RHQ Server machine if you want to manage things on that RHQ Server box. You can still run a standalone RHQ Agent if you wish, but you now have the option to embed it instead.
