|
RHQ currently has no notion of dependencies between resources except for the implicit parent-child relation of the resource tree. Even there, no real dependency is expressed: for example, when the parent becomes unavailable, the unavailability of the children is not explicitly reported (i.e. it is hidden). The classic example application is often depicted like this:
We have an Application consisting of a load balancer LB, two App servers and a database. RHQ currently has no notion of the Application itself (which here is just a mixed group) and is thus not able to just tell
The situation gets worse when you add more resources to the mix, such as a JMS broker,
or even a compute grid, or more applications that all have dependencies in the form of a directed graph.
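To make the directed-graph idea more concrete, the following is a minimal sketch of how the state of an application could be derived from the availability of everything it depends on. The class `AvailabilityRollupSketch`, the method `isEffectivelyUp` and the resource names are hypothetical and are not part of RHQ; the graph is assumed to be given as a simple map from a resource name to the names it depends on.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the RHQ API.
final class AvailabilityRollupSketch {

    // Returns true only if 'root' and everything it transitively depends on
    // is up. 'dependsOn' maps a resource name to its direct dependencies;
    // 'down' holds the names of resources currently reported as unavailable.
    static boolean isEffectivelyUp(String root,
                                   Map<String, List<String>> dependsOn,
                                   Set<String> down) {
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String name = todo.pop();
            if (!seen.add(name)) {
                continue; // already checked; also guards against cycles
            }
            if (down.contains(name)) {
                return false; // one unavailable dependency marks the root as down
            }
            for (String dep : dependsOn.getOrDefault(name, List.of())) {
                todo.push(dep);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative graph: an application behind a load balancer,
        // running on two app servers that share a database.
        Map<String, List<String>> graph = Map.of(
            "Application", List.of("LB", "AS1", "AS2"),
            "AS1", List.of("DB"),
            "AS2", List.of("DB"));
        System.out.println(isEffectivelyUp("Application", graph, Set.of()));
        System.out.println(isEffectivelyUp("Application", graph, Set.of("DB")));
    }
}
```

This sketch treats every dependency as mandatory. A real implementation would probably need a notion of degraded state as well, for example when only one of two app servers is down.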
Start/stop dependencies
I was recently on a call with a customer; the following comes from an email about that call. The customer has a strong demand to be able to start and stop apps together with their dependencies. The following graphic illustrates this:
Before the application(s) on the AS can start, the compute grid must be started, as well as the DB and the load balancer. The compute grid and the AS also depend on an MQ broker. For app3 to start, the grid and MQ need to be started first.
RHQ actually already provides the biggest piece of this puzzle: triggering start and stop operations on the managed resources. We got the idea that they could implement the triggering logic as a server plugin that reads an external file holding the dependency graph and then triggers the start and stop operations in turn. As server plugins have "controls", there are even "buttons" in the UI to trigger those start and stop "meta-operations".
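The core of such triggering logic is working out the order in which to start things. A minimal sketch, assuming the dependency graph has already been read from the external file into a map, could use a depth-first walk that emits each resource only after all of its dependencies. The class `StartOrderSketch`, the method `startOrder` and the graph contents are hypothetical; the RHQ server plugin API itself is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the RHQ API: computes the order in which
// resources must be started so that every dependency comes up first.
final class StartOrderSketch {

    // 'dependsOn' maps a resource name to the names that must be running
    // before it can start.
    static List<String> startOrder(String target, Map<String, List<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        visit(target, dependsOn, done, new LinkedHashSet<>());
        return new ArrayList<>(done);
    }

    // Depth-first post-order walk: a resource is added to 'done' only after
    // all of its dependencies. 'inProgress' detects cycles in the graph.
    private static void visit(String name, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        for (String dep : dependsOn.getOrDefault(name, List.of())) {
            visit(dep, dependsOn, done, inProgress);
        }
        inProgress.remove(name);
        done.add(name);
    }

    public static void main(String[] args) {
        // Illustrative graph loosely following the description above.
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("App1", List.of("AS", "Grid", "DB", "LB"));
        graph.put("AS", List.of("MQ"));
        graph.put("Grid", List.of("MQ"));
        System.out.println(startOrder("App1", graph));
        // prints [MQ, AS, Grid, DB, LB, App1]
    }
}
```

Reversing the list gives a candidate stop order for the same application. Note that a shared dependency such as the MQ broker may still be needed by other applications, so a real "stop with dependencies" would have to check for that before stopping it.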
So we would need a way to associate such a server-plugin operation with an actual resource, so that the operator could go to App4's operations tab and issue a "start with dependencies" operation. That operation goes to the server plugin, which then does the work of scheduling all the needed start operations on plugin level (in turn, waiting for the result of each). There are several options for representing the metadata for this.
In any case, the name of the resource from which the server operation is triggered needs to be passed to the server plugin, so that it can go to the right entry in the table and do its work.

Relation to other subsystems

Alerting
Alerting needs to understand the concept of an Application, so that the user only needs to set up an alert at the application level, and e.g. unavailability at the level of the dependencies bubbles up to the level of the application.

UI changes

Application-centric dashboard
Classifying a set of resources now makes it possible to create dashboards that aggregate system state, e.g. showing only a traffic light for the state of the whole application. If the user needs more detail, they can drill down into the application.

Graphical application-centric views
We need to provide graphical views like the ones at the top, where the user can see the dependencies and alert state, and can easily see relevant metrics via context / hover.

Some notes

About relationship service
A design for a Relationship Service already exists that could be used to implement the graph here. On its own this is not enough, though: just following those links does not suffice, as you also need a driver on top to trigger all the operations.

About generic lifecycle API (from Ian)
We should also consider adding a first-class concept of managed resource lifecycle management. The new plugin API could look something like:

    interface LifecycleFacet {
        boolean startManagedResource();
        boolean stopManagedResource();
    }

If we had this, along with defined relationships, either the Server or the plugin container could orchestrate starting a managed Resource and all its dependency Resources in the correct order.
I think this would allow us to provide generic support for dependency-aware Resource lifecycle management, except for the case where the user wants to define a set of Resources as runtime dependencies of their "application", where that application is not represented as an RHQ Resource. That case could require a server plugin and/or even further additions to the domain model. |
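As a rough illustration of the orchestration idea, the sketch below drives the proposed `LifecycleFacet` over an already computed start order. Only the interface itself comes from the proposal above. The class `LifecycleOrchestratorSketch`, the method `startAll`, the lookup of facets by resource name, and the rollback on failure are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Interface as proposed in the notes above.
interface LifecycleFacet {
    boolean startManagedResource();
    boolean stopManagedResource();
}

// Hypothetical orchestrator, not part of the RHQ API.
final class LifecycleOrchestratorSketch {

    // Starts the resources in the given order (dependencies first). If one
    // start fails or no facet is known for a name, the resources started so
    // far are stopped again in reverse order and false is returned.
    // The rollback is an assumption of this sketch, not part of the proposal.
    static boolean startAll(List<String> order, Map<String, LifecycleFacet> facets) {
        List<String> started = new ArrayList<>();
        for (String name : order) {
            LifecycleFacet facet = facets.get(name);
            if (facet == null || !facet.startManagedResource()) {
                for (int i = started.size() - 1; i >= 0; i--) {
                    facets.get(started.get(i)).stopManagedResource();
                }
                return false;
            }
            started.add(name);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, LifecycleFacet> facets = new LinkedHashMap<>();
        for (String name : List.of("MQ", "DB", "App1")) {
            facets.put(name, new LifecycleFacet() {
                public boolean startManagedResource() {
                    System.out.println("start " + name);
                    return true;
                }
                public boolean stopManagedResource() {
                    System.out.println("stop " + name);
                    return true;
                }
            });
        }
        System.out.println(startAll(List.of("MQ", "DB", "App1"), facets));
    }
}
```

The sketch does not distinguish resources that were already running before the call, so a real orchestrator would need to avoid stopping those during rollback.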



