Introduction
Tracker Bug: BZ 549852
Disabling Plugins
RHQ already supports disabling plugins. The work for this was done under BZ 535894. Disabling a plugin immediately marks the plugin as disabled in the database, but nothing changes in the agent's plugin container until the plugin container is restarted. Once the plugin container is restarted, discovery of types in the disabled plugin stops. Disabling a plugin is not an adequate solution because the metadata from the plugin remains in the system. This would prevent rolling back to a previous version of the plugin and could still result in upgrade problems. Disabling also falls short because resources of the disabled types that are already in inventory remain in inventory. When a plugin is deleted, all of its types and all instances of those types must be removed from the system.
Plugin Dependencies
We cannot delete a plugin P1 that other plugins, P2 and P3, depend on without also deleting P2 and P3. Let's consider a few examples.
First, suppose we attempt to delete the apache plugin. This should proceed without error because no other plugins depend on the apache plugin.
Now suppose we have the platform, jmx, hibernate, and jboss-as plugins in the system, and we try to delete the jmx plugin. This should not be allowed without also deleting the hibernate and jboss-as plugins, since they both depend on the jmx plugin.
Next, suppose we want to delete the jboss-as plugin. The rhq-server plugin depends on it, so we would have to delete both.
As a last example, consider the hibernate plugin, which has an optional dependency on the jboss-as plugin. When both plugins are deployed, we create types from the hibernate plugin whose parent types come from the jboss-as plugin. If we want to delete the jboss-as plugin, we will need to either delete those hibernate types or modify their hierarchy.
What Needs to be Deleted?
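The dependency rules above amount to a reverse-dependency closure: deleting a plugin means deleting it plus every plugin that transitively has a required dependency on it, while a surviving plugin with only an optional dependency on a deleted plugin keeps existing but has types that must be deleted or re-parented. The sketch below illustrates that computation. The class PluginDeletionPlanner and its methods are hypothetical and not part of RHQ; the dependency data in main() is illustrative and encodes only the relationships named in the examples. Note that with rhq-server also deployed, deleting jmx pulls in rhq-server transitively through jboss-as.

```java
import java.util.*;

/** Hypothetical helper (not an RHQ API) for computing which plugins a deletion affects. */
public class PluginDeletionPlanner {

    // plugin name -> plugins it has a required (hard) dependency on
    private final Map<String, Set<String>> required = new HashMap<>();
    // plugin name -> plugins it has an optional dependency on
    private final Map<String, Set<String>> optional = new HashMap<>();

    public void addPlugin(String name) {
        required.putIfAbsent(name, new HashSet<>());
        optional.putIfAbsent(name, new HashSet<>());
    }

    public void addRequired(String plugin, String dependency) {
        addPlugin(plugin);
        addPlugin(dependency);
        required.get(plugin).add(dependency);
    }

    public void addOptional(String plugin, String dependency) {
        addPlugin(plugin);
        addPlugin(dependency);
        optional.get(plugin).add(dependency);
    }

    /** The target plugin plus every plugin that transitively requires it. */
    public Set<String> pluginsToDelete(String target) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(target);
        while (!work.isEmpty()) {
            String current = work.pop();
            if (!result.add(current)) {
                continue; // already visited
            }
            // Queue every plugin whose required dependencies include 'current'.
            for (Map.Entry<String, Set<String>> e : required.entrySet()) {
                if (e.getValue().contains(current)) {
                    work.push(e.getKey());
                }
            }
        }
        return result;
    }

    /** Surviving plugins that optionally depend on a deleted plugin (types to delete or re-parent). */
    public Set<String> pluginsWithOrphanedTypes(Set<String> deleted) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : optional.entrySet()) {
            if (!deleted.contains(e.getKey()) && !Collections.disjoint(e.getValue(), deleted)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PluginDeletionPlanner planner = new PluginDeletionPlanner();
        planner.addPlugin("platform");
        planner.addPlugin("apache");
        planner.addRequired("hibernate", "jmx");
        planner.addRequired("jboss-as", "jmx");
        planner.addRequired("rhq-server", "jboss-as");
        planner.addOptional("hibernate", "jboss-as");

        System.out.println(planner.pluginsToDelete("apache"));       // prints [apache]
        Set<String> jbossAs = planner.pluginsToDelete("jboss-as");
        System.out.println(jbossAs);                                  // prints [jboss-as, rhq-server]
        System.out.println(planner.pluginsWithOrphanedTypes(jbossAs)); // prints [hibernate]
        System.out.println(planner.pluginsToDelete("jmx"));          // prints [hibernate, jboss-as, jmx, rhq-server]
    }
}
```

Scanning every plugin's required set on each step keeps the sketch short; a real implementation would more likely precompute a reverse (dependents) index once.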
Purging the Plugin Container
Not only do we want to delete the plugin jar file on the agent, but we also want to free up resources associated with the plugin. We want to stop running discovery components, purge the local inventory of resources and resource types, and discard classloaders that are no longer needed. Fortunately, all of this already happens when the PC is started or restarted, so we need a way for the PC to tell the agent that the PC needs to be rebooted. Because the PC cannot directly use any agent APIs (the PC does not always run inside the agent), we need to add some indirection. We can create a new PC listener type, RebootRequestListener, that the agent can implement. When the PC needs to be rebooted, it notifies the listener, which turns control over to the agent. The agent can then reboot the PC.
Dealing with Discovery and Inventory Reports
There is a race condition of sorts that could occur when we start the deletion process. While the deletion is underway, an agent could send up an inventory report to the server containing resources from the plugin being deleted. Those resources could get committed into inventory while their resource types are being deleted, which could put the inventory into an inconsistent state. We need to effectively turn off discovery of all resource types from the plugins being deleted. This is discussed and covered to a large degree in BZ 535289. That work made it possible to turn off discovery components on a per-agent basis when a plugin component is misbehaving in some way. Ultimately, we do not want to merge inventory reports that contain resources that have already been deleted or that are of types in the process of being deleted. The server needs to check reports and throw an exception if they contain stale data. The exception is propagated back to the PC, informing it that it has stale types/plugins and needs to be restarted.
In Band Work
Resources will be uninventoried.
Resource types will be marked for deletion.
Out of Band Work
We already have the async resource delete scheduled job (see the class AsyncResourceDeleteJob) that handles removing resources and their associated data from inventory. We will delete resource types out of band. Deleting resource types could very well be an expensive, time-consuming operation, so it should be done asynchronously as a scheduled job. There are a couple of important preconditions that must be in place. First, discovery and importing of resources of the types to be deleted must have been turned off. Second, the async resource delete job will remove all of the resources along with all of their associated instance data. With these preconditions in place, our job can run independently of the resource deletion job and perform the resource type deletion. When the job runs, it will first check that all resources of each type to be deleted have been deleted. If resources of a type are still in the system, the job skips over that resource type. This work also needs to take the resource type hierarchy into account: we cannot remove some type A if some type B depends on A, so B must be deleted before A. Each time the job executes, it has to construct in memory the dependency graph of all resource types to be deleted and delete them in an order that respects those dependencies.
Tasks
Note: estimates are in hours.