Condor configuration management framework
Overview
Condor is an extremely flexible system, with configuration parameters that regulate every subtlety of its behavior. This flexibility is its real
strength, but it also adds to the complexity of deploying Condor and administering a pool. There are currently more than 300 parameters that
control various aspects of the system. Most of them are rarely changed after the initial deployment, but some, e.g. policy settings, have to be
altered on a regular basis.
Condor's configuration mechanism poses additional management problems. All Condor configuration parameters are specified in a special
configuration file or a set of files. If a shared file system is not available to all Condor hosts, multiple copies of the configuration files
have to be maintained. This, coupled with the complexity of the configuration files, makes Condor pool administration quite difficult.
This project attempts to simplify pool management:
- It allows the pool's resources to be organized into "configuration groups", i.e. groups of
resources that share some configuration settings. Each configuration group is associated with a set of parameters, which are automatically
inherited by all its members. A group can be open (anyone may join) or closed (joining requires permission from the group administrator).
- It hides the complexity of the configuration files by allowing administrators to build macros, or "Configuration Elements", each of which
ultimately changes the value of a single parameter or of a set of parameters.
- Configuration changes are regularly propagated to the hosts in the pool. Hosts that are down receive their configuration when they
come back up.
- A centralized repository, containing all the metadata on group membership as well as the set of parameters associated with each group, is
stored in a transactional database.
- A GUI is provided for easy configuration of the parameters and groups.
For advanced users, the framework can build a group hierarchy graph of any required topology, not only a tree. For instance, a
host can be a member of group B and group C at the same time. Its configuration is then the union of the two sets of configuration elements defined for these groups. If, in
addition, B and C are descendants of group A, the host gets a union of the parameters of all three groups.
The challenging part is calculating the effective parameters of a host, which are also influenced by the configuration defined locally on that host.
This is resolved by specifying, for each configuration element, a policy that defines its conflict resolution scheme (override, concatenate, AND, etc.).
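
The merge described above can be sketched roughly as follows. This is only an illustration, not the project's code: the class and method names, the parameter values, the ancestors-first ordering, the rule that local host settings are applied last, and the exact string form of the concatenate and AND results are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: merging group parameters using per-parameter conflict policies. */
class EffectiveConfig {

    /** Conflict resolution schemes named in the text. */
    enum Policy { OVERRIDE, CONCATENATE, AND }

    /** A group in the hierarchy graph; a group may have several parents. */
    static final class Group {
        final String name;
        final List<Group> parents = new ArrayList<>();
        final Map<String, String> params = new LinkedHashMap<>();
        Group(String name) { this.name = name; }
    }

    /** Combine the value collected so far with an incoming one (assumed semantics). */
    static String combine(Policy policy, String current, String incoming) {
        if (current == null) return incoming;
        switch (policy) {
            case CONCATENATE: return current + ", " + incoming;
            case AND:         return "(" + current + ") && (" + incoming + ")";
            default:          return incoming; // OVERRIDE: the later value wins
        }
    }

    /** List ancestors before descendants, visiting each group once even if it is shared. */
    static void collect(Group g, Set<Group> visited, List<Group> order) {
        if (!visited.add(g)) return;
        for (Group p : g.parents) collect(p, visited, order);
        order.add(g);
    }

    /** Fold one parameter set into the result, honoring each parameter's policy. */
    static void apply(Map<String, String> source, Map<String, Policy> policies,
                      Map<String, String> result) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            Policy p = policies.getOrDefault(e.getKey(), Policy.OVERRIDE);
            result.put(e.getKey(), combine(p, result.get(e.getKey()), e.getValue()));
        }
    }

    /** Effective parameters of a host: all group parameters first, local settings last. */
    static Map<String, String> effective(List<Group> memberships,
                                         Map<String, String> local,
                                         Map<String, Policy> policies) {
        Set<Group> visited = new LinkedHashSet<>();
        List<Group> order = new ArrayList<>();
        for (Group g : memberships) collect(g, visited, order);

        Map<String, String> result = new LinkedHashMap<>();
        for (Group g : order) apply(g.params, policies, result);
        apply(local, policies, result);
        return result;
    }
}
```

With a host that belongs to both B and C, where both descend from A, the shared ancestor A contributes its parameters only once; parameters without an explicit policy fall back to override in this sketch.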
In addition to making life easier for the Condor pool administrator, we would like to encourage resource donors to deploy Condor without having to configure
it. The only thing they have to specify is which configuration group they want to join; their hosts are then reconfigured automatically according to the
group configuration.
Solution design
Notably, our configuration framework does not replace the native Condor file-based configuration mechanism, but uses Condor tools to perform
configuration changes. In particular, we require an additional agent to run on each Condor host. The agent periodically queries the central configuration
repository, receives the updates, translates them to the "condor_config_val" format and invokes that command locally, followed by "condor_reconfig".
The framework is implemented in pure Java with RMI-based communications. A more detailed design will be available soon.
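
As a rough illustration of the translation step performed by the agent, the sketch below turns a set of changed parameters into local command lines. The class and method names are hypothetical, the RMI query to the repository is omitted, and the use of the "-set" option of condor_config_val is an assumption about how the updates are applied, not a description of the project's agent.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the agent's translation step; not the project's code. */
class ConfigAgentSketch {

    /**
     * Build one condor_config_val invocation per changed parameter,
     * followed by a single condor_reconfig. Assumes "-set" is the option used.
     */
    static List<List<String>> buildCommands(Map<String, String> updates) {
        List<List<String>> commands = new ArrayList<>();
        for (Map.Entry<String, String> e : updates.entrySet()) {
            commands.add(Arrays.asList(
                    "condor_config_val", "-set", e.getKey() + " = " + e.getValue()));
        }
        if (!commands.isEmpty()) {
            // Reconfigure only when something actually changed.
            commands.add(Arrays.asList("condor_reconfig"));
        }
        return commands;
    }

    /** Run one command on the local host and return its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Keeping command construction separate from execution lets the translation be checked on a machine without Condor installed; an agent loop would call run on each command in order and stop on the first non-zero exit code.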
Another part of the project, which implements the "configuration element" abstraction, uses the ClassAd
library. The design and examples are available here.
Status
We have already implemented a first prototype. Its goal was restricted to providing an easy way to assign hosts to groups and to specify these groups
when submitting a job. This provides the basic ability to split resources. It can be downloaded from the download section, together with detailed
installation instructions.
Currently two parts of the project are being implemented independently: 1) the group management and configuration propagation infrastructure, and 2)
configuration elements. The first part is very close to completion and will soon enter the testing phase. When ready, it will be capable of creating a group
graph and will use a simple configuration element with override/concatenate conflict resolution.
The second part is fully implemented, but requires more testing and is still unstable. In the next stage it will be integrated with the
first part to form a complete solution.
Download
Group maintenance prototype
Configuration elements prototype
Group management facility - to be available soon
Contact
Mark Silberstein: marks-at-cs.technion.ac.il
Students
2002-2003: Prototype and ideas
Dmitry Kravkov, Alex Soukhman
2003-2004: Configuration elements prototype
Lena Lempert, Yarden Dar
2004: Group management and configuration propagation
Tomer Shiran, Ari Shotland, Nir Zepkowitz
Supervisors
Mark Silberstein, Gabi Kliot
Assaf Schuster