Overview and structure of the HAD and replication testing system
----------------------------------------------------------------
The HAD automatic testing system is a Perl script that is intended to run on
any Unix-like machine. It is submitted to the NMI build and test system as a
nightly test. The script is called 'job_had_basic.run'. There are several
auxiliary files:
* job_had_basic.cfg - defines the configuration of the testing system (the
  checkers to run, the overall running time, submission parameters etc.) for
  the test of the HAD-only features
* job_replication_basic.cfg - defines the configuration of the testing system
  (the checkers to run, the overall running time, submission parameters etc.)
  for the test of the replication features of HAD daemons
* x_param.had - defines the configuration of the Condor instances that are to
  be raised on the execution machine
* job_had_basic.localpostsrc - parameter file containing HAD configuration
  variables, used by the 'src/condor_scripts/CondorPersonal.pm' script as an
  addendum to the default condor_config file when creating a Condor instance;
  appears as the value of the 'localpostsrc' parameter inside 'x_param.had'
* x_hadutilities.pm - various utilities used by the main script

How it works
------------
The main script raises several instances of Condor on the same execution
machine. The instances are distinguished by the ports on which their daemons
listen. The Condor instances are created inside a directory whose name equals
the pid of the main script process, in the following way: the 'X' instance is
created inside the '/_instance' directory, which contains the 'execute',
'spool' and 'log' subdirectories.
Then the main program generates various scenarios according to the
configuration settings specified in the '*.cfg' files. The scenarios are built
from the following events:
* Raising machines
* Failing machines
* Submitting jobs
Once in every predefined period of time, the set of checkers configured in the
'*.cfg' files is applied to test the validity of the pool state.
Each of these checkers returns either failure or success and reports the
result to the respective testing system log file. The testing system log files
are generated in the 'TestingSystem/Logs' directory:
* Error.log - contains the error messages that were found by the testing
  system during the test
* Success.log - contains the success messages that were found by the testing
  system during the test
* Control.log - contains messages describing how the test was conducted
At the end of the test, the script moves all the produced files into the
'job_had_basic.saveme' directory so that they can be returned to the user as a
tarball. The exit value of the test is determined by the number of errors that
it produced. If this number exceeds the 'FAILURES_TOLERANCE' parameter of the
configuration file (job_had_basic.cfg or job_replication_basic.cfg), the exit
value is FALSE, otherwise it is TRUE.

Checkers
--------
HAD checkers:
-------------
* Only1Neg - checks that there is only one negotiator in the pool
* PrimaryTakesLead - checks that after the primary HAD is raised, it becomes
  active within the stabilization period
* AlwaysTheSameNeg - checks whether the negotiator moves from one machine to
  another without reason
* FailureDetection - checks that the negotiator is raised after an election
* TestingSystemConsistency - checks that the number of alive HADs is equal to
  the number of alive schedds, startds and collectors
* MultipleSend - checks whether the negotiator publishes itself to all the
  collectors

HAD and replication checkers:
-----------------------------
* AccountantsNotOld - checks that alive replication machines have up-to-date
  replicas of the state file
* AccountantsSynchronization - checks that the last modification times of the
  replicas on alive replication machines do not differ by more than one
  REPLICATION_INTERVAL
* CorrectAccountantFormat - checks that the format of the state file is not
  corrupted
* NoTemporaryFiles - checks that there are no temporary .down and .up files
  inside the 'spool' subdirectory

Test kinds
----------
There are two kinds of tests that are run in the NMI testing system. One of
them tests the HAD features, and the other tests the replication-only
features. The corresponding configuration files for these tests are
'job_had_basic.cfg' and 'job_replication_basic.cfg'.

How it could be tested before nightly build
-------------------------------------------
In order to check that the test is valid, one must set up a personal Condor
environment on the execution machine. This can be done with the following
steps:
* Building Condor on the specific platform and getting the 'release.tar' file
  (either by submitting a build job to NMI or by issuing the 'make public'
  command in the 'src' directory of the compiled Condor code)
* Issuing the 'src/condor_scripts/condor_configure' command. The following
  flags are used:
  % --make-personal-condor - specifies that a personal Condor environment is
    to be built
  % --install - specifies the path to the 'release.tar' file, containing all
    the Condor binaries, libraries, configuration files etc.
  % --install-dir - specifies the directory where 'release.tar' is to be
    unpacked
  % --local-dir - specifies the directory where the local definitions are to
    be stored (it will contain the 'condor_config.local' file)
  Example:
    ./condor_configure --make-personal-condor \
        --local-dir=/home/dsl_il/local \
        --install=/home/dsl_il/condor-6.7.14/public/v6.7/condor-6.7.14/release.tar \
        --install-dir=/home/dsl_il/condor_personal --verbose
* Changing the CONDOR_CONFIG variable to point to the 'condor_config' file
  located in the directory given as the value of the '--install-dir' flag. The
  path is also shown in the output of the 'condor_configure' script after the
  script finishes
  Example:
    export CONDOR_CONFIG=/home/dsl_il/condor_personal/condor_config
This sets up a new Condor pool of a single node, which is practically the same
as the pool built by the nightly tests program. A test that succeeds (or
fails) in such an environment should also succeed (or fail) in the nightly
tests.
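
The two configuration steps above can be collected into a small shell helper.
This is only a minimal sketch and not part of the Condor sources: the function
name 'personal_condor_cmd' and its argument order are hypothetical, the paths
are the ones from the examples above, and only the flags already listed are
used. The helper prints the commands instead of running them, so it can be
tried without a Condor build at hand.

```shell
#!/bin/sh
# Sketch only: print the commands that set up a personal Condor.
# personal_condor_cmd is a hypothetical helper, not a Condor tool.
# Arguments: <path to release.tar> <install dir> <local dir>
personal_condor_cmd() {
    release_tar=$1
    install_dir=$2
    local_dir=$3
    # Step 1: unpack the release and configure a personal Condor.
    echo "./condor_configure --make-personal-condor --local-dir=$local_dir --install=$release_tar --install-dir=$install_dir --verbose"
    # Step 2: point CONDOR_CONFIG at the generated configuration file.
    echo "export CONDOR_CONFIG=$install_dir/condor_config"
}

# Paths taken from the examples above.
personal_condor_cmd \
    /home/dsl_il/condor-6.7.14/public/v6.7/condor-6.7.14/release.tar \
    /home/dsl_il/condor_personal \
    /home/dsl_il/local
```

The printed 'condor_configure' line is meant to be run from the
'src/condor_scripts' directory, and the 'export' line in the shell from which
the test will be started.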

Installing the new script into testing suite
--------------------------------------------
There are different classes of tests that can be run in the Wisconsin pool. In
order to become part of the nightly build, the test must be run as
TESTCLASS(quick). The only file that has to be extended in this case is
'src/condor_tests/Imakefile'. Suppose we want to add a test called
'job_had_basic':
* DESC(job_had_basic," Basic High Availability Daemon Test")
  RUN(job_had_basic)
  TESTCLASS(job_had_basic,core)
  TESTCLASS(job_had_basic,quick)
In addition, it is often desirable to be able to run a specific test on
demand, without waiting for the nightly build. In order to do so, one must
declare a new TESTCLASS, say 'dsldebug', inside
'src/condor_tests/Imakefile.common' and extend 'src/condor_tests/Imakefile'
with the following line:
* TESTCLASS(job_had_basic,dsldebug)
After the changes have been made there are two options:
1. Submit the changed files into CVS
2. Follow the tip inside http://www.cs.wisc.edu/condor/developers/test-nmi.html
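
The Imakefile entries shown above follow a regular pattern, so they can be
generated instead of typed by hand. The following is a minimal sketch,
assuming that the DESC, RUN and TESTCLASS macros take exactly the arguments
shown above; 'imake_entries' is a hypothetical helper and not part of the
Condor test suite. Its output still has to be added to
'src/condor_tests/Imakefile' manually.

```shell
#!/bin/sh
# Sketch only: print the Imakefile entries for one test.
# imake_entries is a hypothetical helper, not part of the Condor sources.
# Arguments: <test name> <description> <test class>...
imake_entries() {
    test_name=$1
    description=$2
    shift 2
    echo "DESC($test_name,\"$description\")"
    echo "RUN($test_name)"
    # One TESTCLASS line per requested class.
    for class in "$@"; do
        echo "TESTCLASS($test_name,$class)"
    done
}

# The description keeps the leading space used in the entry above.
imake_entries job_had_basic " Basic High Availability Daemon Test" \
    core quick dsldebug
```

Passing 'dsldebug' as an extra class only produces the TESTCLASS line; the
class itself must still be declared in 'src/condor_tests/Imakefile.common'.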