System Tests

Overview

System tests are high-level tests that check Mantid is able to reproduce accepted, standardised results as part of its calculations when executing user stories. The system test suite is written against Mantid’s Python API.

As part of our nightly-build and nightly-test procedure, Mantid’s system tests are run as acceptance tests. The nightly-test jobs deploy a packaged version of Mantid to the target OS before executing the system test scripts in that environment.

Writing a Test

The (Python) code for the system tests can be found in the git repository at mantidproject/mantid, under the Testing/SystemTests directory.

System tests inherit from the systemtesting.MantidSystemTest class. The methods that need to be overridden are runTest(self), where the Python code that runs the test should be placed, and validate(self), which should simply return a pair of strings: the name of the final workspace produced by the runTest method and the name of a Nexus file that should be saved in the ReferenceResults sub-directory in the repository. The test code itself is likely to be the output of a Save History command, though it can be any Python code.

In the unlikely case of files being used during a system test, implement the requiredFiles method, which should return a list of filenames without paths. The file to validate against should be included as well. If any of those files are missing, the test will be marked as skipped.
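For illustration, a minimal test might look like the sketch below. The class name, data file and workspace names are hypothetical and only show the structure; a real test would exercise an actual reduction or analysis workflow.

import systemtesting
from mantid.simpleapi import ConvertUnits, Load


class ExampleConversionTest(systemtesting.MantidSystemTest):
    def runTest(self):
        # Load a (hypothetical) run and convert its units
        Load(Filename="INSTR00001234.nxs", OutputWorkspace="example_result")
        ConvertUnits(InputWorkspace="example_result", OutputWorkspace="example_result",
                     Target="dSpacing")

    def validate(self):
        # Name of the final workspace and the reference file to compare against
        return "example_result", "ExampleConversionReference.nxs"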

The tests should be added to the Testing/SystemTests/tests/framework directory, with the template result going in the reference sub-folder. They will then be included in the suite of tests from the following night.

Alternatively, any tests relating to Qt interfaces should be added to the Testing/SystemTests/tests/qt directory.

Specifying Validation

You may need to inform the System Test Suite about the format of the benchmark workspace you wish to validate against. By default, the system tests assume that the second argument returned by the validate tuple is the name of a Nexus file to validate against. However, you can override the validateMethod on a test to return any one of three options.

  • WorkspaceToNeXus (Benchmark workspace is stored as a Nexus file) (default)

  • WorkspaceToWorkspace (Benchmark workspace is stored as a workspace)

  • ValidateAscii (Benchmark workspace is stored as an ASCII file)

For example:

def validateMethod(self):
    return 'WorkspaceToNeXus'

No Workspace Validation

If the system test does not need comparison/validation against a standard workspace, then this step can be skipped. Simply omitting the

def validate(self):
    pass

method from the system test is sufficient.

Skipping tests

Tests can be skipped based on arbitrary criteria by implementing the skipTests method and returning True if your criteria are met, and False otherwise. Examples are the availability of a data file or of certain Python modules (e.g. for the XML validation tests).
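For example, a test that depends on an optional Python module could be skipped when that module cannot be imported. The module checked below is purely illustrative.

def skipTests(self):
    # Skip the test when the optional dependency is unavailable
    try:
        import scipy  # noqa: F401
        return False
    except ImportError:
        return True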

Target Platform Based on Free Memory

Some tests consume a large amount of memory and are therefore best executed on hardware where enough memory is available. You can set a minimum RAM specification by overriding requiredMemoryMB:

def requiredMemoryMB(self):
    return 2000

The above function limits the test to run on a machine where there is at least 2GB of free memory.

Target Platform Based on Required Files

Some tests require very large files that cannot be placed in the shared repository. The requiredFiles() method returns a list of these files so that the test harness can check that they are all available. If any of the files are not available, the test is skipped.

def requiredFiles(self):
    return ['a.nxs', 'b.nxs']

The above function limits the test to run on a machine that can find the files ‘a.nxs’ and ‘b.nxs’.

Set the Tolerance

You may specialise the tolerance used by CompareWorkspaces in your system test.

self.tolerance = 0.00000001

By default the tolerance is absolute. It can be changed to a relative tolerance via another flag in the systemtesting.MantidSystemTest class.

self.tolerance_rel_err = True

Disable Some Checks

You may disable some checks performed by the CompareWorkspaces algorithm by appending them to the disableChecking list, which, by default, is empty.

# A list of things not to check when validating
self.disableChecking = []
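For example, a test might ignore differences in the instrument or sample metadata. The check names below are the kind commonly seen in existing system tests; consult the CompareWorkspaces algorithm and other tests for the accepted values.

# Ignore instrument and sample differences when comparing to the reference
self.disableChecking.append('Instrument')
self.disableChecking.append('Sample')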

Assertions

Additional assertions can be used as the basis for your own comparison tests. The following assertions are already implemented in the base class.

def assertTrue(self, value, msg=""):
def assertEqual(self, value, expected, msg=""):
def assertDelta(self, value, expected, delta, msg=""):
def assertLessThan(self, value, expected, msg=""):
def assertGreaterThan(self, value, expected, msg=""):
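For example, values computed in runTest can be checked directly. The numbers below are purely illustrative; in a real test they would come from the reduction being exercised.

def runTest(self):
    # Illustrative result values standing in for a real reduction output
    peak_centre = 1.234
    n_spectra = 100
    self.assertGreaterThan(n_spectra, 0, "expected at least one spectrum")
    self.assertDelta(peak_centre, 1.23, 0.01, "peak centre outside expected range")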

Running Tests Locally

CMake configures a script file called systemtest (systemtest.bat on Windows) in the root of the build directory. This file is the driver script for executing the system tests: it runs the lower-level Testing/SystemTests/scripts/runSystemTests.py script, but ensures that the environment is set up correctly for that particular build and that the required test data has been updated. The script accepts a -h option to print out the standard usage information.

Usage differs depending on whether you are using a single-configuration generator with CMake, for example Makefiles/Ninja, or a multi-configuration generator such as Visual Studio or Xcode.

Downloading the test data

The systemtest script will automatically attempt to download any missing data files but will time out after 2 minutes. The timeout limit can be set via two variables, ExternalData_TIMEOUT_INACTIVITY and ExternalData_TIMEOUT_ABSOLUTE. If using CMake, these will need to be added as new string entries (values are in seconds).

Visual Studio/Xcode

The user must first open a command prompt in the build directory. The script requires the developer to select the configuration that will be used to execute the tests, one of: Release, Debug, RelWithDebInfo or MinSizeRelease. Note that the script does not build the code, so the chosen configuration must have already been built. An example to execute all of the tests for the Release configuration would be (in the command prompt):

> systemtest -C Release

Makefile-like Generators

The script requires no additional arguments as the configuration is fixed when running CMake, e.g.

cd build
systemtest

Selecting Tests to Run From IDE

System tests can be run from the MSVC IDE using the SystemTests target, which behaves in a similar way to unit test targets. One key advantage is that it allows you to start Mantid in a debug environment rather than attach to one midway through.

To select an individual test, or range of tests, open the SystemTests target properties, go to `Command Arguments` and append flags as appropriate.

For example, adding -R ISIS will run any tests which match the regular expression ISIS.

Debugging System Tests in PyCharm

System tests can be debugged from PyCharm without finding and attaching to the process, or using a remote debugger.

To do this, create a new configuration (Run -> Edit Configurations) and add a new Python configuration with the script path set to runSystemTests.py, which is found in Testing/SystemTests/scripts/runSystemTests.py.

The parameters for the configuration can be set just like the command-line arguments when running the tests from the systemtest.bat/systemtest script, e.g. pass -R="EnginX" to run all tests containing the string EnginX in their name.

Note that running the system tests this way will not update the system test data, so if your data needs updating, the system tests should first be run via the normal method.

Do not use the multiprocessing -j flag in your configuration parameters: it will prevent you from debugging the system tests directly, because they will no longer be running under the parent Python process.

N.B. Windows users do not need to specify the configuration with the -C flag as they would when using the systemtest.bat script; the flag is not passed to runSystemTests.py and will result in an error.

Selecting Tests To Run

The most important option on the script is the -R option. This restricts the tests that will run to those that match the given regex, e.g.

cd build
systemtest -R SNS
# or for msvc/xcode
systemtest -C <cfg> -R SNS

would run all of the tests whose name contains SNS.

Running the tests on multiple cores

Running the System Tests can be sped up by distributing the list of tests across multiple cores. This is done in a similar way to ctest using the -j N option, where N is the number of cores you want to use, e.g.

./systemtest -j 8

would run the tests on 8 cores.

Some tests write or delete files in the same directories, using the same file names, which causes issues when running in parallel. To resolve this, a global list of test modules (i.e. the different Python files in the Testing/SystemTests/tests/framework directory) is first created. Each test module is then scanned line by line to list all the data files that it uses. Files are identified in two ways:

  1. the extensions .nxs, .raw or .RAW are present
  2. there is a sequence of at least 4 digits inside a string

In case 2, we search for strings starting with 4 digits (e.g. "0123) or ending with 4 digits (e.g. 0123"). This might over-count: some sequences of 4 digits might not be part of a file name at all, but it does not matter if such a sequence is identified as a filename, as the probability of the same sequence being present in another Python file is small, and it would therefore not lock any other tests.

A dict is created with an entry for each module name containing the list of files that the module requires. An accompanying dict with an entry for each data file stores the lock status of that particular file.
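The two detection rules can be pictured as simple regular expressions; the snippet below is only a simplified sketch of the idea, not the actual runSystemTests.py code.

import re

# Rule 1: a recognised data-file extension appears on the line
extension_rule = re.compile(r'\.(nxs|raw|RAW)\b')
# Rule 2: a string literal starts or ends with at least four digits
digit_rule = re.compile(r'["\']\d{4}|\d{4}["\']')

def line_references_data(line):
    return bool(extension_rule.search(line) or digit_rule.search(line))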

Finally, a scheduler spawns N threads. Each thread starts a loop and picks a first test module from the master test list, which is stored in a shared dictionary, starting with the module whose index in the list equals the thread id.

Each thread then checks if all the data files required by the current test module are available (i.e. have not been locked by another thread). If all files are unlocked, the thread locks all these files and proceeds with that test module. If not, it goes further down the list until it finds a module whose files are all available.

Once it has completed the work in the current module, it unlocks the data files and checks whether any modules remain to be executed. If there is work left to do, the thread finds the next module that has not yet been executed (it searches through the tests_lock array for the next element with a value of 0). This aims to have all threads finish their work at approximately the same time.
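A much-simplified sketch of this locking scheme is shown below; the names and structure are illustrative only and do not mirror the real runSystemTests.py implementation.

import threading

list_lock = threading.Lock()
# Which data files each test module needs (illustrative)
modules = {'ModuleA': ['a.nxs'], 'ModuleB': ['a.nxs', 'b.nxs'], 'ModuleC': ['c.nxs']}
remaining = set(modules)
file_locked = {f: False for files in modules.values() for f in files}

def claim_next_module():
    # Pick the next unprocessed module whose data files are all free, and lock them
    with list_lock:
        for name in sorted(remaining):
            files = modules[name]
            if not any(file_locked[f] for f in files):
                remaining.discard(name)
                for f in files:
                    file_locked[f] = True
                return name, files
        return None, None

def release_files(files):
    with list_lock:
        for f in files:
            file_locked[f] = False

def worker(run_module):
    # e.g. started via threading.Thread(target=worker, args=(print,))
    while remaining:
        name, files = claim_next_module()
        if name is None:
            continue  # everything runnable is currently locked; try again
        run_module(name)
        release_files(files)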

Reducing the size of console output

The system tests can be run in “quiet” mode using the -q or --quiet option. This prints only one line per test instead of the full log.

./systemtest --quiet
Updating testing data...
[100%] Built target StandardTestData
[100%] Built target SystemTestData
Running tests...
FrameworkManager-[Notice] Welcome to Mantid 3.13.20180820.2132
FrameworkManager-[Notice] Please cite: http://dx.doi.org/10.1016/j.nima.2014.07.029 and this release: http://dx.doi.org/10.5286/Software/Mantid
[  0%]   1/435 : DOSTest.DOSCastepTest ............................................... (success: 0.05s)
[  0%]   2/435 : ISISIndirectBayesTest.JumpCETest .................................... (success: 0.06s)
[  0%]   3/435 : ISISIndirectInelastic.IRISCalibration ............................... (success: 0.03s)
[  0%]   4/435 : HFIRTransAPIv2.HFIRTrans1 ........................................... (success: 1.30s)
[  1%]   5/435 : DOSTest.DOSIRActiveTest ............................................. (success: 0.04s)
[  1%]   6/435 : ISISIndirectBayesTest.JumpFickTest .................................. (success: 0.06s)
[  1%]   7/435 : AbinsTest.AbinsBinWidth ............................................. (success: 1.65s)
[  1%]   8/435 : ISIS_PowderPearlTest.CreateCalTest .................................. (success: 1.65s)
[  2%]   9/435 : ISISIndirectInelastic.IRISConvFit ................................... (success: 0.56s)
[  2%]  10/435 : LiquidsReflectometryReductionWithBackgroundTest.BadDataTOFRangeTest . (success: 2.94s)
[  2%]  11/435 : DOSTest.DOSPartialCrossSectionScaleTest ............................. (success: 0.23s)
[  2%]  12/435 : ISISIndirectBayesTest.JumpHallRossTest .............................. (success: 0.07s)
[  2%]  13/435 : ISISIndirectInelastic.IRISDiagnostics ............................... (success: 0.03s)
[  3%]  14/435 : HFIRTransAPIv2.HFIRTrans2 ........................................... (success: 0.83s)
[  3%]  15/435 : DOSTest.DOSPartialSummedContributionsCrossSectionScaleTest .......... (success: 0.15s)
[  3%]  16/435 : ISISIndirectBayesTest.JumpTeixeiraTest .............................. (success: 0.07s)
[  3%]  17/435 : ISISIndirectInelastic.IRISElwinAndMSDFit ............................ (success: 0.29s)
[  4%]  18/435 : MagnetismReflectometryReductionTest.MRFilterCrossSectionsTest ....... (success: 5.30s)
[  4%]  19/435 : DOSTest.DOSPartialSummedContributionsTest ........................... (success: 0.16s)

One can recover the full log when a test fails by using the --output-on-failure option.

Running a cleanup run

A cleanup run will go through all the tests and call the .cleanup() function for each test. It will not run the tests themselves (i.e. it will not call the execute() function). This is achieved by using the -c or --clean option, e.g.

./systemtest -c

This is useful if some old data is left over from a previous run, where some tests were not cleanly exited.

Adding New Data & Reference Files

The data is managed by CMake’s external data system, which is described in Data Files for Testing. Please see Adding A New File(s) for how to add new files.

Best Practice

  • Always check your test works locally before making it public.

  • User stories should come from the users themselves where possible.

  • Take care to set the tolerance to an acceptable level.