Base package details¶
Configuration system¶
The Overwatch configuration system is based on hierarchically overridden values stored in YAML configuration files. Each package within Overwatch has its own YAML file. For each property, the system looks for possible values in each configuration file. If multiple configuration files define the property, the value from the file with the higher priority overrides the base value. As a result, the loaded configuration provides exactly one value per property (as opposed to combining every value that was defined for the property).
As a concrete example, consider two modules, where module A depends on module B, and they both define a property named x. For module A, x=5, while for module B, x=3. In this case, after all of the configurations are loaded, the configuration system would provide 5 when property x is retrieved.
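The override behavior in this example can be illustrated with a minimal sketch. It uses `collections.ChainMap` purely for illustration and is not how `overwatch.base.config` is implemented; the dictionaries and the extra property `y` are hypothetical.

```python
from collections import ChainMap

# Hypothetical per-module values, listed from highest to lowest priority.
# Module A depends on module B, so A's values take precedence.
module_a = {"x": 5}
module_b = {"x": 3, "y": "only defined in B"}

# ChainMap returns the value from the first mapping that defines a key,
# so each property resolves to exactly one value.
config = ChainMap(module_a, module_b)

print(config["x"])  # 5
print(config["y"])  # only defined in B
```

Properties defined only in the lower priority module remain available, while shared properties resolve to the higher priority value.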
As of August 2018, the configuration override order is as follows:
1. Current working directory
2. User home directory
3. overwatch.webApp
4. overwatch.processing
5. overwatch.receiver
6. overwatch.api
7. overwatch.base
where 1 has the highest priority and 7 has the lowest priority. In each folder, the system looks for a file named config.yaml. In addition to the base configuration system, a number of YAML constructor plugins are also provided to improve the user configuration experience. For example, there is a constructor to facilitate the joining of paths, in a similar way to os.path.join.
For additional technical details regarding the configuration system, as well as an exhaustive list of constructor plugins, see the documentation for functions within the overwatch.base.config module.
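To give a feel for what a path-joining constructor might look like, here is a minimal sketch using PyYAML. The tag name `!joinPaths`, the function name, and the use of PyYAML itself are assumptions made for this example; the actual plugins in `overwatch.base.config` may use a different tag, library, or behavior.

```python
import os

import yaml  # PyYAML; assumed to be installed


def join_paths(loader, node):
    """Join the items of a YAML sequence into a single path."""
    parts = loader.construct_sequence(node)
    return os.path.join(*[str(part) for part in parts])


# "!joinPaths" is a tag name chosen for this sketch.
yaml.add_constructor("!joinPaths", join_paths, Loader=yaml.SafeLoader)

document = """
dataFolder: !joinPaths
  - data
  - Run123456
"""
config = yaml.safe_load(document)
print(config["dataFolder"])
```

The constructor runs while the file is being parsed, so the rest of the configuration system only ever sees the joined string.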
Additional files¶
storageWrapper.py is an idea for how files could be accessed through XRootD instead of needing to reside on the local disk. This would be particularly useful because we would then be able to take advantage of EOS directly. However, as of August 2018, this is still very much a work in progress and isn’t fully operational.
Deployment system¶
To facilitate configuring and launching tasks (particularly in docker containers), Overwatch includes a configuration and launching module. It is available as overwatchDeploy. To operate, it requires a YAML-based config file. All executables inherit from the executable class. As of September 2018, it can execute:
- Environment setup
- autossh for SSH tunnels.
- ZODB for the Overwatch Database
- Overwatch ZMQ receiver
- Overwatch ZMQ receiver monitoring
- Overwatch receiver data transfer
- Overwatch data replay (for local development, as well as larger number of files for data transfer)
- Overwatch DQM receiver
  - Via uwsgi, uwsgi behind nginx, or directly.
- Overwatch processing
- Overwatch web app
  - Via uwsgi, uwsgi behind nginx, or directly.
For a comprehensive set of options, see the docstrings of the module, as well as the reference configuration, overwatch/base/deployReference.yaml. Note that all executables are disabled by default, so you may leave the configuration for all objects in a deployment configuration, and then enable only the parts that you want for a particular execution. However, it is also important to note that the deploy system doesn’t support default values or overriding parts of the configuration in the way that the Overwatch configuration system does: every option of interest must be in the config passed to overwatchDeploy.
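The "disabled by default" behavior can be sketched as follows. The structure of the dictionary and the key names (`executables`, `enabled`) are assumptions made for this example; consult overwatch/base/deployReference.yaml for the actual layout.

```python
# Hypothetical deployment configuration after YAML parsing. The key names
# used here are assumptions made for this sketch.
deploy_config = {
    "executables": {
        "zmqReceiver": {"enabled": False},
        "processing": {"enabled": True},
        "webApp": {"enabled": False},
    }
}


def enabled_executables(config):
    """Return the names of executables that are explicitly enabled."""
    return [
        name
        for name, options in config["executables"].items()
        # An executable without an "enabled" entry is treated as disabled.
        if options.get("enabled", False)
    ]


print(enabled_executables(deploy_config))  # ['processing']
```

Keeping every executable in the file and toggling only the enabled flag lets one deployment configuration serve several containers.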
Why all of the complication in setting up the environment instead of just passing existing certificate and configuration files? Some possible operating environments would have precluded the possibility of mounting those files into the container. The deploy module therefore allows their contents to be passed in as strings.
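As a rough illustration of passing file contents as strings, the sketch below writes a string taken from a parsed configuration to disk inside the container. The function name, the `cert` key, and the file name are hypothetical, and the real deploy module may handle this differently.

```python
import os
import stat
import tempfile


def write_secret(contents, destination):
    """Write a string (for example, a certificate) to ``destination``."""
    with open(destination, "w") as f:
        f.write(contents)
    # Restrict access to the owner, since the file may hold sensitive data.
    os.chmod(destination, stat.S_IRUSR | stat.S_IWUSR)
    return destination


# Hypothetical configuration entry holding the certificate as a string.
config = {"cert": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"}

with tempfile.TemporaryDirectory() as tmp:
    path = write_secret(config["cert"], os.path.join(tmp, "cert.pem"))
    with open(path) as f:
        restored = f.read()
```

Because the string travels with the configuration, nothing needs to be mounted into the container beforehand.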
Steps to add a new executable¶
- Write the new executable.
- Create a new deployment executable. If it is an Overwatch executable, it can probably leverage the existing Overwatch deployment classes. Otherwise, you’ll need to write a class.
- Add this executable to the _available_executables list. Overwatch executables are often added via functools.partial so we don’t have to define an entirely new class.
- Add an entry for this executable into the documentation in the class.
- Add an entry for this executable into the overwatch/base/deployReference.yaml file, specifying all of the options that may be used with your new executable. Remember that the task should be disabled by default!
Then, to actually use this new executable, you should:
- Add an entry for the executable into your deployConfig.yaml, specifying all options as necessary.
- Add an entry into your docker-compose.yaml to create a container in which the executable will run.
Monitoring for errors¶
It is important to monitor Overwatch for errors and other issues. For the data transfer module, this monitoring is provided by sentry, which hooks into exceptions, logging, and other parts of the app to automatically provide alerts and information when issues occur.
For successful monitoring, the environment must export the DSN as export SENTRY_DSN_DATA_TRANSFER=<value>.
The value will be some sort of unique URL. Note that this endpoint is just for Overwatch data transfer (called overwatch-datatransfer on sentry). If that variable is not set, the module falls back to the general environment variable SENTRY_DSN.
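The lookup order for the DSN can be sketched as below. The function name is hypothetical and the URLs are placeholders; only the two environment variable names come from the text above.

```python
import os


def resolve_sentry_dsn(env=os.environ):
    """Return the data transfer DSN, else the general DSN, else None."""
    return env.get("SENTRY_DSN_DATA_TRANSFER") or env.get("SENTRY_DSN")


# Usage with stand-in environments (the URLs are placeholders).
both_set = {
    "SENTRY_DSN_DATA_TRANSFER": "https://transfer.example/1",
    "SENTRY_DSN": "https://general.example/2",
}
general_only = {"SENTRY_DSN": "https://general.example/2"}

print(resolve_sentry_dsn(both_set))      # https://transfer.example/1
print(resolve_sentry_dsn(general_only))  # https://general.example/2
print(resolve_sentry_dsn({}))            # None
```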
Data transfer¶
Data must be moved from the ZMQ and DQM receivers to other Overwatch sites, as well as exported to EOS. All of these transfers are handled by the data transfer module. It transfers the data in a robust manner, retries on failures, and notifies the admin if the issues continue. For further information on configuration, see the dataTransfer module.
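The retry-then-notify pattern described above might look roughly like the following sketch. The function, its parameters, and the choice of `OSError` as the retryable error are assumptions made for illustration; the dataTransfer module defines its own behavior and configuration.

```python
import time


def transfer_with_retries(transfer, notify, max_attempts=3, delay=0.0):
    """Call ``transfer`` until it succeeds or the attempts run out.

    ``transfer`` and ``notify`` are caller-supplied callables.
    Returns True on success and False after notifying about a failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            transfer()
            return True
        except OSError as error:
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay)
    notify("Transfer failed after {} attempts: {}".format(max_attempts, last_error))
    return False


# Usage: a transfer that fails twice and then succeeds.
attempts = []
messages = []


def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("temporary failure")


def always_failing_transfer():
    raise OSError("site unreachable")


succeeded = transfer_with_retries(flaky_transfer, messages.append)
gave_up = not transfer_with_retries(always_failing_transfer, messages.append, max_attempts=2)
```

Transient failures are absorbed silently, and the admin only hears about a transfer once the retries are exhausted.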
Data replay¶
In order to fully test the entire Overwatch processing and visualization chain, as well as test trending values as they evolve, data must be replayed over some time as if it were actually being received from the receivers. To simulate this data arrival, Overwatch provides a dataReplay module. This module will take an existing run directory and replay all of the files within it one by one.
This module can be configured via a number of parameters:
- dataReplayTimeToSleep: Time to sleep between each replay execution.
- dataReplaySourceDirectory: Select which Run directory will be replayed. This must be the path to the full run directory. For example, it may be “data/Run123456”. “Run” must be in the directory name. It is null by default because we don’t want to unexpectedly begin replaying, which could lead to data loss.
- dataReplayDestinationDirectory: Where the data should be replayed to. Usually, this is just the data folder, because Overwatch will then process the files from there.
- dataReplayTempStorageDirectory: Location where directories and files are temporarily stored when replaying a run.
- dataReplayMaxFilesPerReplay: Maximum number of files to move per replay. nMaxFiles defaults to one, which ensures that files are transferred one by one. This is the desired behavior if one wants to test the evolution of a dataset, and it is the best possible simulation of actually receiving data.
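The core idea behind the parameters above, moving a limited number of files per replay step, can be sketched as follows. This is a simplified illustration: the function name is hypothetical, and the renaming of files and the temporary storage directory used by the real dataReplay module are omitted.

```python
import os
import shutil
import tempfile


def replay_once(source, destination, max_files=1):
    """Move up to ``max_files`` files from ``source`` to ``destination``.

    Returns the names of the files that were moved.
    """
    names = sorted(
        name for name in os.listdir(source)
        if os.path.isfile(os.path.join(source, name))
    )
    moved = names[:max_files]
    for name in moved:
        shutil.move(os.path.join(source, name), os.path.join(destination, name))
    return moved


# Usage with temporary directories standing in for the run and data folders.
with tempfile.TemporaryDirectory() as base:
    source = os.path.join(base, "Run123456")
    destination = os.path.join(base, "data")
    os.makedirs(source)
    os.makedirs(destination)
    for name in ("a.root", "b.root", "c.root"):
        open(os.path.join(source, name), "w").close()

    first = replay_once(source, destination)
    second = replay_once(source, destination)
    remaining = os.listdir(source)
```

Calling such a function in a loop, with a pause between calls corresponding to dataReplayTimeToSleep, would deliver the files one at a time.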
This module can also be utilized to generically transform processed Overwatch data to appear as if it hasn’t been processed yet by moving and renaming the underlying ROOT files. This is particularly useful if one wants to transfer processed data via the dataTransfer module. Simply set the replay destination directory as the data transfer input directory, and the data will be transferred as if it was just received from the HLT.
Common Issues¶
When replaying data, if you receive an error similar to:
File "/overwatch/overwatch/base/replay.py", line 86, in availableFiles
name = convertProcessedOverwatchNameToUnprocessed(dirPrefix = root, name = name)
File "/overwatch/overwatch/base/replay.py", line 45, in convertProcessedOverwatchNameToUnprocessed
runNumber = int(prefixAndRunDir[runDirLocation:])
ValueError: invalid literal for int() with base 10: 'ta/tempReplayData/testDirectory/input'
you should closely check your directory structure. This is likely caused by a ROOT file being located outside of the standard Overwatch directory structure. For example, a ROOT file in Run123/. will cause this issue.
Moving the ROOT files to the proper location should resolve the issue.
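When checking a directory structure by hand, a small helper like the one below can confirm whether a path contains a recognizable run directory. It is a defensive sketch written for this document, not the logic used in replay.py; the function name and error message are hypothetical.

```python
import re


def run_number_from_path(path):
    """Extract the run number from a path with a ``Run<digits>`` component.

    Raises ValueError with a descriptive message when no component matches.
    """
    for part in path.replace("\\", "/").split("/"):
        match = re.fullmatch(r"Run(\d+)", part)
        if match:
            return int(match.group(1))
    raise ValueError("No 'Run<number>' directory found in path: {!r}".format(path))


print(run_number_from_path("data/Run123456"))  # 123456

# A path without a run directory, similar to the one in the error above.
try:
    run_number_from_path("data/tempReplayData/testDirectory/input")
    raised = False
except ValueError:
    raised = True
```

Running such a check over the directories that contain ROOT files can help locate the misplaced file before starting a replay.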