Overview
Two command-line scripts are provided: iPlotIt and pyPlotIt.
Both have an interface similar to that of the plotIt executable: they take a YAML
configuration file as a positional argument, and optional --histodir and
--eras arguments. These pass a different histograms directory (in case the
histograms are not in the same directory as the configuration file) and the set
of data-taking periods (eras) to consider.
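To make the shape of this interface concrete, here is a minimal argparse sketch of a parser that accepts the same kind of arguments. It is not the actual parser of iPlotIt or pyPlotIt. The function name make_parser is hypothetical, and so is the assumption that --eras takes several space-separated values (nargs="*").

```python
import argparse


def make_parser():
    """Hypothetical parser mimicking the interface described above (not the real one)."""
    parser = argparse.ArgumentParser(description="plotIt-like command-line interface (sketch)")
    # positional argument: the YAML configuration file
    parser.add_argument("yamlFile", help="YAML configuration file")
    # optional: directory with the histogram files, if they are not next to the configuration file
    parser.add_argument("--histodir", default=None,
                        help="directory with the histogram files")
    # optional: data-taking periods to consider (assumed here to be a space-separated list)
    parser.add_argument("--eras", nargs="*", default=None,
                        help="eras (data-taking periods) to consider")
    return parser
```

With this sketch, `make_parser().parse_args(["plots.yml", "--eras", "2016", "2017"])` yields `eras == ["2016", "2017"]`, with `histodir` left at `None`.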
pyPlotIt mimics the batch production of plots by plotIt, but is currently
of limited use because it supports far fewer styling options.
iPlotIt is the best place to get started: it loads a configuration
file and then opens an IPython shell, in which the configuration can be
inspected and histograms can be loaded and manipulated interactively.
Typical usage is:
iPlotIt plots.yml
The available objects are:
- config, the Configuration object corresponding to the top level of the YAML file (excluding the sections that are parsed separately)
- samples, a list of Group or ungrouped File objects (stateful, see below), which correspond to the groups and files sections of the configuration file and can be used to retrieve the histograms for a plot
- plots, a dictionary of Plot objects keyed by name, which corresponds to the plots section of the configuration file
- systematics, a list of systematic uncertainties (SystVar objects), which corresponds to the systematics section of the configuration file
- legend, the parsed legend section, with the list of entries
From a script, the same objects can be obtained by calling
loadFromYAML().
The only difference is that it returns a list of plots,
whereas iPlotIt provides a dictionary in which each plot is stored
under its name attribute. The two are equivalent; the dictionary
is only provided for convenience.
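Converting the list into such a dictionary is a one-line comprehension. In the sketch below, a hypothetical dataclass stands in for the Plot objects, whose only property used here is the name attribute. The plot names are made up, and the helper plotsByName is not part of the package. The comprehension assumes that plot names are unique; a duplicate name would silently overwrite the earlier entry.

```python
from dataclasses import dataclass


@dataclass
class Plot:
    """Hypothetical stand-in for a Plot object; only `name` is used here."""
    name: str


def plotsByName(plotList):
    """Index a list of plots (like the one returned by loadFromYAML) by name,
    as in the dictionary provided by iPlotIt."""
    return {plot.name: plot for plot in plotList}


plotList = [Plot("mjj"), Plot("nJets")]  # made-up plot names
plots = plotsByName(plotList)
```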
Each file contains a histogram (possibly with systematic variations) for every plot. These are combined in groups if the file belongs to a group, or directly added as a contribution to a stack in the plot. The following example illustrates how to retrieve the histograms, and construct the expected and observed stacks for a plot:
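# inside iPlotIt, samples and plots are predefined; take any plot, e.g. the first
plot = next(iter(plots.values()))
# split the samples by the type given in their configuration
mcSamples = [smp for smp in samples if smp.cfg.type == "MC"]
dataSamples = [smp for smp in samples if smp.cfg.type == "DATA"]
# expected (simulation) and observed (data) stacks for this plot
expStack = Stack(entries=[smp.getHist(plot) for smp in mcSamples])
obsStack = Stack(entries=[smp.getHist(plot) for smp in dataSamples])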
How a stack is drawn depends on its type: for MC, the contributions
(accessible as expStack.entries) are usually drawn stacked, in
different colours; for data, only the sum is drawn.
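Drawing contributions "stacked" means that the top edge of each layer is the cumulative per-bin sum of all contributions up to and including it. For data, only the final total matters. The sketch below uses plain lists of bin contents as hypothetical stand-ins for histograms. It assumes the first contribution is drawn at the bottom, and it assumes all contributions share the same binning.

```python
def stackedOutlines(contributions):
    """Given the per-bin contents of each contribution (bottom to top),
    return the cumulative per-bin sums, i.e. the top edge of each drawn layer."""
    outlines = []
    running = [0.0] * len(contributions[0])
    for contents in contributions:
        # a new list is built each time, so earlier outlines are not modified
        running = [r + c for r, c in zip(running, contents)]
        outlines.append(running)
    return outlines


mc = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]    # two made-up MC contributions
data = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0]]  # e.g. two made-up data samples
mcLayers = stackedOutlines(mc)             # one outline per MC layer
dataTotal = stackedOutlines(data)[-1]      # for data, only the sum is drawn
```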
The getHist method of a sample returns a
FileHist for a File, or a
GroupHist for a Group.
These are smart pointers to, respectively, a TH1F object, or the sum
of the TH1F objects of the files in the group, constructed on demand.
Both are described in more detail in the next section.
Architecture
This package was designed to potentially replace plotIt in the long run, so some design choices were made with performance in mind, and others are slightly over-engineered to provide maximal flexibility for future development. The two main distinctions to keep in mind are between configuration and stateful classes, and between raw histogram pointers and smart pointers.
The former is relatively straightforward, but causes some duplication:
the configuration file is first parsed into classes that represent
the configuration but carry no additional state; they are essentially
the dictionaries from the YAML parsing, with some additional structure
based on the type information.
For many purposes this is sufficient, but to load histograms
the files need to be opened, and for efficiency a pointer to each open file
should be stored.
This is why the stateful File and
Group classes exist in plotit.plotit;
they hold the configuration-only part as their cfg attribute.
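The split between configuration and state can be illustrated with two small classes. The first is an immutable configuration record. The second is a stateful wrapper that keeps the record as cfg and opens the file only once, caching the handle. Both class names and the opener callable are hypothetical, and this is not the package's File class. In practice, the opener could be something like ROOT.TFile.Open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileConfig:
    """Configuration-only part: what the YAML says about one file, no state."""
    name: str
    type: str = "MC"


class StatefulFile:
    """Stateful wrapper: keeps the configuration as `cfg`, opens the file lazily, once."""

    def __init__(self, cfg, opener):
        self.cfg = cfg
        self._opener = opener  # hypothetical callable: path -> open file handle
        self._handle = None

    @property
    def handle(self):
        # open on first use, then reuse the same handle for every histogram lookup
        if self._handle is None:
            self._handle = self._opener(self.cfg.name)
        return self._handle
```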
Smart histogram pointers are introduced for performance reasons: in practice,
the most time-consuming part of running plotIt is opening ROOT files
and retrieving histograms (for a single plot this can mean hundreds of
histograms spread over dozens of files, and a typical run produces hundreds
of plots). These histograms also drive the memory usage
when producing plots in batch mode.
The FileHist class gives control over when histograms
are read from the file: it provides a handle to the histogram, but postpones
loading it from disk until its contents are first accessed.
Loading and unloading of the TH1 objects can also be forced,
which allows a simple implementation of the strategy adopted by plotIt:
all histograms needed for a set of plots are loaded from each file in one
go, and cleaned up after the plots are produced.
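The lazy-loading idea and the load-all/draw/unload-all strategy can be sketched with a small handle class. LazyHist, producePlots, and the reader callable are hypothetical names, and this is a simplified illustration rather than the FileHist implementation. The reader stands in for the actual read from a ROOT file.

```python
class LazyHist:
    """Handle to a histogram that is only read when its contents are first needed."""

    def __init__(self, reader):
        self._reader = reader  # hypothetical callable returning the histogram object
        self._obj = None

    @property
    def loaded(self):
        return self._obj is not None

    def load(self):
        # force loading (no-op if already in memory)
        if self._obj is None:
            self._obj = self._reader()

    def unload(self):
        # drop the in-memory object; it will be re-read on next access
        self._obj = None

    @property
    def obj(self):
        self.load()
        return self._obj


def producePlots(hists, draw):
    """plotIt-like strategy: load everything for a set of plots, draw, then clean up."""
    for h in hists:
        h.load()
    try:
        draw()
    finally:
        for h in hists:
            h.unload()
```

The try/finally ensures that the histograms are released even if drawing fails, which keeps memory usage bounded over a long batch run.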
FileHist is part of a class hierarchy, with
BaseHist defining the common interface and basic
functionality, and MemHist and
SumHist implementing the same interface as
FileHist for histograms that are not loaded from
a file and groups of histograms that should be added, respectively.
Stack is an extension of
SumHist that represents a stack of groups and files.
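A toy version of this hierarchy shows why a shared interface is useful: a sum only needs each entry's contents, so in-memory histograms, file-backed ones, and nested sums can all be mixed. The Toy* classes are hypothetical simplifications that hold plain lists of bin contents instead of TH1 objects. They are named after the real classes only to map onto the text and are not the package's code. Identical binning is assumed.

```python
class ToyBaseHist:
    """Common interface: subclasses provide `contents` (a list of bin contents here)."""

    @property
    def contents(self):
        raise NotImplementedError


class ToyMemHist(ToyBaseHist):
    """Histogram that already lives in memory (not read from a file)."""

    def __init__(self, contents):
        self._contents = list(contents)

    @property
    def contents(self):
        return self._contents


class ToySumHist(ToyBaseHist):
    """Sum of several histograms, computed only when the contents are requested."""

    def __init__(self, entries):
        self.entries = list(entries)

    @property
    def contents(self):
        # bin-by-bin sum over all entries (identical binning assumed)
        return [sum(bins) for bins in zip(*(e.contents for e in self.entries))]


class ToyStack(ToySumHist):
    """A sum that also keeps its individual entries, e.g. to draw each layer."""
```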
The common interface provides direct access to the TH1 objects, as well as
to the contents and sumw2 arrays as NumPy arrays, which makes it possible to
write custom plots or other scripts in a very pythonic style.
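As an example of that style, the sketch below computes per-bin statistical uncertainties and a data/MC ratio with plain NumPy operations. The arrays are filled by hand with made-up numbers; in a real script, they would come from the contents and sumw2 arrays of the expected and observed stacks. The exact attribute names are not spelled out here, so treat that part as an assumption. The statistical uncertainty is taken as the square root of the sum of squared weights, as usual for weighted histograms.

```python
import numpy as np

# made-up per-bin values; in practice taken from the contents/sumw2 arrays
expContents = np.array([10.0, 20.0, 5.0])
expSumw2 = np.array([4.0, 9.0, 1.0])
obsContents = np.array([12.0, 18.0, 0.0])

# per-bin statistical uncertainty on the expectation: sqrt of the sum of squared weights
expErr = np.sqrt(expSumw2)

# data/MC ratio, set to zero where the expectation is empty to avoid division by zero
ratio = np.divide(obsContents, expContents,
                  out=np.zeros_like(obsContents), where=expContents > 0)
```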