Ascertainment correction
Ascertainment correction means conditioning likelihood calculations on the fact that certain patterns of data are guaranteed never to occur in the dataset, typically because of how the data was collected or prepared. If ascertainment correction is not performed in an analysis where some kinds of data are guaranteed not to occur, BEAST will treat the absence of that kind of data as evidence that it simply never evolved, and this can bias results such as the timing of trees.
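As a minimal sketch of what "conditioning" means here, the snippet below renormalises a set of pattern probabilities after declaring one pattern unobservable. The function name and the probabilities are invented for illustration; this is not BEAST or BEASTling code, and a real analysis computes these probabilities on a tree.

```python
def condition_on_observable(pattern_probs, unobservable):
    """Renormalise pattern probabilities, given that some patterns can never be observed."""
    p_hidden = sum(pattern_probs[p] for p in unobservable)
    if p_hidden >= 1.0:
        raise ValueError("every pattern is unobservable; nothing to condition on")
    return {
        pattern: prob / (1.0 - p_hidden)
        for pattern, prob in pattern_probs.items()
        if pattern not in unobservable
    }


# Toy numbers (assumed, not from any real model): one binary feature
# scored for two languages, so there are four possible patterns.
uncorrected = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}

# Suppose the all-zero pattern is guaranteed never to appear in the data.
corrected = condition_on_observable(uncorrected, unobservable={"00"})

for pattern, prob in sorted(corrected.items()):
    print(pattern, round(prob, 4))
```

Without the correction, the model is penalised for never producing the "00" pattern, even though that pattern could not have been recorded in the first place.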
Ascertainment correction is relevant to BEASTling analyses in two cases. BEASTling tries to automatically do “the right thing” when it can, but the interaction between different considerations can be confusing, so this page attempts to lay everything out clearly.
Constant feature ascertainment correction
One is the case of constant features. It is common to remove constant features
from analyses because they cannot inform the tree topology or timing. E.g. if
all the languages in your dataset use SOV word order, there is no point
including this because it can’t help to separate out clades. In fact, if you
are using any non-binary substitution model (like LewisMk or BSVS), you have no
choice but to remove it, because these models infer their state space from
looking at the data, and they do not make any sense for a state space of just
one value. BEASTling will remove these features for you automatically and there
is no way to override this, as it makes no sense. If you are using a binary
substitution model, like BinaryCovarion, it is actually possible to leave
constant features in. The model defines its own state space, so even if some
feature only has values of 0 in your dataset, BEAST knows that the alternative
value 1 exists. BEASTling will still remove constant features by default to
make your analyses smaller and faster, but in this case you can override this
behaviour by setting remove_constant_features=False in the [model] section.
You might like to do this if you are inferring per-feature rates, for
instance.
You can explicitly tell BEASTling to include ascertainment correction for the
absence of constant features (or not) by setting ascertained to True or False
in a [model] section. BEASTling will do as it is told. However, if you remain
silent, the following will happen.
Since ascertainment correction only affects the timing of the inferred trees, if you have not placed any time calibrations in your analysis, BEASTling will assume you are primarily interested in tree topology and will not perform ascertainment correction. Feel free to tell it otherwise!
If you have included time calibrations, BEASTling will include constant feature ascertainment correction if it thinks it needs to, according to the following logic.
If you are using a substitution model which does not define its own state space, then constant features make no sense and will be forcibly removed from the analysis, so you definitely need the correction and it will be included.
What if you are using a substitution model which does define its own state space, such as BinaryCovarion?
If BEASTling finds constant features in your dataset and removes them (as is its
default behaviour), then you definitely need the correction and it will be
included. If you use remove_constant_features=False and BEASTling notices
that constant features do in fact exist and have been let through into the
analysis, then you definitely do not need the correction and so it will not be
included.
Where things get a little tricky is when there are no constant features in your
dataset. If BEASTling hasn’t seen any constant features for itself (whether it
subsequently removed them or not) it can’t make an educated guess as to whether
or not the ascertainment correction is necessary. In this case, BEASTling errs
on the side of caution and enables the correction based on the developers’
beliefs that most data sources without constant features probably are
deliberately collected in such a way as to exclude constant features. If this
is not true in your case and you want to disable the correction, use
ascertained=False.
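The default behaviour described in this section can be summarised as a small decision function. This is only a sketch of the rules as stated in the text: the function and its parameter names are hypothetical and do not correspond to BEASTling's internal code.

```python
def constant_feature_correction_enabled(
    ascertained=None,                # explicit user setting: True, False, or None if unset
    has_calibrations=False,          # does the analysis include time calibrations?
    model_defines_state_space=False, # e.g. True for a binary model such as BinaryCovarion
    constant_features_seen=False,    # did the dataset contain any constant features?
    remove_constant_features=True,   # the user's setting (default: remove them)
):
    """Return True if constant feature ascertainment correction would be applied."""
    # An explicit setting always wins.
    if ascertained is not None:
        return ascertained
    # Without calibrations, the focus is assumed to be topology only.
    if not has_calibrations:
        return False
    # Models that infer their state space from the data always drop
    # constant features, so the correction is needed.
    if not model_defines_state_space:
        return True
    # Constant features were seen: correct only if they were removed.
    if constant_features_seen:
        return remove_constant_features
    # No constant features seen at all: err on the side of caution.
    return True


# Calibrated BinaryCovarion-style analysis with constant features left in:
print(constant_feature_correction_enabled(
    has_calibrations=True,
    model_defines_state_space=True,
    constant_features_seen=True,
    remove_constant_features=False,
))
```

Writing the rules out this way makes the ordering explicit: the explicit setting is checked first, then calibrations, then the model type, and only then the contents of the dataset.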
Binarised data ascertainment correction
The second case of ascertainment correction relevant to BEASTling is ascertainment correction for binarised data. It is common to model lexical evolution by treating each cognate class associated with a particular meaning slot as an independent binary feature, and setting a datapoint to 1 or 0 depending upon whether a language's word for that meaning does or does not belong to the corresponding cognate class. A consequence of this representation is that your data will never contain a feature whose value is 0 for every language, because every cognate class must have at least one word belonging to it.
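To illustrate why an all-zero feature cannot arise, here is a minimal sketch of binarising a single meaning slot. The function, the language names and the cognate class labels are invented for illustration and are not BEASTling's own binarisation code.

```python
def binarise(cognates):
    """Turn one meaning slot into one binary feature per cognate class.

    `cognates` maps each language to the cognate class of its word.
    """
    classes = sorted(set(cognates.values()))
    return {
        cognate_class: {
            language: int(assigned == cognate_class)
            for language, assigned in cognates.items()
        }
        for cognate_class in classes
    }


# Hypothetical data for a single meaning slot.
hand = {"LanguageA": "class1", "LanguageB": "class1", "LanguageC": "class2"}
features = binarise(hand)

for cognate_class, column in features.items():
    print(cognate_class, column)

# Each feature exists only because some language uses that cognate class,
# so every feature has at least one 1.
print(all(any(column.values()) for column in features.values()))
```

Because the features are derived from the classes actually attested in the data, the all-zero pattern is excluded by construction, and that is the pattern this correction accounts for.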
BEASTling offers less control over whether or not this kind of ascertainment correction is performed (this is purely for historical/backward compatibility reasons and may change in future). Regardless of whether or not you have any timing calibrations in place, the correction will be performed if BEASTling believes that your data is in this binarised format. BEASTling believes this if either:
- BEASTling has done the binarisation for you.
- You have done the binarisation yourself and used binarised=True to inform BEASTling of this fact.
Otherwise it will not be performed.