tracts.phase_type.dioecious.PhTDioecious#
- class PhTDioecious(migration_matrix_f, migration_matrix_m, rho_f, rho_m, X_chromosome=False, X_chromosome_male=False, sex_model='DC', TPED=0, setting_TP=None)#
Bases: PhaseTypeDistribution
A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Dioecious Fine (DF) and Dioecious Coarse (DC) Markov approximations.
- X_chr#
Whether admixture is considered on the X chromosome. Set to the value given as input by the X_chromosome parameter.
- Type:
bool
- X_chr_male#
If X_chr is True, whether the sex of the individual at generation 0 is male. Set to the value given as input by the X_chromosome_male parameter. If X_chr is False, this attribute is ignored.
- Type:
bool
- rho_f#
The female-specific recombination rate \(\rho_f\), given by the input parameter rho_f.
- rho_m#
The male-specific recombination rate \(\rho_m\), given by the input parameter rho_m.
- migration_matrix_f#
A transformed version of the female migration matrix given as input. For internal use only.
- Type:
npt.ArrayLike
- migration_matrix_m#
A transformed version of the male migration matrix given as input. For internal use only.
- Type:
npt.ArrayLike
- num_populations#
The number of populations considered in the demographic model.
- Type:
int
- num_generations#
The number of generations considered in the demographic model.
- Type:
int
- t0_proportions_f#
The ancestry proportion in the present population from each ancestry among all the haploid copies inherited from a female parent. For autosomes, computed using Eq. (F-27) in the manuscript. For the X chromosome, computed using the recursive equations (F-28) and (F-29) in the manuscript.
- Type:
npt.ArrayLike
- t0_proportions_m#
The ancestry proportion in the present population from each ancestry among all the haploid copies inherited from a male parent. For autosomes, computed using Eq. (F-27) in the manuscript. For the X chromosome, computed using the recursive equations (F-28) and (F-29) in the manuscript.
- Type:
npt.ArrayLike
- sex_model#
The Dioecious approximation that is being used. Taken from the input parameter sex_model, which is either ‘DF’ or ‘DC’.
- full_transition_matrix_f#
The intensity matrix \(\mathbf{S}_f\) of the Dioecious (Fine or Coarse) Markov model. Corresponds to Eq. (EQ) and (EQ) in the manuscript for DF and DC, respectively. This submodel corresponds to the maternally inherited allele (\(\xi=f\)).
- Type:
npt.ArrayLike
- full_transition_matrix_m#
The intensity matrix \(\mathbf{S}_m\) of the Dioecious (Fine or Coarse) Markov model. Corresponds to Eq. (EQ) and (EQ) in the manuscript for DF and DC, respectively. This submodel corresponds to the paternally inherited allele (\(\xi=m\)).
- Type:
npt.ArrayLike
- alpha_list_f#
A list containing, for each ancestral population, the initial state of the Phase-type distribution for maternally inherited tracts (\(\xi=f\)). Corresponds to Eq. (EQ) in the manuscript.
- Type:
list
- alpha_list_m#
A list containing, for each ancestral population, the initial state of the Phase-type distribution for paternally inherited tracts (\(\xi=m\)). Corresponds to Eq. (EQ) in the manuscript.
- Type:
list
- transition_matrices_f#
A list containing, for each ancestral population, the submatrix of full_transition_matrix_f corresponding to transitions within that population. It is used to compute the distribution of tract lengths of maternally (\(\xi=f\)) inherited alleles.
- Type:
list
- transition_matrices_m#
A list containing, for each ancestral population, the submatrix of full_transition_matrix_m corresponding to transitions within that population. It is used to compute the distribution of tract lengths of paternally (\(\xi=m\)) inherited alleles.
- Type:
list
- S0_list_f#
A list containing the sum across columns of every transition matrix in transition_matrices_f.
- Type:
list
- S0_list_m#
A list containing the sum across columns of every transition matrix in transition_matrices_m.
- Type:
list
- inverse_S0_list_f#
A list containing the sum across columns of the inverse of every transition matrix in transition_matrices_f.
- Type:
list
- inverse_S0_list_m#
A list containing the sum across columns of the inverse of every transition matrix in transition_matrices_m.
- Type:
list
- Parameters:
migration_matrix_f (ndarray) – An array containing the female migration proportions from a discrete number of populations over the last generations. Each row is a generation, each column is a population. Row zero corresponds to the current generation. The \((i,j)\) element of this matrix specifies the proportion of female individuals from the admixed population that are replaced by female individuals from population \(j\) at generation \(i\). The migration rates at the last generation (migration_matrix_f[-1,:]) must sum to 1.
migration_matrix_m (ndarray) – Counterpart of migration_matrix_f for male migration rates.
rho_f (float) – The female-specific recombination rate.
rho_m (float) – The male-specific recombination rate. For X chromosome admixture, this value is ignored and set to 0.
X_chromosome (bool) – Whether admixture is considered on the X chromosome. If False, the model considers autosomal admixture.
X_chromosome_male (bool) – If X_chromosome is True, whether the individual at generation 0 is a male. In that case, only maternally inherited alleles are taken into account. If X_chromosome is False, this parameter is ignored.
sex_model (str) – The Dioecious model to be considered. Takes the value ‘DF’ for Dioecious Fine and ‘DC’ for Dioecious Coarse.
Notes
The Dioecious Coarse model (sex_model = ‘DC’) should be preferred over the Dioecious Fine model (sex_model = ‘DF’) due to its computational efficiency. Both models produce very similar or identical phase-type densities unless there is a strong sex-bias in migration or recombination rates. For autosomal admixture, the Monoecious model should be used instead, for the same reasons, unless the sex bias is exceptionally strong.
Non-listed parameters are for internal use only.
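The input layout described under Parameters can be sketched as follows, assuming NumPy is available. The two-population, three-generation values and the helper check_migration_matrix are illustrative assumptions and are not part of tracts. The row-sum bound of 1 is also an assumption, inferred from the entries being replacement proportions. The commented-out call only mirrors the signature documented above, with the import path inferred from the page title.

```python
import numpy as np

def check_migration_matrix(mig):
    """Hypothetical helper: basic sanity checks on a migration matrix."""
    mig = np.asarray(mig, dtype=float)
    if mig.ndim != 2:
        raise ValueError("expected a (generations x populations) matrix")
    # Assumption: entries are proportions, so each row sums to at most 1.
    if np.any(mig < 0) or np.any(mig.sum(axis=1) > 1 + 1e-12):
        raise ValueError("each row must hold proportions summing to at most 1")
    # Documented requirement: the last generation must sum to 1.
    if not np.isclose(mig[-1].sum(), 1.0):
        raise ValueError("the last row must sum to 1")
    return mig

# Rows are generations (row 0 = present), columns are source populations.
# Illustrative values only: a later pulse from population 1 at generation 1
# that is stronger among females than among males.
migration_matrix_f = check_migration_matrix([
    [0.0, 0.0],
    [0.0, 0.2],
    [0.7, 0.3],
])
migration_matrix_m = check_migration_matrix([
    [0.0, 0.0],
    [0.0, 0.05],
    [0.7, 0.3],
])

# With tracts installed, the documented constructor would then be called as:
# from tracts.phase_type.dioecious import PhTDioecious
# model = PhTDioecious(migration_matrix_f, migration_matrix_m,
#                      rho_f=1.0, rho_m=1.0, sex_model="DC")
```

Validating both matrices before construction catches a last row that does not sum to 1 early, which is the one constraint this page states explicitly.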
- PhT_CDF(x, population_number, s1=None)#
Computes a Phase-type CDF at a given point \(x\) in \((0, \infty)\). The Phase-type parameters (initial state, transition matrix) are taken from a PhTDioecious object together with the specification of a population of interest.
- Parameters:
x (float) – A point in \((0, \infty)\) where the CDF is evaluated.
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.
s1 (int|None) – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
- Returns:
The CDF value at \(x\).
- Return type:
npt.ArrayLike
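For orientation, the standard textbook form of a phase-type CDF is \(F(x) = 1 - \boldsymbol{\alpha}\, e^{\mathbf{S}x}\, \mathbf{1}\). The sketch below evaluates it with NumPy only. The helper names expm_taylor and phase_type_cdf and the toy matrix are assumptions made for illustration; the library's internal conventions, such as how the two parental copies are combined, are not shown on this page and may differ.

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Matrix exponential via scaling and squaring with a Taylor series.

    A small stand-in for scipy.linalg.expm so the sketch needs only NumPy.
    """
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, ord=np.inf)
    # Scale A down so that the Taylor series converges quickly.
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A_scaled = A / (2 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A_scaled / k
        result = result + term
    # Undo the scaling by repeated squaring.
    for _ in range(squarings):
        result = result @ result
    return result

def phase_type_cdf(x, alpha, S):
    """Textbook phase-type CDF: F(x) = 1 - alpha . exp(S x) . 1."""
    alpha = np.asarray(alpha, dtype=float)
    S = np.asarray(S, dtype=float)
    ones = np.ones(S.shape[0])
    return 1.0 - alpha @ expm_taylor(S * x) @ ones

# Toy two-state sub-intensity matrix (illustrative values only).
# Both states leave the tract at total rate 2, so the CDF is 1 - exp(-2x).
S = np.array([[-3.0, 1.0],
              [0.0, -2.0]])
alpha = np.array([1.0, 0.0])
values = [phase_type_cdf(x, alpha, S) for x in (0.1, 0.5, 2.0)]
```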
- PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None, hybrid_pedigree=False)#
Computes a Phase-Type CDF on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) come from a PhTDioecious object, together with the specification of a population of interest, and are also passed directly as inputs.
- Parameters:
S (npt.ArrayLike) – The transition submatrix.
alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.
S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.
bins (npt.ArrayLike) – A point grid on \((0, L)\) where the CDF has to be evaluated.
L (float) – The length of the finite chromosome.
s1 (float) – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
hybrid_pedigree (bool, default False) – For internal use only. This parameter indicates whether a hybrid pedigree model is being used.
pop_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
exp_Sx_per_bin (npt.ArrayLike) – The precomputed values of \(e^{\mathbf{S}x}\) for every \(x\) in bins. Used internally to speed up computation.
- Return type:
tuple[ndarray,float,float,float]
- Returns:
npt.ArrayLike – The CDF evaluated on bins.
float – The tract length expectation of the corresponding model considering an infinite chromosome.
float – The normalization factor Z of the corresponding model.
float – The tract length expectation on the finite chromosome of the corresponding model.
- PhT_density(x, population_number, s1=None)#
Computes a Phase-type density at a given point \(x\) in \((0, \infty)\). The Phase-type parameters (initial state, transition matrix) are taken from a PhTDioecious object together with the specification of a population of interest.
- Parameters:
x (float) – A point in \((0, \infty)\) where the density function is evaluated.
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.
s1 (int|None) – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
- Returns:
The density value at \(x\).
- Return type:
float
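The textbook phase-type density is \(f(x) = \boldsymbol{\alpha}\, e^{\mathbf{S}x}\, \mathbf{s}_0\) with exit-rate vector \(\mathbf{s}_0 = -\mathbf{S}\mathbf{1}\). The sketch below checks that form against a known closed form for two exponential stages in series. The function name phase_type_density and the toy rates are assumptions, and the eigendecomposition shortcut is valid only for diagonalizable matrices; it is not claimed to be how the library evaluates \(e^{\mathbf{S}x}\).

```python
import numpy as np

def phase_type_density(x, alpha, S):
    """Textbook phase-type density f(x) = alpha . exp(S x) . s0, s0 = -S . 1.

    exp(S x) is computed by eigendecomposition, which assumes S is
    diagonalizable (true for the toy matrix below).
    """
    alpha = np.asarray(alpha, dtype=float)
    S = np.asarray(S, dtype=float)
    exit_rates = -S.sum(axis=1)               # s0: rate of ending the tract
    eigvals, eigvecs = np.linalg.eig(S)
    # V . diag(exp(lambda x)) . V^{-1}
    exp_Sx = (eigvecs * np.exp(eigvals * x)) @ np.linalg.inv(eigvecs)
    return float(np.real(alpha @ exp_Sx @ exit_rates))

# Two exponential stages in series with rates 3 and 1 (a hypoexponential).
S = np.array([[-3.0, 3.0],
              [0.0, -1.0]])
alpha = np.array([1.0, 0.0])

x = 0.7
# Known closed form: l1*l2/(l1-l2) * (exp(-l2 x) - exp(-l1 x)).
closed_form = (3.0 * 1.0 / (3.0 - 1.0)) * (np.exp(-1.0 * x) - np.exp(-3.0 * x))
density_at_x = phase_type_density(x, alpha, S)
```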
- PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None, hybrid_pedigree=False)#
Computes a Phase-type density on a finite chromosome of length \(L\) and evaluates it on a point grid. The Phase-type parameters (initial state, transition matrix) come from a PhTDioecious object, together with the specification of a population of interest, and are also passed directly as inputs.
- Parameters:
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
S (npt.ArrayLike) – The transition submatrix.
alpha (npt.ArrayLike) – The initial state of the Phase-type distribution.
S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.
bins (npt.ArrayLike) – A point grid on \((0, L)\) where the density has to be evaluated.
L (float) – The length of the finite chromosome.
s1 (int|None) – The sex of the individual at generation \(t=1\). If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
hybrid_pedigree (bool, default False) – For internal use only. This parameter indicates whether a hybrid pedigree model is being used.
exp_Sx_per_bin (npt.ArrayLike) – The precomputed values of \(e^{\mathbf{S}x}\) for every \(x\) in bins. Used internally to speed up computation.
- Returns:
npt.ArrayLike – The corrected bins grid as described in Notes.
npt.ArrayLike – The density evaluated on bins.
float – The tract length expectation of the corresponding model.
Notes
The code truncates bins to the interval \([0,L]\) and adds the point \(L\) if it is not included in bins. This is done because the density is defined on the finite chromosome \([0,L]\) as a mixture of a continuous density on \([0,L)\) and a Dirac measure at \(L\). Consequently, the function returns the transformed grid as its first value, which can be used as the x-axis to plot the density.
Do not call this function directly. To get a Phase-type density on a finite chromosome, use tractlength_histogram_windowed() with density=True.
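The grid correction described in the Notes can be illustrated with a short re-implementation. The function truncate_bins is a hypothetical stand-in written for this example, it assumes a sorted input grid, and it is not the library's internal routine.

```python
import numpy as np

def truncate_bins(bins, L):
    """Illustrative grid correction: keep points inside [0, L] and make
    sure L itself is present. Assumes `bins` is sorted in increasing order."""
    bins = np.asarray(bins, dtype=float)
    kept = bins[(bins >= 0.0) & (bins <= L)]
    # Append L when the truncated grid does not already end at L.
    if kept.size == 0 or not np.isclose(kept[-1], L):
        kept = np.append(kept, L)
    return kept

# The input grid runs past the chromosome end at L = 2.2.
grid = truncate_bins(np.linspace(0.0, 3.0, 7), L=2.2)
```

The returned grid ends exactly at \(L\), which is where the Dirac measure for full-length tracts sits.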
- PhT_parameters_DC(parent_sex, T_pedigree=0, migration_setting_at_TP=None)#
Computes the parameters of the Dioecious-Coarse model, given the sex of the parent at generation \(t=1\), that is, the value of \(\xi\).
- Parameters:
parent_sex (int) – The sex of the individual at generation \(t=1\). If parent_sex=0 (resp. parent_sex=1), paternally- (resp. maternally-) inherited tracts are considered.
T_pedigree (int) – The number of generations in the pedigree when computing the hybrid-pedigree refinement of this model. If the hybrid-pedigree refinement is not being computed, this parameter is ignored.
migration_setting_at_TP (ndarray) – A binary matrix of shape (T_pedigree, number of populations) describing the migration setting at generation T_pedigree for the hybrid-pedigree refinement of this model. The entry at row t and column p is 0 if the ancestor from population p is admixed at generation t, and 1 otherwise. If the hybrid-pedigree refinement is not being computed, this parameter is ignored.
- Returns:
S (npt.ArrayLike) – The transition matrix of the Dioecious-Coarse model.
source_pops (npt.ArrayLike) – The populations from which tracts can be drawn in the model.
sub_matrices (list of npt.ArrayLike) – The list of transition sub-matrices corresponding to tracts drawn from each source population.
alpha_list (list of npt.ArrayLike) – The list of initial states corresponding to tracts drawn from each source population.
Notes
Transition probabilities for the Dioecious-Coarse model are computed by appropriately averaging transition probabilities under the Dioecious-Fine model, as the DC model is built as a quotient Markov chain of the DF model.
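The idea of a quotient chain obtained by averaging can be illustrated generically. In the sketch below, the function lump_chain, the partition, and the weights are all assumptions chosen for the example; the weights actually used to average the DF transition probabilities are defined in the manuscript and are not reproduced here.

```python
import numpy as np

def lump_chain(P, blocks, weights):
    """Aggregate transition matrix P over a partition of its states.

    blocks  : list of lists of state indices (the partition)
    weights : nonnegative weight per fine state, used to average the rows
              inside each block
    """
    P = np.asarray(P, dtype=float)
    weights = np.asarray(weights, dtype=float)
    Q = np.zeros((len(blocks), len(blocks)))
    for a, block_a in enumerate(blocks):
        # Normalized weights of the fine states inside block a.
        w = weights[block_a] / weights[block_a].sum()
        for b, block_b in enumerate(blocks):
            # Probability of moving into block b, for each fine state of a.
            into_b = P[np.ix_(block_a, block_b)].sum(axis=1)
            Q[a, b] = w @ into_b
    return Q

# Toy fine chain with three states, merged into two coarse states.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
blocks = [[0, 1], [2]]
weights = np.array([1.0, 3.0, 1.0])
Q = lump_chain(P, blocks, weights)
```

Each row of the coarse matrix still sums to 1, so the aggregated object remains a valid transition matrix on the smaller state space.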
- PhT_parameters_DF(parent_sex, computing_coarse=False, pulses=None, T_pedigree=0, migration_setting_at_TP=None)#
Computes the parameters of the Phase-type distribution under the Dioecious-Fine admixture model.
- Parameters:
parent_sex (int) – The sex of the parent from which the tract is inherited. Must be either 0 (paternal inheritance) or 1 (maternal inheritance).
computing_coarse (bool) – Whether the parameters are being computed for the coarse model, which also calls this function. If False, the parameters are computed for the fine model.
pulses (ndarray) – The pulse matrix of the model, of shape (number of pulses, 4), as specified in discrete_prob_DF(). If not provided, it is computed from the migration matrices.
T_pedigree (int) – If the hybrid-pedigree refinement is used, the number of generations included in the pedigree. Ignored otherwise.
migration_setting_at_TP (ndarray) – If the hybrid-pedigree refinement is used, the migration setting at the time of pedigree truncation. See hybrid_pedigree_distribution() for details. Ignored otherwise.
- Returns:
npt.ArrayLike – The transition matrix of the Phase-type distribution.
npt.ArrayLike – The source populations involved in the model.
npt.ArrayLike – The transition submatrices for each source population.
npt.ArrayLike – The initial state of the Phase-type distribution.
- S_matrix(states, pulses, T_ped, D_model='DF')#
Compute the transition matrix of the TCMC defined by the Dioecious admixture model. Corresponds to Equation (EQ) for the Dioecious-Fine model and to Equation (EQ) for the Dioecious-Coarse model.
- Parameters:
states (npt.ArrayLike) – The state space of the process, represented as a matrix of shape (number of states, 3 + T), where T is the maximum generation of migration. Each row corresponds to a state and is of the form [\(p\), \({\delta}\), \(m_p^{\delta}(t)\), \(s_0\), \(s_1\), …, \(s_{t-1}\)], where \(p\) is the ancestral population and \({\delta}\) is the sex of the ancestor.
pulses (np.ndarray) – The pulse matrix of the model, of shape (number of pulses, 4), as specified in discrete_prob_DF().
T_ped (int) – If the hybrid-pedigree refinement is used, the number of generations included in the pedigree. Ignored otherwise.
D_model (str) – The type of discrete model for which the transition matrix is computed. Must be either ‘DF’ for the Dioecious-Fine model or ‘DC’ for the Dioecious-Coarse model.
- Returns:
The transition matrix of the TCMC defined by the Dioecious admixture model, in compressed sparse row format. The order of the states in the matrix corresponds to the order of the states in the input states matrix.
- Return type:
sparse.csr_matrix
- __init__(migration_matrix_f, migration_matrix_m, rho_f, rho_m, X_chromosome=False, X_chromosome_male=False, sex_model='DC', TPED=0, setting_TP=None)#
Initializes the PhTDioecious object by constructing the transition matrix and the initial state of the Phase-Type distribution.
- discrete_prob_DF(pulses, state_left, state_right, T_ped=0)#
Compute the transition probabilities between a pair of states for the embedded Dioecious-Fine Markov model, which is a discrete TCMC. This corresponds to Equation (EQ) in the manuscript.
- Parameters:
pulses (ndarray) – Pulse matrix of the model, with shape (number of pulses, 4). Each row is of the form \((p, \delta, t, m_p^{\delta}(t))\).
state_left (ndarray) – Current state of the embedded process; see Notes.
state_right (ndarray) – Next state of the embedded process; see Notes.
T_ped (int) – Number of generations included in the pedigree when the hybrid-pedigree refinement is used. Ignored otherwise.
Notes
States in the DF model are of the form \((\delta, p, \vec{s})\), where \(\delta\) is the sex of the ancestor, \(p\) is their ancestral population, and \(\vec{s}\) contains the sexes of all individuals that carried the haplotype from the present time up to the ancestor.
In the code, states are represented as vectors of length \(3 + t\) of the form \((p, \delta, m_p^{\delta}(t), s_0, s_1, \ldots, s_{t-1})\), where \(t\) is the generation of the ancestor. See Section (SEC) in the manuscript for details on the DF model.
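The vector layout above can be made concrete with a pair of small helpers. Both encode_state and decode_state are hypothetical names introduced for this illustration, and the 0/1 coding of sexes is an assumption borrowed from the parent_sex convention used elsewhere on this page. The library stores states as rows of an array rather than Python lists.

```python
def encode_state(p, delta, migration, sexes):
    """Hypothetical helper packing a DF state into the vector layout
    (p, delta, m_p^delta(t), s_0, ..., s_{t-1}) described above."""
    return [p, delta, migration, *sexes]

def decode_state(state):
    """Split a packed state back into named fields."""
    p, delta, migration, *sexes = state
    return {
        "population": p,
        "ancestor_sex": delta,
        "migration": migration,
        "sexes": sexes,
        "generation": len(sexes),   # t, the generation of the ancestor
    }

# An ancestor from population 1 at generation t = 3 (illustrative values).
state = encode_state(p=1, delta=0, migration=0.2, sexes=[1, 0, 1])
info = decode_state(state)
```

The packed vector has length \(3 + t\), matching the description in the Notes.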
- submodel_probabilities(population_number, s1)#
Computes the probability for a tract at generation \(t=0\) to be drawn from the connected Markov model or the single-state isolated models that yield tracts of full length. See Appendix G in the manuscript for details.
- Parameters:
population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
s1 (int|None) – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
- Returns:
float – The probability for a tract at generation \(t=0\) to be drawn from the single-state isolated model that yields tracts of full length.
float – The probability for a tract at generation \(t=0\) to be drawn from the connected Markov model.
- tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)#
Calculates the tract length histogram on multiple chromosomes of different lengths.
- Parameters:
population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid where the histogram has to be computed. The same grid is used for all chromosomes, and should be defined on the interval (0, max(chrom_lengths)).
chrom_lengths (npt.ArrayLike) – A list of chromosome lengths.
- Returns:
The histogram values on the intervals defined by bins, summed across all chromosomes.
- Return type:
npt.ArrayLike
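The summation across chromosomes on a shared grid can be sketched as follows. The single-chromosome model toy_histogram is a hypothetical exponential stand-in, not the Phase-type model, and any per-chromosome scaling the library may apply is not represented.

```python
import numpy as np

def toy_histogram(bins, L, rate=1.5):
    """Hypothetical single-chromosome histogram: an exponential tract-length
    model in which tracts longer than L are recorded with length L."""
    bins = np.asarray(bins, dtype=float)
    # CDF of the truncated model: all remaining mass is reached at L.
    cdf = np.where(bins < L, 1.0 - np.exp(-rate * bins), 1.0)
    return np.diff(cdf)

chrom_lengths = [1.0, 2.0, 2.8]
# One shared grid covering the longest chromosome.
bins = np.linspace(0.0, max(chrom_lengths), 15)

# Sum the per-chromosome histograms bin by bin.
total = np.zeros(len(bins) - 1)
for L in chrom_lengths:
    total += toy_histogram(bins, L)
```

Because every chromosome is evaluated on the same grid, the bins beyond a short chromosome's length simply contribute zero for that chromosome.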
- tractlength_histogram_windowed(population_number, bins, L, exp_Sx_per_bin_f=None, exp_Sx_per_bin_m=None, density=False, freq=False, return_only=None, hybrid_ped=False)#
Calculates the tract length histogram or density function on a finite chromosome, using the PhTDioecious admixture model.
- Parameters:
population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid where the CDF or density has to be computed.
L (float) – The length of the finite chromosome.
exp_Sx_per_bin_f (npt.ArrayLike) – The precomputed values of \(e^{\mathbf{S}x}\) for every \(x\) in bins, for the maternally inherited alleles. Used internally to speed up computation.
exp_Sx_per_bin_m (npt.ArrayLike) – The precomputed values of \(e^{\mathbf{S}x}\) for every \(x\) in bins, for the paternally inherited alleles. Used internally to speed up computation.
density (bool) – If True, computes the Phase-type density values evaluated on the grid. Otherwise, returns the histogram values on the grid.
freq (bool) – If density is True, whether to return density on the frequency scale. If True, the density values are scaled so that their integral over \((0,L)\) is equal to the expected number of tracts on \((0,L)\). If False, the density values integrate to 1 over \((0,L)\).
return_only (int|None) – For internal use only. Manages the combination of maternally and paternally inherited tracts. If set to 0 (resp. 1), only paternally (resp. maternally) inherited tracts are considered. If None, tracts from both parents are combined. If the X chromosome is considered and the individual at generation 0 is male (X_chromosome_male = True), this parameter is ignored and only maternally inherited tracts are computed.
hybrid_ped (bool) – For internal use only. Whether the hybrid pedigree model is being used. If True, no scale corrections are performed and densities or CDFs corresponding to connected components are returned, to be combined in the hybrid_pedigree module.
- Return type:
tuple[ndarray,ndarray,float]
- Returns:
npt.ArrayLike – If density is True, the corrected bins grid as described in Notes. Else, the bins provided as input.
npt.ArrayLike – If density is True, the Phase-type density evaluated on the corrected bins grid. Returned on the frequency scale if freq = True. If density is False, the histogram values on the intervals defined by bins.
float – The tract length expectation of the corresponding model.
Notes
When density is True, the first returned argument is a transformed version of the input bins grid. This is because the density is defined on the finite chromosome \([0,L]\) as a mixture of a density with support on \((0,L)\) and point masses at 0 and L. The returned bins grid removes the points \(0\) and \(L\) if they were included in the input bins grid, since the density is not defined at these points. The returned density values correspond to this transformed bins grid.
For details on the scale factors and the transformation of the Phase-type densities into histograms, see Appendix F.3 of the manuscript.
The return_only parameter is used to select only maternally or paternally inherited tracts. Besides controlling the case of allosomal admixture on male individuals, it is used to return distributions corresponding to connected components in the hybrid pedigree model, that need to be combined a posteriori: connected components corresponding to maternally (resp. paternally) -inherited alleles are first combined into one maternally (resp. paternally) -inherited distribution. Then, the resulting pair of Phase-type mixtures is combined at the end. See hybrid_pedigree.py for details.
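The difference between the probability scale and the frequency scale (the freq parameter) can be illustrated numerically. The exponential density, the value of expected_tracts, and the helper trapezoid are assumptions for this toy example, which covers only the continuous part on \((0,L)\) and ignores the point masses; the exact scale factors are those of Appendix F.3.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

L = 2.0
grid = np.linspace(1e-6, L - 1e-6, 2001)    # interior points only

# Toy continuous part: an exponential density renormalized on (0, L).
rate = 1.5
density = rate * np.exp(-rate * grid) / (1.0 - np.exp(-rate * L))

expected_tracts = 12.0                       # hypothetical expected tract count
density_freq = expected_tracts * density     # frequency scale

area_prob = trapezoid(density, grid)         # close to 1
area_freq = trapezoid(density_freq, grid)    # close to expected_tracts
```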
- unnormalized_prob_sex_vector(pulses, state_left, T_ped=0)#
Compute the unnormalized probability of the sex vector. Corresponds to Equation (EQ) in the manuscript.
- Parameters:
pulses (ndarray) – The pulse matrix of the model, of shape (number of pulses, 4), as specified in discrete_prob_DF().
state_left (ndarray) – The current state of the embedded process.
T_ped (int) – If the hybrid-pedigree refinement is used, the number of generations included in the pedigree. Ignored otherwise.