tracts.phase_type_distribution
Functions
- get_survival_factors: Takes a migration matrix of T generations and returns a list of length T, which is the probability of a migrant allele from that generation surviving to the present.
Classes
- PhTDioecious: A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Dioecious Fine (DF) and Dioecious Coarse (DC) Markov approximations.
- PhTMonoecious: A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Monoecious Markov approximation.
- PhaseTypeDistribution: A class representing the phase-type distribution of tract lengths generated by a given (pair of) migration matrix (matrices).
- class tracts.phase_type_distribution.PhTDioecious(migration_matrix_f, migration_matrix_m, rho_f, rho_m, X_chromosome=False, X_chromosome_male=False, sex_model='DC', TPED=0, setting_TP=None)
Bases:
PhaseTypeDistribution
A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Dioecious Fine (DF) and Dioecious Coarse (DC) Markov approximations.
- X_chr
Whether admixture is considered on the X chromosome. Set to the value given as input by the X_chromosome parameter.
- Type:
bool
- X_chr_male
If X_chr is True, whether the sex of the individual at generation 0 is male. Set to the value given as input by the X_chromosome_male parameter. If not X_chr, this attribute is ignored.
- Type:
bool
- rho_f
The female-specific recombination rate, given by the input parameter rho_f.
- rho_m
The male-specific recombination rate, given by the input parameter rho_m.
- migration_matrix_f
A transformed version of the female migration matrix given as input. For internal use only.
- Type:
npt.ArrayLike
- migration_matrix_m
A transformed version of the male migration matrix given as input. For internal use only.
- Type:
npt.ArrayLike
- num_populations
The number of populations considered in the demographic model.
- Type:
int
- num_generations
The number of generations considered in the demographic model.
- Type:
int
- t0_proportions_f
The ancestry proportion in the present population from each ancestry among all the haploid copies inherited from a female parent. For autosomes, computed using Eq. (F-27) in the manuscript. For the X chromosome, computed using the recursive equations (F-28) and (F-29) in the manuscript.
- Type:
npt.ArrayLike
- t0_proportions_m
The ancestry proportion in the present population from each ancestry among all the haploid copies inherited from a male parent. For autosomes, computed using Eq. (F-27) in the manuscript. For the X chromosome, computed using the recursive equations (F-28) and (F-29) in the manuscript.
- Type:
npt.ArrayLike
- sex_model
The Dioecious approximation being used. Taken from the input parameter sex_model, which is either ‘DF’ or ‘DC’.
- full_transition_matrix_f
The intensity matrix S(f) of the Dioecious (Fine or Coarse) Markov Model. Corresponds to Eq. (EQ) and (EQ) in the manuscript for DF and DC respectively. This submodel corresponds to the maternally (s1=xi=f) inherited alleles.
- Type:
npt.ArrayLike
- full_transition_matrix_m
Counterpart of full_transition_matrix_f for the paternally (s1=xi=m) inherited alleles.
- Type:
npt.ArrayLike
- alpha_list_f
A list containing, for each ancestral population p, the initial state of the (DF or DC) Phase-Type distribution for the tract length of maternally (s1=xi=f) inherited alleles of population p. Corresponds to Eq. (EQ) in the manuscript.
- Type:
list
- alpha_list_m
Counterpart of alpha_list_f for the paternally (s1=xi=m) inherited alleles.
- Type:
list
- transition_matrices_f
A list containing, for each ancestral population p, the submatrix of full_transition_matrix_f corresponding to transitions within p. It is used to compute the distribution of tract lengths of maternally (s1=xi=f) inherited alleles from population p.
- Type:
list
- transition_matrices_m
Counterpart of transition_matrices_f for the paternally (s1=xi=m) inherited alleles.
- Type:
list
- S0_list_f
A list containing the sum across columns of every transition matrix in transition_matrices_f.
- Type:
list
- S0_list_m
Counterpart of S0_list_f for the paternally (s1=xi=m) inherited alleles.
- Type:
list
- inverse_S0_list_f
A list containing the sum across columns of the inverse of every transition matrix in transition_matrices_f.
- Type:
list
- inverse_S0_list_m
Counterpart of inverse_S0_list_f for the paternally (s1=xi=m) inherited alleles.
- Type:
list
- Parameters:
migration_matrix_f (npt.ArrayLike) – An array containing the female migration proportions from a discrete number of populations over the last generations. Each row is a time, each column is a population. Row zero corresponds to the current generation. The (i,j) element of this matrix specifies the proportion of female individuals from the admixed population that are replaced by female individuals from population j at generation i. The last row (migration_matrix_f[-1,:]) is the “founding generation” and should sum to 1.
migration_matrix_m (npt.ArrayLike) – Counterpart of migration_matrix_f for male migration rates.
rho_f (float, default 1) – The female-specific recombination rate (positive real number).
rho_m (float, default 1) – The male-specific recombination rate (positive real number). For X chromosome admixture, this value is ignored and set to 0.
X_chromosome (bool, default False) – Whether admixture is considered on the X chromosome. If False, the model considers autosomal admixture.
X_chromosome_male (bool, default False) – If X_chromosome is True, whether the individual at generation 0 is a male. In that case, only maternally inherited alleles are taken into account. If not X_chromosome, set to False.
sex_model (default 'DC') – The Dioecious model to be considered. Takes the value ‘DF’ for Dioecious Fine and ‘DC’ for Dioecious Coarse.
TPED (int, default 0) – For internal use only.
setting_TP (default None) – For internal use only.
Notes
The Dioecious Coarse model (sex_model = ‘DC’) should be preferred over the Dioecious Fine model (sex_model = ‘DF’) due to its computational efficiency. Both models produce very similar or identical phase-type densities unless there is a strong sex bias in migration or recombination rates. For autosomal admixture, the Monoecious model should be used instead, for the same reasons, unless the sex bias is exceptionally strong.
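As a minimal sketch of the input layout described under Parameters, the snippet below builds a pair of sex-specific migration matrices for a hypothetical single-pulse scenario with two source populations, and checks that the founding row sums to 1. The helper names `pulse_migration_matrix` and `founding_row_is_valid`, and all numeric values, are illustrative assumptions and not part of tracts.

```python
def pulse_migration_matrix(num_rows, founding_proportions):
    """Build a matrix with one row per generation (row 0 = present).

    Only the last row (the founding generation) is non-zero.
    Hypothetical helper, not part of tracts.
    """
    num_populations = len(founding_proportions)
    matrix = [[0.0] * num_populations for _ in range(num_rows)]
    matrix[-1] = list(founding_proportions)
    return matrix


def founding_row_is_valid(matrix, tol=1e-9):
    """Check that the founding generation (last row) sums to 1."""
    return abs(sum(matrix[-1]) - 1.0) < tol


# Illustrative sex-biased founding proportions (assumed values).
migration_matrix_f = pulse_migration_matrix(10, [0.7, 0.3])
migration_matrix_m = pulse_migration_matrix(10, [0.4, 0.6])

# These nested lists could then be converted with numpy.asarray and passed
# as migration_matrix_f and migration_matrix_m, together with rho_f and
# rho_m, to the PhTDioecious constructor listed above.
```

Keeping the two matrices separate is what lets the model express sex-biased migration; with identical matrices and equal recombination rates there is no sex bias, which is the situation where the Notes recommend the Monoecious model for autosomes.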
- PhT_CDF(x, population_number, s1=None)
Computes a Phase-Type CDF at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object together with the specification of a population of interest.
- Parameters:
x (float) – A point in (0, infinity) where the CDF is evaluated.
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.
s1 – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
- Returns:
The CDF value at x.
- Return type:
float
- PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None, hybrid_pedigree=False)
Computes a Phase-Type CDF on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object (together with the specification of a population of interest) but also directly introduced as an input.
- Parameters:
S (npt.ArrayLike) – The transition submatrix.
alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.
S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.
bins (npt.ArrayLike) – A point grid on (0, L) where the CDF has to be evaluated.
L (float) – The length of the finite chromosome.
s1 (float) – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
hybrid_pedigree (bool, default False) – For internal use only. This parameter indicates whether a hybrid pedigree model is being used.
pop_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.
- Return type:
npt.ArrayLike
- Returns:
npt.ArrayLike – The CDF evaluated on bins.
float – The tract length expectation of the corresponding model considering an infinite chromosome.
float – The normalization factor Z of the corresponding model.
float – The tract length expectation on the finite chromosome of the corresponding model.
- PhT_density(x, population_number, s1=None)
Computes a Phase-Type density at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object together with the specification of a population of interest.
- Parameters:
x (float) – A point in (0, infinity) where the density function is evaluated.
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.
s1 – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
- Returns:
The density value at x.
- Return type:
float
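For orientation, the textbook phase-type formulas on an infinite line are F(x) = 1 - alpha exp(Sx) 1 for the CDF and f(x) = alpha exp(Sx) s0 for the density, with exit-rate vector s0 = -S 1. The sketch below evaluates both in pure Python for a two-state Erlang example, where closed forms are available for comparison. The helper names are hypothetical, the matrix exponential is a simple scaling-and-squaring Taylor approximation, and whether PhT_CDF and PhT_density use exactly this form internally is an assumption.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, terms=30, squarings=10):
    """Approximate exp(m) with a truncated Taylor series plus squaring."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[v / scale for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes scaled**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def pht_cdf(x, alpha, S):
    """Textbook phase-type CDF: 1 - alpha exp(Sx) 1."""
    E = mat_exp([[v * x for v in row] for row in S])
    return 1.0 - sum(alpha[i] * sum(E[i]) for i in range(len(alpha)))


def pht_density(x, alpha, S):
    """Textbook phase-type density: alpha exp(Sx) s0, with s0 = -S 1."""
    E = mat_exp([[v * x for v in row] for row in S])
    exit_rates = [-sum(row) for row in S]
    return sum(alpha[i] * sum(E[i][j] * exit_rates[j] for j in range(len(S)))
               for i in range(len(alpha)))


# Erlang(2, rate): two exponential stages visited in sequence.
rate = 2.0
alpha = [1.0, 0.0]
S = [[-rate, rate], [0.0, -rate]]
```

In practice a library routine such as `scipy.linalg.expm` would replace the hand-written exponential; the pure-Python version is used here only to keep the sketch dependency-free.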
- PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None, hybrid_pedigree=False)
Computes a Phase-Type density on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object (together with the specification of a population of interest) but also directly introduced as an input.
- Parameters:
S (npt.ArrayLike) – The transition submatrix.
alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.
S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.
bins (npt.ArrayLike) – A point grid on (0, L) where the density has to be evaluated.
L (float) – The length of the finite chromosome.
s1 – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.
hybrid_pedigree (bool, default False) – For internal use only. This parameter indicates whether a hybrid pedigree model is being used.
exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.
- Returns:
npt.ArrayLike – The corrected bins grid as described in Notes.
npt.ArrayLike – The density evaluated on bins.
float – The tract length expectation of the corresponding model.
Notes
The code truncates bins to the interval [0,L] and adds the point L if it is not included in bins. This is done because the density is defined on the finite chromosome [0,L] as a mixture of a continuous density on [0,L) and a Dirac measure at L. Consequently, the function returns as a first argument the transformed grid, that can be used as x-axis to plot the density.
Don’t run this function directly. To get a PhT density on a finite chromosome, use tractlength_histogram_windowed setting density=True.
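The grid correction described in these Notes can be sketched as follows. This is an illustrative reading of the text (keep the points in [0, L], append L if it is missing), assuming a sorted input grid; the function name `truncate_bins` is hypothetical and the actual implementation may differ in detail.

```python
def truncate_bins(bins, L):
    """Keep grid points inside [0, L] and append L if it is missing.

    Assumes bins is sorted in increasing order.
    """
    kept = [b for b in bins if 0.0 <= b <= L]
    if not kept or kept[-1] != L:
        kept.append(L)
    return kept


# A grid that overshoots a chromosome of length 2.0 (illustrative values).
corrected = truncate_bins([0.0, 0.5, 1.0, 1.5, 2.5], 2.0)
```

The corrected grid is what would serve as the x-axis when plotting, since the last point L carries the point mass of tracts spanning the whole chromosome.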
- PhT_parameters_DC(parent_sex, T_pedigree=0, migration_setting_at_TP=None)
- PhT_parameters_DF(parent_sex, computing_coarse=False, pulses=None, T_pedigree=0, migration_setting_at_TP=None)
- S_matrix(states, pulses, xi, T_ped, D_model='DF')
- __init__(migration_matrix_f, migration_matrix_m, rho_f, rho_m, X_chromosome=False, X_chromosome_male=False, sex_model='DC', TPED=0, setting_TP=None)
- calculate_probabilities_at_population(population_number, s1)
- discrete_prob_DF(pulses, state_left, state_right, T_ped=0)
- full_CDF(L, S, exp_SL=None, alpha=None, S0_inv=None)
Computes the length distribution of tracts spanning the whole chromosome of length L.
- Parameters:
L (float) – The chromosome length.
S (npt.ArrayLike) – The transition submatrix.
exp_SL (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^SL, alpha and S0_inv.
- static initialize_CDF_values(bins, S0_inv, alpha, L)
- static initialize_density_bins(bins, L, alpha, S0_inv)
- inner_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculates the CDF of tract lengths fully contained within the chromosome of length L.
- Parameters:
x (float) – The tract length at which the CDF is evaluated.
S (npt.ArrayLike) – The transition submatrix.
L (float) – The chromosome length.
exp_Sx (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^Sx, alpha and S0_inv.
- loglik(bins, Ls, data, num_samples, cutoff=0)
Calculates the log-likelihood under a Poisson Random Field model. Used to fit model parameters by maximum likelihood.
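As a rough illustration of a Poisson Random Field likelihood, the sketch below treats each histogram bin as an independent Poisson count whose mean is the model's expected count. The function `poisson_loglik` is a hypothetical stand-in with a simplified signature. How the library's loglik builds expected counts from bins, Ls and num_samples, whether it keeps the factorial term, and the assumption that cutoff skips the first bins are not confirmed by this page.

```python
import math


def poisson_loglik(expected, observed, cutoff=0):
    """Sum of per-bin Poisson log-probabilities, skipping the first bins.

    expected: model means per bin (must be positive).
    observed: observed tract counts per bin.
    cutoff: number of leading bins to ignore (assumed meaning).
    """
    total = 0.0
    for mu, k in zip(expected[cutoff:], observed[cutoff:]):
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total


# Illustrative values: two bins with model means 2.0 and 1.0.
full = poisson_loglik([2.0, 1.0], [2, 0])
trimmed = poisson_loglik([2.0, 1.0], [2, 0], cutoff=1)
```

Dropping the shortest bins is a common choice when short tracts are poorly called, which is one plausible motivation for a cutoff parameter.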
- normalization_factor(L, S, S0_inv=None, alpha=None, exp_SL=None)
Computes the normalization factor Z from S0_inv and chromosome length L.
- outer_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculates the length distribution of tract lengths hitting a single chromosome edge.
- Parameters:
x (float) – The tract length at which the CDF is evaluated.
S (npt.ArrayLike) – The transition submatrix.
L (float) – The chromosome length.
exp_Sx (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^Sx, alpha and S0_inv.
- populate_CDF_values(bins, CDF_values, prop_isolated, prop_connected, exp_Sx_per_bin, S, alpha, S0_inv, L, ET, ETL, Z)
- populate_density_bins(bins, population_number, ETL, prop_connected, prop_isolated, exp_Sx_per_bin, L, Z, s1, alpha, S, S0_inv)
- tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)
Calculates the tract length histogram on multiple chromosomes of lengths chrom_lengths.
- Return type:
npt.ArrayLike
- tractlength_histogram(population_number, bins, density=False)
Gets the tract length histogram or density evaluated on a point grid using a PhT object. This function considers an infinite chromosome.
- Parameters:
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid on (0, Inf) where the CDF or density have to be evaluated.
density (bool, default False) – If True, computes the PhT density. Else, returns the histogram values on the grid.
- Returns:
If density, the density evaluated on bins. If not density, the histogram values on every interval defined by bins.
- Return type:
npt.ArrayLike
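The relation between a CDF and "histogram values on every interval defined by bins" can be sketched as differences of the CDF at consecutive grid points. The example uses an exponential CDF as a stand-in for a phase-type CDF; the function names are hypothetical, and any additional scaling the library applies to its histogram is not shown.

```python
import math


def histogram_from_cdf(cdf, bins):
    """Probability mass in each interval [bins[i], bins[i+1])."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(bins[:-1], bins[1:])]


def exponential_cdf(x, rate=1.5):
    """Stand-in CDF (a one-state phase-type distribution)."""
    return 1.0 - math.exp(-rate * x)


bins = [0.0, 0.5, 1.0, 2.0]
hist = histogram_from_cdf(exponential_cdf, bins)
```

A grid of n points yields n - 1 histogram values, and their sum equals the CDF at the last grid point minus the CDF at the first.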
- tractlength_histogram_windowed(population_number, bins, L, exp_Sx_per_bin_f=None, exp_Sx_per_bin_m=None, density=False, freq=False, return_only=None, hybrid_ped=False)
Calculates the tractlength histogram or density function on a finite chromosome, using the PhTDioecious admixture model.
- Parameters:
population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid where the CDF or density have to be computed.
L (float) – The length of the finite chromosome.
exp_Sx_per_bin_f (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins, for the maternally inherited alleles. Used internally to speed up computation.
exp_Sx_per_bin_m (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins, for the paternally inherited alleles. Used internally to speed up computation.
density (bool, default False) – If True, computes the PhT density values evaluated on the grid. Else, returns the histogram values on the grid.
freq (bool, default False) – If density is True, whether to return density on the frequency scale. If True, the density values are scaled so that their integral over (0,L) is equal to the expected number of tracts on (0,L). If False, the density values integrate to 1 over (0,L).
return_only (int, default None) – For internal use only, to manage the combination of maternally and paternally inherited tracts. If set to 0 (resp. 1), only paternally (resp. maternally) inherited tracts are considered. If None, tracts from both parents are combined. If the X chromosome is considered and the individual at generation 0 is male (X_chromosome_male = True), this parameter is ignored and only maternally inherited tracts are computed.
hybrid_ped (bool, default False) – For internal use only. Whether the hybrid pedigree model is being used. If True, no scale corrections are performed and densities or CDFs corresponding to connected components are returned, to be combined in the hybrid_pedigree module.
- Return type:
npt.ArrayLike
- Returns:
npt.ArrayLike – If density is True, the corrected bins grid as described in Notes. Else, the bins provided as input.
npt.ArrayLike – If density is True, the PhT density evaluated on the corrected bins grid. Returned on the frequency scale if freq = True. If density is False, the histogram values on the intervals defined by bins.
float – The tract length expectation of the corresponding model.
Notes
When density=True, the first returned argument is a transformed version of the input bins grid. This is because the density is defined on the finite chromosome [0,L] as a mixture of a density with support on (0,L) and point masses at 0 and L. The returned bins grid removes the points 0 and L if they were included in the input bins grid, since the density is not defined at these points. The returned density values correspond to this transformed bins grid.
For details on the scale factors and the transformation of the PhT densities into histograms, see Appendix F.3 of the manuscript.
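The effect of the freq option can be illustrated with a toy density. In the sketch below, a density that integrates to 1 over (0, L) is rescaled so that its integral equals an expected number of tracts. The helper names, the linear toy density and the expected count of 12.5 are all assumptions for illustration; the actual scale factors are those of Appendix F.3 of the manuscript.

```python
def trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def to_frequency_scale(density_values, expected_num_tracts):
    """Rescale a normalized density so it integrates to the expected count."""
    return [expected_num_tracts * v for v in density_values]


# Toy density f(x) = x / 2 on (0, 2), which integrates to 1.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
density_values = [x / 2.0 for x in grid]
freq_values = to_frequency_scale(density_values, 12.5)
```

On the frequency scale the curve can be overlaid directly on observed tract counts, while the normalized scale is the one to use when comparing shapes across models.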
- unnormalized_prob_sex_vector(pulses, state_left, T_ped=0)
- class tracts.phase_type_distribution.PhTMonoecious(migration_matrix, rho=1)
Bases:
PhaseTypeDistribution
A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Monoecious Markov approximation.
- migration_matrix
The migration matrix given as input without contributions at generations 0 and 1.
- Type:
npt.ArrayLike
- num_populations
The number of populations considered in the demographic model.
- Type:
int
- num_generations
The number of generations considered in the demographic model.
- Type:
int
- t0_proportions
The total contribution from each ancestral population. Corresponds to Eq. (EQ) in the manuscript.
- Type:
npt.ArrayLike
- full_transition_matrix
The intensity matrix S^M of the Monoecious Markov Model. Corresponds to Eq. (EQ) in the manuscript.
- Type:
npt.ArrayLike
- equilibrium_distribution
The equilibrium distribution of the Monoecious Markov Model. Corresponds to Eq. (EQ) in the manuscript.
- Type:
npt.ArrayLike
- alpha_list
A list containing, for each ancestral population p, the initial state of the Phase-Type distribution for the tract length of population p. Corresponds to Eq. (EQ) in the manuscript.
- Type:
list
- transition_matrices
A list containing, for each ancestral population p, the submatrix of full_transition_matrix corresponding to transitions within p. It is used to compute the distribution of tract lengths from population p.
- Type:
list
- S0_list
A list containing the sum across columns of every transition matrix in transition_matrices.
- Type:
list
- inverse_S0_list
A list containing the sum across columns of the inverse of every transition matrix in transition_matrices.
- Type:
list
- Parameters:
migration_matrix (npt.ArrayLike) – An array containing the migration proportions from a discrete number of populations over the last generations. Each row is a time, each column is a population. Row zero corresponds to the current generation. The migration rate at the last generation (migration_matrix[-1,:]) is the “founding generation” and should sum up to 1.
rho (float, default 1) – The recombination rate (positive real number).
Notes
Non-listed attributes are for internal use only.
- PhT_CDF(x, population_number, s1=None)
Computes a Phase-Type CDF at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious object together with the specification of a population of interest.
- Parameters:
x (float) – A point in (0, infinity) where the CDF is evaluated.
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.
s1 – Not used in the Monoecious model.
- Returns:
The CDF value at x.
- Return type:
float
- PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None)
Computes a Phase-Type CDF on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious object (together with the specification of a population of interest) but also directly introduced as an input.
- Parameters:
S (npt.ArrayLike) – The transition submatrix.
alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.
S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.
bins (npt.ArrayLike) – A point grid on (0, L) where the CDF has to be evaluated.
L (float) – The length of the finite chromosome.
s1 (float) – Not used in the Monoecious model.
pop_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.
- Return type:
npt.ArrayLike
- Returns:
npt.ArrayLike – The CDF evaluated on bins.
float – The tract length expectation of the corresponding model considering an infinite chromosome.
float – The normalization factor Z of the corresponding model.
float – The tract length expectation on the finite chromosome of the corresponding model.
- PhT_density(x, population_number, s1=None)
Computes a Phase-Type density at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious or PhTDioecious object together with the specification of a population of interest.
- Parameters:
x (float) – A point in (0, infinity) where the density function is evaluated.
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.
s1 – Not used in the Monoecious model.
- Returns:
The density value at x.
- Return type:
float
- PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None)
Computes a Phase-Type density on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious object (together with the specification of a population of interest) but also directly introduced as an input.
- Parameters:
S (npt.ArrayLike) – The transition submatrix.
alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.
S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.
bins (npt.ArrayLike) – A point grid on (0, L) where the density has to be evaluated.
L (float) – The length of the finite chromosome.
s1 – Not used in the Monoecious model.
exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.
- Returns:
npt.ArrayLike – The corrected bins grid as described in Notes.
npt.ArrayLike – The density evaluated on bins.
float – The tract length expectation of the corresponding model.
Notes
The code truncates bins to the interval [0,L] and adds the point L if it is not included in bins. This is done because the density is defined on the finite chromosome [0,L] as a mixture of a continuous density on [0,L) and a Dirac measure at L. Consequently, the function returns as a first argument the transformed grid, that can be used as x-axis to plot the density.
Don’t run this function directly. To get a PhT density on a finite chromosome, use tractlength_histogram_windowed setting density=True.
- __init__(migration_matrix, rho=1)
- distribution_scaling_factor(population_number)
This is equal to 2 times the ancestry proportion divided by the expected length of a tract on an infinite chromosome.
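The stated formula can be written out as a one-line computation. The function name and the numeric inputs below are illustrative assumptions; in the library, the ancestry proportion and the expected tract length would come from the fitted model rather than being passed in by hand.

```python
def scaling_factor_sketch(ancestry_proportion, expected_tract_length):
    """2 * ancestry proportion / expected tract length (infinite chromosome)."""
    return 2.0 * ancestry_proportion / expected_tract_length


# Illustrative values: 30% ancestry and a mean tract length of 0.25.
factor = scaling_factor_sketch(0.3, 0.25)
```

The factor grows when tracts are shorter on average, since the same ancestry proportion is then spread over more tracts.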
- full_CDF(L, S, exp_SL=None, alpha=None, S0_inv=None)
Computes the length distribution of tracts spanning the whole chromosome of length L.
- Parameters:
L (float) – The chromosome length.
S (npt.ArrayLike) – The transition submatrix.
exp_SL (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^SL, alpha and S0_inv.
- get_TpopTau(t, pop, Tau)
- get_discrete_transition_matrix()
- get_equilibrium_distribution()
- get_equilibrium_distribution_v2()
- get_time_transition_factor(initial_time, final_time)
- get_transition_matrix()
- static initialize_CDF_values(bins, S0_inv, alpha, L)
- static initialize_density_bins(bins, L, alpha, S0_inv)
- inner_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculates the CDF of tract lengths fully contained within the chromosome of length L.
- Parameters:
x (float) – The tract length at which the CDF is evaluated.
S (npt.ArrayLike) – The transition submatrix.
L (float) – The chromosome length.
exp_Sx (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^Sx, alpha and S0_inv.
- loglik(bins, Ls, data, num_samples, cutoff=0)
Calculates the log-likelihood under a Poisson Random Field model. Used to fit model parameters by maximum likelihood.
- normalization_factor(L, S, S0_inv=None, alpha=None, exp_SL=None)
Computes the normalization factor Z from S0_inv and chromosome length L.
- outer_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculates the length distribution of tract lengths hitting a single chromosome edge.
- Parameters:
x (float) – The tract length at which the CDF is evaluated.
S (npt.ArrayLike) – The transition submatrix.
L (float) – The chromosome length.
exp_Sx (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^Sx, alpha and S0_inv.
- populate_CDF_values(bins, CDF_values, prop_isolated, prop_connected, exp_Sx_per_bin, S, alpha, S0_inv, L, ET, ETL, Z)
- populate_density_bins(bins, population_number, ETL, prop_connected, prop_isolated, exp_Sx_per_bin, L, Z, s1, alpha, S, S0_inv)
- tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)
Calculates the tract length histogram on multiple chromosomes of lengths chrom_lengths.
- Return type:
npt.ArrayLike
- tractlength_histogram(population_number, bins, density=False)
Gets the tract length histogram or density evaluated on a point grid using a PhT object. This function considers an infinite chromosome.
- Parameters:
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid on (0, Inf) where the CDF or density have to be evaluated.
density (bool, default False) – If True, computes the PhT density. Else, returns the histogram values on the grid.
- Returns:
If density, the density evaluated on bins. If not density, the histogram values on every interval defined by bins.
- Return type:
npt.ArrayLike
- tractlength_histogram_windowed(population_number, bins, L, exp_Sx_per_bin=None, density=False, freq=False)
Calculates the tractlength histogram or density function on a finite chromosome, using the Monoecious (M) admixture model.
- Parameters:
population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid where the CDF or density have to be computed.
L (float) – The length of the finite chromosome.
exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.
density (bool, default False) – If True, computes the PhT density values evaluated on the grid. Else, returns the histogram values on the grid.
freq (bool, default False) – If density is True, whether to return density on the frequency scale.
- Return type:
npt.ArrayLike
- Returns:
npt.ArrayLike – If density is True, the corrected bins grid as described in Notes. Else, the bins introduced as input.
npt.ArrayLike – If density is True, the PhT density evaluated on the corrected bins grid. Returned on the frequency scale if freq = True. If density is False, the histogram values on the intervals defined by bins.
float – The tract length expectation of the corresponding model.
- class tracts.phase_type_distribution.PhaseTypeDistribution(max_remaining_tracts=1e-05)
Bases:
ABC
A class representing the phase-type distribution of tract lengths generated by a given (pair of) migration matrix (matrices).
- abstractmethod PhT_CDF(x, population_number, s1=None)
- Return type:
npt.ArrayLike
- abstractmethod PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None)
- Return type:
npt.ArrayLike
- abstractmethod PhT_density(x, population_number, s1=None)
- abstractmethod PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None)
- __init__(max_remaining_tracts=1e-05)
- full_CDF(L, S, exp_SL=None, alpha=None, S0_inv=None)
Computes the length distribution of tracts spanning the whole chromosome of length L.
- Parameters:
L (float) – The chromosome length.
S (npt.ArrayLike) – The transition submatrix.
exp_SL (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^SL, alpha and S0_inv.
- static initialize_CDF_values(bins, S0_inv, alpha, L)
- static initialize_density_bins(bins, L, alpha, S0_inv)
- inner_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculates the CDF of tract lengths fully contained within the chromosome of length L.
- Parameters:
x (float) – The tract length at which the CDF is evaluated.
S (npt.ArrayLike) – The transition submatrix.
L (float) – The chromosome length.
exp_Sx (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^Sx, alpha and S0_inv.
- loglik(bins, Ls, data, num_samples, cutoff=0)
Calculates the log-likelihood under a Poisson Random Field model. Used to fit model parameters by maximum likelihood.
- normalization_factor(L, S, S0_inv=None, alpha=None, exp_SL=None)
Computes the normalization factor Z from S0_inv and chromosome length L.
- outer_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculates the length distribution of tract lengths hitting a single chromosome edge.
- Parameters:
x (float) – The tract length at which the CDF is evaluated.
S (npt.ArrayLike) – The transition submatrix.
L (float) – The chromosome length.
exp_Sx (npt.ArrayLike, default None)
alpha (npt.ArrayLike, default None)
S0_inv (npt.ArrayLike, default None)
Notes
Accepts precomputed values for e^Sx, alpha and S0_inv.
- populate_CDF_values(bins, CDF_values, prop_isolated, prop_connected, exp_Sx_per_bin, S, alpha, S0_inv, L, ET, ETL, Z)
- populate_density_bins(bins, population_number, ETL, prop_connected, prop_isolated, exp_Sx_per_bin, L, Z, s1, alpha, S, S0_inv)
- abstractmethod tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)
- Return type:
npt.ArrayLike
- tractlength_histogram(population_number, bins, density=False)
Gets the tract length histogram or density evaluated on a point grid using a PhT object. This function considers an infinite chromosome.
- Parameters:
population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.
bins (npt.ArrayLike) – A point grid on (0, Inf) where the CDF or density have to be evaluated.
density (bool, default False) – If True, computes the PhT density. Else, returns the histogram values on the grid.
- Returns:
If density, the density evaluated on bins. If not density, the histogram values on every interval defined by bins.
- Return type:
npt.ArrayLike
- abstractmethod tractlength_histogram_windowed(population_number, bins, L, density=False, freq=False, exp_Sx_per_bin=None, exp_Sx_per_bin_f=None, exp_Sx_per_bin_m=None, return_only=None, hybrid_ped=False)
- Return type:
npt.ArrayLike
- tracts.phase_type_distribution.get_survival_factors(migration_matrix)
Takes a migration matrix of T generations and returns a list of length T, which is the probability of a migrant allele from that generation surviving to the present. Valid only under the monoecious model, that is, assuming unbiased migration and recombination rates for autosomal admixture.
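One natural reading of this description is sketched below: an allele that arrived at generation t survives to the present if it is not replaced by migrants in any more recent generation, so its survival probability is the product of (1 - total migration) over generations 0 to t - 1. The function `survival_factors_sketch` and the example matrix are hypothetical, and the exact indexing convention used by get_survival_factors is an assumption.

```python
def survival_factors_sketch(migration_matrix):
    """Survival probability to the present for each generation's migrants.

    migration_matrix: one row per generation (row 0 = present), one column
    per source population. Entry t of the result is the product of
    (1 - sum(row)) over all rows more recent than t.
    """
    factors = []
    surviving = 1.0
    for row in migration_matrix:
        factors.append(surviving)
        surviving *= 1.0 - sum(row)
    return factors


# Illustrative 5-generation matrix with two source populations.
example_matrix = [
    [0.0, 0.0],
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.2],
    [0.5, 0.5],  # founding generation, sums to 1
]
factors = survival_factors_sketch(example_matrix)
```

Here founders survive with probability 0.9 * 0.8 = 0.72, because they can be replaced by either of the two later migration pulses.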