tracts.phase_type_distribution

Functions

get_survival_factors(migration_matrix)

Takes a migration matrix spanning T generations and returns a list of length T whose t-th entry is the probability that a migrant allele arriving at generation t survives to the present.
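One way to read the quantity described above is sketched below. It assumes that a lineage is replaced at a given generation with probability equal to the total migration proportion at that generation, and that the survival probability of a migrant is the product of the complementary probabilities over all more recent generations. The name `survival_factors_sketch` and this exact convention are assumptions made for illustration and may differ from what the library does.

```python
def survival_factors_sketch(migration_matrix):
    """Hypothetical helper, not the library function.

    Rows are generations (row 0 = present), columns are populations.
    Assumption: at generation s, a lineage is replaced with probability
    equal to the sum of the migration proportions in row s.
    """
    factors = []
    survival = 1.0
    for row in migration_matrix:
        # `survival` is the product over all generations more recent than this row.
        factors.append(survival)
        survival *= 1.0 - sum(row)
    return factors


# Two populations, four generations; the last row is the founding generation.
mig = [
    [0.0, 0.0],
    [0.0, 0.0],
    [0.2, 0.0],
    [0.5, 0.5],
]
print(survival_factors_sketch(mig))
```

Under this convention, founders in the toy matrix survive with probability 0.8 because only the pulse at generation 2 can replace them.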

Classes

PhTDioecious(migration_matrix_f, ...[, ...])

A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Dioecious Fine (DF) and Dioecious Coarse (DC) Markov approximations.

PhTMonoecious(migration_matrix[, rho])

A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Monoecious Markov approximation.

PhaseTypeDistribution([max_remaining_tracts])

A class representing the phase-type distribution of tract lengths generated by a given (pair of) migration matrix (matrices).

class tracts.phase_type_distribution.PhTDioecious(migration_matrix_f, migration_matrix_m, rho_f, rho_m, X_chromosome=False, X_chromosome_male=False, sex_model='DC', TPED=0, setting_TP=None)

Bases: PhaseTypeDistribution

A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Dioecious Fine (DF) and Dioecious Coarse (DC) Markov approximations.

X_chr

Whether admixture is considered on the X chromosome. Set to the value given as input by the X_chromosome parameter.

Type:

bool

X_chr_male

If X_chr is True, whether the sex of the individual at generation 0 is male. Set to the value given as input by the X_chromosome_male parameter. If not X_chr, this attribute is ignored.

Type:

bool

rho_f

The female-specific recombination rate, given by the input parameter rho_f.

rho_m

The male-specific recombination rate, given by the input parameter rho_m.

migration_matrix_f

A transformed version of the female migration matrix given as input. For internal use only.

Type:

npt.ArrayLike

migration_matrix_m

A transformed version of the male migration matrix given as input. For internal use only.

Type:

npt.ArrayLike

num_populations

The number of populations considered in the demographic model.

Type:

int

num_generations

The number of generations considered in the demographic model.

Type:

int

t0_proportions_f

The ancestry proportion in the present population from each ancestry among all the haploid copies inherited from a female parent. For autosomes, computed using Eq. (F-27) in the manuscript. For the X chromosome, computed using the recursive equations (F-28) and (F-29) in the manuscript.

Type:

npt.ArrayLike

t0_proportions_m

The ancestry proportion in the present population from each ancestry among all the haploid copies inherited from a male parent. For autosomes, computed using Eq. (F-27) in the manuscript. For the X chromosome, computed using the recursive equations (F-28) and (F-29) in the manuscript.

Type:

npt.ArrayLike

sex_model

The Dioecious approximation being used. Taken from the input parameter sex_model, which is either ‘DF’ or ‘DC’.

full_transition_matrix_f

The intensity matrix S(f) of the Dioecious (Fine or Coarse) Markov Model. Corresponds to Eq. (EQ) and (EQ) in the manuscript for DF and DC respectively. This submodel corresponds to the maternally (s1=xi=f) inherited alleles.

Type:

npt.ArrayLike

full_transition_matrix_m

Counterpart of full_transition_matrix_f for the paternally (s1=xi=m) inherited alleles.

Type:

npt.ArrayLike

alpha_list_f

A list containing, for each ancestral population p, the initial state of the (DF or DC) Phase-Type distribution for the tract length of maternally (s1=xi=f) inherited alleles of population p. Corresponds to Eq. (EQ) in the manuscript.

Type:

list

alpha_list_m

Counterpart of alpha_list_f for the paternally (s1=xi=m) inherited alleles.

Type:

list

transition_matrices_f

A list containing, for each ancestral population p, the submatrix of full_transition_matrix_f corresponding to transitions within p. It is used to compute the distribution of tract lengths of maternally (s1=xi=f) inherited alleles from population p.

Type:

list

transition_matrices_m

Counterpart of transition_matrices_f for the paternally (s1=xi=m) inherited alleles.

Type:

list

S0_list_f

A list containing the sum across columns of every transition matrix in transition_matrices_f.

Type:

list

S0_list_m

Counterpart of S0_list_f for the paternally (s1=xi=m) inherited alleles.

Type:

list

inverse_S0_list_f

A list containing the sum across columns of the inverse of every transition matrix in transition_matrices_f.

Type:

list

inverse_S0_list_m

Counterpart of inverse_S0_list_f for the paternally (s1=xi=m) inherited alleles.

Type:

list

Parameters:
  • migration_matrix_f (npt.ArrayLike) – An array containing the female migration proportions from a discrete number of populations over the last generations. Each row is a time, each column is a population. Row zero corresponds to the current generation. The (i,j) element of this matrix specifies the proportion of female individuals from the admixed population that are replaced by female individuals from population j at generation i. The migration rate at the last generation (migration_matrix_f[-1,:]) is the “founding generation” and should sum up to 1.

  • migration_matrix_m (npt.ArrayLike) – Counterpart of migration_matrix_f for male migration rates.

  • rho_f (float, default 1) – The female-specific recombination rate (positive real number).

  • rho_m (float, default 1) – The male-specific recombination rate (positive real number). For X chromosome admixture, this value is ignored and set to 0.

  • X_chromosome (bool, default False) – Whether admixture is considered on the X chromosome. If False, the model considers autosomal admixture.

  • X_chromosome_male (bool, default False) – If X_chromosome is True, whether the individual at generation 0 is a male. In that case, only maternally inherited alleles are taken into account. If not X_chromosome, set to False.

  • sex_model (default 'DC') – The Dioecious model to be considered. Takes the value ‘DF’ for Dioecious Fine and ‘DC’ for Dioecious Coarse.

  • TPED (int, default 0) – For internal use only.

  • setting_TP (default None) – For internal use only.
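The layout described for migration_matrix_f and migration_matrix_m can be checked before building a model. The sketch below is a hypothetical validator (the name `check_migration_matrix` and the tolerance are assumptions, not part of the library) that tests only what the parameter descriptions state: rows are generations, columns are populations, entries are proportions, and the last row sums to 1.

```python
def check_migration_matrix(matrix, tol=1e-9):
    """Hypothetical validator for the layout described in the parameter list.

    Row 0 is the current generation; the last row is the founding generation.
    """
    n_pops = len(matrix[0])
    for gen, row in enumerate(matrix):
        if len(row) != n_pops:
            raise ValueError(f"row {gen} has {len(row)} columns, expected {n_pops}")
        # Each row holds replacement proportions, so it cannot exceed 1 in total.
        if any(p < 0 for p in row) or sum(row) > 1 + tol:
            raise ValueError(f"row {gen} is not a valid set of proportions")
    # The founding generation must account for the whole population.
    if abs(sum(matrix[-1]) - 1.0) > tol:
        raise ValueError("the founding generation (last row) must sum to 1")
    return True


# Toy female and male matrices: two populations, four generations.
mig_f = [[0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [0.4, 0.6]]
mig_m = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.2], [0.7, 0.3]]
print(check_migration_matrix(mig_f), check_migration_matrix(mig_m))
```

The two toy matrices differ in the pulse at generation 2, which is the kind of sex-biased migration the Dioecious models are meant to capture.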

Notes

The Dioecious Coarse model (sex_model = ‘DC’) should be preferred over the Dioecious Fine model (sex_model = ‘DF’) due to its computational efficiency. Both models produce very similar or identical phase-type densities unless there is a strong sex bias in migration or recombination rates. For autosomal admixture, the Monoecious model should be used instead, for the same reasons, unless the sex bias is exceptionally strong.

PhT_CDF(x, population_number, s1=None)

Computes a Phase-Type CDF at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object together with the specification of a population of interest.

Parameters:
  • x (float) – A point in (0, infinity) where the CDF is evaluated.

  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • s1 – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.

Returns:

The CDF value at x.

Return type:

float

PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None, hybrid_pedigree=False)

Computes a Phase-Type CDF on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object (together with the specification of a population of interest) but also directly introduced as an input.

Parameters:
  • S (npt.ArrayLike) – The transition submatrix.

  • alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.

  • S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.

  • bins (npt.ArrayLike) – A point grid on (0, L) where the CDF has to be evaluated.

  • L (float) – The length of the finite chromosome.

  • s1 (float) – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.

  • hybrid_pedigree (bool, default False) – For internal use only. This parameter indicates whether a hybrid pedigree model is being used.

  • pop_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.

Return type:

npt.ArrayLike

Returns:

  • npt.ArrayLike – The CDF evaluated on bins.

  • float – The tract length expectation of the corresponding model considering an infinite chromosome.

  • float – The normalization factor Z of the corresponding model.

  • float – The tract length expectation on the finite chromosome of the corresponding model.

PhT_density(x, population_number, s1=None)

Computes a Phase-Type density at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object together with the specification of a population of interest.

Parameters:
  • x (float) – A point in (0, infinity) where the density function is evaluated.

  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • s1 – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.

Returns:

The density value at x.

Return type:

float
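For orientation, the standard phase-type formulas behind a CDF and a density of this kind are F(x) = 1 - alpha exp(Sx) 1 and f(x) = alpha exp(Sx) s0, with exit rates s0 = -S 1. The sketch below evaluates both with NumPy for a single initial vector and sub-intensity matrix. The function names, the toy matrix, and the eigendecomposition-based matrix exponential (which assumes S is diagonalizable) are illustrative assumptions; how the library combines maternal and paternal components is not shown here.

```python
import numpy as np


def expm_diag(A):
    """Matrix exponential via eigendecomposition (assumes A is diagonalizable)."""
    w, V = np.linalg.eig(A)
    # V @ diag(exp(w)) @ V^{-1}; scaling the columns of V avoids building diag().
    return np.real((V * np.exp(w)) @ np.linalg.inv(V))


def pht_cdf(x, alpha, S):
    """Phase-type CDF: F(x) = 1 - alpha exp(Sx) 1."""
    ones = np.ones(S.shape[0])
    return 1.0 - alpha @ expm_diag(S * x) @ ones


def pht_density(x, alpha, S):
    """Phase-type density: f(x) = alpha exp(Sx) s0, with s0 = -S 1."""
    s0 = -S @ np.ones(S.shape[0])
    return alpha @ expm_diag(S * x) @ s0


# Toy two-state sub-intensity matrix and initial state (illustrative values only).
S = np.array([[-2.0, 1.0], [0.0, -3.0]])
alpha = np.array([1.0, 0.0])
print(pht_cdf(0.5, alpha, S), pht_density(0.5, alpha, S))
```

With a single state the formulas reduce to the exponential distribution, which gives a quick sanity check for any implementation.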

PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None, hybrid_pedigree=False)

Computes a Phase-Type density on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTDioecious object (together with the specification of a population of interest) but also directly introduced as an input.

Parameters:
  • S (npt.ArrayLike) – The transition submatrix.

  • alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.

  • S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.

  • bins (npt.ArrayLike) – A point grid on (0, L) where the density has to be evaluated.

  • L (float) – The length of the finite chromosome.

  • s1 – The sex of the individual at generation 1. If s1 = 0 (resp. 1), only paternally (resp. maternally) inherited alleles are considered. If set to None, tracts on both copies are combined.

  • hybrid_pedigree (bool, default False) – For internal use only. This parameter indicates whether a hybrid pedigree model is being used.

  • exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.

Returns:

  • npt.ArrayLike – The corrected bins grid as described in Notes.

  • npt.ArrayLike – The density evaluated on bins.

  • float – The tract length expectation of the corresponding model.

Notes

The code truncates bins to the interval [0,L] and adds the point L if it is not included in bins. This is done because the density is defined on the finite chromosome [0,L] as a mixture of a continuous density on [0,L) and a Dirac measure at L. Consequently, the function returns the transformed grid as its first value, which can be used as the x-axis when plotting the density.

Don’t run this function directly. To get a PhT density on a finite chromosome, use tractlength_histogram_windowed setting density=True.
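The grid correction described in the Notes can be pictured with the following sketch. The helper name `truncate_bins` is hypothetical, and the code only mirrors the two steps stated above (drop points outside [0,L], then append L if missing); the library may order or store the grid differently.

```python
def truncate_bins(bins, L):
    """Hypothetical helper mirroring the grid correction described in the Notes."""
    # Keep only grid points that fall on the finite chromosome [0, L].
    kept = sorted(b for b in bins if 0 <= b <= L)
    # The point mass at L needs its own grid point.
    if not kept or kept[-1] != L:
        kept.append(L)
    return kept


print(truncate_bins([0.0, 0.5, 1.0, 1.5, 2.5, 3.0], L=2.0))
# [0.0, 0.5, 1.0, 1.5, 2.0]
```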

PhT_parameters_DC(parent_sex, T_pedigree=0, migration_setting_at_TP=None)
PhT_parameters_DF(parent_sex, computing_coarse=False, pulses=None, T_pedigree=0, migration_setting_at_TP=None)
S_matrix(states, pulses, xi, T_ped, D_model='DF')
__init__(migration_matrix_f, migration_matrix_m, rho_f, rho_m, X_chromosome=False, X_chromosome_male=False, sex_model='DC', TPED=0, setting_TP=None)
calculate_probabilities_at_population(population_number, s1)
discrete_prob_DF(pulses, state_left, state_right, T_ped=0)
full_CDF(L, S, exp_SL=None, alpha=None, S0_inv=None)

Computes the length distribution of tract lengths spanning the whole chromosome of length L.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_SL (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^SL, alpha and S0_inv.

static initialize_CDF_values(bins, S0_inv, alpha, L)
static initialize_density_bins(bins, L, alpha, S0_inv)
inner_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)

Calculates the CDF of tract lengths fully contained within the chromosome of length L.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_Sx (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^Sx, alpha and S0_inv.

loglik(bins, Ls, data, num_samples, cutoff=0)

Calculates the log-likelihood of the data under a Poisson Random Field model. Used to fit model parameters.
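As a generic illustration of a Poisson likelihood over histogram bins, the sketch below sums the Poisson log-probabilities of observed tract counts given expected counts per bin. The function name `poisson_loglik` is hypothetical, and whether the library keeps the constant log(n!) term or how it applies cutoff is not stated in this reference, so both are left out of the claims made here.

```python
import math


def poisson_loglik(observed, expected):
    """Sum over bins of log Poisson(observed_i; expected_i).

    Generic sketch; not the library's loglik implementation.
    """
    total = 0.0
    for n, lam in zip(observed, expected):
        if lam <= 0:
            raise ValueError("expected counts must be positive")
        # log P(n | lam) = n log(lam) - lam - log(n!)
        total += n * math.log(lam) - lam - math.lgamma(n + 1)
    return total


observed_counts = [12, 7, 3, 1]          # toy data
expected_counts = [11.5, 6.8, 3.4, 0.9]  # toy model prediction
print(poisson_loglik(observed_counts, expected_counts))
```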

normalization_factor(L, S, S0_inv=None, alpha=None, exp_SL=None)

Computes the normalization factor Z from S0_inv and chromosome length L.

outer_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)

Calculates the length distribution of tract lengths hitting a single chromosome edge.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_Sx (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^Sx, alpha and S0_inv.

populate_CDF_values(bins, CDF_values, prop_isolated, prop_connected, exp_Sx_per_bin, S, alpha, S0_inv, L, ET, ETL, Z)
populate_density_bins(bins, population_number, ETL, prop_connected, prop_isolated, exp_Sx_per_bin, L, Z, s1, alpha, S, S0_inv)
tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)

Calculates the tract length histogram on multiple chromosomes of lengths chrom_lengths.

Return type:

npt.ArrayLike
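A natural way to obtain a histogram over several chromosomes is to add the per-chromosome histograms bin by bin. The sketch below shows that combination step only; the helper name `combine_histograms` and the assumption that the library sums (rather than, say, averages) the per-chromosome values are not confirmed by this reference.

```python
def combine_histograms(per_chromosome):
    """Hypothetical helper: bin-wise sum of histograms computed on a shared grid."""
    lengths = {len(h) for h in per_chromosome}
    if len(lengths) != 1:
        raise ValueError("all histograms must use the same bins")
    return [sum(values) for values in zip(*per_chromosome)]


# Toy expected counts for three chromosomes on the same four bins.
hists = [
    [4.0, 2.0, 1.0, 0.5],
    [3.0, 1.5, 0.5, 0.0],
    [5.0, 2.5, 1.5, 1.0],
]
print(combine_histograms(hists))
# [12.0, 6.0, 3.0, 1.5]
```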

tractlength_histogram(population_number, bins, density=False)

Gets the tract length histogram or density evaluated on a point grid using a PhT object. This function considers an infinite chromosome.

Parameters:
  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (npt.ArrayLike) – A point grid on (0, Inf) where the CDF or density have to be evaluated.

  • density (bool, default False) – If True, computes the PhT density. Else, returns the histogram values on the grid.

Returns:

If density, the density evaluated on bins. If not density, the histogram values on every interval defined by bins.

Return type:

npt.ArrayLike
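The relationship between a CDF and histogram values on the intervals defined by bins can be sketched as differences of the CDF at consecutive grid points. The helper `histogram_from_cdf` is hypothetical and uses an exponential CDF as a stand-in for a phase-type CDF; any additional scaling the library applies (for example by an expected number of tracts) is not represented.

```python
import math


def histogram_from_cdf(cdf, bins):
    """Probability mass on each interval [bins[i], bins[i+1])."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(bins[:-1], bins[1:])]


rate = 2.0  # toy rate; an exponential is a one-state phase-type distribution


def exp_cdf(x):
    return 1.0 - math.exp(-rate * x)


bins = [0.0, 0.25, 0.5, 1.0, 2.0]
hist = histogram_from_cdf(exp_cdf, bins)
print(hist)
```

A grid of n points yields n - 1 histogram values, and their sum equals the CDF difference between the last and first grid points.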

tractlength_histogram_windowed(population_number, bins, L, exp_Sx_per_bin_f=None, exp_Sx_per_bin_m=None, density=False, freq=False, return_only=None, hybrid_ped=False)

Calculates the tractlength histogram or density function on a finite chromosome, using the PhTDioecious admixture model.

Parameters:
  • population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (npt.ArrayLike) – A point grid where the CDF or density have to be computed.

  • L (float) – The length of the finite chromosome.

  • exp_Sx_per_bin_f (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins, for the maternally inherited alleles. Used internally to speed up computation.

  • exp_Sx_per_bin_m (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins, for the paternally inherited alleles. Used internally to speed up computation.

  • density (bool, default False) – If True, computes the PhT density values evaluated on the grid. Else, returns the histogram values on the grid.

  • freq (bool, default False) – If density is True, whether to return density on the frequency scale. If True, the density values are scaled so that their integral over (0,L) is equal to the expected number of tracts on (0,L). If False, the density values integrate to 1 over (0,L).

  • return_only (int, default None) – For internal use only, to manage the combination of maternally and paternally inherited tracts. If set to 0 (resp. 1), only paternally (resp. maternally) inherited tracts are considered. If None, tracts from both parents are combined. If the X chromosome is considered and the individual at generation 0 is male (X_chromosome_male = True), this parameter is ignored and only maternally inherited tracts are computed.

  • hybrid_ped (bool, default False) – For internal use only. Whether the hybrid pedigree model is being used. If True, no scale corrections are performed and densities or CDFs corresponding to connected components are returned, to be combined in the hybrid_pedigree module.

Return type:

npt.ArrayLike

Returns:

  • npt.ArrayLike – If density is True, the corrected bins grid as described in Notes. Else, the bins provided as input.

  • npt.ArrayLike – If density is True, the PhT density evaluated on the corrected bins grid. Returned on the frequency scale if freq = True. If density is False, the histogram values on the intervals defined by bins.

  • float – The tract length expectation of the corresponding model.

Notes

When density=True, the first returned argument is a transformed version of the input bins grid. This is because the density is defined on the finite chromosome [0,L] as a mixture of a density with support on (0,L) and point masses at 0 and L. The returned bins grid removes the points 0 and L if they were included in the input bins grid, since the density is not defined at these points. The returned density values correspond to this transformed bins grid.

For details on the scale factors and the transformation of the PhT densities into histograms, see Appendix F.3 of the manuscript.
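The difference between the two scales selected by freq can be illustrated numerically. The sketch below takes a toy density that integrates to 1 over (0,L), multiplies it by an expected number of tracts, and checks both integrals with the trapezoidal rule. The helper names and the value 3.5 are assumptions made for illustration; the actual scale factors are those of Appendix F.3.

```python
def trapezoid(xs, ys):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs[:-1], xs[1:], ys[:-1], ys[1:]))


def to_frequency_scale(density_values, expected_tracts):
    """Rescale a density that integrates to 1 so it integrates to expected_tracts."""
    return [expected_tracts * v for v in density_values]


L = 2.0
grid = [L * i / 200 for i in range(201)]
unit_density = [1.0 / L for _ in grid]  # toy density integrating to 1 on (0, L)
freq_density = to_frequency_scale(unit_density, expected_tracts=3.5)
print(trapezoid(grid, unit_density), trapezoid(grid, freq_density))
```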

unnormalized_prob_sex_vector(pulses, state_left, T_ped=0)
class tracts.phase_type_distribution.PhTMonoecious(migration_matrix, rho=1)

Bases: PhaseTypeDistribution

A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Monoecious Markov approximation.

migration_matrix

The migration matrix given as input without contributions at generations 0 and 1.

Type:

npt.ArrayLike

num_populations

The number of populations considered in the demographic model.

Type:

int

num_generations

The number of generations considered in the demographic model.

Type:

int

t0_proportions

The total contribution from each ancestral population. Corresponds to Eq. (EQ) in the manuscript.

Type:

npt.ArrayLike

full_transition_matrix

The intensity matrix S^M of the Monoecious Markov Model. Corresponds to Eq. (EQ) in the manuscript.

Type:

npt.ArrayLike

equilibrium_distribution

The equilibrium distribution of the Monoecious Markov Model. Corresponds to Eq. (EQ) in the manuscript.

Type:

npt.ArrayLike

alpha_list

A list containing, for each ancestral population p, the initial state of the Phase-Type distribution for the tract length of population p. Corresponds to Eq. (EQ) in the manuscript.

Type:

list

transition_matrices

A list containing, for each ancestral population p, the submatrix of full_transition_matrix corresponding to transitions within p. It is used to compute the distribution of tract lengths from population p.

Type:

list

S0_list

A list containing the sum across columns of every transition matrix in transition_matrices.

Type:

list

inverse_S0_list

A list containing the sum across columns of the inverse of every transition matrix in transition_matrices.

Type:

list

Parameters:
  • migration_matrix (npt.ArrayLike) – An array containing the migration proportions from a discrete number of populations over the last generations. Each row is a time, each column is a population. Row zero corresponds to the current generation. The migration rate at the last generation (migration_matrix[-1,:]) is the “founding generation” and should sum up to 1.

  • rho (float, default 1) – The recombination rate (positive real number).

Notes

Non-listed attributes are for internal use only.

PhT_CDF(x, population_number, s1=None)

Computes a Phase-Type CDF at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious object together with the specification of a population of interest.

Parameters:
  • x (float) – A point in (0, infinity) where the CDF is evaluated.

  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • s1 – Not used in the Monoecious model.

Returns:

The CDF value at x.

Return type:

float

PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None)

Computes a Phase-Type CDF on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious object (together with the specification of a population of interest) but also directly introduced as an input.

Parameters:
  • S (npt.ArrayLike) – The transition submatrix.

  • alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.

  • S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.

  • bins (npt.ArrayLike) – A point grid on (0, L) where the CDF has to be evaluated.

  • L (float) – The length of the finite chromosome.

  • s1 (float) – Not used in the Monoecious model.

  • pop_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.

Return type:

npt.ArrayLike

Returns:

  • npt.ArrayLike – The CDF evaluated on bins.

  • float – The tract length expectation of the corresponding model considering an infinite chromosome.

  • float – The normalization factor Z of the corresponding model.

  • float – The tract length expectation on the finite chromosome of the corresponding model.

PhT_density(x, population_number, s1=None)

Computes a Phase-Type density at a given point x in (0, infinity). The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious or PhTDioecious object together with the specification of a population of interest.

Parameters:
  • x (float) – A point in (0, infinity) where the density function is evaluated.

  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • s1 – Not used in the Monoecious model.

Returns:

The density value at x.

Return type:

float

PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None)

Computes a Phase-Type density on a finite chromosome of length L and evaluates it on a point grid. The PhT parameters (initial state, transition matrix) are taken from a PhTMonoecious object (together with the specification of a population of interest) but also directly introduced as an input.

Parameters:
  • S (npt.ArrayLike) – The transition submatrix.

  • alpha (npt.ArrayLike) – The initial state of the Phase-Type distribution.

  • S0_inv (npt.ArrayLike) – The sum across columns of the inverse of the transition submatrix.

  • bins (npt.ArrayLike) – A point grid on (0, L) where the density has to be evaluated.

  • L (float) – The length of the finite chromosome.

  • s1 – Not used in the Monoecious model.

  • exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.

Returns:

  • npt.ArrayLike – The corrected bins grid as described in Notes.

  • npt.ArrayLike – The density evaluated on bins.

  • float – The tract length expectation of the corresponding model.

Notes

The code truncates bins to the interval [0,L] and adds the point L if it is not included in bins. This is done because the density is defined on the finite chromosome [0,L] as a mixture of a continuous density on [0,L) and a Dirac measure at L. Consequently, the function returns the transformed grid as its first value, which can be used as the x-axis when plotting the density.

Don’t run this function directly. To get a PhT density on a finite chromosome, use tractlength_histogram_windowed setting density=True.

__init__(migration_matrix, rho=1)
distribution_scaling_factor(population_number)

This is equal to 2 times the ancestry proportion divided by the expected length of a tract on an infinite chromosome.
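The quantity described above can be written out with the standard mean of a phase-type distribution, E[T] = -alpha S^{-1} 1. The sketch below uses NumPy with toy values; the function names, the ancestry proportion 0.3, and the toy matrix are assumptions, and the code is not the library's implementation.

```python
import numpy as np


def expected_tract_length(alpha, S):
    """Mean of a phase-type distribution PH(alpha, S): -alpha S^{-1} 1."""
    ones = np.ones(S.shape[0])
    # Solve S y = 1 instead of forming the inverse explicitly.
    return float(-alpha @ np.linalg.solve(S, ones))


def scaling_factor_sketch(ancestry_proportion, alpha, S):
    """2 * ancestry proportion / expected tract length on an infinite chromosome."""
    return 2.0 * ancestry_proportion / expected_tract_length(alpha, S)


# Toy two-state sub-intensity matrix and initial state (illustrative values only).
S = np.array([[-2.0, 1.0], [0.0, -3.0]])
alpha = np.array([1.0, 0.0])
print(expected_tract_length(alpha, S), scaling_factor_sketch(0.3, alpha, S))
```

For these toy values the expected tract length is 2/3, so the scaling factor is 2 × 0.3 / (2/3) = 0.9.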

full_CDF(L, S, exp_SL=None, alpha=None, S0_inv=None)

Computes the length distribution of tract lengths spanning the whole chromosome of length L.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_SL (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^SL, alpha and S0_inv.

get_TpopTau(t, pop, Tau)
get_discrete_transition_matrix()
get_equilibrium_distribution()
get_equilibrium_distribution_v2()
get_time_transition_factor(initial_time, final_time)
get_transition_matrix()
static initialize_CDF_values(bins, S0_inv, alpha, L)
static initialize_density_bins(bins, L, alpha, S0_inv)
inner_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)

Calculates the CDF of tract lengths fully contained within the chromosome of length L.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_Sx (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^Sx, alpha and S0_inv.

loglik(bins, Ls, data, num_samples, cutoff=0)

Calculates the log-likelihood of the data under a Poisson Random Field model. Used to fit model parameters.

normalization_factor(L, S, S0_inv=None, alpha=None, exp_SL=None)

Computes the normalization factor Z from S0_inv and chromosome length L.

outer_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)

Calculates the length distribution of tract lengths hitting a single chromosome edge.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_Sx (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^Sx, alpha and S0_inv.

populate_CDF_values(bins, CDF_values, prop_isolated, prop_connected, exp_Sx_per_bin, S, alpha, S0_inv, L, ET, ETL, Z)
populate_density_bins(bins, population_number, ETL, prop_connected, prop_isolated, exp_Sx_per_bin, L, Z, s1, alpha, S, S0_inv)
tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)

Calculates the tract length histogram on multiple chromosomes of lengths chrom_lengths.

Return type:

npt.ArrayLike

tractlength_histogram(population_number, bins, density=False)

Gets the tract length histogram or density evaluated on a point grid using a PhT object. This function considers an infinite chromosome.

Parameters:
  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (npt.ArrayLike) – A point grid on (0, Inf) where the CDF or density have to be evaluated.

  • density (bool, default False) – If True, computes the PhT density. Else, returns the histogram values on the grid.

Returns:

If density, the density evaluated on bins. If not density, the histogram values on every interval defined by bins.

Return type:

npt.ArrayLike

tractlength_histogram_windowed(population_number, bins, L, exp_Sx_per_bin=None, density=False, freq=False)

Calculates the tractlength histogram or density function on a finite chromosome, using the Monoecious (M) admixture model.

Parameters:
  • population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (npt.ArrayLike) – A point grid where the CDF or density have to be computed.

  • L (float) – The length of the finite chromosome.

  • exp_Sx_per_bin (npt.ArrayLike, default None) – The precomputed values of e^(S*x) for every x in bins. Used internally to speed up computation.

  • density (bool, default False) – If True, computes the PhT density values evaluated on the grid. Else, returns the histogram values on the grid.

  • freq (bool, default False) – If density is True, whether to return density on the frequency scale.

Return type:

npt.ArrayLike

Returns:

  • npt.ArrayLike – If density is True, the corrected bins grid as described in Notes. Else, the bins introduced as input.

  • npt.ArrayLike – If density is True, the PhT density evaluated on the corrected bins grid. Returned on the frequency scale if freq = True. If density is False, the histogram values on the intervals defined by bins.

  • float – The tract length expectation of the corresponding model.

class tracts.phase_type_distribution.PhaseTypeDistribution(max_remaining_tracts=1e-05)

Bases: ABC

A class representing the phase-type distribution of tract lengths generated by a given (pair of) migration matrix (matrices).

abstractmethod PhT_CDF(x, population_number, s1=None)
Return type:

npt.ArrayLike

abstractmethod PhT_CDF_windowed(S, alpha, S0_inv, bins, L, s1, pop_number, exp_Sx_per_bin=None)
Return type:

npt.ArrayLike

abstractmethod PhT_density(x, population_number, s1=None)
abstractmethod PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None)
__init__(max_remaining_tracts=1e-05)
full_CDF(L, S, exp_SL=None, alpha=None, S0_inv=None)

Computes the length distribution of tract lengths spanning the whole chromosome of length L.

Parameters:
  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_SL (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^SL, alpha and S0_inv.

static initialize_CDF_values(bins, S0_inv, alpha, L)
static initialize_density_bins(bins, L, alpha, S0_inv)
inner_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)

Calculates the CDF of tract lengths fully contained within the chromosome of length L.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_Sx (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^Sx, alpha, and S0_inv.

loglik(bins, Ls, data, num_samples, cutoff=0)

Calculates the log-likelihood of the data under a Poisson Random Field model. Used to fit model parameters by maximum likelihood.
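As a rough illustration of a Poisson Random Field likelihood, each histogram bin can be treated as an independent Poisson count whose mean is the expected number of tracts in that bin. The sketch below sums the per-bin Poisson log-probabilities. It is an assumption-laden illustration rather than the library's code: the function name poisson_loglik is hypothetical, and the handling of bins, Ls, num_samples, and cutoff in the real method is not reproduced here.

```python
import math

def poisson_loglik(observed, expected):
    """Sum of Poisson log-probabilities of observed counts given expected counts.

    Hypothetical helper: assumes every expected count is strictly positive.
    """
    total = 0.0
    for k, mu in zip(observed, expected):
        # log P(k | mu) = k * log(mu) - mu - log(k!)
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total

# Made-up counts per bin and the corresponding model expectations.
observed = [5, 3, 1]
expected = [4.2, 3.1, 0.9]
ll = poisson_loglik(observed, expected)
```

A fitting routine would then search for the model parameters whose expected counts maximize this value.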

normalization_factor(L, S, S0_inv=None, alpha=None, exp_SL=None)

Computes the normalization factor Z from S0_inv and chromosome length L.

outer_CDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)

Calculates the CDF of the lengths of tracts hitting a single chromosome edge.

Parameters:
  • x (float) – The tract length at which the CDF is evaluated.

  • S (npt.ArrayLike) – The transition submatrix.

  • L (float) – The chromosome length.

  • exp_Sx (npt.ArrayLike, default None)

  • alpha (npt.ArrayLike, default None)

  • S0_inv (npt.ArrayLike, default None)

Notes

Accepts precomputed values for e^Sx, alpha, and S0_inv.

populate_CDF_values(bins, CDF_values, prop_isolated, prop_connected, exp_Sx_per_bin, S, alpha, S0_inv, L, ET, ETL, Z)
populate_density_bins(bins, population_number, ETL, prop_connected, prop_isolated, exp_Sx_per_bin, L, Z, s1, alpha, S, S0_inv)
abstractmethod tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)
Return type:

npt.ArrayLike

tractlength_histogram(population_number, bins, density=False)

Gets the tract length histogram or density evaluated on a point grid using a PhT object. This function considers an infinite chromosome.

Parameters:
  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (npt.ArrayLike) – A point grid on (0, Inf) where the CDF or density has to be evaluated.

  • density (bool, default False) – If True, computes the PhT density. Else, returns the histogram values on the grid.

Returns:

If density, the density evaluated on bins. If not density, the histogram values on every interval defined by bins.

Return type:

npt.ArrayLike
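One common way to turn a CDF into histogram values on intervals is to take differences of the CDF at consecutive grid points. The sketch below shows that idea with a stand-in exponential CDF. It is illustrative only: histogram_from_cdf and exp_cdf are hypothetical names, and the exact shape and edge handling of the array returned by tractlength_histogram is not assumed here.

```python
import math

def histogram_from_cdf(cdf, bins):
    """Probability mass in each interval [bins[i], bins[i+1]) as a CDF difference."""
    values = [cdf(b) for b in bins]
    return [hi - lo for lo, hi in zip(values[:-1], values[1:])]

def exp_cdf(x):
    """Stand-in CDF (exponential with rate 1), used only for illustration."""
    return 1.0 - math.exp(-x)

bins = [0.0, 0.5, 1.0, 2.0]
hist = histogram_from_cdf(exp_cdf, bins)
```

With this construction the histogram values sum to the CDF difference between the last and first grid points, so any mass beyond the last point has to be accounted for separately.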

abstractmethod tractlength_histogram_windowed(population_number, bins, L, density=False, freq=False, exp_Sx_per_bin=None, exp_Sx_per_bin_f=None, exp_Sx_per_bin_m=None, return_only=None, hybrid_ped=False)
Return type:

npt.ArrayLike

tracts.phase_type_distribution.get_survival_factors(migration_matrix)

Takes a migration matrix of T generations and returns a list of length T whose entries are the probabilities that a migrant allele from each generation survives to the present. Valid only under the monoecious model, that is, assuming unbiased migration and recombination rates for autosomal admixture.
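The idea of a survival factor can be sketched as follows, under assumptions that are not confirmed by this page: rows of the migration matrix index generations back in time with row 0 the most recent, each row sum is the fraction of the population replaced by migrants in that generation, and an allele arriving at generation t survives if it is not replaced in any more recent generation. The function name survival_factors_sketch is hypothetical and the indexing conventions of the real function may differ.

```python
def survival_factors_sketch(migration_matrix):
    """survival[t] = product over s < t of (1 - total migration at generation s).

    Hypothetical illustration; row 0 is assumed to be the most recent generation.
    """
    survival = []
    running = 1.0
    for row in migration_matrix:
        survival.append(running)      # not yet replaced by any more recent migrants
        running *= 1.0 - sum(row)     # fraction left after this generation's migrants
    return survival

# Made-up matrix: 4 generations (rows) by 2 source populations (columns).
migration_matrix = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.2],
    [0.5, 0.5],
]
factors = survival_factors_sketch(migration_matrix)
```

In this example the founding generation (last row) has a survival factor of about 0.72, the product of 0.9 and 0.8 from the two later migration pulses.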