tracts.phase_type.monoecious.PhTMonoecious#

class PhTMonoecious(migration_matrix, rho=1)#

Bases: PhaseTypeDistribution

A subclass of PhaseTypeDistribution providing the specific Phase-Type tools for the Monoecious Markov approximation.

migration_matrix#

The migration matrix given as input without contributions at generations 0 and 1.

Type:

npt.ArrayLike

num_populations#

The number of populations considered in the demographic model.

Type:

int

num_generations#

The number of generations considered in the demographic model.

Type:

int

t0_proportions#

The total contribution from each ancestral population.

Type:

npt.ArrayLike

full_transition_matrix#

The intensity matrix \(\mathbf{S}^M\) of the Monoecious Markov Model.

Type:

npt.ArrayLike

equilibrium_distribution#

The equilibrium distribution of the Monoecious Markov Model.

Type:

npt.ArrayLike

alpha_list#

A list containing, for each ancestral population, the initial state of the population-specific Phase-Type distribution.

Type:

list

transition_matrices#

A list containing, for each ancestral population, the submatrix of full_transition_matrix corresponding to transitions within the population. It is used to compute the population-specific distribution of tract lengths.

Type:

list

S0_list#

A list containing the sum across columns of every transition matrix in transition_matrices.

Type:

list

inverse_S0_list#

A list containing the sum across columns of the inverse of every transition matrix in transition_matrices.

Type:

list

Parameters:
  • migration_matrix (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – An array containing the migration proportions from a discrete number of populations over the last generations. Each row is a time, each column is a population. Row zero corresponds to the current generation. The last row (migration_matrix[-1,:]) corresponds to the founding generation, and its entries should sum to 1.

  • rho (float) – The recombination rate.

Notes

Non-listed attributes are for internal use only.
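
As an illustration of the migration_matrix layout described in the parameters above, the following is a minimal sketch using NumPy. The two-population, five-generation model and the pulse sizes are made-up values, and the commented-out constructor call assumes the import path shown in the page heading.

```python
import numpy as np

# Hypothetical model: 2 populations over 5 generations (rows 0..4).
# Row 0 is the current generation; the last row is the founding generation.
num_generations, num_populations = 5, 2
migration_matrix = np.zeros((num_generations, num_populations))

# Founding pulse: 70% from population 0 and 30% from population 1.
migration_matrix[-1, :] = [0.7, 0.3]

# A later 10% pulse from population 1, two generations ago.
migration_matrix[2, 1] = 0.1

# The founding row should sum to 1, as the parameter description requires.
founding_total = migration_matrix[-1, :].sum()
assert np.isclose(founding_total, 1.0)

# With tracts installed, the object would then be built as documented above:
# from tracts.phase_type.monoecious import PhTMonoecious
# model = PhTMonoecious(migration_matrix, rho=1)
```

Rows 0 and 1 are left at zero here, in line with the migration_matrix attribute being stored without contributions at generations 0 and 1.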

PhT_CDF(x, population_number, s1=None)#

Computes a Phase-type CDF at a given point \(x\) in \((0, \infty)\). The Phase-type parameters (initial state, transition matrix) are taken from a PhTMonoecious object together with the specification of a population of interest.

Parameters:
  • x (float) – A point in \((0, \infty)\) where the CDF is evaluated.

  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • s1 – Not used in the Monoecious model.

Returns:

The CDF value at \(x\).

Return type:

Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]
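
For orientation, the textbook Phase-type CDF with initial state \(\boldsymbol{\alpha}\) and transition submatrix \(\mathbf{S}\) is \(F(x) = 1 - \boldsymbol{\alpha} e^{\mathbf{S}x} \mathbf{1}\). The sketch below evaluates that formula with NumPy on a toy two-state chain. It is not the library's implementation: the helper names expm_series and phase_type_cdf, the toy matrix, and the series-based matrix exponential are all assumptions made for illustration.

```python
import numpy as np

def expm_series(A, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float) / (2 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k          # A^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result     # undo the scaling by repeated squaring
    return result

def phase_type_cdf(x, alpha, S):
    """Textbook Phase-type CDF: F(x) = 1 - alpha @ expm(S * x) @ 1."""
    ones = np.ones(S.shape[0])
    return 1.0 - alpha @ expm_series(S * x) @ ones

# Toy chain (hypothetical): state 0 -> state 1 at rate 2, state 1 -> absorption at rate 1.
S = np.array([[-2.0, 2.0],
              [0.0, -1.0]])
alpha = np.array([1.0, 0.0])

x = 0.8
cdf_value = phase_type_cdf(x, alpha, S)

# Closed form for this toy chain: a sum of two exponential waiting times (rates 2 and 1).
closed_form = 1.0 - (2.0 * np.exp(-x) - np.exp(-2.0 * x))
```

In the class itself, the corresponding inputs would presumably be the entries of alpha_list and transition_matrices for the chosen population_number.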

PhT_CDF_windowed(S, alpha, S0_inv, bins, L, pop_number, s1=None, exp_Sx_per_bin=None)#

Computes a Phase-type CDF on a finite chromosome of length \(L\) and evaluates it on a point grid. The Phase-type parameters (initial state, transition matrix) correspond to those stored in a PhTMonoecious object for the population of interest, but are also passed in directly as inputs.

Parameters:
  • S (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The transition submatrix.

  • alpha (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The initial state of the Phase-type distribution.

  • S0_inv (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The sum across columns of the inverse of the transition submatrix.

  • bins (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – A point grid on \((0, L)\) where the CDF has to be evaluated.

  • L (float) – The length of the finite chromosome.

  • s1 (float | None) – Not used in the Monoecious model.

  • pop_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • exp_Sx_per_bin (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The precomputed values of \(e^{Sx}\) for every \(x\) in bins. Used internally to speed up computation.

Return type:

tuple[ndarray, float, float, float]

Returns:

  • npt.ArrayLike – The CDF evaluated on bins.

  • float – The tract length expectation of the corresponding model considering an infinite chromosome.

  • float – The normalization factor \(Z\) of the corresponding model.

  • float – The tract length expectation on the finite chromosome of the corresponding model.
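
The S0_inv argument and the infinite-chromosome expectation can be illustrated with a small NumPy sketch. It assumes that "sum across columns" means one total per row (axis=1) and uses the textbook Phase-type mean \(-\boldsymbol{\alpha} \mathbf{S}^{-1} \mathbf{1}\); the toy matrix is hypothetical and the library's internal computation may differ.

```python
import numpy as np

# Toy transition submatrix and initial state (hypothetical values).
S = np.array([[-2.0, 2.0],
              [0.0, -1.0]])
alpha = np.array([1.0, 0.0])

# Assumption: "sum across columns" yields one total per row (axis=1).
S0 = S.sum(axis=1)                       # row sums of S
S0_inv = np.linalg.inv(S).sum(axis=1)    # row sums of the inverse of S

# Textbook Phase-type mean on an infinite chromosome: E[X] = -alpha @ S^{-1} @ 1.
mean_infinite = -alpha @ S0_inv
```

For this chain the mean is the sum of the two expected waiting times, 1/2 + 1.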

PhT_density(x, population_number, s1=None)#

Computes a Phase-type density at a given point \(x\) in \((0, \infty)\). The Phase-type parameters (initial state, transition matrix) are taken from a PhTMonoecious object together with the specification of a population of interest.

Parameters:
  • x (float) – A point in \((0, \infty)\) where the density function is evaluated.

  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • s1 – Not used in the Monoecious model.

Returns:

The density value at \(x\).

Return type:

float
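
For reference, the textbook relation between the Phase-type CDF and density is written out below. The symbols \(\boldsymbol{\alpha}\) (initial state), \(\mathbf{S}\) (transition submatrix), \(\mathbf{1}\) (vector of ones) and \(\mathbf{s}^0\) (exit-rate vector) are standard notation assumed here, not names taken from the library.

\[
F(x) = 1 - \boldsymbol{\alpha}\, e^{\mathbf{S}x}\, \mathbf{1},
\qquad
f(x) = F'(x) = -\boldsymbol{\alpha}\, e^{\mathbf{S}x}\, \mathbf{S}\mathbf{1} = \boldsymbol{\alpha}\, e^{\mathbf{S}x}\, \mathbf{s}^0,
\qquad
\mathbf{s}^0 = -\mathbf{S}\mathbf{1}.
\]

The middle step uses \(\frac{d}{dx} e^{\mathbf{S}x} = e^{\mathbf{S}x}\mathbf{S}\).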

PhT_density_windowed(population_number, S, alpha, S0_inv, bins, L, s1=None, exp_Sx_per_bin=None)#

Computes a Phase-type density on a finite chromosome of length \(L\) and evaluates it on a point grid. The Phase-type parameters (initial state, transition matrix) correspond to those stored in a PhTMonoecious object for the population of interest, but are also passed in directly as inputs.

Parameters:
  • population_number (int) – The population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1.

  • S (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The transition submatrix.

  • alpha (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The initial state of the Phase-type distribution.

  • S0_inv (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The sum across columns of the inverse of the transition submatrix.

  • bins (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – A point grid on \((0, L)\) where the density has to be evaluated.

  • L (float) – The length of the finite chromosome.

  • s1 (float, default None) – Not used in the Monoecious model.

  • exp_Sx_per_bin (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The precomputed values of \(e^{Sx}\) for every \(x\) in bins. Used internally to speed up computation.

Returns:

  • npt.ArrayLike – The corrected bins grid as described in Notes.

  • npt.ArrayLike – The density evaluated on bins.

  • float – The tract length expectation of the corresponding model.

Notes

The code truncates bins to the interval \([0, L]\) and adds the point \(L\) if it is not included in bins. This is done because the density on the finite chromosome \([0, L]\) is a mixture of a continuous density on \([0, L)\) and a Dirac measure at \(L\). Consequently, the first returned value is the transformed grid, which can be used as the x-axis to plot the density.

Do not call this function directly. To get a Phase-type density on a finite chromosome, use tractlength_histogram_windowed() with density=True.
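
The grid correction described in these notes can be sketched as follows. This is an illustration of the stated behavior, not the library's code; the helper name corrected_grid is hypothetical, and the sketch assumes bins is sorted in increasing order.

```python
import numpy as np

def corrected_grid(bins, L):
    """Truncate a sorted grid to [0, L] and append L if it is missing."""
    bins = np.asarray(bins, dtype=float)
    kept = bins[(bins >= 0.0) & (bins <= L)]   # drop points outside [0, L]
    if kept.size == 0 or kept[-1] != L:
        kept = np.append(kept, L)              # make room for the point mass at L
    return kept

# The point 2.5 lies beyond the chromosome end and is replaced by L = 2.0.
grid = corrected_grid([0.0, 0.5, 1.0, 1.5, 2.5], L=2.0)
```

The last grid point then carries the Dirac mass at \(L\), while the earlier points carry the continuous part of the density.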

__init__(migration_matrix, rho=1)#

Initializes the PhTMonoecious object by constructing the transition matrix and the initial state of the Phase-Type distribution.

distribution_scaling_factor(population_number)#

Computes the scaling factor to transform the CDF values into counts.

Parameters:

population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

Returns:

The scaling factor to transform the CDF values into counts.

Return type:

float

get_equilibrium_distribution()#

Computes the equilibrium distribution of the Monoecious Phase-type model.

Returns:

The equilibrium distribution of the Monoecious Phase-type model.

Return type:

npt.ArrayLike
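
As background, the equilibrium distribution \(\boldsymbol{\pi}\) of a continuous-time Markov chain with intensity matrix \(\mathbf{Q}\) solves \(\boldsymbol{\pi}\mathbf{Q} = \mathbf{0}\) with entries summing to 1. The sketch below solves that system generically with NumPy. Whether the method computes it this way is an assumption; the function name stationary_distribution and the toy matrix are hypothetical.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi @ Q = 0 subject to sum(pi) = 1 by least squares."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    # Stack the balance equations Q^T pi = 0 with the normalization row.
    A = np.vstack([Q.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Toy two-state intensity matrix: each row sums to zero.
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
pi = stationary_distribution(Q)
```

For this toy chain the balance equation gives twice as much mass on state 0 as on state 1.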

get_transition_matrix()#

Computes the transition matrix of the Monoecious Phase-type model.

Returns:

The transition matrix of the Monoecious Phase-type model. Each entry \((i,j)\) corresponds to the transition rate from state \(i\) to state \(j\).

Return type:

npt.ArrayLike

tract_length_histogram_multi_windowed(population_number, bins, chrom_lengths)#

Calculates the tract length histogram on multiple chromosomes of different lengths.

Parameters:
  • population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – A point grid where the histogram has to be computed. The same grid is used for all chromosomes, and should be defined on the interval (0, max(chrom_lengths)).

  • chrom_lengths (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – A list of chromosome lengths.

Returns:

The histogram values on the intervals defined by bins, summed across all chromosomes.

Return type:

Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]
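
The idea of evaluating one shared grid on several chromosomes and summing the results can be sketched as below. The per-chromosome function toy_histogram is a hypothetical stand-in built from a placeholder exponential CDF; it is not the windowed Phase-type computation performed by the library.

```python
import numpy as np

def toy_histogram(bins, L):
    """Hypothetical per-chromosome histogram; NOT the tracts computation."""
    bins = np.asarray(bins, dtype=float)
    edges = np.minimum(bins, L)      # clip the shared grid to this chromosome
    cdf = 1.0 - np.exp(-edges)       # placeholder CDF (unit-rate exponential)
    return np.diff(cdf)              # mass falling in each interval of the grid

# One grid for all chromosomes, defined on (0, max(chrom_lengths)).
bins = np.linspace(0.0, 3.0, 7)
chrom_lengths = [1.0, 2.0, 3.0]

# Sum the per-chromosome histograms interval by interval.
total = sum(toy_histogram(bins, L) for L in chrom_lengths)
```

Intervals lying beyond a chromosome's length contribute nothing for that chromosome, so only the longest chromosome populates the last bins.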

tractlength_histogram_windowed(population_number, bins, L, exp_Sx_per_bin=None, density=False, freq=False)#

Calculates the tract length histogram or density function on a finite chromosome, using the Monoecious (M) admixture model.

Parameters:
  • population_number (int) – The index of the population of interest whose tract length distribution has to be computed. An integer from 0 to the number of populations - 1, corresponding to the column of the migration matrix.

  • bins (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – A point grid where the CDF or density has to be computed.

  • L (float) – The length of the finite chromosome.

  • exp_Sx_per_bin (Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]]) – The precomputed values of \(e^{Sx}\) for every \(x\) in bins. Used internally to speed up computation.

  • density (bool, default False) – If density is True, computes the PhT density values evaluated on the grid. Else, returns the histogram values on the grid.

  • freq (bool, default False) – If density is True, whether to return density on the frequency scale.

Return type:

tuple[Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]], Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[bool | int | float | complex | str | bytes]], float]

Returns:

  • npt.ArrayLike – If density is True, the corrected bins grid as described in the Notes of PhT_density_windowed(). Else, the bins introduced as input.

  • npt.ArrayLike – If density is True, the Phase-type density evaluated on the corrected bins grid. Returned on the frequency scale if freq = True. If density is False, the histogram values on the intervals defined by bins.

  • float – The tract length expectation of the corresponding model.