Phase-type models for tract length distributions#
In this tutorial, we illustrate how to use the functions in tracts to compute and depict tract length distributions under the different models of admixture introduced in the paper. These are, in increasing order of complexity:
The Monoecious (M) model (only defined for autosomal admixture),
The Dioecious-Coarse (DC) model,
The Dioecious-Fine (DF) model,
The hybrid-pedidree refinements of the DC (H-DC) and the DF (H-DF) models.
We start by importing the required python libraries:
[ ]:
import tracts.phase_type_distribution as PhT # The phase type distribution module, containing the functions to implement the admixture models.
import tracts.hybrid_pedigree as HP # The hybrid pedigree module, containing the functions to implement the hybrid pedigree refinements.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib as mpl
mpl.rcParams.update(mpl.rcParamsDefault)
Throughout this notebook, we consider two source populations and a continuous pulse from the first to the second. This can be represented with the following migration matrix. The whichpop parameter sets the population of interest (0 or 1).
[13]:
whichpop = 1
mig_matrix = np.array([[0, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0, 1]])
In what follows, we compute Phase-Type densities and histograms under autosomal and X chromosome admixture, using all the models presented in tracts. We consider different levels of sex-bias, in both migration and recombination rates.
Note
The previous migration matrix represents a demographic model of 6 generations. This is a toy framework where all the presented models are computationally efficient and can be quickly computed and compared.
Important
For more realistic demographic models going up to 15–18 generations, the following models should be used depending on the context:
For autosomal admixture: Dioecious-Coarse (DC) or Monoecious (M) model.
For X chromosome admixture: Dioecious-Coarse (DC) model or its Hybrid-Pedigree refinement (H-DC). Typically, the number of pedigree generations should be set to
TP=2.
An example under both settings is presented at the end of the notebook.
1. Choose autosomal or X chromosome admixture#
[14]:
X_chr = False # Set to True for admixture on the X chromosome
X_chr_male = False # Set to True if admixture is considered on the X chromosome of a male individual (only maternally inherited alleles).
[15]:
which_L = 1.7928357829 # Second chromosome
#which_L = 1.96 # For the X chromosome
# Set a point grid on the finite chromosome
bins = np.linspace(0, which_L+0.1, num = 50)
2. Set sex-bias#
The following parameters set the sex-bias for migration and recombination rates. The user can modify these parameters to observe how differences appear between all the considered models.
[ ]:
prop_of_female_migration = 0.5 # Set 0.5 for unbiased migration, 1 for only female migrants, 0 for only male migrants.
mig_m = 2*mig_matrix*(1-prop_of_female_migration)
mig_f = 2*mig_matrix*prop_of_female_migration
mig_m[-1,:] = mig_matrix[-1,:]
mig_f[-1,:] = mig_matrix[-1,:]
rec_f = 1 # Female-specific recombination rate
rec_m = 1 # Male-specific recombination rate
3. Compute model parameters and distributions#
For the Monoecious (M), Dioecious Fine (DF), and Dioecious Coarse (DC) models, we first compute a PhTMonoecious or a PhTDioecious object, and then compute the corresponding phase-type density or histogram using the tractlength_histogram_windowed method. For the Pedigree-Dioecious models, densities and histograms are computed directly from the model parameters using the hybrid_pedigree_distribution function.
For comparison with the other settings, the Monoecious model is always computed using the mean migration matrix and the mean recombination rate.
3.1 Monoecious model (don’t run for X chromosome)#
[17]:
PTmodel_mono = PhT.PhTMonoecious(mig_matrix, rho = 0.5*(rec_f+rec_m)) # M object
# Density (set freq = True for frequency scale)
newbins, density_m, E = PTmodel_mono.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = True, freq = True)
# Histogram
bins, hist_m, E = PTmodel_mono.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = False, freq = False)
3.2 Dioecious Fine and Coarse models#
[ ]:
# DF and DC objects
PTmodel_DF = PhT.PhTDioecious(mig_f, mig_m, rho_f = rec_f, rho_m = rec_m, sex_model = 'DF', X_chromosome = X_chr, X_chromosome_male = X_chr_male)
PTmodel_DC = PhT.PhTDioecious(mig_f , mig_m, rho_f = rec_f, rho_m = rec_m, sex_model = 'DC', X_chromosome = X_chr, X_chromosome_male = X_chr_male)
# Densities (set freq = True for frequency scale)
newbins, density_df, E = PTmodel_DF.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = True, freq = True)
newbins, density_dc, E = PTmodel_DC.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = True, freq = True)
# Histograms
bins, hist_df, E = PTmodel_DF.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = False, freq = False)
bins, hist_dc, E = PTmodel_DC.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = False, freq = False)
3.3 Hybrid-pedigree refinements#
[19]:
# Phase-Type density for the Pedigree-DF model (set freq = True for frequency scale)
newbins, density_hp_df = HP.hybrid_pedigree_distribution(mig_matrix_f = mig_f, mig_matrix_m = mig_m, TP = 2, Dioecious_model = 'DF', L = which_L, bingrid = bins, whichpop = whichpop, rr_f = rec_f, rr_m = rec_m, X_chr = X_chr, X_chr_male = X_chr_male, N_cores = 5, density = True, freq = True)
# Phase-Type density for the Pedigree-DC model (set freq = True for frequency scale)
newbins, density_hp_dc = HP.hybrid_pedigree_distribution(mig_matrix_f = mig_f, mig_matrix_m = mig_m, TP = 2, Dioecious_model = 'DC', L = which_L, bingrid = bins, whichpop = whichpop, rr_f = rec_f, rr_m = rec_m, X_chr = X_chr, X_chr_male = X_chr_male, N_cores = 5, density = True, freq = True)
# Phase-Type histogram for the Pedigree-DF model
bins, hist_hp_df = HP.hybrid_pedigree_distribution(mig_matrix_f = mig_f, mig_matrix_m = mig_m, TP = 2, Dioecious_model = 'DF', L = which_L, bingrid = bins, whichpop = whichpop, rr_f = rec_f, rr_m = rec_m, X_chr = X_chr, X_chr_male = X_chr_male, N_cores = 5, density = False, freq = False)
# Phase-Type histogram for the Pedigree-DC model
bins, hist_hp_dc = HP.hybrid_pedigree_distribution(mig_matrix_f = mig_f, mig_matrix_m = mig_m, TP = 2, Dioecious_model = 'DC', L = which_L, bingrid = bins, whichpop = whichpop, rr_f = rec_f, rr_m = rec_m, X_chr = X_chr, X_chr_male = X_chr_male, N_cores = 5, density = False, freq = False)
-------------------------------------------------------------------
Computing pedigrees probabilities...
-------------------------------------------------------------------
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 7.3s remaining: 12.8s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 7.3s remaining: 6.1s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 7.4s remaining: 2.8s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 7.9s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.013139724731445312s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.04860877990722656s.) Setting batch_size=2.
-------------------------------------------------------------------
Done! Computing phase-type densities conditioned to each pedigree...
-------------------------------------------------------------------
9 densities to compute using the DF model.
[Parallel(n_jobs=5)]: Done 2 out of 9 | elapsed: 0.1s remaining: 0.5s
[Parallel(n_jobs=5)]: Done 3 out of 9 | elapsed: 0.8s remaining: 1.7s
[Parallel(n_jobs=5)]: Done 4 out of 9 | elapsed: 0.9s remaining: 1.2s
[Parallel(n_jobs=5)]: Done 5 out of 9 | elapsed: 0.9s remaining: 0.7s
[Parallel(n_jobs=5)]: Done 6 out of 9 | elapsed: 0.9s remaining: 0.5s
[Parallel(n_jobs=5)]: Done 7 out of 9 | elapsed: 1.0s remaining: 0.3s
[Parallel(n_jobs=5)]: Done 9 out of 9 | elapsed: 1.2s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.01572418212890625s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.01648736000061035s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.024530410766601562s.) Setting batch_size=2.
-------------------------------------------------------------------
Done!
-------------------------------------------------------------------
-------------------------------------------------------------------
Computing pedigrees probabilities...
-------------------------------------------------------------------
-------------------------------------------------------------------
Done! Computing phase-type densities conditioned to each pedigree...
-------------------------------------------------------------------
9 densities to compute using the DC model.
[Parallel(n_jobs=5)]: Done 2 out of 9 | elapsed: 0.2s remaining: 0.6s
[Parallel(n_jobs=5)]: Done 3 out of 9 | elapsed: 0.9s remaining: 1.7s
[Parallel(n_jobs=5)]: Done 4 out of 9 | elapsed: 0.9s remaining: 1.2s
[Parallel(n_jobs=5)]: Done 5 out of 9 | elapsed: 0.9s remaining: 0.7s
[Parallel(n_jobs=5)]: Done 6 out of 9 | elapsed: 1.0s remaining: 0.5s
[Parallel(n_jobs=5)]: Done 7 out of 9 | elapsed: 1.0s remaining: 0.3s
[Parallel(n_jobs=5)]: Done 9 out of 9 | elapsed: 1.1s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.018271446228027344s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.013619422912597656s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.017940282821655273s.) Setting batch_size=2.
-------------------------------------------------------------------
Done!
-------------------------------------------------------------------
-------------------------------------------------------------------
Computing pedigrees probabilities...
-------------------------------------------------------------------
-------------------------------------------------------------------
Done! Computing phase-type densities conditioned to each pedigree...
-------------------------------------------------------------------
9 densities to compute using the DF model.
[Parallel(n_jobs=5)]: Done 2 out of 9 | elapsed: 0.1s remaining: 0.4s
[Parallel(n_jobs=5)]: Done 3 out of 9 | elapsed: 0.6s remaining: 1.3s
[Parallel(n_jobs=5)]: Done 4 out of 9 | elapsed: 0.7s remaining: 0.9s
[Parallel(n_jobs=5)]: Done 5 out of 9 | elapsed: 0.7s remaining: 0.6s
[Parallel(n_jobs=5)]: Done 6 out of 9 | elapsed: 0.7s remaining: 0.4s
[Parallel(n_jobs=5)]: Done 7 out of 9 | elapsed: 0.8s remaining: 0.2s
/home/jgonzale/.local/lib/python3.10/site-packages/tracts/phase_type_distribution.py:171: ComplexWarning: Casting complex values to real discards the imaginary part
CDF_values[bin_number] = prop_connected * ((self.inner_CDF(bin_val, L, S, exp_Sx, alpha, S0_inv) +
[Parallel(n_jobs=5)]: Done 9 out of 9 | elapsed: 1.1s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.01439213752746582s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.015961170196533203s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 4 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 6 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 8 out of 11 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=5)]: Done 11 out of 11 | elapsed: 0.0s finished
[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Batch computation too fast (0.016501665115356445s.) Setting batch_size=2.
[Parallel(n_jobs=5)]: Done 2 out of 9 | elapsed: 0.1s remaining: 0.4s
-------------------------------------------------------------------
Done!
-------------------------------------------------------------------
-------------------------------------------------------------------
Computing pedigrees probabilities...
-------------------------------------------------------------------
-------------------------------------------------------------------
Done! Computing phase-type densities conditioned to each pedigree...
-------------------------------------------------------------------
9 densities to compute using the DC model.
[Parallel(n_jobs=5)]: Done 3 out of 9 | elapsed: 0.6s remaining: 1.1s
[Parallel(n_jobs=5)]: Done 4 out of 9 | elapsed: 0.6s remaining: 0.7s
[Parallel(n_jobs=5)]: Done 5 out of 9 | elapsed: 0.6s remaining: 0.5s
[Parallel(n_jobs=5)]: Done 6 out of 9 | elapsed: 0.6s remaining: 0.3s
[Parallel(n_jobs=5)]: Done 7 out of 9 | elapsed: 0.7s remaining: 0.2s
-------------------------------------------------------------------
Done!
-------------------------------------------------------------------
[Parallel(n_jobs=5)]: Done 9 out of 9 | elapsed: 0.8s finished
4. Plot tract length distributions#
[20]:
# Color palette and line styles
colors = {
'M' : 'gray',
'DC': 'green',
'DF': 'blue',
'P': 'orange',
'HP' : 'orange',
'HP_DC': 'orange',
}
line_styles = {
'M' : '-',
'DF': '-',
'DC': '-',
'P': '-',
'HP':'-',
'HP_DC':'-'
}
line_widths = {
'M' : 2,
'DF': 2,
'DC': 2,
'P': 2,
'HP':2,
'HP_DC':2
}
4.1 Plot densities#
4.1.1 Plot for autosomal admixture#
The following code produces a figure depicting all the computed Phase-Type densities for autosomal admixture. Run if X_chr = False.
[ ]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, axes = plt.subplots(1, 1, figsize=(5,4))
ax = axes
# Plot lines with improved color and line style
line1, = ax.plot(np.delete(newbins, whichbin), (density_df[~(np.asarray(newbins) == outbin)]), color=colors['DF'], linestyle=line_styles['DF'], linewidth = line_widths['DF'], label='Dioecious Fine (DF)', zorder = 1)
line2, = ax.plot(np.delete(newbins, whichbin), (density_dc[~(np.asarray(newbins) == outbin)]), color=colors['DC'], linestyle=line_styles['DC'], linewidth = line_widths['DC'], label='Dioecious Coarse (DC)', zorder = 1)
line3, = ax.plot(np.delete(newbins, whichbin), (density_m[~(np.asarray(newbins) == outbin)]), color=colors['M'], linestyle=line_styles['M'], linewidth = line_widths['M'], label='Monoecious (M)', zorder = 1)
line4, = ax.plot(np.delete(newbins, whichbin), density_hp_df[~(np.asarray(newbins) == outbin)], color=colors['HP'], linestyle=line_styles['HP'], linewidth = line_widths['HP'], label='Ped-DF (TP = 2)', zorder = 1)
line5, = ax.plot(np.delete(newbins, whichbin), density_hp_df[~(np.asarray(newbins) == outbin)], color=colors['DF'], linestyle=(2, (2, 2)), linewidth = line_widths['HP'], label='Ped-DF (TP = 2)', zorder = 1)
line6, = ax.plot(np.delete(newbins, whichbin), density_hp_dc[~(np.asarray(newbins) == outbin)], color=colors['HP_DC'], linestyle=line_styles['HP_DC'], linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)', zorder = 1)
line7, = ax.plot(np.delete(newbins, whichbin), density_hp_dc[~(np.asarray(newbins) == outbin)], color=colors['DC'], linestyle=(2, (2, 2)), linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)', zorder = 1)
# Customize x and y labels
ax.set_xlabel('Tract length on the second chromosome', fontsize=10)
ax.set_ylabel('Density', fontsize=10)
# Add outlier point to the plot
ax.scatter(outbin, (density_df[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['DF'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_dc[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['DC'], s=30, label='Outlier',marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_m[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['M'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_hp_df[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['HP'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_hp_dc[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['HP_DC'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
# Add outlier point to the plot
ax.scatter(outbin, density_df[np.where(newbins == outbin)], edgecolor=colors['DF'], s=30, label='Outlier', marker='o', facecolor=colors['DF'], zorder = 4)
ax.scatter(outbin, density_dc[np.where(newbins == outbin)], edgecolor=colors['DC'], s=30, label='Outlier',marker='o', facecolor=colors['DC'], zorder = 4)
ax.scatter(outbin, density_m[np.where(newbins == outbin)], edgecolor=colors['M'], s=30, label='Outlier', marker='o', facecolor=colors['M'], zorder = 4)
ax.scatter(outbin, density_hp_df[np.where(newbins == outbin)], edgecolor=colors['HP'], s=30, label='Outlier', marker='o', facecolor=colors['HP'], zorder = 4)
ax.scatter(outbin, density_hp_df[np.where(newbins == outbin)], edgecolor=colors['HP'], s=15, label='Outlier', marker='o', facecolor=colors['DF'], zorder = 4)
ax.scatter(outbin, density_hp_dc[np.where(newbins == outbin)], edgecolor=colors['HP_DC'], s=30, label='Outlier', marker='o', facecolor=colors['HP_DC'], zorder = 4)
ax.scatter(outbin, density_hp_dc[np.where(newbins == outbin)], edgecolor=colors['HP_DC'], s=15, label='Outlier', marker='o', facecolor=colors['DC'], zorder = 4)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
handles = []
labels = []
handles.extend([line1, line2, line3, (line4, line5), (line6, line7)])
labels.extend([line1.get_label(), line2.get_label(), line3.get_label(), line4.get_label(), line6.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
4.1.2 Plot for X chromosome admixture#
The following code produces a figure depicting all the computed Phase-Type densities for admixture on the X chromosome. Run if X_chr = True.
[22]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, axes = plt.subplots(1, 1, figsize=(5,4)) # 3x3 grid of subplots
ax = axes
# Plot lines with improved color and line style
line1, = ax.plot(np.delete(newbins, whichbin), (density_df[~(np.asarray(newbins) == outbin)]), color=colors['DF'], linestyle=line_styles['DF'], linewidth = line_widths['DF'], label='Dioecious Fine (DF)', zorder = 1)
line2, = ax.plot(np.delete(newbins, whichbin), (density_dc[~(np.asarray(newbins) == outbin)]), color=colors['DC'], linestyle=line_styles['DC'], linewidth = line_widths['DC'], label='Dioecious Coarse (DC)', zorder = 1)
line4, = ax.plot(np.delete(newbins, whichbin), density_hp_df[~(np.asarray(newbins) == outbin)], color=colors['HP'], linestyle=line_styles['HP'], linewidth = line_widths['HP'], label='Ped-DF (TP = 2)', zorder = 1)
line5, = ax.plot(np.delete(newbins, whichbin), density_hp_df[~(np.asarray(newbins) == outbin)], color=colors['DF'], linestyle=(2, (2, 2)), linewidth = line_widths['HP'], label='Ped-DF (TP = 2)', zorder = 1)
line6, = ax.plot(np.delete(newbins, whichbin), density_hp_dc[~(np.asarray(newbins) == outbin)], color=colors['HP_DC'], linestyle=line_styles['HP_DC'], linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)', zorder = 1)
line7, = ax.plot(np.delete(newbins, whichbin), density_hp_dc[~(np.asarray(newbins) == outbin)], color=colors['DC'], linestyle=(2, (2, 2)), linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)', zorder = 1)
# Customize x and y labels
ax.set_xlabel('Tract length on the X chromosome', fontsize=10)
ax.set_ylabel('Density', fontsize=10)
# Add outlier point to the plot
ax.scatter(outbin, (density_df[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['DF'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_dc[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['DC'], s=30, label='Outlier',marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_hp_df[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['HP'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
ax.scatter(outbin, (density_hp_dc[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['HP_DC'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
# Add outlier point to the plot
ax.scatter(outbin, density_df[np.where(newbins == outbin)], edgecolor=colors['DF'], s=30, label='Outlier', marker='o', facecolor=colors['DF'], zorder = 4)
ax.scatter(outbin, density_dc[np.where(newbins == outbin)], edgecolor=colors['DC'], s=30, label='Outlier',marker='o', facecolor=colors['DC'], zorder = 4)
ax.scatter(outbin, density_hp_df[np.where(newbins == outbin)], edgecolor=colors['HP'], s=30, label='Outlier', marker='o', facecolor=colors['HP'], zorder = 4)
ax.scatter(outbin, density_hp_df[np.where(newbins == outbin)], edgecolor=colors['HP'], s=15, label='Outlier', marker='o', facecolor=colors['DF'], zorder = 4)
ax.scatter(outbin, density_hp_dc[np.where(newbins == outbin)], edgecolor=colors['HP_DC'], s=30, label='Outlier', marker='o', facecolor=colors['HP_DC'], zorder = 4)
ax.scatter(outbin, density_hp_dc[np.where(newbins == outbin)], edgecolor=colors['HP_DC'], s=15, label='Outlier', marker='o', facecolor=colors['DC'], zorder = 4)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
handles = []
labels = []
handles.extend([line1, line2, (line4, line5), (line6, line7)])
labels.extend([line1.get_label(), line2.get_label(), line4.get_label(), line6.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
4.2 Plot histograms#
4.2.1 Plot for autosomal admixture#
The following code produces a figure depicting all the computed Phase-Type histograms for autosomal admixture. Run if X_chr = False.
[23]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, ax = plt.subplots(figsize=(5,4))
# Plot lines with improved color and line style
line1, = ax.step(bins[:-1], hist_df, color=colors['DF'], linestyle=line_styles['DF'], linewidth = line_widths['DF'], label='Dioecious Fine (DF)')
line2, = ax.step(bins[:-1], hist_dc, color=colors['DC'], linestyle=line_styles['DC'], linewidth = line_widths['DC'], label='Dioecious Coarse (DC)')
line3, = ax.step(bins[:-1], hist_m, color=colors['M'], linestyle=line_styles['M'], linewidth = line_widths['M'], label='Monoecious (M)')
line4, = ax.step(bins[:-1], hist_hp_df, color=colors['HP'], linestyle=line_styles['HP'], linewidth = line_widths['HP'], label='Ped-DF (TP = 2)')
line5, = ax.step(bins[:-1], hist_hp_df, color=colors['DF'], linestyle=(2, (2, 2)), linewidth = line_widths['HP'], label='Ped-DF (TP = 2)')
line6, = ax.step(bins[:-1], hist_hp_dc, color=colors['HP_DC'], linestyle=line_styles['HP_DC'], linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)')
line7, = ax.step(bins[:-1], hist_hp_dc, color=colors['DC'], linestyle=(2, (2, 2)), linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)')
# Customize x and y labels
ax.set_xlabel('Tract length on the second chromosome', fontsize=10)
ax.set_ylabel('Expected number of tracts per interval', fontsize=10)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
handles = []
labels = []
handles.extend([line1, line2, line3, (line4, line5), (line6, line7)])
labels.extend([line1.get_label(), line2.get_label(), line3.get_label(), line4.get_label(), line6.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
4.2.2 Plot for X chromosome admixture#
The following code produces a figure depicting all the computed Phase-Type histograms for admixture on the X chromosome. Run if X_chr = True.
[24]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, ax = plt.subplots(figsize=(5,4))
# Plot lines with improved color and line style
line1, = ax.step(bins[:-1], hist_df, color=colors['DF'], linestyle=line_styles['DF'], linewidth = line_widths['DF'], label='Dioecious Fine (DF)')
line2, = ax.step(bins[:-1], hist_dc, color=colors['DC'], linestyle=line_styles['DC'], linewidth = line_widths['DC'], label='Dioecious Coarse (DC)')
line4, = ax.step(bins[:-1], hist_hp_df, color=colors['HP'], linestyle=line_styles['HP'], linewidth = line_widths['HP'], label='Ped-DF (TP = 2)')
line5, = ax.step(bins[:-1], hist_hp_df, color=colors['DF'], linestyle=(2, (2, 2)), linewidth = line_widths['HP'], label='Ped-DF (TP = 2)')
line6, = ax.step(bins[:-1], hist_hp_dc, color=colors['HP_DC'], linestyle=line_styles['HP_DC'], linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)')
line7, = ax.step(bins[:-1], hist_hp_dc, color=colors['DC'], linestyle=(2, (2, 2)), linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)')
# Customize x and y labels
ax.set_xlabel('Tract length on the X chromosome', fontsize=10)
ax.set_ylabel('Expected number of tracts per interval', fontsize=10)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
handles = []
labels = []
handles.extend([line1, line2, (line4, line5), (line6, line7)])
labels.extend([line1.get_label(), line2.get_label(), line4.get_label(), line6.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
5. Tract length distribution for a continuous pulse during 15 generations#
We consider the following migration matrix, and let the user specify the sex-bias parameters as in the previous example. Once again, whichpop sets the population of interest (0 or 1).
[25]:
whichpop = 1
mig_matrix = np.array([[0, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0.2, 0],[0, 1]])
print(np.shape(mig_matrix)[0]-1) # Number of generations
15
[26]:
prop_of_female_migration = 0.5 # 0.5 for unbiased migration, 1 for only female migrants, 0 for only male migrants.
mig_m = 2*mig_matrix*(1-prop_of_female_migration)
mig_f = 2*mig_matrix*prop_of_female_migration
mig_m[-1,:] = mig_matrix[-1,:]
mig_f[-1,:] = mig_matrix[-1,:]
rec_f = 1 # Female-specific recombination rate
rec_m = 1 # Male-specific recombination rate
5.1 Monoecious model for autosomal admixture#
[27]:
PTmodel_mono = PhT.PhTMonoecious(mig_matrix, rho = 0.5*(rec_f+rec_m)) # M object
# Density (set freq = True for frequency scale)
newbins, density_m, E = PTmodel_mono.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = True, freq = True)
# Histogram
bins, hist_m, E = PTmodel_mono.tractlength_histogram_windowed(whichpop, bins, L = which_L, density = False, freq = False)
5.1.1 Plot density#
[ ]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, axes = plt.subplots(1, 1, figsize=(5,4))
ax = axes
# Plot lines with improved color and line style
line3, = ax.plot(np.delete(newbins, whichbin), (density_m[~(np.asarray(newbins) == outbin)]), color=colors['M'], linestyle=line_styles['M'], linewidth = line_widths['M'], label='Monoecious (M)', zorder = 1)
# Customize x and y labels
ax.set_xlabel('Tract length on the second chromosome', fontsize=10)
ax.set_ylabel('Density', fontsize=10)
# Add outlier point to the plot
ax.scatter(outbin, (density_m[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['M'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
# Add outlier point to the plot
ax.scatter(outbin, density_m[np.where(newbins == outbin)], edgecolor=colors['M'], s=30, label='Outlier', marker='o', facecolor=colors['M'], zorder = 4)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
handles = []
labels = []
handles.extend([line3])
labels.extend([line3.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
5.1.2 Plot histogram#
[29]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, ax = plt.subplots(figsize=(5,4))
# Plot lines with improved color and line style
line3, = ax.step(bins[:-1], hist_m, color=colors['M'], linestyle=line_styles['M'], linewidth = line_widths['M'], label='Monoecious (M)')
# Customize x and y labels
ax.set_xlabel('Tract length on the second chromosome', fontsize=10)
ax.set_ylabel('Expected number of tracts per interval', fontsize=10)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
handles = []
labels = []
handles.extend([line3])
labels.extend([line3.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
5.2 Pedigree-DC model for X chromosome admixture#
[ ]:
# Uncomment the following lines to run the hybrid pedigree distribution for the DF and DC models. Note that these can take a while to run, especially for the DC model, depending on the number of cores used and the size of the grid.
# Phase-Type density for the Pedigree-DC model (set freq = True for frequency scale)
#newbins, density_hp_dc = HP.hybrid_pedigree_distribution(mig_matrix_f = mig_f, mig_matrix_m = mig_m, TP = 2, Dioecious_model = 'DC', L = which_L, bingrid = bins, whichpop = whichpop, rr_f = rec_f, rr_m = rec_m, X_chr = True, X_chr_male = False, N_cores = 5, density = True, freq = True)
# Phase-Type histogram for the Pedigree-DC model
#bins, hist_hp_dc = HP.hybrid_pedigree_distribution(mig_matrix_f = mig_f, mig_matrix_m = mig_m, TP = 2, Dioecious_model = 'DC', L = which_L, bingrid = bins, whichpop = whichpop, rr_f = rec_f, rr_m = rec_m, X_chr = True, X_chr_male = False, N_cores = 5, density = False, freq = False)
5.2.1 Plot density#
[ ]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, axes = plt.subplots(1, 1, figsize=(5,4)) # 3x3 grid of subplots
ax = axes
# Plot lines with improved color and line style
line6, = ax.plot(np.delete(newbins, whichbin), density_hp_dc[~(np.asarray(newbins) == outbin)], color=colors['HP_DC'], linestyle=line_styles['HP_DC'], linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)', zorder = 1)
line7, = ax.plot(np.delete(newbins, whichbin), density_hp_dc[~(np.asarray(newbins) == outbin)], color=colors['DC'], linestyle=(2, (2, 2)), linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)', zorder = 1)
# Customize x and y labels
ax.set_xlabel('Tract length on the X chromosome', fontsize=10)
ax.set_ylabel('Density', fontsize=10)
# Add outlier point to the plot
ax.scatter(outbin, (density_hp_dc[np.where(newbins == outbin)[0] - 1]), edgecolor=colors['HP_DC'], s=30, label='Outlier', marker='o', facecolor='white', alpha = 1, zorder = 3)
# Add outlier point to the plot
ax.scatter(outbin, density_hp_dc[np.where(newbins == outbin)], edgecolor=colors['HP_DC'], s=30, label='Outlier', marker='o', facecolor=colors['HP_DC'], zorder = 4)
ax.scatter(outbin, density_hp_dc[np.where(newbins == outbin)], edgecolor=colors['HP_DC'], s=15, label='Outlier', marker='o', facecolor=colors['DC'], zorder = 4)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
handles = []
labels = []
handles.extend([(line6, line7)])
labels.extend([line6.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
5.2.2 Plot histogram#
[ ]:
outbin = np.max(np.asarray(newbins)[np.asarray(newbins) == which_L])
whichbin = int(np.where(np.asarray(newbins) == outbin)[0][0])
fig, ax = plt.subplots(figsize=(5,4))
# Plot lines with improved color and line style
line6, = ax.step(bins[:-1], hist_hp_dc, color=colors['HP_DC'], linestyle=line_styles['HP_DC'], linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)')
line7, = ax.step(bins[:-1], hist_hp_dc, color=colors['DC'], linestyle=(2, (2, 2)), linewidth = line_widths['HP_DC'], label='Ped-DC (TP = 2)')
# Customize x and y labels
ax.set_xlabel('Tract length on the X chromosome', fontsize=10)
ax.set_ylabel('Expected number of tracts per interval', fontsize=10)
# Add gridlines
ax.grid(True, linestyle='--', alpha=0.6)
# Set limits for x-axis and y-axis if necessary
ax.set_xlim(0, which_L+0.1)
# Customize the ticks
ax.tick_params(axis='both', which='major', labelsize=10)
handles = []
labels = []
handles.extend([(line6, line7)])
labels.extend([line6.get_label()])
fig.legend(handles=handles, labels=labels, loc='lower center', bbox_to_anchor=(0.55, -0.2), ncol=2, fontsize=10)
# Adjust spacing between subplots for better layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()