Core
API
- core.Z2(L, pop)
The normalizing factor, which ensures that the tract density is normalized to 1.
- core.choose_model(migration_matrix, use_PTD=False)
- core.inner_PDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculate the CDF of tract lengths on a window L. S is the transition submatrix and Z is the normalization factor. Accepts precomputed values for e^Sx, e^SL, and Z.
- core.optimize(p0, bins, Ls, data, nsamp, model_func, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=0.5, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1)
Optimize params to fit model to data using the BFGS method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If greater than zero, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
List of additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize.
- fixed_params:
(Not yet implemented) If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
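The effect of cutoff and ll_scale can be pictured with a toy objective. The sketch below is not the tracts implementation: it assumes a Poisson log-likelihood over bins purely for illustration, and the names poisson_loglik and scaled_objective are hypothetical.

```python
import math

def poisson_loglik(model, data, cutoff=0):
    """Poisson log-likelihood over bins, ignoring the first `cutoff` bins.

    Assumption: the Poisson form is chosen only for illustration.
    """
    ll = 0.0
    for m, d in zip(model[cutoff:], data[cutoff:]):
        ll += -m + d * math.log(m) - math.lgamma(d + 1)
    return ll

def scaled_objective(model, data, cutoff=0, ll_scale=1):
    """Quantity a minimizer would see: negative log-likelihood divided by ll_scale."""
    return -poisson_loglik(model, data, cutoff) / ll_scale

# Toy binned counts and a toy model prediction (both made up).
data = [120, 40, 12, 5]
model = [100.0, 42.0, 11.0, 6.0]

full = scaled_objective(model, data)                   # all bins
trimmed = scaled_objective(model, data, cutoff=1)      # first bin dropped
rescaled = scaled_objective(model, data, ll_scale=10)  # same shape, smaller magnitude
```

Dividing by ll_scale does not move the optimum; it only shrinks the values the optimizer works with, which is why a final pass with ll_scale=1 is suggested above.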
- core.optimize_bfgs(p0, bins, Ls, data, nsamp, model_func, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=0.5, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1)
Optimize params to fit model to data using the BFGS method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If greater than zero, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
List of additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize.
- fixed_params:
(Not yet implemented) If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
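To make the roles of epsilon and gtol concrete, here is a minimal sketch of a forward-difference gradient and a gradient-based stopping test. The helper names are hypothetical, the toy objective is made up, and SciPy's internal implementation may differ in detail; the maximum-component norm used below is an assumption.

```python
def forward_difference_gradient(func, params, epsilon=0.001):
    """Approximate the gradient of func at params with step size epsilon."""
    base = func(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += epsilon  # perturb one parameter at a time
        grad.append((func(shifted) - base) / epsilon)
    return grad

def has_converged(grad, gtol=1e-05):
    """Stop when the largest gradient component falls below gtol (assumed norm)."""
    return max(abs(g) for g in grad) < gtol

def toy_objective(params):
    """Quadratic bowl with its minimum at (1, -2)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

grad_far = forward_difference_gradient(toy_objective, [0.0, 0.0])
grad_near = forward_difference_gradient(toy_objective, [1.0, -2.0], epsilon=1e-7)
```

A larger epsilon smooths over numerical noise in the objective but biases the gradient estimate, so the achievable gtol depends on the step size chosen.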
- core.optimize_brute_fracs2(bins, Ls, data, nsamp, model_func, fracs, searchvalues, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=1, full_output=True, func_args=None, fixed_params=None, ll_scale=1)
Optimize params to fit model to data using the brute force method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
Additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize, that its second argument is an array of sample sizes for the sfs, and that its last argument is the list of grid points to use in evaluation.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
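For orientation, a brute-force search in its generic form evaluates the objective at every point of a grid and keeps the best one. The sketch below is not the tracts implementation. It assumes that searchvalues holds one list of candidate values per parameter, which the text above does not specify, and brute_force_search is a hypothetical name.

```python
import itertools

def brute_force_search(objective, searchvalues, outofbounds_fun=None):
    """Return (best_params, best_value) over the Cartesian grid of searchvalues."""
    best_params, best_value = None, float("inf")
    for candidate in itertools.product(*searchvalues):
        if outofbounds_fun is not None and outofbounds_fun(candidate):
            continue  # skip points in a forbidden region
        value = objective(candidate)
        if value < best_value:
            best_params, best_value = candidate, value
    return best_params, best_value

def toy_objective(params):
    """Made-up objective with its minimum at t=0.07, m=0.3."""
    t, m = params
    return (t - 0.07) ** 2 + (m - 0.3) ** 2

# Assumed layout: one list of candidate values per parameter.
grid = [[0.05, 0.06, 0.07, 0.08], [0.1, 0.2, 0.3, 0.4]]

best, value = brute_force_search(toy_objective, grid)
best_bounded, _ = brute_force_search(
    toy_objective, grid, outofbounds_fun=lambda p: p[0] > 0.065
)
```

The cost grows as the product of the grid sizes, so this approach is practical only for a small number of parameters or coarse grids.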
- core.optimize_brute_multifracs(bins, Ls, data_list, nsamp_list, model_func, fracs_list, searchvalues, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=1, full_output=True, func_args=None, fixed_params=None, ll_scale=1)
Optimize params to fit model to data using the brute force method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
Additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize, that its second argument is an array of sample sizes for the sfs, and that its last argument is the list of grid points to use in evaluation.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
- core.optimize_cob(p0, bins, Ls, data, nsamp, model_func, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=1, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1, reset_counter=True, modelling_method=<class 'tracts.demography.demographic_model.DemographicModel'>)
Optimize params to fit model to data using the COBYLA method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
Additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
- reset_counter:
If True (the default), reset the iteration counter to zero. Set to False to continue the iteration count (e.g., if optimization continues from a previous point).
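The fixed_params convention can be illustrated with two small helpers that move between the full-length parameter vector and the shorter vector an optimizer would actually vary. This is only a sketch of the convention described above, not the library's code; project_params and expand_params are hypothetical names.

```python
def free_indices(fixed_params):
    """Positions left as None are the ones the optimizer is allowed to vary."""
    return [i for i, value in enumerate(fixed_params) if value is None]

def project_params(p0, fixed_params):
    """Drop fixed entries from a full-length p0, keeping only the free values."""
    return [p0[i] for i in free_indices(fixed_params)]

def expand_params(free_values, fixed_params):
    """Rebuild the full-length vector from free values plus the fixed values."""
    free_iter = iter(free_values)
    return [next(free_iter) if value is None else value for value in fixed_params]

# Parameters ordered as (nu1, nu2, T, m); nu1 and m are held fixed.
fixed_params = [0.5, None, None, 2]
p0 = [9.9, 1.5, 0.1, 9.9]  # values at fixed positions are ignored

free = project_params(p0, fixed_params)
full = expand_params(free, fixed_params)
```

Here free holds only the two values at the None positions, and full restores the fixed values 0.5 and 2 regardless of what p0 contained at those positions.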
- core.optimize_cob_fracs(p0, bins, Ls, data, nsamp, model_func, fracs, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=1, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1)
Optimize params to fit model to data using the COBYLA method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
Additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize, that its second argument is an array of sample sizes for the sfs, and that its last argument is the list of grid points to use in evaluation.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
- core.optimize_cob_fracs2(p0, bins, Ls, data, nsamp, model_func, fracs, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=1, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1, reset_counter=True)
Optimize params to fit model to data using the COBYLA method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
Additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize, that its second argument is an array of sample sizes for the sfs, and that its last argument is the list of grid points to use in evaluation.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
- reset_counter:
If True (the default), reset the iteration counter to zero. Set to False to continue the iteration count (e.g., if optimization continues from a previous point).
- core.optimize_cob_multifracs(p0, bins, Ls, data_list, nsamp_list, model_func, fracs_list, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=1, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1)
Optimize params to fit model to data using the COBYLA method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
Additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize, that its second argument is an array of sample sizes for the sfs, and that its last argument is the list of grid points to use in evaluation.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
- core.optimize_slsqp(p0, bins, Ls, data, nsamp, model_func, outofbounds_fun=None, cutoff=0, bounds=None, verbose=0, flush_delay=1, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1, reset_counter=True)
Optimize params to fit model to data using the SLSQP method.
This optimization method works well when we start reasonably close to the optimum. It is best at burrowing down a single minimum.
It should also perform better when parameters range over different scales.
- p0:
Initial parameters.
- data:
Spectrum with data.
- model_func:
Function to evaluate the model spectrum. Should take arguments (params, pts).
- outofbounds_fun:
A function evaluating to True if the current parameters are in a forbidden region.
- cutoff:
The number of bins to drop at the beginning of the array. This could also be achieved with masks.
- verbose:
If > 0, print optimization status every <verbose> steps.
- flush_delay:
Standard output will be flushed once every <flush_delay> minutes. This is useful to avoid overloading I/O on clusters.
- epsilon:
Step-size to use for finite-difference derivatives.
- gtol:
Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
- maxiter:
Maximum number of iterations to run.
- full_output:
If True, return full outputs as described in help(scipy.optimize.fmin_bfgs).
- func_args:
List of additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize.
- fixed_params:
If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
- ll_scale:
The BFGS algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
- reset_counter:
If True (the default), reset the iteration counter to zero. Set to False to continue the iteration count (e.g., if optimization continues from a previous point).
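How verbose, flush_delay and reset_counter interact can be sketched with a small progress reporter. This is an illustration of the documented behaviour under stated assumptions, not the library's internal logger; the ProgressReporter class and its output format are hypothetical.

```python
import io
import sys
import time

class ProgressReporter:
    """Print status every `verbose` steps; flush at most every `flush_delay` minutes."""

    def __init__(self, verbose=0, flush_delay=1.0, stream=sys.stdout):
        self.verbose = verbose
        self.flush_delay = flush_delay  # in minutes, as in the parameter list above
        self.stream = stream
        self.counter = 0
        self.last_flush = time.time()

    def reset(self):
        """Counterpart of reset_counter=True: start counting from zero again."""
        self.counter = 0

    def step(self, loglik, params):
        self.counter += 1
        if self.verbose > 0 and self.counter % self.verbose == 0:
            self.stream.write(f"{self.counter:<8d} {loglik:<12g} {list(params)}\n")
            # Flushing rarely keeps I/O load low on shared clusters.
            if time.time() - self.last_flush > self.flush_delay * 60:
                self.stream.flush()
                self.last_flush = time.time()

# Usage with an in-memory stream: report every second evaluation.
buffer = io.StringIO()
reporter = ProgressReporter(verbose=2, stream=buffer)
for i in range(5):
    reporter.step(-100.0 + i, [0.1, 0.2])
lines = buffer.getvalue().splitlines()
```

Skipping the reset when continuing from a previous point keeps the step numbers in the log monotonic across successive optimization runs.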
- core.outer_PDF(x, L, S, exp_Sx=None, alpha=None, S0_inv=None)
Calculate the length distribution of tracts hitting a single chromosome edge.
- core.plotmig(mig, colordict=None, order=None)
- core.test_model_func(model_func, parameters, fracs_list=None, time_params=True, time_scale=100)
Given a demographic model function, run a few debugging tests to ensure that it behaves as expected, namely:
1. That migration matrices sum to less than one (exactly one for the last generation).
2. That it behaves continuously relative to time parameters.
- model_func:
A migration model. It takes in parameters and outputs a migration matrix.
- parameters:
Parameters for which the model will be tested.
- fracs_list:
Parameters required by some demographic models, corresponding to the observed proportion of ancestry from each source population.
- time_params:
If True, test all parameters for continuity as if they were time parameters. If a list of boolean values of the same length as parameters, only test parameters corresponding to True values.
- time_scale:
The scaling of the time variables: time (in generations) = time_parameter*time_scale. This is used to test continuity around integer values.
Returns the violation score (negative means that a violation has occurred) and the migration matrix value as well.
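The first of these checks can be sketched in a few lines. The layout is an assumption: the migration matrix is taken to be a list of rows, one per generation, with one entry per source population, and the final row is taken to be the "last generation" mentioned above. The function migration_violation is hypothetical and is not the test that tracts runs.

```python
def migration_violation(mig, tol=1e-9):
    """Return 0.0 if the row totals are valid, otherwise a negative score.

    Assumed rule: every row except the last sums to at most 1,
    and the last row sums to exactly 1 (within tol).
    """
    score = 0.0
    for row in mig[:-1]:
        excess = sum(row) - 1.0
        if excess > tol:
            score = min(score, -excess)  # more negative means a worse violation
    last_gap = abs(sum(mig[-1]) - 1.0)
    if last_gap > tol:
        score = min(score, -last_gap)
    return score

# Toy matrices with two source populations (values are made up).
valid = [[0.0, 0.0], [0.125, 0.0], [0.25, 0.75]]
bad_last_row = [[0.125, 0.0], [0.5, 0.25]]
bad_middle_row = [[0.75, 0.5], [0.25, 0.75]]

score_valid = migration_violation(valid)
score_bad_last = migration_violation(bad_last_row)
score_bad_middle = migration_violation(bad_middle_row)
```

Returning a signed score rather than a boolean matches the convention above that a negative value signals a violation, and it also indicates how far outside the allowed region the model is.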