tracts.legacy

Functions

`eprint`(*args, **kwargs)
`optimize`(p0, bins, Ls, data, nsamp, model_func)	Optimizes parameters to fit model to data using the BFGS method.
`optimize_bfgs`(p0, bins, Ls, data, nsamp, ...)	Optimizes parameters to fit model to data using the BFGS method.
`optimize_brute_fracs2`(bins, Ls, data, nsamp, ...)	Optimizes params to fit model to data using the brute force method.
`optimize_brute_multifracs`(bins, Ls, ...[, ...])	Optimizes params to fit model to data using the brute force method.
`optimize_cob`(p0, bins, Ls, data, nsamp, ...)	Optimizes params to fit model to data using the COBYLA method.
`optimize_cob_fracs`(p0, bins, Ls, data, ...)	Optimizes params to fit model to data using the COBYLA method.
`optimize_cob_fracs2`(p0, bins, Ls, data, ...)	Optimizes params to fit model to data using the COBYLA method.
`optimize_cob_multifracs`(p0, bins, Ls, ...[, ...])	Optimizes params to fit model to data using the COBYLA method.
`optimize_slsqp`(p0, bins, Ls, data, nsamp, ...)	Optimizes params to fit model to data using the SLSQP method.
`plotmig`(mig[, colordict, order])
`test_model_func`(model_func, parameters[, ...])	Given a demographic model function, runs a few debugging tests to ensure that it behaves as expected, namely: (i) that migration matrices sum to less than one (exactly one for the last generation), and (ii) that it behaves continuously relative to time parameters.

Classes

`Chrom`([ls, auto, label, tracts])	A chromosome wraps a list of tracts, which form a partition on it.
`Chropair`([chroms, len, auto, label])	A pair of chromosomes.
`CompositeDemographicModel`(model_function, ...)	The class of demographic models that account for variance in the number of ancestors of individuals of the underlying population.
`DemographicModel`(mig[, ...])
`Haploid`([Ls, lschroms, fname, selectchrom, ...])
`Indiv`([Ls, label, fname, labs, selectchrom, ...])	The class of diploid individuals.
`Population`([list_indivs, names, fname, ...])
`Tract`(start, end, label[, bpstart, bpend])	A tract is the lower-level object of interest.

class tracts.legacy.Chrom(ls=None, auto=True, label='POP', tracts=None)

Bases: object

A chromosome wraps a list of tracts, which form a partition on it. The chromosome has a finite, immutable length.

__init__(ls=None, auto=True, label='POP', tracts=None)

Constructor.

Parameters:

ls (int, default: None) – The length of this chromosome, in Morgans.
auto (bool, default: True) – Whether this chromosome is autosomal.
label (string, default: "POP") – An identifier categorizing this chromosome.
tracts (list of tract objects, default: None) – The list of tracts that span this chromosome. If None is given, a single tract is created to span the whole chromosome, according to the length ls.

extract(start, end)

Extracts a segment from the chromosome.

Parameters:

start (int) – The starting point of the desired segment to extract.
end (int) – The ending point of the desired segment to extract.

Returns:

A list of tract objects that span the desired interval.

Notes

Uses the goto method of this class to identify the starting and ending points of the segment, so if those positions are invalid, goto will raise a ValueError.

goto(pos): Finds the first tract containing a given position, in Morgans, and return its index in the underlying list.

init_list_tracts(tracts)

init_unif_tracts(label)

len(): The length of this chromosome, in Morgans.

merge_ancestries(ancestries, newlabel)

Merges segments that are contiguous and of either the same ancestry, or that are labelled as in a given list.

The label of each tract in the chromosome’s inner list is checked against the labels listed in ancestries. If there is a match, then that tract is relabelled to newlabel. This batch relabelling allows us to consider several technically different ancestries as being the same, by relabelling them to actually be the same. Then, the resulting list is smoothed, to combine adjacent tracts whose labels are the same. This new list replaces the tracts list.

Parameters:

ancestries (list of strings) – The ancestries to merge.
newlabel (string) – The identifier for the new ancestry to assign to the matching tracts.

Returns:

Nothing.
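
The relabel-then-smooth procedure described for merge_ancestries can be sketched as below. This is an illustration under assumptions: tracts are plain (start, end, label) tuples, the function name merge_labels is hypothetical, and it returns a new list, whereas the documented method replaces the chromosome's tract list in place and returns nothing.

```python
def merge_labels(tracts, ancestries, newlabel):
    """Relabel matching tracts, then fuse adjacent tracts sharing a label."""
    # Step 1: batch relabelling of every tract whose label is listed.
    relabelled = [
        (start, end, newlabel if label in ancestries else label)
        for start, end, label in tracts
    ]
    # Step 2: smoothing, which combines contiguous tracts with equal labels.
    smoothed = []
    for start, end, label in relabelled:
        if smoothed and smoothed[-1][2] == label and smoothed[-1][1] == start:
            prev_start = smoothed[-1][0]
            smoothed[-1] = (prev_start, end, label)
        else:
            smoothed.append((start, end, label))
    return smoothed


tracts = [
    (0.0, 0.3, "CEU"),
    (0.3, 0.7, "TSI"),
    (0.7, 1.0, "YRI"),
    (1.0, 1.5, "YRI"),
]
merged = merge_labels(tracts, ["CEU", "TSI"], "EUR")
```

In this example the two European-labelled tracts become one "EUR" tract, and the two adjacent "YRI" tracts are fused even though they were not relabelled.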

plot(canvas, colordict, height=0, chrwidth=0.1)

set_sex(): Considers this chromosome to be a sex chromosome, in which case it is not autosomal. The effect of this method is to set the auto property to False.

tractlengths()

Gets the list of tract lengths. Make sure that proper smoothing is implemented.

Returns: A tuple with ancestry, length of block, and length of chromosome.

class tracts.legacy.Chropair(chroms=None, len=1, auto=True, label='POP')

Bases: object

A pair of chromosomes. Using pairs of chromosomes makes it possible to model diploid individuals.

__init__(chroms=None, len=1, auto=True, label='POP'): Can be instantiated by explicitly providing two chromosomes as a tuple, or by providing an ancestry label, length, and autosome status.

applychrom(func): Applies func to chromosomes.

plot(canvas, colordict, height=0)

recombine()

class tracts.legacy.CompositeDemographicModel(model_function, parameters, proportions_list)

Bases: object

The class of demographic models that account for variance in the number of ancestors of individuals of the underlying population.

Specifically, this is the demographic model constructed by the “multifracs” family of optimization routines.

The expected tract counts per bin in the composite demographic model are simply a component-wise sum of the expected tract counts per bin across the component demographic models.

The log-likelihood of the composite demographic model is then computed based on the combined expected tract counts per bin.

__init__(model_function, parameters, proportions_list)

Construct a composite demographic model, in which we consider split groups of individuals.

Parameters:

model_function (callable) – A function that produces a migration matrix given some model parameters and fixed ancestry proportions.
parameters – The parameters given to the model function when the component demographic models are built.
proportions_list – The lists of ancestry proportions used to construct each component demographic model.

expectperbin(Ls, pop, bins, nsamp_list=None): A wrapper for demographic_model.expectperbin that yields a component-wise sum of the counts per bin in the underlying demographic models. Since the counts given by the demographic_model.expectperbin are normalized, performing a simple sum of the counts is not particularly meaningful; it throws away some of the structure that we have gained by using a composite model. Hence, the nsamp_list parameter allows for specifying the count of individuals in each of the groups represented by this composite_demographic_model, which is then used to rescale the counts reported by the expectperbin of the component demographic models.
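
The rescaled component-wise sum can be sketched as follows. The exact rescaling used by the library is not shown in this page, so this example assumes the simplest reading: each component's normalized counts are multiplied by the number of individuals in its group before summing. The function name combine_expected_counts and the sample numbers are hypothetical.

```python
def combine_expected_counts(component_counts, nsamp_list=None):
    """Sum per-bin expected counts across components, weighted by group size.

    component_counts: one list of per-bin expected counts per component model.
    nsamp_list: number of individuals in each group (assumed weighting).
    """
    if nsamp_list is None:
        # Without group sizes, fall back to an unweighted sum.
        nsamp_list = [1] * len(component_counts)
    if len(nsamp_list) != len(component_counts):
        raise ValueError("need one sample count per component model")
    nbins = len(component_counts[0])
    total = [0.0] * nbins
    for counts, nsamp in zip(component_counts, nsamp_list):
        if len(counts) != nbins:
            raise ValueError("all components must use the same bins")
        for i, count in enumerate(counts):
            total[i] += nsamp * count
    return total


component_counts = [[2.0, 1.0, 0.5], [1.0, 1.0, 1.0]]
combined = combine_expected_counts(component_counts, nsamp_list=[10, 30])
```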

loglik(bins, Ls, data_list, nsamp_list, cutoff=0)

Evaluates the log-likelihood of the composite demographic model.

To compute the log-likelihood, we combine the expected count of tracts per bin in each of the component demographic models into the composite expected counts per bin. The expected counts per bin are compared with the sum across subgroups of the actual counts per bin. This gives a likelihood that is directly comparable with the likelihoods of the component demographic models.

See demographic_model.loglik for more information about the specifics of the log-likelihood calculation.

migs(): Gets the list of migration matrices of the component demographic models. This method merely projects the mig attribute from the component models.

class tracts.legacy.DemographicModel(mig, max_remaining_tracts=1e-05, max_morgans=100)

Bases: object

Erlang(i, x, T)

Z(L, pop): The normalizing factor, to ensure that the tract density integrates to 1.

__init__(mig, max_remaining_tracts=1e-05, max_morgans=100)

The migratory model.

Parameters:

mig (np.ndarray) – An array containing the migration proportions from a discrete number of populations over the last generations. Each row is a time, each column is a population. Row zero corresponds to the current generation. The last row (mig[-1,:]) corresponds to the “founding generation” and should sum to 1. Assume that non-admixed individuals have been removed.
max_remaining_tracts (float, default: 1e-5) – The proportion of tracts that are allowed to be incomplete after cutoff Lambda (See Appendix 2 in Gravel: doi: 10.1534/genetics.112.139808).
max_morgans (float, default: 100) – This parameter is used to impose a cutoff on the number of Markov transitions. If the simulated length in Morgans of tracts in an infinite genome exceeds max_morgans, the function issues a warning and stops generating new transitions.
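
The row and column layout of mig can be illustrated with a single founding pulse. This sketch uses nested lists to stay dependency-free; the documented argument is an np.ndarray, so a real call would presumably wrap the result with np.array. The helper name single_pulse_mig and the proportions are hypothetical.

```python
def single_pulse_mig(n_generations, founding_props):
    """Build a migration table with one founding pulse and no later migration.

    Row 0 is the current generation; the last row is the founding
    generation, whose proportions must sum to 1.
    """
    if abs(sum(founding_props) - 1.0) > 1e-9:
        raise ValueError("founding proportions must sum to 1")
    npops = len(founding_props)
    # One row per generation, one column per source population.
    mig = [[0.0] * npops for _ in range(n_generations)]
    mig[-1] = list(founding_props)
    return mig


mig = single_pulse_mig(5, [0.3, 0.7])
row_sums = [sum(row) for row in mig]
```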

expectperbin(Ls, pop, bins): The expected number of tracts per bin for a diploid individual with distribution of chromosome lengths given by Ls. The bin should be a list with n+1 breakpoints for n bins. We will always add an extra value for the full chromosomes as an extra bin at the end. The last bin should not go beyond the end of the longest chromosome. For now, the function performs poor man’s integral by using the bin midpoint value times width.

full(L, pop): The expected fraction of full-chromosome tracts, p. 63 May 24, 2011.

gen_variance(popnum)

Calculates the expected genealogy variance in the model.
Calculates the e(d) (Equation 3 in MOLA (Models of Local ancestry) paper).
Generations go from 0 to self.ngen-1.

inners(L, x, pop): Calculates the length distribution of tract lengths not hitting a chromosome edge.

loglik(bins, Ls, data, nsamp, cutoff=0): Calculates the maximum-likelihood in a Poisson Random Field. The last bin of data is the number of whole-chromosome tracts.

loglik_biascorrect(bins, Ls, data, nsamp, cutoff=0, biascorrect=True): Calculates the maximum-likelihood in a Poisson Random Field. The last bin of data is the number of whole-chromosome tracts. Compares the model to the first bins, and simulates the addition (or removal) of the corresponding tracts.
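
A Poisson Random Field likelihood treats the count in each bin as an independent Poisson draw around the model's expectation. The sketch below shows that calculation in its textbook form; whether the library keeps the constant factorial term, and how it scales expectations by nsamp, is not stated in this page, so those details are assumptions. The name poisson_loglik is hypothetical.

```python
import math


def poisson_loglik(expected, observed, cutoff=0):
    """Sum of independent Poisson log-probabilities over bins.

    expected: model expectation per bin (must be positive).
    observed: observed count per bin.
    cutoff: number of leading bins to ignore.
    """
    loglik = 0.0
    for mu, k in zip(expected[cutoff:], observed[cutoff:]):
        if mu <= 0:
            raise ValueError("expected counts must be positive")
        # log P(k | mu) = -mu + k*log(mu) - log(k!)
        loglik += -mu + k * math.log(mu) - math.lgamma(k + 1)
    return loglik


observed = [10, 4, 1]
ll_good = poisson_loglik([10.0, 4.0, 1.0], observed)
ll_bad = poisson_loglik([1.0, 1.0, 10.0], observed)
```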

outers(L, x, pop): Calculates the length distribution of tract lengths hitting a single chromosome edge.

plot_model_data(Ls, bins, data, nsamp, pop, colordict)

popNdist(pop): Calculates the distribution of number of steps before exiting population.

random_realization(Ls, bins, nind)

switchdensity(): Calculates the density of ancestry switchpoints per morgan in our model.

uniformizemat(): Uniformize the transition matrix so that each state has the same total transition rate.

class tracts.legacy.Haploid(Ls=None, lschroms=None, fname=None, selectchrom=None, labs=None, name=None)

Bases: object

__init__(Ls=None, lschroms=None, fname=None, selectchrom=None, labs=None, name=None)

static from_file(path, name=None, selectchrom=None)

class tracts.legacy.Indiv(Ls=None, label='POP', fname=None, labs=('_A', '_B'), selectchrom=None, chroms=None, name=None)

Bases: object

The class of diploid individuals. An individual can hence be thought of as a list of pairs of chromosomes. Equivalently, a diploid individual is a pair of haploid individuals.

Thus, it is possible to construct instances of this class from a pair of instances of the haploid class, as well as directly from a sequence of chropair instances.

The interface for loading individuals from files uses the haploid-oriented approach, since individual .bed files describe only one haplotype. The loading process is thus the following:

1. Load haploid individuals for each haplotype.

2. Combine the haploid individuals into a diploid individual.

__init__(Ls=None, label='POP', fname=None, labs=('_A', '_B'), selectchrom=None, chroms=None, name=None)

Constructs a diploid individual. There are several ways to build individuals, either from files, from existing data, or programmatically.

The most straightforward way to build an individual is from existing data, by supplying only the “Ls” and “chroms” arguments.

Parameters:

Ls (list of floats, default: None) – The lengths of the chromosomes in the order in which they appear in “chroms”.
chroms (list of chropair objects, default: None) – The chromosome pairs that make up this individual. See the documentation for “chropair”.
label (string, default: "POP") – The label to use for building single-tract chromosomes when no other data is given to build this individual.
fname (2-tuple of strings, default: None) – Paths are generated by concatenating the first component of fname, each label from labs in turn, and the second component of fname.
labs (2-tuple of strings, default: ("_A", "_B")) – The labels used to identify maternal and paternal haplotypes in the paths leading to .bed files.
selectchrom (list of integers, default: None) – This argument is forwarded as-is to haploid.from_file. It acts as a filter on the chromosomes to load. The default value of “None” selects all chromosomes.
name (string, default: None) – An identifier for this individual.
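
The path construction described for fname and labs amounts to simple string concatenation. The sketch below shows it under that reading; the helper name haplotype_paths and the sample file stem are hypothetical.

```python
def haplotype_paths(fname, labs=("_A", "_B")):
    """Build one path per haplotype as prefix + label + suffix."""
    prefix, suffix = fname
    return [prefix + lab + suffix for lab in labs]


# Hypothetical file stem; real datasets will use their own naming.
paths = haplotype_paths(("data/sample01", ".bed"))
```

With the default labels this yields one path ending in _A.bed and one ending in _B.bed, which is the pair of files a diploid individual is loaded from.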

Notes

If Ls is given, but chroms is not, then chromosomes consisting each of a single tract will be created with the label label and lengths drawn from Ls.

(deprecated) If the fname argument is given, the constructor will perform path manipulation involving the components of fname and labs to generate file names that are commonly used when dealing with .bed files.

The facilities in this constructor for loading individuals from files are deprecated. It is recommended to instead use the static methods from_files or from_haploids.

ancestryAmt(ancestry): Calculates the total length of the genome in segments of the given ancestry.

ancestryProps(ancestries): Calculates the proportion of the genome represented by the given ancestries.

ancestryPropsByChrom(ancestries)

applychrom(func): Applies the function func to each chromosome of the individual.

create_gamete()

flat_imap(f): Lazily maps a function over the full underlying structure of this individual. The function must accept 3 parameters: chrom: the chromosome pair containing the tract, copy: the chromosome containing the tract, tract: the tract itself.

static from_files(paths, selectchrom=None, name=None): Constructs a diploid individual from two files, which describe the individual’s haplotypes.

static from_haploids(haps)

iflatten(): Lazily flattens this individual to the tract level.

plot(colordict, win=None): Plots an individual. colordict is a dictionary mapping population label to a set of colors. E.g.: colordict = {"CEU": 'r', "YRI": 'b'}.

class tracts.legacy.Population(list_indivs=None, names=None, fname=None, labs=('_A', '_B'), selectchrom=None, ignore_length_consistency=False, filenames_by_individual=None)

Bases: object

__init__(list_indivs=None, names=None, fname=None, labs=('_A', '_B'), selectchrom=None, ignore_length_consistency=False, filenames_by_individual=None)

Constructs a population of diploid individuals. A population is essentially a simple list of indiv objects.

There are two ways to build populations, either from a dataset stored in files or from a list of individuals. The facilities for loading populations from files present in this constructor are deprecated. It is advised to instead load a list of individuals, using indiv.from_files, and to then pass that list to this constructor.

The population can be initialized by providing it with a list of “individual” objects, or a file format fname and a list of names. If reading from a file, fname should be a tuple with the start, middle, and end of the file names, where an individual file is specified by start–Indiv–Middle–_A–End. Otherwise, provide a list of individuals. Distinguishing labels for maternal and paternal chromosomes are given in labs.

ancestry_at_pos(select_chrom=0, pos=0, cutoff=0.0): Finds ancestry proportion at specific position. The cutoff is used to look only at tracts that extend beyond a given position.

ancestry_per_pos(select_chrom=0, npts=100, cutoff=0.0): Prepares the ancestry per position across chromosome.

applychrom(func, indlist=None): Applies func to chromosomes. If no indlist is supplied, apply to all individuals.

bootinds(seed): Returns a bootstrapped list of individuals in the population. Use with get_global_tractlength inds=… to get a bootstrapped sample.

collectpop(flatdat): Organizes a list of tracts into a dictionary keyed on ancestry labels.

flatpop(ls=None): Returns a flattened version of a population-wide list at the tract level, and throws away the start and end information of the tract,

getMeansByChrom(ancestries): Gets the ancestry proportions in each individual of the population for each chromosome.

get_global_tractlength_table(lenbound): Calculates the fraction of the genome covered by ancestry tracts of different lengths, specified by lenbound (which must be sorted).

get_global_tractlengths(npts=20, tol=0.01, indlist=None, split_count=1, exclude_tracts_below_cM=0)

Parameters:

tol (float, default: 0.01) – The tolerance for full chromosomes.
indlist (list of individuals, default: None) – The individuals for which we want the tractlength. To bootstrap over individuals, provide a bootstrapped list of individuals.

Notes

Sometimes there are small issues at the edges of the chromosomes. If a segment is within tol Morgans of the full chromosome, it counts as a full chromosome. Note that we return an extra bin for complete chromosomes, so that we have one more data point than we have bins.
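
The tolerance rule and the extra full-chromosome bin can be sketched as follows. This is an illustration under assumptions: tract and chromosome lengths arrive as (tract_length, chrom_length) pairs in Morgans, bins are half-open intervals, and bin_tract_lengths is a hypothetical name.

```python
def bin_tract_lengths(lengths_with_chrom, bins, tol=0.01):
    """Count tracts per length bin, with a final bin for full chromosomes.

    bins holds n+1 breakpoints for n bins; the result has n+1 entries
    because the last entry counts full-chromosome tracts.
    """
    counts = [0] * len(bins)
    for length, chrom_length in lengths_with_chrom:
        if length >= chrom_length - tol:
            # Within tol of the whole chromosome: count as a full chromosome.
            counts[-1] += 1
            continue
        for i in range(len(bins) - 1):
            if bins[i] <= length < bins[i + 1]:
                counts[i] += 1
                break
    return counts


bins = [0.0, 0.5, 1.0, 1.5]
observations = [(0.2, 2.0), (0.7, 2.0), (1.995, 2.0), (0.6, 2.0)]
counts = bin_tract_lengths(observations, bins)
```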

get_mean_ancestry_proportions(ancestries): Gets the mean ancestry proportion averaged across individuals in the population.

get_means(ancestries): Gets the mean ancestry proportion (only among ancestries in ancestries) for all individuals.

get_meanvar(ancestries)

get_variance(ancestries)

Calculates the total variance in ancestry proportions, and the genealogy variance, and the assortment variance (the mean uncertainty about the proportion of genealogical ancestors, given observed ancestry patterns).

Parameters: ancestries – A set of ancestry labels.

Notes

All ancestries not listed are considered uncalled. For example, calling the function with a single ancestry leads to no variance (and some 0/0 errors).

iflatten(indivs=None): Flattens a list of individuals to the tract level. If the list of individuals “indivs” is None, then the complete list of individuals contained in this population is flattened. The result is a generator.

list_chromosome(chronum): Collects the chromosomes with the given number across the whole population.

merge_ancestries(ancestries, newlabel): Treats ancestries in label list ancestries as a single population with label newlabel. Adjacent tracts of the new ancestry are merged.

new_indiv()

newgen(): Builds a new generation from this population.

plot(colordict)

plot_all_ancestries(npts=100, colordict=None, startfig=0, cutoff=0)

plot_ancestries(chrom=0, npts=100, colordict=None, cutoff=0.0)

plot_chromosome(i, colordict, win=None): Plots a single chromosome across individuals.

plot_global_tractlengths(colordict, npts=40, legend=True)

plot_indiv()

plot_next()

plot_previous()

save()

split_by_props(count): Splits this population into groups according to their ancestry proportions. The individuals are sorted in ascending order of their ancestry named “anc”.

class tracts.legacy.Tract(start, end, label, bpstart=None, bpend=None)

Bases: object

A tract is the lower-level object of interest. All the remaining structure is built on top of lists of tracts. Essentially, a tract is simply a labelled interval.

__init__(start, end, label, bpstart=None, bpend=None)

Constructor.

Parameters:

start (float) – The starting point of this tract, in Morgans.
end (float) – The ending point of this tract, in Morgans.
label (string) – A meaningful identifier for this tract. Generally this marks the ancestry associated with this tract.
bpstart (int, default: None) – The starting point of this tract, in basepairs. Since the rest of Tracts uses Morgans throughout, specifying this parameter is not necessary for Tracts to function correctly.
bpend (int, default: None) – The ending point of this tract, in basepairs.

copy(): Constructs a new tract whose properties are the same as this one.

get_label(): Gets the label of the tract.

len(): Gets the length of the tract (in Morgans).

tracts.legacy.eprint(*args, **kwargs)

tracts.legacy.optimize(p0, bins, Ls, data, nsamp, model_func, outofbounds_fun=None, cutoff=0, verbose=0, flush_delay=0.5, epsilon=0.001, gtol=1e-05, maxiter=None, full_output=True, func_args=None, fixed_params=None, ll_scale=1)

Optimizes parameters to fit model to data using the BFGS method.

Parameters:

p0 – Initial parameters.
data – Spectrum with data.
model_func – Function to evaluate model spectrum. Should take arguments (params, pts).
outofbounds_fun (default None) – A function evaluating to True if the current parameters are in a forbidden region.
cutoff (default 0) – The number of bins to drop at the beginning of the array. This could be achieved with masks.
verbose (default 0) – If greater than zero, print optimization status every verbose steps.
flush_delay (default 0.5) – Standard output will be flushed once every flush_delay minutes. This is useful to avoid overloading I/O on clusters.
epsilon (default 1e-3) – Step-size to use for finite-difference derivatives.
gtol (default 1e-5) – Convergence criterion for optimization. For more info, see help(scipy.optimize.fmin_bfgs).
maxiter (default None) – Maximum iterations to run for.
full_output (default True) – If True, returns full outputs as described in help(scipy.optimize.fmin_bfgs).
func_args (default None) – List of additional arguments to model_func. It is assumed that model_func’s first argument is an array of parameters to optimize.
fixed_params (default None) – (Not yet implemented). If not None, should be a list used to fix model parameters at particular values. For example, if the model parameters are (nu1,nu2,T,m), then fixed_params = [0.5,None,None,2] will hold nu1=0.5 and m=2. The optimizer will only change nu2 and T. Note that the bounds lists must include all parameters. Optimization will fail if the fixed values lie outside their bounds. A full-length p0 should be passed in; values corresponding to fixed parameters are ignored.
ll_scale (default 1) – The bfgs algorithm may fail if your initial log-likelihood is too large. (This appears to be a flaw in the scipy implementation.) To overcome this, pass ll_scale > 1, which will simply reduce the magnitude of the log-likelihood. Once in a region of reasonable likelihood, you’ll probably want to re-optimize with ll_scale=1.
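
Since fixed_params is marked as not yet implemented, the sketch below only illustrates how such a list could be read: None entries mark the positions that remain free, and every other entry pins a parameter to a value. The helper name expand_params is hypothetical and is not part of the library.

```python
def expand_params(free_values, fixed_params):
    """Rebuild the full parameter vector from free values and fixed entries.

    fixed_params has one entry per model parameter; None means the
    parameter is free and takes the next value from free_values.
    """
    free = list(free_values)
    n_free = sum(1 for value in fixed_params if value is None)
    if len(free) != n_free:
        raise ValueError("expected %d free values, got %d" % (n_free, len(free)))
    free_iter = iter(free)
    return [next(free_iter) if value is None else value
            for value in fixed_params]


# Model parameters (nu1, nu2, T, m) with nu1 and m held fixed.
full_params = expand_params([2.0, 0.1], [0.5, None, None, 2])
```

An optimizer built this way would search only over the free values and call expand_params before each model evaluation.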

Notes