Topic: Multiple Decrement Models
|
warren_bergerson
Member
Member # 262
|
posted 25. March 2003 12:10
Matt,
Quote: Finally, Warren has addressed neither of Rex's questions in his last post. To reiterate: how does one constrain the values of the model's parameters, and what is the specific hypothesis that the "multiple decrement model" will test?
I addressed both issues, but perhaps the techniques being used are not familiar to you. The ‘limited number of alleles’ observation can potentially be explained using either the ‘no decrement’ assumption used by Rex or the ‘high rate of decrement applied to rare alleles’ assumptions I defined. Although both sets of parameters can fit a single measure of allele distribution, they ‘predict’ statistically different results if, for example, we were to measure allele distributions for different sub-populations.
The multiple decrement model can be used to calculate the expected result from different sets of assumptions. By comparing actual results to predicted results, we determine which values or ranges of values are compatible with observed data.
By comparing actual to expected ‘divergence in allele distributions’, we can ‘constrain’ decrement rates for ‘rare alleles’ to ‘on average greater than 90%’ or ‘on average less than 10%’. As I stated in my comments to Rex, running simulations under the two sets of assumptions will demonstrate the differences in the results they produce.
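One way to make ‘divergence in allele distributions’ between two sub-populations concrete is a distance between their allele-frequency vectors. The sketch below uses total variation distance. Both that choice of measure and the name allele_divergence are assumptions made for illustration, since no specific statistic is named in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Total variation distance between two allele-frequency vectors.
 * p and q hold the frequencies of the same n alleles in two
 * sub-populations; each vector is expected to sum to 1.
 * Returns 0 for identical distributions and 1 for disjoint ones.
 * (Illustrative measure; not taken from the thread.) */
double allele_divergence(const double *p, const double *q, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(p != NULL && q != NULL);
    for (i = 0; i < n; i++) {
        double d = p[i] - q[i];
        sum += (d < 0.0) ? -d : d;   /* absolute difference */
    }
    return 0.5 * sum;
}
```

Computing this value for many simulated pairs of sub-populations under each set of assumptions, and then for the observed pair, would give the actual versus expected comparison described above.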
|
|
Rex Kerr
Member
Member # 632
|
posted 25. March 2003 17:04
Here are the results where all but the first 6 alleles are lethal, using the same parameters as in my post dated 19. March 2003 17:22. The following function has been rewritten as shown to pick out alleles that are not lethal for further propagation (the original code is on this thread).
code:
void newgen(int *old_pop,int *new_pop)
{
  int i,j,k;
  for (i=0;i<POP_SIZE;i++)
  {
    do { k = old_pop[rng_n(POP_SIZE)]; } while (k>LETHAL_LIMIT);
    new_pop[i] = mutate( k );
  }
}
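The helpers rng_n and mutate and the constants POP_SIZE and LETHAL_LIMIT are defined elsewhere on the thread and are not shown in this post. For readers who want something that compiles, here is a minimal sketch with stand-ins. The constant values, the xorshift generator, the mut_per_million rate, and the "replace with a uniformly random allele" mutation rule are all hypothetical placeholders, not the settings used for the runs reported here.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values: assumptions for illustration only. */
#define POP_SIZE     1000   /* individuals per generation */
#define N_ALLELES    2000   /* size of the allele space */
#define LETHAL_LIMIT 5      /* alleles above this index never reproduce */

static int mut_per_million = 100;         /* hypothetical mutation rate */
static uint32_t rng_state = 2463534242u;  /* any nonzero seed works */

/* Stand-in generator (xorshift32); returns an integer in [0, n). */
static int rng_n(int n)
{
    assert(n > 0);
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return (int)(rng_state % (uint32_t)n);
}

/* Stand-in mutation: occasionally replace the allele with a random one. */
static int mutate(int allele)
{
    if (rng_n(1000000) < mut_per_million)
        return rng_n(N_ALLELES);
    return allele;
}

/* Same selection logic as the function above: redraw the parent until
 * a non-lethal allele comes up. The loop only terminates if old_pop
 * contains at least one allele <= LETHAL_LIMIT. */
void newgen(const int *old_pop, int *new_pop)
{
    int i, k;

    for (i = 0; i < POP_SIZE; i++) {
        do {
            k = old_pop[rng_n(POP_SIZE)];
        } while (k > LETHAL_LIMIT);
        new_pop[i] = mutate(k);
    }
}
```

Note that under this logic a lethal allele can still appear in new_pop as a fresh mutation; it is simply never chosen as a parent in the following generation.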
Up from a single allele (this gets dull quickly):
code:
Generation 0 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 1000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 2000 has 2 alleles, max freq 100.0% (allele 0000), next 0.005
Generation 3000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 4000 has 3 alleles, max freq 100.0% (allele 0000), next 0.005
Generation 5000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 6000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 7000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 8000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 9000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 10000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Down from every possible allele (a bit more interesting; [note to self: used seed 14295]):
code:
Generation 0 has 2000 alleles, max freq 0.1% (allele 0996), next 0.115
Generation 1000 has 9 alleles, max freq 47.3% (allele 0005), next 15.440
Generation 2000 has 7 alleles, max freq 51.9% (allele 0005), next 21.170
Generation 3000 has 5 alleles, max freq 53.8% (allele 0005), next 20.440
Generation 4000 has 5 alleles, max freq 55.9% (allele 0005), next 26.460
Generation 5000 has 4 alleles, max freq 48.8% (allele 0005), next 29.500
Generation 6000 has 4 alleles, max freq 40.6% (allele 0005), next 37.315
Generation 7000 has 4 alleles, max freq 41.0% (allele 0005), next 39.735
Generation 8000 has 4 alleles, max freq 44.1% (allele 0005), next 35.095
Generation 9000 has 5 alleles, max freq 62.8% (allele 0005), next 25.595
Generation 10000 has 5 alleles, max freq 79.5% (allele 0005), next 15.095
Generation 11000 has 3 alleles, max freq 90.5% (allele 0005), next 5.910
Generation 12000 has 3 alleles, max freq 94.3% (allele 0005), next 4.750
Generation 13000 has 3 alleles, max freq 91.5% (allele 0005), next 8.505
Generation 14000 has 2 alleles, max freq 99.9% (allele 0005), next 0.090
Generation 15000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 16000 has 2 alleles, max freq 99.8% (allele 0005), next 0.210
Generation 17000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 18000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 19000 has 2 alleles, max freq 99.8% (allele 0005), next 0.210
Generation 20000 has 2 alleles, max freq 96.9% (allele 0005), next 3.150
Generation 21000 has 3 alleles, max freq 95.7% (allele 0005), next 4.270
Generation 22000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 23000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 24000 has 2 alleles, max freq 99.5% (allele 0005), next 0.520
Generation 25000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 26000 has 2 alleles, max freq 97.7% (allele 0005), next 2.260
Generation 27000 has 3 alleles, max freq 99.5% (allele 0005), next 0.490
Generation 28000 has 2 alleles, max freq 100.0% (allele 0005), next 0.005
Generation 29000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 30000 has 3 alleles, max freq 99.8% (allele 0005), next 0.235
Generation 31000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
Generation 32000 has 2 alleles, max freq 96.9% (allele 0005), next 3.080
Generation 33000 has 2 alleles, max freq 99.9% (allele 0005), next 0.110
Generation 34000 has 2 alleles, max freq 100.0% (allele 0005), next 0.045
Generation 35000 has 2 alleles, max freq 96.1% (allele 0005), next 3.915
Generation 36000 has 2 alleles, max freq 92.4% (allele 0005), next 7.560
Generation 37000 has 2 alleles, max freq 98.3% (allele 0005), next 1.720
Generation 38000 has 3 alleles, max freq 100.0% (allele 0005), next 0.005
Generation 39000 has 2 alleles, max freq 98.9% (allele 0005), next 1.130
Generation 40000 has 1 alleles, max freq 100.0% (allele 0005), next 0.000
We might think that we ought to raise the mutation rate enormously in order to compensate; if we raise the rate by a factor of, say, 130, then:
Up from one--
code:
Generation 0 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 1000 has 28 alleles, max freq 99.8% (allele 0000), next 0.080
Generation 2000 has 22 alleles, max freq 99.9% (allele 0000), next 0.050
Generation 3000 has 27 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 4000 has 26 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 5000 has 29 alleles, max freq 96.6% (allele 0000), next 2.095
Generation 6000 has 35 alleles, max freq 96.5% (allele 0000), next 2.725
Generation 7000 has 28 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 8000 has 25 alleles, max freq 99.8% (allele 0000), next 0.105
Generation 9000 has 29 alleles, max freq 99.3% (allele 0000), next 0.340
Generation 10000 has 22 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 11000 has 18 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 12000 has 21 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 13000 has 27 alleles, max freq 98.5% (allele 0000), next 1.390
Generation 14000 has 21 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 15000 has 18 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 16000 has 25 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 17000 has 19 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 18000 has 27 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 19000 has 30 alleles, max freq 98.8% (allele 0000), next 1.105
Generation 20000 has 36 alleles, max freq 99.7% (allele 0000), next 0.155
Generation 21000 has 19 alleles, max freq 99.8% (allele 0000), next 0.095
Generation 22000 has 18 alleles, max freq 99.9% (allele 0000), next 0.025
Generation 23000 has 31 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 24000 has 31 alleles, max freq 98.9% (allele 0000), next 0.980
Generation 25000 has 29 alleles, max freq 99.4% (allele 0000), next 0.435
Down from all--
code:
Generation 0 has 2000 alleles, max freq 0.1% (allele 1110), next 0.105
Generation 1000 has 33 alleles, max freq 27.0% (allele 0004), next 16.675
Generation 2000 has 41 alleles, max freq 24.5% (allele 0004), next 21.710
Generation 3000 has 34 alleles, max freq 28.4% (allele 0000), next 25.115
Generation 4000 has 38 alleles, max freq 29.7% (allele 0003), next 25.865
Generation 5000 has 34 alleles, max freq 44.7% (allele 0000), next 26.525
Generation 6000 has 31 alleles, max freq 64.2% (allele 0000), next 13.750
Generation 7000 has 40 alleles, max freq 83.4% (allele 0000), next 9.750
Generation 8000 has 38 alleles, max freq 84.8% (allele 0000), next 6.205
Generation 9000 has 31 alleles, max freq 75.6% (allele 0000), next 15.570
Generation 10000 has 31 alleles, max freq 46.2% (allele 0003), next 30.675
Generation 11000 has 24 alleles, max freq 46.3% (allele 0003), next 27.155
Generation 12000 has 31 alleles, max freq 43.3% (allele 0003), next 41.040
Generation 13000 has 35 alleles, max freq 47.6% (allele 0006), next 39.570
Generation 14000 has 26 alleles, max freq 38.3% (allele 0003), next 35.415
Generation 15000 has 32 alleles, max freq 40.4% (allele 0003), next 38.325
Generation 16000 has 26 alleles, max freq 39.4% (allele 0003), next 37.665
Generation 17000 has 26 alleles, max freq 62.5% (allele 0003), next 21.070
Generation 18000 has 27 alleles, max freq 59.8% (allele 0003), next 20.320
Generation 19000 has 20 alleles, max freq 63.3% (allele 0003), next 18.360
Generation 20000 has 28 alleles, max freq 73.0% (allele 0003), next 14.400
Generation 21000 has 24 alleles, max freq 54.2% (allele 0003), next 30.475
Generation 22000 has 27 alleles, max freq 53.5% (allele 0003), next 28.075
Generation 23000 has 27 alleles, max freq 44.1% (allele 0003), next 39.695
Generation 24000 has 39 alleles, max freq 40.9% (allele 0006), next 30.825
Generation 25000 has 39 alleles, max freq 49.7% (allele 0000), next 27.280
Generation 26000 has 29 alleles, max freq 74.4% (allele 0000), next 14.025
Generation 27000 has 25 alleles, max freq 87.0% (allele 0000), next 9.390
Generation 28000 has 27 alleles, max freq 86.6% (allele 0000), next 11.185
Generation 29000 has 30 alleles, max freq 82.0% (allele 0000), next 13.665
Generation 30000 has 30 alleles, max freq 89.5% (allele 0000), next 9.055
Generation 31000 has 25 alleles, max freq 94.1% (allele 0000), next 5.765
Generation 32000 has 28 alleles, max freq 98.0% (allele 0000), next 1.600
Generation 33000 has 29 alleles, max freq 93.8% (allele 0000), next 5.980
Generation 34000 has 39 alleles, max freq 95.0% (allele 0000), next 4.635
Generation 35000 has 24 alleles, max freq 90.8% (allele 0000), next 9.115
Generation 36000 has 28 alleles, max freq 94.1% (allele 0000), next 4.900
Generation 37000 has 25 alleles, max freq 97.8% (allele 0000), next 2.035
Generation 38000 has 21 alleles, max freq 99.9% (allele 0000), next 0.020
Generation 39000 has 31 alleles, max freq 99.9% (allele 0000), next 0.005
Generation 40000 has 19 alleles, max freq 99.9% (allele 0000), next 0.010
Now, I have to caution that these results have to be taken with a grain of salt as they're bashing pretty heavily on the random number generator I'm using (which is not a very robust implementation).
But, basically, for short genes, the limiting case appears to be either a single allele, or piles of very rare alleles, neither of which matches experiments that well on average across the genome.
This looks like a failure to me--having 99% of the alleles instantly lethal (out of 2000) produces such strong selection that it can't explain all the genes that do have multiple alleles.
Is there another hypothesis to test? I'm sure we can find a specific percentage of lethal alleles and mutation rate that will have as a limiting case any possible distribution of alleles we want.
However, this kind of data-fitting is completely unwarranted since, as my previous models show, there is wide variability in the frequency of primary and secondary alleles with zero lethality. We don't have the resources here to investigate whether various alleles actually are lethal.
[ 25. March 2003, 17:06: Message edited by: Rex Kerr ]
|
|
warren_bergerson
Member
Member # 262
|
posted 26. March 2003 08:52
Rex,
Boring isn’t always bad. The results show that a high decrement model can produce high concentrations of a limited number of alleles. Three general comments on testing procedures follow.
ESTIMATING AND MEASURING RESULT DISTRIBUTIONS
First, testing would normally be based on looking at result distributions rather than looking at the results of a single run. Typically you would generate 10, 100, or 1000 runs to establish the mean, variance, and general shape of the distribution of results produced by a set of assumptions. For the test being performed here, you would have an expected distribution for both the high decrement and zero decrement scenarios.
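Summarizing many runs does not require storing every result. A minimal sketch, assuming each run is reduced to a single number (for example the final frequency of the most common allele), is a running mean and sample variance using Welford's update. The run_stats type and the function names are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Running summary of one scalar result per simulation run. */
typedef struct {
    size_t n;      /* number of runs recorded so far */
    double mean;   /* running mean */
    double m2;     /* running sum of squared deviations from the mean */
} run_stats;

void stats_init(run_stats *s)
{
    assert(s != NULL);
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Welford's update: fold one more run result into the summary. */
void stats_add(run_stats *s, double x)
{
    double delta;

    assert(s != NULL);
    s->n++;
    delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance (n - 1 denominator); 0 when fewer than two runs. */
double stats_variance(const run_stats *s)
{
    assert(s != NULL);
    return (s->n > 1) ? s->m2 / (double)(s->n - 1) : 0.0;
}
```

One run_stats per scenario (high decrement and zero decrement) would give the two expected distributions' means and variances; the general shape of each distribution would still need a histogram of the individual results.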
In addition to calculating distributions of expected results, it would typically be useful to observe multiple ‘actual’ results. For the human genome, we can look at the distribution of alleles for multiple genes. Given the assumption that ‘effective population sizes are quite small’, we can also obtain results for different sub-populations. Of particular interest would be those that have been separated for a period of time.
I get the impression that in a wide variety of species, if you sample allele distributions per gene, you will find a J curve, with one-allele genes being most common, then two-allele, then three-allele. There are undoubtedly exceptions, but the J curve is at least common. I would predict, without any specific knowledge of the subject, that you will find the J curve distribution in species with both large populations and small populations. [Note: The fact that there are two copies of each gene could complicate this prediction.]
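Checking for a J curve in sampled data amounts to tallying how many genes have one allele, two alleles, and so on, then seeing whether the counts fall as the allele number rises. The sketch below is one way to do that; the MAX_ALLELES cap, the function names, and the "non-increasing counts" definition of a J curve are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALLELES 8   /* genes with more alleles are lumped into the last bin */

/* hist[k] receives the number of genes with exactly k alleles
 * (k = 1 .. MAX_ALLELES); hist[0] is unused and left at 0. */
void tally_alleles_per_gene(const int *alleles_per_gene, size_t n_genes,
                            size_t hist[MAX_ALLELES + 1])
{
    size_t g;
    int k;

    for (k = 0; k <= MAX_ALLELES; k++)
        hist[k] = 0;
    for (g = 0; g < n_genes; g++) {
        k = alleles_per_gene[g];
        assert(k >= 1);               /* every sampled gene has an allele */
        if (k > MAX_ALLELES)
            k = MAX_ALLELES;
        hist[k]++;
    }
}

/* Returns 1 if the counts never increase from 1 allele upward. */
int is_j_curve(const size_t hist[MAX_ALLELES + 1])
{
    int k;

    for (k = 1; k < MAX_ALLELES; k++)
        if (hist[k] < hist[k + 1])
            return 0;
    return 1;
}
```

The same tally applied to simulated genes under each set of assumptions would allow a like-for-like comparison with sampled data.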
Again based on the results shown, it appears almost certain that the high decrement distribution of results is consistently closer to the actual observed results than the zero decrement results. Given a simple choice between a high decrement model and a zero decrement model, the high decrement model will consistently provide a better fit to observed results.
SENSITIVITY TESTING
While it seems clear that the high decrement model will provide a better fit than the zero decrement model, I don’t know how close the fit will be or how easy it would be to improve the fit by modifying assumptions. Running a few ‘what if’ scenarios or sensitivity tests can give you an idea of what could happen. You have provided a number of interesting examples of ‘what if’ analysis.
One interesting, if somewhat complex, form of sensitivity testing would be to look at the impact of interbreeding. If you have two separate populations, then over time you would expect to find some level of divergence in the set of rare or uncommon alleles present in each population. I would expect that even a fairly low rate of interbreeding between the two populations would have an impact similar to (but obviously slower than) doubling the effective population size.
SUCCESSIVE APPROXIMATION
One of the basic ideas behind ‘whole system’ modeling is that the results obtained from one set of studies are carried forward into the next series of studies. Once we have established what types of assumptions are compatible with or can explain the observed distributions of alleles, those assumptions are used in the next set of tests.
Once we can agree on a set of assumptions that fits the distribution of alleles in the human population, we would then consider whether those assumptions are compatible with or can be reconciled with the results observed in some other species.
NEXT HYPOTHESIS
Since we do not yet appear to agree on the decrement rate hypothesis, we still need to see if that issue can be resolved. I would suggest we start by comparing the range of results produced by the two sets of assumptions.
|
|
Frances
Member
Member # 169
|
posted 26. March 2003 12:55
Frances. Stop the posturing. [ 26. March 2003, 13:29: Message edited by: Moderator ]
|
|
Rex Kerr
Member
Member # 632
|
posted 26. March 2003 17:14
quote: Again based on the results shown, it appears almost certain that the high decrement distribution of results is consistently closer to the actual observed results than the zero decrement results. Given a simple choice between a high decrement model and a zero decrement model, the high decrement model will consistently provide a better fit to observed results.
Utterly absurd. I have cited articles, figures, statistics, and so on, that you have agreed with before that clearly show that there are multiple alleles per gene for a substantial number of genes; that heterozygosity is on the order of 20%, and so on. The model based on your specifications shows a process that stabilizes at an allele frequency of 99.9%, or a heterozygosity of 0.2%!
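The link between a 99.9% allele frequency and a heterozygosity of about 0.2% can be checked with the standard expected-heterozygosity formula, H = 1 - sum of p_i^2 over all alleles. The post does not state which formula was used, so treat this as an assumption; the function name is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Expected heterozygosity H = 1 - sum_i p[i]^2 for allele
 * frequencies p[0..n-1] that sum to 1 (random mating assumed). */
double heterozygosity(const double *p, size_t n)
{
    double sum_sq = 0.0;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < n; i++)
        sum_sq += p[i] * p[i];
    return 1.0 - sum_sq;
}
```

With one allele at 0.999 and the remaining 0.001 assigned to a single second allele, this gives 1 - 0.998001 - 0.000001 = 0.001998, or roughly 0.2%. Spreading the remaining 0.001 over several rare alleles changes the result only in the sixth decimal place.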
Aside from wishful thinking, what are you basing your statement on, anyway?
quote: First, testing would normally be based on looking at result distributions rather than looking at the results of a single run. Typically you would generate 10, 100, or 1000 runs to establish the mean, variance, and general shape of the distribution of results produced by a set of assumptions.
I don't have the time and computing resources to do a thousand runs of every conceivable hypothesis. You are welcome to do so if you wish.
I agree that there are interesting questions with respect to inbreeding, distributions of alleles in humans, sexual reproduction, linkage disequilibrium, and so on. I suggest you look through the list of articles that Mesk has provided, though, or do the computations yourself. I've provided code, simulations based on data that are well-constrained by the experimental data, and shown results that are consistent with observation to within a factor of two. For the type of "back of the envelope" calculation typical for message boards, this already seems excessive (and in reasonable agreement with experiment).
If you want to make extravagant claims such as "the high decrement model will consistently provide a better fit to observed results", I think it is high time that you showed results that gave some indication of this.