|
Author
|
Topic: Multiple Decrement Models
|
warren_bergerson
Member
Member # 262
|
posted 18. March 2003 12:35
Rex,
Thanks for providing the program. It is, IMO, much easier to discuss issues in terms of explicit models. The issues here are, IMO, not the mathematics but how the mathematics is used and interpreted. More specifically, the issue is the standards applicable to the use and interpretation of complex projection models. But let's look at your model and see what it says regarding the issues being discussed.
THE ASSUMPTIONS IN YOUR MODEL
Just to confirm that I understand your model, the following are my interpretations of the key assumptions used in the model:
1. Mutation rate- 1 in 4096 alleles mutate to any of 256 possibilities, so the chance of allele Ax mutating to Ay is 1 in 4096*256.
2. Reproduction assumption- Randomly select an allele for each individual in the next generation based on the probability distribution from the prior generation.
3. Population size- defined by the programmer.
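The three assumptions listed above can be sketched in a few lines of Python. This is only an illustration of my reading of them, not Rex's actual program: the names `next_generation` and `summarize` are hypothetical, and the population size and generation count in the demo are arbitrary small values chosen so that it runs quickly.

```python
import random
from collections import Counter

N_ALLELES = 256            # possible alleles for the single gene (assumption 1)
MUTATION_RATE = 1 / 4096   # chance that an individual's allele mutates per generation


def next_generation(population, rng):
    """Draw each child's allele from the prior generation, then apply mutation."""
    size = len(population)
    # Sampling with replacement reproduces the prior generation's
    # probability distribution (assumption 2).
    children = rng.choices(population, k=size)
    for i in range(size):
        if rng.random() < MUTATION_RATE:
            # A mutant becomes any of the 256 possibilities with equal chance.
            children[i] = rng.randrange(N_ALLELES)
    return children


def summarize(population):
    """Return (number of distinct alleles, most common allele, its frequency)."""
    counts = Counter(population)
    top_allele, top_count = counts.most_common(1)[0]
    return len(counts), top_allele, top_count / len(population)


if __name__ == "__main__":
    rng = random.Random(2003)   # fixed seed so a run can be repeated
    population = [0] * 2000     # population size is set by the programmer (assumption 3)
    for generation in range(201):
        if generation % 50 == 0:
            n, top, freq = summarize(population)
            print(f"Generation {generation} has {n} alleles, "
                  f"max freq {freq:.1%} (allele {top:04d})")
        population = next_generation(population, rng)
```

There is no selection or decrement anywhere in this sketch; every allele is equally likely to be passed on.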
You characterize the 1 in 4096 as a very high rate of mutation. While I would agree that the rate is high relative to ‘mutations that survive in offspring’, it is not a high rate when viewed in terms of point mutation rates. If the raw point mutation rate is 10^-8 per cell per possible mutation, there are 100 cell divisions per generation, and there are 3000 possible point mutations per gene, then the expected mutations per generation per gene (in the absence of decrements) would be roughly 1 in 333. The difference between ‘raw point mutation rates’ and ‘observed generation to generation rates of mutation per gene’ is a separate subject. For the subject being discussed here, the 1 in 4096 appears to be a useful assumption.
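For anyone who wants to check the arithmetic behind the "1 in 333" figure, here it is spelled out. The three inputs are the round numbers assumed in the paragraph above, not measured values, and the variable names are just labels for this sketch.

```python
raw_rate = 1e-8                   # per cell per possible mutation (assumed above)
divisions_per_generation = 100    # cell divisions per generation (assumed above)
possible_point_mutations = 3000   # possible point mutations per gene (assumed above)

# Expected mutations per gene per generation, ignoring any decrements.
per_gene_per_generation = (
    raw_rate * divisions_per_generation * possible_point_mutations
)
one_in = 1 / per_gene_per_generation

print(f"{per_gene_per_generation:.1e} per gene per generation, "
      f"or about 1 in {one_in:.0f}")
print(f"The model's 1 in 4096 is about {4096 / one_in:.0f} times lower")
```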
THE ELEPHANT SEAL STUDY
I don’t want to get sidetracked by this issue, but the results reported still appear to be questionable. I don’t know the exact time frames, but it is my understanding that the hunting that reduced the population occurred over a very short period of time (100 years, or 20 generations?). I assumed the 20 surviving seals were essentially a random sample of the original population. Once the seals were protected, it is my understanding, the population increased steadily. Again, I believe the time frame is on the order of 100 years, or 20 generations.
The initial distribution of alleles before the impact of hunting probably involved a greater concentration of alleles than implied by the 50% assumption. It seems likely that some data might exist on typical distributions in populations of seals. As your calculations demonstrate, small populations over an extended period of time will dramatically reduce the amount of diversity in the gene pool. However, all these factors considered, the finding of 1 dominant allele for each of 24 randomly selected genes seems highly unlikely.
If you started with 1) a realistic estimate of the distribution of alleles in the initial population, 2) an assumption of no mutations, and 3) realistic estimates of the population changes that actually occurred over actual time frames, then, I predict, your model will show that the result of 24 genes all with a single allele is highly improbable. The result is highly improbable despite the bottleneck effect.
Understand I am not questioning that bottlenecks reduce the level of variance or diversity in a gene pool. I am simply questioning the plausibility of the 24 and 0 result.
STABILITY AND CHANGE
The punctuated equilibrium theory suggests that evolutionary, and thus genetic, change is characterized by extended periods of stability interspersed with periods of relatively rapid change. The question being addressed here is "How do you create a model which simulates both stability and change?". More specifically, how do you create a model that is stable under the range of conditions where biological systems are stable and changes under the range of conditions where biological systems change?
Starting with your basic model, which I believe is a mutation-only model, a multiple decrement mutation-selection or mutation-(other decrements and increments) model can be created by adding decrement/increment processes. To do this, after the selection of an allele for the next generation and after applying the mutation process, a force or probability of decrement would be applied. The force or probability of decrement would be different for each possible allele or subset of possible alleles. Using random number generators and probabilities of decrement or survival, the program would determine whether a selected allele would survive and be present in the future generation. The model would continue to generate and decrement new alleles until the desired population has been generated.
I would define the above decrement process using computer code, but I am not familiar with the particular language you are using. I believe most of the issues being discussed can be addressed by adding a decrement process to your one gene-256 possible allele model.
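For readers following along, the decrement step described above might look something like the following Python sketch. It is an illustration under my own assumptions, not code from either participant: the function name `next_generation_with_decrement` and the `survival` table (one survival probability per possible allele) are hypothetical, and the probabilities in the demo are made up.

```python
import random

N_ALLELES = 256            # one gene, 256 possible alleles
MUTATION_RATE = 1 / 4096   # per individual per generation


def next_generation_with_decrement(population, survival, rng):
    """Generate candidates and decrement them until the population is refilled.

    survival[a] is the assumed probability that a candidate carrying
    allele a survives the decrement and appears in the next generation.
    """
    if max(survival) <= 0:
        raise ValueError("at least one allele needs a positive survival probability")
    size = len(population)
    children = []
    while len(children) < size:
        allele = rng.choice(population)        # select from the prior generation
        if rng.random() < MUTATION_RATE:       # then apply the mutation process
            allele = rng.randrange(N_ALLELES)
        if rng.random() < survival[allele]:    # then apply the decrement
            children.append(allele)
    return children


if __name__ == "__main__":
    rng = random.Random(262)
    # Made-up decrement table: allele 0 always survives, all others half the time.
    survival = [0.5] * N_ALLELES
    survival[0] = 1.0
    population = [0] * 1000
    for _ in range(100):
        population = next_generation_with_decrement(population, survival, rng)
    print(f"allele 0 frequency after 100 generations: "
          f"{population.count(0) / len(population):.1%}")
```

Setting every entry of `survival` to 1.0 recovers the zero decrement case, so the same routine can be used to compare the two hypotheses.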
As your model demonstrates, using the zero decrement/selection assumption, allele populations diverge for moderately large populations. I would characterize the divergence as quite rapid, at least for moderately sized populations. [If you assume low rates of mutation, the rate of divergence is slowed, but then so is the potential rate of evolutionary change.] The question to be addressed is what rates of decrement would be required to prevent divergence. For example, what rates of decrement would be required to achieve ‘98% concentration in 3 alleles after 10,000 generations of a population of 1 million starting with a single allele’? [There must be many examples of large populations that have been stable for 10,000 generations that have gene concentrations that involve 98% of one allele.]
I have not run the simulation, but I suspect that maintaining high concentrations of a limited number of alleles under ‘normal’ conditions requires very high rates of decrement applicable to alleles other than ‘common’ alleles.
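A target such as ‘98% concentration in 3 alleles’ is easy to measure once a population has been simulated. A minimal sketch, assuming the population is stored as a plain list of allele numbers (the function name `top_k_concentration` is hypothetical):

```python
from collections import Counter


def top_k_concentration(population, k=3):
    """Fraction of the population carrying one of the k most common alleles."""
    counts = Counter(population)
    return sum(count for _, count in counts.most_common(k)) / len(population)


if __name__ == "__main__":
    # Made-up population: 95 copies of allele 0, 3 of allele 7, 1 each of 9 and 11.
    population = [0] * 95 + [7] * 3 + [9, 11]
    print(f"top-3 concentration: {top_k_concentration(population, k=3):.1%}")
```

Calling this on every generation of a run would show how often, and for how long, a given set of decrement assumptions keeps the gene pool above the target.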
Using the multiple decrement version of your model, it would also be possible to measure the amount of time and changes in selection/decrement required for an evolutionary change from allele Ax to allele Ay.
As you yourself have demonstrated, it is possible/practical to simulate various types of genetic change using simplified models and assumptions. Adding per potential allele decrement assumptions into such a model is clearly practical. Using such models, it is practical to test what types of mutation and decrement assumptions are compatible with observed distributions of alleles, and observed rates of evolutionary change.
Before we begin to evaluate results, we need to agree that the required simulations are practical. The evidence would certainly suggest that is the case.
POSSIBLE VERSUS ACTUAL EXPLANATIONS
Quote: Bottom line: if the effective population size is small, drift is a very effective means of limiting appearance of alleles. What's this effective population size thing? Well, animals have a habit of reproducing with their neighbors. Even if they're not actually completely isolated from each other, this can effectively lead to clearing of rare alleles, and when that small population gets lucky and spreads into a new area, they start off with only a few common alleles per gene (unless selection has intervened).
As your models demonstrate, very small populations over an extended period of time can dramatically reduce genetic variance. This is one possible explanation of some occurrences of gene pools with limited levels of diversity. Can it be demonstrated that this bottleneck process explains all or even most occurrences of ‘restricted allele distributions’? The answer is clearly no. As your own models demonstrate, in the absence of decrement processes, allele distributions diverge even for moderately large populations that are stable over relatively short evolutionary periods of time.
One of the apparent problems in genetics seems to be confusing ‘could have happened that way’ with ‘actually did happen that way’. It is easy to demonstrate mathematically that the founder effect, bottlenecks, and genetic drift ‘could’ produce certain results under certain limited conditions. This is not the same as ‘did happen that way’. It is, as you point out, obvious that bottlenecks could under some conditions reduce the amount of genetic variance. There are, however, equally obvious examples of situations where genetic variance is reduced, but the reduction is not explainable in terms of drift or bottlenecks. There are many situations where very strong selection/decrement processes must exist in order to produce observed levels of genetic diversity.
Again thanks for posting a model and model results. Using real models, we may still disagree on interpreting results, but ultimately we have a basis for resolving those differences of opinion.
|
|
Rex Kerr
Member
Member # 632
|
posted 19. March 2003 00:18
Rather than implement a multiple-decrement technique, why don't I run the best estimate of effective population sizes and mutation rates through the non-decrement model and see what happens?
Recent estimates of effective human population size are in the 10,000-40,000 range. First, let me point out that this is not at all an unreasonable number sociologically; in fact, to me, it seems surprisingly large. Since humans lived in towns, it has been uncommon for them to go far afield from their towns, which tends to cluster people in groups of a few hundred to a few thousand. Almost all matings take place in limited groups of this size (often much smaller).
A couple of recent articles address exactly the question of effective population size: Genetics 162:1811 and 163:395. I'll quote the entire abstract for the former.
quote: Likelihood and bayes estimation of ancestral population sizes in hominoids using data from multiple Loci.
Yang Z.
Galton Laboratory, Department of Biology, University College London, London WC1E 6BT, England.
Polymorphisms in an ancestral population can cause conflicts between gene trees and the species tree. Such conflicts can be used to estimate ancestral population sizes when data from multiple loci are available. In this article I extend previous work for estimating ancestral population sizes to analyze sequence data from three species under a finite-site nucleotide substitution model. Both maximum-likelihood (ML) and Bayes methods are implemented for joint estimation of the two speciation dates and the two population size parameters. Both methods account for uncertainties in the gene tree due to few informative sites at each locus and make an efficient use of information in the data. The Bayes algorithm using Markov chain Monte Carlo (MCMC) enjoys a computational advantage over ML and also provides a framework for incorporating prior information about the parameters. The methods are applied to a data set of 53 nuclear noncoding contigs from human, chimpanzee, and gorilla published by Chen and Li. Estimates of the effective population size for the common ancestor of humans and chimpanzees by both ML and Bayes methods are approximately 12,000-21,000, comparable to estimates for modern humans, and do not support the notion of a dramatic size reduction in early human populations. Estimates published previously from the same data are several times larger and appear to be biased due to methodological deficiency. The divergence between humans and chimpanzees is dated at approximately 5.2 million years ago and the gorilla divergence 1.1-1.7 million years earlier. The analysis suggests that typical data sets contain useful information about the ancestral population sizes and that it is advantageous to analyze data of several species simultaneously.
The other paper estimates about 10k recently and 40-70k historically; I haven't looked at either in enough detail to know which to believe or how reliable their estimates are.
Now, what about mutation rates? The latest estimate I've been able to find is in Hum. Mutat. 21:12:
quote: Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases.
Kondrashov AS.
National Center for Biotechnology Information, NIH, Bethesda, Maryland 20892, USA. Kondrashov@ncbi.nlm.nih.gov
I estimate per nucleotide rates of spontaneous mutations of different kinds in humans directly from the data on per locus mutation rates and on sequences of de novo nonsense nucleotide substitutions, deletions, insertions, and complex events at eight loci causing autosomal dominant diseases and 12 loci causing X-linked diseases. The results are in good agreement with indirect estimates, obtained by comparison of orthologous human and chimpanzee pseudogenes. The average direct estimate of the combined rate of all mutations is 1.8x10(-8) per nucleotide per generation, and the coefficient of variation of this rate across the 20 loci is 0.53. Single nucleotide substitutions are approximately 25 times more common than all other mutations, deletions are approximately three times more common than insertions, complex mutations are very rare, and CpG context increases substitution rates by an order of magnitude. There is only a moderate tendency for loci with high per locus mutation rates to also have higher per nucleotide substitution rates, and per nucleotide rates of deletions and insertions are statistically independent on the per locus mutation rate. Rates of different kinds of mutations are strongly correlated across loci. Mutational hot spots with per nucleotide rates above 5x10(-7) make only a minor contribution to human mutation. In the next decade, direct measurements will produce a rather precise, quantitative description of human spontaneous mutation at the DNA level.
So the rate is apparently around 2e-8/generation per nucleotide. An average human gene contains about 1340bp of coding sequence, and although almost a third of those result in synonymous mutations, presumably there is quite a bit of noncoding sequence that is important too. So we might guess at around 2000bp per gene that could mutate to give a functionally different allele. This generates a rate of new alleles of about 4e-5/generation per gene. (4e-5 is 1 in 25000.)
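The conversion from a per-nucleotide rate to a per-gene rate is just a multiplication, sketched here in Python. The 2e-8 rate and the 2000bp guess come from the paragraph above; the function name `per_gene_rate` is hypothetical, and the calculation assumes every one of those base pairs is equally likely to mutate.

```python
PER_NUCLEOTIDE_RATE = 2e-8   # mutations per nucleotide per generation (rounded from 1.8e-8)


def per_gene_rate(functional_bp, per_nucleotide_rate=PER_NUCLEOTIDE_RATE):
    """New alleles per gene per generation, assuming a uniform per-nucleotide rate."""
    return functional_bp * per_nucleotide_rate


if __name__ == "__main__":
    rate = per_gene_rate(2000)   # the ~2000bp guess for an average gene
    print(f"{rate:.0e} per gene per generation, or 1 in {1 / rate:,.0f}")
```

Because the rate scales linearly with the number of base pairs, a shorter gene gives a proportionally lower rate of new alleles.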
This has been going on for about 5 million years, according to the Yang abstract, which translates to between 250k and 500k generations. Let's see what a simple, decrement-free model produces (population size=20,000 alleles (low end of range because I'm impatient), mutation rate=1/25000 (high end of range because I'm impatient), for 250k generations (low end because I'm impatient), starting from a single allele (it reaches equilibrium pretty fast, so it doesn't matter)):
code:
Generation 0 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 1000 has 9 alleles, max freq 96.0% (allele 0000), next 3.430
Generation 2000 has 12 alleles, max freq 91.2% (allele 0000), next 6.155
Generation 3000 has 11 alleles, max freq 89.1% (allele 0000), next 6.250
Generation 4000 has 12 alleles, max freq 86.2% (allele 0000), next 10.005
Generation 5000 has 10 alleles, max freq 96.0% (allele 0000), next 1.450
Generation 6000 has 15 alleles, max freq 93.2% (allele 0000), next 3.735
Generation 7000 has 12 alleles, max freq 88.7% (allele 0000), next 4.430
Generation 8000 has 8 alleles, max freq 66.6% (allele 0000), next 12.670
Generation 9000 has 20 alleles, max freq 43.9% (allele 0000), next 33.845
Generation 10000 has 15 alleles, max freq 48.4% (allele 0000), next 24.070
Generation 11000 has 23 alleles, max freq 36.8% (allele 4060), next 31.940
Generation 12000 has 9 alleles, max freq 45.4% (allele 4390), next 38.630
Generation 13000 has 16 alleles, max freq 37.4% (allele 4390), next 26.755
Generation 14000 has 16 alleles, max freq 44.2% (allele 0000), next 29.135
Generation 15000 has 22 alleles, max freq 30.1% (allele 0000), next 24.920
Generation 16000 has 16 alleles, max freq 27.0% (allele 4390), next 26.265
Generation 17000 has 23 alleles, max freq 28.4% (allele 4390), next 24.320
Generation 18000 has 14 alleles, max freq 33.2% (allele 1493), next 30.810
Generation 19000 has 16 alleles, max freq 30.2% (allele 1493), next 27.990
Generation 20000 has 17 alleles, max freq 31.2% (allele 1493), next 31.000
Generation 21000 has 18 alleles, max freq 33.4% (allele 1493), next 23.290
Generation 22000 has 20 alleles, max freq 41.5% (allele 1493), next 20.995
Generation 23000 has 11 alleles, max freq 36.1% (allele 1493), next 32.875
Generation 24000 has 15 alleles, max freq 37.5% (allele 1493), next 17.445
Generation 25000 has 16 alleles, max freq 47.4% (allele 1493), next 14.225
. . .
Generation 115000 has 9 alleles, max freq 83.0% (allele 2295), next 9.075
Generation 116000 has 14 alleles, max freq 91.2% (allele 2295), next 4.660
Generation 117000 has 10 alleles, max freq 97.0% (allele 2295), next 1.725
Generation 118000 has 12 alleles, max freq 90.8% (allele 2295), next 3.740
Generation 119000 has 9 alleles, max freq 97.9% (allele 2295), next 0.885
Generation 120000 has 11 alleles, max freq 96.1% (allele 2295), next 1.760
Generation 121000 has 11 alleles, max freq 87.6% (allele 2295), next 3.785
Generation 122000 has 11 alleles, max freq 81.1% (allele 2295), next 14.470
Generation 123000 has 16 alleles, max freq 78.3% (allele 2295), next 9.520
Generation 124000 has 24 alleles, max freq 74.5% (allele 2295), next 6.700
Generation 125000 has 19 alleles, max freq 64.2% (allele 2295), next 7.085
Generation 126000 has 18 alleles, max freq 66.4% (allele 2295), next 10.060
Generation 127000 has 17 alleles, max freq 51.6% (allele 2295), next 15.390
Generation 128000 has 16 alleles, max freq 27.0% (allele 1111), next 25.720
Generation 129000 has 13 alleles, max freq 29.1% (allele 1111), next 22.350
Generation 130000 has 19 alleles, max freq 31.6% (allele 3057), next 20.775
Generation 131000 has 17 alleles, max freq 29.7% (allele 3057), next 26.325
Generation 132000 has 18 alleles, max freq 35.1% (allele 3057), next 24.935
Generation 133000 has 18 alleles, max freq 28.2% (allele 3057), next 22.510
Generation 134000 has 23 alleles, max freq 35.4% (allele 3057), next 24.190
Generation 135000 has 9 alleles, max freq 52.9% (allele 3057), next 27.250
. . .
Generation 235000 has 21 alleles, max freq 29.9% (allele 7184), next 24.850
Generation 236000 has 16 alleles, max freq 25.3% (allele 7184), next 16.870
Generation 237000 has 25 alleles, max freq 27.7% (allele 7184), next 17.735
Generation 238000 has 17 alleles, max freq 28.0% (allele 7184), next 18.945
Generation 239000 has 20 alleles, max freq 30.3% (allele 0907), next 21.390
Generation 240000 has 12 alleles, max freq 33.5% (allele 7184), next 24.660
Generation 241000 has 20 alleles, max freq 26.2% (allele 0907), next 19.315
Generation 242000 has 19 alleles, max freq 32.3% (allele 5893), next 19.180
Generation 243000 has 15 alleles, max freq 30.5% (allele 5893), next 24.460
Generation 244000 has 15 alleles, max freq 34.2% (allele 5893), next 32.485
Generation 245000 has 14 alleles, max freq 33.1% (allele 5893), next 31.605
Generation 246000 has 21 alleles, max freq 32.1% (allele 5893), next 24.210
Generation 247000 has 19 alleles, max freq 36.8% (allele 5893), next 18.975
Generation 248000 has 23 alleles, max freq 41.9% (allele 5893), next 13.365
Generation 249000 has 24 alleles, max freq 35.4% (allele 5893), next 21.420
Generation 250000 has 18 alleles, max freq 33.4% (allele 3174), next 29.485
So we see that we expect to end up with about 10-20 alleles, usually with only a few really common alleles (e.g. generation 120000 had only one dominant allele, while 14000 probably had two and 250000 likely had three).
How far off is this from actual human data? Well, it indicates that something like 20/8000 = 1/400 of possible alleles should be observed *at all*, and current estimates of SNPs in the human genome are about 1/1000, which is the same order of magnitude, especially when you consider that most of those alleles would be hard to find.
It's also roughly consistent with the studies of disease genes that I cited earlier--one or a few common forms, with five to ten differences isolated in total.
One could add a multiple decrement to this model, but is it necessary? Rather than invoking a completely unexplained method, it would seem to be more interesting to first implement more realistic aspects of the current model, such as taking into account the expected fraction of mutations that are lethal (about 1/3 of genes are lethal when knocked out; from what I've seen, at least 1/10 of mutations generate a knockout in such genes); the different lengths of genes (we expect less variation in shorter genes and more in longer genes); the tendency of humans to produce multiple offspring with a single partner; a more detailed examination of what a sensible mutation rate is; and so on.
I know C, C++, Java, Perl, Python, Pascal, Modula-2, and Matlab passably well, so I'll happily translate languages if you wish to do further experiments in any of these. [ 19. March 2003, 15:51: Message edited by: Rex Kerr ]
|
|
warren_bergerson
Member
Member # 262
|
posted 19. March 2003 11:44
Rex,
Thanks again for the wealth of specific information.
It may be useful to repeat that we are attempting to evaluate the relative merits of two competing hypotheses:
1. DRIFT- The distributions of alleles in existing populations can be explained and modeled by statistical fluctuations (and natural selection), including the impacts of bottlenecks and the founder effect.
2. DYNAMIC DECREMENT - The distributions of alleles in existing populations are explained and modeled by dynamic decrement processes, where changes in the decrement processes are controlled by processes other than ‘mutation and natural selection’.
Determining which of these hypotheses better fits the data involves at least three components. First, it is necessary to develop models which fit both the available information and the proposed hypothesis. As the information you provide suggests, there are a large number of parameters which can or may impact the distribution of alleles. Some of the obvious factors are population size, changes in population size, elapsed time, mutation rates, and the size and range of possible and likely mutations. You have provided information on a number of the variables needed to develop a drift model. I have provided my comments on your assumptions below.
Once both drift and dynamic models are developed which fit the available information, the second step is to determine if the models developed are compatible with known rates of genetic change. This, obviously, is primarily a test for the drift model.
The third step is to identify or test for the existence of the physical mechanisms responsible for changing decrement rates as implied by the dynamic decrement hypothesis.
I don’t expect the discussion here to go much beyond the first stage, fitting models to the data. It is, however, useful to keep in mind that finding a model that fits the data is only the first step in testing the two competing hypotheses.
PROPOSED PARAMETERS
1. EFFECTIVE POPULATION SIZE- I agree with the suggestion that a population between 10,000 and 40,000 is appropriate for testing both drift models and dynamic decrement models. I would, however, add a number of caveats. First, if populations of this size are appropriate for demonstrating drift, they should also be adequate for demonstrating genetic change. Second, these effective population sizes suggest multiple instances of effective populations within a species. This suggests predictable differences between different effective populations. Third, sensitivity testing would identify/predict expected differences if effective size changes. In general, assuming relatively small effective population sizes will make it easier to test and validate both drift and dynamic decrement models.
2. MUTATION ASSUMPTIONS- There are at least five general parameters associated with mutation assumptions. The first parameter is the measurement base: are mutation rates measured ‘per cell division’ or ‘per generation’? The second parameter is the domain and range of the mutation process: which starting-point alleles mutate into which end-point alleles? Of particular concern here is which forms of mutation are likely or reasonably possible given the existing common forms of alleles. The third parameter is the mutation rate, or the probability that, given allele Ax at some point in time, mutation processes will result in Ay at some future point in time.
The fourth parameter covers ‘other factors’, which may include differences between species (between-generation mutation rates may be different for different species) and possibly even differences in mutation rates for different genes within the same species. The fifth parameter involves the temporal changes in rates of mutation suggested by the dynamic decrement hypothesis.
IMO, the data available suggests that biological systems exhibit a wide range of mutation processes with a wide range of different rates of mutation. Again IMO, for a given allele A in a given species B, i) the set or range of ‘likely’ mutations is relatively small (probably a handful), ii) the set of ‘rare’ mutations probably contains only a few thousand members, and iii) the probabilities associated with the much larger set of extremely rare mutations are extremely low.
Given these caveats, I feel you should have a wide range of discretion in defining the mutation rates to be used in creating a drift model. The only specific requirement is that the assumptions used be explicitly defined. It should also be noted that if different species exhibit different rates of mutation, then the mechanisms responsible for these differences ‘could’ possibly explain the physical mechanisms responsible for making mutation rates dynamic.
This is a long-winded way of saying I have no objection to the mutation assumption you used in your model, at least not as a starting point.
3. STARTING POINT POPULATION AND PROJECTION PERIOD- Starting with a single allele, and the assumed mutation rates and population size, your model appears to stabilize after 10,000 generations, which for humans would be about 200,000 years. Subject to sensitivity testing of the various assumptions, the starting point and projection period seem reasonable.
INTERPRETATION OF RESULTS
Are the results produced by the no-decrement projection consistent with the observed distribution of alleles in the human population? I would argue that the model does not fit the observed data. First, note that the frequency of the dominant allele decreases and stabilizes below 50%. I believe it will be very difficult to find parameters that will get the percentage of the dominant allele above 50% for a large number of genes. From what you quoted earlier, I got the impression that alleles with frequencies in excess of 50% are fairly common.
Second, and probably more telling is the composition of the set of non-dominant alleles. If as you suggest an effective population is 10-40,000 individuals, then the human population must contain many such effective populations. I believe, no matter where you start, you would expect the composition of the set of non-dominant alleles to be different for different effective populations. As you mentioned earlier, there are instances where one group of humans will have an allele not present in other populations. But, as far as I know, there is no evidence that sub-populations have dramatically different sets of ‘non-dominant alleles’ as, I believe, your model would predict.
I believe it is easily demonstrated that almost any commonly observed distribution of alleles can be modeled/explained using variable or dynamic decrement assumptions. I believe that ‘very high rates of decrement’ need to be used to prevent the ratio of the dominant allele from dropping below, say, 80-90%, and to prevent major differences in the sets of non-dominant alleles in different sub-populations.
It is again important to note that the key issue being addressed here is the feasibility of modeling genetic changes and the feasibility of testing competing hypotheses.
PROGRAMMING LANGUAGE
The only language with which I would claim any degree of proficiency is APL. I have some familiarity with Visual Basic. I know the type of population program you are using could be written in APL, but APL is not commonly used or available. I am currently looking at the feasibility of building the model in Visual Basic.
Thanks again for presenting explicit models and simulations. Hopefully in a few days I can return the favor.
|
|
Rex Kerr
Member
Member # 632
|
posted 19. March 2003 17:22
Please don't use the word "dominant" to mean "common" or "primary form" when talking about alleles. The word "dominant" is already reserved to mean that a mutant allele produces a phenotype when present in only a single copy. I can, with effort, understand what you mean, but given my training in genetics, "dominant" immediately brings the wrong set of concepts to my mind (and to the minds of most other biologists, or at least geneticists, I imagine).
Also, remember the "founder effect" that I mentioned earlier? If you look at the history of human migrations, you begin to suspect that the vast majority of genetic material around today came from a single population (or a small number of populations). There are several different models for the distribution of humans around the globe; in one model, there was a radiative event about 300k years ago, establishing various populations around the world, and a second radiative event around 80k (or 40k?) years ago that replaced the previous groups. Alternatively, the new radiative groups may have heavily interbred with the existing populations. Molecular data seems to be more consistent with the second model (I believe the source is Human Molecular Genetics by Strachan & Read, but I'm not positive), and archaeological evidence is consistent with either (though IMO slightly more consistent with replacement).
This is only about 2000 generations, well below the settling time of my model--regardless of population size, in fact, given the mutation rate.
So the historical evidence seems most consistent with an effective population size of about 10-20k "recently", followed by no more than 2000 generations or so of expansion. This isn't enough time to switch the most common allele unless the most common allele is already almost tied with the next-most-common--or there are selective effects (for skin melanin levels, for instance).
I agree that in my model the frequency stabilizes below 50% for the most part, but it does not do so uniformly. That's exactly why I included the stretch from 115000 to 135000--you can clearly see that there was a phase of low secondary-allele frequency there, simply by chance.
Furthermore, I don't think it's safe to say that a majority of genes have high frequencies of the most common allele. For example, blood type varies substantially by racial group. Now that I think about it, there's an inconsistency between the estimates of heterozygosity I posted earlier and the vast number of single nucleotide polymorphisms (i.e. differences at a single base pair) already found by the human genome project. The HGP has found about one SNP every 1000 base pairs, and they're not sequencing many people at all, so you'd tend to expect that over half of the genes would have reasonable frequencies of a second allele. Note that the results are highly dependent on gene length; if the gene is only 500bp long (not too uncommon for smaller cytoplasmic proteins), you get (with the fourfold reduced mutation/gene rate)
Up from one allele:
code:
Generation 0 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 1000 has 2 alleles, max freq 99.6% (allele 0000), next 0.410
Generation 2000 has 5 alleles, max freq 99.8% (allele 0000), next 0.085
Generation 3000 has 2 alleles, max freq 99.8% (allele 0000), next 0.230
Generation 4000 has 4 alleles, max freq 99.6% (allele 0000), next 0.410
Generation 5000 has 1 alleles, max freq 100.0% (allele 0000), next 0.000
Generation 6000 has 2 alleles, max freq 96.6% (allele 0000), next 3.375
Generation 7000 has 4 alleles, max freq 90.3% (allele 0000), next 7.745
Generation 8000 has 4 alleles, max freq 89.1% (allele 0000), next 6.995
Generation 9000 has 7 alleles, max freq 80.2% (allele 0000), next 12.595
Generation 10000 has 5 alleles, max freq 85.9% (allele 0000), next 12.425
Generation 11000 has 5 alleles, max freq 95.0% (allele 0000), next 2.990
Generation 12000 has 7 alleles, max freq 95.9% (allele 0000), next 2.735
Generation 13000 has 6 alleles, max freq 95.4% (allele 0000), next 3.000
Generation 14000 has 5 alleles, max freq 87.4% (allele 0000), next 6.180
Generation 15000 has 4 alleles, max freq 86.5% (allele 0000), next 10.745
Generation 16000 has 4 alleles, max freq 84.5% (allele 0000), next 14.235
Generation 17000 has 5 alleles, max freq 75.6% (allele 0000), next 23.235
Generation 18000 has 4 alleles, max freq 73.8% (allele 0000), next 25.900
Generation 19000 has 5 alleles, max freq 75.4% (allele 0000), next 21.950
Generation 20000 has 7 alleles, max freq 76.6% (allele 0000), next 20.445
Generation 21000 has 4 alleles, max freq 82.5% (allele 0000), next 12.555
Generation 22000 has 5 alleles, max freq 94.3% (allele 0000), next 3.520
Generation 23000 has 6 alleles, max freq 91.4% (allele 0000), next 3.555
Generation 24000 has 5 alleles, max freq 88.5% (allele 0000), next 9.360
Generation 25000 has 6 alleles, max freq 89.0% (allele 0000), next 8.000
Generation 26000 has 3 alleles, max freq 88.3% (allele 0000), next 11.665
Generation 27000 has 6 alleles, max freq 86.5% (allele 0000), next 11.430
Generation 28000 has 4 alleles, max freq 93.0% (allele 0000), next 6.185
Generation 29000 has 3 alleles, max freq 96.7% (allele 0000), next 2.225
Generation 30000 has 3 alleles, max freq 99.3% (allele 0000), next 0.680
Down from all alleles:
code:
Generation 0 has 2000 alleles, max freq 0.1% (allele 1110), next 0.105
Generation 1000 has 39 alleles, max freq 9.8% (allele 1880), next 6.605
Generation 2000 has 23 alleles, max freq 13.9% (allele 0455), next 10.380
Generation 3000 has 22 alleles, max freq 19.1% (allele 0283), next 14.600
Generation 4000 has 15 alleles, max freq 19.6% (allele 0455), next 19.560
Generation 5000 has 12 alleles, max freq 24.6% (allele 0455), next 23.310
Generation 6000 has 11 alleles, max freq 32.2% (allele 0455), next 18.215
Generation 7000 has 11 alleles, max freq 28.9% (allele 0455), next 17.165
Generation 8000 has 13 alleles, max freq 23.6% (allele 1607), next 22.380
Generation 9000 has 10 alleles, max freq 35.4% (allele 1607), next 17.045
Generation 10000 has 10 alleles, max freq 49.7% (allele 1607), next 23.885
Generation 11000 has 10 alleles, max freq 52.3% (allele 1607), next 21.545
Generation 12000 has 13 alleles, max freq 40.6% (allele 1118), next 34.245
Generation 13000 has 13 alleles, max freq 55.6% (allele 1118), next 16.745
Generation 14000 has 11 alleles, max freq 58.6% (allele 1118), next 18.865
Generation 15000 has 6 alleles, max freq 54.8% (allele 1118), next 37.230
Generation 16000 has 3 alleles, max freq 51.0% (allele 0455), next 45.975
Generation 17000 has 5 alleles, max freq 53.5% (allele 1118), next 44.920
Generation 18000 has 5 alleles, max freq 72.5% (allele 1118), next 22.645
Generation 19000 has 5 alleles, max freq 53.3% (allele 1118), next 43.430
Generation 20000 has 5 alleles, max freq 55.4% (allele 0455), next 44.310
Generation 21000 has 7 alleles, max freq 63.1% (allele 1118), next 32.550
Generation 22000 has 6 alleles, max freq 80.1% (allele 1118), next 15.420
Generation 23000 has 7 alleles, max freq 77.5% (allele 1118), next 19.450
Generation 24000 has 4 alleles, max freq 76.3% (allele 1118), next 18.530
Generation 25000 has 7 alleles, max freq 71.7% (allele 1118), next 26.560
Generation 26000 has 3 alleles, max freq 86.4% (allele 1118), next 13.425
Generation 27000 has 7 alleles, max freq 93.0% (allele 1118), next 6.395
Generation 28000 has 4 alleles, max freq 90.6% (allele 1118), next 5.035
Generation 29000 has 6 alleles, max freq 94.1% (allele 1118), next 5.520
Generation 30000 has 4 alleles, max freq 90.5% (allele 1118), next 9.340
So we see that it stabilizes at somewhere over 90%.
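For readers without the APL source, the kind of run shown above can be approximated with a short Python sketch. This is a reconstruction under assumptions, not the original program: it uses the parameters as restated earlier in the thread (a 1 in 4096 chance per individual of mutating to one of 256 alleles, and a next generation resampled at random from the current one) and a population of 2000 as suggested by the generation 0 line. The names `step` and `report`, and the flat label space of 256 alleles, are hypothetical; the runs above evidently label alleles differently.

```python
import random
from collections import Counter


def step(pop, n_alleles=256, mut_rate=1 / 4096, rng=random):
    """Advance one generation: resample parents at random, then mutate."""
    # Drift: each child copies the allele of a randomly chosen parent.
    children = [rng.choice(pop) for _ in pop]
    # Mutation: with probability mut_rate a child gets a random allele label
    # (assumption: uniform over n_alleles, possibly the same label again).
    for i in range(len(children)):
        if rng.random() < mut_rate:
            children[i] = rng.randrange(n_alleles)
    return children


def report(gen, pop):
    """Format one line in the style of the output above."""
    top = Counter(pop).most_common(2)
    best_allele, best_count = top[0]
    next_pct = 100.0 * top[1][1] / len(pop) if len(top) > 1 else 0.0
    return (f"Generation {gen} has {len(set(pop))} alleles, "
            f"max freq {100.0 * best_count / len(pop):.1f}% "
            f"(allele {best_allele:04d}), next {next_pct:.3f}")


if __name__ == "__main__":
    rng = random.Random(1)      # fixed seed so a run can be repeated
    pop = [0] * 2000            # "up from one allele" starting point
    for gen in range(201):      # short demo; the runs above go to 30000
        if gen % 100 == 0:
            print(report(gen, pop))
        pop = step(pop, rng=rng)
```

Starting from `list(range(2000))` instead of `[0] * 2000` gives the "down from all alleles" case. Pure Python is slow for 30000 generations of 2000 individuals, so the demo loop is kept short.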
Small genes will reduce the amount of heterozygosity, and pre-HGP estimates may have underestimated average diversity because (1) they looked at smaller genes and (2) they looked at functionally important genes (where there was a selective effect). (This may also explain the elephant-seal result; I haven't been able to find the direct article, but if they picked very short genes that already were biased towards a single allele, the bottleneck would have much less clearing to do.)
Overall, I can't see a need to add a completely new mechanism of allele removal simply on the basis of such models.
I'm afraid APL doesn't lend itself very well to message boards, given that it uses a non-ASCII character set.
|
|
warren_bergerson
Member
Member # 262
|
posted 21. March 2003 09:44
Rex,
More interesting results. The model you are using clearly provides you with the opportunity to quickly simulate and test a wide range of different assumptions. I offer the following general comments.
1. The technology needed to make these models widely available is relatively new. Twenty years ago, the types of simulations you ran in an afternoon would have been difficult to generate and the required computing capacity would have been available to very few individuals.
2. Experience and surprise- When you have experience working with a particular type of model, you develop a ‘sense’ of what results will be produced with a given set of assumptions. However, even with extensive experience you are not infrequently surprised by the results produced. As a specific example, I was at first somewhat surprised by the magnitude of the bottleneck effect. But once you see the impact, you realize it is fully compatible with risk theory concepts. I was also surprised, although again it makes sense, that the bottleneck, to have its full effect, must continue for many generations. Finally, there must be a reverse bottleneck effect that produces predictable and measurable results when population sizes increase dramatically.
3. Fitting assumption to results- As with most types of complex simulation models, if you know the ‘desired’ end result, such as a 90% concentration in a single allele, it is usually not particularly difficult to find a set of assumptions that produce that result.
4. Testing and validating assumptions- Validating assumptions is one of the basic techniques used to counter the problem of manipulated assumptions. As I am sure you are well aware, validating assumptions involves testing the assumptions against available information. If, as a simple example, the distribution of alleles in the human population reflects a bottleneck 300,000 years ago, such an assumption could be validated against the expected distribution of allele distributions for the 30,000 genes in the genome. Such an assumption could also be validated against expected differences in allele distributions for different groups of humans separated for different periods of time.
5. Predictions and independent testing- The most rigorous method of validating the assumptions used in complex models involves the scientific method. In simple terms, the assumptions are expressed as formal, explicit hypotheses, and independent evaluators are given the opportunity to validate and/or find inconsistencies in the predictions generated by the hypotheses.
You have clearly demonstrated the practicality of developing and running single decrement (mutation) simulation models of genetic change processes. As you have demonstrated, such models generate useful, testable predictions even given the fact that our current level of information is incomplete.
I believe it should be apparent that, from a programming perspective, it would not be difficult to modify the single decrement program into a multiple decrement model (mutation and selection, multiple mutation processes, or mutation and other types of decrement). It would also not be difficult to simulate dynamic or controllable decrement models.
It appears to me based on the results you have generated, that we clearly have the technical ability to develop and test predictive genetic change models. It is not as clear, at least to an outsider, how far the science of population genetics has taken this technological capability.
Have population geneticists developed predictive models? Have they developed validated assumptions for explaining and predicting allele distributions and changes in allele distributions in the human genome? Have predictive genetic change models and hypotheses been made available for independent testing and validation? Are multiple decrement models and dynamic decrement models being developed and tested?
To an outsider like myself, it appears that 1) the technology exists to develop predictive models of genetic change, and 2) the science of population genetics still relies heavily on ‘speculative descriptive models and theories’ of the ‘it could have happened that way’ type.
|
|
Rex Kerr
Member
Member # 632
|
posted 23. March 2003 01:17
You are correct both in that it would be easy to modify the existing program to incorporate selection, different types of mutation, and so on; and in that with complex models it isn't hard to fiddle with your parameters until you attain the desired result.
In order to avoid this pitfall, it is critical to limit the number of free parameters as much as possible--you should constrain them with data from the system in question. In the model that I created, there are essentially zero free parameters, since all of the parameters have been measured (from effective population size to mutation rate and so on). Some parameters have been measured through multiple means.
When you want to start considering more complex systems, though, measuring the parameters gets increasingly difficult. We don't have highly robust estimates of the rates and type of each type of mutation that is observed in the human genome, for instance. The genome project is too new. We can easily account for selection, as long as the selection is just some fixed relative advantage. But what if it is relative to the environment? What if we are supposed to predict the selective effects of a novel mutation? We simply don't know how to proceed in the general case.
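The "fixed relative advantage" case can be illustrated with a minimal Python sketch. It is an assumption about how such selection might be bolted onto a resampling model, not a description of the existing program: parents are drawn with probability proportional to a per-allele fitness weight. The function name, the `fitness` table and the 5% advantage are all hypothetical.

```python
import random


def step_with_selection(pop, fitness, rng=random):
    """Resample the next generation with parents weighted by relative fitness."""
    # Alleles missing from the fitness table are treated as neutral (weight 1.0).
    weights = [fitness.get(allele, 1.0) for allele in pop]
    # random.choices draws with replacement, proportionally to the weights.
    return rng.choices(pop, weights=weights, k=len(pop))


if __name__ == "__main__":
    rng = random.Random(2)
    pop = [0] * 1900 + [1] * 100    # allele 1 starts at 5% frequency
    fitness = {1: 1.05}             # hypothetical 5% relative advantage
    for _ in range(200):
        pop = step_with_selection(pop, fitness, rng=rng)
    print(f"allele 1 frequency after 200 generations: {pop.count(1) / len(pop):.3f}")
```

An environment-dependent advantage would require the `fitness` table to change from generation to generation, which is exactly the part that is hard to constrain with data.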
There are definitely predictive models--inasmuch as my model is predictive--and there are models that are quite a bit more sophisticated than mine was. However, you very rapidly discover that for the time being, models are best used as a sanity check; the variables aren't well enough constrained for some of the most interesting problems. For instance, I doubt it would be difficult to get models of evolution and fossilization that predicted either gradual evolution or punctuated equilibrium--however, I also doubt that it would be easy to constrain the variables needed to distinguish between the two.
So, yes, population geneticists have developed predictive models. Since I'm not a population geneticist, I can't say for sure how well those models match allele distribution in humans, but I can say that such a study is rather premature right now, as we're still in the process of cataloging human genetic variation. If you look in the literature, you can find such models made available for independent testing and evaluation--that's what peer review is all about. My population genetics text has the output of programs similar to the one I wrote, though, so that level of model is certainly not at all novel.
To an outsider in most fields, the field tends to look as though it relies heavily on "speculative descriptive models and theories" since that's how models and theories are distilled and analogized for the non-expert. When you hear about relativity, you generally hear about everything being relative and clocks moving slow and riding on light waves and so on. You don't so frequently get the Lorentz equations spelled out!
Personally, I think almost every biological field could use increased modeling and attention to quantification. However, it is a judgment call of whether what is being done is minimally adequate or not. If you are interested in the field, I suggest that you learn enough about it so that you are at least a highly informed outsider, and make the decision on the basis of that detailed information. [ 23. March 2003, 01:17: Message edited by: Rex Kerr ]
|
|
warren_bergerson
Member
Member # 262
|
posted 23. March 2003 10:30
Rex,
Quote: the variables aren't well enough constrained for some of the most interesting problems.
It is, of course, at this time only a matter of opinion or judgment, but IMO, once you attempt to create multiple decrement models, you will find there are lots of techniques and lots of information available to define constraints on the variables involved. You may be able to find sets of values for key variables which fit the stable population data, but, I predict, you will not be able to reconcile those stable population assumptions to the assumptions needed to explain observed rates of evolutionary or genetic change. Simply an opinion or prediction at this time, but I think we both agree it is technically possible to evaluate this prediction.
"Interesting" is a subjective term. IMO, the ‘interesting’ changes in biological and genetic systems are not the events which occurred or may have occurred in the distant past, but events which occur on a day to day basis and can be subjected to experimental analysis. I do not agree with the idea that the short term stability observed in genetic systems is uninteresting, or with the idea that the ‘interesting’ parts of genetics and evolution are the long term changes. The stability we observe in species is not a normal static state, but rather the result of very active, very powerful, and very measurable forces. Once, I predict, you have identified the forces needed to explain stability, you have eliminated the ability for the system to change. Furthermore, if you eliminate or override the forces which produce stability, you do not get either drift or evolutionary change- you get destructive divergence. Again, IMO, the techniques exist which would make it possible to distinguish between my view of genetic change and the more conventional views.
Quote: To an outsider in most fields, the field tends to look as though it relies heavily on "speculative descriptive models and theories" since that's how models and theories are distilled and analogized for the non-expert.
It is not particularly difficult for an outsider to differentiate between 1)models and theories that appear speculative but have a rigorous underlying mathematical and scientific basis, and 2)models and theories that appear speculative and have no underlying rigorous mathematical or scientific basis.
The argument in population genetics, evolutionary biology, AI and essentially all the life sciences is not that they have rigorous predictive scientific models and theories, but that the causal processes associated with the life sciences are too complex to permit the development of rigorous ‘hard science’ predictive models and theories.
My view is that the technology is becoming available to develop complex deterministic, predictive models and theories of the processes associated with biological systems. It is now technologically feasible to develop predictive deterministic models of genetic change, evolutionary change, developmental processes, information processing in neurons, and, my own area of interest, human decision making.
IMO, the ‘problems’ are that the technology needed to develop these complex deterministic predictive models 1)did not develop in academia but in business application, 2)the technology is only now becoming widely available to academics, and 3)relatively few academics have either the skills or the experience to apply this developing technology.
Quote: Personally, I think almost every biological field could use increased modeling and attention to quantification. However, it is a judgment call of whether what is being done is minimally adequate or not.
I somewhat doubt that ‘minimum scientific standards’ are a subjective judgment call. If a technology develops making it possible to move to more rigorous standards, then the less rigorous prior standards become scientifically unsound. Science operates on a ‘best available practices’ standard, not an ‘acceptable to the majority’ standard. If, as I suggest, the life sciences are currently undergoing a technological revolution, then that revolution will change the standards for the willing and the unwilling.
To get back to the subject of this thread, multiple decrement predictive models are clearly technologically feasible. Techniques for constraining the values of variables in such models, and the practical level of complexity of such models may need to be reviewed and validated, but such procedures are technical in nature. The main point of the discussion here remains the conclusion that multiple decrement predictive models of genetic change are technologically feasible.
|
|
Frances
Member
Member # 169
|
posted 23. March 2003 12:57
Warren: To get back to the subject of this thread, multiple decrement predictive models are clearly technologically feasible. Techniques for constraining the values of variables in such models, and the practical level of complexity of such models may need to be reviewed and validated, but such procedures are technical in nature. The main point of the discussion here remains the conclusion that multiple decrement predictive models of genetic change are technologically feasible.
If they are argued to be technologically feasible then I would love to see their results to allow us actual comparison and evaluation. That would seem to be the scientific way to evaluate such claims. Do you have any results that led you to these conclusions, Warren?
|
|
Rex Kerr
Member
Member # 632
|
posted 24. March 2003 00:23
Well, I disagree with over half of what Warren's said, for one reason or another, but in the past, discussing issues has not proven particularly fruitful.
So let's get back to modeling. Warren, if you tell me exactly how to constrain the decrements, I'll add the feature to my program.
|
|
Mesk
Member
Member # 630
|
posted 24. March 2003 01:10
quote: warren_bergerson: Have population geneticists developed predictive models? Have they developed validated assumptions for explaining and predicting allele distributions and changes in allele distributions in the human genome? Have predictive genetic change models and hypotheses been made available for independent testing and validation?
Warren, I suggest that you browse through the archives of the journal Genetics for many excellent examples of mathematical models of population genetic phenomena. This journal published many of the most important practical and theoretical studies of population genetics, and best of all (IMO) it has an archive which provides full-text PDF access back as far as 1988!
I'm currently involved in the rather painful process of teaching myself the basics of modern theories of population genetics, in preparation for an attempt to identify the molecular signature of natural selection which I predict should be present at a particular locus in the human genome. In this attempt I will be leaning heavily on published mathematical models of the behaviour of the human genome over time. These models predict the patterns of polymorphism which should be present in a particular chromosomal region under neutrality (that is, in the absence of natural selection acting on a locus within that region), taking into account the complex interplay of factors such as changes in population size, levels of population admixture, local rates of mutation and recombination, and so on. Deviations from the predictions of these models can indicate the recent action of natural selection, and even provide an idea of the magnitude, form and age of the selective event, and statistical tests have been developed to determine real selective events from fluctuations due to random factors such as drift.
The following are some articles that I would particularly recommend. Firstly, some classic tests based on the predicted behaviour of loci under neutrality, which are still used today to infer the effect of selection and other population genetic factors:
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585-595.
Fu, Y.-X., and Li, W.-H. 1993. Statistical tests of neutrality of mutations. Genetics 133: 693-709.
And some more recent and sophisticated tests:
Kelly, J.K. 1997. A test of neutrality based on interlocus association. Genetics 146: 1197-1206.
Wall, J.D. 1999. Recombination and the power of statistical tests of neutrality. Genetical Research 74: 65-79.
And some explorations of the effects of various forms of natural selection on genetic parameters:
Braverman, J.M., Hudson, R.R., Kaplan, N.L., Langley, C.H., and Stephan, W. 1995. The hitchhiking effect on the site frequency spectrum of DNA polymorphism. Genetics 140: 783-796.
Akashi, H. 1999. Inferring the fitness effects of DNA mutations from polymorphism and divergence data: Statistical power to detect directional selection under stationarity and free recombination. Genetics 151: 221-238.
Innan, H., and Tajima, F. 1999. The effect of selection on the amount of nucleotide variation within and between allelic classes. Genetical Research 73: 15-28.
Galtier, N., Depaulis, F., and Barton, N.H. 2000. Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics 155: 981-987.
Kim, Y., and Stephan, W. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765-777.
Przeworski, M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160: 1179-1189.
For a good introductory review to some of the topics discussed in the above articles see:
Otto, S.P. 2000. Detecting the form of selection from DNA sequence data. Trends in Genetics 16(12): 526-529.
This is by no means a definitive list (it's basically the most relevant out of a pile of articles pulled out of a disorganised filing cabinet), but it is a good starting point. If you are genuinely interested in this subject I also have folders stuffed full of articles in which the models and tests outlined in these articles are applied to real genetic data from Drosophila and humans (although most of the articles above did use empirical data sets as well as simulations to validate their models).
I would encourage you to explore the literature yourself as well, as this is the best way to get a real feel for just how heavily some of these models have been tested both by their originators and by competing teams of scientists. There has been some fairly powerful intellectual debate concerning which of the modelling approaches is most applicable to human data, but there now seems to be widespread acceptance that modern techniques are on the right track. Also, the successful use of these models to identify the action of natural selection on genes where independent evidence for such selection exists (e.g. Bamshad et al. 2002. A strong signature of balancing selection in the 5' cis-regulatory region of CCR5. Proceedings of the National Academy of Sciences 99: 10539-10544) has provided empirical validation for this acceptance.
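As a concrete taste of the neutrality tests in the Tajima (1989) reference listed above, here is a minimal Python sketch of the D statistic as it is usually presented: the difference between the mean number of pairwise differences and the number of segregating sites scaled by a harmonic sum, divided by an estimate of its standard deviation. The function names are hypothetical, the helper assumes equal-length aligned sequences with no gaps, and the statistic is undefined when there are no segregating sites.

```python
import math
from itertools import combinations


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences pi."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pi minus the Watterson-style estimate S / a1.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def summarize(seqs):
    """Return (n, S, pi) for a list of equal-length aligned sequences."""
    n = len(seqs)
    # A site is segregating if more than one character appears in its column.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)
    return n, seg_sites, pi


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]   # made-up data for illustration only
    print(tajimas_d(*summarize(toy)))
```

Values of D near zero are what the neutral model predicts; strongly negative or positive values are the "deviations" that the tests then have to separate from the effects of population size changes.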
|
|
warren_bergerson
Member
Member # 262
|
posted 24. March 2003 11:44
Rex,
Quote: So let's get back to modeling. Warren, if you tell me exactly how to constrain the decrements, I'll add the feature to my program.
Sounds like a good idea. Rather than explore the limits or limitations of a single decrement model, it may be useful to explore the information processing potential of multiple decrement and particularly dynamic decrement models.
As a starting point, it may be useful to briefly review some of the concepts of multiple decrement analysis. The starting point, as previously discussed, is a set of elements A1,…An, or alleles, belonging to a class G of elements called a gene. From this set of elements we define or construct a population of individuals, where each individual has the property Ax [or (Ax, Ay)] defined by the allele or alleles present in the individual. The group of individuals at time t=0 is the population P0. Over a one-generation period of time the population P0 transforms to P1.
In modeling the transformation from P0 to P1 we start by randomly selecting an individual from population P0. For simplicity we have assumed the individual has a single allele Ax, but it would be a simple matter to perform the transformation starting with two individuals each with two alleles. For the simplified single individual with a single allele Ax, there are four basic types of transformation or decrement processes which can occur. First, Ax can continue as Ax. Second Ax can transform or mutate to Ay. Third, Ax can duplicate to create multiple individuals with property Ax. Finally, Ax can be terminated or eliminated from the population.
In the type of multiple decrement model being considered here, the first step is to randomly select an individual from the initial population. The various increment and decrement processes are then applied. After applying these processes we could have one or many individuals to add to the next generation, no individual to add to the next generation, or individual(s) with changed alleles. The process of populating the next generation continues until the predefined population size has been achieved.
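The procedure just described can be sketched in Python as follows. This is only one reading of the description, with the four outcomes (continue, mutate, duplicate, terminate) driven by independent probabilities; the function name, the parameter names and the choice of exactly two copies on duplication are assumptions made for illustration.

```python
import random


def build_next_generation(pop, target_size, p_mutate, p_die, p_duplicate,
                          n_alleles=256, rng=random):
    """Fill the next generation by repeatedly drawing and transforming individuals."""
    if p_die >= 1.0:
        raise ValueError("p_die must be below 1, otherwise the loop never fills")
    nxt = []
    while len(nxt) < target_size:
        allele = rng.choice(pop)              # randomly select an individual
        if rng.random() < p_die:              # decrement: terminated, nothing added
            continue
        if rng.random() < p_mutate:           # decrement: Ax transforms to some Ay
            allele = rng.randrange(n_alleles)
        # Increment: duplication adds two copies instead of one (assumption).
        copies = 2 if rng.random() < p_duplicate else 1
        nxt.extend([allele] * copies)
    return nxt[:target_size]                  # trim any overshoot from duplication


if __name__ == "__main__":
    rng = random.Random(4)
    pop = [0] * 500
    for _ in range(100):
        pop = build_next_generation(pop, 500, p_mutate=1 / 4096,
                                    p_die=0.2, p_duplicate=0.1, rng=rng)
    print(sorted(set(pop)))
```

When `p_die` and `p_duplicate` are the same for every allele they only change how many draws are needed to fill the generation; the decrements start to matter once the rates depend on the allele.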
In multiple decrement modeling, there are two general types of reasons for including an increment/decrement process. First, we add decrement processes to reflect known processes. If for example, we know average litter size is 10 or if we know raw point mutation rates and error correction rates, we might incorporate those processes in the model. [Very often, after appropriate sensitivity testing, we will find that many potential decrement processes do not have a material impact on the process being analyzed, and many factors can be combined into a simplified single decrement or increment factor.] The second reason for adding a decrement process to a model would be as a ‘balance item’ in order to fit the model to observed population changes.
Consider, as a simple example, a ‘lethal’ allele decrement process. Assume that the allele Ax of gene G plays a role in the assembly of some chemical compound Cz that is essential to the survival of the cell. If the allele Ay is not compatible with the production of Cz, then the presence of Ay will result in a 100% decrement. Furthermore, this decrement will occur when the presence of Cz is important to survival. If, for example, Cz was essential to a process occurring at age 5, the presence of allele Ay would produce observable, measurable mortality at age 5 (if you prefer, the trait may be first manifest in week five of the development process).
If the set of likely mutations contains 3000 members, and all but one of the 3000 mutations failed to produce Cz, then all but one allele would be fatal. Such a decrement process would explain the fact that the population contained only a single allele Ax. The same type of process could limit the observed distribution of alleles to 2, 3, or 4 forms. At least IMO, this would be a more likely and a more testable explanation for a limited distribution of alleles than bottlenecks or statistical fluctuations. Note that while lethal allele decrements provide reasonable explanations for restrictions or limitations in the distributions of alleles, these decrements also limit the ability of a system to change or evolve.
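A lethal allele decrement of this kind can be sketched as a simple filter on the resampling loop. The Python below is a hypothetical illustration: it combines the 3000 possible alleles mentioned here with the 1 in 4096 mutation rate used earlier in the thread, and the `viable` set stands in for whichever alleles are assumed to produce Cz.

```python
import random


def step_with_lethals(pop, viable, n_alleles=3000, mut_rate=1 / 4096, rng=random):
    """One generation in which any allele outside `viable` suffers a 100% decrement."""
    if not any(allele in viable for allele in pop):
        raise ValueError("population contains no viable allele")
    nxt = []
    while len(nxt) < len(pop):
        allele = rng.choice(pop)
        if rng.random() < mut_rate:
            allele = rng.randrange(n_alleles)
        if allele in viable:        # lethal alleles never reach the next generation
            nxt.append(allele)
    return nxt


if __name__ == "__main__":
    rng = random.Random(6)
    pop = [0] * 1000
    for _ in range(200):
        pop = step_with_lethals(pop, viable={0}, rng=rng)
    print(set(pop))   # by construction only viable alleles can appear
```

By construction the surviving population can only ever contain viable alleles, so the run by itself tests nothing; the informative part is how the `viable` set would be constrained by data.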
In looking at the role of lethal allele decrements, there are three important questions to be addressed. First, what portion of genes are associated with the production of essential chemicals? Second, what portion of the likely alleles associated with a gene are compatible with the production of the essential chemical? Finally, when during the lifetime of the organism are these essential chemicals first manifest (and thus when does the absence of the essential chemical result in death/decrement)?
Note that in studying the impact of lethal alleles in complex organisms, we do not need to limit ourselves to looking at generation to generation results. The lifetime of a multi-cellular organism can involve trillions of cells and 100’s or 1000’s of cell generations. This potentially provides a wide range of opportunities to study lethal allele decrements.
The study of lethal allele decrements suggests two possible types of decrement or selection mechanisms. First, traditional theory suggests that decrement is the result of natural selection, which means death or inability to reproduce occurring at some point in time after the gene is first manifest. If the ratio of lethal genes is fairly high, this suggests a lot of death and dying. If a fair portion of the lethal alleles are first manifest after the beginning of the development process, this suggests relatively high cell and embryo mortality rates.
The second possibility is that most fatal alleles are eliminated by error correction processes. This would be selection or decrement by a process other than natural selection. This, if it occurs, would seem to be a far more efficient process than high mortality rates.
Lethal allele decrements, and the selective error correction alternative, demonstrate how a multiple decrement model can be used to model the processes which prevent or stop change. Multiple dynamic decrement models can also be used to simulate very high rates of genetic change. But that is a subject for another day.
The discussion here has focused on the ability of multiple decrement models to fit or simulate genetic change processes. The more general use of these models is in reverse engineering biological information processing. Biological information processing can be modeled in terms of ‘database elements’ and ‘operations performed on database elements’. It appears that most major forms of biological information processing (genetic change, neuronal information processing, human decision making) can be reduced to database elements with a logical structure similar to populations of genes. It also appears that the same very general types of operations are performed on the different types of biological database elements. Multiple decrement models are thus one type of model that can be used to model and simulate all types of biological information processing. Multiple decrement models and techniques can also be used to evaluate the validity of proposed models.
Mesk,
From my personal experience and observation of others, I doubt if you will gain anything but a superficial knowledge of complex modeling from reading articles. A working knowledge of complex modeling comes from lots of hands on experience combined with a strong mathematical training of the appropriate type and a lot of natural interest and ability. That is my opinion based on a lot of years of working with complex simulation models, and working with people who work with complex models.
Population genetics, like all academic areas, generates lots of published articles. Included in the published materials are numerous models purporting to simulate or explain various aspects of genetic change. Are there models or modeling approaches which claim to simulate and predict more than limited features of genetic change processes? Does anyone even make the claim that such models are currently feasible? Have population geneticists abandoned the ‘it's too complex to model’ position?
|
|
Rex Kerr
Member
Member # 632
|
posted 24. March 2003 15:32
I'll let Mesk address your points to him.
I think I know what I need to do to the program to make it compatible with your description, but I don't see the experimental constraints yet. For instance, you say, "If the allele Ay is not compatible with the production of Cz, then the presence of Ay will result in a 100% decrement." I fully agree, but we have not tested the alleles of most genes. As the earlier limited-population modeling very clearly shows, you would only expect a handful of possible single-mutation alleles to exist at all, so you gain little information from which ones happen to be absent. How do we constrain the lethality of alleles Ay in a realistic way?
Alternatively, can you come up with a hypothesis to test with the model? Obviously if I set some alleles to have a 100% decrement rate, we'll never see them. I don't think that even needs to be tested...so what is the hypothesis? [ 24. March 2003, 15:33: Message edited by: Rex Kerr ]
|
|
Frances
Member
Member # 169
|
posted 24. March 2003 23:51
Warren: From my personal experience and observation of others, I doubt if you will gain anything but a superficial knowledge of complex modeling from reading articles. A working knowledge of complex modeling comes from lots of hands on experience combined with a strong mathematical training of the appropriate type and a lot of natural interest and ability. That is my opinion based on a lot of years of working with complex simulation models, and working with people who work with complex models.
You seem to forget that the issue is not just acquiring superficial knowledge of complex modeling, but rather that these articles may help one understand what science has been able to achieve and what some of the problems are. When one objects to Darwinian models, it may be helpful to be familiar with the actual arguments and the state of the art of such modeling. I am glad to hear that you are familiar with complex simulation models, and thus I am looking forward to your presentation of the decrement/increment model. So far, though, it does not seem self-evident that such models will even be able to do as well as the models shown by others, mainly because of the limited model for beneficial/detrimental genes. When may we expect the results?
|
|
warren_bergerson
Member
Member # 262
|
posted 25. March 2003 09:46
Rex,
Quote: I think I know what I need to do to the program to make it compatible with your description, but I don't see the experimental constraints yet. For instance, you say, "If the allele Ay is not compatible with the production of Cz, then the presence of Ay will be result in a 100% decrement." I fully agree, but we have not tested the alleles of most genes. As the earlier limited-population modeling very clearly shows, you would only expect a handful of possible single-mutation alleles to exist at all, so you gain little information from which ones happen to be absent. How do we constrain the lethality of alleles Ay in a realistic way?
In the discussion here we are addressing two issues. The first issue involves the shape of the fitness landscape. The second involves the nature of the selection mechanisms. For the moment, let’s concentrate on the first issue.
The issue being discussed is two competing assumptions or hypotheses describing the shape of the fitness landscape applicable to the members of the set of common mutations. In your model, this set contains 256 different members. Your model uses the assumption or hypothesis that the same decrement rate is applicable to all 256 members of the set of common alleles. I proposed as an alternative assumption or hypothesis that the decrement rate applicable to most common mutations is 100% and the decrement rate applicable to a limited number of commonly observed alleles is 0%.
I think we are in agreement that 1) your model can be used to simulate and test your hypothesis, and 2) your model could be modified to simulate and test the hypothesis I propose. I believe we also agree that there exists a known phenomenon, what I called ‘lethal alleles’, that can explain the decrement rates included in my hypothesis.
For the sake of discussion here, let’s assume your hypothesis assigns a 0% decrement rate to all 256 common mutations and my hypothesis assumes a 0% decrement rate for 6 specified alleles and a 100% decrement rate for the other 250 common mutations. There are obviously many possible intermediate hypotheses, but for the moment we simply want to determine which of the two hypotheses provides a better fit to observed data. Let us also assume that for the gene being considered, there are initially three known alleles.
I assume you are aware that the models representing these two hypotheses will produce different predictions with respect to the numbers and distributions of ‘rare alleles’ in the population. The math is a bit complicated to do without actually running some simulations, but your model should predict that rare alleles (alleles other than the 3 known alleles) are 50-100 times more common than would be predicted by my assumptions. Again the math is a bit complicated, but using known population changes and assumptions regarding when different sub-populations diverged, it should be possible to develop practical tests to differentiate between the two hypotheses presented.
Before we begin testing the two hypotheses, it would be useful to run some simulations to see how fast the results predicted by the two models will diverge.
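To make the comparison concrete, here is a rough sketch of how the two hypotheses could be run side by side. It is not Rex's program. It only assumes the features described earlier in the thread (a 1 in 4096 mutation rate, 256 possible alleles, each individual in the next generation drawing its allele at random from the prior generation), plus a constant population size and arbitrary allele labels 0-255 with 0, 1, 2 standing in for the three known alleles. The names simulate and rare_summary are made up for illustration.

```python
import random

N_ALLELES = 256            # size of the set of common mutations
MUTATION_RATE = 1 / 4096   # chance that an offspring's allele mutates
KNOWN = {0, 1, 2}          # the three initially known alleles (labels are arbitrary)


def simulate(viable, pop_size=500, generations=200, seed=0):
    """Resample a fixed-size population each generation.

    `viable` is the set of alleles with a 0% decrement rate; every other
    allele is treated as a 100% decrement (the offspring is discarded and
    another one is drawn in its place).
    """
    assert KNOWN <= set(viable), "the known alleles must be viable"
    rng = random.Random(seed)
    known = sorted(KNOWN)
    population = [rng.choice(known) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            allele = rng.choice(population)        # inherit from prior generation
            if rng.random() < MUTATION_RATE:
                allele = rng.randrange(N_ALLELES)  # mutate to any of 256 alleles
            if allele in viable:                   # lethal alleles never appear
                offspring.append(allele)
        population = offspring
    return population


def rare_summary(population):
    """Return (fraction of individuals carrying a rare allele, distinct rare alleles)."""
    rare = [a for a in population if a not in KNOWN]
    return len(rare) / len(population), len(set(rare))


if __name__ == "__main__":
    hypotheses = {
        "0% decrement for all 256": set(range(N_ALLELES)),
        "6 viable, 250 lethal": set(range(6)),
    }
    for name, viable in hypotheses.items():
        frac, distinct = rare_summary(simulate(viable))
        print(f"{name}: rare fraction {frac:.4f}, distinct rare alleles {distinct}")
```

A single run says very little, since drift dominates at this population size. Any real comparison would need many seeds and a range of population sizes and generation counts before the rare-allele fractions under the two assumptions could be compared.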
Frances- Your comments raise two interesting issues.
First, complex modeling, simulation, and testing is a resource intense activity. Business applications involving complex modeling often involve large numbers of people and large amounts of money. Data gathering, model construction, and interpreting results are specialized functions often performed by separate groups of specialists.
The discussion here has been addressing the technical feasibility of applying complex multiple decrement models to the analysis of genetic change processes. Once the feasibility of building such simulation models has been addressed, (and I think Rex has shown that it is technically feasible to build such models) then we begin to address the issue of whether these models can be used to define and test competing hypotheses.
If it is technically possible to build multiple decrement genetic change models, and if such models can be used to test competing genetic change theories, then we come to the issue of resources. Even if it is technically possible to test neo-Darwinian concepts with multiple decrement models, would the resources be available to perform such testing? For the discussion here, we can refer to existing studies, but I doubt if either Rex or I have either the ability or the resources to measure the alleles in 1000 elephant seals or 100,000 humans.
The second issue you raise is the old but unresolved issue of what knowledge already exists in areas like evolutionary biology or population genetics. Directly relating to the specific subject being discussed here I offer two observations. First, because complex multi-decrement modeling is resource intense, if the approach has been or is being used, then it should leave a very large and easily detectable footprint. Second, the discussion here is open to anyone wishing to present specific examples of existing knowledge.
|
|
brauer
Member
Member # 398
|
posted 25. March 2003 11:13
quote:
First, because complex multi-decrement modeling is resource intense, if the approach has been or is being used, then it should leave a very large and easily detectable footprint.
A minor point: this is hard to ascertain as long as one is averse to looking at (and understanding) the published literature.
Also: aside from its idiosyncratic terminology and simplistic construction, Warren's model does not appear to add anything novel to current simulation-based models of population genetics.
Finally, Warren has addressed neither of Rex's questions in his last post. To reiterate:
- how does one constrain the values of the model's parameters? and
- what is the specific hypothesis that the "multiple decrement model" will test?
|
|
|