Topic: Practical experiments for testing the concept of intelligent design
Argon
Member
Member # 276
posted 19. April 2004 22:41
Krauze writes: "The assumption seems to be that research-generating abilities are a sign of a good theory, and if ID doesn't have them, then to the rubbish bin with that."
I think we are pretty much in agreement on this topic.
I do think that research-generating abilities are generally a sign of a good theory. But I do not accept that they are a sufficient sign of a good theory. I think it is the 'tightness of linkage' between a theory and a research program that has more weight. Now, this is not a problem unique to 'ID'. There are many instances where the theories of Darwinian evolution cannot be linked to historical phenomena (most historical events, probably, because the relevant historical context is lost). The best one can say in such cases is that the outcome and whatever remaining data we have relating to event 'X' are compatible with theory 'Y', but this neither proves nor disproves a linkage.
Another way of looking at the question is: "What research programs couldn't an ID paradigm generate?" Or "What is incompatible with an ID paradigm?" This *is* difficult because ID is actually a superset of mechanisms that *includes* natural mechanisms. Given that these potential natural mechanisms are already pretty 'floppy' and foggily delimited to begin with, ID is an even floppier proposition.
Personally, I do not disparage ID on the basis of the ability to generate research ideas. But I would hope to see a much tighter linkage between theory and phenomena. In a post long ago, I suggested that investigating more recently 'evolved' systems which are more likely to retain information about their historical contexts would probably be the most fruitful area to pursue.
Terence A-H Tan writes: "Food for thought: Perhaps in the face of uncertainty, we have to go by which theory best explains the prediction. Perhaps in the realm of nature, some entities are better explained by ID concepts, other entities better explained by evolutionary concepts,..."
We already explain many events in terms of ID (anthropology, archaeology, etc.). The question is how we determine what is 'best'. There are mountains of treatises in this area (not that I would know much about the details - gads, it's dry stuff), but there is no single, reliable metric. This does not mean that the choice is entirely subjective, but the 'grey area' of uncertainty is pretty big. I lean toward favoring possibilities with the simplest, most proximate causes we can find involving mechanisms that we know were available at the time. Sometimes that points to people, other times, not. Still, I cannot say that I have seen many instances in biology where I think there exists enough background information to even consider ID as the 'best' explanation. In most cases, 'no explanation' is the best explanation (abiogenesis and the archaebacterial/eubacterial/eukaryotic split fit well in this category).
Krauze
Member
Member # 1119
posted 20. April 2004 19:30
Hi Argon,
"Another way of looking at the question is: "What research programs couldn't an ID paradigm generate?". Or "What is incompatible with an ID paradigm"?"
Well, to me the question of "bad design" is hard to reconcile with ID. That is, if I have a system which exhibits some pretty strong signs of being designed, but where a fundamental part of it (i.e. not something that natural processes could later have added) could have been constructed in a way that improved the entire organism, my view of that structure as being designed would suffer quite a blow. And with that, my own ability to infer design would be compromised.
To take an example: I consider DNA replication to be designed. In that regard, I'm troubled by the fact that DNA polymerase only moves in the 3'-5' direction, meaning that on one strand, it has to continually jump back as the DNA is being unraveled, requiring several other proteins to attach the strings in the correct order. Under a naturalistic paradigm, this doesn't seem to cause any concern. From a popular book about cell biology: quote: "A human being trying to design a solution would probably suggest simply making a second form of DNA polymerase that could read DNA in the opposite direction. After all, there is nothing magic about the directionality. In principle, some alteration of the DNA polymerase should do the trick. Nature's solution to the problem, however, is worthy of an "I Love Lucy" television comedy." Rensberger B., Life Itself: Exploring the Realm of the Living Cell (Oxford: 1996), p. 129
From an ID perspective, I would expect to find either a functional reason for this arrangement, or a reason why the proposed solution is not feasible after all.
PS. In my first post in this thread, I pondered the structure of RNA: "Could there be some good reason why it employs uracil instead of thymine?" Earlier today, while browsing an old thread, I found this article giving an affirmative answer, courtesy of Mike Gene: "Researchers unlock secret of RNA's versatility".
David L. Hagen
Member
Member # 323
posted 18. September 2004 15:31
Re: “Apparent suboptimality” directed search for design goals, objectives & constraints
Krauze,
Thanks for expressing those concerns: “Well, to me the question of ‘bad design’ is hard to reconcile with ID. That is, if I have a system which exhibits some pretty strong signs of being designed, but where a fundamental part of it (i.e., not something that natural processes could later have added) could have been constructed in a way that improved the entire organism, my view of that structure as being designed would suffer quite a blow. And with that, my own ability to infer design would be compromised. . . . From an ID perspective, I would expect to find either a functional reason for this arrangement, or a reason why the proposed solution is not feasible after all.”
May I suggest further parsing your concerns and separating out further possible issues. E.g.,
1 “The problem of evil” or inferred moral character:
1.1 A Designer apparently establishing goals and objectives that appear to be evil
1.2 A Designer apparently not willing to overcome evil
2 The “problem of misunderstood Goodness”
Converse to the “problem of evil,” we may also need to consider that there is a “problem of misunderstood Goodness.” I.e., we may not understand:
2.1 Designs or constraints necessary to constrain evil.
2.2 Consequences imposed because of disobedience.
2.3 Moral tests of character.
3 “The problem of underachievement” or inferred inability:
In addition to the moral issues, there is the potential “problem of underachievement” or inferring inability to the Designer. E.g., a Designer who is apparently:
3.1 Incapable of designing to a perceived optimal design
3.2 Unwilling to spend the effort to design to a perceived optimal design
3.3 Unable to implement a perceived optimal design
I will interpret your comment on “bad design” as referring to “underachievement” rather than the first moral problems of “evil” or “misunderstood Goodness.” I will presume that the “problem of evil” and “problem of misunderstood Goodness” will be addressed elsewhere, but they must be clearly understood and potentially addressed in the overall context when such questions arise.
For this thread, I will presume we will address identifying design issues due to the “Problem of underachievement” (3). This requires clearly parsing questions into their components and being ready to distinguish accusations over the “Problem of evil” (1) or “Misunderstood Goodness” (2).
4 “The problem of observation” or the “problem of the observer’s inability”:
I believe concerns over the “Problem of underachievement” may also indicate problems of the knowledge or ability of the observer. I.e.,
4.1 Insufficient knowledge of the original design goals, objectives and constraints.
4.1.1 Lack of communication with the Intelligent Designer.
4.1.2 Limited ability to recover the original design goals and objectives from independent “objective” examination or “reengineering.”
4.1.3 Insufficient knowledge of the constraints on means of establishing natural laws and methods to sustain such solutions.
4.2 Inability to comprehend the original design objectives and constraints.
4.2.1 Design for conditions different from the present.
4.2.2 Design systems capable of accommodating changing environments
4.3 Inability to compute the desired optimal designs.
I think problems in this area primarily arise where there is a complex design problem with numerous weighted objectives and constraints. If we only identify one or a few of these objectives and constraints, then we may develop designs that satisfy this smaller set of objectives and constraints. If we then evaluate a design satisfying the full criteria set, it may appear “sub-optimal” when judged only by the identified subset of criteria, compared with other designs that better satisfy that subset.
In particular, 4.2.2 suggests that the original design may have specifically provided for accommodating and adapting to changing environments while preserving the primary design envelope. E.g., explicitly enabling a wide range of “microevolution” while preserving the broader design envelope (e.g., a species or appropriate classification) thus preventing “macroevolution.”
5 “Apparent suboptimality” directed search for design goals, objectives & constraints
I believe we can turn such perceptions around to establish a positive design methodology of:
Presuming the option of an Intelligent Designer; If an existing system appears to be suboptimally designed, Then that appearance indicates research areas to:
5.1 Identify higher design goal(s)
5.2 Identify further design objectives
5.3 Identify further design constraints.
5.4 Identify computational limits to design and optimization
6 Establish an Intelligent Design Methodology
This raises the need to establish an overall design methodology, identifying the goals, objectives, design principles and constraints. In particular, may I propose a design goal and principles:
6.1 Design Goal
6.1.1 Design and form human beings capable of intelligent design
6.1.2 Design and form a universe in which they can design and build.
6.2 Design Principles
6.2.1 Preserve the complex specified information or original design
6.2.2 Enable a high degree of complexity
6.2.3 Achieve this with a high degree of simplicity
6.2.4 Reduce or minimize the energy and material flows required
6.2.5 Provide sufficient and preferably optimal flows of energy and materials needed
{I have been thinking through an overall design methodology and will try to set up a separate thread to address this.}
7 DNA Replication Design Research
Thus on the issue of the method of DNA replication, may I encourage you to begin to lay out and apply such a design methodology. In particular, may I encourage you to further explore the implications of 6.2.1 and 6.2.2. From an observer’s perspective of an engineer (not a specialist in biochemistry), may I suggest that there may be design issues to examine that include:
7.1 Proofing the DNA strand copy
7.2 Repairing errors in duplication
7.3 Synchronizing dual strand duplication
7.4 Enabling meiosis including forming the “Holliday Junction”
7.5 Enabling “gene conversion” from one allele to another
7.6 Addressing potentials for strand torsion, kinking, knotting etc.
7.7 Connecting microtubules to efficiently transport materials & assembled components
7.8 Forming motion micromachines to efficiently transport assembled components
7.9 Locating mitochondria to efficiently provide energy flows
8 Design Hypothesis & Testable Design Requirements
Having identified such design issues, we can then put forward a design hypothesis to explain the observed phenomena. Such design methodologies could then be used to identify unusual features with narrow design requirements. The probability of achieving these designs by (random) natural causes can then be evaluated to distinguish ID vs evolution.
9 Predictive theory
This can hopefully further be developed into predictive methods to forecast and search for similar phenomena. These then provide a basis to evaluate the probabilities of achieving those phenomena by natural causes vs by design.
In summary, separate out the issues of design vs morality in “bad design”, formulate a design hypothesis, then develop testable experiments to distinguish ID vs natural causes.
David L. Hagen
Member
Member # 323
posted 04. October 2004 12:11
Krauze said: quote: "I consider DNA replication to be designed. In that regard, I'm troubled by the fact that DNA polymerase only moves in the 3'-5' direction, meaning that on one strand, it has to continually jump back as the DNA is being unraveled, requiring several other proteins to attach the strings in the correct order. Under a naturalistic paradigm, this doesn't seem to cause any concern."
I stated above the Design principle that there is probably a good reason for the observed mechanism.
One explanation for this exclusive polymerase movement direction is to increase the accuracy of DNA replication:
quote: "Only DNA Replication in the 5'-to-3' Direction Allows Efficient Error Correction
The need for accuracy probably explains why DNA replication occurs only in the 5'-to-3' direction. If there were a DNA polymerase that added deoxyribonucleoside triphosphates in the 3'-to-5' direction, the growing 5'-chain end, rather than the incoming mononucleotide, would carry the activating triphosphate. In this case, the mistakes in polymerization could not be simply hydrolyzed away, because the bare 5'-chain end thus created would immediately terminate DNA synthesis (Figure 5-11). It is therefore much easier to correct a mismatched base that has just been added to the 3' end than one that has just been added to the 5' end of a DNA chain. Although the mechanism for DNA replication (see Figure 5-8) seems at first sight much more complex than the incorrect mechanism depicted earlier in Figure 5-7, it is much more accurate because all DNA synthesis occurs in the 5'-to-3' direction."
See: Alberts, Johnson, Lewis, Raff, Roberts & Walter; The Molecular Biology of the Cell 4th ed. 2002 p 242-243.
Bruce
Member
Member # 1284
posted 25. October 2004 14:52
This post touches on contributions by Micah Sparacio, David L. Hagen and Terence Tan.
Bioinformatics - a 'cheap' road to experimental results that generate interest - no test tubes required (initially)...
Hi All,
It goes without saying that there is enormous interest in the scientific community and the wider community about the sequencing of the human genome. Interest is also exploding in the area of cataloguing and analysing functional parts of the genomes of humans and other species (e.g. insects such as Drosophila, bacteria etc.) pursuant to identifying the genetic characteristics of DNA codes for diseases, species differentiation, etc.
As someone with several years' experience in the field of software engineering, programming and database development, I can see the potential for enormous confusion in terms of conflicting standards and methods for computational biology, DNA sequence analysis etc. With teams of bioinformaticists and computational biochemists working hand over fist to investigate gene expression processes and functional modules in the genome, it has in fact been noted* that there is a lack of consensus at the methodological and laboratory method level as to what is a 'competent' computational/in silico experimental 'practice', and that results can be difficult to reproduce.
The positive side to this at the moment may be that scientific endeavor is unhindered by methodological strictures that may stifle creativity and inquiry. However, in the future it is possible that a centralized attempt will be made to universally standardize methodologies and in silico and other laboratory procedures for the identification and analysis of functional and special codes in genomes (if this is not happening already). The question is - what will be the underlying rules and assumptions driving the formulation of the standards? Why not get the ball rolling for a methodology incorporating observation of ID principles such as CSI? It has been noted that methodology/systematising is a hard ball to get rolling. However, the lucid observation put forward by a number of contributors is that a cheap experiment is required to get some meaningful results to drive interest. Bioinformatics seems perfect for this.
I (in concert with other contributors) am suggesting a ‘suck it and see’ approach here. My vocation is not in genetics and my mathematics is not a scratch on that of most other contributors to these forums. Frankly, I have developed something of a headache trying to read through all of the contributions about why an experiment based on ID theory may or may not work from a philosophical basis or an epistemological basis - or because it is a misplaced paradigm or whatever, mostly because I do not understand the arguments. In any case, I agree with member 234 (Tan) and Micah Sparacio (and others), that it would be great to actually give an ID/CSI based experiment a run, and see if the results are practically useful, and if they compare favorably with existing approaches, regardless of what the theoretical premises of the other approaches are. (Apologies to those who have been working hard at the Mesa prog. I have not got the facilities nor had the time to look into it properly.) A H Tan suggested looking for microRNAs in blood plasma. A great idea I think. But why go down the in-vitro or in-vivo path when hundreds of existing researchers are producing what seem to be meaningful results in silico? I mean, why bother with all that clinky laboratory hardware and equipment when you can just analyse the DNA data that someone has already kindly assayed and captured etc.? (a very ‘software’ thing to say)
The RUBBER to the ROAD part (of this post): In recent papers (see the News and Features archive) researchers have used various algorithms (e.g. Ahab and Stubb) to identify and analyse cis-Regulatory modules. As I remember it, Ahab (Rajewsky et al.) refers to and programmatically relies on the 2nd law of thermodynamics and evolutionary conservation constraints, as do some of the other algorithms. Other research teams lauded Ahab because of its thermodynamics based method of comparing binding energies, and for avoiding reliance on transcription factor weight matrices#. Ergo, clearly the nuts and bolts of current computational biology do rely directly on programmatic implementation and application of existing well known theories, and get meaningful results. To try out ID for computational cis-Regulatory module analysis, it seems logical to build an algorithm that is based on searching for CSI in the genome, and testing to see if the ‘hits’ on the gene sequence do reveal functional modules and special sites (I'll leave the details to someone who can do math.) Would the results be more accurate than those attained with existing algorithms? I’d like to know.
If productive work could be done by intelligent design theorists in the field of bioinformatics, with a cohesive and well appointed standard methodology that incorporates the design hypothesis, and real productive results were produced (along the lines of, say, finding more cis-Regulatory modules or microRNA binding codes faster and with greater accuracy than competing methodologies and algorithms), this may be an effective way of applying, and proving the practical veracity of, the informational tenets of the design hypothesis quickly and relatively cheaply. Basically (with some obvious oversimplification):
1. Develop design hypothesis/CSI informational rules for analysing gene data.
2. Build a virtual lab methodology and algorithms around these rules
3. Get a large amount of clean and reliable gene data
4. Analyse it with the ID-CSI methodology and rules (algorithms etc.) - look for cis-Regulatory modules, investigate microRNA binding sites etc. etc.
5. Check the results with independent methods etc. (which should be easier to warrant if good results are obtained.)
Example Objective: Find and identify the function of more cis-Regulatory modules (or microRNAs or whatever) faster and more accurately, producing a tangible benefit in genetic medicine or the like.
I’m sure there are a million both ‘good’ and extremely complicated reasons why NOT to do this. This is not a good rationale for not doing it. Let's see if it flies or dies and then criticize it afterwards. It seems to me that the spirit of scientific experiment should involve a healthy disdain for analysis paralysis.
Question: Would an algorithm based on CSI be much more difficult to implement programmatically than one based on random number generators and binding site energy formulae etc.?
*(see 'Getting the noise out of Gene Arrays' by Eliot Marshall, Science, Vol 306, Issue 5696, 630-631, 22 October 2004) Also note the number of different approaches to computational analysis of coding modules (cis-Regulatory modules) in the ISCID News and Features archive of late (e.g. Ahab algorithm, STUBB algorithm). Some employ interspecies evo. correlation, others don't, and so on...
# View Stub algorithm developers paper...
RBH
Member
Member # 380
posted 25. October 2004 15:53
B. Long asked quote: Question: Would an algorithm based on CSI be much more difficult to implement programmatically than one based on random number generators and binding site energy formulae etc.?
Yup, it would be a whole lot more difficult. There's no algorithmic way to mechanize the identification of the "specification" part of "complex specified information" that I have seen. Without the "S" of CSI, one merely has (im)probability calculated as though an object were a discrete combinatorial object.
RBH
Scott
Member
Member # 1222
posted 25. October 2004 16:59
quote: There's no algorithmic way to mechanize the identification of the "specification" part of "complex specified information" that I have seen.
I think all this means is that there is no (or perhaps I should say can be no) global CSI identification algorithm.
So we would be faced with developing specific algorithms and attempting to generalize cases of CSI to develop more encompassing algorithms. Could be an interesting area for ID research.
Bruce
Member
Member # 1284
posted 26. October 2004 07:19
(Changed moniker to be more friendly...)
Sorry for the delay coming back. I got distracted.
quote: quote: There's no algorithmic way to mechanize the identification of the "specification" part of "complex specified information" that I have seen.
...there is no (or perhaps I should say can be no) global CSI identification algorithm...
So we would be faced with developing specific algorithms and attempting to generalize cases of CSI to develop more encompassing algorithms. Could be an interesting area for ID research.
In that case, I guess that this could be an important undertaking. I know from the limited bit of mathematical programming I have done (Mandelbrots and financial equations etc.) that it's not necessarily a very long trip from a known formula to an iterative algorithm that can be implemented in code. Then it's a case of pointing it at some data, tuning parameters and testing.
Questions (Sorry - I'm catching up a bit):
1. Has anyone made an effort to implement a CSI search algorithmically?
2. Is there a formula in Bill Dembski's latest paper (heavy going for me) that is suitable?
3. Can someone briefly explain the objective of Mesa? Is that related to this problem?
From what I do understand of the specified part of CSI, it may be possible to develop application specific 'specification detection/search' in code by using dictionaries. For example, you could analyse a page of letters by using an English dictionary to get a hit rate for specified words (according to English language rules). Perhaps what is already known (the rules) about what represents something functional, or potentially functional in a genome sequence (e.g. a cis-Regulatory module) could be used to develop such a dictionary for CSI searching in DNA sequences.
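To make the English-dictionary analogy concrete, here is a minimal Python sketch of a "hit rate" calculation. Everything in it is an assumption for illustration: the name `dictionary_hit_rate`, the five-word `TOY_DICTIONARY`, and the choice to treat any run of letters as a word. It only shows what scoring a text against a dictionary could look like, not how specification would actually be decided.

```python
import re

# Hypothetical stand-in for a real English dictionary (illustration only).
TOY_DICTIONARY = {"the", "cat", "sat", "on", "mat"}


def dictionary_hit_rate(text, dictionary):
    """Return the fraction of letter-only tokens in `text` found in `dictionary`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0  # no words at all, so no hits
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)


if __name__ == "__main__":
    print(dictionary_hit_rate("The cat sat on the mat", TOY_DICTIONARY))
    print(dictionary_hit_rate("xq zv the", TOY_DICTIONARY))
```

A page of real prose would score near 1.0 against a full dictionary and a page of random letters near 0.0; the open question in the thread is what the equivalent "dictionary" for a genome would be.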
Another example : Maybe knowledge about the chirality, shape, binding affinities, constituent molecules etc of microRNAs, could be used to build a dictionary/rules to give a probability weighting to what in the DNA sequence is likely to be specified. The algorithm could search first for complex information, and then (inner loop?) test for specified genomic information according to the rules in the dictionary.
Much later, a bit of AI could be added to strengthen the algorithm's 'genome segment specification assessment'.
I'm rather getting ahead of myself here, but some pseudocode could look like:
##start pseudocode
While not 3PrimeTelomereReached do (
    RipDNASequenceSegment(); #overlapping/shift would be required in practice
    if SegmentIsComplex() then (
        ComplexityScore = CalculateSegmentComplexityScore()
        if ComplexityScoreSignificant() then (
            #Inner Specification Assessment Loop
            StoreSegmentForLaterSpecificationAssessment() OR PerformSegmentSpecificationAssessment()
        )
    )
    else (ShifttoNextSegment)
)

# Procedure which uses the dictionary and domain rules for determining specified nature of information - the genome segment in this case. This gets called in the inner loop above
PerformSegmentSpecificationAssessment() (
    if CompareSegmenttocisRegulatoryRules() then ( ScoreForcisRegulatoryModule() )
    if CompareSegmenttomicroRNARules() then ( ScoreFormicroRNAModule() )
    # and so on for different specific searches/applications
)

#So what would ScoreForcisRegulatoryModule() look like exactly?

ScoreForcisRegulatoryModule() (
    while not endOfDictionary (
        # Compare the current segment to everything that we know has meaning - is specified for a task
        if KnownFunctionalPatternComparison() (
            if ScoreOtherSpecificationAssessmentRules()>50 then (
                RegisterCSIcisRegulatorySearchHIT()
            )
        )
    )
)
##end pseudocode
Sorry my pseudocode is a bit of a 'mutant', but it has a 'specific purpose'
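For anyone who wants something runnable, the two loops in the pseudocode can be sketched in Python roughly as follows. This is only a sketch under loud assumptions: the "complexity" test here is Shannon entropy of base composition, which is a stand-in chosen for illustration and is not Dembski's definition; `MOTIF_DICTIONARY`, its two entries, the labels, and the `window`, `step` and `min_entropy` defaults are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical "dictionary" of known functional patterns (illustration only).
MOTIF_DICTIONARY = {"TATAAA": "TATA-box-like", "CCAAT": "CAAT-box-like"}


def shannon_entropy_bits(segment):
    """Per-base Shannon entropy of the segment's base composition (0 to 2 bits)."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan(sequence, window=20, step=5, min_entropy=1.5):
    """Outer loop slides a window; inner loop checks each dictionary motif."""
    hits = set()
    for start in range(0, len(sequence) - window + 1, step):
        segment = sequence[start:start + window]
        if shannon_entropy_bits(segment) < min_entropy:
            continue  # low-complexity window: shift to the next segment
        for motif, label in MOTIF_DICTIONARY.items():
            offset = segment.find(motif)  # first occurrence in this window only
            if offset != -1:
                hits.add((start + offset, label))
    return sorted(hits)


if __name__ == "__main__":
    print(scan("GCGTACGTTATAAAGCCGTA"))
    print(scan("A" * 40))
```

The windows overlap (step smaller than window), as the comment in the pseudocode says they would need to in practice; duplicates from overlapping windows are collapsed by collecting hits in a set.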
RBH
Member
Member # 380
posted 27. October 2004 14:35
Bruce summarized quote: The algorithm could search first for complex information, and then (inner loop?) test for specified genomic information according to the rules in the dictionary.
Bear in mind that for Dembski, "complex" just means "improbable", period. Any 500 bp sequence is exactly as "complex" as any other 500 bp sequence. Any bridge hand is exactly as "complex" as any other bridge hand. It's trivially easy to invent a set of variables ("chirality, shape, binding affinities, constituent molecules etc of microRNAs") that would yield improbabilities that would be called "complex" on Dembski's definition.
There are two difficulties that arise with the proposed procedure. One difficulty with Dembski's probability calculations (that treat a sequence as a discrete combinatorial object) is the degeneracy of the genetic code. Multiple sequences can code for the same protein. The numerator of the probability ("complexity") calculation increases with degeneracy and hence "complexity" decreases. So, in order to even calculate Dembskian "complexity" one must take into account the degeneracy of the code.
So the real work must be done by specification. Bruce suggested quote: Perhaps what is already known (the rules) about what represents something functional, or potentially functional in a genome sequence (e.g. a cis-Regulatory module) could be used to develop such a dictionary for CSI searching in DNA sequences.
What one would find with such a procedure is functionally similar sequences -- gene families. But we already know a naturalistic mechanism for producing gene families: gene duplication. The probability of a specification hit (a sequence pattern that matches a known functional sequence) goes to (near) 1.0 when we know a naturalistic mechanism for producing such similarities, so its "complexity" (improbability) vanishes, and finding a new sequence that is structurally similar to a known sequence does not detect complex specified information. The very pattern-matching procedure Bruce proposes precludes finding complex specified information.
RBH
Scott
Member
Member # 1222
posted 27. October 2004 18:27
One of RBH's objections seems to assume that all regions of DNA are coding regions. Why should degeneracy be an issue if we are not interested in whether or not the sequence is an amino acid coding sequence? Certainly that may be a second-level concern but hardly a primary concern or a show-stopper.
Another objection seems to be that all sequences are equally improbable and thus equally complex. That may be true, but again, is it a show-stopper?
What is the minimum sequence length needed to meet Dembski's UPB? Given a sequence of that length, would we be able to distinguish that sequence from a random sequence of the same length? Would that be the first test for CSI?
It seems to me that what we would be doing is developing pattern recognition software in which the patterns are not (all) specified in advance but in which the pattern must meet some criteria.
Let's say we had a sequence in which all of the nucleotides were the same. How long would that sequence need to be before we could infer that it was not just a random distribution of nucleotides? Would the length of the entire genome be relevant?
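One rough way to put numbers on that question, assuming a purely uniform random model (each base independent, each of the four equally likely), is to count how many all-one-base windows of a given length such a sequence would be expected to contain. The sketch below does that; the function names and the 0.01 cutoff are assumptions made for illustration, and real genomes are known not to follow this model.

```python
def expected_homopolymer_windows(genome_length, run_length, alphabet_size=4):
    """Expected number of length-`run_length` windows made of one repeated base,
    in a uniform random sequence (by linearity of expectation)."""
    if run_length > genome_length:
        return 0.0
    positions = genome_length - run_length + 1
    return positions * alphabet_size * (1 / alphabet_size) ** run_length


def shortest_surprising_run(genome_length, threshold=0.01, alphabet_size=4):
    """Smallest run length whose expected window count falls below `threshold`."""
    run_length = 1
    while expected_homopolymer_windows(genome_length, run_length, alphabet_size) >= threshold:
        run_length += 1
    return run_length


if __name__ == "__main__":
    for n in (1_000, 3_000_000_000):
        print(n, shortest_surprising_run(n))
```

Under these assumptions the answer to "would the length of the entire genome be relevant?" is yes: with the 0.01 cutoff, a 1,000-base sequence gives a run length of 10, while a 3-billion-base sequence gives 21.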
What would be quite interesting and maybe even of greater value to ID would be if we could make some predictions about what we think we should or will find.
RBH
Member
Member # 380
posted 28. October 2004 01:07
Scott wrote quote: One of RBH's objections seems to assume that all regions of DNA are coding regions. Why should degeneracy be an issue if we are not interested in whether or not the sequence is an amino acid coding sequence? Certainly that may be a second-level concern but hardly a primary concern or a show-stopper.
No, that's not my concern: it was implicit in Bruce's description: quote: Perhaps what is already known (the rules) about what represents something functional, or potentially functional in a genome sequence (e.g. a cis-Regulatory module) could be used to develop such a dictionary for CSI searching in DNA sequences.
Scott further wrote quote: What is the minimum sequence length needed to meet Dembski's UPB? Given a sequence of that length, would we be able to distinguish that sequence from a random sequence of the same length? Would that be the first test for CSI?
With a 4-base code, a sequence just 249 bases long is equivalent to about 10^150 combinations, and 1 in 10^150 is one version of the UPB. So any sequence of 250 bases is less probable (on the assumptions of random assembly and a uniform PDF) than the UPB. As I understand it, the task set for Bruce's algorithm design is not to distinguish functional sequences from non-coding sequences. As far as I can tell, he is suggesting that we use criteria we know to be associated with functional coding sequences to find new functional coding sequences.
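The arithmetic behind the 249/250 figures can be checked with exact integer arithmetic. The short Python sketch below assumes the same model as the post (four equally likely bases, so 4^L equally likely sequences of length L) and takes 10^150 as the bound; the function name `min_length_for_bound` is made up for this example.

```python
import math


def min_length_for_bound(exponent=150, alphabet_size=4):
    """Smallest L such that alphabet_size**L >= 10**exponent (exact integers)."""
    length = 0
    while alphabet_size ** length < 10 ** exponent:
        length += 1
    return length


if __name__ == "__main__":
    print(min_length_for_bound())   # first length whose sequence count reaches 10^150
    print(math.log10(4 ** 249))     # order of magnitude of the 249-base count
```

Since log10(4) is about 0.602, 4^249 is roughly 10^149.9 (the "about 10^150" in the post), and 250 is the first length at which the number of possible sequences actually exceeds 10^150.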
Recall also that "degeneracy" means that there are 'synonyms' for coding sequences. Different sequences can code for the same protein. As a consequence, the assumption that a given function (the specification) is generated by a single sequence is in general false, and therefore the calculation of the probability associated with the "complex specified information" of the functional specification is problematic. If two sequences can perform a given function, then the probability that the specification will be met is twice what it would be if only one performed it. So in calculating "complexity," the degeneracy must come into play.
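As a small illustration of how degeneracy enters the numerator, the Python sketch below counts how many distinct DNA coding sequences spell the same peptide under the standard genetic code, and turns that into a probability under the uniform random-assembly model used above. The codon counts per amino acid are those of the standard code; the function names, the one-letter peptide input, and the decision to ignore stop codons and everything about regulation or folding are simplifying assumptions for this example.

```python
from fractions import Fraction
from math import prod

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODONS_PER_AMINO_ACID = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "H": 2, "K": 2, "F": 2, "Y": 2,
    "M": 1, "W": 1,
}


def synonymous_sequence_count(peptide):
    """Number of distinct coding sequences that translate to `peptide`."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide)


def probability_of_encoding(peptide):
    """Chance that a uniform random DNA string of matching length encodes `peptide`."""
    return Fraction(synonymous_sequence_count(peptide), 4 ** (3 * len(peptide)))


if __name__ == "__main__":
    for peptide in ("MW", "ML", "LSR"):
        print(peptide, synonymous_sequence_count(peptide), probability_of_encoding(peptide))
```

A peptide rich in leucine, serine or arginine has many more synonymous encodings than one built from methionine and tryptophan, so treating each as "one sequence in 4^L" overstates the improbability by the synonym count.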
Scott suggested quote: Let's say we had a sequence in which all of the nucleotides were the same. How long would that sequence need to be before we could infer that it was not just a random distribution of nucleotides?
There are sequences of repeated elements -- the repeated Alu sequences in primates are an example. There are 500,000 to 1,000,000 such sequences in primate genomes, each sequence roughly 300 base pairs long. They therefore exceed the UPB. Do we infer design on that account?
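Taking Scott's question literally (one nucleotide repeated, rather than a repeated element such as Alu), a rough answer can be sketched numerically. The code below assumes independent, equiprobable bases, which real genomes do not satisfy, and the function names and the 0.01 cutoff are illustrative choices rather than anything from Dembski.

```python
from itertools import groupby


def expected_runs(seq_len, run_len):
    """Expected number of windows of run_len identical bases in a uniform random sequence."""
    windows = max(seq_len - run_len + 1, 0)
    # 4 possible bases, each window matching with probability 0.25**run_len
    return windows * 4 * 0.25 ** run_len


def min_surprising_run(seq_len, alpha=0.01):
    """Shortest run length whose expected count under the model drops below alpha."""
    run_len = 1
    while expected_runs(seq_len, run_len) >= alpha:
        run_len += 1
    return run_len


def longest_single_base_run(seq):
    """Return (base, length) of the longest run of one base, ignoring whitespace."""
    cleaned = "".join(seq.split()).upper()
    best_base, best_len = "", 0
    for base, group in groupby(cleaned):
        length = sum(1 for _ in group)
        if length > best_len:
            best_base, best_len = base, length
    return best_base, best_len
```

A run longer than `min_surprising_run(len(sequence))` is unexpected under this model; whether that licenses any inference beyond "not uniform random" is a separate question.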
RBH
|
|
Bruce
Member
Member # 1284
|
posted 28. October 2004 02:50
Firstly, thanks to both of you for your replies. At least I 'feel' like I didn't waste my time, which lessens any potential embarrassment at being behind the curve and so on...
RBH wrote: quote: Bear in mind that for Dembski, "complex" just means "improbable", period. Any 500 bp sequence is exactly as "complex" as any other 500 bp sequence. Any bridge hand is exactly as "complex" as any other bridge hand. It's trivially easy to invent a set of variables ("chirality, shape, binding affinities, constituent molecules etc of microRNAs") that would yield improbabilities that would be called "complex" on Dembski's definition.
Granted, complex for Dr. Dembski does mean improbable. Which probably makes the outer loop mostly redundant as an identifier of specified information - or cisRegulatory modules in the gene. However, it is not meaningless, if only to identify anomalies like AATGTAAGGAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA which, as I understand it, would indicate that the dear overworked researcher entering the sequence data has fallen asleep with his finger on the 'A' key. So let's call the outer loop error checking for the moment. At least it is required to select the next shifted sequence/frame in the gene. So we'll leave it in the 'spewed'-o-code.
"Probabilities by themselves, however, are not information measures...probabilities are an inconvenient way to measure information" (Dembski, "Intelligent Design", p. 155)
Is any 500 bp sequence really just as improbable as any other sequence? Perhaps not if you take into account the rules associated with the function of gene sequences. Here our dictionary comes into play. Of course, if we only exactly match known functional sequences we would be achieving nothing (ta RBH):
quote: The "complexity" (improbability) of a specification hit (a sequence pattern that matches a known functional sequence) goes to (near) 1.0 when we know a naturalistic mechanism for producing such similarities, so finding a new sequence that is structurally similar to a known sequence does not detect complex specified information.
So, as Scott has suggested, the rules must be predictive and smart:
quote: It seems to me that what we would be doing is developing pattern recognition software in which the patterns are not (all) specified in advance but in which the pattern must meet some criteria.
The key is that a bit of forecasting and ultimately AI will be required in some kind of pattern recognition. It will have to be a smart pattern recognition, based partly or wholly on CSI formulae - on CSI 'smarts'.
Obviously we do not want to be circularly inductive here - we don't just want a garbage-in-garbage-out system which finds only what we put into it and tells us what we want to know. See Dembski:
"It follows, therefore, that how we measure information needs to be independent of whatever procedure we use to individuate the possibilities under consideration. The way to do this is not simply to count possibilities, but to assign probabilities to these possibilities" (ibid., p. 154)
ergo - the dictionary of independently established specification rules for binding sites - based on CSI - and especially Dembski specified information smarts. Furthermore - not just a dictionary of stored rules, but some CSI based smarts for generating rules and making 'best guess' estimates etc.
For, as RBH has astutely noted:
quote: What one would find with such a procedure is functionally similar sequences -- gene families. But we already know a naturalistic mechanism for producing gene families: gene duplication.
quote: The very pattern-matching procedure Bruce proposes precludes finding complex specified information.
I think these statements are profound - and key to the meaning of the entire undertaking. However, does it matter where the existing data about gene functional groups comes from in terms of detection, and does it matter that researchers have already 'labelled' the process of gene duplication naturalistic? Remember - some of the most putatively successful existing algorithms (Ahab) apparently give priority to binding site energy output estimates based on the 2nd law of thermodynamics/entropy, with 'micro-'evolutionary 'conservation' considered only in part. I posit that their actual code is based on programmatic mechanisms not necessarily totally dependent on assumptions of the chance accumulation of selected favourable mutations - even if the results are driven and interpreted evolutionarily.
Keep in mind that one of the challenges of ID is that the efficacy of the evolutionary/chance explanation for design is circularly inferred also. Simulation and bioinformatic analysis seem a great place to investigate this, because in bioinformatics information analysis is everything. Now obviously, adaptation and mutation (micro evolution) are experimentally demonstrable. However, the real question is 'why' - why do the process and the 'patterns' exist in the first place? Is specification only 'apparent' specification as a result of massive accumulated natural selective events - or is the probability against this prohibitive - and is a design explanation more mathematically realistic?
I have gone a little tangential here but my point is that - no - I don't think the pattern matching procedure precludes finding specified complex information (if it is CSI smart), because, among other things, this assertion would have to be true for existing algorithms that assert chance-cumulative complex information based on pattern matching and extrapolation from existing results. (That RBH's response is to a pattern matching approach without CSI smarts is duly noted.)
Logically, I don't think it matters how the knowledge about an existing functional microRNA target site was found - whether it was stumbled across or detected the old-fashioned hard way using gene splicing, radioactive marking and centrifuging etc., or identified by an entropy-based 'binding site energy' algorithm - what matters is whether the CSI algorithm (or any of the other algorithms for that matter) is powerful and produces new and otherwise unforeseen results.
Remember - the objective is to determine if such a CSI search algorithm is just as accurate and powerful as other search algorithms or - especially - substantially more so.
I am not mathematically smart enough to figure out up front whether the algorithm is worthwhile, but I think it would be hard to maintain that the algorithm was based on some kind of logical 'sleight of hand' or circularity if it delivered results markedly more accurate than - or even just as accurate as - existing algorithms, results which could then be empirically verified in vitro.
So I guess it would be a kind of experimental proof by contradiction or contra-indication.
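One way to make "just as accurate, or more accurate" concrete is to score each algorithm's predicted sites against the set confirmed in vitro. A minimal sketch, assuming sites can be reduced to hashable identifiers such as start positions; the numbers below are placeholders, not data, and the function name is illustrative.

```python
def precision_recall(predicted_sites, validated_sites):
    """Score a set of predictions against experimentally confirmed sites.

    precision: fraction of predictions that were confirmed
    recall:    fraction of confirmed sites that were predicted
    """
    predicted = set(predicted_sites)
    validated = set(validated_sites)
    true_positives = len(predicted & validated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    return precision, recall


# Placeholder identifiers only; a real comparison would use lab-confirmed sites
# and the output of each algorithm on the same raw sequence.
csi_predictions = {101, 250, 377, 402}
confirmed_in_vitro = {250, 377, 590}
print(precision_recall(csi_predictions, confirmed_in_vitro))
```

Running the same scoring over an existing algorithm's predictions for the same sequence would give the side-by-side comparison described above.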
Just to provide some kind of systematic cross-reference for this, I can suggest a way to get around the question of existing 'naturalistic' results invalidating a new approach. On my desk I have a superb textbook from the very brilliant and very evolutionarily oriented people at Cold Spring Harbor. It is an exceptional text with the best explanation of the history of the discoveries of genetics, and the best explanation of how gene interactions and synthesis etc. work, that I have ever seen (DNA Science, Micklos and Freyer, Cold Spring Harbor Laboratory Press). The first chapter of the book has a section devoted to Mendel's crosses - as any good DNA science book would. It then goes on down the evolutionary pathway (understandably). However, in reading it I noticed something very important. Many of the actual laboratory experiments described - involving tagging of proteins and DNA fragments with radioactive isotopes like phosphorus-32 etc. to cleverly determine the way that genes combine and split and so on - have no functional dependence on the assumption of evolutionary processes being responsible for their results at all. Now, granted, they are not always looking for specifically evolutionary evidence - usually they just want to know 'how' something is happening at the micro level. However - if a CSI based algorithm could be fed the raw experimental data - and then generate meaningful predictive information that is experimentally and empirically real and useful - then I think we might have more confidence in such an algorithm?
SO
A good test for the veracity of the whole undertaking might be to adapt the algorithm to focus on the results of many of these early experiments - even on the underlying genetics of Mendel's crosses themselves (which certainly weren't based on the premise of chance mutations etc.) - and see what happens. We could take a raw gene sequence, with no data from or rules based on any previous algorithmic search results, and see if our approach of looking for CSI based on these first experimental results turns up
A. Accurate results
B. Anything new and experimentally validatable
Sorry it's long winded - and thanks again to the sharp insight of both RBH and Scott.
Best,
Bruce
Moderator Edit - Spaces added to long string [ 29. October 2004, 12:45: Message edited by: Moderator ]
|
|
Bruce
Member
Member # 1284
|
posted 28. October 2004 07:12
I had another thought. Perhaps it would not be necessary to defer to pattern matching.
Strategy 1: Repeating Sequences
For example, what about searching the genome for repeating very similar sequences/segments? If the same sequence, or sequences very similar to each other, of reasonable length, appears enough times in a given genome that the probability of it happening that often is remote relative to the size of the DNA sequence, then we could be talking about some specified information. The task would then be, having decided that it might be functionally important because it has a high score for specified information, an effort could be made to look for a matching sequence in microRNA or RNA etc. This wouldn't necessarily tell you what it does - but it may certainly indicate functional significance. This would be similar to looking for message fragments in noisy signals. Even if you didn't speak English, didn't have an English dictionary, and were searching the following noisy signal:
Howpqeoirnvpintelligent134[0=9gnm2509is0 [9134=09efbmtri[0bt-9intelligent4rt-0-42t0m4t9ju24tdesign2= 03459jg0tisrg4trgb24tthat2tbthe2t4t2t4h4question. 245ythrIs24t4t2t4bt 4there2gtg4a2tb4tbbtde signingrgwrgbrgbwrgbintell igencewbwrgbwrtbatwtrb wrtbwrbwrwork246y5ynhtnin325 =09et0$#ggthe35-089enbn4@ $Tuniverse,23-54-mpowror325t24tis2524g@#%G4wgrtintelligencel ;kjwenr9835gan325t25 t245gb%^anthropomorphisation ...
You would certainly be taking a lot of notice of the string 'intelligen' based on proximity and density.
So - being very basic indeed genome-wise, if we pretend that ATTGGCT appears with high frequency and in close proximity in a DNA sequence, it would draw the same kind of attention.
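A bare-bones version of this first strategy might look like the sketch below: count every k-length window and compare each count with what a uniform random sequence would give. The uniform model is a strong assumption, since genomes have known mechanisms that copy segments, so a high ratio here means "not uniform random" and nothing more. The function name and the `min_ratio` threshold are illustrative.

```python
from collections import Counter


def overrepresented_kmers(seq, k, min_ratio=10.0):
    """List (kmer, count, count/expected) for k-mers seen far more often than chance.

    `expected` is the mean count of any one k-mer if all four bases were
    independent and equally likely.
    """
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        return []
    counts = Counter(seq[i:i + k] for i in range(windows))
    expected = windows / 4 ** k
    hits = [
        (kmer, n, n / expected)
        for kmer, n in counts.items()
        if n / expected >= min_ratio
    ]
    # Most frequent first; alphabetical order breaks ties
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


toy = "ATTGGCT" * 5 + "CCGTACGATCGATTAGC"
print(overrepresented_kmers(toy, 7)[0][:2])  # ('ATTGGCT', 5)
```

On a sequence this short every 7-mer is "surprising" under the model, which is itself a reminder that the threshold only means something at realistic sequence lengths.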
Strategy 2: Searching for symmetry
Similarly - if the following turns up even once in a genome, you are going to take notice on the basis of the symmetry alone - let alone your understanding of the importance of stereochemical chiral enantiomers like some amino acids etc. (yes - it's imaginary - as far as I know):
ATGCCGTCATTACTGCCGTA
The proximity and sameness of the reverse sequence TACTGCCGTA following immediately after the ATGCCGTCAT is a specified information bell ringer. Maybe there is also a good chance that it means there is a strange pair of stereochemical molecular enantiomers somewhere that might match this. However, none of this knowledge of chirality is required - just the fact that the sequence raises a specified information flag based on probability and proximity makes it significant to the CSI search algorithm.
Ergo - pattern matching to known sequences found otherwise is thus not necessary.
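The symmetry search can be sketched in a few lines. This looks only for the pattern in the example, a stretch followed immediately by its own reversal (a mirror repeat); it does not look for reverse-complement palindromes, which are a different kind of symmetry. The function name and the fixed `half_len` parameter are illustrative assumptions.

```python
def find_mirror_repeats(seq, half_len):
    """Return (start, left, right) wherever `right` is `left` reversed and directly follows it."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * half_len + 1):
        left = seq[start:start + half_len]
        right = seq[start + half_len:start + 2 * half_len]
        if right == left[::-1]:
            hits.append((start, left, right))
    return hits


print(find_mirror_repeats("ATGCCGTCATTACTGCCGTA", 10))
# [(0, 'ATGCCGTCAT', 'TACTGCCGTA')]
```

No knowledge of chirality or function goes into the search, which matches the claim above, but it also means a hit says nothing by itself about how often such a pattern turns up by chance.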
Moderator Edit to correct formatting issue - spaces added to long string [ 29. October 2004, 12:43: Message edited by: Moderator ]
|
|
RBH
Member
Member # 380
|
posted 29. October 2004 11:44
Bruce,
I think if you take the square brackets out of your example, it won't screw up the screen formatting as it now does.
RBH
|
|
RBH
Member
Member # 380
|
posted 29. October 2004 17:15
Bruce wrote quote: For example, what about searching the genome for repeating very similar sequences/segments? If the same sequence, or sequences very similar to each other, of reasonable length, appears enough times in a given genome that the probability of it happening that often is remote relative to the size of the DNA sequence, then we could be talking about some specified information. The task would then be, having decided that it might be functionally important because it has a high score for specified information, an effort could be made to look for a matching sequence in microRNA or RNA etc.
What justifies the shift from "very similar sequences/segments" to "specified information"? If you detect a whole slew of Alu segments in a primate genome, what does that tell you about "specified complexity"? Remember, "specified complexity" has a quite specific meaning: a highly improbable (= "complex") object that matches an independently given pattern (= specified). By finding multiple instances in a context (genome) where we already know about how duplicates of segments come into being, the "complex" part is immediately lost: duplicate sequences are a high probability phenomenon in genomes. By pattern matching without an independent specification, the "specified" part is thrown away. The "specification" comes from the very phenomenon (genetic strings) that the alleged event that is specified comes from: there's no independent specification.
The example (finding stuff like "intelligen" in an otherwise random-appearing string) does not represent the procedure described in the quote above. While the exercise as described might be interesting and even useful in finding new members of gene families, it would tell us zippo about "specified complexity." And, btw, a friend of mine in bioinformatics tells me it's already in use in a somewhat different form.
The second strategy is reminiscent of the Bible Code stuff. Given the enormous number of base pairs in a decent-sized genome, and given that a slew of them are apparently non-coding and therefore not under selective control for persistence, I suspect one can massage the sequences to find just about anything one cares to look for along those lines. I'm not against fishing expeditions in science -- I've conducted a few myself -- but one has to be acutely aware that not all fish are real in a statistical universe like that represented by a genome dominated by non-coding sequences. And again, it wouldn't raise a specified complexity flag, merely a potential complexity (improbability) flag that would have to first be assessed against the Bible Code style of search. Is there a Moby Dick control around anywhere?
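A "Moby Dick control" can be approximated by shuffling: run the same search on scrambled copies of the sequence and see how often they score as well as the real one. The sketch below uses the count of mirror repeats from the second strategy as the statistic. It assumes a plain shuffle that preserves base composition only (not dinucleotide or longer-range structure), and the function names are illustrative.

```python
import random


def count_mirror_repeats(seq, half_len):
    """Number of positions where a half_len stretch is followed by its own reversal."""
    return sum(
        seq[i + half_len:i + 2 * half_len] == seq[i:i + half_len][::-1]
        for i in range(len(seq) - 2 * half_len + 1)
    )


def expected_chance_hits(seq_len, half_len):
    """Expected mirror repeats in a uniform random sequence of seq_len bases."""
    windows = max(seq_len - 2 * half_len + 1, 0)
    return windows / 4 ** half_len


def shuffle_control(seq, half_len, n_shuffles=200, seed=0):
    """Return (observed count, empirical p-value) against shuffled copies of seq."""
    rng = random.Random(seed)
    observed = count_mirror_repeats(seq, half_len)
    bases = list(seq)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if count_mirror_repeats("".join(bases), half_len) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (as_extreme + 1) / (n_shuffles + 1)
```

Under the uniform model there is about one chance hit per 4^10 (roughly a million) windows for a 10-base mirror pair, so a genome of a few billion base pairs would be expected to contain thousands of them with no cause at all.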
RBH
|
|
|