Topic: The CSI Bit String puzzle
|
andyg
Member
Member # 415
|
posted 12. September 2004 15:44
Here are four bit strings, where the bits are taken from an ASCII keyboard.
1.
atactgcgtcataagtaatctctattagacaaagatttcattacctgttggcatattgca aaaataacaccaatacggaatcgtcatgttcacgattaaaacagatgatctcacccatcc agcagtgcaagcattagtggcttaccatatttccggcatgctgcagcagtctccccctga aagcagtcatgctttagacgtgcaaaaattacgtaacccgacagtgacattctggtcagt atgggaaggcgaacaactcgcaggaattggtgcgctgaagttgctggatgataaacatgg cgaactgaaatcaatgcggaccgcgccaaattatttacgtcggggtgtcgccagtctgat tttacgccacattttgcaggtcgcccatgacagatgccttcatcgcctgagtttagaaac gggtacacaggctggatttacggcctgccatcaactttatttgaagcatggtttcgttga ttgcgaaccgtttgccgattatcaacttgatccacacagtcgatttttgtcattgacgct atgcgaagataatgagttgctttgagccagacgcagcacattcttgcattcgacgtgctg cgtctttatttatcaccaacaggaaacgccttgtccatagacgccccttccacatgcgtc acaagaaacctctattccagtgacacaattacgcctaattaattacatataatatttaat tatgaattcctcaccatctattacatgctttttaaccatatcggaatatttatcataatc ggcgggattcataacaatatattttcgctgcgatatttcatagcgaatccctgtaagggt ccatggcattaaaaatgcctctttaataggattacatttcatacaaagtaattttaaatt gccaggtatcgcaggaataacctcaatcttattatattcaatatacgcttctttcaaatt tttggggaaccatctaatttctttaatattattctcactacaatcaaaaaccttagcggt
2.
gctatvcggatcgatcrgachacgcgccgttatctagcgatcg acgatcgagcgccsgatcggcgtcatactgactcagtmcac tcgatcagtcagvaaaacagtgragacgtcgagtasgtkag cmhgsacacgtgatcygctagdtcgatcatgcatgcatgcg atagtatgcgatghcatcgatgtcatgtgyacgttcatgtcaatn cggscgatcgahtccavgctagcttgcagcachgctcgtgat artantgatatgctgatgascgatcagtgatgagtcahgtcac agtgtactacgatcgatgctgagctcttctttctagcvgatctagg ctac
3.
actgnatgragswrathtggathytnytnathgcnatggayga ratgwsnaarathtgygcnaayacngaygarttyathaayga rtgywsnathacncarwsngcngtngarcayathwsnytn athttygargatcnyrt
4.
atgccacctttaacaactaaaataacaggaagcaacaattacttttccttaatatctcttaacatcaatggtctcaactc gccaataaaaagacatagactaacaaactggctacacaaacaagacccaacattttgctgcttacaggaaactcatctca gagaaaaagatagacactacctcagaatgaaaggctggaaaacaattttccaagcaaatggtgtgaagaaacaagcagga gtagccatcctaatatctgataagattgacttccaacccaaagtcatcaaaaaagacaaggagggacacttcattctcat caaaggtaaaatcctccaagaggaactctcaattctgaatatctatgctccaaatacaagagcagccacattcactaaag aaactttagtaaagctcaaagcacacattgcgcctcacacaataatagtgggagacttcaacacaccactttcaccaatg gacagatcatggaaacagaaactaaacagggacacacggaaactaacagaagtggaaaaacattatgaactaaccagtac ccctgagctcttgactctagctgcatatgtatcaaaagatggcctagtcggccatcactggaaagagaggcccattggac acgcagactttgtgtgccccggtacaggggaacgccagggccaaagggggggagtgggtgatagaattgaacaaaaccat ccaagatctaaaacaataaagaaatcacaaagggagacaactctggagatagaaatcctaggaaagaaatcaggaaccat agatgtgagcatcagcaacagaatacaagatatgcaagagagaatctcaggtgcagaagattccatagaaaacatggaca caacaatcaaagaaaatgcaaaatgcaaaaagatcctaactccaaacatccagaaaatccaggacacaatggtaagacca aacctaaggataataggtatagatgagaatgaagattttcaacttaaagagccaataaatatcttcaaccaagttctaga agaaatcttccctaaccaaaagaaagagatgcccatgaat
Two questions:
1. How, in principle, would one go about determining whether any of the strings contain complex specified information?
2. For the more ambitious - do any of the strings contain CSI? Do some contain more CSI than others?
[ 12. September 2004, 15:45: Message edited by: andyg ]
|
Scott
Member
Member # 1222
|
posted 12. September 2004 16:50
quote: 1. How, in principle, would one go about determining whether any of the strings contain complex specified information?
1. The number of possible symbols at each position in the string.
2. The length of the string.
|
Salvador T. Cordova
Member
Member # 959
|
posted 12. September 2004 23:09
quote: Two questions:
1. How, in principle, would one go about determining whether any of the strings contain complex specified information?
2. For the more ambitious - do any of the strings contain CSI? Do some contain more CSI than others?
The default answer is inconclusive.
If we find two species of independent lineage having these sequences we presume genetic engineering (common design) or lateral gene transfer or something else.
Ironically, the whistle was blown on Monsanto because some of their genetically engineered food slipped through European import controls. Monsanto could have argued the design inference is unreliable therefore the DNA sequences can't be attributed to them, but no one will buy that explanation when real money is involved. I'm sure one could ping all the detractors of genetic engineering to find equally compelling examples.
So in answer to your question, "inconclusive" is the default answer. If the string matches a novel gene conjured up by another scientist and that gene has fairly unique features, it is a strong candidate for CSI.
As far as already living creatures go, that's where all the interesting debate is right now, but it seems to me that few really understand what has been laid out in ID literature regarding CSI. That's ok, that's what these boards are for. Perhaps together we can iron out any kinks.
Do 1 and 4 correspond to any known DNA sequence???
[ 12. September 2004, 23:18: Message edited by: Salvador T. Cordova ]
|
andyg
Member
Member # 415
|
posted 13. September 2004 22:33
Scott wrote:
quote: 1. The number of possible symbols at each position in the string. 2. The length of the string.
In this case, where I specified that the symbols come from an ASCII keyboard, (1) is certainly helpful. But such an approach is not useful in the general case of detecting CSI, as the investigator would not know the number of possible symbols/components in the message/system.
As far as point 2 goes, I'm not sure. Bill Dembski has talked about 1 in 10 to the 150th power as a probability bound, but on other occasions (IIRC) has said that a telephone number or ATM number can exhibit CSI. Perhaps Bill would like to comment?
Salvador's answer presupposes that some of my strings represent nucleotide sequences. That may or may not be the case, but to assume this again relies on background knowledge of the sort that is not (I think) appropriate to determining CSI in general.
Perhaps I should have been more clear at the outset - the four strings should be treated without any presuppositions, as if the strings had been received from outer space.
|
Scott
Member
Member # 1222
|
posted 14. September 2004 03:51
quote: Perhaps I should have been more clear at the outset - the four strings should be treated without any presuppositions, as if the strings had been received from outer space.
Now you're just being inconsistent.
quote: I specified that the symbols come from an ASCII keyboard.
Yes, you did. And now you're saying we should ignore that. So which is it?
If we are to treat the strings as if they have been received from outer space, then are you sure that ASCII characters are appropriate?
Did we receive ASCII characters, or did we receive some other string that we then translated into ASCII characters?
That brings up an interesting question: what is the probability that a received bit string, when translated into ASCII characters, would consist only of the characters shown in the OP?
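One way to put a rough number on that question, as a sketch only: assume each received character is an independent, uniformly random 8-bit value (256 possibilities, the figure used elsewhere in this thread) and ask how likely it is that every character falls inside a small allowed set. The function names and the uniform-byte model are assumptions made for illustration; nothing in the OP specifies them.

```python
import math
from fractions import Fraction

def prob_only_allowed(n_chars: int, n_allowed: int, alphabet_size: int = 256) -> Fraction:
    """Exact chance that n_chars independent, uniformly random symbols
    all fall inside a set of n_allowed symbols."""
    return Fraction(n_allowed, alphabet_size) ** n_chars

def log10_prob_only_allowed(n_chars: int, n_allowed: int, alphabet_size: int = 256) -> float:
    """Same quantity as a base-10 logarithm, which stays readable for long strings."""
    return n_chars * math.log10(n_allowed / alphabet_size)

# Assumed example: 1020 characters (340 codons x 3, the count reported later
# in the thread for string 1) drawn only from the four letters a, c, g, t.
print(log10_prob_only_allowed(1020, 4))   # about -1842.3
```

Under a different assumed source (7-bit ASCII, or a keyboard with fewer printable keys) the number changes, which is part of why the format of the received data matters.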
Revisiting the original question.
So do you agree that the exercise is to calculate probabilities?
And that in order to do so we need to know, or make some assumptions about, the number of symbols being employed? (If we can represent the result as 'bits' we can reduce this to two.)
And that we also need to know, or make some assumption about, whether each symbol is equiprobable at each location in the string?
And lastly, we need to know the length of the string.
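To make the equiprobability point concrete, here is a minimal sketch that sums -log2(p) over the symbols of a string for a supplied table of per-symbol probabilities. Both probability tables and the sample string are made up for illustration; nothing in the thread says what the real distribution is.

```python
import math

def surprisal_bits(text: str, probs: dict) -> float:
    """Total information, in bits, of text when each symbol is drawn
    independently with the given per-symbol probabilities."""
    return -sum(math.log2(probs[ch]) for ch in text)

# Hypothetical models: one equiprobable, one skewed toward 'a'.
uniform = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}
skewed = {"a": 0.5, "c": 0.25, "g": 0.125, "t": 0.125}

sample = "aaaacgta"  # made-up sample, not one of the four strings
print(surprisal_bits(sample, uniform))  # 16.0
print(surprisal_bits(sample, skewed))   # 13.0
```

The same string carries a different number of bits under each model, so the answer depends on which assumption is made.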
I should have known we were getting off to a poor start when you called these character strings "bit strings."
Serious discussion is invited, encouraged, even. Please revisit your OP and revise it accordingly.
thanks
|
andyg
Member
Member # 415
|
posted 14. September 2004 22:11
Scott writ:
quote: So do you agree that the exercise is to calculate probabilities?
I'm not sure. As I wrote above, Dembski has variously said that CSI is present at a probability bound of 1 in 10 to the 150th power or less, but has also said that phone numbers contain CSI.
|
David L. Hagen
Member
Member # 323
|
posted 18. September 2004 00:32
Clues: Strings 1 and 4 contain only the four characters a, g, c and t, so they could nominally be DNA strings. I used the online DNA decoding tool at http://www.geneseo.edu/~eshamb/php/dna.php (can someone verify the following interpretation?):
String 1 nominally decodes to 340 amino acid codons, BUT it does not begin with the start codon ATG, NOR does it end with a stop codon (TAA, TAG or TGA).
String 4 nominally decodes to 360 amino acid codons. It does begin with the start codon ATG, BUT it ends with AAT instead of the stop codon TAA.
String 2 has characters other than agct, so it is at least "noisy". The remaining characters nominally code to 106 amino acid codons with two characters left over.
String 3 also has characters other than agct, so it may also be "noisy". It nominally decodes to 28 amino acid codons with 2 characters left over.
(BioJava.org also appears to have DNA conversion routines.)
Possible further tests: Compare the strings against public genome databases to see if they match known genomic strings. E.g., do they match gene coding or non-coding regions? What happens if these strings are reversed?
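For anyone who wants to verify these counts without the online tool, the checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual method: whitespace is ignored, any character other than a, c, g, t is simply dropped before splitting into codons (mirroring the "remaining characters" approach above), and only the first reading frame is considered. The function name and report keys are made up.

```python
STOP_CODONS = {"taa", "tag", "tga"}

def codon_report(seq: str) -> dict:
    """Summarise a candidate DNA string read in its first frame."""
    cleaned = "".join(seq.lower().split())            # drop spaces and newlines
    non_acgt = sorted(set(cleaned) - set("acgt"))     # the "noisy" characters
    bases = "".join(ch for ch in cleaned if ch in "acgt")
    codons = [bases[i:i + 3] for i in range(0, len(bases) - 2, 3)]
    return {
        "length": len(cleaned),
        "non_acgt": non_acgt,
        "codons": len(codons),
        "leftover": len(bases) % 3,
        "starts_with_atg": bool(codons) and codons[0] == "atg",
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
    }

# Made-up toy input; paste one of the four strings here to check it.
print(codon_report("atg aaa taa"))
```

Reversing a string before the check is just `codon_report(seq[::-1])`.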
|
Scott
Member
Member # 1222
|
posted 18. September 2004 14:25
First andyg needs to tell us if we received bit strings or ASCII characters. He provides ASCII strings, but then calls them bit strings, and then later says to pretend they arrived from outer space, but fails to specify in what format they arrived. I'm still waiting for him to modify the OP to clarify just what it is he is seeking an answer to, and just what the supposed puzzle is.
There are, of course, several levels of analysis one could use to determine CSI. If these were received as bit strings, I suppose there would be no surprise that they could be converted into ASCII characters, but what are the probabilities of getting just these ASCII characters?
Then, as you point out, if we notice a pattern indicative of DNA or RNA, we could do the sort of analysis you are engaged in.
quote: What happens if these strings are reversed?
Good question. We could also convert to mRNA code.
Also, can we use Perl or something similarly capable to do regular expression searches for start codons and stop codons, adjusting for any possible reading frames?
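A rough sketch of that regular expression search, written in Python's `re` module rather than Perl. The pattern and function name are illustrative assumptions: it looks for `atg`, followed by as few whole codons as possible, followed by the first in-frame stop codon, and it uses a lookahead so that candidates in all three reading frames (including overlapping ones) are reported.

```python
import re

# Lookahead with a capture group so overlapping candidates are all found.
# The lazy quantifier stops at the first in-frame stop codon.
ORF_PATTERN = re.compile(r"(?=(atg(?:[acgt]{3})*?(?:taa|tag|tga)))")

def find_orfs(seq: str):
    """Return (frame, start_index, orf_text) for every atg...stop stretch."""
    cleaned = "".join(seq.lower().split())  # ignore spaces and newlines
    return [(m.start() % 3, m.start(), m.group(1))
            for m in ORF_PATTERN.finditer(cleaned)]

toy = "ccatgaaataggg"          # made-up example, not one of the four strings
print(find_orfs(toy))          # [(2, 2, 'atgaaatag')]
print(find_orfs(toy[::-1]))    # same search on the reversed string
```

Converting to mRNA notation would just be `seq.replace("t", "u")` before or after the search, with the pattern adjusted to match.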
Questions for you math whizzes. Given only four characters, how long would a string of those four characters need to be (minimum length) to qualify as a specification?
Is the minimum information content supposed to be at least 500 bits? We could ask then how long a binary string would need to be and do the conversion?
Given an ascii character set (256 characters), how long would a string need to be to qualify as a specification?
If we cannot use any of these strings as a specification, the whole exercise may be moot.
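The length arithmetic in these questions can be sketched as follows, assuming every symbol is equiprobable and independent so that each contributes log2(alphabet size) bits. This only covers the length needed to reach a bit threshold; it says nothing about whether a given pattern counts as specified. The function name is made up, and note that strict ASCII has 128 characters while the 256 figure corresponds to 8-bit bytes, so both are shown.

```python
import math

def min_length(target_bits: float, alphabet_size: int) -> int:
    """Smallest string length whose information content, under a uniform
    independent-symbol model, reaches target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))

# The 10-to-the-150th bound mentioned earlier in the thread, expressed in bits.
BOUND_BITS = 150 * math.log2(10)   # a little under 500

for name, size in [("binary", 2), ("a/c/g/t", 4),
                   ("7-bit ASCII", 128), ("8-bit bytes", 256)]:
    print(name, min_length(500, size))
```

Under these assumptions 500 bits works out to 500 binary digits, 250 characters from a four-letter alphabet, 72 seven-bit characters, or 63 eight-bit characters.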
If I weren't busy moving, I might come up with a few procedures for manipulating these strings. One simple and useful one might be just to count the characters. I'm sure not going to spend the time to do it manually.
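Counting the characters takes only a few lines; here is a minimal sketch in Python using `collections.Counter`. The function name is an assumption, and whitespace is skipped on the guess that the spaces in the posted strings are just line-wrapping.

```python
from collections import Counter

def count_symbols(text: str) -> Counter:
    """Tally each non-whitespace character, ignoring case."""
    return Counter(ch for ch in text.lower() if not ch.isspace())

# Made-up toy input; paste one of the four strings here instead.
counts = count_symbols("ac gt\naa")
print(counts.most_common())     # most frequent symbols first
print(sum(counts.values()))     # total length without whitespace
```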
|
andyg
Member
Member # 415
|
posted 20. September 2004 19:58
quote: I'm still waiting for him to modify the OP to clarify just what it is he is seeking an answer to, and just what the supposed puzzle is.
The puzzle is to say which of the strings contain CSI. If you can't do that, tell me how you would do it in principle.
|
Scott
Member
Member # 1222
|
posted 20. September 2004 22:18
quote: The puzzle is to say which of the strings contain CSI. If you can't do that, tell me how you would do it in principle.
In order for someone to tell which of the strings contain CSI, you need to resolve your prior inconsistent statements about those strings.
As for detecting CSI in principle, that's been covered.
|
michaelgoodrich
Member
Member # 393
|
posted 28. September 2004 11:22
Andy G. writes:
quote: Here are four bit strings, where the bits are taken from an ASCII keyboard.
Methinks more context is needed; e.g., what is the specification of the probabilistic context of the raw data?
|
Jerry D. Bauer
Member
Member # 756
|
posted 06. November 2004 00:24
Sorry, I don't see any specified information anywhere in there. Although some strings are more or less complex when compared to the others when utilizing Berlyne's complexity as in 'a pattern can be considered more complex the larger the number of independently selected elements it contains,' this seems about as far as we can take it.
If information is to be considered specified then each piece of information must play a specific role in working with the others to produce a primary function in the whole. Now, one could assume at first examination that these are nucleotides coding for something in a living organism, and if this were true each group of nucleotides might serve in a specified role, in that if I were to remove any one group of them the overall function might cease or change.
But if they came from outer space then the odds this is pre-programmed code for earth life might seem pretty remote. Therefore, I must conclude that this is complex information, but not complex specified information.