First, let's begin with some definitions:
- Entropy is NOT a measure of 'disorder' (a very common misstatement) but of the dispersal of energy in a system (this dispersal makes it unavailable to do work)
- Shannon entropy quantifies the information content of a message: it is a quantitative measure of average unpredictability, given by:
H(X) = -∑ P(x) log₂ P(x)
where P(x) is the probability that X is in state x, and P log₂ P is taken to be 0 where P = 0 (see: Mathematical Foundations of Information Theory, Khinchin)
- Deoxyribonucleic acid (DNA) is composed of two long polymers of simple units called nucleotides, with backbones made of five-carbon sugars and phosphate groups joined by ester bonds; attached to each sugar is one of four bases: thymine (T), cytosine (C), guanine (G), and adenine (A). A group of three bases, taken together, represents a codon (each codon maps to an amino acid or acts as a control codon). It is the sequence of these codons that makes up the bulk of the 'informational' content of DNA.
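As a rough illustration of the Shannon entropy definition above, here is a minimal Python sketch. The function name `shannon_entropy` and the example distributions are my own choices for illustration, not something taken from Khinchin.

```python
import math

def shannon_entropy(probs):
    """Return H(X) = -sum(p * log2(p)) in bits.

    Terms with p == 0 are skipped, which matches the convention
    that P log2 P is taken to be 0 where P = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions (assumptions, chosen only for the example).
fair_coin = shannon_entropy([0.5, 0.5])        # two equally likely states
uniform_22 = shannon_entropy([1 / 22] * 22)    # 22 equally likely letters
```

A fair coin comes out at 1 bit, and 22 equally likely letters come out at log₂(22) bits per letter, which is the figure used in the calculations that follow.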
Read as codons, the DNA pattern equates to an alphabet of 22 letters (20 amino acids plus start and stop signals). The frequencies of the letters in this alphabet are not relevant to the question at hand, because we're doing a *relative* comparison of information content; they would only change the specific measured values.
The simplest way to work with the equation is to assume all the letters are equally probable, having a probability of 1/22 of appearing:
GCT = 1 'codon' letter with a probability of 1/22
I = - log₂(1/22) = 4.4594316... bits/codon letter
As our message gets longer, the information content simply increases in proportion to the length of the message:
GCT AAC TTT TGG
I = 4 * (- log₂(1/22)) = 17.837726... bits
And now if you duplicate that:
GCT AAC TTT TGG GCT AAC TTT TGG
I = 8 * (- log₂(1/22)) = 35.6754529... bits
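The arithmetic above is easy to check yourself. This is a minimal sketch under the same simplifying assumption (22 equally probable codon letters); the helper name `uniform_bits` is hypothetical and exists only for this example.

```python
import math

ALPHABET_SIZE = 22  # the 22-letter codon alphabet assumed in this post

def uniform_bits(message, alphabet_size=ALPHABET_SIZE):
    """Total information in bits, assuming every codon letter is
    equally probable: (number of codons) * -log2(1 / alphabet_size)."""
    n_codons = len(message.split())
    return n_codons * -math.log2(1 / alphabet_size)

single = uniform_bits("GCT")
message = uniform_bits("GCT AAC TTT TGG")
duplicated = uniform_bits("GCT AAC TTT TGG GCT AAC TTT TGG")

for label, bits in [("1 codon", single), ("4 codons", message), ("8 codons", duplicated)]:
    print(f"{label}: {bits:.4f} bits")
```

Under this model the duplicated message carries exactly twice the bits of the original, because the total is just the codon count times a constant.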
Therefore the information content goes up even when you merely duplicate data (even a portion of it, because the increase is proportional), and even a simple single base-pair mutation CHANGES the information contained within DNA.
Now imagine a duplicated allele in an organism. It is going to express that protein at a greater rate than in the progenitor organism, which is a clear change right there. Then, when one of these two alleles experiences a mutation, the original function of the allele remains intact and we have a new protein being expressed in the organism. In no possible sense is this 'not increasing information'.
You can increase the size of the alphabet and adjust the probabilities/frequencies of the letters all you want (clearly start & stop would appear less frequently), but you won't change the fact that duplicating a portion of the data will produce an increase in the total information. Nor will you change the relative effect of duplicating large segments of the message.
Even though Shannon information per letter decreases the more we constrain/limit the system (because the probabilities of the next codon become more certain), there is still an informational content expressed as bits/letter. Adding more letters to a message that is already billions of letters long isn't going to produce a measurable change in the probabilities, but it will greatly increase the number of codons and drive up the total informational content.
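To see the same effect with unequal letter frequencies, here is a small sketch that estimates bits/letter from the observed codon frequencies in a message and multiplies by the message length. Estimating the probabilities from the message itself is an assumption I'm making for the example, and the toy message and function names are made up for illustration.

```python
import math
from collections import Counter

def bits_per_letter(letters):
    """Entropy in bits/letter, using observed frequencies as probabilities."""
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def total_bits(letters):
    """Total information: number of letters times bits/letter."""
    return len(letters) * bits_per_letter(letters)

# Toy message with deliberately unequal codon frequencies.
message = "GCT AAC TTT TGG GCT GCT AAC GCT".split()
duplicated = message + message      # whole message duplicated
partial = message + message[:2]     # only the first two codons duplicated

print(bits_per_letter(message), total_bits(message))
print(bits_per_letter(duplicated), total_bits(duplicated))
print(bits_per_letter(partial), total_bits(partial))
```

Duplicating the whole message leaves the frequencies, and so the bits/letter, unchanged while doubling the total. Duplicating only part of it nudges the frequencies slightly in this tiny example, but the total still goes up; in a message billions of letters long that nudge would be negligible.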
If you are a creationist and you disagree then I'm sorry, but you'll need to show your math (and if you're correct, publish it in a reputable peer-reviewed journal and you will win the Nobel Prize).
Shannon information theory
Frank L. Lambert's Simple evolution vs thermodynamics
York University: Coding and Information Theory, video series