Information Theory
Shannon Entropy
The measure of uncertainty of a probability distribution, unique up to the choice of log base under Shannon's axioms: H(X) = -sum_i p_i log p_i; it is always non-negative
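A minimal Python sketch of the formula (the function name `shannon_entropy` and the example distributions are illustrative choices, not from the source):

```python
import math

def shannon_entropy(p, base=2):
    """H(X) = -sum p_i log p_i over a probability distribution p."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(shannon_entropy([0.9, 0.1]))   # ~0.469 bits (biased coin, less uncertain)
```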
Kraft's Inequality
Every prefix-free binary code with codeword lengths l_1,...,l_n satisfies sum_i 2^{-l_i} <= 1; conversely, any lengths satisfying the inequality are realized by some prefix-free code
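A quick check of the inequality in Python (the helper `kraft_sum` and the sample length lists are my own, for illustration):

```python
def kraft_sum(lengths):
    """Sum of 2^{-l_i} over codeword lengths; <= 1 iff a prefix-free code exists."""
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> e.g. the prefix code 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))     # 1.25 -> no prefix-free code has these lengths
```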
Source Coding Theorem
Shannon's noiseless coding theorem: the expected length E[L] of an optimal prefix code satisfies H(X) <= E[L] < H(X) + 1
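A sketch that illustrates the bound by building a Huffman code (which is optimal among prefix codes) and comparing its expected length to the entropy; the source distribution is an arbitrary example:

```python
import heapq, math

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)   # merge the two least-probable groups
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1            # every symbol in the merge gets one bit deeper
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.5, 0.25, 0.15, 0.10]
H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(H, L)   # H ~ 1.743, L = 1.75: indeed H <= L < H + 1
```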
Hamming Bound
A binary code correcting t errors satisfies |C| * sum_{i=0}^{t} C(n,i) <= 2^n; the (7,4) Hamming code meets it
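The bound is easy to verify numerically; the check below (function name `hamming_bound_ok` is illustrative) confirms that the (7,4) code with 2^4 codewords and t = 1 achieves equality:

```python
from math import comb

def hamming_bound_ok(n, k, t):
    """Check |C| * sum_{i=0}^{t} C(n,i) <= 2^n for a code with 2^k codewords."""
    return 2**k * sum(comb(n, i) for i in range(t + 1)) <= 2**n

print(hamming_bound_ok(7, 4, 1))                       # True
print(2**4 * sum(comb(7, i) for i in range(2)), 2**7)  # 128 128 -> met with equality
```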
Singleton Bound
Any binary code with minimum distance d and block length n satisfies |C| <= 2^{n-d+1}
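A small sketch of the bound as a formula (the repetition-code example is my own illustration):

```python
def singleton_bound(n, d, q=2):
    """Maximum code size allowed by the Singleton bound: q^{n-d+1}."""
    return q ** (n - d + 1)

# The binary repetition code of length 3 has minimum distance d = 3 and 2 codewords;
# the bound allows at most 2^{3-3+1} = 2, so it meets the bound.
print(singleton_bound(3, 3))   # 2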
Channel Capacity
The capacity of a binary symmetric channel with crossover probability p is C = 1 - H(p), where H is the binary entropy function; it is the maximum rate of reliable communication, with C(0) = 1 and C(1/2) = 0
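A direct translation of C = 1 - H(p) into Python (the function name and the sample crossover probabilities are illustrative):

```python
import math

def bsc_capacity(p):
    """Capacity 1 - H(p) of a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0  # binary entropy is 0 at the endpoints
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(bsc_capacity(0.0))    # 1.0  (noiseless channel)
print(bsc_capacity(0.11))   # ~0.5 bits per channel use
print(bsc_capacity(0.5))    # 0.0  (output independent of input)
```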
Mutual Information
I(X;Y) = H(X) + H(Y) - H(X,Y) measures how much knowing Y reduces uncertainty about X
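A sketch computing I(X;Y) from a joint distribution via the identity above (the helper names and the two example joints are my own):

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint distribution given as nested lists."""
    px = [sum(row) for row in joint]           # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (columns)
    pxy = [p for row in joint for p in row]    # flattened joint
    return entropy(px) + entropy(py) - entropy(pxy)

print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: Y determines X
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: X and Y independent
```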
Fano's Inequality
Any estimator of X from Y with error probability P_e satisfies H(X|Y) <= H(P_e) + P_e * log(|X| - 1), where H(P_e) is the binary entropy of P_e and |X| is the alphabet size of X
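A sketch evaluating the right-hand side of the inequality (the function name and the numbers in the example are illustrative, not from the source):

```python
import math

def fano_bound(p_e, alphabet_size):
    """Upper bound H(P_e) + P_e * log2(|X| - 1) on the conditional entropy H(X|Y)."""
    h = 0.0 if p_e in (0.0, 1.0) else -(p_e * math.log2(p_e) + (1 - p_e) * math.log2(1 - p_e))
    return h + p_e * math.log2(alphabet_size - 1)

# With a 4-symbol alphabet and error probability 0.1, H(X|Y) is at most ~0.628 bits.
print(fano_bound(0.1, 4))
```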