Information Theory
Shannon Entropy
The measure of uncertainty of a probability distribution, unique up to the choice of log base under Shannon's axioms: H(X) = -sum_i p_i log p_i; it is always non-negative
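A minimal Python sketch of the formula (the function name `shannon_entropy` and the example distributions are illustrative choices, not from the source):

```python
import math

def shannon_entropy(p, base=2):
    """H(X) = -sum p_i log p_i over a probability distribution p."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(shannon_entropy([0.9, 0.1]))   # ~0.469 bits (biased coin, less uncertain)
```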
Kraft's Inequality
Every prefix-free binary code with codeword lengths l_1,...,l_n satisfies sum_i 2^{-l_i} <= 1; conversely, any lengths satisfying the inequality are realized by some prefix-free code
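A quick check of the inequality in Python (the helper `kraft_sum` and the sample length lists are my own, for illustration):

```python
def kraft_sum(lengths):
    """Sum of 2^{-l_i} over codeword lengths; <= 1 iff a prefix-free code exists."""
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> e.g. the prefix code 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))     # 1.25 -> no prefix-free code has these lengths
```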
Source Coding Theorem
Shannon's noiseless coding theorem: the expected length E[L] of an optimal prefix code satisfies H(X) <= E[L] < H(X) + 1
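A sketch that illustrates the bound by building a Huffman code (which is optimal among prefix codes) and comparing its expected length to the entropy; the source distribution is an arbitrary example:

```python
import heapq, math

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)   # merge the two least-probable groups
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1            # every symbol in the merge gets one bit deeper
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.5, 0.25, 0.15, 0.10]
H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(H, L)   # H ~ 1.743, L = 1.75: indeed H <= L < H + 1
```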
Hamming Bound
A binary code correcting t errors satisfies |C| * sum_{i=0}^{t} C(n,i) <= 2^n; the (7,4) Hamming code meets it
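The bound is easy to verify numerically; the check below (function name `hamming_bound_ok` is illustrative) confirms that the (7,4) code with 2^4 codewords and t = 1 achieves equality:

```python
from math import comb

def hamming_bound_ok(n, k, t):
    """Check |C| * sum_{i=0}^{t} C(n,i) <= 2^n for a code with 2^k codewords."""
    return 2**k * sum(comb(n, i) for i in range(t + 1)) <= 2**n

print(hamming_bound_ok(7, 4, 1))                       # True
print(2**4 * sum(comb(7, i) for i in range(2)), 2**7)  # 128 128 -> met with equality
```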
Singleton Bound
Any binary code with minimum distance d and block length n satisfies |C| <= 2^{n-d+1}
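A small sketch of the bound as a formula (the repetition-code example is my own illustration):

```python
def singleton_bound(n, d, q=2):
    """Maximum code size allowed by the Singleton bound: q^{n-d+1}."""
    return q ** (n - d + 1)

# The binary repetition code of length 3 has minimum distance d = 3 and 2 codewords;
# the bound allows at most 2^{3-3+1} = 2, so it meets the bound.
print(singleton_bound(3, 3))   # 2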
Channel Capacity
The capacity of a binary symmetric channel with crossover probability p is C = 1 - H(p), where H is the binary entropy function; it is the maximum rate of reliable communication, with C(0) = 1 and C(1/2) = 0
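A direct translation of C = 1 - H(p) into Python (the function name and the sample crossover probabilities are illustrative):

```python
import math

def bsc_capacity(p):
    """Capacity 1 - H(p) of a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0  # binary entropy is 0 at the endpoints
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(bsc_capacity(0.0))    # 1.0  (noiseless channel)
print(bsc_capacity(0.11))   # ~0.5 bits per channel use
print(bsc_capacity(0.5))    # 0.0  (output independent of input)
```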
Mutual Information
I(X;Y) = H(X) + H(Y) - H(X,Y) measures how much knowing Y reduces uncertainty about X
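A sketch computing I(X;Y) from a joint distribution via the identity above (the helper names and the two example joints are my own):

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint distribution given as nested lists."""
    px = [sum(row) for row in joint]           # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (columns)
    pxy = [p for row in joint for p in row]    # flattened joint
    return entropy(px) + entropy(py) - entropy(pxy)

print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: Y determines X
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: X and Y independent
```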
Fano's Inequality
Any estimator of X from Y with error probability P_e satisfies H(X|Y) <= H(P_e) + P_e * log(|X| - 1), where H(P_e) is the binary entropy of P_e and |X| is the alphabet size of X
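A sketch evaluating the right-hand side of the inequality (the function name and the numbers in the example are illustrative, not from the source):

```python
import math

def fano_bound(p_e, alphabet_size):
    """Upper bound H(P_e) + P_e * log2(|X| - 1) on the conditional entropy H(X|Y)."""
    h = 0.0 if p_e in (0.0, 1.0) else -(p_e * math.log2(p_e) + (1 - p_e) * math.log2(1 - p_e))
    return h + p_e * math.log2(alphabet_size - 1)

# With a 4-symbol alphabet and error probability 0.1, H(X|Y) is at most ~0.628 bits.
print(fano_bound(0.1, 4))
```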