Introduction
library(spartropy)
Notes
- All variables in this package are considered to be categorical (discrete) variables
- Below, \(\log\) is used in a generic fashion – any \(\log\) with base \(b\) works
- Note that \(0 \log 0 = 0\) by convention, as this also corresponds to the limit \(\lim_{p \to 0^{+}} p \log p = 0\)
Definitions and formulas
Entropy, \(\mathrm{H}\)
Definition:
\[\begin{align} \mathrm{H}(X) &= - \sum_{x} P(X = x) \log P(X = x) \end{align}\] where the sum is over the possible values of \(X\), denoted here by \(x\).
Source: https://en.wikipedia.org/wiki/Entropy_(information_theory)
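As a small illustration, \(\mathrm{H}(X)\) can be computed directly from this definition for a categorical vector using base R (an illustrative helper, not a spartropy function; the natural logarithm is used):
entropy_manual <- function(x) {
  # estimate P(X = x) by relative frequencies
  p <- table(x) / length(x)
  # drop zero cells: 0 log 0 = 0 by convention
  p <- p[p > 0]
  -sum(p * log(p))
}
entropy_manual(mtcars$cyl)  # treating cyl as a categorical variable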
Joint entropy, \(\mathrm{H}(X, Y)\) (symmetric in \(X\) and \(Y\))
Definition:
\[\begin{align} \mathrm{H}(X , Y) &= - \sum_{x, y} P(X = x , Y = y) \log P(X = x , Y = y) \end{align}\] where the sum is over the possible values of \(X\), denoted here by \(x\), and the possible values of \(Y\), denoted here by \(y\).
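Equivalently, the joint entropy is the entropy of the paired variable \((X, Y)\); a minimal base R sketch (illustrative only):
# joint probabilities P(X = x, Y = y) from the joint frequency table
p_xy <- table(mtcars$cyl, mtcars$gear) / nrow(mtcars)
-sum(p_xy[p_xy > 0] * log(p_xy[p_xy > 0]))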
Conditional entropy, \(\mathrm{H}(X \mid Y)\) (asymmetric in \(X\) and \(Y\))
Definition:
\[\begin{align} \mathrm{H}(X \mid Y) &= - \sum_{x, y} P(X = x , Y = y) \log \frac{P(X = x , Y = y)}{P(Y = y)} \end{align}\]
Formulas:
\[\begin{align} \mathrm{H}(X \mid Y) &= \mathrm{H}(X, Y) - \mathrm{H}(Y) \\ \mathrm{H}(Y \mid X) &= \mathrm{H}(X, Y) - \mathrm{H}(X) \end{align}\]
\[\begin{align} \mathrm{H}(X , Y) &= \mathrm{H}(X \mid Y) + \mathrm{H}(Y) \\ &= \mathrm{H}(Y \mid X) + \mathrm{H}(X) \end{align}\]
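The definition and the first formula can be checked against each other numerically with base R (illustrative only; cyl and gear from mtcars play the roles of \(X\) and \(Y\)):
x <- mtcars$cyl
y <- mtcars$gear
p_xy <- table(x, y) / length(x)   # joint P(X = x, Y = y)
p_y  <- colSums(p_xy)             # marginal P(Y = y)
ent  <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
# definition: -sum over x, y of P(x, y) log( P(x, y) / P(y) )
r <- sweep(p_xy, 2, p_y, "/")
H_def <- -sum(p_xy[p_xy > 0] * log(r[p_xy > 0]))
# formula: H(X | Y) = H(X, Y) - H(Y)
H_formula <- ent(p_xy) - ent(p_y)
all.equal(H_def, H_formula)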
Mutual information, \(\operatorname{I} (X; Y)\) (symmetric in \(X\) and \(Y\))
Definition:
\[\begin{align} \operatorname{I} (X; Y) &= \sum_{x, y} P(X = x , Y = y) \log \frac{P(X = x , Y = y)}{P(X = x) P(Y = y)} \end{align}\]
Formulas:
\[\begin{align} \operatorname{I} (X; Y) &= \mathrm{H}(X) - \mathrm{H}(X \mid Y)\\ &= \mathrm{H}(Y) - \mathrm{H}(Y \mid X)\\ &= \mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X,Y) \\ &= \mathrm{H}(X,Y) - \mathrm{H}(X \mid Y) - \mathrm{H}(Y \mid X) \\ \mathrm{H}(X \mid Y) &= \mathrm{H}(X) - \operatorname{I} (X; Y) \end{align}\]
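Similarly, the definition of \(\operatorname{I} (X; Y)\) agrees with \(\mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X,Y)\); a quick base R check (illustrative only):
x <- mtcars$cyl
y <- mtcars$gear
p_xy <- table(x, y) / length(x)
p_x  <- rowSums(p_xy)
p_y  <- colSums(p_xy)
ent  <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
# definition: sum over x, y of P(x, y) log( P(x, y) / (P(x) P(y)) )
I_def <- sum(p_xy[p_xy > 0] * log((p_xy / outer(p_x, p_y))[p_xy > 0]))
all.equal(I_def, ent(p_x) + ent(p_y) - ent(p_xy))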
(Normalised) shared information distance, \(D(X, Y)\) (symmetric in \(X\) and \(Y\))
\[\begin{align} \operatorname{D} (X, Y) &= \frac{ \mathrm{H}(X \mid Y) + \mathrm{H}(Y \mid X) }{\mathrm{H}(X,Y)} \end{align}\]
\(0 \leq \operatorname{D} (X, Y) \leq 1\) with \(\operatorname{D} (X, Y) = 0\) iff \(X\) and \(Y\) are perfectly dependent (fully determined) and \(\operatorname{D} (X, Y) = 1\) iff \(X\) and \(Y\) are independent.
This measure is also sometimes called normalised independent information.
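Since \(\operatorname{I} (X; Y) = \mathrm{H}(X,Y) - \mathrm{H}(X \mid Y) - \mathrm{H}(Y \mid X)\) (see the mutual information formulas above), the distance can equivalently be written as
\[\begin{align} \operatorname{D} (X, Y) &= 1 - \frac{\operatorname{I} (X; Y)}{\mathrm{H}(X,Y)} \end{align}\]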
Functions
- entropyB(d) calculates entropy \(H(X)\) and joint entropy \(H(\ldots)\)
- entropy_condB(d, idx_x, idx_y) calculates conditional entropy \(H(X \mid Y)\) of d[, idx_x] given d[, idx_y]
- mutinfB(d, idx_x, idx_y) calculates mutual information between d[, idx_x] and d[, idx_y]

where B is either

- E for the natural logarithm
- 2 for \(\log_2\)
- 10 for \(\log_{10}\)
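As a quick illustration of the base convention, the three variants should differ only by a constant factor; a sketch, assuming the entropyB functions accept a single vector as in the example below:
x <- mtcars[, "cyl"]
all.equal(entropy2(x),  entropyE(x) / log(2))    # log2(p) = log(p) / log(2)
all.equal(entropy10(x), entropyE(x) / log(10))   # log10(p) = log(p) / log(10)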
Example
Using the mtcars data:
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
\(H(\text{mpg})\)
Entropy, \(H\), of the mpg variable, i.e. \(H(\text{mpg})\):
H_mpg <- entropyE(mtcars[, "mpg"])
H_mpg
#> [1] 3.162484
\(H(\text{mpg} \mid \text{hp}, \text{wt})\)
\[\begin{align} \mathrm{H}(X \mid Y) &= \mathrm{H}(X) - \operatorname{I} (X; Y) \\ \mathrm{H}(\text{mpg} \mid \text{hp}, \text{wt}) &= \mathrm{H}(\text{mpg}) - \operatorname{I} (\text{mpg}; \text{hp}, \text{wt}) \\ \end{align}\]
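The quantities I_mpg_hpwt, idx_x and idx_y used below are not defined in this section; presumably they come from an earlier step along these lines (a sketch; the column selections are inferred from the formula above):
idx_x <- "mpg"
idx_y <- c("hp", "wt")
# mutual information I(mpg; hp, wt), natural-log variant
I_mpg_hpwt <- mutinfE(mtcars, idx_x, idx_y)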
H_mpg_hpwt <- H_mpg - I_mpg_hpwt
H_mpg_hpwt
#> [1] 0.0433217
entropy_condE(mtcars, idx_x, idx_y)
#> [1] 0.0433217
\[\begin{align} \mathrm{H}(\text{hp}, \text{wt} \mid \text{mpg}) &= \mathrm{H}(\text{hp}, \text{wt}) - \operatorname{I} (\text{mpg}; \text{hp}, \text{wt}) \\ \end{align}\]
H_hpwt <- entropyE(mtcars[, c("hp", "wt")])
H_hpwt_mpg <- H_hpwt - I_mpg_hpwt
H_hpwt_mpg
#> [1] 0.3032519
entropy_condE(mtcars, idx_y, idx_x)
#> [1] 0.3032519
\[\begin{align} \operatorname{D} (\text{mpg}, \{ \text{hp}, \text{wt} \}) &= \frac{ \mathrm{H}(\text{mpg} \mid \text{hp}, \text{wt}) + \mathrm{H}(\text{hp}, \text{wt} \mid \text{mpg}) }{\mathrm{H}(\text{mpg}, \text{hp}, \text{wt})} \end{align}\]
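The joint entropy H_joint used below is likewise not defined in this section; matching the denominator above, it is presumably:
H_joint <- entropyE(mtcars[, c("mpg", "hp", "wt")])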
D_mpg_hpwt <- (H_mpg_hpwt + H_hpwt_mpg) / H_joint
D_mpg_hpwt
#> [1] 0.1
(entropy_condE(mtcars, idx_x, idx_y) + entropy_condE(mtcars, idx_y, idx_x)) / entropyE(mtcars[, c("mpg", "hp", "wt")])
#> [1] 0.1
Thus, as \(D\) is close to 0, hp and wt say a lot about mpg (and vice versa).