Introduction
library(spartropy)
Notes
- All variables in this package are considered to be categorical (discrete) variables
- Below, \(\log\) is used in a generic fashion – any \(\log\) with base \(b\) works
- Note that \(0 \log 0 = 0\) by convention, as this also corresponds to the limit \(\lim_{p \to 0^{+}} p \log p = 0\)
Definitions and formulas
Entropy, \(\mathrm{H}\)
Definition:
\[\begin{align} \mathrm{H}(X) &= - \sum_{x} P(X = x) \log P(X = x) \end{align}\] where the sum is over the possible values of \(X\), denoted here by \(x\).
Source: https://en.wikipedia.org/wiki/Entropy_(information_theory)
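As a small illustration, \(\mathrm{H}(X)\) can be computed directly from this definition for a categorical vector using base R (an illustrative helper, not a spartropy function; the natural logarithm is used):
entropy_manual <- function(x) {
  # estimate P(X = x) by relative frequencies
  p <- table(x) / length(x)
  # drop zero cells: 0 log 0 = 0 by convention
  p <- p[p > 0]
  -sum(p * log(p))
}
entropy_manual(mtcars$cyl)  # treating cyl as a categorical variable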
Joint entropy, \(\mathrm{H}(X, Y)\) (symmetric in \(X\) and \(Y\))
Definition:
\[\begin{align} \mathrm{H}(X , Y) &= - \sum_{x, y} P(X = x , Y = y) \log P(X = x , Y = y) \end{align}\] where the sum is over the possible values of \(X\), denoted here by \(x\), and the possible values of \(Y\), denoted here by \(y\).
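Equivalently, the joint entropy is the entropy of the paired variable \((X, Y)\); a minimal base R sketch (illustrative only):
# joint probabilities P(X = x, Y = y) from the joint frequency table
p_xy <- table(mtcars$cyl, mtcars$gear) / nrow(mtcars)
-sum(p_xy[p_xy > 0] * log(p_xy[p_xy > 0]))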
Conditional entropy, \(\mathrm{H}(X \mid Y)\) (asymmetric in \(X\) and \(Y\))
Definition:
\[\begin{align} \mathrm{H}(X \mid Y) &= - \sum_{x, y} P(X = x , Y = y) \log \frac{P(X = x , Y = y)}{P(Y = y)} \end{align}\]
Formulas:
\[\begin{align} \mathrm{H}(X \mid Y) &= \mathrm{H}(X, Y) - \mathrm{H}(Y) \\ \mathrm{H}(Y \mid X) &= \mathrm{H}(X, Y) - \mathrm{H}(X) \end{align}\]
\[\begin{align} \mathrm{H}(X , Y) &= \mathrm{H}(X \mid Y) + \mathrm{H}(Y) \\ &= \mathrm{H}(Y \mid X) + \mathrm{H}(X) \end{align}\]
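The definition and the first formula can be checked against each other numerically with base R (illustrative only; cyl and gear from mtcars play the roles of \(X\) and \(Y\)):
x <- mtcars$cyl
y <- mtcars$gear
p_xy <- table(x, y) / length(x)   # joint P(X = x, Y = y)
p_y  <- colSums(p_xy)             # marginal P(Y = y)
ent  <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
# definition: -sum over x, y of P(x, y) log( P(x, y) / P(y) )
r <- sweep(p_xy, 2, p_y, "/")
H_def <- -sum(p_xy[p_xy > 0] * log(r[p_xy > 0]))
# formula: H(X | Y) = H(X, Y) - H(Y)
H_formula <- ent(p_xy) - ent(p_y)
all.equal(H_def, H_formula)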
Mutual information, \(\operatorname{I} (X; Y)\) (symmetric in \(X\) and \(Y\))
Definition:
\[\begin{align} \operatorname{I} (X; Y) &= \sum_{x, y} P(X = x , Y = y) \log \frac{P(X = x , Y = y)}{P(X = x) P(Y = y)} \end{align}\]
Formulas:
\[\begin{align} \operatorname{I} (X; Y) &= \mathrm{H}(X) - \mathrm{H}(X \mid Y)\\ &= \mathrm{H}(Y) - \mathrm{H}(Y \mid X)\\ &= \mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X,Y) \\ &= \mathrm{H}(X,Y) - \mathrm{H}(X \mid Y) - \mathrm{H}(Y \mid X) \\ \mathrm{H}(X \mid Y) &= \mathrm{H}(X) - \operatorname{I} (X; Y) \end{align}\]
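Similarly, the definition of \(\operatorname{I} (X; Y)\) agrees with \(\mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X,Y)\); a quick base R check (illustrative only):
x <- mtcars$cyl
y <- mtcars$gear
p_xy <- table(x, y) / length(x)
p_x  <- rowSums(p_xy)
p_y  <- colSums(p_xy)
ent  <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
# definition: sum over x, y of P(x, y) log( P(x, y) / (P(x) P(y)) )
I_def <- sum(p_xy[p_xy > 0] * log((p_xy / outer(p_x, p_y))[p_xy > 0]))
all.equal(I_def, ent(p_x) + ent(p_y) - ent(p_xy))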
(Normalised) shared information distance, \(D(X, Y)\) (symmetric in \(X\) and \(Y\))
\[\begin{align} \operatorname{D} (X, Y) &= \frac{ \mathrm{H}(X \mid Y) + \mathrm{H}(Y \mid X) }{\mathrm{H}(X,Y)} \end{align}\]
\(0 \leq \operatorname{D} (X, Y) \leq 1\) with \(\operatorname{D} (X, Y) = 0\) iff \(X\) and \(Y\) are perfectly dependent (fully determined) and \(\operatorname{D} (X, Y) = 1\) iff \(X\) and \(Y\) are independent.
This measure is also sometimes called normalised independent information.
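Since \(\operatorname{I} (X; Y) = \mathrm{H}(X,Y) - \mathrm{H}(X \mid Y) - \mathrm{H}(Y \mid X)\) (see the mutual information formulas above), the distance can equivalently be written as
\[\begin{align} \operatorname{D} (X, Y) &= 1 - \frac{\operatorname{I} (X; Y)}{\mathrm{H}(X,Y)} \end{align}\]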
Functions
- entropyB(d) calculates entropy \(H(X)\) and joint entropy \(H(\ldots)\)
- entropy_condB(d, idx_x, idx_y) calculates conditional entropy \(H(X \mid Y)\) of d[, idx_x] given d[, idx_y]
- mutinfB(d, idx_x, idx_y) calculates mutual information between d[, idx_x] and d[, idx_y]

where B is either

- E for the natural logarithm
- 2 for \(\log_2\)
- 10 for \(\log_{10}\)
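As a quick illustration of the base convention, the three variants should differ only by a constant factor; a sketch, assuming the entropyB functions accept a single vector as in the example below:
x <- mtcars[, "cyl"]
all.equal(entropy2(x),  entropyE(x) / log(2))    # log2(p) = log(p) / log(2)
all.equal(entropy10(x), entropyE(x) / log(10))   # log10(p) = log(p) / log(10)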
Example
Using the mtcars data:
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
\(H(\text{mpg})\)
Entropy, \(H\), of the mpg variable, i.e. \(H(\text{mpg})\):
H_mpg <- entropyE(mtcars[, "mpg"])
H_mpg
#> [1] 3.162484
\(H(\text{mpg} \mid \text{hp}, \text{wt})\)
\[\begin{align} \mathrm{H}(X \mid Y) &= \mathrm{H}(X) - \operatorname{I} (X; Y) \\ \mathrm{H}(\text{mpg} \mid \text{hp}, \text{wt}) &= \mathrm{H}(\text{mpg}) - \operatorname{I} (\text{mpg}; \text{hp}, \text{wt}) \\ \end{align}\]
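The quantities I_mpg_hpwt, idx_x and idx_y used below are not defined in this section; presumably they come from an earlier step along these lines (a sketch; the column selections are inferred from the formula above):
idx_x <- "mpg"
idx_y <- c("hp", "wt")
# mutual information I(mpg; hp, wt), natural-log variant
I_mpg_hpwt <- mutinfE(mtcars, idx_x, idx_y)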
H_mpg_hpwt <- H_mpg - I_mpg_hpwt
H_mpg_hpwt
#> [1] 0.0433217
entropy_condE(mtcars, idx_x, idx_y)
#> [1] 0.0433217
\[\begin{align} \mathrm{H}(\text{hp}, \text{wt} \mid \text{mpg}) &= \mathrm{H}(\text{hp}, \text{wt}) - \operatorname{I} (\text{mpg}; \text{hp}, \text{wt}) \\ \end{align}\]
H_hpwt <- entropyE(mtcars[, c("hp", "wt")])
H_hpwt_mpg <- H_hpwt - I_mpg_hpwt
H_hpwt_mpg
#> [1] 0.3032519
entropy_condE(mtcars, idx_y, idx_x)
#> [1] 0.3032519
\[\begin{align} \operatorname{D} (\text{mpg}, \{ \text{hp}, \text{wt} \}) &= \frac{ \mathrm{H}(\text{mpg} \mid \text{hp}, \text{wt}) + \mathrm{H}(\text{hp}, \text{wt} \mid \text{mpg}) }{\mathrm{H}(\text{mpg}, \text{hp}, \text{wt})} \end{align}\]
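The joint entropy H_joint used below is likewise not defined in this section; matching the denominator above, it is presumably:
H_joint <- entropyE(mtcars[, c("mpg", "hp", "wt")])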
D_mpg_hpwt <- (H_mpg_hpwt + H_hpwt_mpg) / H_joint
D_mpg_hpwt
#> [1] 0.1
(entropy_condE(mtcars, idx_x, idx_y) + entropy_condE(mtcars, idx_y, idx_x)) / entropyE(mtcars[, c("mpg", "hp", "wt")])
#> [1] 0.1
Thus, as \(D\) is close to 0, hp and wt say a lot about mpg (and vice versa).