This vignette shows how to use the R package disclapmix
that implements the method described in (Andersen, Eriksen, and Morling 2013b) and (Andersen et al. 2015). For a more gentle
introduction to the method, refer to the introduction vignette and (Andersen, Eriksen, and Morling 2013a).
We again use the Danish reference database (Hallenberg et al. 2005) with \(n = 185\) observations (male Y-STR
haplotypes) at \(r=10\) loci is
available in the danes
dataset. Let us load the package as
well as the data:
library(disclapmix)
data(danes)
The database is in compact format, i.e. one unique haplotype per row. To fit the model, we need one observation per row. This is done for example like this:
## int [1:185, 1:10] 13 13 13 13 13 13 14 14 14 14 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:185] "1" "2" "3" "4" ...
## ..$ : chr [1:10] "DYS19" "DYS389I" "DYS389II" "DYS390" ...
Also, note that the database is now an integer matrix.
Assume now that we have a mixture and that the reference database are without these two contributors:
donor1 <- db[1, ]
donor2 <- db[20, ]
refdb <- db[-c(1, 20), ]
We now construct the mixture:
mix <- generate_mixture(list(donor1, donor2))
We can then see some properties of possible pairs:
pairs <- contributor_pairs(mix)
pairs
## Mixture:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## 1 13,14 13,14 29,30 22 10 13,15 13 14,15 11,12 12
##
## Number of possible contributor pairs = 32
To do much more, we need a model assigning hpalotype probabilies. In the introduction vignette, we found that 4 clusters seemed fine, so let us fit this model:
fit <- disclapmix(x = refdb, clusters = 4L)
We can now use this model to e.g. rank the contributor pairs:
ranked_pairs <- rank_contributor_pairs(pairs, fit)
ranked_pairs
## Mixture:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## 1 13,14 13,14 29,30 22 10 13,15 13 14,15 11,12 12
##
## Contributor pairs = 32
##
## Sum of all (product of contributor pair haplotypes) = 2.263911e-14
##
## Showing rank 1-5:
##
## Rank 1 [ P(H1)*P(H2) = 2.929608e-15 ]:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1 13 14 30 . . 15 . 14 11 .
## H2 14 13 29 . . 13 . 15 12 .
## Prob
## H1 1.945729e-11
## H2 1.505661e-04
##
## Rank 2 [ P(H1)*P(H2) = 2.224515e-15 ]:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1 13 13 29 . . 13 . 15 12 .
## H2 14 14 30 . . 15 . 14 11 .
## Prob
## H1 2.216703e-05
## H2 1.003524e-10
##
## Rank 3 [ P(H1)*P(H2) = 1.722751e-15 ]:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1 13 14 30 . . 13 . 14 11 .
## H2 14 13 29 . . 15 . 15 12 .
## Prob
## H1 3.356602e-09
## H2 5.132425e-07
##
## Rank 4 [ P(H1)*P(H2) = 1.364569e-15 ]:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1 13 13 29 . . 15 . 15 12 .
## H2 14 14 30 . . 13 . 14 11 .
## Prob
## H1 7.556192e-08
## H2 1.805895e-08
##
## Rank 5 [ P(H1)*P(H2) = 1.060766e-15 ]:
## DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1 13 13 30 . . 15 . 14 11 .
## H2 14 14 29 . . 13 . 15 12 .
## Prob
## H1 6.777397e-11
## H2 1.565152e-05
##
## (27 contributor pairs hidden.)
We can get the ranks for the donors:
get_rank(ranked_pairs, donor1)
## [1] 13
get_rank(ranked_pairs, donor2)
## [1] 13