Infinite Hidden Markov Models via the Hierarchical Dirichlet Process

Category: graphical models. In this presentation we propose a new formalism under which we study the infinite hidden Markov model (iHMM) of Beal et al. [2]. The iHMM is a hidden Markov model (HMM) in which the number of hidden states is allowed to be countably infinite. This is achieved using the formalism of the Dirichlet process; in particular, a two-level urn model is used to determine the transition probabilities of the HMM (an urn model being one characterisation of the Dirichlet process). At the first level, the probability of transitioning from a state u to a state v is proportional to the number of times the same transition has been observed at other time steps, while with probability proportional to α0 an “oracle” process is invoked. At the second level, the probability of transitioning to state v is proportional to the number of times state v has been chosen by the oracle (regardless of the previous state), while the probability of transitioning to a previously unseen (i.e. novel) state is proportional to γ. A code sketch of this two-level urn scheme is given at the end of this section. The oracle was included to tie the transition models together, ensuring that they share a common set of destination states. Beal et al. presented an approximate Gibbs sampling algorithm for inference in this model, but the lack of an explicit generative model limited the model's scope.

A fresh perspective on the iHMM is provided by the recent development of the Hierarchical Dirichlet Process (HDP) of Teh et al. [1]. The HDP framework considers problems involving related groups of data. In particular, each group of data is modelled by a Dirichlet process (DP) mixture model, with the common base measure of the DPs being itself distributed according to a global DP. This is a hierarchical Bayesian model, and the hierarchy is necessary to ensure that the different DP mixtures share a common set of mixture components. Note that Beal et al. also defined a notion of “hierarchical Dirichlet process”, but theirs is not hierarchical in the Bayesian sense (that is, it does not involve a distribution on the parameters of a Dirichlet process); it is instead a description of the two-level urn model.

Nevertheless, it turns out that the iHMM can be formulated within the HDP framework of Teh et al. (Figure 1 shows the corresponding hierarchical Bayesian model for the iHMM) by viewing it as another instance of the grouped-data formulation, as follows. We assign observations to groups, where the groups are indexed by the value of the previous state variable in the sequence; the current state and emission distributions then define a group-specific mixture model (see the grouping example at the end of this section). In this way the sequential ordering of the data is essential, since the hidden state sequence implicitly defines the partition into groups. Further, the Chinese restaurant franchise (CRF) aspect of the HDP turns out to be equivalent to the two-level urn model of Beal et al. This realisation of the iHMM in the HDP framework may seem straightforward, but there are some non-trivial issues to resolve. First, the HDP framework was developed assuming a fixed partition of the data into groups, whereas in the iHMM the grouping is itself random, induced by the hidden state sequence.
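
To make the two-level urn scheme concrete, here is a minimal Python sketch of drawing a single transition. The data structures and count bookkeeping (trans_counts, oracle_counts, integer state labels) are our own illustrative assumptions rather than code from either paper, and the finer table-level bookkeeping of the Chinese restaurant franchise is deliberately omitted.

```python
import random
from collections import defaultdict

def sample_transition(u, trans_counts, oracle_counts, alpha0, gamma):
    """Draw the next state after state u under the two-level urn scheme.

    trans_counts[u][v] counts past transitions u -> v (first level);
    oracle_counts[v] counts how often the oracle chose v (second level,
    shared across all source states). Both are hypothetical structures,
    updated incrementally here for illustration.
    """
    counts = trans_counts[u]

    # First level: choose an existing transition with probability
    # proportional to its count, or escape to the oracle with
    # probability proportional to alpha0.
    r = random.uniform(0.0, sum(counts.values()) + alpha0)
    for v, c in counts.items():
        r -= c
        if r < 0.0:
            counts[v] += 1
            return v

    # Second level (the oracle): choose a destination in proportion to
    # how often the oracle has chosen it, regardless of the previous
    # state, or a novel state with probability proportional to gamma.
    r = random.uniform(0.0, sum(oracle_counts.values()) + gamma)
    for v, c in oracle_counts.items():
        r -= c
        if r < 0.0:
            oracle_counts[v] += 1
            counts[v] += 1
            return v

    v_new = max(oracle_counts, default=-1) + 1  # label for a novel state
    oracle_counts[v_new] += 1
    counts[v_new] += 1
    return v_new

# Illustrative usage: simulate a short state sequence.
trans_counts = defaultdict(lambda: defaultdict(int))
oracle_counts = defaultdict(int)
state = 0
for _ in range(20):
    state = sample_transition(state, trans_counts, oracle_counts,
                              alpha0=1.0, gamma=1.0)
```

Note how the oracle counts are shared across all source states: a destination popularised under one previous state becomes more probable under every other, which is precisely the tying of destination states that motivated the oracle.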
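
The grouped-data view can be illustrated with an equally small, hypothetical helper: given a hidden state sequence, it collects each (current state, observation) pair into the group indexed by the previous state, so that each group may be modelled by its own DP mixture. The function name and data layout are illustrative choices, not code from either paper.

```python
from collections import defaultdict

def group_by_previous_state(states, observations):
    """Partition (current state, observation) pairs into groups indexed
    by the previous hidden state, mirroring the grouped-data view of the
    iHMM under the HDP. (Illustrative helper only.)"""
    groups = defaultdict(list)
    for t in range(1, len(states)):
        groups[states[t - 1]].append((states[t], observations[t]))
    return groups

# The pair at each time t = 1..4 goes into the group indexed by the
# state at t - 1, so states 0 and 1 below index the resulting groups.
states = [0, 1, 0, 0, 2]
obs = [2.1, -0.3, 0.7, 1.9, 0.2]
print(dict(group_by_previous_state(states, obs)))
```

Because the grouping is read off the hidden state sequence itself, re-sampling the states also re-partitions the data, which is the source of the non-trivial issues noted above.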