Testing Markov Chains without Hitting

We study the problem of identity testing of Markov chains. In this setting, we are given access to a single trajectory from a Markov chain with unknown transition matrix $Q$, and the goal is to determine whether $Q = P$ for some known matrix $P$ or $\text{Dist}(P, Q) \geq \epsilon$, where $\text{Dist}$ is suitably defined. Recent work by Daskalakis, Dikkala and Gravin (2018) showed that the two cases can be distinguished provided the length of the observed trajectory is at least super-linear in the hitting time of $P$, which may be arbitrarily large. In this paper, we propose an algorithm that avoids this dependence on the hitting time, thus enabling efficient testing of Markov chains even when it is infeasible to observe every state in the chain. Our algorithm combines classical ideas from approximation algorithms with techniques for the spectral analysis of Markov chains.
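To make the single-trajectory setting concrete, the following minimal sketch samples one trajectory from a known transition matrix and counts how many states it visits. It is an illustration only, not the algorithm proposed in the paper: the example chain (a lazy random walk on a cycle) and the helper names `sample_trajectory` and `lazy_cycle` are assumptions introduced here. It shows why a hitting-time requirement can be costly: a trajectory that is short relative to the size of the chain leaves many states unobserved.

```python
import random


def sample_trajectory(P, start, length, rng):
    """Sample one trajectory of `length` states from the row-stochastic matrix P."""
    states = list(range(len(P)))
    path = [start]
    for _ in range(length - 1):
        current = path[-1]
        # Draw the next state according to row `current` of P.
        path.append(rng.choices(states, weights=P[current])[0])
    return path


def lazy_cycle(n, stay=0.5):
    """Hypothetical example chain: lazy random walk on an n-cycle."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] += stay
        P[i][(i - 1) % n] += (1 - stay) / 2
        P[i][(i + 1) % n] += (1 - stay) / 2
    return P


if __name__ == "__main__":
    rng = random.Random(0)
    P = lazy_cycle(200)
    path = sample_trajectory(P, start=0, length=100, rng=rng)
    # A trajectory of 100 states cannot cover a 200-state chain.
    print(f"visited {len(set(path))} of {len(P)} states")
```

In this example the trajectory necessarily misses at least half of the states, so any tester that needs to see every state of $P$ would require a much longer observation.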

[1]  K. Pearson On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling , 1900 .

[2]  H. Cramér On the composition of elementary errors .

[3]  M. Bartlett The frequency goodness of fit test for probability chains , 1951, Mathematical Proceedings of the Cambridge Philosophical Society.

[4]  T. W. Anderson,et al.  Statistical Inference about Markov Chains , 1957 .

[5]  P. Billingsley,et al.  Statistical Methods in Markov Chains , 1961 .

[6]  J. Cheeger A lower bound for the smallest eigenvalue of the Laplacian , 1969 .

[7]  J. Bourgain On lipschitz embedding of finite metric spaces in Hilbert space , 1985 .

[8]  Mark Jerrum,et al.  Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains , 1987, WG.

[9]  A. Martin-Löf On the composition of elementary errors , 1994 .

[10]  Nathan Linial,et al.  The geometry of graphs and some of its algorithmic applications , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[11]  Dorit S. Hochbaum,et al.  Approximation Algorithms for NP-Hard Problems , 1996 .

[12]  Miklós Simonovits,et al.  Random walks and an $O^*(n^5)$ volume algorithm for convex bodies , 1997, Random Struct. Algorithms.

[13]  M. Simonovits,et al.  Random walks and an $O^*(n^5)$ volume algorithm for convex bodies , 1997 .

[14]  Dana Ron,et al.  A Sublinear Bipartiteness Tester for Bounded Degree Graphs , 1998, STOC '98.

[15]  Frank Thomson Leighton,et al.  Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms , 1999, JACM.

[16]  Ronitt Rubinfeld,et al.  Testing that distributions are close , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[17]  Ronitt Rubinfeld,et al.  Testing random variables for independence and identity , 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science.

[18]  Ronitt Rubinfeld,et al.  Sublinear algorithms for testing monotone and unimodal distributions , 2004, STOC '04.

[19]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[20]  Satish Rao,et al.  Expander flows, geometric embeddings and graph partitioning , 2004, STOC '04.

[21]  Rocco A. Servedio,et al.  Testing monotone high-dimensional distributions , 2005, STOC '05.

[22]  Luca Trevisan,et al.  Approximation algorithms for unique games , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[23]  Paul Valiant Testing symmetric properties of distributions , 2008, STOC '08.

[24]  Liam Paninski,et al.  A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data , 2008, IEEE Transactions on Information Theory.

[25]  Alessandro Panconesi,et al.  Concentration of Measure for the Analysis of Randomized Algorithms , 2009 .

[26]  Karl Pearson On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling , 2009 .

[27]  Alan Agresti,et al.  Categorical Data Analysis , 2003 .

[28]  Gregory Valiant,et al.  Instance-by-instance optimal identity testing , 2013, Electron. Colloquium Comput. Complex..

[29]  Gábor Lugosi,et al.  Concentration Inequalities - A Nonasymptotic Theory of Independence , 2013, Concentration Inequalities.

[30]  Ilias Diakonikolas,et al.  Optimal Algorithms for Testing Closeness of Discrete Distributions , 2013, SODA.

[31]  Gregory Valiant,et al.  An Automatic Inequality Prover and Instance Optimal Identity Testing , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[32]  Daniel M. Kane,et al.  Testing Identity of Structured Distributions , 2014, SODA.

[33]  Constantinos Daskalakis,et al.  Optimal Testing for Properties of Distributions , 2015, NIPS.

[34]  Ilias Diakonikolas,et al.  Collision-based Testers are Optimal for Uniformity and Closeness , 2016, Electron. Colloquium Comput. Complex..

[35]  Daniel M. Kane,et al.  A New Approach for Testing Properties of Discrete Distributions , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[36]  Clément L. Canonne,et al.  Distribution Testing Lower Bounds via Reductions from Communication Complexity , 2017, Computational Complexity Conference.

[37]  Constantinos Daskalakis,et al.  Testing Symmetric Markov Chains From a Single Trajectory , 2018, COLT.

[38]  Constantinos Daskalakis,et al.  Which Distribution Distances are Sublinearly Testable? , 2017, Electron. Colloquium Comput. Complex..

[39]  Santosh S. Vempala,et al.  The Kannan-Lovász-Simonovits Conjecture , 2018, arXiv:1807.03465.

[40]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[41]  Daniel M. Kane,et al.  Testing Bayesian Networks , 2016, IEEE Transactions on Information Theory.