Directed information

Directed information, $I(X^n \to Y^n)$, is an information theory measure that quantifies the information flow from the random process $X^n = (X_1, X_2, \ldots, X_n)$ to the random process $Y^n = (Y_1, Y_2, \ldots, Y_n)$. The term directed information was coined by James Massey and is defined as

$$I(X^n \to Y^n) \triangleq \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),$$

where $I(X^i; Y_i \mid Y^{i-1})$ is the conditional mutual information $I(X_1, X_2, \ldots, X_i; Y_i \mid Y_1, Y_2, \ldots, Y_{i-1})$.
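For small alphabets and a known joint distribution, this sum can be evaluated directly. The following is a minimal numerical sketch (not part of the original definition), assuming binary alphabets, $n = 2$, and an axis ordering $(x_1, y_1, \ldots, x_n, y_n)$ for the joint pmf; the helper names `marginal`, `cond_mi`, and `directed_info` are illustrative.

```python
import numpy as np

def marginal(p, keep):
    """Marginalize the joint pmf `p` onto the axes in `keep` (keepdims for broadcasting)."""
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    return p.sum(axis=drop, keepdims=True)

def cond_mi(p, A, B, C=()):
    """Conditional mutual information I(A; B | C) in nats, from a joint pmf `p`
    whose axes are listed in the tuples A, B, C. Assumes p > 0 entrywise."""
    pabc = marginal(p, set(A) | set(B) | set(C))
    pc   = marginal(p, set(C))
    pac  = marginal(p, set(A) | set(C))
    pbc  = marginal(p, set(B) | set(C))
    return float((pabc * np.log(pabc * pc / (pac * pbc))).sum())

def directed_info(p, n):
    """I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1}), with x_i on axis 2(i-1)
    and y_i on axis 2(i-1)+1 of the joint pmf `p`."""
    total = 0.0
    for i in range(n):
        A = tuple(2 * k for k in range(i + 1))      # X^{i+1} = (X_1, ..., X_{i+1})
        B = (2 * i + 1,)                            # Y_{i+1}
        C = tuple(2 * k + 1 for k in range(i))      # Y^{i}   = (Y_1, ..., Y_i)
        total += cond_mi(p, A, B, C)
    return total

# Toy example: a random strictly positive joint pmf over (x1, y1, x2, y2).
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2, 2))
p /= p.sum()
print(directed_info(p, n=2))   # I(X^2 -> Y^2) in nats
```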

Directed information has many applications in problems where causality plays an important role, such as the capacity of channels with feedback,[1][2] the capacity of discrete memoryless networks with feedback,[3] gambling with causal side information,[4] compression with causal side information,[5] real-time control communication settings,[6][7] and statistical physics.[8]

Estimation and optimization

Estimating and optimizing the directed information is challenging because it is a sum of $n$ terms, where $n$ may be large. In many cases, one is interested in optimizing the limiting average, that is, the limit of $\frac{1}{n} I(X^n \to Y^n)$ as $n$ grows to infinity, termed a multi-letter expression.

Estimation

Estimating the directed information from given samples is a hard problem, since the directed information expression depends not on the samples themselves but on the joint distribution, which is unknown. Several algorithms exist, based on context tree weighting,[9] on empirical parametric distributions,[10] and on long short-term memory networks.[11] A naive plug-in baseline is sketched below.
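The cited estimators are too involved to reproduce here; as a hedged illustration of the plug-in idea they improve upon, the sketch below truncates all memory to order 1 and averages the empirical log-likelihood ratio $\log \hat{P}(y_i \mid y_{i-1}, x_i) - \log \hat{P}(y_i \mid y_{i-1})$ along a sample path. The function name and the toy data are made up for the example.

```python
import numpy as np
from collections import Counter

def plugin_di_rate(x, y):
    """Naive plug-in estimate of the directed information rate (nats/symbol),
    truncating all memory to order 1. A toy baseline only; the CTW [9] and
    neural [11] estimators handle longer memory and come with guarantees."""
    n = len(x)
    c_yxy, c_yx, c_yy, c_y = Counter(), Counter(), Counter(), Counter()
    for i in range(1, n):                       # collect empirical counts
        c_yxy[(y[i-1], x[i], y[i])] += 1
        c_yx[(y[i-1], x[i])] += 1
        c_yy[(y[i-1], y[i])] += 1
        c_y[y[i-1]] += 1
    total = 0.0
    for i in range(1, n):                       # average the plug-in log-ratio
        p_num = c_yxy[(y[i-1], x[i], y[i])] / c_yx[(y[i-1], x[i])]
        p_den = c_yy[(y[i-1], y[i])] / c_y[y[i-1]]
        total += np.log(p_num / p_den)
    return total / (n - 1)

# Toy data: y is a noisy copy of the current x, so information flows x -> y.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, 10_000)
y = x ^ (rng.random(10_000) < 0.1)
print(plugin_di_rate(x, y))   # roughly ln 2 - H_b(0.1) in nats
```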

Optimization

The maximization of the directed information is a fundamental problem in information theory. For a fixed sequence of channel conditional distributions $\{P(y_i \mid x^i, y^{i-1})\}_{i=1}^{n}$, the objective is to optimize the directed information over the channel input distributions $\{P(x_i \mid x^{i-1}, y^{i-1})\}_{i=1}^{n}$.

There exist algorithms for optimizing the directed information based on Blahut-Arimoto,[12] Markov decision processes,[13][14][15][16] recurrent neural networks,[11] reinforcement learning,[17] and graphical methods (the Q-graphs).[18][19] For Blahut-Arimoto,[12] the main idea is to start with the last element of the directed information and go backward. For Markov decision processes,[13][14][15][16] the main idea is to transform the optimization into an infinite-horizon average-reward Markov decision process. For recurrent neural networks,[11] the main idea is to model the input distribution with a recurrent neural network and optimize its parameters by gradient descent. For reinforcement learning,[17] the main idea is to solve the Markov decision process formulation of the capacity using reinforcement learning tools, which makes it possible to handle large or even continuous alphabets.
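The backward recursion of the extended algorithm[12] is beyond a short example; the sketch below instead shows the classical Blahut-Arimoto iteration it builds on, for the special case of a memoryless channel without feedback, where the per-symbol directed information reduces to ordinary mutual information. The channel matrix is a made-up example.

```python
import numpy as np

def blahut_arimoto(W, iters=200):
    """Classical Blahut-Arimoto for a discrete memoryless channel with
    transition matrix W[x, y] = P(y | x). Returns the capacity-achieving
    input distribution and the capacity in nats. Assumes W > 0 entrywise."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])    # start from the uniform input
    for _ in range(iters):
        q = p[:, None] * W                       # joint p(x) W(y|x)
        q /= q.sum(axis=0, keepdims=True)        # posterior q(x|y)
        logp = (W * np.log(q)).sum(axis=1)       # p(x) ∝ exp(Σ_y W(y|x) log q(x|y))
        p = np.exp(logp - logp.max())
        p /= p.sum()
    joint = p[:, None] * W                       # I(X;Y) under the final input
    py = joint.sum(axis=0)
    cap = float((joint * np.log(joint / (p[:, None] * py[None, :]))).sum())
    return p, cap

W = np.array([[0.9, 0.1],      # made-up binary channel P(y|x)
              [0.2, 0.8]])
p_opt, C = blahut_arimoto(W)
print(p_opt, C)                # input distribution and capacity (nats/use)
```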

Causal conditioning

The essence of directed information is the causal conditioning operation. Causal conditioning is denoted by $P(x^n \| y^n)$. Recall the notation $x^n \triangleq (x_1, x_2, \ldots, x_n)$, and likewise for $y^n$; the probability of $x^n$ causally conditioned on $y^n$ is denoted by $P(x^n \| y^n)$ and is defined as

$$P(x^n \| y^n) \triangleq \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{i}).$$

Note that this is very similar to the chain rule for regular conditioning, i.e., $P(x^n \mid y^n) = \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{n})$; the difference is the conditioning on the sequence $y^i$ versus $y^n$. One can also introduce a delay into the causal conditioning,[3] i.e.,

$$P(x^n \| y^{n-1}) \triangleq \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{i-1}).$$
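Reusing `marginal` and the toy joint pmf `p` from the sketch above, causal conditioning can be evaluated directly from the joint distribution and contrasted with regular conditioning; `prob` and the other names are again illustrative.

```python
def prob(p, assign):
    """Marginal probability of a partial assignment {axis: value} under pmf `p`."""
    m = marginal(p, set(assign))
    return float(m[tuple(assign.get(ax, 0) for ax in range(p.ndim))])

def causal_cond_x_given_y(p, n, x, y):
    """P(x^n || y^n) = prod_i P(x_i | x^{i-1}, y^i)."""
    val = 1.0
    for i in range(n):
        num = {**{2*k: x[k] for k in range(i + 1)}, **{2*k+1: y[k] for k in range(i + 1)}}
        den = {**{2*k: x[k] for k in range(i)},     **{2*k+1: y[k] for k in range(i + 1)}}
        val *= prob(p, num) / prob(p, den)
    return val

x, y = (0, 1), (1, 0)
causal  = causal_cond_x_given_y(p, 2, x, y)
regular = prob(p, {0: x[0], 2: x[1], 1: y[0], 3: y[1]}) / prob(p, {1: y[0], 3: y[1]})
print(causal, regular)   # generally differ: conditioning on y^i versus the full y^n
```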

Properties

The chain rule for causal conditioning[2] is

$$P(x^n, y^n) = P(x^n \| y^{n-1}) \, P(y^n \| x^n).$$

The chain rule implies two things: first, any joint distribution $P(x^n, y^n)$ can be decomposed into a product of the causally conditioned distributions $P(x^n \| y^{n-1})$ and $P(y^n \| x^n)$; and second, any product of valid causally conditioned distributions $P(x^n \| y^{n-1})$ and $P(y^n \| x^n)$ yields a joint distribution $P(x^n, y^n)$.
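The chain rule can be checked numerically. The sketch below reuses `prob` and the toy joint pmf `p` from the previous examples; the two helper functions implement the delayed and reversed causal conditionings that appear in the rule.

```python
def causal_cond_x_given_delayed_y(p, n, x, y):
    """P(x^n || y^{n-1}) = prod_i P(x_i | x^{i-1}, y^{i-1})."""
    val = 1.0
    for i in range(n):
        num = {**{2*k: x[k] for k in range(i + 1)}, **{2*k+1: y[k] for k in range(i)}}
        den = {**{2*k: x[k] for k in range(i)},     **{2*k+1: y[k] for k in range(i)}}
        val *= prob(p, num) / prob(p, den)
    return val

def causal_cond_y_given_x(p, n, x, y):
    """P(y^n || x^n) = prod_i P(y_i | y^{i-1}, x^i)."""
    val = 1.0
    for i in range(n):
        num = {**{2*k+1: y[k] for k in range(i + 1)}, **{2*k: x[k] for k in range(i + 1)}}
        den = {**{2*k+1: y[k] for k in range(i)},     **{2*k: x[k] for k in range(i + 1)}}
        val *= prob(p, num) / prob(p, den)
    return val

# Chain rule: P(x^n, y^n) = P(x^n || y^{n-1}) * P(y^n || x^n), for every (x^n, y^n).
lhs = prob(p, {0: 0, 1: 1, 2: 1, 3: 0})
rhs = (causal_cond_x_given_delayed_y(p, 2, (0, 1), (1, 0))
       * causal_cond_y_given_x(p, 2, (0, 1), (1, 0)))
print(abs(lhs - rhs) < 1e-12)   # True
```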

The causal conditioning probability $P(y^n \| x^n)$ is indeed a probability vector (in $y^n$, for each $x^n$), i.e.,

$$P(y^n \| x^n) \geq 0,$$

$$\sum_{y^n} P(y^n \| x^n) = 1.$$

Directed information can be written in terms of causal conditioning as follows:

$$I(X^n \to Y^n) = \mathbf{E}\left[\log \frac{P(Y^n \| X^n)}{P(Y^n)}\right].$$
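Reusing the helpers from the sketches above, this identity can be verified against the conditional-mutual-information definition for the toy joint pmf:

```python
import itertools

# I(X^n -> Y^n) = E[ log P(Y^n || X^n) / P(Y^n) ], checked against the definition.
di = 0.0
for x in itertools.product((0, 1), repeat=2):
    for y in itertools.product((0, 1), repeat=2):
        pxy = prob(p, {0: x[0], 2: x[1], 1: y[0], 3: y[1]})
        py  = prob(p, {1: y[0], 3: y[1]})
        di += pxy * np.log(causal_cond_y_given_x(p, 2, x, y) / py)
print(abs(di - directed_info(p, 2)) < 1e-12)   # True
```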

Conservation law of information

This law, established by James Massey and his son P. C. Massey,[20] gives a good intuition for the directed information and its relation to mutual information. The law states that for any $X^n, Y^n$, the following equality holds:

$$I(X^n; Y^n) = I(X^n \to Y^n) + I(Y^{n-1} \to X^n),$$

where $I(Y^{n-1} \to X^n) \triangleq \sum_{i=1}^{n} I(Y^{i-1}; X_i \mid X^{i-1})$ denotes the directed information from the delayed sequence $Y^{n-1}$ to $X^n$.
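Reusing `cond_mi` and `directed_info` from the first sketch, the law can be verified numerically for the toy joint pmf:

```python
# Conservation law: I(X^n; Y^n) = I(X^n -> Y^n) + I(Y^{n-1} -> X^n).
n = 2
mi = cond_mi(p, A=(0, 2), B=(1, 3))              # I(X^2; Y^2)
di_fwd = directed_info(p, n)                     # I(X^2 -> Y^2)
di_delayed = sum(                                # sum_i I(Y^{i-1}; X_i | X^{i-1})
    cond_mi(p,
            A=tuple(2*k + 1 for k in range(i)),  # Y^{i} (empty for i = 0)
            B=(2*i,),                            # X_{i+1}
            C=tuple(2*k for k in range(i)))      # X^{i}
    for i in range(n))
print(abs(mi - (di_fwd + di_delayed)) < 1e-12)   # True
```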

References

  1. Massey, James (1990). "Causality, Feedback And Directed Information". Proc. 1990 Intl. Symp. on Info. Th. and its Applications, Waikiki, Hawaii, Nov. 27-30, 1990.
  2. Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory. 55 (2): 644–662. arXiv:cs/0608070. doi:10.1109/TIT.2008.2009849. S2CID 13178.
  3. Kramer, G. (January 2003). "Capacity results for the discrete memoryless network". IEEE Transactions on Information Theory. 49 (1): 4–21. doi:10.1109/TIT.2002.806135.
  4. Permuter, Haim H.; Kim, Young-Han; Weissman, Tsachy (June 2011). "Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing". IEEE Transactions on Information Theory. 57 (6): 3248–3259. arXiv:0912.4872. doi:10.1109/TIT.2011.2136270. S2CID 11722596.
  5. Simeone, Osvaldo; Permuter, Haim Henri (June 2013). "Source Coding When the Side Information May Be Delayed". IEEE Transactions on Information Theory. 59 (6): 3607–3618. arXiv:1109.1293. doi:10.1109/TIT.2013.2248192. S2CID 3211485.
  6. Charalambous, Charalambos D.; Stavrou, Photios A. (August 2016). "Directed Information on Abstract Spaces: Properties and Variational Equalities". IEEE Transactions on Information Theory. 62 (11): 6019–6052. arXiv:1302.3971. doi:10.1109/TIT.2016.2604846. S2CID 8107565.
  7. Tanaka, Takashi; Esfahani, Peyman Mohajerin; Mitter, Sanjoy K. (January 2018). "LQG Control With Minimum Directed Information: Semidefinite Programming Approach". IEEE Transactions on Automatic Control. 63 (1): 37–52. arXiv:1510.04214. doi:10.1109/TAC.2017.2709618. S2CID 1401958.
  8. Vinkler, Dror A; Permuter, Haim H; Merhav, Neri (20 April 2016). "Analogy between gambling and measurement-based work extraction". Journal of Statistical Mechanics: Theory and Experiment. 2016 (4): 043403. arXiv:1404.6788. Bibcode:2016JSMTE..04.3403V. doi:10.1088/1742-5468/2016/04/043403. S2CID 124719237.
  9. Jiao, Jiantao; Permuter, Haim H.; Zhao, Lei; Kim, Young-Han; Weissman, Tsachy (October 2013). "Universal Estimation of Directed Information". IEEE Transactions on Information Theory. 59 (10): 6220–6242. arXiv:1201.2334. doi:10.1109/TIT.2013.2267934. S2CID 10855063.
  10. Quinn, Christopher J.; Kiyavash, Negar; Coleman, Todd P. (December 2015). "Directed Information Graphs". IEEE Transactions on Information Theory. 61 (12): 6887–6909. arXiv:1204.2003. doi:10.1109/TIT.2015.2478440. S2CID 3121664.
  11. Aharoni, Ziv; Tsur, Dor; Goldfeld, Ziv; Permuter, Haim Henry (June 2020). "Capacity of Continuous Channels with Memory via Directed Information Neural Estimator". 2020 IEEE International Symposium on Information Theory (ISIT): 2014–2019. arXiv:2003.04179. doi:10.1109/ISIT44484.2020.9174109. ISBN 978-1-7281-6432-8. S2CID 212634151.
  12. Naiss, Iddo; Permuter, Haim H. (January 2013). "Extension of the Blahut–Arimoto Algorithm for Maximizing Directed Information". IEEE Transactions on Information Theory. 59 (1): 204–222. arXiv:1012.5071. doi:10.1109/TIT.2012.2214202. S2CID 3115749.
  13. Permuter, Haim; Cuff, Paul; Van Roy, Benjamin; Weissman, Tsachy (July 2008). "Capacity of the Trapdoor Channel With Feedback". IEEE Transactions on Information Theory. 54 (7): 3150–3165. arXiv:cs/0610047. doi:10.1109/TIT.2008.924681. S2CID 1265.
  14. Elishco, Ohad; Permuter, Haim (September 2014). "Capacity and Coding for the Ising Channel With Feedback". IEEE Transactions on Information Theory. 60 (9): 5138–5149. arXiv:1205.4674. doi:10.1109/TIT.2014.2331951. S2CID 9761759.
  15. Sabag, Oron; Permuter, Haim H.; Kashyap, Navin (January 2016). "The Feedback Capacity of the Binary Erasure Channel With a No-Consecutive-Ones Input Constraint". IEEE Transactions on Information Theory. 62 (1): 8–22. doi:10.1109/TIT.2015.2495239. S2CID 476381.
  16. Peled, Ori; Sabag, Oron; Permuter, Haim H. (July 2019). "Feedback Capacity and Coding for the (0,k)-RLL Input-Constrained BEC". IEEE Transactions on Information Theory. 65 (7): 4097–4114. arXiv:1712.02690. doi:10.1109/TIT.2019.2903252. S2CID 86582654.
  17. Aharoni, Ziv; Sabag, Oron; Permuter, Haim Henri (18 August 2020). "Reinforcement Learning Evaluation and Solution for the Feedback Capacity of the Ising Channel with Large Alphabet". arXiv:2008.07983 [cs.IT].
  18. Sabag, Oron; Permuter, Haim Henry; Pfister, Henry (March 2017). "A Single-Letter Upper Bound on the Feedback Capacity of Unifilar Finite-State Channels". IEEE Transactions on Information Theory. 63 (3): 1392–1409. arXiv:1604.01878. doi:10.1109/TIT.2016.2636851. S2CID 3259603.
  19. Sabag, Oron; Huleihel, Bashar; Permuter, Haim Henry (2020). "Graph-Based Encoders and their Performance for Finite-State Channels with Feedback". IEEE Transactions on Communications. 68 (4): 2106–2117. arXiv:1907.08063. doi:10.1109/TCOMM.2020.2965454. S2CID 197544824.
  20. Massey, J.L.; Massey, P.C. (September 2005). "Conservation of mutual and directed information". Proceedings. International Symposium on Information Theory, 2005. ISIT 2005.: 157–158. doi:10.1109/ISIT.2005.1523313. ISBN 0-7803-9151-9. S2CID 38053218.