
Theory and Modern Applications

Generalized kinetic theory of coarse-grained systems. I. Partial equilibrium and Markov approximations

Abstract

The general kinetic theory of coarse-grained systems is presented in the abstract formalism of communication theory developed by Shannon and Weaver, Khinchin and Kolmogorov. The martingale theory shows that, under reasonable, general hypotheses, coarse-grained systems can be approximated by generalized Markov systems. For mixing systems, the Kolmogorov entropy production can be defined for nonstationary processes as Kolmogorov defined it for stationary processes.

1 Introduction

The purpose of this article (and the one to follow) is to define a generalized kinetic theory of classical thermodynamic systems at a coarse-grained level (see Sect. 2 for definitions). The microscopic evolution of the system induces an evolution on the coarse-grained states, which is generally non-Markovian.

In the same context, it has been shown recently [1] that coarse-grained deterministic dynamical systems can be approximated by generalized Markov systems, which may explain why Markov processes are so popular in modeling actual phenomena. These conclusions were obtained by applying and extending some pioneering results of Kolmogorov [2–4]. The formalism used in our previous works was relatively intuitive, if sometimes lengthy, but it was sufficient for our first aim. However, we had to adopt some hypotheses that seemed reasonable but were difficult to justify precisely.

In the present work, we adopt a more abstract and rigorous formalism and show that the previous results can be generalized to a much broader framework, as mentioned above. This formalism is the one used in communication theory by Shannon-Weaver [5] and by Khinchin [6] to define optimal coding, by Kolmogorov [2, 3] (and also [4] for a pedagogical exposition) to define entropic invariants of dynamical systems. It was also introduced in [7] in the Markovian situation only.

The system evolution is specified by a stationary distribution on the path space \(X^{Z}\), where X is the finite set of coarse-grained states, and Z represents the discrete time (see Sect. 2). At the coarse-grained level, the stationary evolution is not Markovian, but the advantage is that the evolution takes place on the finite state space X and that we avoid all controversial discussions concerning ergodicity and time scales for reaching equilibrium [8, 9].

Sections 2 and 3 fix notations and definitions and give basic examples. Section 4 introduces nonstationary processes: the initial condition is not the stationary state on X, but the evolution is given by the stationary process. It corresponds to the notion of partial equilibrium of Landau-Lifschitz [10]. We define the entropy production of both processes and show that they are equal, assuming a mixing property in Sect. 5. Although this result seems obvious, its proof is quite lengthy.

In Sect. 6, we address the main question of kinetic theory, namely why the evolution can be approximated by a Markovian evolution, as in the theory of Brownian motion, Fokker-Planck equation, etc. Obviously, one also has to use a coarse-grained time scale. We define various Markovian evolutions and prove that they approximate the exact evolution on the coarse-grained time scale using the production of relative entropy.

We want to dedicate this article to the memory of Prof. Mark Kac, who introduced one of us to the problems of justification of the Markov processes in statistical mechanics.

2 Notations and definitions

In this article, X denotes a finite set. Elements of X are denoted as \(x\in X\). Z is the set of all integers (positive, zero, or negative).

2.1 The spaces \(X^{Z}\), X

\(X^{Z}\) is the space of sequences (\(x(n)\)), \(n\in Z\), \(x(n) \in X\).

We define the shift τ: \(X^{Z} \rightarrow X^{Z}\) by

$$ \tau \bigl(x(.)\bigr) (n) = x(n - 1) $$
(2.1)

Let \(I = \{ i_{1}, \ldots, i_{l} \}\) be a finite subset of Z ordered by \(i_{1} < i_{2} < \cdots < i_{l}\). We define for \(k \in Z\)

$$ I + k = \{ i_{1} + k, \ldots, i_{l} + k \} $$
(2.2)

\(X^{I}\) denotes the set of maps \(x(I): I \rightarrow X\).

In expanded notations, we write

$$ x(I) \equiv ( x_{1}, i_{1}; x_{2}, i_{2}; \ldots; x_{l}, i_{l} ) $$
(2.3)

with

$$ x_{k} = x(I) (i_{k}) $$

The shift is also defined as \(\tau : X^{I} \to X^{I + 1}\) by

$$ \tau \bigl( x(I) \bigr) \equiv ( x_{1}, i_{1} + 1; x_{2}, i_{2} + 1; \ldots; x_{l}, i_{l} + 1 ) $$
(2.4)

for \(x(I)\) given by (2.3).

If \(J \subset I \) is a subset of I, \(x(I) \vert _{J} \) is the restriction of the map \(x(I)\) to the subset J.

Finally, if \(m \leq n\), and \(I = \{ m, m + 1, \ldots, n \}\) is the interval of integers between m and n, we denote

$$ x(I) \equiv x[m, n] $$
(2.5)

2.2 Probabilities on \(X^{\boldsymbol{Z}}\)

A stochastic process on X is the data of a system of probabilities \(p_{I}\) on \(X^{I}\) for all finite sets \(I \subset Z \), with the compatibility conditions:

$$\begin{aligned}& \text{if } J \subset I \text{ and } x(J) \in X^{J}, \text{then} \\& p_{J} \bigl(x(J)\bigr) = \sum_{ \{ x(I) \in X^{I} \vert x(I) \vert _{J} = x(J) \}} p_{I} \bigl(x(I)\bigr) \end{aligned}$$
(2.6)

Obviously, the \(p_{I}\) are known as soon as the \(p_{[m,n]}\) are known.

It is known that a system of probabilities \(p_{I}\) satisfying the compatibility conditions of Eq. (2.6) defines a probability p on \(X^{Z}\), the \(p_{I}\) being the marginal laws of p:

$$ p_{{I}} \bigl(x(I)\bigr) = p \bigl( x(Z) \in X^{Z} \bigl\vert x(Z) \bigr\vert _{I} = x(I) \bigr) $$
(2.7)

This result is Kolmogorov's extension theorem [11]. The probability p is defined on the measurable subsets of \(X^{Z}\). By definition, the subset appearing in Eq. (2.7) is measurable.
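For instance, a minimal Python sketch (with a hypothetical two-state Bernoulli process, anticipating Sect. 3(a)) can check the compatibility condition (2.6) by marginalization:

```python
import itertools

# Hypothetical two-state example: X = {0, 1} with a Bernoulli (product)
# process p_I(x(I)) = prod_i mu(x(I)(i)), as in Sect. 3(a).
X = [0, 1]
mu = {0: 0.7, 1: 0.3}

def p_I(path):
    """Cylinder probability p_[0, n](x[0, n]) for the product process."""
    prob = 1.0
    for x in path:
        prob *= mu[x]
    return prob

# Compatibility (2.6) with J = [0, 1] inside I = [0, 2]: summing p_I over
# all extensions x(I) that restrict to x(J) recovers p_J.
for xJ in itertools.product(X, repeat=2):
    lhs = p_I(xJ)
    rhs = sum(p_I(xJ + (z,)) for z in X)
    assert abs(lhs - rhs) < 1e-12
```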

The stochastic process p is stationary if for any measurable set \(A \subset X^{Z}\)

$$ p\bigl(\tau (A)\bigr) = p(A) $$

or equivalently, for any \(x(I) \in X^{I}\), and any I

$$\begin{aligned}& p_{I + 1}\bigl(\tau \bigl(x(I)\bigr)\bigr) = p_{I}\bigl(x(I)\bigr) \end{aligned}$$
(2.8)

In particular, if p is stationary, it defines a unique probability distribution \(p_{0}\) on X by

$$ p_{0}(x) = p_{[n]}\bigl((x,n)\bigr) $$

which is independent of n.

Remark

(Conventions and definition)

(i) In order to simplify the notations, we shall skip the index I of \(p_{I}\) whenever it is clear that we refer to \(p_{I}\). For instance, we write \(p(x(I))\) instead of \(p_{I}(x(I))\).

(ii) When we use conditional probabilities, the condition is always in the past: for instance, if \(m < n\)

$$ p\bigl(x(n) | x[m,n - 1]\bigr) \equiv p_{[m,n]}\bigl(x(n) | x[m,n - 1]\bigr) \equiv \frac{p_{[m,n]}(x[m,n])}{p_{[m,n - 1]}(x[m,n - 1])} $$

(iii) According to the usual definition [4], the stochastic process p is ergodic if any measurable set \(B \subset X^{Z}\) invariant under τ has probability \(p(B)= 0\) or 1.

2.3 Coarse-graining

Let A be a partition of X: the elements \(a\in A\) are subsets \(a\subset X\) such that

$$\begin{aligned}& X = \bigcup_{a \in A} a \end{aligned}$$
(2.9)
$$\begin{aligned}& a \cap a ' = \emptyset \quad \text{if }a ' \ne a \end{aligned}$$
(2.10)

A probability q on X generates a probability \(q^{(A)}\) on A by

$$ q^{(A)}(a) = \sum_{x \in a} q(x) $$
(2.11)

The partition A on X induces the partitions \(A^{Z}\) of \(X^{Z}\) and \(A^{I}\) of \(X^{I}\). The stochastic process p induces a stochastic process \(p^{(A)}\) defined by

$$ p_{I}^{(A)}\bigl(a(I)\bigr) = \sum _{x(I) \in a(I)} p_{I}\bigl(x(I)\bigr) $$
(2.12)

where the notation \(x(I) \in a(I)\) means

$$ \bigl( x(I) \in a(I)\bigr) \quad \Leftrightarrow \quad \bigl( x(I) (i) \in a(I) (i) \text{ for all } i \in I \bigr) $$
(2.13)

\(p^{(A)}\) is a coarse-grained process of p. If p is stationary, \(p^{(A)}\) is stationary. If p is ergodic, \(p^{(A)}\) is ergodic.

Such coarse-grained processes are extensively used in physics and applied sciences when inaccurate observations do not allow one to distinguish two different elements x belonging to the same subset a of A [1, 12].
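As an illustration, the following Python sketch (hypothetical states and probabilities) implements Eqs. (2.11)–(2.12) for a Bernoulli process, for which the coarse-grained process is again Bernoulli on A:

```python
import itertools

# Hypothetical fine-grained states and a partition A of X (Eqs. (2.9)-(2.10))
X = ['u1', 'u2', 'd']
A = {'up': {'u1', 'u2'}, 'down': {'d'}}
mu = {'u1': 0.3, 'u2': 0.2, 'd': 0.5}   # a probability on X

def coarse(q):
    """Eq. (2.11): the probability q^(A) induced on A."""
    return {a: sum(q[x] for x in cell) for a, cell in A.items()}

def p_path(path):
    """Bernoulli process on X, Eq. (3.1)."""
    prob = 1.0
    for x in path:
        prob *= mu[x]
    return prob

def pA_path(a_path):
    """Eq. (2.12): coarse-grained path probability, summing p over all
    fine-grained paths x(I) with x(I)(i) in a(I)(i) for every i."""
    return sum(p_path(xs) for xs in itertools.product(*(A[a] for a in a_path)))

qA = coarse(mu)   # here {'up': 0.5, 'down': 0.5}
# For a Bernoulli p, the coarse-grained process is again Bernoulli on A:
assert abs(pA_path(('up', 'down')) - qA['up'] * qA['down']) < 1e-12
```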

3 Examples

We mention only a few well-known processes that are of interest to us.

(a) Bernoulli processes

Let μ be a probability on X. The Bernoulli process defined by μ is

$$ p_{I}\bigl(x(I)\bigr) = \prod_{i \in I} \mu \bigl(x(I) (i)\bigr) $$
(3.1)

It is stationary. It is ergodic if and only if \(\mu (x) > 0\) for any \(x\in X\).

(b) Markov processes

Let \(R = (R_{yx})\), \(y,x \in X \) be a stochastic matrix, so

$$ \sum_{y} R_{yx} = 1,\quad 0 \le R_{yx} \le 1 $$
(3.2)

Let μ be a stationary probability for R:

$$ \mu (y) = \sum_{x} R_{yx} \mu (x) $$
(3.3)

Then, we define a stochastic process by

$$ p_{[m,n]} \bigl(x[m,n]\bigr) = R_{x_{n}x_{n - 1}} \cdots R_{x_{k}x_{k - 1}} \cdots R_{x_{m + 1}x_{m}} \mu (x_{m}) $$
(3.4)

where

$$ x[m,n] = (x_{m}, m; x_{m + 1}, m + 1; \ldots; x_{n}, n ) $$
(3.5)

This process is stationary. It is ergodic if and only if R is irreducible. The Bernoulli process (a) is the particular case \(R_{yx} = \mu (y)\).
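The following Python sketch (hypothetical transition matrix) computes the stationary probability (3.3) by iteration and the path probabilities (3.4); it also illustrates the point made in the introduction: a coarse-graining (Sect. 2.3) of a Markov process is generally not Markov:

```python
import itertools

# Hypothetical 3-state chain; column convention of Eq. (3.2): R[y][x] = R_yx
# is the probability of a transition from x to y, so each column sums to 1.
states = [0, 1, 2]
R = [[0.5, 0.1, 0.3],
     [0.3, 0.8, 0.1],
     [0.2, 0.1, 0.6]]

# Stationary mu of Eq. (3.3), obtained here by power iteration
mu = [1 / 3] * 3
for _ in range(200):
    mu = [sum(R[y][x] * mu[x] for x in states) for y in states]
assert all(abs(mu[y] - sum(R[y][x] * mu[x] for x in states)) < 1e-12
           for y in states)

def p_path(path):
    """Eq. (3.4): stationary Markov path probability."""
    prob = mu[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= R[y][x]
    return prob

# Coarse-graining (Sect. 2.3) with A = {a, b}, a = {0, 1}, b = {2}: the
# induced process p^(A) is generally NOT Markov (cf. the introduction).
A = {'a': [0, 1], 'b': [2]}
def pA(a_path):
    return sum(p_path(xs) for xs in itertools.product(*(A[c] for c in a_path)))

one_step = pA(('a', 'b')) / pA(('a',))           # P(b | a)
two_step = pA(('b', 'a', 'b')) / pA(('b', 'a'))  # P(b | a, earlier b)
assert abs(one_step - two_step) > 1e-3           # memory beyond one step
```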

(c) Dynamical systems

These systems are of special interest to physics (see [1] and Remark 2 below). Let \((M, \mathcal{M}, \mu )\) be a probability space so that M is a measurable space with a σ-algebra \(\mathcal{M} \) of measurable subsets and μ a probability defined on \(\mathcal{M}\).

Let f: \(M\rightarrow M\) be a measurable bijection, which is measure-preserving, namely

$$ \mu \bigl(f^{ - 1}(\mathcal{B})\bigr) = \mu (\mathcal{B}),\quad \mathcal{B} \in \mathcal{M} $$
(3.6)

Let X be a finite partition of M in measurable subsets. We define a coarse-grained, stochastic process on X by the formula

$$ p_{[m,n]} \bigl(x[m,n]\bigr) = \mu \bigl( x_{m} \cap f^{ - 1}(x_{m + 1}) \cap \cdots \cap f^{ - k}(x_{m + k}) \cap \cdots \cap f^{m - n}(x_{n}) \bigr) $$
(3.7)

where \(x[m,n] \in X^{[m,n]}\) is given by Eq. (3.5).

Then \(p_{[m,n]} (x[m,n])\) is the measure of the subset of elements \(z\in M\) with

$$ z \in x_{m},\quad f(z) \in x_{m + 1}, \ldots, f^{n - m}(z) \in x_{n}. $$

This process is stationary. It is ergodic if f is ergodic (i.e., if the only measurable subset \(\mathcal{B}\) of M invariant by f is of measure \(\mu (\mathcal{B}) = 0\) or 1).
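A minimal sketch of formula (3.7), for a toy measure-preserving bijection on a five-point space with uniform measure (a hypothetical example, not from the text):

```python
from fractions import Fraction

# Toy dynamical system (hypothetical): M = Z/5Z with the measure-preserving
# bijection f(z) = z + 2 (mod 5) and mu = uniform counting measure.
M = list(range(5))
def f_iter(z, k):
    """k-th iterate f^k(z)."""
    return (z + 2 * k) % 5

# A finite partition X of M into two measurable cells
X = {'a': {0, 1, 2}, 'b': {3, 4}}

def p_path(cells):
    """Eq. (3.7): mu of { z in M : f^k(z) in cells[k] for all k }."""
    good = [z for z in M
            if all(f_iter(z, k) in X[c] for k, c in enumerate(cells))]
    return Fraction(len(good), len(M))

# Compatibility (2.6): marginalizing the second coordinate recovers p('a')
assert sum(p_path(('a', c)) for c in X) == p_path(('a',))
assert p_path(('a',)) == Fraction(3, 5)
```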

Remark 1

This definition, due to Kolmogorov, was introduced to define entropic invariants of dynamical systems [4].

Remark 2

A particularly interesting example [4] is the case when M is a phase space, and f is a Hamiltonian map (i.e., the map given by the solution of the Hamilton equation at a given time) and μ is the volume on M, which is preserved by f because of the Liouville theorem.

4 Changing initial conditions: definition of the nonstationary process \(\overline{p}\). Production of entropy

4.1 Definition of a particular nonstationary process

Let A be a partition of X. The elements of A are subsets \(a \subset X\) satisfying Eqs. (2.9)–(2.10).

Let p be a stationary process on X, and q be a probability on A. These two data determine a process on X that is a probability on \(X^{N}\) given by the formulas

$$\begin{aligned}& \overline{p}_{[0,n]} \bigl(x[0,n]\bigr) = \frac{q(a(x_{0}))}{p_{0}(a(x_{0}))} p \bigl(x[0,n]\bigr)\quad (n \ge 0) \end{aligned}$$
(4.1)
$$\begin{aligned}& \overline{p}_{[m,n]} \bigl(x[m,n]\bigr) = \sum _{x[0,m - 1]} \frac{q(a(x_{0}))}{p_{0}(a(x_{0}))} p\bigl(x[0,n]\bigr) \quad (0 < m \le n) \end{aligned}$$
(4.2)

Here

$$ \begin{gathered} x[0,n] \in X^{[0,n]} \\ a(x_{0}) \in A \quad \text{is the unique } a \text{ such that } x_{0} \in a \\ p_{0}(a) = \sum_{x \in a} p_{0}(x),\quad p_{0} \text{ being } p_{[0]}. \end{gathered} $$

Definitions (4.1)–(4.2) show that the distributions \(\overline{p}_{[0,n]}\) and \(\overline{p}_{[m,n]}\) satisfy the compatibility conditions and define a probability on \(X^{N}\), and hence a stochastic process (indexed by the integers ≥0) on X. This stochastic process is nonstationary (indeed, being indexed by the integers ≥0, stationarity is meaningless for it).

The initial distribution is

$$ \overline{p}_{0}(x_{0}) = \frac{q(a(x_{0}))}{p(a(x_{0}))} p_{0}(x_{0}) $$
(4.3)

and the distribution at time n is

$$ \overline{p}_{n}(x_{n}) = \sum _{x[0,n - 1]} \frac{q(a(x_{0}))}{p(a(x_{0}))} p\bigl(x[0,n - 1], x_{n} \bigr) $$
(4.4)

where \((x[0, n - 1], x_{n})\) denotes the path \(( x(0), 0; x(1), 1; \ldots; x(n - 1), n - 1; x_{n}, n )\).
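The construction (4.1)–(4.4) can be sketched in Python for a hypothetical stationary Markov p, together with a numerical check that the reweighting by \(q(a(x_{0}))/p_{0}(a(x_{0}))\) cancels in conditional probabilities (the content of Lemma 4.1):

```python
# Hypothetical stationary Markov p (Sect. 3(b)) and a partition A of X;
# we build the nonstationary process pbar of Eqs. (4.1)-(4.4).
states = [0, 1, 2]
R = [[0.5, 0.1, 0.3], [0.3, 0.8, 0.1], [0.2, 0.1, 0.6]]
mu = [0.25, 0.5, 0.25]                    # stationary for this R

A = {'a': [0, 1], 'b': [2]}               # partition of X
q = {'a': 0.9, 'b': 0.1}                  # new initial condition on A

def p_path(path):
    """Eq. (3.4): stationary path probability."""
    prob = mu[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= R[y][x]
    return prob

def cell(x):
    """a(x_0): the unique cell of A containing x."""
    return next(c for c, xs in A.items() if x in xs)

pA0 = {c: sum(mu[x] for x in A[c]) for c in A}   # p_0(a)

def pbar_path(path):
    """Eq. (4.1): reweight p by q(a(x_0)) / p_0(a(x_0))."""
    return q[cell(path[0])] / pA0[cell(path[0])] * p_path(path)

# Conditioning on the past back to time 0 cancels the reweighting
path = (0, 2, 1)
assert abs(pbar_path(path) / pbar_path(path[:1])
           - p_path(path) / p_path(path[:1])) < 1e-12

pbar0 = {x: pbar_path((x,)) for x in states}     # Eq. (4.3)
assert abs(sum(pbar0.values()) - 1) < 1e-12
```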

Convention

As previously mentioned, we skip the indices I for \(\overline{p}_{I}\) when there is no possible confusion.

Lemma 4.1

The conditional probabilities of the process \(\overline{p}\) with the condition starting at time 0 are identical to the corresponding conditional probabilities of the process p; so, for \(0< k \leq n\)

$$ \overline{p} \bigl( x[k,n] | x[0,k - 1]\bigr) = p \bigl(x[k,n] | x[0,k - 1] \bigr) $$
(4.5)

The proof is obvious using (4.1).

4.2 Entropy and relative entropy

If Z is a finite set, if \(|Z|\) is the number of its elements, and if p, q are probabilities on Z, we define the entropy of p and the relative entropy of p and q by the usual formulas [7]

$$\begin{aligned}& S(p) = - \sum_{z \in Z} p(z) \ln p(z) \ge 0 \end{aligned}$$
(4.6)
$$\begin{aligned}& S(p| q) = \sum_{z \in Z} p(z) \ln \frac{p(z)}{q(z)} \ge 0 \end{aligned}$$
(4.7)

One has

$$ \begin{gathered} 0 \le S(p) \le \ln \vert Z \vert \\ S(p | q) \ge 0 \quad \text{and}\quad S(p | q) = 0 \text{ if and only if }p = q. \end{gathered} $$
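These formulas and bounds translate directly into code; a minimal Python sketch:

```python
import math

def entropy(p):
    """Eq. (4.6): S(p) = -sum p ln p (with the convention 0 ln 0 = 0)."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def relative_entropy(p, q):
    """Eq. (4.7): S(p|q) = sum p ln(p/q); requires q(z) > 0 where p(z) > 0."""
    return sum(v * math.log(v / q[z]) for z, v in p.items() if v > 0)

p = {'x': 0.5, 'y': 0.25, 'z': 0.25}
q = {'x': 1 / 3, 'y': 1 / 3, 'z': 1 / 3}

assert 0 <= entropy(p) <= math.log(3)          # 0 <= S(p) <= ln|Z|
assert relative_entropy(p, q) >= 0             # S(p|q) >= 0
assert abs(relative_entropy(p, p)) < 1e-12     # S(p|q) = 0 iff p = q
assert abs(entropy(q) - math.log(3)) < 1e-12   # uniform law saturates ln|Z|
```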

4.3 Path entropy

For the stationary process, the nonstationary process, and any positive integer n, we define the path entropy \(S_{n}\)

$$\begin{aligned}& S_{n}(p) = S(p_{[0,n]} ) = - \sum _{x[0,n]} p\bigl(x[0,n]\bigr) \ln p\bigl(x[0,n] \bigr) \end{aligned}$$
(4.8)
$$\begin{aligned}& S_{n}(\overline{p}) = S(\overline{p}_{[0,n]} ) = - \sum _{x[0,n]} \overline{p}\bigl(x[0,n]\bigr) \ln \overline{p} \bigl(x[0,n] \bigr) \end{aligned}$$
(4.9)

Lemma 4.2

(a) One has the following identities:

$$ S_{n}(p) = S(p_{0}) + \sum_{k = 1}^{n} d_{k}S_{k}(p) $$
(4.10)

where

$$ d_{k}S_{k}(p) = S_{k}(p) - S_{k - 1}(p) $$
(4.11)

and the same identities with \(\overline{p}\) instead of p.

(b) One has

$$\begin{aligned}& d_{k} S_{k}(p) = \sum_{x[0,k - 1]} p \bigl(x[0,k - 1]\bigr) S \bigl( p\bigl(. | x[0,k - 1]\bigr) \bigr) \ge 0 \end{aligned}$$
(4.12)
$$\begin{aligned}& d_{k} S_{k}(\overline{p}) = \sum _{x[0,k - 1]} \overline{p}\bigl(x[0,k - 1]\bigr) S \bigl( \overline{p}\bigl(. | x[0,k - 1]\bigr) \bigr) \ge 0 \end{aligned}$$
(4.13)

Proof

(a) is trivial. On the other hand, one has

$$\begin{aligned} d_{k} S_{k}(p) =& - \sum p \bigl(x[0,k]\bigr) \ln p\bigl(x[0,k]\bigr) + \sum p\bigl(x[0,k - 1]\bigr) \ln p\bigl(x[0,k - 1]\bigr) \\ =& - \sum p\bigl(x[0,k]\bigr) \ln \frac{p(x[0,k])}{p(x[0,k - 1])} \\ =& - \sum_{x[0,k - 1]} p\bigl(x[0,k - 1]\bigr)\sum _{x(k)} \frac{p(x[0,k])}{p(x[0,k - 1])} \ln \frac{p(x[0,k])}{p(x[0,k - 1])} \end{aligned}$$

which is (4.12). Similarly, we derive (4.13) using Lemma 4.1. □

Lemma 4.3

(a) For the stationary process p, one has the identity

$$\begin{aligned}& d_{k} S_{k}(p) - d_{k - 1} S_{k - 1}(p) \\& \quad = - \sum_{x[ - k, - 1]} p \bigl( x[ - k, - 1] \bigr) S \bigl( p\bigl(. | x[ - k, - 1] \bigr) | p\bigl(. | x[ - k + 1, - 1] \bigr) \bigr) \le 0 \end{aligned}$$

(b) \(d_{k} S_{k}(p)\) is a decreasing sequence with a limit \(s(p)\).

Proof

Using the definition of \(d_{k} S_{k}\) (Eq. (4.12)), one has by stationarity of p

$$\begin{aligned}& d_{k} S_{k}(p) - d_{k - 1} S_{k - 1}(p) \\& \quad = - \Bigl[ \sum p\bigl(x[0, k]\bigr) \ln p \bigl( x(k) | x[0, k - 1] \bigr) \\& \qquad {}- \sum p\bigl(x[0, k - 1]\bigr) \ln p \bigl( x(k - 1) | x[0, k - 2] \bigr) \Bigr] \\& \quad = - \Bigl[ \sum p\bigl(x[ - k, 0]\bigr) \ln p \bigl( x(0) | x[ - k, - 1] \bigr) \\& \qquad {}- \sum p\bigl(x[ - k + 1, 0]\bigr) \ln p \bigl( x(0) | x[ - k + 1, - 1] \bigr) \Bigr] \\& \quad = - \sum p\bigl(x[ - k, 0]\bigr) \ln \frac{p ( x(0) | x[ - k, - 1] )}{p ( x(0) | x[ - k + 1, - 1] )} \\& \quad \equiv - \sum p\bigl(x[ - k, - 1]\bigr) S \bigl( p\bigl(. | x[ - k, - 1]\bigr) | p\bigl(. | x[ - k + 1, - 1]\bigr) \bigr) \le 0 \end{aligned}$$

 □

4.4 The case of a stationary probability p

In this case, we will use a theorem that was first presented in Ref. [1], using the concept of martingale (see, for instance, [13, 14], or [12] for a simplified definition).

Theorem 4.4

For \(x = x(0)\in X\), the sequence of random variables \(p ( x \vert x[ - k, - 1] )\) is a martingale with respect to the sequence \(\mathcal{F}_{k}\) of σ-algebras generated by \(x[-k,-1]\). Moreover, these random variables are nonnegative and bounded by 1.

Using this theorem, Lemma 4.3(b), and the identity (4.10), one obtains the result of Kolmogorov-Shannon [4, 6]:

Theorem 4.5

For the stationary process p, one has

(a) \(d_{k}S_{k}(p)\) has a limit \(s(p)\) for \(k \rightarrow \infty \)

(b) One has

$$ \lim_{n \to \infty} \frac{1}{n} S_{n}(p) = s(p) $$
(4.14)

Definition

\(s(p)\) is the (asymptotic) production of entropy per unit time of the process p.

For completeness, the proofs of Theorems 4.4 and 4.5 are given in Appendix A.
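As a numerical illustration (a hypothetical Markov chain, for which \(d_{k}S_{k}(p)\) is constant in k, so the limit \(s(p)\) of Theorem 4.5 is reached immediately):

```python
import itertools, math

# Hypothetical 3-state stationary Markov chain (Sect. 3(b))
states = [0, 1, 2]
R = [[0.5, 0.1, 0.3], [0.3, 0.8, 0.1], [0.2, 0.1, 0.6]]
mu = [0.25, 0.5, 0.25]   # stationary for this R

def p_path(path):
    """Eq. (3.4): stationary path probability."""
    prob = mu[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= R[y][x]
    return prob

def S_n(n):
    """Path entropy S_n(p), Eq. (4.8), over x[0, n]."""
    return -sum(p * math.log(p)
                for path in itertools.product(states, repeat=n + 1)
                for p in [p_path(path)] if p > 0)

# For a Markov chain, d_k S_k(p) (Eq. (4.11)) is already constant in k:
d = [S_n(k) - S_n(k - 1) for k in (1, 2, 3)]
assert max(d) - min(d) < 1e-9

# ... and equals the Markov entropy production -sum mu(x) R ln R
s_markov = -sum(mu[x] * R[y][x] * math.log(R[y][x])
                for x in states for y in states if R[y][x] > 0)
assert abs(d[0] - s_markov) < 1e-9
```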

Remark 3

In the special case where p comes from a dynamical system (Eq. (3.7)), it is proved in [3] that \(d_{k}S_{k}(p)\) is a decreasing sequence, and there is no need to use martingale theory.

4.5 Non-stationary probability

In the nonstationary case, the production of entropy at time k is \(d_{k}S_{k}(\overline{p} )\). The asymptotic entropy production of \(\overline{p}\) is well defined if \(d_{k}S_{k}(\overline{p} )\) tends to a limit when \(k \rightarrow \infty \). As shown later, further hypotheses are necessary for such a limit to exist. In the general case, we can only prove that

$$ C' s(p) \le \lim \inf d_{k}S_{k}( \overline{p}) \le \lim \sup d_{k}S_{k}(\overline{p}) \le C'' s(p) $$
(4.15)

where \(C'\) (resp. \(C''\)) is the lower (resp. upper) bound of \(q(a)/p_{0}(a)\) over \(a \in A\) (\(0\leq C' \leq 1 \leq C''\)).

Proof

These inequalities straightforwardly result from Eqs. (4.1)–(4.2) and Theorem 4.4. □

5 Production of entropy for a nonstationary distribution

5.1 Mixing process

We say that the stationary process p is a mixing process if, for any fixed \(k \geq 0\),

$$ \lim_{n \to \infty} p \bigl( x(0), x[n,n + k] \bigr) = p_{0}\bigl(x(0)\bigr) p \bigl( \tau ^{ - n} \bigl(x[n,n + k]\bigr) \bigr) $$
(5.1)

In expanded notations, this means that for \(n \rightarrow \infty \)

$$\begin{aligned} &p \bigl( x(0), 0; x_{n}, n; x_{n + 1}, n + 1; \ldots; x_{n + k}, n + k \bigr) \\ &\quad \to p\bigl(x(0), 0\bigr) p ( x_{n}, 0; x_{n + 1}, 1; \ldots; x_{n + k}, k ) \end{aligned}$$
(5.2)

for any choice of the elements \(x(0), x_{n}, x_{n + 1}, \ldots, x_{n + k}\) of X.

The mixing property implies ergodicity [4].

Theorem 5.1

If p is mixing, the nonstationary process \(\overline{p}\) defined in Sect. 4 has an asymptotic distribution on X, which is \(p_{0}(x)\).

Proof

The asymptotic distribution of \(\overline{p}\) (if it exists) is

$$ \begin{aligned} \overline{p}_{\infty} (x) &= \lim _{n \to \infty} \sum_{x[0,n - 1]} \overline{p} \bigl(x[0,n]\bigr) \quad \text{with }x(n) = x \text{ fixed} \\ &= \lim_{n \to \infty} \sum_{x(0)} \overline{p}\bigl(x(0), 0; x(n), n\bigr) \\ &= \lim_{n \to \infty} \sum_{a \in A} \frac{q(a)}{p(a)}\sum_{x(0) \in a} p\bigl(x(0), 0; x(n), n \bigr) \end{aligned} $$
(5.3)

But \(p(x(0), 0; x(n), n) \to p_{0}(x(0)) p_{0}(x)\) by the mixing property. As the sum over \(a\in A\) is finite, the limit in Eq. (5.3) exists, and it is \(p_{0}(x)\). □
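Theorem 5.1 can be illustrated numerically. In the sketch below, p is a hypothetical mixing Markov chain, so by Lemma 4.1 the evolution (4.4) of \(\overline{p}_{n}\) reduces to repeated application of the transition matrix; this is a simplifying assumption for illustration only:

```python
# Hypothetical 3-state mixing Markov chain (all transition probabilities
# positive) with stationary mu; the nonstationary initial distribution
# (4.3) relaxes to mu, as Theorem 5.1 states.
states = [0, 1, 2]
R = [[0.5, 0.1, 0.3], [0.3, 0.8, 0.1], [0.2, 0.1, 0.6]]
mu = [0.25, 0.5, 0.25]                 # stationary for this R
A = {'a': [0, 1], 'b': [2]}
q = {'a': 0.9, 'b': 0.1}               # initial condition on A (Sect. 4.1)

pA0 = {c: sum(mu[x] for x in A[c]) for c in A}
def cell(x):
    return next(c for c, xs in A.items() if x in xs)

pbar = [q[cell(x)] / pA0[cell(x)] * mu[x] for x in states]   # Eq. (4.3)
for _ in range(200):   # Eq. (4.4), reduced to iterating R (p is Markov here)
    pbar = [sum(R[y][x] * pbar[x] for x in states) for y in states]

assert all(abs(pbar[x] - mu[x]) < 1e-10 for x in states)
```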

5.2 Production of entropy for \(\overline{p}\) when p is mixing. The main theorem

Theorem 5.2

Assume that p is a mixing process. Then

(a) \(d_{n}S_{n}(\overline{p})\) has the limit \(s(p)\);

(b) \(\lim_{n} \frac{1}{n}S_{n}(\overline{p}) = s(p)\).

Then \(\overline{p}\) has a well-defined production of entropy, which is the same as the entropy production of p.

The proof of this basic theorem is the consequence of successive partial results, which are postponed to Sect. 5.4 and completed in Appendix B.

5.3 Production of entropy for a mixing process p

Theorem 5.3

If p is a mixing process, one has

$$\begin{aligned}& d_{k}S_{k}(p) = \lim_{n \to \infty} \sum p \bigl( x(0), x[n, n + k - 1] \bigr) S \bigl( p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \end{aligned}$$
(5.4)

the sum being taken over \(x(0) \in X\) and over \(x[n, n + k - 1] \in X^{[n, n + k - 1]}\).

Proof

The mixing property (5.1) implies that the conditional probability \(p_{[0, n + k]} ( x(n + k) | x(0), x[n, n + k - 1] )\) has a limit:

$$\begin{aligned}& \lim_{n \to \infty} p_{[0, n + k]} \bigl( x(n + k) | x(0), x[n, n + k - 1] \bigr) \\& \quad = p_{[0, k]} \bigl( x(n + k) | \tau ^{ - n}\bigl(x[n, n + k - 1]\bigr) \bigr) \end{aligned}$$
(5.5)

where the limit is taken with a fixed k and fixed \(x(0), x(n),\ldots,x( n + k - 1), x(n + k)\). Then

$$ \begin{gathered} - p\bigl(. | x(0), x[n, n + k - 1]\bigr) \ln \bigl( p\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \\ \quad \to - p \bigl(. | \tau ^{ - n}\bigl(x[n,n + k - 1]\bigr) \bigr) \ln p \bigl(. | \tau ^{ - n}\bigl(x[n,n + k - 1]\bigr) \bigr) \end{gathered} $$
(5.6)

and all these quantities are uniformly bounded by \(\max_{0 \leq \alpha \leq 1} |\alpha \ln \alpha |\). As X is finite, we can sum (5.6) on \(x(n+k)\) and obtain

$$ S \bigl( p\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \to S \bigl( p \bigl(. | \tau ^{ - n} x[n, n + k - 1]\bigr) \bigr) $$
(5.7)

while staying uniformly bounded. By the Lebesgue theorem of dominated convergence in \(L^{1}(X^{Z}, p)\), we have [15]

$$ \mathrm{E}_{ p} \bigl\{ S \bigl( p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \bigr\} \to \mathrm{E}_{ p} \bigl\{ S \bigl( p \bigl(. | \tau ^{ - n} x[n, n + k - 1]\bigr) \bigr) \bigr\} $$
(5.8)

where \(\mathrm{E}_{p}\{\}\) is the mathematical expectation for the measure p. Now, the first term in Eq. (5.8) is

$$\begin{aligned}& \sum_{x(0), x[n, n + k - 1]} p\bigl(x(0), x[n, n + k - 1]\bigr) S \bigl( p\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \end{aligned}$$

and the last member in Eq. (5.8) is, using (4.12)

$$ \sum_{x[0, k - 1]} p\bigl(x[0, k - 1]\bigr) S \bigl( p\bigl(. | x[0, k - 1]\bigr) \bigr) \equiv d_{k}S_{k}(p) $$

which proves Eq. (5.4). □

Theorem 5.4

If p is a mixing process, for any probability q on A and for the associated process \(\overline{p}\) defined in Sect. 4, one has

$$ d_{k}S_{k}(p) = \lim_{n \to \infty} \sum \overline{p} \bigl( x(0), x[n, n + k - 1] \bigr) S \bigl( \overline{p}\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) $$
(5.9)

where k is fixed and the sum is over \(x(0)\), \(x[n, n+k-1]\).

Proof

Using the definition of \(\overline{p}\) and Lemma 4.1, one has

$$ \begin{gathered} \sum \overline{p} \bigl( x(0), x[n, n + k - 1] \bigr) S \bigl( \overline{p}\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \\ \quad \equiv \mathrm{E}_{ p} \biggl\{ \frac{q(a(x(0)))}{p(a(x(0)))} S \bigl( p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \biggr\} \\ \quad = \sum_{a \in A} \frac{q(a)}{p(a)} \mathrm{E}_{ p} \bigl\{ \mathbf{1}_{a}\bigl(x(0)\bigr) S \bigl( p \bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \bigr\} \end{gathered} $$
(5.10)

where \(\mathbf{1}_{a}\) is the characteristic function of the subset a of X. By the mixing property Eq. (5.1) and Lebesgue theorem of dominated convergence, we have

$$ \begin{gathered} \lim_{n \to \infty} \mathrm{E}_{ p} \bigl\{ \mathbf{1}_{a}\bigl(x(0)\bigr) S \bigl( p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \bigr\} \\ \quad = p(a) \mathrm{E}_{ p} \bigl\{ S \bigl( p\bigl(. | \tau ^{ - n}x[n, n + k - 1] \bigr) \bigr) \bigr\} \\ \quad \equiv p(a) \sum p\bigl(x[0, k - 1]\bigr) S \bigl( p\bigl(. | x[0, k - 1] \bigr) \bigr) \equiv p(a) d_{k}S_{k}(p) \end{gathered} $$
(5.11)

where we have used (4.12). So, by Eq. (5.10), we have proved Eq. (5.9). □

5.4 Proof of Theorem 5.2

We first derive several successive lemmas.

Lemma 5.5

Let

  • \(q(x, y, z)\) be a probability distribution on three variables x, y, z taking discrete values,

  • \(q(x)\) and \(q(x, y)\) the corresponding marginal laws,

  • \(q(z|x)\) and \(q(z|x, y)\) the corresponding conditional laws of z.

Denote by \(S_{Z}\) the entropy of the probability distribution of z. Then, we have the identity

$$\begin{aligned}& \sum_{x} q(x) S_{Z}\bigl(q(. | x)\bigr) - \sum _{x, y} q(x, y) S_{Z}\bigl(q(. | x, y)\bigr) = \sum_{x, y} q(x, y) S_{Z} \bigl( q(. | x, y) | q(. | x) \bigr) \ge 0 \end{aligned}$$
(5.12)

Proof

We apply the definitions to the first member to obtain identity (5.12). □

Lemma 5.6

One has the identity

$$\begin{aligned}& \sum \overline{p} \bigl( x(0), x[n, n + k - 1] \bigr) S \bigl( \overline{p}\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \\& \qquad {}- \sum \overline{p} \bigl( x[0, n + k - 1] \bigr) S \bigl( \overline{p}\bigl(. | x[0, n + k - 1]\bigr) \bigr) \\& \quad = \sum \overline{p} \bigl( x[0, n + k - 1] \bigr) S \bigl( \overline{p}\bigl(. | x[0, n + k - 1]\bigr) | \overline{p}\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \end{aligned}$$
(5.13)

where each summation is over the variables appearing in the concerned probabilities. For instance, the first sum on the left is over \(x(0), x(n),\ldots, x(n+k-1)\).

Proof of Lemma 5.6

We apply Lemma 5.5 to \(q = \overline{p}\) with the substitutions

$$ x \to \bigl( x(0),x[n, n + k - 1] \bigr), \qquad y \to x[1, n - 1], \qquad z \to x(n + k) $$

so that \((x, y) \to x[0, n + k - 1]\). □

Lemma 5.7

We have

$$\begin{aligned}& \lim_{k \to \infty} \lim_{n \to \infty} \sum \overline{p}\bigl(x[0, n + k - 1]\bigr) \\& \quad {}\times S \bigl( \overline{p}\bigl(. | x[0, n + k - 1] \bigr) | \overline{p}\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) = 0 \end{aligned}$$
(5.14)

This lemma states that the second member of (5.13) tends to 0 when n, and then k, tend to infinity, which may seem intuitive from the definition of mixing. However, a rigorous proof of Lemma 5.7 requires several further steps, as shown in Appendix B. It allows one to complete the proof of the basic Theorem 5.2.

End of the proof of Theorem 5.2

We start with the identity (5.13) of Lemma 5.6. The second term of its first member is just

$$ - \sum \overline{p}\bigl(x[0, n + k - 1]\bigr) S \bigl( \overline{p} \bigl(. | x[0, n + k - 1]\bigr) \bigr) \equiv - d_{n + k}S_{n + k}( \overline{p}) $$
(5.15)

Using Theorem 5.4, Eq. (5.9), the limit when \(n \rightarrow \infty \) of the first term of (5.13) is

$$ \lim_{n \to \infty} \sum \overline{p}\bigl(x(0),x[n, n + k - 1]\bigr) S \bigl( \overline{p}\bigl(. | x(0), x[n, n + k - 1]\bigr) \bigr) \equiv d_{k}S_{k}(p) $$
(5.16)

Now, taking in the identity (5.13) the limits when \(n \rightarrow \infty \) and then \(k \rightarrow \infty \), the second member tends to 0 by Lemma 5.7, and by Eq. (5.16) the first term of (5.13) tends to \(d_{k}S_{k}(p)\) and then to \(s(p)\). So, the second term, \(d_{n + k}S_{n + k}(\overline{p})\), has a limit, which is \(s(p)\). Thus, we have proved that the nonstationary process \(\overline{p}\) has a production of entropy \(s(\overline{p}) = s(p)\). □

6 Markov approximations

6.1 The process \(p^{(T)}\) of memory T associated to p

In general, the process p has an infinite memory. Let T be a positive integer. We define a process \(p^{(T)}\) on X of memory T associated to the process p by the formulas

$$ \begin{gathered} p^{(T)}\bigl(x[0, k]\bigr) = p\bigl(x[0, k] \bigr)\quad \text{if }0 \le k \le T - 1 \\ p^{(T)}\bigl(x[0, k]\bigr) = p\bigl(x[0, T - 1]\bigr) \prod _{j = T}^{k} p \bigl( x(j) | x[j - T, j - 1] \bigr) \quad \text{if }k \ge T \end{gathered} $$
(6.1)
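A Python sketch of the construction (6.1). Since p should have a long memory, we use as a stand-in a coarse-grained hidden-Markov process with hypothetical numbers (hidden chain on \(Y=\{0,1,2\}\), observed through the partition a = {0,1}, b = {2} of Sect. 2.3):

```python
import itertools

# Hypothetical hidden 3-state chain, observed through a two-cell partition.
Y = [0, 1, 2]
Rh = [[0.5, 0.1, 0.3], [0.3, 0.8, 0.1], [0.2, 0.1, 0.6]]
mu = [0.25, 0.5, 0.25]                     # stationary hidden distribution
A = {'a': [0, 1], 'b': [2]}

def p_path(cells):
    """Observed cylinder probability p(x[0, n]) via a forward vector."""
    v = [mu[y] if y in A[cells[0]] else 0.0 for y in Y]
    for c in cells[1:]:
        v = [sum(Rh[y][x] * v[x] for x in Y) if y in A[c] else 0.0 for y in Y]
    return sum(v)

def pT_path(cells, T):
    """Eq. (6.1): the memory-T process p^(T) built from p."""
    k = len(cells) - 1
    if k <= T - 1:
        return p_path(cells)
    prob = p_path(cells[:T])
    for j in range(T, k + 1):
        prob *= p_path(cells[j - T:j + 1]) / p_path(cells[j - T:j])
    return prob

# p^(T) agrees with p on windows of length <= T and is a genuine process:
assert abs(pT_path(('a', 'b'), 2) - p_path(('a', 'b'))) < 1e-12
for T in (1, 2):
    total = sum(pT_path(c, T) for c in itertools.product('ab', repeat=4))
    assert abs(total - 1) < 1e-12
```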

Distance between p and \(p^{(T)}\)

An asymmetric “distance” between p and \(p^{(T)}\) for n-step trajectories can be evaluated from the relative entropy of these two processes (see Sect. 6.4 below):

$$ S \bigl( p_{[0, n]}| p_{ [0, n]}^{(T)} \bigr) $$

This quantity is related to the total variation distance between \(p_{[0, n]}\) and \(p_{ [0, n]}^{(T)}\), as shown in Sect. 6.4.

Theorem 6.1

For every \(\varepsilon > 0\), there exists a time \(T_{\varepsilon }\) such that for \(n \geq T\geq T_{\varepsilon}\), one has

$$ 0 \le \frac{1}{n} S \bigl( p_{[0, n]}| p_{ [0, n]}^{(T)} \bigr) \le \varepsilon $$
(6.2)

So, the distance between \(p_{[0,n]}\) and \(p_{ [0, n]}^{(T)}\) tends to 0 when \(n \rightarrow \infty \).

Proof

Using the definition of the relative entropy, Eq. (4.7), one has

$$ S \bigl( p_{[0, n]}| p_{ [0, n]}^{(T)} \bigr) = - S ( p_{[0, n]} ) - \sum_{x[0,n]} p\bigl(x[0,n]\bigr) \ln p^{(T)}\bigl(x[0,n]\bigr) $$
(6.3)

On the other hand, it follows from Eq. (4.10) that

$$ - S ( p_{[0, n]} ) = - S(p_{0}) - \sum _{k = 1}^{n} d_{k} S_{k}(p) $$
(6.4)

and by definition (6.1), if \(n \geq T\)

$$ \begin{gathered} - \sum_{x[0,n]} p \bigl(x[0,n]\bigr) \ln p^{(T)}\bigl(x[0,n]\bigr) \\ \quad = S\bigl(p_{[0,T - 1]}\bigr)- \sum_{j = T}^{n} \sum_{x[j - T,j]} p\bigl(x[j - T,j]\bigr) \ln p\bigl(x(j) | x [ j - T,j - 1]\bigr) \end{gathered} $$
(6.5)

By the stationarity of p, this is

$$ \begin{gathered} S(p_{[0,T - 1]}) - \sum _{j = T}^{n} \sum_{x[0,T]} p \bigl(x[0,T]\bigr) \ln p\bigl(x(T) | x [ 0,T - 1]\bigr) \\ \quad = S(p_{[0,T - 1]}) + (n - T + 1) d_{T}S_{T}(p) \end{gathered} $$
(6.6)

From Eqs. (6.4), (6.6), and (6.3), we obtain

$$ S \bigl( p_{[0, n]}| p_{ [0, n]}^{(T)} \bigr) = \sum_{k = T}^{n} \bigl( d_{T}S_{T}(p) - d_{k}S_{k}(p) \bigr) $$
(6.7)

According to Lemma 4.3, \(d_{k}S_{k}(p)\) decreases when k increases, so each term of the sum in Eq. (6.7) is ≥0, and \(d_{k}S_{k}(p) \to s(p)\) by Theorem 4.5. Choose \(T_{\varepsilon}\) so that for \(k \geq T \geq T_{\varepsilon }\)

$$ s(p) \le d_{k}S_{k}(p) \le s(p) + \varepsilon $$
(6.8)

Then, for \(n \geq T \geq T_{\varepsilon }\)

$$ S \bigl( p_{[0, n]}| p_{[0, n]}^{(T)} \bigr) \le (n - T_{\varepsilon} ) \varepsilon $$
(6.9)

which completes the proof of Theorem 6.1. □
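The key identity (6.7) of the proof can be checked numerically; in the sketch below, p is a coarse-grained (hence non-Markov) toy process with hypothetical numbers:

```python
import itertools, math

# Hypothetical hidden 3-state chain observed through a two-cell partition
Y = [0, 1, 2]
Rh = [[0.5, 0.1, 0.3], [0.3, 0.8, 0.1], [0.2, 0.1, 0.6]]
mu = [0.25, 0.5, 0.25]
A = {'a': [0, 1], 'b': [2]}

def p_path(cells):
    """Observed cylinder probability p(x[0, n]) via a forward vector."""
    v = [mu[y] if y in A[cells[0]] else 0.0 for y in Y]
    for c in cells[1:]:
        v = [sum(Rh[y][x] * v[x] for x in Y) if y in A[c] else 0.0 for y in Y]
    return sum(v)

def pT_path(cells, T):
    """Memory-T approximation p^(T) of Eq. (6.1)."""
    k = len(cells) - 1
    if k <= T - 1:
        return p_path(cells)
    prob = p_path(cells[:T])
    for j in range(T, k + 1):
        prob *= p_path(cells[j - T:j + 1]) / p_path(cells[j - T:j])
    return prob

def S_path(m):
    """Path entropy S_m(p), Eq. (4.8)."""
    return -sum(p * math.log(p)
                for c in itertools.product('ab', repeat=m + 1)
                for p in [p_path(c)] if p > 0)

def dS(k):
    return S_path(k) - S_path(k - 1)     # Eq. (4.11)

def S_rel(n, T):
    """Direct computation of S(p_[0,n] | p^(T)_[0,n]), Eq. (4.7)."""
    return sum(p * math.log(p / pT_path(c, T))
               for c in itertools.product('ab', repeat=n + 1)
               for p in [p_path(c)] if p > 0)

n, T = 4, 2
rhs = sum(dS(T) - dS(k) for k in range(T, n + 1))   # Eq. (6.7)
assert abs(S_rel(n, T) - rhs) < 1e-9
assert rhs >= -1e-12     # each term is >= 0 since d_k S_k(p) decreases
```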

6.2 Partial histories of length T

Definitions

A partial history of length T is an element of \(X^{T}\). The nth history of length T is

$$ x^{(T)}(n) = \bigl( x(nT), x(nT + 1), \ldots, x\bigl((n + 1)T - 1 \bigr) \bigr) \in X^{T} $$
(6.10)

If M and N are positive integers (\(M < N\)), a sequence of partial histories is

$$ x^{(T)}[M, N] = \bigl( x^{(T)}(M), \ldots, x^{(T)}(N) \bigr) $$
(6.11)

We also denote by \(\tau ^{(T)}\) the translation of time T on histories of length T.

Theorem 6.2

(a) The process \(p^{(T)}\) induces a Markov process \(\tilde{p}^{(T)}\) on partial histories of length T by the formulas

$$ \begin{gathered} \tilde{p}^{(T)} \bigl(x^{(T)}(0) \bigr) = p\bigl(x[0, T - 1]\bigr) \\ \tilde{p}^{(T)} \bigl(x^{(T)}[0, N]\bigr) = p^{(T)} \bigl(x\bigl[0, (N + 1)T - 1\bigr] \bigr) \end{gathered} $$
(6.12)

The transition probabilities between histories of length T are

$$ R \bigl( x^{(T)}(1)| x^{(T)}(0) \bigr) = \prod _{j = T}^{2T - 1} p \bigl( x(j)| x[j - T, j - 1] \bigr) $$
(6.13)

where, according to Eq. (6.10)

$$ x^{(T)}(0) = \bigl( x(0), \ldots, x(T - 1) \bigr),\qquad x^{(T)}(1) = \bigl( x(T), \ldots, x(2 T - 1) \bigr) $$
(6.14)

(b) The stationary probability of the Markov process \(\tilde{p}^{(T)}\) is \(\tilde{p}^{(T)}(x{}^{(T)}(0))\).

(c) The production of entropy of \(\tilde{p}^{(T)}\) is

$$ s\bigl(\tilde{p}^{(T)}\bigr) = \sum_{x^{{(T)}}(0)} \tilde{p}^{(T)} \bigl( x^{(T)}(0) \bigr) S \bigl( R\bigl(. | x^{(T)}(0)\bigr) \bigr) $$
(6.15)

where \(S ( R(. | x^{(T)}(0)) )\) is the entropy of the probability distribution \(x^{(T)}(1) \to R ( x^{(T)}(1) | x^{(T)}(0) )\).

Proof

Using the definitions of Eqs. (6.13) and (6.1), we have

$$\begin{aligned} \begin{aligned} \tilde{p}^{(T)}\bigl(x^{(T)}[0,N] \bigr) &= \tilde{p}^{(T)}\bigl(x^{(T)}(0)\bigr) \prod _{j = T}^{(N + 1)T - 1} p \bigl( x(j) | x[j - T, j - 1] \bigr) \\ &\equiv \tilde{p}^{(T)}\bigl(x^{(T)}(0)\bigr) \prod _{k = 1}^{N} R \bigl( x^{(T)}(k) | x^{(T)}(k - 1) \bigr) \end{aligned} \end{aligned}$$
(6.16)

We now show that \(\tilde{p}^{(T)}(x^{(T)}(0))\) is the stationary probability. This will prove that (6.16) is the usual formula [16] for a Markov process (Eq. (3.4) in Sect. 3). One has

$$\begin{aligned}& \sum_{x^{(T)}(0)} R \bigl( x^{(T)}(1) | x^{(T)}(0) \bigr) \tilde{p}^{(T)} \bigl(x^{(T)}(0)\bigr) \\& \quad = \sum_{x[0, T - 1]} p\bigl(x[0,T - 1]\bigr) p \bigl( x(T) | x[0,T - 1] \bigr) p \bigl( x(T + 1) | x[1,T] \bigr) \cdots \\& \qquad {}\cdots p \bigl( x(2T - 1) | x[T - 1, 2T - 2] \bigr) \\& \quad = \sum_{x[0, T - 1]} p\bigl(x[0,T]\bigr) \frac{p ( x[1, T + 1] )}{p ( x[1, T] )} p \bigl( x(T + 2) | x[2,T + 1] \bigr) \cdots \\& \quad = \sum_{x[1, T - 1]} p\bigl(x[1,T + 1]\bigr) \frac{p ( x[2, T + 2] )}{p ( x[2, T + 1] )} \cdots = \cdots \\& \quad = p \bigl( x[T, 2T - 1] \bigr) \\& \quad = p \bigl( \tau ^{ - T} x[T, 2T - 1] \bigr) = \tilde{p}^{(T)} \bigl( \tau ^{(T) - 1} x^{(T)}(1) \bigr) \end{aligned}$$
(6.17)

So, \(\tilde{p}^{(T)}(x^{(T)}(0))\) is indeed the stationary probability of the Markov process \(\tilde{p}^{(T)}\).

On the other hand, Eq. (6.15) is just the usual formula for the entropy production of a Markov process [16]. □
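The stationarity check of Eq. (6.17) and the entropy-production formula of Eq. (6.15) can be verified numerically in the elementary case of an ordinary finite Markov chain, where the kernel R is just a transition matrix. The following sketch is purely illustrative; the matrix is an arbitrary example, not taken from the article.

```python
import numpy as np

# Transition matrix R of a two-state Markov chain (rows sum to 1);
# an arbitrary example standing in for the kernel of Eq. (6.13).
R = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary probability: the left eigenvector of R with eigenvalue 1.
w, v = np.linalg.eig(R.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

# Stationarity, as in Eq. (6.17): sum_x R(y|x) pi(x) = pi(y).
assert np.allclose(pi @ R, pi)

# Entropy production, Eq. (6.15): s = sum_x pi(x) S(R(.|x)),
# where S(R(.|x)) is the Shannon entropy of row x of R.
s = float(pi @ (-np.sum(R * np.log(R), axis=1)))
print(s)  # ≈ 0.3835 nats per step for this example
```

Here the stationary probability is \((2/3, 1/3)\), and the entropy production is the \(\pi\)-weighted average of the row entropies.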

6.3 Comparison of p and \(\tilde{p}^{(T)}\)

The process p induces a stationary process on the partial histories of length T (denoted \(p_{T}\)) by

$$ p_{T} \bigl( x^{(T)}[M,N] \bigr) = p \bigl( x\bigl[MT, (N + 1)T - 1\bigr] \bigr) $$
(6.18)

The process \(p_{T}\) is exactly the same as p, except that it is restricted to integer multiples of the period T.

The entropy production of \(p_{T}\) is

$$ s_{T}(p_{T}) = T s(p) $$
(6.19)
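Equation (6.19) can be checked directly in the simplest case of an i.i.d. process, for which \(s(p)\) is the entropy of the one-time marginal and the blocks of length T are themselves i.i.d. on \(X^{T}\). A minimal sketch, with an arbitrarily chosen marginal:

```python
import itertools
import math

# One-time marginal of an i.i.d. process on X = {0, 1} (arbitrary example).
q = {0: 0.3, 1: 0.7}
H = -sum(v * math.log(v) for v in q.values())  # s(p) = H(q) for i.i.d. p

T = 4
# Distribution of one block x^(T) on X^T, and its entropy.
block = {w: math.prod(q[a] for a in w) for w in itertools.product(q, repeat=T)}
H_T = -sum(v * math.log(v) for v in block.values())

# Eq. (6.19): the entropy production per block is T times s(p).
assert abs(H_T - T * H) < 1e-12
```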

So, one can rewrite Theorem 6.1 as follows.

Theorem 6.3

Denote by \(S_{T}(. | .)\) the relative entropy of two processes defined on histories of length T. Then, for any \(\epsilon > 0\), there exists \(T_{\epsilon}\), independent of N, such that for \(T \geq T_{\epsilon}\) one has

$$ \frac{1}{NT} S_{T} \bigl( p_{T, [0, N]} | \tilde{p}^{(T)}_{[0, N]} \bigr) \le \epsilon $$
(6.20)

6.4 Distance between \(p_{T}\) and \(\tilde{p}^{(T)}\)

We can interpret this relation as follows. If p, q are probabilities on a finite space Z, the following Pinsker inequality [17] relates the relative entropy \(S(p|q)\) to the total variation distance between the distributions p and q:

$$ \frac{1}{2} \biggl( \sum_{z \in Z} \bigl\vert p(z) - q(z) \bigr\vert \biggr)^{ 2} \le S(p| q) $$

This shows that \(S (p|q)\) can be interpreted as an asymmetric distance between p and q. Equation (6.20) then implies that the distance between the actual process \(p_{T}\) and the Markov process \(\tilde{p}^{(T)}\), divided by T, tends to 0 as \(T \to \infty\).
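The Pinsker inequality itself is easy to test numerically. The sketch below checks it on random strictly positive distributions; the dimension and the number of samples are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rel_entropy(p, q):
    # Relative entropy S(p|q) = sum_z p(z) ln(p(z)/q(z))
    return float(np.sum(p * np.log(p / q)))

# Pinsker's inequality: (1/2) * (sum_z |p(z) - q(z)|)^2 <= S(p|q),
# checked on random pairs of distributions on a 5-point space.
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    lhs = 0.5 * np.sum(np.abs(p - q)) ** 2
    assert lhs <= rel_entropy(p, q) + 1e-12
```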

Theorem 6.4

One has for the production of entropy

$$ \lim_{T \to \infty} \frac{1}{T}s_{T}\bigl( \tilde{p}^{(T)}\bigr) = s(p) $$
(6.21)

Proof

We use the expression of the entropy production of a Markov process (Eq. (6.15))

$$\begin{aligned} s\bigl(\tilde{p}^{(T)}\bigr) =& - \sum _{x[0, 2T - 1]} p\bigl(x[0,T - 1]\bigr) \prod_{j = T}^{2T - 1} p \bigl( x(j) | x[j - T, j - 1] \bigr) \\ &{}\times \sum_{k = T}^{2T - 1} \ln p \bigl( x(k) | x[k - T, k - 1] \bigr) \\ =& - \sum_{k = T}^{2T - 1} \sum_{x[0,k]} p \bigl(x[0, T - 1] \bigr) \prod_{j = T}^{k} p \bigl( x(j) | x[j - T, j - 1] \bigr) \ln p \bigl( x(k) | x[k - T, k - 1] \bigr) \\ =& - \sum_{k = T}^{2T - 1} \sum_{x[1,k]} p \bigl(x[1, T] \bigr) \prod_{j = T + 1}^{k} p \bigl( x(j) | x[j - T, j - 1] \bigr) \ln p \bigl( x(k) | x[k - T, k - 1] \bigr) \\ =& \cdots \\ =& - \sum_{k = T}^{2T - 1} \sum _{x[k - T,k]} p\bigl(x[k - T,k]\bigr) \ln p \bigl( x(k) | x[k - T,k - 1] \bigr) \\ &{}\text{and by the stationarity of } p \text{ and Eq. (4.12)} \\ =& - T \sum_{x[0,T]} p\bigl(x[0,T]\bigr) \ln p \bigl( x(T)| x[0,T - 1] \bigr) = T d_{T}S_{T}(p) \end{aligned}$$

However, by Theorem 4.5, \(d_{T}S_{T}(p) \to s(p)\) if \(T\rightarrow \infty \), which gives Theorem 6.4. □

6.5 Attenuation of the memory

We come back to the process \(p_{T}\) on histories of length T. We now prove

Theorem 6.5

For a fixed integer \(N \geq 1\), one has

$$ \begin{aligned}&\lim_{T \to \infty} \frac{1}{T}\sum _{x^{(T)} [0,N - 1]} p\bigl(x^{(T)}[0,N - 1]\bigr) \\ &\quad {}\times S_{T} \bigl( p_{T} \bigl(. | x^{(T)} [0,N - 1] \bigr) | p_{T} \bigl(. | x^{(T)} (N - 1) \bigr) \bigr) = 0 \end{aligned}$$
(6.22)

Proof

We use the definition of relative entropy and decompose the sum of Eq. (6.22) into two terms:

$$ \begin{aligned}A_{1} &\equiv - d_{N}S_{T,N}(p_{T}) \\ &= - \sum_{x^{(T)} [0,N - 1]} p\bigl(x^{(T)}[0,N - 1] \bigr) S_{T} \bigl( p\bigl(. | x^{(T)} [0,N - 1] \bigr) \bigr) \\ & = - \bigl( S(p_{[0,NT - 1]}) - S(p_{[0,(N - 1)T - 1]}) \bigr) = - \sum _{k = (N - 1)T}^{NT - 1} d_{k}S_{k}(p) \end{aligned} $$
(6.23)

and

$$ \begin{aligned} A_{2} \equiv{}& - \sum _{x^{(T)} [0,N - 1]} p\bigl(x^{(T)}[0,N - 1]\bigr) \\ &{}\times \sum_{x^{(T)}(N)} p_{T} \bigl( x^{(T)}(N) | x^{(T)} [0,N - 1] \bigr) \ln p_{T} \bigl( x^{(T)}(N) | x^{(T)} (N - 1) \bigr) \\ = {}&- \sum_{x^{(T)} [0,1]} p\bigl(x^{(T)}[0,1] \bigr) \ln p_{T} \bigl( x^{(T)}(1) | x^{(T)} (0) \bigr) \\ ={}& S ( p_{[0, 2T - 1]} ) - S ( p_{[0,T - 1]} ) = \sum _{k = T}^{2T - 1} d_{k}S_{k}(p) \end{aligned} $$
(6.24)

Both sums (6.23) and (6.24) contain T terms \(d_{k}S_{k}(p)\) which tend to \(s(p)\). This gives the result (6.22). □

As a consequence, if T is large enough, it is possible, within a given accuracy ε, to neglect the distance between the process at time NT with complete history from time 0 and the process with history limited to the last period of length T, between times \((N-1)T\) and NT. In practice, one can neglect the memory beyond times larger than T.
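This attenuation of memory can be visualized in the elementary case where p is itself a mixing Markov chain: the conditional law of the present state forgets the remote past geometrically, so conditioning on the full history or only on a recent window of length T makes a vanishing difference. A toy sketch, with an arbitrarily chosen mixing matrix:

```python
import numpy as np

# A mixing two-state Markov chain (arbitrary example).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def row_gap(n):
    # Total-variation distance between the conditional laws of x(n)
    # given the two possible values of x(0): the rows of P^n.
    Pn = np.linalg.matrix_power(P, n)
    return 0.5 * np.sum(np.abs(Pn[0] - Pn[1]))

# The influence of the initial state decays geometrically (the second
# eigenvalue is 0.7 here), so memory beyond a window of length T is
# negligible once 0.7^T falls below the desired accuracy.
gaps = [row_gap(n) for n in (1, 5, 10, 20)]
assert all(a > b for a, b in zip(gaps, gaps[1:]))  # monotone decay
assert gaps[-1] < 1e-2
```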

7 Conclusion

It has been rigorously proved that coarse-graining a dynamical system induces a new system that partially approximates the original one. This conclusion is often anticipated intuitively in modeling physical or applied phenomena, which most generally requires simplifying and approximating actual observations. Because of its importance, this question has long been the subject of many studies (see, for instance, [18] and references therein), but it is difficult to obtain both a general approach and exact results on dynamical problems. Recently, it has been shown that innovative concepts introduced by Kolmogorov some sixty years ago can be combined with martingale theory to yield novel results in this domain. At first, this point of view was applied to classical Hamiltonian systems [1], and a major result was that under appropriate, realistic conditions, coarse-grained systems generate an approximate Markov system. Here, we have seen that the same reasoning applies to much more general, possibly stochastic, processes. Using a purely mathematical formalism, we obtained new, more general conclusions.

In particular, we have proved that the Kolmogorov entropy, introduced by Kolmogorov for ergodic stationary processes [4], also exists for a class of nonstationary processes defined for coarse-grained systems: these processes are obtained by imposing a nonstationary coarse-grained initial probability distribution, whereas the initial conditional distribution remains stationary in each grain. Such nonstationary coarse-grained distributions can be adopted in realistic mesoscopic systems that are initially constrained to nonequilibrium, while local equilibrium is re-established almost instantaneously: these approximations are often valid in realistic examples [10, 19], which justifies studying this special class of nonstationary processes. Moreover, it has been proved that the asymptotic entropy production of these nonstationary processes is identical to the entropy production of the microscopic stationary process, provided the latter is mixing. This is our main result, which allows one to approximate a large class of coarse-grained dynamical processes by Markov processes.

Alternatively, within the framework of the previous general theory, a forthcoming article [20] will present further exact results concerning the comparison of different coarse-grainings of dynamical systems that are of interest for modeling Markov and non-Markov processes.

References

  1. Gaveau, B., Moreau, M.: Chaos 30, 083104 (2020)


  2. Kolmogorov, A.N.: Dokl. Akad. Nauk SSSR 119, 861 (1958)


  3. Kolmogorov, A.N.: Dokl. Akad. Nauk SSSR 124, 754 (1959)


  4. Arnold, V.I., Avez, A.: Ergodic Problems of Classical Mechanics. Benjamin, New York (1968)

  5. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949)


  6. Khinchin, A.I.: Mathematical Foundations of Information Theory. Dover, New York (1957)


  7. Gaveau, B., Schulman, L.S.: J. Math. Phys. 37, 3897 (1996)


  8. Huang, K.: Statistical Mechanics. CRC Press, Boca Raton (1987)


  9. Gaveau, B., Schulman, L.S.: Eur. Phys. J. 224, 891 (2015)


  10. Landau, L., Lifschitz, E.: Statistical Physics, 3rd edn. Pergamon, Elmsford (1969)


  11. Bass, R.F.: Stochastic Processes. Cambridge University Press, Cambridge (2011)


  12. Moreau, M., Gaveau, B.: Stochastic theory of coarse-grained deterministic systems: martingales and Markov approximations. In: Carpentieri, B. (ed.) Advances in Dynamical Systems Theory, Models, Algorithms (2021). https://doi.org/10.5772/Intechopen.95903


  13. Doob, J.: Stochastic Processes. Wiley, New York (1953)


  14. Levy, P.: Théorie de l’addition des variables aléatoires. Gauthier-Villars, Paris (1937)


  15. Bartle, R.G.: The Elements of Integration and Lebesgue Measure. Wiley-Interscience, New York (1995)


  16. Dynkin, E.B.: Theory of Markov Processes. Pergamon, Elmsford (2015)


  17. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (1991)


  18. Achieser, N.I.: Theory of Approximation. Dover Books on Mathematics, Dover (2013)


  19. Reif, F.: Fundamentals of Statistical and Thermal Physics. McGraw-Hill, New York (1965)


  20. Gaveau, B., Moreau, M.: Generalized kinetic theory of coarse-grained systems, entropy and Markov approximations. II: comparison between various coarse grainings. To be published


Acknowledgements

The authors thank their colleagues of the Laboratory of Theoretical Physics of Condensed Matter, Sorbonne Université, Paris, for their interest and many scientific discussions.

Funding

No funding.

Author information


Contributions

B.G. developed the mathematical aspects of this joint work. M.M. focused on the physical notion of coarse-graining and a more intuitive vision of various results presented here. All authors read and approved the final manuscript.

Ethics declarations

Competing interests

The authors declare no competing interests.


Appendices

Appendix A: Proofs of Theorems 4.4 and 4.5

Theorem 4.4

For \(x = x(0)\in X\), the sequence of random variables \(p ( x \vert x[ - k, - 1] )\) is a martingale with respect to the sequence \(\mathcal{F}_{k}\) of σ-algebras generated by \(x([-k,-1])\). Moreover, these random variables are positive and bounded by 1.

Proof

Consider the random variables

$$ \pi _{k} = p \bigl( x| x[ - k, - 1] \bigr) = p(x| \mathcal{F}_{k} ). $$

Because \(\mathcal{F}_{k - 1} \subset \mathcal{F}_{k}\), we have

$$ \mathrm{E}(\pi _{k}| \mathcal{F}_{k - 1}) = \mathrm{E} \bigl( p(x| \mathcal{F}_{k} )| \mathcal{F}_{k - 1} \bigr) = p(x| \mathcal{F}_{k - 1} ) = \pi _{k - 1}, $$

where \(\mathrm{E}\{\cdot\}\) denotes the mathematical expectation for the measure p. So, \(\pi _{k}\) is a martingale with respect to the filtration \(\mathcal{F}_{k}\), and by the martingale convergence theorem [13], it converges almost surely to a limit π when \(k\rightarrow \infty \). □
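The martingale structure used here can be illustrated with the simplest Doob martingale, \(\pi_{k} = \mathrm{E}(Z | \mathcal{F}_{k})\) for i.i.d. coin flips. This is only an analogy to the conditional probabilities \(p(x | \mathcal{F}_{k})\) of the theorem; the parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Doob martingale pi_k = E(Z | F_k), where Z is the number of heads
# among N i.i.d. Bernoulli(q) flips and F_k is generated by the first
# k flips. Conditioning gives pi_k = (heads so far) + (N - k) * q.
N, q = 20, 0.3
flips = rng.random(N) < q
Z = int(flips.sum())
pi = [float(flips[:k].sum() + (N - k) * q) for k in range(N + 1)]

assert pi[0] == N * q  # no information: the prior mean
assert pi[N] == Z      # full information: the martingale has converged
# The tower property E(pi_k | F_{k-1}) = pi_{k-1} holds because the
# expected value q of the k-th flip exactly replaces the dropped q term.
```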

From the previous theorem, one can deduce the theorem of Kolmogorov-Shannon [4–6].

Theorem 4.5

For the stationary process p, one has

  1. (a)

    \(d_{k}S_{k}(p)\) has a limit \(s(p)\) for \(k \rightarrow \infty \)

  2. (b)

    One has

    $$ \lim_{n \to \infty} \frac{1}{n} S_{n}(p) = s(p) $$
    (A.1)

Proof

(a) We use Eq. (2.13) and the stationarity of p so that

$$ d_{k} S_{k}(p) = \sum_{x[ - k, - 1]} p \bigl(x[ - k, - 1]\bigr) S \bigl( p\bigl(.| x[ - k, - 1]\bigr) \bigr) \ge 0 $$
(A.2)

where

$$ S \bigl( p\bigl(. | x[ - k, - 1]\bigr) \bigr) = - \sum _{x(0)} p \bigl( x(0)| x[ - k, - 1] \bigr) \ln p \bigl( x(0)| x[ - k, - 1] \bigr) $$
(A.3)

By Theorem 4.4, the sequence of random variables \(p ( x(0) \vert x[ - k, - 1] )\) is a martingale with respect to the sequence \(\mathcal{F}_{k}\) of σ-algebras generated by \(x([-k,-1])\), and these random variables are positive and bounded by 1. So, by the martingale convergence theorem [13], this sequence converges p-almost surely to a certain random variable \(p ( x(0) \vert x[ - \infty , - 1] )\) (as well as in any Lebesgue space \(L^{r}(p)\) for \(1\leq r < +\infty \)). Furthermore, \(p ( x(0) \vert x[ - k, - 1] ) \ln p ( x(0) \vert x[ - k, - 1] )\) also converges p-almost surely, as does the finite sum \(S ( p(. \vert x[ - k, - 1]) )\) over \(x(0)\in X\), while staying uniformly bounded. By the Lebesgue dominated convergence theorem [15], the expectation under p of these random variables converges, so \(d_{k} S_{k}(p)\) converges.

(b) \(\frac{1}{n} S_{n}(p)\) is, up to a term \(\frac{1}{n} S_{0}(p)\) that vanishes as \(n \to \infty \), the arithmetic mean of the first n differences \(d_{k}S_{k}(p)\) (Eq. (4.10)), so it converges to the same limit \(s(p)\). □
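Part (b) is a Cesàro-mean argument: if the differences \(d_{k}S_{k}(p)\) converge to \(s(p)\), their averages converge to the same limit. A numerical toy check with an artificial sequence \(a_{k} = s + 1/(k+1)\), chosen only for illustration:

```python
# Cesàro means of a convergent sequence converge to the same limit.
s = 0.5                                             # stands in for s(p)
a = [s + 1.0 / (k + 1) for k in range(100_000)]     # a_k -> s
cesaro = sum(a) / len(a)                            # (1/n) sum_k a_k
assert abs(cesaro - s) < 2e-4
```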

Appendix B: Proofs of Lemma 5.7 and Theorem 5.2

Using the definition of relative entropy and Lemma 4.1 for the conditional entropy, we have

$$ \begin{gathered} \sum \overline{p}\bigl(x [0, n + k - 1] \bigr) S \bigl( \overline{p}\bigl(. | x [0, n + k - 1]\bigr) | \overline{p} \bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \\ \quad = \sum \frac{q(a(x_{0}))}{p(a(x_{0}))} p\bigl(x [0, n + k - 1]\bigr) S \bigl( p \bigl(. | x [0, n + k - 1]\bigr) | p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \\ \quad \le C \sum p\bigl(x [0, n + k - 1]\bigr) S \bigl( p\bigl(. | x [0, n + k - 1]\bigr) | p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \end{gathered} $$

where \(C= \max_{a} q(a)/p(a)\).

To prove Lemma 5.7, it therefore suffices to prove

Lemma 5.8

One has

$$\begin{aligned}& \begin{aligned}&\lim_{k \to \infty} \lim_{n \to \infty} \sum p\bigl(x [0, n + k - 1]\bigr) \\ &\quad {}\times S \bigl( p\bigl(. | x [0, n + k - 1]\bigr) | p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) = 0 \end{aligned} \end{aligned}$$
(B.1)

Proof of Lemma 5.8

We split the sum in the first term of (5.16) into two terms

$$\begin{aligned}& \sum p\bigl(x [0, n + k - 1]\bigr) S \bigl( p\bigl(. | x [0, n + k - 1]\bigr) | p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \\& \quad = E_{1} (n + k) + E_{2}(n, k) \end{aligned}$$
(B.2)

with

$$ - E_{1}(n + k) = \mathrm{E}_{p} \bigl\{ S \bigl( p\bigl(. | x [0, n + k - 1]\bigr) \bigr) \bigr\} $$
(B.3)

and

$$\begin{aligned} E_{2}(n, k) =& - \mathrm{E}_{p} \Bigl\{ \sum p \bigl( x(n + k) | x[0, n + k - 1] \bigr) \ln p \bigl( x(n + k) | x(0), x[n, n + k - 1] \bigr) \Bigr\} \end{aligned}$$
(B.4)

Lemma 5.8 is proved from the next two Lemmas. □

Lemma 5.9

We have

$$ \lim_{n + k \to \infty} E_{1}(n + k) = - s(p) $$
(B.5)

Proof of Lemma 5.9

By the stationarity of p, we have

$$ E_{1}(n + k) = \sum_{x(0)} \mathrm{E}_{p} \bigl\{ p\bigl(x(0) | x[ - n - k, - 1]\bigr) \ln p \bigl(x(0) | x[ - n - k, - 1]\bigr) \bigr\} $$

The martingale \(p(x(0) \vert x[ - n - k, - 1]) \) is uniformly bounded and converges p-almost surely; it is also integrable, so that, by the martingale convergence theorem [13],

$$ \lim_{n + k \to \infty} E_{1}(n + k) = \sum _{x(0)} \mathrm{E}_{p} \bigl\{ p\bigl(x(0) | x[ - \infty , - 1]\bigr) \ln p\bigl(x(0) | x[ - \infty , - 1]\bigr) \bigr\} = - s(p). $$

 □

Lemma 5.10

One has

$$ \begin{gathered} \lim_{k \to \infty} \lim _{n \to \infty} E_{2}(n, k) \\ \quad = - \sum_{x(0)} \mathrm{E}_{p} \bigl\{ p \bigl( x(0) | x[ - \infty , - 1] \bigr)\ln p \bigl( x(0) | x[ - \infty , - 1] \bigr) \bigr\} \end{gathered} $$
(B.6)

Proof of Lemma 5.10

We have

$$ \begin{aligned} E_{2}(n, k) ={}& - \sum p \bigl( x[0, n + k - 1]\bigr) p\bigl(x(n + k) | x[0, n + k - 1] \bigr) \\ &{}\times \ln p \bigl( x(n + k) | x(0), x[n, n + k - 1] \bigr) \\ ={}& - \sum p\bigl(x[0, n + k]\bigr) \ln p \bigl( x(n + k) | x(0), x[n, n + k - 1] \bigr) \\ ={}& \mathrm{E}_{p} \bigl\{ S \bigl( p\bigl(. | x(0), x[n, n + k - 1] \bigr) \bigr) \bigr\} \end{aligned} $$
(B.7)

Now, we have seen in Eq. (5.8) that this converges when \(n\rightarrow \infty \) to

$$ \begin{aligned} \lim_{n \to \infty} E_{2}(n, k) &= \mathrm{E}_{p} \bigl\{ S \bigl( p\bigl(. | \tau ^{ - n}x[n, n + k - 1] \bigr) \bigr) \bigr\} \\ &= \mathrm{E}_{p} \bigl\{ S \bigl( p\bigl(. | x[0, k - 1]\bigr) \bigr) \bigr\} = d_{k}S_{k}(p) \quad \bigl(\text{by Eq. (4.12)}\bigr) \end{aligned} $$
(B.8)

So,

$$ \lim_{k \to \infty} \lim_{n \to \infty} E_{2}(n, k) = s(p) $$
(B.9)

Eqs. (B.8) and (B.9) prove Lemma 5.10. □

This concludes the proof of Theorem 5.2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Gaveau, B., Moreau, M. Generalized kinetic theory of coarse-grained systems. I. Partial equilibrium and Markov approximations. Adv Cont Discr Mod 2024, 19 (2024). https://doi.org/10.1186/s13662-024-03810-x

