An observation is defined as:

$$y=h(x)+w$$

where $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$ denote the unknown vector and the measurement vector. $h:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is a function of $x$ and $w$ is the observation noise with the probability density $f_w(w)$. It is assumed that $x$ is a random variable with an a priori probability density $f_x(x)$ before the observation.
The goal is to compute the "best" estimate of $x$ using the observation. The optimal estimate $\hat x$ is defined based on a cost function $J$:

$$\hat x_{opt}=\underset{\hat x}{\arg\min}\, E[J(x-\hat x)]$$
Some typical cost functions:

Minimum Mean Square Error ($\hat x_{MMSE}$):

$$J(x-\hat x)=(x-\hat x)^\text{T} W(x-\hat x),\quad W>0$$

Absolute Value ($\hat x_{ABS}$):

$$J(x-\hat x)=|x-\hat x|$$

Maximum a Posteriori ($\hat x_{MAP}$):

$$J(x-\hat x)=\begin{cases} 0 & \text{if }|x-\hat x|\leq \epsilon \\ 1 & \text{if }|x-\hat x| > \epsilon\end{cases},\quad \epsilon\rightarrow 0$$
It can be shown that:

$$\hat x_{MMSE}(y)=E[x|y]$$

$$\int_{-\infty}^{\hat x_{ABS}(y)}f_{x|y}(x|y)\,dx=\int_{\hat x_{ABS}(y)}^{+\infty}f_{x|y}(x|y)\,dx$$

$$\hat x_{MAP}=\arg\max_{x} f_{x|y}(x|y)$$

In other words, $\hat x_{MMSE}$, $\hat x_{ABS}$ and $\hat x_{MAP}$ are the mean, the median and the mode of the a posteriori density $f_{x|y}(x|y)$, respectively.
If the a posteriori density function $f_{x|y}(x|y)$ has only one maximum and is symmetric with respect to $E[x|y]$, then all of the above estimates are equal to $E[x|y]$. In fact, under these conditions on $f_{x|y}(x|y)$, $E[x|y]$ is the optimal estimate for any cost function $J$ such that $J(0)=0$ and $J(x-\hat x)$ is nondecreasing with distance (Sherman's Theorem).
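As a quick illustration of these characterizations, the following NumPy sketch (using made-up, grid-based densities rather than any particular observation model) computes the mean, median and mode of a density sampled on a grid: for a symmetric unimodal density they coincide, as Sherman's Theorem suggests, while for a skewed density they differ.

```python
import numpy as np

def mean_median_mode(x, f):
    """Mean, median and mode of a density f sampled on a uniform grid x."""
    dx = x[1] - x[0]
    f = f / np.sum(f * dx)                                # normalize
    mean = np.sum(x * f * dx)                             # MMSE estimate
    median = x[np.searchsorted(np.cumsum(f * dx), 0.5)]   # ABS estimate
    mode = x[np.argmax(f)]                                # MAP estimate
    return mean, median, mode

x = np.linspace(-5.0, 15.0, 20001)

# Symmetric, unimodal density (Gaussian): mean = median = mode.
gauss = np.exp(-0.5 * (x - 2.0) ** 2)
print(mean_median_mode(x, gauss))        # all three are ~2.0

# Skewed density (log-normal shape): the three estimates differ.
xp = np.maximum(x, 1e-12)
skew = np.where(x > 0, np.exp(-0.5 * np.log(xp) ** 2) / xp, 0.0)
print(mean_median_mode(x, skew))         # mean > median > mode
```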
Maximum Likelihood Estimation: $\hat x_{ML}$ is the value of $x$ that maximizes the probability of observing $y$:

$$\hat x_{ML}(y)=\underset{x}{\arg\max}\, f_{y|x}(y|x)$$

It can be shown that $\hat x_{ML}=\hat x_{MAP}$ if there is no a priori information about $x$.
Consider the following observation:

$$y=Ax+Bw$$

where $w\sim\mathcal{N}(0,I_r)$ is a Gaussian random vector and the matrices $A_{m\times n}$ and $B_{m\times r}$ are known. In this observation, $x$ is estimable only if $A$ has full column rank; otherwise there are infinitely many solutions to the problem. If $BB^\text{T}$ is invertible, then:
$$f_{y|x}(y|x) = \frac{1}{(2\pi)^{m/2}|BB^\text{T}|^{1/2}} \exp\left(-\frac{1}{2}(y-Ax)^\text{T}(BB^\text{T})^{-1}(y-Ax)\right)$$
The maximum likelihood estimate can be computed as:

$$\begin{align*}\hat x_{ML}&=\arg\min_x\, (y-Ax)^\text{T}(BB^\text{T})^{-1}(y-Ax)\\ &=(A^\text{T}(BB^\text{T})^{-1}A)^{-1}A^\text{T}(BB^\text{T})^{-1}y\end{align*}$$
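As a minimal sketch of this closed-form estimate, the following NumPy snippet uses made-up matrices $A$ and $B$ and also checks the equivalent formulation as ordinary least squares on the whitened data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and data, purely for illustration.
n, m, r = 2, 5, 5
A = rng.standard_normal((m, n))          # full column rank with probability 1
B = rng.standard_normal((m, r))          # BB^T invertible with probability 1
x_true = np.array([1.0, -2.0])
y = A @ x_true + B @ rng.standard_normal(r)

S = B @ B.T                              # covariance of the noise term Bw
S_inv = np.linalg.inv(S)

# Closed-form ML estimate: (A^T S^-1 A)^-1 A^T S^-1 y
x_ml = np.linalg.solve(A.T @ S_inv @ A, A.T @ S_inv @ y)

# Equivalent formulation: ordinary least squares after whitening by L^-1,
# where S = L L^T (Cholesky factorization).
L = np.linalg.cholesky(S)
x_whitened, *_ = np.linalg.lstsq(np.linalg.solve(L, A),
                                 np.linalg.solve(L, y), rcond=None)

print(x_ml, x_whitened)                  # identical up to round-off
```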
It is very interesting that $\hat x_{ML}$ is the Weighted Least Squares (WLS) solution of the equation

$$y=Ax$$

with the weight matrix $W=(BB^\text{T})^{-1}$, i.e.

$$\hat x_{WLS}=\arg\min_x\, (y-Ax)^\text{T} W(y-Ax)$$
$\hat x_{ML}$ is an unbiased estimate:

$$\begin{align*} b(x) &= E[x - \hat{x}_{ML} \mid x] \\ &= E\left[x-(A^\text{T} (BB^\text{T})^{-1} A)^{-1} A^\text{T} (BB^\text{T})^{-1} y \mid x\right] = 0 \end{align*}$$
The covariance of the estimation error is:

$$P_e(x)=E[(x-\hat x_{ML})(x-\hat x_{ML})^\text{T}\mid x]=(A^\text{T}(BB^\text{T})^{-1}A)^{-1}$$
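The unbiasedness and the error covariance above can be checked empirically; the following rough Monte Carlo sketch, again with made-up matrices, averages the estimation error over many noise realizations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup with the same structure y = Ax + Bw (made-up matrices).
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])               # 3x2, full column rank
B = 0.5 * np.eye(3)
S_inv = np.linalg.inv(B @ B.T)
G = np.linalg.solve(A.T @ S_inv @ A, A.T @ S_inv)   # estimator matrix
x_true = np.array([0.7, -1.3])

# Empirical bias and error covariance over many noise realizations.
errs = []
for _ in range(20000):
    y = A @ x_true + B @ rng.standard_normal(3)
    errs.append(x_true - G @ y)
errs = np.array(errs)

print(errs.mean(axis=0))                  # ~ 0: the estimate is unbiased
print(np.cov(errs.T))                     # ~ (A^T (BB^T)^-1 A)^-1
print(np.linalg.inv(A.T @ S_inv @ A))     # theoretical error covariance
```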
$\hat x_{ML}$ is efficient in the sense of the Cramér-Rao bound.
Example: Consider the following linear Gaussian observation:

$$y=ax+w$$

where $a$ is a nonzero real number and $w\sim\mathcal{N}(0,r)$ is the observation noise.
Maximum a Posteriori Estimation: To compute $\hat x_{MAP}$, it is assumed that the a priori density of $x$ is Gaussian with mean $m_x$ and variance $p_x$:

$$x\sim\mathcal{N}(m_x,p_x)$$
The conditions of Sherman's Theorem are satisfied and therefore:

$$\begin{align*} \hat x_{MAP}&=E[x|y]\\ &=m_x+\frac{p_{xy}}{p_y}(y-m_y)\\ &=m_x+\frac{ap_{x}}{a^2p_x+r}(y-am_x)\\ &=\frac{ap_xy+m_xr}{a^2p_x+r} \end{align*}$$
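A small simulation sketch of this estimator, with made-up values of $a$, $r$, $m_x$ and $p_x$: drawing $x$ from its prior and $w$ from the noise density allows the bias and error covariance results derived next to be checked numerically.

```python
import numpy as np

def x_map(y, a, r, m_x, p_x):
    """MAP estimate for y = a*x + w, w ~ N(0, r), with prior x ~ N(m_x, p_x)."""
    return (a * p_x * y + m_x * r) / (a**2 * p_x + r)

# Hypothetical numbers, only for illustration.
a, r, m_x, p_x = 2.0, 0.5, 1.0, 4.0

rng = np.random.default_rng(2)
x = rng.normal(m_x, np.sqrt(p_x), 50000)            # draw x from its prior
y = a * x + rng.normal(0.0, np.sqrt(r), x.size)     # simulated observations

err = x - x_map(y, a, r, m_x, p_x)
print(err.mean())                                   # ~ 0 (unbiased)
print(err.var(), p_x * r / (a**2 * p_x + r))        # ~ p_x r / (a^2 p_x + r)
```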
Estimation bias:

$$b_{MAP}=E[x-\hat x_{MAP}]=m_x-\frac{ap_xE[y]+rm_x}{a^2p_x+r}=m_x-\frac{a^2p_xm_x+rm_x}{a^2p_x+r}=0$$
Estimation error covariance:

$$p_{MAP}=E[(x-\hat x_{MAP})^2]=p_x-\frac{a^2p_x^2}{a^2p_x+r}=\frac{p_xr}{a^2p_x+r}$$
Maximum Likelihood Estimation: For this example, we have:

$$f_{y|x}(y|x)=f_w(y-ax)=\frac{1}{\sqrt{2\pi r}}\exp\left(-\frac{(y-ax)^2}{2r}\right)$$
With this information:

$$\hat x_{ML}=\arg\max_x f_{y|x}(y|x)=\frac{y}{a}$$
Estimation bias:

$$b_{ML}=E[x-\hat x_{ML}\mid x]=x-\frac{ax}{a}=0$$
Estimation error covariance:

$$p_{ML}=E[(x-\hat x_{ML})^2\mid x]=E\left[\left(x-\frac{ax+w}{a}\right)^2\right]=\frac{r}{a^2}$$
Comparing $\hat x_{MAP}$ and $\hat x_{ML}$, we have:

$$\lim_{p_x\rightarrow +\infty}\hat x_{MAP}=\hat x_{ML}$$

This means that if there is no a priori information about $x$, the two estimates are equal.
For the error covariance, we have:

$$\frac{1}{p_{MAP}}=\frac{1}{p_{ML}}+\frac{1}{p_x}$$

In other words, the information after the observation is the sum of the information of the observation and the information before the observation.
Similarly, for the estimation error covariance:

$$\lim_{p_x\rightarrow +\infty} p_{MAP}= p_{ML}$$
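Both of these limits and the information relation can be checked numerically; the sketch below uses made-up values of $a$, $r$, $y$ and $m_x$ and sweeps $p_x$:

```python
import numpy as np

a, r, y, m_x = 2.0, 0.5, 3.0, 1.0      # hypothetical numbers

def x_map(p_x):
    return (a * p_x * y + m_x * r) / (a**2 * p_x + r)

def p_map(p_x):
    return p_x * r / (a**2 * p_x + r)

x_ml, p_ml = y / a, r / a**2

for p_x in (1.0, 1e2, 1e6):
    # MAP approaches ML as the prior becomes uninformative (p_x -> infinity)...
    print(p_x, x_map(p_x), x_ml)
    # ...and the information relation 1/p_MAP = 1/p_ML + 1/p_x holds exactly.
    print(1.0 / p_map(p_x), 1.0 / p_ml + 1.0 / p_x)
```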
It is possible to include a priori information in maximum likelihood estimation. The a priori distribution of $x$, $\mathcal{N}(m_x,p_x)$, can be rewritten as the following observation:

$$m_x=x+v$$

where $v\sim\mathcal{N}(0,p_x)$ is the observation noise.
Combined observation:

$$z=Ax+u$$

where:

$$z=\left[\begin{matrix} m_x\\ y\end{matrix}\right],\quad A=\left[\begin{matrix} 1\\ a\end{matrix}\right],\quad u=\left[\begin{matrix} v\\ w\end{matrix}\right]$$
The assumption is that $v$ and $w$ are independent. Therefore:

$$u\sim\mathcal{N}\left(0,\left[\begin{matrix} p_x& 0\\ 0 & r\end{matrix}\right]\right)$$
Maximum likelihood estimation:

$$\begin{align*}\hat x_{MLp}(z)&=\arg\max_x f_{z|x}(z|x)\\ &=\arg\min_x \left(\frac{(m_x-x)^2}{p_x}+\frac{(y-ax)^2}{r}\right)\\ &=\frac{ap_xy+m_xr}{a^2p_x+r}=\hat x_{MAP}\end{align*}$$
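The equivalence can be checked numerically: the sketch below solves the stacked observation by WLS and compares the result with the closed-form $\hat x_{MAP}$ from the example above (all numbers are made up):

```python
import numpy as np

a, r, m_x, p_x, y = 2.0, 0.5, 1.0, 4.0, 3.0        # hypothetical numbers

# Stacked observation z = A x + u for scalar x: z = [m_x, y], A = [1, a].
z = np.array([m_x, y])
A = np.array([[1.0], [a]])
W = np.linalg.inv(np.diag([p_x, r]))     # weight = inverse noise covariance

# WLS / ML-with-prior estimate: (A^T W A)^-1 A^T W z
x_mlp = np.linalg.solve(A.T @ W @ A, A.T @ W @ z).item()

# Closed-form MAP estimate from the example above.
x_map = (a * p_x * y + m_x * r) / (a**2 * p_x + r)

print(x_mlp, x_map)                      # identical up to round-off
```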
$\hat x_{MLp}$ is unbiased and has the same error covariance as $\hat x_{MAP}$. Therefore $\hat x_{MLp}$ and $\hat x_{MAP}$ are equivalent.
Consider the following linear system:

$$\begin{cases}x(k+1)&=&A(k)x(k)+w(k) \\ y(k)&=&C(k)x(k)+v(k)\end{cases}$$
where $x(k)\in\mathbb{R}^n$ and $y(k)\in\mathbb{R}^m$ denote the state vector and the measurement vector at time $t_k$. $w(k)\sim\mathcal{N}(0,Q(k))$ and $v(k)\sim\mathcal{N}(0,R(k))$ are independent Gaussian white noise processes, where $R(k)$ is invertible.
It is assumed that there is an a priori estimate of $x$, denoted by $\hat x^-(k)$, which is unbiased with a Gaussian estimation error, independent of $w$ and $v$:

$$e^-(k)=\hat x^-(k)-x(k)\sim\mathcal{N}(0,P^-(k))$$

where $P^-(k)$ is invertible.
The Kalman filter is a recursive algorithm to compute the state estimate.
Output Measurement: The information in $\hat x^-(k)$ and $y(k)$ can be written as the following observation:

$$\left[\begin{matrix}\hat x^-(k)\\ y(k) \end{matrix}\right]=\left[\begin{matrix} I\\ C(k)\end{matrix}\right] x(k)+\left[\begin{matrix} e^-(k)\\ v(k)\end{matrix}\right]$$
Considering the independence of $e^-(k)$ and $v(k)$, we have:

$$\left[\begin{matrix} e^-(k)\\ v(k) \end{matrix}\right]\sim\mathcal{N}\left(0,\left[\begin{matrix} P^-(k) & 0\\ 0& R(k) \end{matrix}\right]\right)$$
The combined observation is solved using Weighted Least Squares (WLS) together with the matrix inversion formula:

$$(A+BD^{-1}C)^{-1}=A^{-1}-A^{-1}B(D+CA^{-1}B)^{-1}CA^{-1}$$
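As a quick sanity check of this identity, the following sketch verifies it numerically with small random matrices (chosen only so that the required inverses exist):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # made-up invertible matrices
B = rng.standard_normal((n, k))
C = rng.standard_normal((k, n))
D = np.eye(k) + 0.1 * rng.standard_normal((k, k))

lhs = np.linalg.inv(A + B @ np.linalg.inv(D) @ C)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(D + C @ Ai @ B) @ C @ Ai
print(np.allclose(lhs, rhs))                        # True
```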
Defining:

$$K(k)=P^-(k)C^\text{T}(k)[C(k)P^-(k)C^\text{T}(k)+R(k)]^{-1}$$
we have:

$$\hat x(k)=\hat x^-(k)+K(k)(y(k)-C(k)\hat x^-(k))$$
The state estimate is thus the sum of the a priori estimate and a multiple of the output prediction error $y(k)-\hat y^-(k)$, since the output prediction is:

$$\hat y^-(k)=C(k)\hat x^-(k)$$

$K(k)$ is the Kalman filter gain.
Estimation error covariance:

$$P(k)=(I-K(k)C(k))P^-(k)$$
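The gain-form update can be checked against the underlying WLS problem. The sketch below builds the stacked observation with random, made-up $C$, $P^-$ and $R$, solves it by WLS, and confirms that it reproduces $\hat x(k)$ and $P(k)=(I-K(k)C(k))P^-(k)$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up dimensions and matrices for a single measurement-update step.
n, m = 3, 2
C = rng.standard_normal((m, n))
M1 = rng.standard_normal((n, n))
P_minus = M1 @ M1.T + np.eye(n)        # a priori error covariance P^-(k)
M2 = rng.standard_normal((m, m))
R = M2 @ M2.T + np.eye(m)              # measurement noise covariance R(k)
x_minus = rng.standard_normal(n)       # a priori estimate x^-(k)
y = rng.standard_normal(m)             # measurement y(k)

# Gain-form update.
K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R)
x_hat = x_minus + K @ (y - C @ x_minus)
P = (np.eye(n) - K @ C) @ P_minus

# WLS on the stacked observation [x^-; y] = [I; C] x + [e^-; v].
H = np.vstack([np.eye(n), C])
W = np.linalg.inv(np.block([[P_minus, np.zeros((n, m))],
                            [np.zeros((m, n)), R]]))
info = H.T @ W @ H                     # information matrix = P(k)^{-1}
x_wls = np.linalg.solve(info, H.T @ W @ np.concatenate([x_minus, y]))

print(np.allclose(x_hat, x_wls))                 # True
print(np.allclose(P, np.linalg.inv(info)))       # True
```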
Information:

$$\hat x(k)=x(k)+e(k)$$

where $e(k)\sim\mathcal{N}(0,P(k))$.
State Update: To complete the recursive algorithm, we need to compute $\hat x^-(k+1)$ and $P^-(k+1)$.
Information:

$$\begin{align*}\hat x(k)&=x(k)+e(k)\\ 0&=\left[\begin{matrix} -I& A(k)\end{matrix}\right]\left[\begin{matrix} x(k+1)\\ x(k)\end{matrix}\right]+w(k)\end{align*}$$
By eliminating $x(k)$ from the above observations, we have:

$$A(k)\hat x(k)=x(k+1)+A(k)e(k)-w(k)$$
It is easy to see that:

$$\hat x^-(k+1)=A(k)\hat x(k)$$
Estimation error:

$$e^-(k+1)=A(k)e(k)-w(k)$$
Estimation error covariance:

$$P^-(k+1)=A(k)P(k)A^\text{T}(k)+Q(k)$$
Summary:

Initial Conditions: $\hat x^-(k)$ and its error covariance $P^-(k)$.
Gain Calculation:

$$K(k)=P^-(k)C^\text{T}(k)[C(k)P^-(k)C^\text{T}(k)+R(k)]^{-1}$$
Output Measurement ($\hat x(k)$ and $P(k)$):

$$\begin{align*}\hat x(k)&=\hat x^-(k)+K(k)(y(k)-C(k)\hat x^-(k))\\ P(k)&=(I-K(k)C(k))P^-(k)\end{align*}$$
State Update ($\hat x^-(k+1)$ and $P^-(k+1)$):

$$\begin{align*}\hat x^-(k+1)&=A(k)\hat x(k)\\ P^-(k+1)&=A(k)P(k)A^\text{T}(k)+Q(k)\end{align*}$$
Go to the gain calculation step and continue the loop for $k+1$.
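Putting the summary together, the following is a minimal self-contained Python/NumPy sketch of the recursion. It assumes constant, made-up matrices $A$, $C$, $Q$ and $R$ for brevity; in the time-varying case they would simply be evaluated at each step $k$:

```python
import numpy as np

def kalman_filter(A, C, Q, R, y_seq, x0_minus, P0_minus):
    """Run the recursion summarized above over a measurement sequence.

    For brevity A, C, Q, R are constant matrices here; in the time-varying
    case they would simply be evaluated at each step k.
    """
    n = x0_minus.size
    x_minus, P_minus = x0_minus.copy(), P0_minus.copy()
    estimates = []
    for y in y_seq:
        # Gain calculation
        K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R)
        # Output measurement: x_hat(k) and P(k)
        x_hat = x_minus + K @ (y - C @ x_minus)
        P = (np.eye(n) - K @ C) @ P_minus
        estimates.append(x_hat)
        # State update: x_hat^-(k+1) and P^-(k+1)
        x_minus = A @ x_hat
        P_minus = A @ P @ A.T + Q
    return np.array(estimates)

# Hypothetical constant-coefficient system (2 states, 1 measurement).
rng = np.random.default_rng(5)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

# Simulate x(k+1) = A x(k) + w(k), y(k) = C x(k) + v(k).
N = 200
x = np.array([0.0, 1.0])
xs, ys = [], []
for _ in range(N):
    xs.append(x)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)

x_est = kalman_filter(A, C, Q, R, ys, np.zeros(2), 10.0 * np.eye(2))
print(np.mean((np.array(xs) - x_est) ** 2, axis=0))   # small mean-square error
```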
Remarks:

Estimation residue:

$$\gamma(k)=y(k)-C(k)\hat x^-(k)$$

Residue covariance:

$$P_\gamma(k)=C(k)P^-(k)C^\text{T}(k)+R(k)$$
The residue signal is used for monitoring the performance of the Kalman filter. Modeling error, round-off error, disturbances, correlation between input and measurement noise, and other factors might cause a biased and colored residue. The residue signal can also be used in Fault Detection and Isolation (FDI), as sketched below.
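One common way to use the residue for monitoring (a standard choice, not spelled out in these notes) is the normalized residue test $\gamma^\text{T}(k)P_\gamma^{-1}(k)\gamma(k)$, which should remain consistent with a chi-square distribution with $m$ degrees of freedom when the filter is healthy. A minimal sketch:

```python
import numpy as np

def residue_monitor(gammas, P_gammas, threshold):
    """Flag time steps whose normalized residue exceeds a threshold.

    gammas: sequence of residues gamma(k); P_gammas: their covariances
    P_gamma(k). `threshold` is typically a chi-square quantile with m
    degrees of freedom (m = dimension of y).
    """
    flags = []
    for g, S in zip(gammas, P_gammas):
        nis = float(g @ np.linalg.solve(S, g))   # gamma^T P_gamma^{-1} gamma
        flags.append(nis > threshold)
    return np.array(flags)

# Illustration with scalar measurements (m = 1): 6.63 is roughly the 99%
# chi-square quantile with 1 degree of freedom, so a consistent filter
# should trigger on about 1% of the samples.
rng = np.random.default_rng(6)
P_gamma = np.array([[0.5]])
gammas = rng.multivariate_normal(np.zeros(1), P_gamma, size=1000)
flags = residue_monitor(gammas, [P_gamma] * 1000, threshold=6.63)
print(flags.mean())   # close to 0.01 when the residue is unbiased and white
```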
The standard Kalman filter is not numerically robust because it contains matrix inversions. For example, the computed error covariance matrix might lose positive definiteness because of numerical errors. There are different implementations of the Kalman filter that improve on the standard form in the following aspects:

- Computational efficiency
- Dealing with disturbances or unknown inputs
- Dealing with singular systems (difference-algebraic equations)