
Power law noise whitening

Power-law noise, characterized by a power spectral density that decays with frequency following a power law, often exhibits long-term correlations that extend over all time scales. Dealing with such correlated noise is inherently challenging, making traditional signal processing methods less effective. Since processing uncorrelated noise samples is typically more straightforward, one approach to dealing with colored noise is to apply whitening prior to further processing. Such a whitening transformation is a linear transformation that turns the input samples into a white noise sequence. In this exploration, we delve into a practical approach to power-law noise whitening, unraveling its long-term correlations to unlock clearer and more actionable insights in signal processing.

Power spectral density

White noise is a random process with constant power spectral density (psd), i.e. having equal intensity at all frequencies. The term white noise is derived from white light, which contains all colors of the visible band. Similarly, in digital signal processing (DSP), white noise refers to a time series of uncorrelated samples, typically zero-mean, with a flat power spectrum. Many DSP methods assume white noise processes, more specifically additive white Gaussian noise (AWGN). For AWGN, the frequency distribution of power is uniform, and the probability distribution of values is governed by a normal probability density function (pdf).

In practical situations, we frequently come across correlated noise, wherein the current values depend on preceding values, indicating a dependence on the process history. Such noise processes are commonly known as colored noise processes because they display non-uniform power spectra. A typical instance is power law noise, or $ 1 / f^\alpha $ noise, which is commonly observed in natural phenomena. The canonical case with $ \alpha = 1 $ is called pink noise or $ 1/f $ noise, which has been discovered in the statistical fluctuations of an extraordinarily diverse number of physical and biological systems. Another common process, with $ \alpha = 2 $, is called Brownian noise, also known as Brown noise or random walk noise, which is the type of noise produced by Brownian motion. Visualizations of these common noise processes in both the time and frequency domains are shown below.

Probability density function

The probability density function (pdf) for a power-law distribution can be expressed as: $$ p(x) = C x^{-\beta} \text{ for } x \geq x_{min} \ , $$ where $ C $ is a normalization constant, and $ \beta $ is the exponent that characterizes the power-law decay of the distribution. Typically, $$ C=\frac{\beta-1}{x_{min}^{1-\beta}} \ , $$ where $ x_{min} $ is the smallest of the possible $ x $ values.

Likewise, $ \alpha $ characterizes the power-law decay of the power spectral density. In the context of digital signal processing, specifically when addressing power-law behavior in signals or noise, the typical relationship between the exponents $ \alpha $ and $ \beta $ is often expressed as: $$ \beta = \alpha + 1 $$ Power-law noise, characterized by a power-law distribution, often does not have finite moments, including the variance. Specifically, when $\beta < 2 $, the variance is infinite, and higher-order moments may also be undefined.
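
As an aside, samples from such a power-law distribution can be drawn by inverse transform sampling of the pdf above. A minimal sketch is given below; the values of $ \beta $ and $ x_{min} $ are arbitrary assumptions.

import numpy as np

beta, x_min = 1.5, 1.0  # beta < 2, so the variance is infinite
# invert the CDF F(x) = 1 - (x/x_min)**(1 - beta) for x >= x_min
u = np.random.rand(100_000)
x = x_min * (1.0 - u) ** (-1.0 / (beta - 1.0))
# the sample variance does not settle for beta < 2; it keeps growing with sample size
print(np.var(x[:1_000]), np.var(x))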

Power law noise generation

To date, no simple stochastic differential equations generating a power law process have been found, and no generally recognized physical explanation of e.g. $ 1/f $ noise has been proposed [1]. Power law noise processes seem to exhibit infinitely 'long memory', or long range dependence, and are hard to model in the vicinity of $ f=0 $, since a mean value cannot be computed. For that reason, we turn to approximations to model and generate power law noise.

PSD approximation

In a first approximation, we shape the power spectral density function of a white noise process so as to produce a power law noise process by frequency domain filtering. Consider the Fourier transform of a power law noise process $ x(t) $, given by $$ X(f) = A f^{-\alpha/2} N(f) \text{ for } f > 0 \ , $$ where $ A $ is a normalization constant, and $ N(f) $, with $ \vert N(f) \vert = \sqrt{N_0} $, is the Fourier transform of a white noise process $ n(t) $. Now, the power spectral density of $ x(t) $ equals $$ S_{xx} (f) = \vert X(f) \vert^2 = A^2 f^{-\alpha} \vert N(f) \vert^2 = A^2 f^{-\alpha} N_0 \ . $$

Using the fundamental Python package NumPy, we can easily generate a power law noise process according to this psd method.

import numpy as np

nr_of_samples = 2**16   # length of the realization
alpha = 1               # power law exponent (1: pink noise, 2: Brownian noise)

# DFT of a white Gaussian noise sequence (one-sided spectrum)
N = np.fft.rfft(np.random.randn(nr_of_samples))
f = np.fft.rfftfreq(nr_of_samples)
# shaping filter, evaluated half a bin up to avoid division by zero at f = 0
H = (f + .5*f[1])**(-alpha/2)
# normalize the filter to unit average power
H /= np.sqrt(np.mean(np.abs(H) ** 2))
# shape the white spectrum and transform back to the time domain
X = N * H
x = np.fft.irfft(X)
Note that the power density within each discrete Fourier transform (DFT) bin is determined by the centre frequency of that bin, hence the half-bin offset applied in the code above. Obviously, pseudorandom noise is used, since true randomness is hard to realize on a computer.

All-pole model

In a second approximation, let's restate a parametric $ 1 / f^\alpha $ noise model from literature [2]. This model corresponds to the classical result that the power spectrum of a uniform mixture of exponentially decaying autocorrelation functions has a $ 1/f $ form [1]. An autoregressive model of order $ p $ of a correlated random sequence $ x[k] $ is given by $$ \sum_{i=0}^p \theta_i x[k-i] = n[k] \, $$ where $ n[k] $ is a zero-mean uncorrelated sequence, i.e. white noise. The autoregressive AR(p) model specifies that the output $ x[k] $ depends linearly on previous output samples and on the white noise input. Assuming $ \theta_0 = 1 $ and rewriting the equation yields $$ x[k] = n[k] - \sum_{i=1}^p \theta_i x[k-i] \ . $$ The transfer function of a system that filters white noise to produce colored noise, specified by an AR(p) representation, is now given by $$ H(z) = \frac {1} {1 + \sum_{i=1}^p \theta_i z^{-i}} \, $$ which is an all-pole filter, so there are no frequencies where the response equals zero. Now, a power law, or $ 1 / f^\alpha $, noise process can be modelled by an AR(p) representation with $$ \theta_i = \left ( i - 1 - \frac{\alpha}{2} \right ) \frac{\theta_{i-1}}{i} \text{ for } i > 0 \, $$ where again $ \alpha = 1 $ for pink noise and $ \alpha = 2 $ for Brownian noise. In the latter case, $ \theta_0 = 1 $, $ \theta_1 = -1 $, and $ \theta_i = 0 $ elsewhere.

To filter a white noise sequence, we apply the signal processing module of the Python package SciPy, an open-source library for mathematics, science, and engineering, see scipy.signal.

from scipy.signal import lfilter

min_magn = 0.01  # truncate the AR model where coefficients become negligible

# build the AR coefficients theta_i recursively
h_a = [1]
k = 1
while True:
    a = (k - 1 - alpha/2) * h_a[k-1]/k
    if abs(a) < min_magn:
        break
    k += 1
    h_a.append(a)

# filter white noise with the all-pole filter 1/A(z)
x = lfilter([1], h_a, np.random.randn(nr_of_samples))
Note that the order of the approximation is determined by a threshold that truncates the coefficient sequence; in this case, coefficients with magnitudes below 0.01 are left out.

Estimated power spectral density functions of pink noise and Brownian noise pseudorandom process realizations, obtained by both methods, are plotted below. The psd is normalized to the total power, and frequency is normalized to the sampling frequency.
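
Such psd estimates can be obtained along the following lines; a minimal sketch using Welch's method from scipy.signal, where x is a realization generated by either method and the normalization choices are assumptions.

from scipy.signal import welch

# estimate the psd with fs = 1, i.e. frequency normalized to the sampling frequency
f, Pxx = welch(x, fs=1.0, nperseg=1024)
# normalize the psd to the total power
Pxx /= np.sum(Pxx) * f[1]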

Partial autocorrelation

Whitening of a random process is typically done based on its autocovariance, which is a measure of the joint variability of two samples of a process at different points in time. Note that here, and in DSP in general, the terms correlation and covariance are used interchangeably. Strictly speaking, for random processes, correlation refers to standardized covariance, with +1 indicating perfect correlation and −1 indicating perfect anti-correlation. If the sample sequence considered has zero mean and unit variance, then correlation and covariance are indeed the same. Now, the autocorrelation function (acf) for lag $ \tau $ is defined as $$ r_{xx} (\tau) = \text{E} \left [ x(t + \tau) x(t) \right ] \ . $$ For power-law noise, the autocorrelation function extends indefinitely, because correlations persist over all time scales. In practice, it makes sense to consider the function only up to a finite lag $ \tau $, chosen based on some information criterion, see e.g. the Akaike information criterion (AIC), or simply where $ r_{xx} (\tau) $ drops below some threshold value, resulting in a partial autocorrelation function. Using the fundamental Python package NumPy, we can define a simple function to compute a partial autocorrelation function (pacf) of a sequence of samples.

def pacf(x: np.ndarray, max_lag: int) -> np.ndarray:
    # biased acf estimate via full autocorrelation, normalized to lag 0
    r = np.correlate(x, x, mode='full')[-len(x):]
    return r[:max_lag+1] / r[0]

In the figure below, the pacfs of white noise ($ \alpha = 0 $), pink noise ($ \alpha = 1 $), and Brownian noise ($ \alpha = 2 $) realizations are plotted for $ \tau < 10 $.

The autocorrelation function of a $ 1 / f $ process can be described by

$$ r_{xx}[k] = k^{-(1 - \alpha/2)} = k^{-0.5} \text{ for } \alpha=1 \ ; $$ however, this model does not seem to generalize to arbitrary $ \alpha $. In the figure above, this pink noise autocorrelation model is shown as a dotted line. Note that the autocorrelation function can also be obtained from the impulse response of the generating filter convolved with itself. Since the impulse response consists of a summation of $ p $ exponentially decaying functions, this would probably not lead to a neat closed-form expression for the autocorrelation function parameterized by $ \alpha $, but we still need to look into that.
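
As a quick numerical sanity check of this model, we can compare it against the pacf estimate of a pink noise realization x generated with $ \alpha = 1 $ above.

# estimated pacf versus the model r[k] = k**-0.5 for lags 1..9
lags = np.arange(1, 10)
print(np.column_stack((lags, pacf(x, 9)[1:], lags ** -0.5)))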

Whitening transformation

For a sequence of samples $ \underline{x} $ of the random sequence $ x[k] $, the autocorrelation matrix $ \bold R $ is defined as $$ \bold R \triangleq \text{E} \left [ \underline{x} \, \underline{x}^T \right] \ , $$ where $ \text{E} $ is the expected value operator. By inverting or factorizing the autocorrelation matrix, we can obtain a whitening matrix $ \bold{W} $, i.e. $$ \bold{R}^{-1} = \bold{W}^T \bold{W} \ . $$ Now, a whitened sample sequence $ \underline{y} $ with unit variance is obtained by the transformation $$ \underline{y} = \bold W \underline{x} \ . $$ In practice, the inverted autocorrelation matrix $ \bold{R}^{-1} $ has to be estimated and updated as more samples become available. If we assume a weakly stationary random process, the covariance matrix has Toeplitz structure, which can be approximately diagonalized by the DFT matrix [3]. In this case, the inverted covariance matrix can be expressed as $$ \bold{R}^{-1} = \bold{F} \, \text{diag}(\underline{\lambda})^{-1} \bold{F}^{-1} \ , $$ where $ \bold{F} $ is the DFT matrix, $ \underline{\lambda} $ is the vector of eigenvalues of $ \bold{R} $, and typically $ \bold{W} = \bold{F} \, \text{diag}(\underline{\lambda})^{-\frac{1}{2}} \bold{F}^{-1} $, which whitens in the frequency domain. Depending on the size of the autocorrelation matrix, this approximation can greatly speed up the process of obtaining and updating a whitening filter. However, in the case of power law noise, there is an even simpler method.
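
A minimal sketch of such a DFT-based whitening step is given below, under the circulant approximation. It assumes a one-sided acf estimate r (e.g. the output of the pacf function above) with len(x) >= 2*len(r) - 1, and approximates the eigenvalues by the DFT of the symmetrically extended acf, with a small floor as a guard against non-positive values.

def whiten_dft(x: np.ndarray, r: np.ndarray) -> np.ndarray:
    # extend the acf circularly so that its DFT is real-valued
    r_circ = np.concatenate((r, np.zeros(len(x) - 2*len(r) + 1), r[:0:-1]))
    lam = np.maximum(np.fft.fft(r_circ).real, 1e-12)  # eigenvalue approximation
    # y = F diag(lambda)**-0.5 F**-1 x, evaluated with FFTs
    return np.fft.ifft(np.fft.fft(x) / np.sqrt(lam)).real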

Autoregressive model inversion

Since power law noise can be modelled as an all-pole filtered white noise process, a whitening filter can also be constructed as the inverted filter, i.e. $$ H(z) = 1 + \sum_{i=1}^p \theta_i z^{-i} \ . $$ As an example, Brownian whitening with $ p = 1 $, $ \theta_0 = 1 $, $ \theta_1 = -1 $ is plotted below.
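
In code, reusing the AR coefficients h_a from the generation step above, the whitening step is a single finite impulse response (FIR) filtering operation.

from scipy.signal import lfilter

# the generating filter was 1/A(z); the whitening filter is A(z) itself
y = lfilter(h_a, [1], x)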

To be able to invert the autoregressive model, its coefficients need to be known. One way to estimate the coefficients is through solving the Yule–Walker equations, which are given by $$ \bold R \underline{\theta} = - \underline{r} \ , $$ or $$ \begin{pmatrix} r_0 & r_1 & r_2 & \cdots & r_{p-1} \\ r_1 & r_0 & r_1 & \cdots & r_{p-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0 \end{pmatrix} \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_p \end{pmatrix} = - \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{pmatrix} \ , $$ where $ r_i $ is the $ i $-th autocorrelation coefficient. Again, the autocorrelation matrix $ \bold{R} $ has to be estimated in practice, and the system re-solved as more samples become available, which can become a cumbersome exercise.
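
For completeness, a sketch of such an estimate using SciPy's Toeplitz solver; the model order p is an assumption, and the acf is estimated with the pacf function defined above.

from scipy.linalg import solve_toeplitz

p = 16                        # assumed model order
r = pacf(x, max_lag=p)        # normalized acf estimate r_0 .. r_p
# solve R theta = -r; the Toeplitz matrix R has first column r_0 .. r_{p-1}
theta = solve_toeplitz(r[:p], -r[1:p+1])
h_a_est = np.concatenate(([1], theta))   # whitening filter coefficients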

Power law noise identification

If $ \alpha $ is known or can be estimated, an inverted filter is easily constructed from the model coefficients proposed by [2], and we do not need to find a solution to the Yule–Walker equations. One approach would be to estimate the psd of the noise process, fit a straight line, and determine the slope. Another approach is to identify the power law noise from the lag 1 autocorrelation, as proposed in [4], which has substantially lower computational complexity.

If we plot the lag 1 coefficient of power law noise sequences of 1024 samples as a function of $ \alpha $, a sigmoidal relation can be observed on the interval $ 0 \leq \alpha \leq 2 $.

At this moment, I have not yet found an analytical way to compute the relationship; however, this should be doable, since the acf equals the convolution of the AR impulse response with its time-reversed version. For now, it is sufficient to note that the first autocorrelation coefficient is approximated by $$ r_1 \approx \frac {1} {1 + e^{-(4 \alpha - 3)} } \ . $$ The logistic function is linearized as $$ \ln \left ( 1 / r_1 - 1\right ) \approx 3 - 4 \alpha \ , $$ hence $$ \alpha \approx \frac{3 - \ln \left ( 1 / r_1 - 1\right )} {4} \ . $$ Now that we have an easy way to determine the power law exponent from the first autocorrelation coefficient, let's estimate the coefficient for a sample stream.
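
In code, the estimator is a one-liner; note that the logistic inversion assumes $ 0 < r_1 < 1 $.

def estimate_alpha(r1: float) -> float:
    # invert the fitted logistic relation between r_1 and alpha
    return (3.0 - np.log(1.0/r1 - 1.0)) / 4.0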

Recursive estimation of autocorrelation coefficients

The $ j $-th partial autocorrelation coefficient of $ x $ is defined as $$ r_j = \frac {1} {N \sigma^2} \sum_{i=0}^{N-1-j} (x[i] - \mu) (x[i + j] - \mu) \ , $$ where $ \mu $ and $ \sigma $ are the mean and standard deviation of the process, respectively. If the process $ x $ is a sample stream, we can define a recursive algorithm that updates the estimate when a new sample becomes available, i.e. $$ r_1 [k] = \eta \, r_1 [k - 1] + (1 - \eta) \frac { (x[k] - \mu) (x[k - 1] - \mu)} { \sigma^2} \ , $$ where $ 0 < \eta \leq 1 $ is a forgetting factor, which gives exponentially less weight to older samples. Note that $ \mu $ and $ \sigma $ can be estimated based on the median and the median absolute deviation (MAD), with $ \sigma \approx k \cdot \text{MAD} $ where $ k \approx 1.4826 $ for Gaussian noise. Just one memory cell is used to store the previous partial autocorrelation estimate, and another to store the previous sample.

An example implementation of a frugal, instantaneous lag 1 correlation coefficient estimation with minimal memory usage is given below.

def estimate_lag_1_correlation(x, eta):
    k = 1.4826                 # relates MAD to sigma for Gaussian noise
    med, mad, r_ = None, None, None
    r = np.zeros_like(x)
    for i, x_ in enumerate(x):
        # frugal streaming estimates of the median and the MAD:
        # nudge each estimate up or down by a small step alpha
        alpha = 0.1 if mad is None else mad/100
        med = x_ if med is None else med + alpha if x_ > med else med - alpha
        x_ -= med              # center the sample on the running median
        mad = abs(x_) if mad is None else mad + alpha if abs(x_) > mad else mad - alpha

        if i > 0:
            if r_ is None:
                r_ = abs(x_ * x_prev_)
            else:
                # exponentially weighted update of the normalized lag 1 product
                sigma = k * mad
                r_ = eta * r_ + (1-eta) * x_ * x_prev_ / (sigma**2)
            r[i] = r_
        x_prev_ = x_           # one memory cell for the previous sample

    return r

In the figure below, examples are shown. For $ \alpha = 0.5 $, identification of the power law process seems reasonable, but for higher values of $ \alpha $, the algorithm performs very poorly. It seems that the long-memory effect of power law noise does not allow a frugal algorithm to estimate the statistics correctly.

Conclusion

  • In practice, noise sequences often have correlated samples, giving rise to non-flat power spectra. One common example of a colored noise process is power law noise, or $ 1/ f^\alpha $ noise.

  • Many DSP methods assume white noise processes, since processing uncorrelated noise samples is typically more straightforward than processing colored noise samples. One approach to dealing with colored noise is to apply a whitening filter prior to further processing, such as outlier detection.

  • In general, colored noise can be approximately whitened by factorizing the autocorrelation matrix. Power law noise processes seem to exhibit infinite long range dependence, so only partial correlation functions can be estimated. Another approach is to whiten the noise process in the frequency domain.

  • Power law noise can be generated either by filtering white noise so as to approximate the desired power spectral density, or by an autoregressive model (all-pole filter) that correlates white noise. For that reason, a whitening filter for power law noise can be constructed as the inverted autoregressive model.

  • Power law noise can be identified, or in other words $ \alpha $ can be estimated, either by evaluating the power spectral density, or by interpreting the lag 1 autocorrelation coefficient, the latter being easier to compute.

  • An attempt was made to design a frugal algorithm with extremely low memory usage and computational complexity to estimate the lag 1 autocorrelation coefficient. This approach seems to result in biased estimates of $ \alpha $ for reasons currently unknown.


  1. Lawrence M. Ward and Priscilla E. Greenwood. 1/f noise. Scholarpedia, 2(12):1537, December 2007. URL: http://www.scholarpedia.org/article/1/f_noise (visited on 2022-11-08), doi:10.4249/scholarpedia.1537

  2. N.J. Kasdin. Discrete simulation of colored noise and stochastic processes and $ 1/f^\alpha $ power law noise generation. Proceedings of the IEEE, 83(5):802–827, May 1995. doi:10.1109/5.381848

  3. R. Gray. On the asymptotic eigenvalue distribution of Toeplitz matrices. IEEE Transactions on Information Theory, 18(6):725–730, November 1972. doi:10.1109/TIT.1972.1054924

  4. W.J. Riley and C.A. Greenhall. Power law noise identification using the lag 1 autocorrelation. In 18th European Frequency and Time Forum (EFTF 2004), 576–580. Guildford, UK, 2004. IEE. URL: https://digital-library.theiet.org/content/conferences/10.1049/cp_20040932 (visited on 2022-11-07), doi:10.1049/cp:20040932