Brook Luers Statistician, data scientist
CV     Teaching     Research     Blog    

The difference of two folded normal random variables

What is the distribution of when and are independent Gaussian random variables with the same variance?

CDF

The CDF of is

First assume that and consider the region

$$A = \left\{\vert X\vert - \vert Y\vert \le t\right\}$$:

The probability is

$$ \mathbb{P}(Z \le t) = \underset{A}{\int\int}f(x,y)\, \mathrm{d}x\mathrm{d}y. $$

Since and have the same variance and are independent, the distribution of is invariant under an orthogonal transformation . In other words,

where .

To make the integral easier, we can rotate the region so that its “elbows” align with the coordinate axes.
The triangular cutouts of this region are defined by 45-degree lines: , , etc. So we want the rotation matrix for a 45-degree, counterclockwise rotation around the origin:

Applying this rotation matrix, we get the region :

The probability of is easy to compute since we can split the region into disjoint rectangles. Denote .

Using independence of and and the above picture, we have

where is the CDF of a random variable and

$$ \left[\begin{array}{c}\tilde{\mu}_x\\\tilde{\mu}_y\end{array}\right] = W\left[\begin{array}{c}\mu_x\\ \mu_y\end{array}\right] = \frac{1}{\sqrt{2}}\left[\begin{array}{c} \mu_x-\mu_y \\ \mu_x+\mu_y\end{array}\right] $$

When the regions and look like this:

And the probability we want is

(The pictures make it clear that .)

PDF

After differentiating, the PDF is

where is the standard normal PDF and

Mean and variance

The mean and variance can be calculated based on the mean of and individually (refer to the folded normal distribution wiki page).

Verify using simulation

Here is an R implementation of the CDF and PDF for .

cdf_absdiff <- function(tval, mu_x, mu_y, sigma) {
  mu_x_tilde <- (1 / sqrt(2)) * (mu_x - mu_y) # rotated mu_x
  mu_y_tilde <- (1 / sqrt(2)) * (mu_x + mu_y) # rotated mu_y
  cxmin <- -tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
  cymin <- -tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
  cyplus <- tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
  cxplus <- tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
  ret <- ifelse(
    tval > 0,
    (1 - pnorm(cxmin)) * pnorm(cymin) + pnorm(cxplus) * (1 - pnorm(cyplus)) +
      pnorm(cyplus) - pnorm(cymin),
    pnorm(cxplus) * (1 - pnorm(cymin)) + (1 - pnorm(cxmin)) * pnorm(cyplus)
  )
  return(ret)
}

pdf_absdiff <- function(tval, mu_x, mu_y, sigma) {
  mu_x_tilde <- (1 / sqrt(2)) * (mu_x - mu_y) # rotated mu_x
  mu_y_tilde <- (1 / sqrt(2)) * (mu_x + mu_y) # rotated mu_y
  cxmin <- -tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
  cymin <- -tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
  cyplus <- tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
  cxplus <- tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
  
  ret <- ifelse(
    tval > 0,
    (1 / (sigma * sqrt(2))) * (
      pnorm(cymin) * dnorm(cxmin) - (1 - pnorm(cxmin)) * dnorm(cymin) +
        (1 - pnorm(cyplus)) * dnorm(cxplus) - 
        pnorm(cxplus) * dnorm(cyplus) +
        dnorm(cymin) + dnorm(cyplus)
    ),
    (1 / (sigma * sqrt(2))) * (
      pnorm(cyplus) * dnorm(cxmin) + pnorm(cxplus) * dnorm(cymin) + 
        (1 - pnorm(cymin)) * dnorm(cxplus) +
        (1 - pnorm(cxmin)) * dnorm(cyplus)
    )
  )
  return(ret)
}

mean_absnorm <- function(mu_x, sigma){
  return(sigma * sqrt(2/pi) * exp(-mu_x^2/(2*sigma^2)) + mu_x * (1 - 2 * pnorm(-mu_x/sigma)))
}
mean_absdiff <- function(mu_x, mu_y, sigma){
  return(mean_absnorm(mu_x, sigma) - mean_absnorm(mu_y, sigma))
}
var_absdiff <- function(mu_x, mu_y, sigma){
  m_absx <- mean_absnorm(mu_x, sigma)
  m_absy <- mean_absnorm(mu_y, sigma)
  return(m_absy^2 + m_absx^2 + 2*sigma^2 - mu_x^2 - mu_y^2)
}

The PDF and CDF calculated above agree with the empirical distribution of :

library(tidyverse)
sigma <- 2
mu_x <- 1.5
mu_y <- 5
set.seed(111)
N <- 50000
X <- rnorm(N, mean=mu_x, sd=sigma)
Y <- rnorm(N, mean=mu_y, sd=sigma)
Z <- abs(X) - abs(Y)
Z_ecdf <- ecdf(Z)
tseq <- seq(mean(Z) - 4 * sd(Z), mean(Z) + 4 * sd(Z), length.out=300)
mytheme <- theme(panel.background = element_rect(fill=NA,color='grey'),
                 panel.grid = element_line(color='lightgrey',linetype='dashed'),
                 legend.key = element_rect(fill=NA))
tibble(t = tseq,
       Z_empirical = Z_ecdf(tseq),
       Z_theory = cdf_absdiff(tseq,mu_x, mu_y, sigma)) %>%
  gather(Z_empirical, Z_theory, key='type', value='cdf') %>%
  ggplot(aes(x=t, y=cdf, color=type)) + geom_line() +
  mytheme + ylab("Pr(Z < t)") +
  scale_color_manual(values = c('Z_theory' = 'darkblue',
                                'Z_empirical' = 'orange'),
                     name='', labels=c('Theoretical CDF','Empirical CDF'))

ggplot(tibble(Z=Z), aes(x=Z, y=..density..)) + 
  geom_histogram(fill='white', color='black', bins = 100) +
  geom_line(aes(x=t,y=pdf),
            color = 'red',
            data = tibble(t = tseq, 
	    	          pdf = pdf_absdiff(tseq, mu_x, 
			      			  mu_y, sigma))) +
  mytheme