# The difference of two folded normal random variables

What is the distribution of $Z = \vert X\vert - \vert Y\vert$ when $X \sim N(\mu_x, \sigma^2)$ and $Y \sim N(\mu_y,\sigma^2)$ are independent Gaussian random variables with the same variance?

# CDF

The CDF of $Z$ is $F_Z(t) = \mathbb{P}(Z \le t) = \mathbb{P}(\vert X\vert - \vert Y\vert \le t).$

First assume that $t > 0$ and consider the region

$$A = \left\{\vert X\vert - \vert Y\vert \le t\right\}$$:

The probability $\mathbb{P}(Z \le t)$ is

$$\mathbb{P}(Z \le t) = \underset{A}{\int\int}f(x,y)\, \mathrm{d}x\mathrm{d}y.$$

Since $X$ and $Y$ have the same variance and are independent, the distribution of $(X, Y)^\intercal$ is invariant under an orthogonal transformation $W$. In other words,

where $WA = \left\{Wv : v \in A\right\}$.

To make the integral $\mathbb{P}(Z \le t) = \underset{A}{\int\int}f(x,y)\, \mathrm{d}x\mathrm{d}y$ easier, we can rotate the region $A$ so that its “elbows” align with the coordinate axes.
The triangular cutouts of this region are defined by 45-degree lines: $y= x - t$, $y = - x + t$, etc. So we want the rotation matrix for a 45-degree, counterclockwise rotation around the origin:

Applying this rotation matrix, we get the region $WA$:

The probability of $WA$ is easy to compute since we can split the region into disjoint rectangles. Denote $W\left[\begin{array}{c}X\\Y\end{array}\right] = (\tilde{X}, \tilde{Y})^\intercal$.

Using independence of $\tilde{X}$ and $\tilde{Y}$ and the above picture, we have

where $\Phi$ is the CDF of a $N(0,1)$ random variable and

$$\left[\begin{array}{c}\tilde{\mu}_x\\\tilde{\mu}_y\end{array}\right] = W\left[\begin{array}{c}\mu_x\\ \mu_y\end{array}\right] = \frac{1}{\sqrt{2}}\left[\begin{array}{c} \mu_x-\mu_y \\ \mu_x+\mu_y\end{array}\right]$$

When $% $ the regions $A$ and $WA$ look like this:

And the probability we want is

(The pictures make it clear that $\mathbb{P}(\vert X\vert - \vert Y\vert \le t) = 1 - \mathbb{P}(\vert X\vert - \vert Y\vert \le -t)$.)

# PDF

After differentiating, the PDF is

where $\phi$ is the standard normal PDF and %

# Mean and variance

The mean and variance can be calculated based on the mean of $\vert X\vert$ and $\vert Y\vert$ individually (refer to the folded normal distribution wiki page).

# Verify using simulation

Here is an R implementation of the CDF and PDF for $Z = \vert X\vert - \vert Y\vert$.

cdf_absdiff <- function(tval, mu_x, mu_y, sigma) {
mu_x_tilde <- (1 / sqrt(2)) * (mu_x - mu_y) # rotated mu_x
mu_y_tilde <- (1 / sqrt(2)) * (mu_x + mu_y) # rotated mu_y
cxmin <- -tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
cymin <- -tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
cyplus <- tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
cxplus <- tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
ret <- ifelse(
tval > 0,
(1 - pnorm(cxmin)) * pnorm(cymin) + pnorm(cxplus) * (1 - pnorm(cyplus)) +
pnorm(cyplus) - pnorm(cymin),
pnorm(cxplus) * (1 - pnorm(cymin)) + (1 - pnorm(cxmin)) * pnorm(cyplus)
)
return(ret)
}

pdf_absdiff <- function(tval, mu_x, mu_y, sigma) {
mu_x_tilde <- (1 / sqrt(2)) * (mu_x - mu_y) # rotated mu_x
mu_y_tilde <- (1 / sqrt(2)) * (mu_x + mu_y) # rotated mu_y
cxmin <- -tval / (sigma * sqrt(2)) - mu_x_tilde / sigma
cymin <- -tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
cyplus <- tval / (sigma * sqrt(2)) - mu_y_tilde / sigma
cxplus <- tval / (sigma * sqrt(2)) - mu_x_tilde / sigma

ret <- ifelse(
tval > 0,
(1 / (sigma * sqrt(2))) * (
pnorm(cymin) * dnorm(cxmin) - (1 - pnorm(cxmin)) * dnorm(cymin) +
(1 - pnorm(cyplus)) * dnorm(cxplus) -
pnorm(cxplus) * dnorm(cyplus) +
dnorm(cymin) + dnorm(cyplus)
),
(1 / (sigma * sqrt(2))) * (
pnorm(cyplus) * dnorm(cxmin) + pnorm(cxplus) * dnorm(cymin) +
(1 - pnorm(cymin)) * dnorm(cxplus) +
(1 - pnorm(cxmin)) * dnorm(cyplus)
)
)
return(ret)
}

mean_absnorm <- function(mu_x, sigma){
return(sigma * sqrt(2/pi) * exp(-mu_x^2/(2*sigma^2)) + mu_x * (1 - 2 * pnorm(-mu_x/sigma)))
}
mean_absdiff <- function(mu_x, mu_y, sigma){
return(mean_absnorm(mu_x, sigma) - mean_absnorm(mu_y, sigma))
}
var_absdiff <- function(mu_x, mu_y, sigma){
m_absx <- mean_absnorm(mu_x, sigma)
m_absy <- mean_absnorm(mu_y, sigma)
return(m_absy^2 + m_absx^2 + 2*sigma^2 - mu_x^2 - mu_y^2)
}


The PDF and CDF calculated above agree with the empirical distribution of $Z$:

library(tidyverse)
sigma <- 2
mu_x <- 1.5
mu_y <- 5
set.seed(111)
N <- 50000
X <- rnorm(N, mean=mu_x, sd=sigma)
Y <- rnorm(N, mean=mu_y, sd=sigma)
Z <- abs(X) - abs(Y)
Z_ecdf <- ecdf(Z)
tseq <- seq(mean(Z) - 4 * sd(Z), mean(Z) + 4 * sd(Z), length.out=300)
mytheme <- theme(panel.background = element_rect(fill=NA,color='grey'),
panel.grid = element_line(color='lightgrey',linetype='dashed'),
legend.key = element_rect(fill=NA))
tibble(t = tseq,
Z_empirical = Z_ecdf(tseq),
Z_theory = cdf_absdiff(tseq,mu_x, mu_y, sigma)) %>%
gather(Z_empirical, Z_theory, key='type', value='cdf') %>%
ggplot(aes(x=t, y=cdf, color=type)) + geom_line() +
mytheme + ylab("Pr(Z < t)") +
scale_color_manual(values = c('Z_theory' = 'darkblue',
'Z_empirical' = 'orange'),
name='', labels=c('Theoretical CDF','Empirical CDF'))


ggplot(tibble(Z=Z), aes(x=Z, y=..density..)) +
geom_histogram(fill='white', color='black', bins = 100) +
geom_line(aes(x=t,y=pdf),
color = 'red',
data = tibble(t = tseq,
pdf = pdf_absdiff(tseq, mu_x,
mu_y, sigma))) +
mytheme