Importance Sampling(중요도 샘플링)

Importance Sampling(중요도 샘플링)

PROGRAMMING/머신러닝 2024. 6. 19. 16:22

Importance Sampling은 Monte Carlo Simulation을 할 때 variance를 줄여주는 중요한 테크닉 중 하나다.

Importance Sampling을 ELBO를 증명하는데도 사용하길래 간단하게 정리해보았다.

Importance Sampling

Importance sampling은 probability measure를 바꾸므로써 Monte Carlo simulation에서의 variance를 줄이는 하나의 테크닉이다. Importance Sampling은 더 중요한 결과에 높은 가중치를 줌으로써 샘플링의 효율성을 높인다.

확률 변수 $X$의 확률 밀도 함수 $f$에 대해 $h(X)$의 기댓값은 다음과 같이 쓸 수 있다.

$\alpha = E[h(X)] = \int h(x)f(x) dx$

가장 일반적인 Monte Carlo 방법론을 사용하면 $\alpha$는 다음과 같이 추정할 수 있다.

$\hat{\alpha} = \hat{\alpha(n)} = \frac{1}{n} \sum_{i = 1}^{n} h(X_i)$

여기서 $X_1, X_2, ... X_n$은 $f$에서부터 독립적으로 생성된 샘플이다.

$g$는 다른 확률밀도함수로 다음을 만족한다고 하자.

$f(x) > 0 \Rarrow g(x) > 0$

그렇다면 우리는 $\alpha$를 다음과 같이 나타낼 수 있다.

$\alpha = \int h(x)\frac{f(x)}{g(x)}g(x) dx = \tilde{E}[h(X)\frac{f(X)}{g(X)}]$

여기서 $\tilde{E}$는 $g$에 대한 기댓값임을 나타낸다.

$X_1, X_2, ... X_n$은 $g$에서부터 독립적으로 생성된 샘플이라고 하면 importance sampling estimator with $g$는 다음과 같이 쓸 수 있다.

$\hat{\alpha_g} = \hat{\alpha_{g}(n)} = \frac{1}{n} \sum_{i = 1}^{n} h(X_i)\frac{f(X_i)}{g(X_i)}$

여기서 $\frac{f(X_i)}{g(X_i)}를 likelihood ratio 또는 Radon-Nikodym derivative라고 부른다.

Mean and Variance

$\hat{\alpha}$과 $\hat{\alpha_g}$는 모두 $\alpha$의 unbiased estimator이다. 그래서 우리는 variance만 비교해보고자 한다. Importance sampling estimator의 경우 다음과 같은 second moments를 갖는다.

$\tilde{E}[(h(X)\frac{f(X)}{g(X)})^2] = E[h^2(X)\frac{f(X)}{g(X)}$

증명

이는 $E[h^2(X)]$보다 작을 수도 있고, 클 수도 있는데, 그렇기 때문에 좋은 분포 g를 설정하는 것이 중요하다.

Effective Importance Sampling density $g$

함수 $h$가 양수라고 가정하자. $h(x)f(x)$를 nomalize해서 분포를 만들고(즉, $h(x)f(x)$의 그래프 아래 적분값이 1이 되도록 임의의 값으로 잘 나눠줬다고 생각해보자) 그 분포를 $g$라고 칭하자.

그러면 $g(x)$는 $h(x)f(x)$와 비례하는 값을 가질 것이며, $h(X_i)f(X_i)/g(X_i)$는 $X_i$에 상관없이 상수 값을 가질 것이다. 따라서 $\hat{\alpha_g}$는 zero-variance estimator일 것이다.

이와 같은 $g$를 찾을 수 있다는 말은 사실상 $f$와 $h$를 알고 있다는 말로, 실제로는 이러한 $g$를 찾는 것이 매우 어렵다. 하지만 최대한 $h(x)f(x)$와 비례하도록 $g$를 찾으므로써, 우리는 sampling의 효율성을 높일 수 있다.

그림으로 이해하는 예시

$h(x)$가 큰 값을 가지는 구간에서 샘플링할 확률(=$f(x)$)는 희박하다. 이 경우 몇 개 되지 않는 샘플들로 기댓값 $E[h(x)]$를 구하기 때문에 샘플들에 따라 기댓값이 크게 흔들릴 수 있다. 즉, variance가 크게 나온다는 말이다.

만약 우리가 $g$를 따라 샘플링한다면 기존의 variance보다 variance가 크게 줄어들 것을 기대할 수 있다.

Reference

[1] Monte Carlo Methods in Financial Engineering(Paul Glasserman, 2023) p255

[2] https://dilithjay.com/blog/importance-sampling

Importance Sampling - Clearly Explained

Importance Sampling is a tool that helps us tackle a common challenge: calculating expectations. While that might sound like a straightforward task, it often becomes a formidable problem, especially when dealing with high-dimensional data. In this blog pos

dilithjay.com

[3] https://medium.com/@amir_masoud/a-simple-tutorial-on-sampling-importance-and-monte-carlo-with-python-codes-8ce809b91465

A simple tutorial on Sampling Importance and Monte Carlo with Python codes

In this post, I will explain the Importance sampling technique. That is an approximation method showing up in machine learning topics.

medium.com

저작자표시 (새창열림)

댓글

ABOUT ME

공부하는 사람의 공부.c 공부하는 사람의 공부.c

Importance Sampling

Mean and Variance

Effective Importance Sampling density $g$

그림으로 이해하는 예시

Reference

티스토리툴바