Inferences for Deep Gaussian Process models in Pyro

In this tutorial, I want to illustrate how to use Pyro's Gaussian Processes module to create and train some deep Gaussian Process models. For the background on how to use this module, readers can check out some tutorials at

The first part is a fun example to run HMC with a 2-layer regression GP models while the second part uses SVI to classify digit numbers.

In [1]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()
from scipy.cluster.vq import kmeans2

import torch
import torch.nn as nn
from torch.distributions import constraints
from torch.distributions.transforms import AffineTransform
from torchvision import transforms

import pyro
import as gp
import pyro.distributions as dist
from pyro.contrib.examples.util import get_data_loader
from pyro.infer import MCMC, NUTS, Predictive, SVI, TraceMeanField_ELBO

warnings.formatwarning = (lambda message, category, *args, **kwargs:
                          "{}: {}\n".format(category.__name__, message))

HMC with Heaviside data

Let's create a dataset from Heaviside step function.

In [2]:
N = 20
X = torch.rand(N)
y = (X >= 0.5).float() + torch.randn(N) * 0.05
plt.plot(X.numpy(), y.numpy(), "kx");

We will make a 2-layer regression model.

Read more…

Sampling Hidden Markov Model with Pyro

To understand the multimodal phenomenon of unsupervised hidden Markov models (HMM) when reading some discussions in PyMC discourse, I decide to reimplement in Pyro various models from Stan. The main reference which we'll use is Stan User's Guide.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(palette="bright")
import torch
import warnings; warnings.simplefilter("ignore", FutureWarning)

# this post assumes a Pyro version in dev branch (dated 2019-01-01):
# pip install git+
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc import MCMC, NUTS


As in Stan user's guide, we use the notation categories for latent states and words for observations. The following data information is taken from Stan's example-models repository.

In [2]:
num_categories = 3
num_words = 10
num_supervised_data = 100
num_data = 600

transition_prior = torch.empty(num_categories).fill_(1.)
emission_prior = torch.empty(num_words).fill_(0.1)

transition_prob = dist.Dirichlet(transition_prior).sample(torch.Size([num_categories]))
emission_prob = dist.Dirichlet(emission_prior).sample(torch.Size([num_categories]))

We need to generate data randomly from the above transition probability and emission probability. In addition, we will generate an initial category from the equilibrium distribution of its Markov chain.

Read more…

Solutions to Project Euler's second 50 problems

These are codes to solve the second 50 problems in Project Euler. These python codes will give solutions in less than 1 second. This is achieved by using the excellent numba package.

Problem 51

In [1]:
def is_prime(n):
    if n < 5:
        return n == 2 or n == 3
    elif n % 2 == 0 or n % 3 == 0:
        return False
    for i in range(5, int(n**0.5)+1, 6):
        if n % i == 0 or n % (i+2) == 0:
            return False
    return True
In [2]:
def problem51(max_digits=6):
    position_list = [[i, j, k] for i in range(6) for j in range(i+1, 6)
                     for k in range(j+1, 6)]
    for origin in range(10**(max_digits-3)):
        for position in position_list:
            start_digit = 0 if position[0] == 0 else 1
            fail_count = start_digit
            answer = []
            for digit in range(start_digit, 10):
                origin_str = "{:0{}d}".format(origin, max_digits-3)
                expanded_digits = [str(digit)] * 6
                expanded_digits[position[0]] = origin_str[0]
                expanded_digits[position[1]] = origin_str[1]
                expanded_digits[position[2]] = origin_str[2]
                generated_number = int("".join(expanded_digits))
                if is_prime(generated_number) == False:
                    fail_count += 1
                    if fail_count == 3:
            if len(answer) == 8:
                return answer[0]

CPU times: user 40 ms, sys: 0 ns, total: 40 ms
Wall time: 40.9 ms

Problem 52

In [3]:
def problem52(limit=10**7):
    for i in range(1, limit):
        sorted_i = sorted(str(i))
        if sorted_i == sorted(str(2*i)):
            if sorted_i == sorted(str(3*i)):
                if sorted_i == sorted(str(4*i)):
                    if sorted_i == sorted(str(5*i)):
                        if sorted_i == sorted(str(6*i)):
                            return i

CPU times: user 208 ms, sys: 0 ns, total: 208 ms
Wall time: 205 ms

Problem 53

How to build a Grapheme-to-Phoneme (G2P) model using PyTorch


Grapheme-to-Phoneme (G2P) model is one of the core components of a typical Text-to-Speech (TTS) system, e.g. WaveNet and Deep Voice. In this notebook, we will try to replicate the Encoder-decoder LSTM model from the paper

Throughout this tutorial, we will learn how to:

  • Implement a sequence-to-sequence (seq2seq) model
  • Implement global attention into seq2seq model
  • Use beam-search decoder
  • Use Levenshtein distance to compute phoneme-error-rate (PER)
  • Use torchtext package


First, we will import necessary modules. You can install PyTorch as suggested in its main page. To install torchtext, simply call

pip install git+

Due to this bug, it is important to update your torchtext to the lastest version (using the above installing command is enough).

In [1]:
import argparse
import os
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
from torch.nn.utils import clip_grad_norm
import as data

argparse is a default python module which is used for command-line script parsing. To run this notebook as a python script, simply comment out all the markdown cell and change the following code cell to the real argparse code.

Read more…

Some solutions to Rudin's complex analysis book

The following notebook contains some solutions to the complex analysis part of the Big Rudin book that I studied at POSTECH. This post is also a chance for me to test the different between MathJax and KaTeX in Nikola, to see which one has better render. It turns out that KaTeX is much faster than MathJax. As a note, to make KaTeX work with inline mode from Jupyter notebook, we have to change the default auto-render code in the theme (as suggested in this file, the renderMathInElement part).

Chapter 10 - Elementary Properties of Holomorphic Functions

1. The following fact was tacitly used in this chapter: If $A$ and $B$ are disjoint subsets of the plane, if $A$ is compact, and if $B$ is closed, then there exists a $\delta > 0$ such that $|\alpha - \beta| \geq \delta$ for all $\alpha \in A$ and $\beta \in B$. Prove this, with an arbitrary metric space in place of the plane.

Proof. Let $A$ be a compact set and $B$ be a closed set in a metric space such that $A\cap B = \varnothing$. Let $\delta = \inf_{\alpha,\beta}d(\alpha,\beta)$ where the infimum is taken with all $\alpha \in A$ and $\beta\in B$, and let $\{a_n\}$, $\{b_n\}$ be two sequences in $A$ and $B$ correspondingly such that $\lim_{n\to\infty}d(a_n,b_n) = \delta$. Suppose that $\delta = 0$. Because $A$ is compact, there exists $c\in A$ and a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ such that $\lim_{k\to\infty}d(a_{n_k},c) = 0$. Hence $\lim_{k\to\infty}d(b_{n_k},c) = 0$ because $\lim_{k\to\infty}d(a_{n_k},b_{n_k}) = 0$. Hence $c \in \bar{B}=B$, which is a contradiction with the hypothesis that $A\cap B = \varnothing$. So $\delta > 0$. We get the conclusion. $\Box$

Read more…

Solutions to Project Euler's first 50 problems

These are codes to solve the first 50 problems in Project Euler. The thing is that I try to make these python codes give solutions in less than 1 second. This is achieved by using the excellent numba package. This post is written in an IPython notebook.

In [1]:
# !mkdir pe
In [2]:
def problem1():
    n = 1000
    s = 0
    for i in range(1, n):
        if i % 3 == 0 or i % 5 == 0:
            s += i
    return s

%timeit problem1()
1000 loops, best of 3: 245 µs per loop
In [3]:
def problem2():
    n = 4000000
    s = 0
    a, b = 1, 2
    while a <= n:
        if a % 2 == 0:
            s += a
        a, b = b, a+b
    return s

%timeit problem2()
100000 loops, best of 3: 8.44 µs per loop