Inferences for Deep Gaussian Process models in Pyro

In this tutorial, I want to illustrate how to use Pyro's Gaussian Processes module to create and train some deep Gaussian Process models. For the background on how to use this module, readers can check out some tutorials at http://pyro.ai/examples/.

The first part is a fun example to run HMC with a 2-layer regression GP models while the second part uses SVI to classify digit numbers.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()
from scipy.cluster.vq import kmeans2

import torch
import torch.nn as nn
from torch.distributions.transforms import AffineTransform
from torchvision import transforms

import pyro
import pyro.contrib.gp as gp
import pyro.distributions as dist
import pyro.infer as infer
import pyro.infer.mcmc as mcmc
from pyro.contrib import autoname
from pyro.contrib.examples.util import get_data_loader

pyro.set_rng_seed(0)

HMC with Heaviside data

Let's create a dataset from Heaviside step function.

In [2]:
N = 25
X = torch.rand(N)
y = (X >= 0.5).float() + torch.randn(N) * 0.01
plt.plot(X.numpy(), y.numpy(), "kx");

We will make a 2-layer regression model. We use the decorator name_count to avoid the conflict of parameter names from two layers.

Read more…

Sampling Hidden Markov Model with Pyro

To understand the multimodal phenomenon of unsupervised hidden Markov models (HMM) when reading some discussions in PyMC discourse, I decide to reimplement in Pyro various models from Stan. The main reference which we'll use is Stan User's Guide.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(palette="bright")
import torch
import warnings; warnings.simplefilter("ignore", FutureWarning)

# this post assumes a Pyro version in dev branch (dated 2019-01-01):
# pip install git+https://github.com/uber/pyro@4e42613
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc import MCMC, NUTS

pyro.set_rng_seed(1)

As in Stan user's guide, we use the notation categories for latent states and words for observations. The following data information is taken from Stan's example-models repository.

In [2]:
num_categories = 3
num_words = 10
num_supervised_data = 100
num_data = 600

transition_prior = torch.empty(num_categories).fill_(1.)
emission_prior = torch.empty(num_words).fill_(0.1)

transition_prob = dist.Dirichlet(transition_prior).sample(torch.Size([num_categories]))
emission_prob = dist.Dirichlet(emission_prior).sample(torch.Size([num_categories]))

We need to generate data randomly from the above transition probability and emission probability. In addition, we will generate an initial category from the equilibrium distribution of its Markov chain.

Read more…

Solutions to Project Euler's second 50 problems

These are codes to solve the second 50 problems in Project Euler. These python codes will give solutions in less than 1 second. This is achieved by using the excellent numba package.

Problem 51

In [1]:
def is_prime(n):
    if n < 5:
        return n == 2 or n == 3
    elif n % 2 == 0 or n % 3 == 0:
        return False
    for i in range(5, int(n**0.5)+1, 6):
        if n % i == 0 or n % (i+2) == 0:
            return False
    return True
In [2]:
%%time
def problem51(max_digits=6):
    position_list = [[i, j, k] for i in range(6) for j in range(i+1, 6)
                     for k in range(j+1, 6)]
    for origin in range(10**(max_digits-3)):
        for position in position_list:
            start_digit = 0 if position[0] == 0 else 1
            fail_count = start_digit
            answer = []
            for digit in range(start_digit, 10):
                origin_str = "{:0{}d}".format(origin, max_digits-3)
                expanded_digits = [str(digit)] * 6
                expanded_digits[position[0]] = origin_str[0]
                expanded_digits[position[1]] = origin_str[1]
                expanded_digits[position[2]] = origin_str[2]
                generated_number = int("".join(expanded_digits))
                if is_prime(generated_number) == False:
                    fail_count += 1
                    if fail_count == 3:
                        break
                else:
                    answer.append(generated_number)
            if len(answer) == 8:
                return answer[0]

print(problem51())
121313
CPU times: user 40 ms, sys: 0 ns, total: 40 ms
Wall time: 40.9 ms

Problem 52

In [3]:
%%time
def problem52(limit=10**7):
    for i in range(1, limit):
        sorted_i = sorted(str(i))
        if sorted_i == sorted(str(2*i)):
            if sorted_i == sorted(str(3*i)):
                if sorted_i == sorted(str(4*i)):
                    if sorted_i == sorted(str(5*i)):
                        if sorted_i == sorted(str(6*i)):
                            return i

print(problem52())
142857
CPU times: user 208 ms, sys: 0 ns, total: 208 ms
Wall time: 205 ms

Problem 53

How to build a Grapheme-to-Phoneme (G2P) model using PyTorch

introduction

Grapheme-to-Phoneme (G2P) model is one of the core components of a typical Text-to-Speech (TTS) system, e.g. WaveNet and Deep Voice. In this notebook, we will try to replicate the Encoder-decoder LSTM model from the paper https://arxiv.org/abs/1506.00196.

Throughout this tutorial, we will learn how to:

  • Implement a sequence-to-sequence (seq2seq) model
  • Implement global attention into seq2seq model
  • Use beam-search decoder
  • Use Levenshtein distance to compute phoneme-error-rate (PER)
  • Use torchtext package

setup

First, we will import necessary modules. You can install PyTorch as suggested in its main page. To install torchtext, simply call

pip install git+https://github.com/pytorch/text.git

Due to this bug, it is important to update your torchtext to the lastest version (using the above installing command is enough).

In [1]:
import argparse
import os
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.optim as optim
from torch.nn.utils import clip_grad_norm
import torchtext.data as data

argparse is a default python module which is used for command-line script parsing. To run this notebook as a python script, simply comment out all the markdown cell and change the following code cell to the real argparse code.

Read more…

Some solutions to Rudin's complex analysis book

The following notebook contains some solutions to the complex analysis part of the Big Rudin book that I studied at POSTECH. This post is also a chance for me to test the different between MathJax and KaTeX in Nikola, to see which one has better render. It turns out that KaTeX is much faster than MathJax. As a note, to make KaTeX work with inline mode from Jupyter notebook, we have to change the default auto-render code in the theme (as suggested in this file, the renderMathInElement part).

Chapter 10 - Elementary Properties of Holomorphic Functions

1. The following fact was tacitly used in this chapter: If $A$ and $B$ are disjoint subsets of the plane, if $A$ is compact, and if $B$ is closed, then there exists a $\delta > 0$ such that $|\alpha - \beta| \geq \delta$ for all $\alpha \in A$ and $\beta \in B$. Prove this, with an arbitrary metric space in place of the plane.

Proof. Let $A$ be a compact set and $B$ be a closed set in a metric space such that $A\cap B = \varnothing$. Let $\delta = \inf_{\alpha,\beta}d(\alpha,\beta)$ where the infimum is taken with all $\alpha \in A$ and $\beta\in B$, and let $\{a_n\}$, $\{b_n\}$ be two sequences in $A$ and $B$ correspondingly such that $\lim_{n\to\infty}d(a_n,b_n) = \delta$. Suppose that $\delta = 0$. Because $A$ is compact, there exists $c\in A$ and a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ such that $\lim_{k\to\infty}d(a_{n_k},c) = 0$. Hence $\lim_{k\to\infty}d(b_{n_k},c) = 0$ because $\lim_{k\to\infty}d(a_{n_k},b_{n_k}) = 0$. Hence $c \in \bar{B}=B$, which is a contradiction with the hypothesis that $A\cap B = \varnothing$. So $\delta > 0$. We get the conclusion. $\Box$

Read more…

Solutions to Project Euler's first 50 problems

These are codes to solve the first 50 problems in Project Euler. The thing is that I try to make these python codes give solutions in less than 1 second. This is achieved by using the excellent numba package. This post is written in an IPython notebook.

In [1]:
# !mkdir pe
In [2]:
def problem1():
    n = 1000
    s = 0
    for i in range(1, n):
        if i % 3 == 0 or i % 5 == 0:
            s += i
    return s

print(problem1())
%timeit problem1()
233168
1000 loops, best of 3: 245 µs per loop
In [3]:
def problem2():
    n = 4000000
    s = 0
    a, b = 1, 2
    while a <= n:
        if a % 2 == 0:
            s += a
        a, b = b, a+b
    return s

print(problem2())
%timeit problem2()
4613732
100000 loops, best of 3: 8.44 µs per loop