Auto-Batched Joint Distributions: A Gentle Tutorial


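Introduction

TensorFlow Probability (TFP) offers a number of JointDistribution abstractions that make probabilistic inference easier by letting a user express a probabilistic graphical model in near-mathematical form; the abstractions generate methods for sampling from the model and for evaluating the log probability of samples from the model. In this tutorial, we review the "autobatched" variants, which were developed after the original JointDistribution abstractions. Relative to the original, non-autobatched abstractions, the autobatched versions are simpler to use and more ergonomic, so many models can be expressed with less boilerplate. In this colab, we explore a simple model in detail, making clear the problems autobatching solves and teaching more about TFP shape concepts along the way.

Prior to the introduction of autobatching, there were a few different variants of JointDistribution, corresponding to different syntactic styles for expressing probabilistic models (such as JointDistributionSequential, JointDistributionNamed, and JointDistributionCoroutine). With autobatching, AutoBatched variants of all of these are available. In this tutorial, we explore the differences between JointDistributionSequential and JointDistributionSequentialAutoBatched; however, everything we do here applies to the other variants essentially unchanged.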

Dependencies & Prerequisites

Imports and setup

import functools
import numpy as np

import tensorflow.compat.v2 as tf
tf.enable_v2_behavior()

import tensorflow_probability as tfp

tfd = tfp.distributions

Prerequisite: A Bayesian Regression Problem

We'll consider a very simple Bayesian regression scenario:

\[ \begin{align*} m & \sim \text{Normal}(0, 1) \\ b & \sim \text{Normal}(0, 1) \\ Y & \sim \text{Normal}(mX + b, 1) \end{align*} \]

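In this model, m and b are drawn from standard normals, and the observations Y are drawn from a normal distribution whose mean depends on the random variables m and b and on some (nonrandom, known) covariates X. (For simplicity, in this example, we assume the scale of all random variables is known.)

To perform inference in this model, we'd need to know both the covariates X and the observations Y, but for the purposes of this tutorial we'll only need X, so we define a simple dummy X: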

X = np.arange(7)
X

array([0, 1, 2, 3, 4, 5, 6])

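Desiderata

In probabilistic inference, we often want to perform two basic operations:

sample: Drawing samples from the model.
log_prob: Computing the log probability of a sample from the model.

The key contribution of TFP's JointDistribution abstractions (as well as of many other approaches to probabilistic programming) is to let users write a model once and have access to both sample and log_prob computations.

Noting that we have 7 points in our data set (X.shape = (7,)), we can now state the desiderata for a JointDistribution:

sample() should produce a list of Tensors having shape [(), (), (7,)], corresponding to the scalar slope, scalar bias, and vector observations, respectively.
log_prob(sample()) should produce a scalar: the log probability of a particular slope, bias, and observations.
sample([5, 3]) should produce a list of Tensors having shape [(5, 3), (5, 3), (5, 3, 7)], representing a (5, 3)-batch of samples from the model.
log_prob(sample([5, 3])) should produce a Tensor with shape (5, 3).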
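To make these targets concrete, here is a hand-written NumPy sketch (not TFP code) of a sampler and log density for the regression model. The helper names normal_log_pdf, sample_model, and log_prob_model are hypothetical and exist only for this illustration; the point is the shapes they return, which are the shapes we would like a JointDistribution to produce for us without the manual axis handling.

```python
import numpy as np

X = np.arange(7)
rng = np.random.default_rng(0)

def normal_log_pdf(x, loc, scale=1.0):
    """Elementwise log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

def sample_model(sample_shape=()):
    """Draws [m, b, Y]; Y has one extra trailing axis of size len(X)."""
    sample_shape = tuple(sample_shape)
    m = np.asarray(rng.standard_normal(sample_shape))
    b = np.asarray(rng.standard_normal(sample_shape))
    loc = m[..., np.newaxis] * X + b[..., np.newaxis]
    y = loc + rng.standard_normal(loc.shape)
    return [m, b, y]

def log_prob_model(m, b, y):
    """Joint log density: one value per sample, summed over the 7 observations."""
    m, b = np.asarray(m), np.asarray(b)
    loc = m[..., np.newaxis] * X + b[..., np.newaxis]
    return (normal_log_pdf(m, 0.0) + normal_log_pdf(b, 0.0)
            + normal_log_pdf(y, loc).sum(axis=-1))

m, b, y = sample_model()
print(m.shape, b.shape, y.shape)           # () () (7,)
print(np.shape(log_prob_model(m, b, y)))   # ()

m, b, y = sample_model([5, 3])
print(m.shape, b.shape, y.shape)           # (5, 3) (5, 3) (5, 3, 7)
print(log_prob_model(m, b, y).shape)       # (5, 3)
```

Note how much of this sketch is shape bookkeeping (the np.newaxis insertions and the sum over the last axis) rather than modelling.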

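We'll now look at a succession of JointDistribution models, see how to achieve the above desiderata, and learn a little more about TFP shapes along the way.

Spoiler alert: the approach that satisfies the above desiderata without added boilerplate is autobatching.

A first attempt: `JointDistributionSequential`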

jds = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),   # m
    tfd.Normal(loc=0., scale=1.),   # b
    lambda b, m: tfd.Normal(loc=m*X + b, scale=1.) # Y
])

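This is more or less a direct translation of the model into code. The slope m and bias b are straightforward. Y is defined using a lambda function: the general pattern is that a lambda function of \(k\) arguments in a JointDistributionSequential (JDS) uses the previous \(k\) distributions in the model. Note the "reverse" order: the most recently defined variable (b) comes first in the argument list.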
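The calling convention can be mimicked with a toy pure-Python helper. This is only an illustration of the argument order, not TFP's actual implementation, and call_with_previous is a hypothetical name: it hands a callable the k most recent values, newest first.

```python
import inspect

def call_with_previous(fn, previous_values):
    """Toy stand-in: pass the k most recent values to fn, newest first."""
    k = len(inspect.signature(fn).parameters)
    args = list(reversed(previous_values))[:k]
    return fn(*args)

# Values in the order the model defines them: m first, then b.
previous = ["m_value", "b_value"]

# A two-argument lambda sees (b, m), i.e. reverse definition order.
print(call_with_previous(lambda b, m: (b, m), previous))    # ('b_value', 'm_value')

# A one-argument lambda sees only the most recent value.
print(call_with_previous(lambda latest: latest, previous))  # b_value
```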

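We'll call sample_distributions, which returns both a sample and the underlying "sub-distributions" that were used to generate the sample. (We could have produced just the sample by calling sample; later in the tutorial it will be convenient to have the distributions as well.) The sample we produce is fine: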

dists, sample = jds.sample_distributions()
sample

[<tf.Tensor: shape=(), dtype=float32, numpy=-1.668757>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.6585061>,
 <tf.Tensor: shape=(7,), dtype=float32, numpy=
 array([ 0.18573815, -1.79962   , -1.8106272 , -3.5971394 , -6.6625295 ,
        -7.308844  , -9.832693  ], dtype=float32)>]

But log_prob produces a result with an undesired shape:

jds.log_prob(sample)

<tf.Tensor: shape=(7,), dtype=float32, numpy=
array([-4.4777603, -4.6775575, -4.7430477, -4.647725 , -4.5746684,
       -4.4368567, -4.480562 ], dtype=float32)>

And drawing multiple samples doesn't work:

try:
  jds.sample([5, 3])
except tf.errors.InvalidArgumentError as e:
  print(e)

Incompatible shapes: [5,3] vs. [7] [Op:Mul]

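Let's try to understand what's going wrong.

A brief review: batch and event shape

In TFP, an ordinary (not a JointDistribution) probability distribution has an event shape and a batch shape, and understanding the difference is crucial to effective use of TFP:

Event shape describes the shape of a single draw from the distribution; the draw may be dependent across dimensions. For scalar distributions, the event shape is []. For a 5-dimensional MultivariateNormal, the event shape is [5].
Batch shape describes independent, not identically distributed draws, that is, a "batch" of distributions. Representing a batch of distributions in a single Python object is one of the key ways TFP achieves efficiency at scale.

For our purposes, a critical fact to keep in mind is that if we call log_prob on a single sample from a distribution, the result will always have a shape that matches (i.e., has as rightmost dimensions) the batch shape.

For a more in-depth discussion of shapes, see the "Understanding TensorFlow Distributions Shapes" tutorial.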
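As a rough NumPy analogy (a sketch, not TFP code), the same array of draws can be read either as a batch of three scalar normals or as one three-dimensional event. The only difference is whether the log density keeps the last axis or sums over it. The helper normal_log_pdf is a hypothetical name introduced for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_log_pdf(x, loc, scale=1.0):
    """Elementwise log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

loc = np.array([0.0, 10.0, 20.0])           # three different means
draws = loc + rng.standard_normal((5, 3))   # 5 draws of all three coordinates

# "Batch" reading (like batch_shape=[3], event_shape=[]):
# one log density per distribution in the batch.
batch_log_prob = normal_log_pdf(draws, loc)

# "Event" reading (like batch_shape=[], event_shape=[3]):
# one log density per 3-vector, summed over the event axis.
event_log_prob = normal_log_pdf(draws, loc).sum(axis=-1)

print(batch_log_prob.shape)  # (5, 3)
print(event_log_prob.shape)  # (5,)
```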

Why doesn't `log_prob(sample())` produce a scalar?

Let's use our knowledge of batch and event shape to explore what's happening with log_prob(sample()). Here's our sample again:

sample

[<tf.Tensor: shape=(), dtype=float32, numpy=-1.668757>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.6585061>,
 <tf.Tensor: shape=(7,), dtype=float32, numpy=
 array([ 0.18573815, -1.79962   , -1.8106272 , -3.5971394 , -6.6625295 ,
        -7.308844  , -9.832693  ], dtype=float32)>]

And here are our distributions:

dists

[<tfp.distributions.Normal 'Normal' batch_shape=[] event_shape=[] dtype=float32>,
 <tfp.distributions.Normal 'Normal' batch_shape=[] event_shape=[] dtype=float32>,
 <tfp.distributions.Normal 'JointDistributionSequential_sample_distributions_Normal' batch_shape=[7] event_shape=[] dtype=float32>]

The log probability is computed by summing the log probabilities of the sub-distributions at the (matched) elements of the parts:

log_prob_parts = [dist.log_prob(s) for (dist, s) in zip(dists, sample)]
log_prob_parts

[<tf.Tensor: shape=(), dtype=float32, numpy=-2.3113134>,
 <tf.Tensor: shape=(), dtype=float32, numpy=-1.1357536>,
 <tf.Tensor: shape=(7,), dtype=float32, numpy=
 array([-1.0306933, -1.2304904, -1.2959809, -1.200658 , -1.1276014,
        -0.9897899, -1.0334952], dtype=float32)>]

sum(log_prob_parts) - jds.log_prob(sample)

<tf.Tensor: shape=(7,), dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0.], dtype=float32)>

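So, one level of explanation is that the log probability calculation returns a 7-Tensor because the third subcomponent of log_prob_parts is a 7-Tensor. But why?

We see that the last element of dists, which corresponds to our distribution over Y in the mathematical formulation, has a batch_shape of [7]. In other words, our distribution over Y is a batch of 7 independent normals (with different means and, in this case, the same scale).

We now understand what's wrong: in JDS, the distribution over Y has batch_shape=[7], a sample from the JDS represents scalars for m and b and a "batch" of 7 independent normals, and log_prob computes 7 separate log probabilities, each of which represents the log probability of drawing m and b and a single observation Y[i] at some X[i].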
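The same 7-vector can be reproduced by hand. The following NumPy sketch (not TFP code; normal_log_pdf is a hypothetical helper) plugs in the sample values printed earlier and adds two scalar log densities to a vector of seven. Broadcasting silently turns the sum into a vector.

```python
import numpy as np

X = np.arange(7)

def normal_log_pdf(x, loc, scale=1.0):
    """Elementwise log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

# Values copied from the sample printed earlier in this tutorial.
m = -1.668757
b = 0.6585061
y = np.array([0.18573815, -1.79962, -1.8106272, -3.5971394,
              -6.6625295, -7.308844, -9.832693])

lp_m = normal_log_pdf(m, 0.0)         # scalar
lp_b = normal_log_pdf(b, 0.0)         # scalar
lp_y = normal_log_pdf(y, m * X + b)   # shape (7,)

# Scalar + scalar + vector broadcasts to a vector: each entry contains
# the full log density of m and b plus ONE observation's log density.
total = lp_m + lp_b + lp_y
print(total.shape)  # (7,)
```

Up to floating-point rounding, total agrees with the jds.log_prob(sample) output shown earlier.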

Fixing `log_prob(sample())` with `Independent`

Recall that dists[2] has event_shape=[] and batch_shape=[7]:

dists[2]

<tfp.distributions.Normal 'JointDistributionSequential_sample_distributions_Normal' batch_shape=[7] event_shape=[] dtype=float32>

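By using TFP's Independent metadistribution, which converts batch dimensions to event dimensions, we can convert this into a distribution with event_shape=[7] and batch_shape=[]. (We'll name it y_dist_i because it's a distribution on Y, with the _i standing for our Independent wrapping.)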

y_dist_i = tfd.Independent(dists[2], reinterpreted_batch_ndims=1)
y_dist_i

<tfp.distributions.Independent 'IndependentJointDistributionSequential_sample_distributions_Normal' batch_shape=[] event_shape=[7] dtype=float32>

Now, the log_prob of a 7-vector is a scalar:

y_dist_i.log_prob(sample[2])

<tf.Tensor: shape=(), dtype=float32, numpy=-7.9087086>

Under the covers, Independent sums over the batch:

y_dist_i.log_prob(sample[2]) - tf.reduce_sum(dists[2].log_prob(sample[2]))

<tf.Tensor: shape=(), dtype=float32, numpy=0.0>

And indeed, we can use this to construct a new jds_i (the i again stands for Independent) where log_prob returns a scalar:

jds_i = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),   # m
    tfd.Normal(loc=0., scale=1.),   # b
    lambda b, m: tfd.Independent(   # Y
        tfd.Normal(loc=m*X + b, scale=1.),
        reinterpreted_batch_ndims=1)
])

jds_i.log_prob(sample)

<tf.Tensor: shape=(), dtype=float32, numpy=-11.355776>

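A couple of notes:

jds_i.log_prob(s) is not the same as tf.reduce_sum(jds.log_prob(s)). The former produces the "correct" log probability of the joint distribution. The latter sums over a 7-Tensor, each element of which is the sum of the log probability of m, the log probability of b, and a single element of the log probability of Y, so it overcounts m and b. (log_prob(m) + log_prob(b) + log_prob(Y) returns a result rather than throwing an exception because TFP follows TF and NumPy's broadcasting rules; adding a scalar to a vector produces a vector-sized result.)
In this particular case, we could have solved the problem and achieved the same result using MultivariateNormalDiag instead of Independent(Normal(...)). MultivariateNormalDiag is a vector-valued distribution (i.e., it already has vector event shape). Indeed, MultivariateNormalDiag could be implemented as a composition of Independent and Normal. It's worth remembering that, given a vector V, samples from n1 = Normal(loc=V) and n2 = MultivariateNormalDiag(loc=V) are indistinguishable; the difference between these distributions is that n1.log_prob(n1.sample()) is a vector and n2.log_prob(n2.sample()) is a scalar.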
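The overcounting in the first note can be checked numerically. This is a NumPy sketch rather than TFP code, using the sample values printed earlier and a hypothetical helper normal_log_pdf; it compares the correct joint log density with the sum of the broadcast 7-vector.

```python
import numpy as np

X = np.arange(7)

def normal_log_pdf(x, loc, scale=1.0):
    """Elementwise log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

# Values copied from the sample printed earlier in this tutorial.
m = -1.668757
b = 0.6585061
y = np.array([0.18573815, -1.79962, -1.8106272, -3.5971394,
              -6.6625295, -7.308844, -9.832693])

lp_m = normal_log_pdf(m, 0.0)
lp_b = normal_log_pdf(b, 0.0)
lp_y = normal_log_pdf(y, m * X + b)

joint = lp_m + lp_b + lp_y.sum()            # counts m and b once
overcounted = (lp_m + lp_b + lp_y).sum()    # counts m and b seven times

# The two differ by exactly six extra copies of lp_m + lp_b.
extra = overcounted - joint
print(np.isclose(extra, 6 * (lp_m + lp_b)))  # True
```

Here joint matches the jds_i.log_prob(sample) value shown above up to rounding, while overcounted is far more negative.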

Multiple samples

Drawing multiple samples still doesn't work:

try:
  jds_i.sample([5, 3])
except tf.errors.InvalidArgumentError as e:
  print(e)

Incompatible shapes: [5,3] vs. [7] [Op:Mul]

Let's think about why. When we call jds_i.sample([5, 3]), we first draw samples for m and b, each with shape (5, 3). Next, we try to construct a Normal distribution via:

tfd.Normal(loc=m*X + b, scale=1.)

But if m has shape (5, 3) and X has shape (7,), we can't multiply them together, and indeed this is the error we're hitting:

m = tfd.Normal(0., 1.).sample([5, 3])
try:
  m * X
except tf.errors.InvalidArgumentError as e:
  print(e)

Incompatible shapes: [5,3] vs. [7] [Op:Mul]

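To resolve this issue, let's think about what properties the distribution over Y has to have. If we've called jds_i.sample([5, 3]), then we know m and b will both have shape (5, 3). What shape should a call to sample on the Y distribution produce? The obvious answer is (5, 3, 7): for each batch point, we want a sample with the same size as X. We can achieve this by using TensorFlow's broadcasting capabilities, adding extra dimensions: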

m[..., tf.newaxis].shape

TensorShape([5, 3, 1])

(m[..., tf.newaxis] * X).shape

TensorShape([5, 3, 7])
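The rule behind both the failure and the fix is that broadcasting aligns shapes from the rightmost dimension, and each aligned pair must either match or contain a 1. A small NumPy sketch shows this (NumPy and TensorFlow follow the same rule; np.broadcast_shapes requires NumPy 1.20 or later, and try_broadcast is a hypothetical helper for this illustration).

```python
import numpy as np

def try_broadcast(*shapes):
    """Returns the broadcast shape, or None if the shapes are incompatible."""
    try:
        return np.broadcast_shapes(*shapes)
    except ValueError:
        return None

# Rightmost dims are 3 and 7: neither matches nor is 1, so this fails.
print(try_broadcast((5, 3), (7,)))      # None

# Rightmost dims are 1 and 7: the 1 stretches to 7.
print(try_broadcast((5, 3, 1), (7,)))   # (5, 3, 7)
```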

Adding an axis to both m and b, we can define a new JDS that supports multiple samples:

jds_ia = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),   # m
    tfd.Normal(loc=0., scale=1.),   # b
    lambda b, m: tfd.Independent(   # Y
        tfd.Normal(loc=m[..., tf.newaxis]*X + b[..., tf.newaxis], scale=1.),
        reinterpreted_batch_ndims=1)
])

shaped_sample = jds_ia.sample([5, 3])
shaped_sample

[<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
 array([[-1.1133379 ,  0.16390413, -0.24177533],
        [-1.1312429 , -0.6224666 , -1.8182136 ],
        [-0.31343174, -0.32932565,  0.5164407 ],
        [-0.0119963 , -0.9079621 ,  2.3655841 ],
        [-0.26293617,  0.8229698 ,  0.31098196]], dtype=float32)>,
 <tf.Tensor: shape=(5, 3), dtype=float32, numpy=
 array([[-0.02876974,  1.0872147 ,  1.0138507 ],
        [ 0.27367726, -1.331534  , -0.09084719],
        [ 1.3349475 , -0.68765205,  1.680652  ],
        [ 0.75436825,  1.3050154 , -0.9415123 ],
        [-1.2502679 , -0.25730947,  0.74611956]], dtype=float32)>,
 <tf.Tensor: shape=(5, 3, 7), dtype=float32, numpy=
 array([[[-1.8258233e+00, -3.0641669e-01, -2.7595463e+00, -1.6952467e+00,
          -4.8197951e+00, -5.2986512e+00, -6.6931367e+00],
         [ 3.6438566e-01,  1.0067395e+00,  1.4542470e+00,  8.1155670e-01,
           1.8868095e+00,  2.3877139e+00,  1.0195159e+00],
         [-8.3624744e-01,  1.2518480e+00,  1.0943471e+00,  1.3052304e+00,
          -4.5756745e-01, -1.0668410e-01, -7.0669651e-02]],
 
        [[-3.1788960e-01,  9.2615485e-03, -3.0963073e+00, -2.2846246e+00,
          -3.2269263e+00, -6.0213070e+00, -7.4806519e+00],
         [-3.9149747e+00, -3.5155020e+00, -1.5669601e+00, -5.0759468e+00,
          -4.5065498e+00, -5.6719379e+00, -4.8012795e+00],
         [ 1.3053948e-01, -8.0493152e-01, -4.7845001e+00, -4.9721808e+00,
          -7.1365709e+00, -9.6198196e+00, -9.7951422e+00]],
 
        [[ 2.0621397e+00,  3.4639853e-01,  7.0252883e-01, -1.4311566e+00,
           3.3790007e+00,  1.1619035e+00, -8.9105040e-01],
         [-7.8956139e-01, -8.5023916e-01, -9.7148323e-01, -2.6229355e+00,
          -2.7150445e+00, -2.4633870e+00, -2.1841538e+00],
         [ 7.7627432e-01,  2.2401071e+00,  3.7601702e+00,  2.4245868e+00,
           4.0690269e+00,  4.0605016e+00,  5.1753912e+00]],
 
        [[ 1.4275590e+00,  3.3346462e+00,  1.5374103e+00, -2.2849756e-01,
           9.1219616e-01, -3.1220305e-01, -3.2643962e-01],
         [-3.1910419e-02, -3.8848895e-01,  9.9946201e-02, -2.3619974e+00,
          -1.8507402e+00, -3.6830821e+00, -5.4907336e+00],
         [-7.1941972e-02,  2.1602919e+00,  4.9575748e+00,  4.2317696e+00,
           9.3528280e+00,  1.0526063e+01,  1.5262107e+01]],
 
        [[-2.3257759e+00, -2.5343289e+00, -3.5342445e+00, -4.0423255e+00,
          -3.2361765e+00, -3.3434000e+00, -2.6849220e+00],
         [ 1.5006512e-02, -1.9866472e-01,  7.6781356e-01,  1.6228745e+00,
           1.4191239e+00,  2.6655579e+00,  4.4663467e+00],
         [ 2.6599693e+00,  1.2663836e+00,  1.7162113e+00,  1.4839669e+00,
           2.0559487e+00,  2.5976877e+00,  2.5977583e+00]]], dtype=float32)>]

jds_ia.log_prob(shaped_sample)

<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[-12.483114 , -10.139662 , -11.514159 ],
       [-11.656767 , -17.201958 , -12.132455 ],
       [-17.838818 ,  -9.474525 , -11.24898  ],
       [-13.95219  , -12.490049 , -17.123957 ],
       [-14.487818 , -11.3755455, -10.576363 ]], dtype=float32)>

As an extra check, we'll verify that the log probability for a single batch point matches what we had before:

(jds_ia.log_prob(shaped_sample)[3, 1] -
 jds_i.log_prob([shaped_sample[0][3, 1],
                 shaped_sample[1][3, 1],
                 shaped_sample[2][3, 1, :]]))

<tf.Tensor: shape=(), dtype=float32, numpy=0.0>

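Auto-Batching For the Win

We now have a version of JointDistribution that handles all our desiderata: log_prob returns a scalar thanks to the use of tfd.Independent, and multiple samples work now that we've fixed broadcasting by adding extra axes.

But there is an easier, better way, called JointDistributionSequentialAutoBatched (JDSAB):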

jds_ab = tfd.JointDistributionSequentialAutoBatched([
    tfd.Normal(loc=0., scale=1.),   # m
    tfd.Normal(loc=0., scale=1.),   # b
    lambda b, m: tfd.Normal(loc=m*X + b, scale=1.) # Y
])

jds_ab.log_prob(jds.sample())

<tf.Tensor: shape=(), dtype=float32, numpy=-12.954952>

shaped_sample = jds_ab.sample([5, 3])
jds_ab.log_prob(shaped_sample)

<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[-12.191533 , -10.43885  , -16.371655 ],
       [-13.292994 , -11.97949  , -16.788685 ],
       [-15.987699 , -13.435732 , -10.6029   ],
       [-10.184758 , -11.969714 , -14.275676 ],
       [-12.740775 , -11.5654125, -12.990162 ]], dtype=float32)>

jds_ab.log_prob(shaped_sample) - jds_ia.log_prob(shaped_sample)

<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]], dtype=float32)>

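How does this work? While you could read the code for a deep understanding, we'll give a brief overview that is sufficient for most use cases:

Recall that our first problem was that our distribution for Y had batch_shape=[7] and event_shape=[], and we used Independent to convert the batch dimension to an event dimension. JDSAB ignores the batch shapes of component distributions; instead it treats batch shape as an overall property of the model, which is assumed to be [] (unless specified otherwise by setting batch_ndims > 0). The effect is equivalent to using tfd.Independent to convert all batch dimensions of component distributions into event dimensions, as we did manually above.
Our second problem was the need to massage the shapes of m and b so that they could broadcast appropriately with X when creating multiple samples. With JDSAB, you write a model to generate a single sample, and the entire model is "lifted" to generate multiple samples using TensorFlow's vectorized_map. (This feature is analogous to JAX's vmap.)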
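The "lifting" idea can be sketched in NumPy: write the log density for a single sample, with no batch axes in sight, and let a wrapper map it over the leading dimensions. np.vectorize with a signature is used here only as a stand-in; it runs a Python loop, whereas vectorized_map actually vectorizes the computation. The names single_log_prob and batched_log_prob are hypothetical.

```python
import numpy as np

X = np.arange(7)
rng = np.random.default_rng(0)

def normal_log_pdf(x, loc, scale=1.0):
    """Elementwise log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

def single_log_prob(m, b, y):
    """Log density for ONE sample: scalar m, scalar b, length-7 vector y."""
    return (normal_log_pdf(m, 0.0) + normal_log_pdf(b, 0.0)
            + normal_log_pdf(y, m * X + b).sum())

# Lift the single-sample function over any leading batch dimensions.
# The signature says: two scalars and one vector in, one scalar out.
batched_log_prob = np.vectorize(single_log_prob, signature="(),(),(n)->()")

m = rng.standard_normal((5, 3))
b = rng.standard_normal((5, 3))
y = rng.standard_normal((5, 3, 7))

lifted = batched_log_prob(m, b, y)
print(lifted.shape)  # (5, 3)
```

The single-sample function never mentions np.newaxis; the wrapper takes care of the batch axes, which is the division of labour JDSAB offers.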

Exploring the batch shape issue in more detail, we can compare the batch shapes of our original "bad" joint distribution jds, our batch-fixed distributions jds_i and jds_ia, and our autobatched jds_ab:

jds.batch_shape

[TensorShape([]), TensorShape([]), TensorShape([7])]

jds_i.batch_shape

[TensorShape([]), TensorShape([]), TensorShape([])]

jds_ia.batch_shape

[TensorShape([]), TensorShape([]), TensorShape([])]

jds_ab.batch_shape

TensorShape([])

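We see that the original jds has sub-distributions with different batch shapes. jds_i and jds_ia fix this by creating sub-distributions with the same (empty) batch shape. jds_ab has only a single (empty) batch shape.

It's worth noting that JointDistributionSequentialAutoBatched offers some additional generality for free. Suppose we make the covariates X (and, implicitly, the observations Y) two-dimensional: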

X = np.arange(14).reshape((2, 7))
X

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13]])

Our JointDistributionSequentialAutoBatched works with no changes (we need to redefine the model because the shape of X is cached by jds_ab.log_prob):

jds_ab = tfd.JointDistributionSequentialAutoBatched([
    tfd.Normal(loc=0., scale=1.),   # m
    tfd.Normal(loc=0., scale=1.),   # b
    lambda b, m: tfd.Normal(loc=m*X + b, scale=1.) # Y
])

shaped_sample = jds_ab.sample([5, 3])
shaped_sample

[<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
 array([[ 0.1813647 , -0.85994506,  0.27593774],
        [-0.73323774,  1.1153806 ,  0.8841938 ],
        [ 0.5127983 , -0.29271227,  0.63733214],
        [ 0.2362284 , -0.919168  ,  1.6648189 ],
        [ 0.26317367,  0.73077047,  2.5395133 ]], dtype=float32)>,
 <tf.Tensor: shape=(5, 3), dtype=float32, numpy=
 array([[ 0.09636458,  2.0138032 , -0.5054413 ],
        [ 0.63941646, -1.0785882 , -0.6442188 ],
        [ 1.2310615 , -0.3293852 ,  0.77637213],
        [ 1.2115169 , -0.98906034, -0.07816773],
        [-1.1318136 ,  0.510014  ,  1.036522  ]], dtype=float32)>,
 <tf.Tensor: shape=(5, 3, 2, 7), dtype=float32, numpy=
 array([[[[-1.9685398e+00, -1.6832136e+00, -6.9127172e-01,
            8.5992378e-01, -5.3123581e-01,  3.1584005e+00,
            2.9044402e+00],
          [-2.5645006e-01,  3.1554163e-01,  3.1186538e+00,
            1.4272424e+00,  1.2843871e+00,  1.2266440e+00,
            1.2798605e+00]],
 
         [[ 1.5973477e+00, -5.3631151e-01,  6.8143606e-03,
           -1.4910895e+00, -2.1568544e+00, -2.0513713e+00,
           -3.1663666e+00],
          [-4.9448099e+00, -2.8385928e+00, -6.9027486e+00,
           -5.6543546e+00, -7.2378774e+00, -8.1577444e+00,
           -9.3582869e+00]],
 
         [[-2.1233239e+00,  5.8853775e-02,  1.2024102e+00,
            1.6622503e+00, -1.9197327e-01,  1.8647723e+00,
            6.4322817e-01],
          [ 3.7549341e-01,  1.5853541e+00,  2.4594500e+00,
            2.1952972e+00,  1.7517658e+00,  2.9666045e+00,
            2.5468128e+00]]],
 
 
        [[[ 8.9906776e-01,  6.7375046e-01,  7.3354661e-01,
           -9.9894643e-01, -3.4606690e+00, -3.4810467e+00,
           -4.4315586e+00],
          [-3.0670738e+00, -6.3628020e+00, -6.2538433e+00,
           -6.8091092e+00, -7.7134805e+00, -8.6319380e+00,
           -8.6904278e+00]],
 
         [[-2.2462025e+00, -3.3060855e-01,  1.8974400e-01,
            3.1422038e+00,  4.1483402e+00,  3.5642972e+00,
            4.8709240e+00],
          [ 4.7880130e+00,  5.8790064e+00,  9.6695948e+00,
            7.8112822e+00,  1.2022618e+01,  1.2411858e+01,
            1.4323385e+01]],
 
         [[-1.0189297e+00, -7.8115642e-01,  1.6466728e+00,
            8.2378983e-01,  3.0765080e+00,  3.0170646e+00,
            5.1899948e+00],
          [ 6.5285158e+00,  7.8038850e+00,  6.4155884e+00,
            9.0899811e+00,  1.0040427e+01,  9.1404457e+00,
            1.0411951e+01]]],
 
 
        [[[ 4.5557004e-01,  1.4905317e+00,  1.4904103e+00,
            2.9777462e+00,  2.8620450e+00,  3.4745665e+00,
            3.8295493e+00],
          [ 3.9977460e+00,  5.7173767e+00,  7.8421035e+00,
            6.3180594e+00,  6.0838981e+00,  8.2257290e+00,
            9.6548376e+00]],
 
         [[-7.0750320e-01, -3.5972297e-01,  4.3136525e-01,
           -2.3301599e+00, -5.0374687e-01, -2.8338656e+00,
           -3.4453444e+00],
          [-3.1258626e+00, -3.4687450e+00, -1.2045374e+00,
           -4.0196013e+00, -5.8831010e+00, -4.2965469e+00,
           -4.1388311e+00]],
 
         [[ 2.1969774e+00,  2.4614549e+00,  2.2314475e+00,
            1.8392437e+00,  2.8367062e+00,  4.8600502e+00,
            4.2273531e+00],
          [ 6.1879644e+00,  5.1792760e+00,  6.1141996e+00,
            5.6517797e+00,  8.9979610e+00,  7.5938139e+00,
            9.7918644e+00]]],
 
 
        [[[ 1.5249090e+00,  1.1388919e+00,  8.6903995e-01,
            3.0762129e+00,  1.5128503e+00,  3.5204377e+00,
            2.4760864e+00],
          [ 3.4166217e+00,  3.5930209e+00,  3.1694956e+00,
            4.5797420e+00,  4.5271711e+00,  2.8774328e+00,
            4.7288942e+00]],
 
         [[-2.3095846e+00, -2.0595703e+00, -3.0093951e+00,
           -3.8594103e+00, -4.9681158e+00, -6.4256043e+00,
           -5.5345035e+00],
          [-6.4306297e+00, -7.0924540e+00, -8.4075985e+00,
           -1.0417805e+01, -1.1727266e+01, -1.1196255e+01,
           -1.1333830e+01]],
 
         [[-7.0419472e-01,  1.4568675e+00,  3.7946482e+00,
            4.8489718e+00,  6.6498446e+00,  9.0224218e+00,
            1.1153137e+01],
          [ 1.0060651e+01,  1.1998097e+01,  1.5326431e+01,
            1.7957514e+01,  1.8323889e+01,  2.0160881e+01,
            2.1269085e+01]]],
 
 
        [[[-2.2360647e-01, -1.3632748e+00, -7.2704530e-01,
            2.3558271e-01, -1.0381399e+00,  1.9387857e+00,
           -3.3694571e-01],
          [ 1.6015106e-01,  1.5284677e+00, -4.8567140e-01,
           -1.7770648e-01,  2.1919653e+00,  1.3015286e+00,
            1.3877077e+00]],
 
         [[ 1.3688663e+00,  2.6602898e+00,  6.6657305e-01,
            4.6554832e+00,  5.7781887e+00,  4.9115267e+00,
            4.8446012e+00],
          [ 5.1983776e+00,  6.2297459e+00,  6.3848300e+00,
            8.4291229e+00,  7.1309576e+00,  1.0395646e+01,
            8.5736713e+00]],
 
         [[ 1.2675294e+00,  5.2844582e+00,  5.1331611e+00,
            8.9993315e+00,  1.0794343e+01,  1.4039831e+01,
            1.5731170e+01],
          [ 1.9084715e+01,  2.2191265e+01,  2.3481146e+01,
            2.5803375e+01,  2.8632090e+01,  3.0234968e+01,
            3.1886738e+01]]]], dtype=float32)>]

jds_ab.log_prob(shaped_sample)

<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[-28.90071 , -23.052422, -19.851362],
       [-19.775568, -25.894997, -20.302256],
       [-21.10754 , -23.667885, -20.973007],
       [-19.249458, -20.87892 , -20.573763],
       [-22.351208, -25.457762, -24.648403]], dtype=float32)>

On the other hand, our carefully crafted JointDistributionSequential no longer works:

jds_ia = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),   # m
    tfd.Normal(loc=0., scale=1.),   # b
    lambda b, m: tfd.Independent(   # Y
        tfd.Normal(loc=m[..., tf.newaxis]*X + b[..., tf.newaxis], scale=1.),
        reinterpreted_batch_ndims=1)
])

try:
  jds_ia.sample([5, 3])
except tf.errors.InvalidArgumentError as e:
  print(e)

Incompatible shapes: [5,3,1] vs. [2,7] [Op:Mul]

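To fix this, we'd have to add a second tf.newaxis to both m and b to match the shape, and increase reinterpreted_batch_ndims to 2 in the call to Independent. In this case, letting the auto-batching machinery handle the shape issues is shorter, easier, and more ergonomic.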
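The shape arithmetic of that manual fix can be sketched in NumPy (an illustration of the shapes only, not TFP code; normal_log_pdf is a hypothetical helper). Two trailing axes on m and b let them broadcast against the two-dimensional X, and summing over the last two axes plays the role of reinterpreted_batch_ndims=2.

```python
import numpy as np

X = np.arange(14).reshape((2, 7))
rng = np.random.default_rng(0)

def normal_log_pdf(x, loc, scale=1.0):
    """Elementwise log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

m = rng.standard_normal((5, 3))
b = rng.standard_normal((5, 3))

# (5, 3, 1, 1) broadcasts against X's (2, 7) to give (5, 3, 2, 7).
loc = m[..., np.newaxis, np.newaxis] * X + b[..., np.newaxis, np.newaxis]
y = loc + rng.standard_normal(loc.shape)

# Sum over BOTH event axes so each batch point gets one log density.
log_prob_y = normal_log_pdf(y, loc).sum(axis=(-2, -1))

print(loc.shape)         # (5, 3, 2, 7)
print(log_prob_y.shape)  # (5, 3)
```

Every change in the rank of X forces matching changes in two places in the model, which is the bookkeeping autobatching removes.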

Once again, we note that while this notebook explored JointDistributionSequentialAutoBatched, the other variants of JointDistribution have equivalent AutoBatched versions. (For users of JointDistributionCoroutine, JointDistributionCoroutineAutoBatched has the additional benefit that you no longer need to specify Root nodes; if you've never used JointDistributionCoroutine, you can safely ignore this statement.)

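Concluding Thoughts

In this notebook, we introduced JointDistributionSequentialAutoBatched and worked through a simple example in detail. Hopefully you learned something about TFP shapes and about autobatching!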