This notebook demonstrates the BigBiGAN models available on TF Hub.
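BigBiGAN extends the standard (Big)GAN by adding an encoder module that can be used for unsupervised representation learning. Roughly speaking, the encoder inverts the generator by predicting latents z given real data x. See the BigBiGAN paper on arXiv [1] for more information about these models.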
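To make "inverting the generator" concrete, here is a toy sketch. It is not BigBiGAN itself. It assumes a hypothetical linear generator G(z) = A z and a hypothetical encoder E(x) = A⁺ x, where A⁺ is the Moore-Penrose pseudo-inverse. For these, E(G(z)) recovers z exactly. BigBiGAN instead learns E adversarially, so its inversion is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy generator: a fixed linear map from 4-D latents z to 16-D
# "data" x (BigBiGAN's real generator is a deep network).
A = rng.standard_normal((16, 4))
# Hypothetical toy encoder: the pseudo-inverse of A maps x back to z.
A_pinv = np.linalg.pinv(A)


def G(z):
  """Toy generator: x = A z for each row z of an [N, 4] batch."""
  return z @ A.T


def E(x):
  """Toy encoder: z_hat = pinv(A) x for each row x of an [N, 16] batch."""
  return x @ A_pinv.T


z = rng.standard_normal((8, 4))  # batch of 8 latents
z_hat = E(G(z))                  # encode the generated data
print(np.allclose(z, z_hat))     # True: E inverts G exactly in this toy
```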
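- (Optional) Update the selected module_path in the first code cell below to load a BigBiGAN generator for a different encoder architecture.
- Click Runtime > Run all to run each cell in order. The outputs, including visualizations of BigBiGAN samples and reconstructions, should then appear automatically below.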
Note: If you run into any issues, click Runtime > Restart and run all... to restart the runtime and rerun all cells from scratch.
First, set the module path. By default, we load the BigBiGAN model with the smaller ResNet-50-based encoder from https://tfhub.dev/deepmind/bigbigan-resnet50/1.
To load the larger RevNet-50-x4-based model, which gives the best representation learning results, comment out the active module_path setting and uncomment the other.
module_path = 'https://tfhub.dev/deepmind/bigbigan-resnet50/1' # ResNet-50
# module_path = 'https://tfhub.dev/deepmind/bigbigan-revnet50x4/1' # RevNet-50 x4
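Commenting and uncommenting lines works. An alternative is to select the module path by name. This is a minimal sketch with hypothetical names (MODULE_PATHS, select_module_path) that are not part of the notebook. Only the two URLs come from the cell above.

```python
# Hypothetical helper: pick one of the two TF Hub module paths by name.
MODULE_PATHS = {
  'resnet50': 'https://tfhub.dev/deepmind/bigbigan-resnet50/1',      # smaller encoder (default)
  'revnet50x4': 'https://tfhub.dev/deepmind/bigbigan-revnet50x4/1',  # larger encoder
}


def select_module_path(encoder='resnet50'):
  """Return the TF Hub module path for the named encoder architecture."""
  try:
    return MODULE_PATHS[encoder]
  except KeyError:
    raise ValueError('Unknown encoder {!r}; expected one of {}'.format(
        encoder, sorted(MODULE_PATHS))) from None


module_path = select_module_path('resnet50')
print(module_path)
```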
import io
import IPython.display
import PIL.Image
from pprint import pformat
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # hub.Module, placeholders and sessions need TF1 graph mode
import tensorflow_hub as hub
def imgrid(imarray, cols=4, pad=1, padval=255, row_major=True):
  """Lays out a [N, H, W, C] image array as a single image grid."""
  pad = int(pad)
  if pad < 0:
    raise ValueError('pad must be non-negative')
  cols = int(cols)
  assert cols >= 1
  N, H, W, C = imarray.shape
  rows = N // cols + int(N % cols != 0)
  batch_pad = rows * cols - N
  assert batch_pad >= 0
  post_pad = [batch_pad, pad, pad, 0]
  pad_arg = [[0, p] for p in post_pad]
  imarray = np.pad(imarray, pad_arg, 'constant', constant_values=padval)
  H += pad
  W += pad
  grid = (imarray
          .reshape(rows, cols, H, W, C)
          .transpose(0, 2, 1, 3, 4)
          .reshape(rows*H, cols*W, C))
  if pad:
    grid = grid[:-pad, :-pad]
  return grid
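As a worked check of the layout arithmetic in imgrid, the following hypothetical helper (not part of the notebook) computes the output shape without building the grid. It uses the same formulas: ceil(N / cols) rows, each cell padded by `pad` pixels, and the trailing padding trimmed.

```python
def imgrid_shape(n, h, w, c, cols=4, pad=1):
  """Shape of the grid imgrid() builds from an [n, h, w, c] array."""
  rows = n // cols + int(n % cols != 0)  # ceil(n / cols)
  grid_h = rows * (h + pad) - pad        # last row's bottom padding is trimmed
  grid_w = cols * (w + pad) - pad        # last column's right padding is trimmed
  return grid_h, grid_w, c


# 32 generated 128x128 samples in 4 columns -> 8 rows.
print(imgrid_shape(32, 128, 128, 3, cols=4))  # (1031, 515, 3)
# 16 input/reconstruction pairs (32 images of 256x256) in 2 columns.
print(imgrid_shape(32, 256, 256, 3, cols=2))  # (4111, 513, 3)
```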
def interleave(*args):
  """Interleaves input arrays of the same shape along the batch axis."""
  if not args:
    raise ValueError('At least one argument is required.')
  a0 = args[0]
  if any(a.shape != a0.shape for a in args):
    raise ValueError('All inputs must have the same shape.')
  if not a0.shape:
    raise ValueError('Inputs must have at least one axis.')
  out = np.transpose(args, [1, 0] + list(range(2, len(a0.shape) + 1)))
  out = out.reshape(-1, *a0.shape[1:])
  return out
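interleave places the i-th element of every input next to each other along the batch axis. With imgrid(..., cols=2), each row therefore shows an input beside its reconstruction. The sketch below is an equivalent formulation that uses np.stack instead of the transpose above. Its small inputs make the ordering easy to check.

```python
import numpy as np


def interleave_via_stack(*args):
  """Same ordering as interleave(): [a0[0], a1[0], ..., a0[1], a1[1], ...]."""
  # Stack on a new axis 1 -> [N, K, ...], then merge the first two axes.
  stacked = np.stack(args, axis=1)
  return stacked.reshape(-1, *stacked.shape[2:])


inputs = np.array([[0], [1], [2]])     # e.g. original images, shape [3, 1]
recons = np.array([[10], [11], [12]])  # e.g. reconstructions, shape [3, 1]
print(interleave_via_stack(inputs, recons).ravel())  # [ 0 10  1 11  2 12]
```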
def imshow(a, format='png', jpeg_fallback=True):
  """Displays an image in the given format."""
  a = a.astype(np.uint8)
  data = io.BytesIO()
  PIL.Image.fromarray(a).save(data, format)
  im_data = data.getvalue()
  try:
    disp = IPython.display.display(IPython.display.Image(im_data))
  except IOError:
    if jpeg_fallback and format != 'jpeg':
      print('Warning: image was too large to display in format "{}"; '
            'trying jpeg instead.'.format(format))
      return imshow(a, format='jpeg')
    else:
      raise
  return disp
def image_to_uint8(x):
  """Converts [-1, 1] float array to [0, 255] uint8."""
  x = np.asarray(x)
  x = (256. / 2.) * (x + 1.)
  x = np.clip(x, 0, 255)
  x = x.astype(np.uint8)
  return x
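image_to_uint8 scales by 128 and clips. get_flowers_data (later in the notebook) maps pixels the other way by dividing by 127.5 and subtracting 1. This self-contained sketch repeats image_to_uint8 and adds a hypothetical uint8_to_image that mirrors that later step. It shows a few worked values. It also checks that uint8 pixels survive the round trip, because truncation hides the small mismatch between the two scale factors.

```python
import numpy as np


def image_to_uint8(x):
  """Converts [-1, 1] float array to [0, 255] uint8 (copy of the function above)."""
  x = np.asarray(x)
  x = (256. / 2.) * (x + 1.)
  x = np.clip(x, 0, 255)
  return x.astype(np.uint8)


def uint8_to_image(u):
  """Hypothetical inverse, as in get_flowers_data: [0, 255] -> [-1, 1] float32."""
  return np.asarray(u).astype(np.float32) / (255. / 2.) - 1.


print(image_to_uint8([-1., 0., 1.]))  # [  0 128 255]  (1.0 maps to 256, then clips)

pixels = np.arange(256, dtype=np.uint8)
print(np.array_equal(image_to_uint8(uint8_to_image(pixels)), pixels))  # True
```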
Load the BigBiGAN TF Hub module and display its available functionality
# module = hub.Module(module_path, trainable=True, tags={'train'}) # training
module = hub.Module(module_path) # inference
for signature in module.get_signature_names():
  print('Signature:', signature)
  print('Inputs:', pformat(module.get_input_info_dict(signature)))
  print('Outputs:', pformat(module.get_output_info_dict(signature)))
Signature: discriminate
Inputs: {'x': <hub.ParsedTensorInfo shape=(?, 128, 128, 3) dtype=float32 is_sparse=False>,
 'z': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>}
Outputs: {'score_x': <hub.ParsedTensorInfo shape=(?,) dtype=float32 is_sparse=False>,
 'score_xz': <hub.ParsedTensorInfo shape=(?,) dtype=float32 is_sparse=False>,
 'score_z': <hub.ParsedTensorInfo shape=(?,) dtype=float32 is_sparse=False>}
Signature: default
Inputs: {'x': <hub.ParsedTensorInfo shape=(?, 256, 256, 3) dtype=float32 is_sparse=False>}
Outputs: {'default': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>}
Signature: generate
Inputs: {'z': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>}
Outputs: {'default': <hub.ParsedTensorInfo shape=(?, 128, 128, 3) dtype=float32 is_sparse=False>,
 'upsampled': <hub.ParsedTensorInfo shape=(?, 256, 256, 3) dtype=float32 is_sparse=False>}
Signature: encode
Inputs: {'x': <hub.ParsedTensorInfo shape=(?, 256, 256, 3) dtype=float32 is_sparse=False>}
Outputs: {'avepool_feat': <hub.ParsedTensorInfo shape=(?, 2048) dtype=float32 is_sparse=False>,
 'bn_crelu_feat': <hub.ParsedTensorInfo shape=(?, 4096) dtype=float32 is_sparse=False>,
 'default': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>,
 'z_mean': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>,
 'z_sample': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>,
 'z_stdev': <hub.ParsedTensorInfo shape=(?, 120) dtype=float32 is_sparse=False>}
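The signatures above show that encode takes 256x256 images and discriminate takes 128x128 images. This is why enc_pairs_for_disc in the class below average-pools x by a factor of 2 before scoring. For even-sized inputs, a 2x2, stride-2 average pool simply averages each non-overlapping 2x2 block. The following NumPy sketch illustrates that computation; it is not the TF op itself.

```python
import numpy as np


def avg_pool_2x2(x):
  """2x2 average pooling with stride 2 on an [N, H, W, C] array (H, W even)."""
  n, h, w, c = x.shape
  assert h % 2 == 0 and w % 2 == 0, 'expects even spatial dimensions'
  # Split H and W into (H/2, 2) and (W/2, 2) blocks, then average each block.
  return x.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))


x = np.random.uniform(-1., 1., size=(2, 256, 256, 3)).astype(np.float32)
x_down = avg_pool_2x2(x)
print(x_down.shape)  # (2, 128, 128, 3)
```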
class BigBiGAN(object):

  def __init__(self, module):
    """Initialize a BigBiGAN from the given TF Hub module."""
    self._module = module

  def generate(self, z, upsample=False):
    """Run a batch of latents z through the generator to generate images.

    Args:
      z: A batch of 120D Gaussian latents, shape [N, 120].
      upsample: If True, return the 'upsampled' 256x256 output instead of the
        default 128x128 output.

    Returns: a batch of generated RGB images, shape [N, 128, 128, 3] (or
      [N, 256, 256, 3] if `upsample`), range [-1, 1].
    """
    outputs = self._module(z, signature='generate', as_dict=True)
    return outputs['upsampled' if upsample else 'default']

  def make_generator_ph(self):
    """Creates a tf.placeholder with the dtype & shape of generator inputs."""
    info = self._module.get_input_info_dict('generate')['z']
    return tf.placeholder(dtype=info.dtype, shape=info.get_shape())

  def gen_pairs_for_disc(self, z):
    """Compute generator input pairs (G(z), z) for discriminator, given z.

    Args:
      z: A batch of latents (120D standard Gaussians), shape [N, 120].

    Returns: a tuple (G(z), z) of discriminator inputs.
    """
    # The default generator output is already 128x128, matching the
    # discriminator input, so no downsampling is needed here.
    x = self.generate(z)
    return x, z

  def encode(self, x, return_all_features=False):
    """Run a batch of images x through the encoder.

    Args:
      x: A batch of data (256x256 RGB images), shape [N, 256, 256, 3], range
        [-1, 1].
      return_all_features: If True, return all features computed by the encoder.
        Otherwise (default) just return a sample z_hat.

    Returns: the sample z_hat of shape [N, 120] (or a dict of all features if
      `return_all_features`).
    """
    outputs = self._module(x, signature='encode', as_dict=True)
    return outputs if return_all_features else outputs['z_sample']

  def make_encoder_ph(self):
    """Creates a tf.placeholder with the dtype & shape of encoder inputs."""
    info = self._module.get_input_info_dict('encode')['x']
    return tf.placeholder(dtype=info.dtype, shape=info.get_shape())

  def enc_pairs_for_disc(self, x):
    """Compute encoder input pairs (x, E(x)) for discriminator, given x.

    Args:
      x: A batch of data (256x256 RGB images), shape [N, 256, 256, 3], range
        [-1, 1].

    Returns: a tuple (downsample(x), E(x)) of discriminator inputs.
    """
    # Downsample 256x256 image x for 128x128 discriminator input.
    x_down = tf.nn.avg_pool(x, ksize=2, strides=2, padding='SAME')
    z = self.encode(x)
    return x_down, z

  def discriminate(self, x, z):
    """Compute the discriminator scores for pairs of data (x, z).

    (x, z) must be batches with the same leading batch dimension, and joint
    scores are computed on corresponding pairs x[i] and z[i].

    Args:
      x: A batch of data (128x128 RGB images), shape [N, 128, 128, 3], range
        [-1, 1].
      z: A batch of latents (120D standard Gaussians), shape [N, 120].

    Returns:
      A dict of scores:
        score_xz: the joint scores for the (x, z) pairs.
        score_x: the unary scores for x only.
        score_z: the unary scores for z only.
    """
    inputs = dict(x=x, z=z)
    return self._module(inputs, signature='discriminate', as_dict=True)

  def reconstruct_x(self, x, use_sample=True, upsample=False):
    """Compute BigBiGAN reconstructions of images x via G(E(x)).

    Args:
      x: A batch of data (256x256 RGB images), shape [N, 256, 256, 3], range
        [-1, 1].
      use_sample: if set, take a sample z_hat ~ E(x). Otherwise,
        deterministically use the mean. (Though a sample z_hat may be far from
        the mean z, typically the resulting recons G(z_hat) and G(z) are very
        similar.)
      upsample: if set, upsample the reconstruction to the input resolution
        (256x256). Otherwise return the raw lower resolution generator output
        (128x128).

    Returns: a batch of recons G(E(x)), shape [N, 256, 256, 3] if
      `upsample`, otherwise [N, 128, 128, 3].
    """
    if use_sample:
      z = self.encode(x)
    else:
      z = self.encode(x, return_all_features=True)['z_mean']
    recons = self.generate(z, upsample=upsample)
    return recons

  def losses(self, x, z):
    """Compute per-module BigBiGAN losses given data & latent sample batches.

    Args:
      x: A batch of data (256x256 RGB images), shape [N, 256, 256, 3], range
        [-1, 1].
      z: A batch of latents (120D standard Gaussians), shape [M, 120].

    For the original BigBiGAN losses, pass batches of size N=M=2048, with z's
    sampled from a 120D standard Gaussian (e.g., np.random.randn(2048, 120)),
    and x's sampled from the ImageNet (ILSVRC2012) training set with the
    "ResNet-style" preprocessing from:

    Returns:
      A dict of per-module losses:
        disc: loss for the discriminator.
        enc: loss for the encoder.
        gen: loss for the generator.
    """
    # Compute discriminator scores on (x, E(x)) pairs.
    # Downsample 256x256 image x for 128x128 discriminator input.
    scores_enc_x_dict = self.discriminate(*self.enc_pairs_for_disc(x))
    scores_enc_x = tf.concat([scores_enc_x_dict['score_xz'],
                              scores_enc_x_dict['score_x'],
                              scores_enc_x_dict['score_z']], axis=0)

    # Compute discriminator scores on (G(z), z) pairs.
    scores_gen_z_dict = self.discriminate(*self.gen_pairs_for_disc(z))
    scores_gen_z = tf.concat([scores_gen_z_dict['score_xz'],
                              scores_gen_z_dict['score_x'],
                              scores_gen_z_dict['score_z']], axis=0)

    # Hinge losses: the discriminator pushes encoder-pair scores above +1
    # and generator-pair scores below -1.
    disc_loss_enc_x = tf.reduce_mean(tf.nn.relu(1. - scores_enc_x))
    disc_loss_gen_z = tf.reduce_mean(tf.nn.relu(1. + scores_gen_z))
    disc_loss = disc_loss_enc_x + disc_loss_gen_z

    enc_loss = tf.reduce_mean(scores_enc_x)
    gen_loss = tf.reduce_mean(-scores_gen_z)

    return dict(disc=disc_loss, enc=enc_loss, gen=gen_loss)
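The loss arithmetic in losses() is a hinge form. The discriminator loss is zero for encoder-pair scores of at least +1 and for generator-pair scores of at most -1. The encoder and generator losses are plain means of the concatenated scores. This NumPy sketch reproduces that arithmetic on made-up score arrays so it can be checked by hand. The arrays and the function name are hypothetical, not model outputs or notebook code.

```python
import numpy as np


def bigbigan_hinge_losses(scores_enc_x, scores_gen_z):
  """NumPy mirror of the arithmetic at the end of BigBiGAN.losses().

  scores_enc_x: concatenated discriminator scores on (x, E(x)) pairs.
  scores_gen_z: concatenated discriminator scores on (G(z), z) pairs.
  """
  disc = (np.maximum(1. - scores_enc_x, 0.).mean() +
          np.maximum(1. + scores_gen_z, 0.).mean())
  enc = scores_enc_x.mean()
  gen = (-scores_gen_z).mean()
  return dict(disc=disc, enc=enc, gen=gen)


# Hypothetical scores: the encoder score 2.0 and the generator score -2.0 are
# already past their margins, so they contribute 0 to the disc loss.
result = bigbigan_hinge_losses(np.array([2., 0.5, -1.]), np.array([-2., 0., 1.]))
print(result)  # disc = 2.5/3 + 3/3 (about 1.833), enc = 0.5, gen = 1/3
```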
bigbigan = BigBiGAN(module)
# Make input placeholders for x (`enc_ph`) and z (`gen_ph`).
enc_ph = bigbigan.make_encoder_ph()
gen_ph = bigbigan.make_generator_ph()
# Compute samples G(z) from generator input z (`gen_ph`).
gen_samples = bigbigan.generate(gen_ph)
# Compute reconstructions G(E(x)) of encoder input x (`enc_ph`).
recon_x = bigbigan.reconstruct_x(enc_ph, upsample=True)
# Compute encoder features used for representation learning evaluations given
# encoder input x (`enc_ph`).
enc_features = bigbigan.encode(enc_ph, return_all_features=True)
# Compute discriminator scores for encoder pairs (x, E(x)) given x (`enc_ph`)
# and generator pairs (G(z), z) given z (`gen_ph`).
disc_scores_enc = bigbigan.discriminate(*bigbigan.enc_pairs_for_disc(enc_ph))
disc_scores_gen = bigbigan.discriminate(*bigbigan.gen_pairs_for_disc(gen_ph))
# Compute losses.
losses = bigbigan.losses(enc_ph, gen_ph)
Create a TensorFlow session and initialize variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
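First, we visualize samples from the pretrained BigBiGAN generator. We sample generator inputs z from a standard Gaussian (via np.random.randn) and display the images it produces. So far we are not going beyond what a standard GAN can do: we are only using the generator and ignoring the encoder for now.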
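np.random.randn(32, 120) draws 32 latents, each a 120-D standard Gaussian vector. This matches the (?, 120) z input of the generate signature. To get the same samples on every run, one option is a seeded NumPy Generator. This is an assumption of this sketch; the notebook itself does not seed its sampling.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded for reproducible samples
z = rng.standard_normal((32, 120))   # same distribution as np.random.randn(32, 120)
print(z.shape)                       # (32, 120)

# Re-creating the generator with the same seed reproduces the batch exactly.
z_again = np.random.default_rng(seed=0).standard_normal((32, 120))
print(np.array_equal(z, z_again))    # True
```

The resulting array can then be fed in the same way as in the next cell, e.g. feed_dict = {gen_ph: z}.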
feed_dict = {gen_ph: np.random.randn(32, 120)}
_out_samples = sess.run(gen_samples, feed_dict=feed_dict)
print('samples shape:', _out_samples.shape)
imshow(imgrid(image_to_uint8(_out_samples), cols=4))
samples shape: (32, 128, 128, 3)
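Load test_images from the TF-Flowers dataset
BigBiGAN is trained on ImageNet, but ImageNet is too large to work with in this demo. Instead, we use the smaller TF-Flowers [1] dataset as inputs for visualizing reconstructions and computing encoder features.
In the cell below, we load TF-Flowers (downloading the dataset if needed) and store a fixed batch of 256x256 RGB image samples in a NumPy array test_images.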
[1] https://tensorflow.google.cn/datasets/catalog/tf_flowers
def get_flowers_data():
  """Returns a [32, 256, 256, 3] np.array of preprocessed TF-Flowers samples."""
  import tensorflow_datasets as tfds
  ds, info = tfds.load('tf_flowers', split='train', with_info=True)

  # Just get the images themselves as we don't need labels for this demo.
  ds = ds.map(lambda x: x['image'])

  # Filter out small images (with minor edge length <256).
  ds = ds.filter(lambda x: tf.reduce_min(tf.shape(x)[:2]) >= 256)

  # Take the center square crop of the image and resize to 256x256.
  def crop_and_resize(image):
    imsize = tf.shape(image)[:2]
    minor_edge = tf.reduce_min(imsize)
    start = (imsize - minor_edge) // 2
    stop = start + minor_edge
    cropped_image = image[start[0] : stop[0], start[1] : stop[1]]
    resized_image = tf.image.resize_bicubic([cropped_image], [256, 256])[0]
    return resized_image
  ds = ds.map(crop_and_resize)

  # Rescale pixel values from [0, 255] to [-1, 1] float32
  # (resize_bicubic already returns float32).
  ds = ds.map(lambda image: tf.cast(image, tf.float32) / (255. / 2.) - 1)

  # Take the first 32 samples.
  ds = ds.take(32)

  return np.array(list(tfds.as_numpy(ds)))
test_images = get_flowers_data()
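crop_and_resize keeps the largest centered square before resizing it to 256x256. The filter in get_flowers_data drops images whose shorter edge is under 256 pixels, so this resize never upsamples. The index arithmetic is easiest to see with plain integers. This sketch mirrors it in NumPy for a hypothetical 320x240 (height x width) image. The helper name is illustrative only.

```python
import numpy as np


def center_square_bounds(height, width):
  """Start/stop indices of the centered square crop used in crop_and_resize."""
  imsize = np.array([height, width])
  minor_edge = imsize.min()
  start = (imsize - minor_edge) // 2
  stop = start + minor_edge
  return start.tolist(), stop.tolist()


print(center_square_bounds(320, 240))  # ([40, 0], [280, 240]): rows 40..279, all columns

image = np.zeros((320, 240, 3), dtype=np.uint8)
(r0, c0), (r1, c1) = center_square_bounds(*image.shape[:2])
print(image[r0:r1, c0:c1].shape)       # (240, 240, 3)
```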
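Now we visualize BigBiGAN reconstructions. We pass real images x through the encoder and back through the generator, computing G(E(x)). Below, input images x are shown in the left column and the corresponding reconstructions in the right column.
Note that the reconstructions are not pixel-perfect matches for the input images. Instead, they tend to capture the high-level semantic content of the input while "forgetting" most of the low-level detail. This suggests the BigBiGAN encoder may learn to capture the kinds of high-level semantic information about images that we would want a representation learning method to capture.
Also note that the raw reconstructions of the 256x256 input images are at the lower resolution produced by the generator (128x128). We upsample them for visualization purposes.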
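In the notebook, the 256x256 versions come from the generate signature's 'upsampled' output (upsample=True). How the module upsamples internally is not stated here. Purely to illustrate what a 2x upsample does to the shape, here is a hypothetical nearest-neighbor version in NumPy that repeats each pixel into a 2x2 block. It is not necessarily the method the module uses.

```python
import numpy as np


def upsample_nearest_2x(x):
  """Nearest-neighbor 2x upsampling of an [N, H, W, C] batch (illustration only)."""
  return x.repeat(2, axis=1).repeat(2, axis=2)


recons_128 = np.zeros((16, 128, 128, 3), dtype=np.float32)
print(upsample_nearest_2x(recons_128).shape)  # (16, 256, 256, 3)
```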
test_images_batch = test_images[:16]
_out_recons = sess.run(recon_x, feed_dict={enc_ph: test_images_batch})
print('reconstructions shape:', _out_recons.shape)
inputs_and_recons = interleave(test_images_batch, _out_recons)
print('inputs_and_recons shape:', inputs_and_recons.shape)
imshow(imgrid(image_to_uint8(inputs_and_recons), cols=2))
reconstructions shape: (16, 256, 256, 3)
inputs_and_recons shape: (32, 256, 256, 3)
_out_features = sess.run(enc_features, feed_dict={enc_ph: test_images_batch})
print('AvePool features shape:', _out_features['avepool_feat'].shape)
print('BN+CReLU features shape:', _out_features['bn_crelu_feat'].shape)
AvePool features shape: (16, 2048)
BN+CReLU features shape: (16, 4096)
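The 4096-D bn_crelu_feat is exactly twice the size of the 2048-D avepool_feat. That is consistent with what the name suggests, though the notebook does not confirm it. The suggested construction is to batch-normalize the pooled features and then apply CReLU, which concatenates relu(h) and relu(-h) and so doubles the feature count. The sketch below assumes exactly that construction. It reduces BN to per-feature standardization over the batch, with no learned scale or offset, and uses random stand-in features. Treat it as an illustration of the shapes, not the encoder's actual computation.

```python
import numpy as np


def bn_crelu(h, eps=1e-5):
  """Assumed BN+CReLU: standardize each feature over the batch, then CReLU."""
  h_bn = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)
  # CReLU: concatenate the positive and negative parts along the feature axis.
  return np.concatenate([np.maximum(h_bn, 0.), np.maximum(-h_bn, 0.)], axis=-1)


avepool_feat = np.random.default_rng(0).standard_normal((16, 2048))  # stand-in features
out = bn_crelu(avepool_feat)
print(out.shape)  # (16, 4096)
```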
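Finally, we compute discriminator scores and losses on batches of encoder and generator pairs. These losses could be passed to an optimizer to train BigBiGAN.
We use the batch of images from above as the encoder inputs x and compute the encoder scores as D(x, E(x)). For the generator inputs, we sample z from a 120D standard Gaussian via np.random.randn and compute the generator scores as D(G(z), z).
The discriminator predicts a joint score score_xz for the (x, z) pairs. It also predicts unary scores score_x and score_z for x and z alone. It is trained to give high (positive) scores to encoder pairs and low (negative) scores to generator pairs. This mostly holds in the output below. The exception is the unary score_z, which is negative in both cases, indicating that the encoder outputs E(x) resemble actual samples from a Gaussian.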
feed_dict = {enc_ph: test_images, gen_ph: np.random.randn(32, 120)}
_out_scores_enc, _out_scores_gen, _out_losses = sess.run(
[disc_scores_enc, disc_scores_gen, losses], feed_dict=feed_dict)
print('Encoder scores:', {k: v.mean() for k, v in _out_scores_enc.items()})
print('Generator scores:', {k: v.mean() for k, v in _out_scores_gen.items()})
print('Losses:', _out_losses)
Encoder scores: {'score_x': 1.4621654, 'score_z': -0.5047065, 'score_xz': 0.6935733}
Generator scores: {'score_x': -0.10094194, 'score_z': -0.37436545, 'score_xz': -0.55352914}
Losses: {'disc': 1.5240736, 'enc': 0.5524961, 'gen': 0.3429455}
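The printed losses can be partly checked by hand. Suppose the encoder and generator losses are plain means over the concatenation of all three score types (score_xz, score_x, score_z), each with one entry per pair. Then they equal the average of the three per-type means printed above. For the generator, this reproduces the printed gen loss to the displayed precision. For the encoder, the hand-computed value is slightly lower (about 0.5503 vs 0.5524961). A plausible cause is that losses() calls the encoder again, and E(x) returns a fresh random sample z_sample each time. The disc loss cannot be recovered from means because of the relu.

```python
# Per-type mean scores copied from the printed output above.
encoder_scores = {'score_x': 1.4621654, 'score_z': -0.5047065, 'score_xz': 0.6935733}
generator_scores = {'score_x': -0.10094194, 'score_z': -0.37436545, 'score_xz': -0.55352914}

# With equal-sized score groups, mean(concat(groups)) == mean(group means).
gen_loss = -sum(generator_scores.values()) / len(generator_scores)
enc_loss = sum(encoder_scores.values()) / len(encoder_scores)

print(round(gen_loss, 7))  # 0.3429455, as printed
print(round(enc_loss, 4))  # 0.5503, vs. 0.5524961 printed
```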