
Convolutional Neural Network (CNN)


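This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR images. Because this tutorial uses the Keras Sequential API, creating and training the model takes just a few lines of code.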

Import TensorFlow

import tensorflow as tf

from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

Download and prepare the CIFAR10 dataset

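The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images per class. The dataset is split into 50,000 training images and 10,000 test images. The classes are mutually exclusive; there is no overlap between them.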

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 11s 0us/step
170508288/170498071 [==============================] - 11s 0us/step
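The normalization step above relies on the raw CIFAR pixels being 8-bit integers in [0, 255]; dividing by the float 255.0 turns them into floats in [0, 1]. The sketch below shows this on a small random array that merely has CIFAR's shape (the names `fake_images` and `normalized` are hypothetical, and the data is not the real dataset).

```python
import numpy as np

# Synthetic stand-in for CIFAR-10 images: uint8 pixels in [0, 255],
# shape (num_images, height, width, channels). Random data, not the real dataset.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3)).astype(np.uint8)

# Dividing by the float 255.0 produces a floating-point array rescaled into [0, 1].
normalized = fake_images / 255.0

print(fake_images.dtype, normalized.dtype)
print(normalized.min(), normalized.max())
```

The shape is unchanged; only the dtype and value range differ, which keeps the inputs on a scale that gradient-based training usually handles well.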

Verify the data

To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image:

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays, 
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
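The comment in the plotting code notes that CIFAR labels are arrays, so `train_labels[i]` is a length-1 array rather than a plain integer. The sketch below uses a small hypothetical label array with the same (N, 1) layout to show why the extra `[0]` is needed, and an alternative that flattens the labels once up front.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Hypothetical labels laid out like CIFAR-10's: shape (N, 1), integer class ids.
fake_labels = np.array([[6], [9], [9], [4], [1]], dtype=np.uint8)

# One row is still a length-1 array; [0] extracts the scalar class id.
print(fake_labels[0], fake_labels[0][0])

# Alternative: flatten the whole label array once to shape (N,).
flat_labels = fake_labels.flatten()
names = [class_names[i] for i in flat_labels]
print(names)
```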


Build the convolutional neural network model

The 6 lines of code below define a common CNN architecture: a stack of Conv2D and MaxPooling2D layers.

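As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure the CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You do this by passing the argument input_shape to the first layer.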
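To make the "ignoring the batch size" point concrete, the NumPy sketch below uses a hypothetical all-zeros batch: `input_shape=(32, 32, 3)` describes one sample, while actual model inputs carry a leading batch axis (shown as None in `model.summary()`). A single image therefore needs a batch axis added, for example with `np.expand_dims`, before it is passed to a model. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical batch of 8 CIFAR-sized images: (batch, height, width, channels).
batch = np.zeros((8, 32, 32, 3), dtype=np.float32)

# input_shape=(32, 32, 3) corresponds to the per-sample part of the shape.
per_sample_shape = batch.shape[1:]

# A single image has shape (32, 32, 3); add a leading batch axis of size 1.
single = batch[0]
single_batched = np.expand_dims(single, axis=0)
print(per_sample_shape, single.shape, single_batched.shape)
```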

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Let's display the architecture of the model so far:

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
=================================================================
Total params: 56,320
Trainable params: 56,320
Non-trainable params: 0
_________________________________________________________________

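Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions shrink as you go deeper into the network. The number of output channels of each Conv2D layer is set by the layer's first argument (e.g., 32 or 64 above). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.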
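The shapes and parameter counts in the summary follow from simple arithmetic, assuming Keras's defaults of 'valid' padding and stride 1 for Conv2D and a stride equal to the pool size for MaxPooling2D. A 3×3 valid convolution trims 2 pixels from each spatial dimension, 2×2 pooling halves it (rounding down), and each Conv2D has 3·3·in_channels weights plus one bias per filter. The helper names below are hypothetical; this is a pure-Python check, not Keras code.

```python
def conv2d_valid(h, w, in_ch, filters, k=3):
    """Output shape and parameter count of a k x k Conv2D, 'valid' padding, stride 1."""
    out_h, out_w = h - k + 1, w - k + 1
    params = (k * k * in_ch + 1) * filters  # weights plus one bias per filter
    return (out_h, out_w, filters), params

def max_pool(h, w, ch, pool=2):
    """Output shape of MaxPooling2D whose stride equals the pool size ('valid' padding)."""
    return (h // pool, w // pool, ch), 0  # pooling has no trainable parameters

shape, param_counts = (32, 32, 3), []
layer_spec = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]
for kind, filters in layer_spec:
    if kind == "conv":
        shape, params = conv2d_valid(*shape, filters)
    else:
        shape, params = max_pool(*shape)
    param_counts.append(params)
    print(kind, shape, params)

total = sum(param_counts)
print("total params:", total)
```

Note that 13 // 2 rounds down to 6, which matches the (None, 6, 6, 64) row of the summary.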

Add Dense layers on top

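To complete the model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors (1D) as input, while the current output is a 3D tensor. First, flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so the final Dense layer has 10 outputs. That final layer has no activation, so it outputs raw, unnormalized scores (logits).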
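Flattening keeps the batch axis and unrolls the remaining (4, 4, 64) values into 4 · 4 · 64 = 1024 features per image. The NumPy sketch below approximates what `layers.Flatten` does with channels-last data, using a row-major reshape on a hypothetical conv-base output; it is an illustration, not the Keras implementation.

```python
import numpy as np

# Hypothetical conv-base output for a batch of 2 images: (batch, 4, 4, 64).
conv_out = np.arange(2 * 4 * 4 * 64, dtype=np.float32).reshape(2, 4, 4, 64)

# Keep the batch axis, unroll everything else in row-major (C) order.
flat = conv_out.reshape(conv_out.shape[0], -1)
print(flat.shape)

# The first 64 features of each image are the channels at spatial position (0, 0).
print(np.array_equal(flat[0, :64], conv_out[0, 0, 0, :]))
```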

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

Here is the complete architecture of the model:

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0         
_________________________________________________________________
dense (Dense)                (None, 64)                65600     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________

The network summary shows that the (4, 4, 64) outputs were flattened into vectors of shape (1024,) before passing through the two Dense layers.
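The Dense rows of the summary can be checked the same way: a Dense layer has one weight per (input feature, unit) pair plus one bias per unit. The short calculation below reproduces the 65,600 and 650 parameters and the 122,570 total, taking the Conv2D counts directly from the summary table.

```python
flat_features = 4 * 4 * 64          # size of the Flatten output
hidden_units, num_classes = 64, 10

# Dense parameters = inputs * units (weights) + units (biases).
dense_params = flat_features * hidden_units + hidden_units
dense_1_params = hidden_units * num_classes + num_classes

conv_params = 896 + 18496 + 36928   # Conv2D rows of model.summary()
total = conv_params + dense_params + dense_1_params
print(flat_features, dense_params, dense_1_params, total)
```

The first Dense layer alone holds more than half of the model's parameters, which is typical when a flattened feature map feeds a fully connected layer.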

Compile and train the model

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10, 
                    validation_data=(test_images, test_labels))
Epoch 1/10
1563/1563 [==============================] - 8s 4ms/step - loss: 1.5172 - accuracy: 0.4504 - val_loss: 1.2841 - val_accuracy: 0.5363
Epoch 2/10
1563/1563 [==============================] - 5s 3ms/step - loss: 1.1503 - accuracy: 0.5940 - val_loss: 1.1014 - val_accuracy: 0.6110
Epoch 3/10
1563/1563 [==============================] - 5s 3ms/step - loss: 1.0032 - accuracy: 0.6456 - val_loss: 0.9837 - val_accuracy: 0.6519
Epoch 4/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.9108 - accuracy: 0.6812 - val_loss: 0.9624 - val_accuracy: 0.6644
Epoch 5/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.8434 - accuracy: 0.7027 - val_loss: 1.0100 - val_accuracy: 0.6579
Epoch 6/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.7845 - accuracy: 0.7260 - val_loss: 0.8792 - val_accuracy: 0.6950
Epoch 7/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.7466 - accuracy: 0.7363 - val_loss: 0.8500 - val_accuracy: 0.7090
Epoch 8/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.7040 - accuracy: 0.7530 - val_loss: 0.8558 - val_accuracy: 0.7053
Epoch 9/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.6688 - accuracy: 0.7652 - val_loss: 0.8817 - val_accuracy: 0.7115
Epoch 10/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.6387 - accuracy: 0.7736 - val_loss: 0.8384 - val_accuracy: 0.7170
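Because the last Dense layer outputs logits, the model is compiled with `SparseCategoricalCrossentropy(from_logits=True)`, which applies a softmax inside the loss. The NumPy sketch below illustrates that math (log-softmax followed by the negative log-probability of the true class, averaged over the batch). It is a hypothetical helper, not the Keras implementation, and it expects labels of shape (N,), whereas the Keras loss also accepts CIFAR's (N, 1) labels.

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean sparse categorical cross-entropy from raw logits (illustrative sketch)."""
    # Numerically stable log-softmax: subtract each row's max before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select the log-probability of the true class for every example.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Hypothetical logits for 2 examples over 10 classes, like the Dense(10) output.
logits = np.zeros((2, 10))
logits[0, 3] = 5.0          # example 0 strongly favors class 3
labels = np.array([3, 7])   # example 1 has uniform logits
print(sparse_ce_from_logits(logits, labels))
```

With all-zero logits the predicted distribution is uniform, so the loss for that example is log(10), about 2.30; a confident correct prediction drives the loss toward 0.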

Evaluate the model

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)


313/313 - 1s - loss: 0.8384 - accuracy: 0.7170
print(test_acc)
0.7170000076293945
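The step counts in the logs follow from Keras's default `batch_size` of 32 for `fit` and `evaluate` when no batch size is passed: 50,000 training images give 1,563 steps per epoch (the last batch is partial), and 10,000 test images give 313 steps. The sketch below reproduces those numbers and picks the epoch with the highest validation accuracy, using the val_accuracy values copied from the training log above (in a live session they are available as `history.history['val_accuracy']`).

```python
import math

# Keras fit/evaluate default to batch_size=32 when it is not specified.
batch_size = 32
train_steps = math.ceil(50000 / batch_size)  # final batch is partial
test_steps = math.ceil(10000 / batch_size)
print(train_steps, test_steps)

# val_accuracy per epoch, copied from the training log in this tutorial.
val_accuracy = [0.5363, 0.6110, 0.6519, 0.6644, 0.6579,
                0.6950, 0.7090, 0.7053, 0.7115, 0.7170]
best_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__) + 1
print(best_epoch, val_accuracy[best_epoch - 1])
```

The gap between final training accuracy (0.7736) and validation accuracy (0.7170) in the log is a hint of mild overfitting.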

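Your simple CNN has achieved a test accuracy of over 70%. Not bad for a few lines of code! For another CNN style, see the TensorFlow 2 quickstart for experts example, which uses the Keras subclassing API and tf.GradientTape.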