此页面由 Cloud Translation API 翻译。
Switch to English

使用TensorFlow服务训练和服务TensorFlow模型

本指南训练了一个神经网络模型来对衣服的图像进行分类,例如运动鞋和衬衫 ,保存训练后的模型,然后将其与TensorFlow Serving一起使用 。重点是TensorFlow Serving,而不是TensorFlow中的建模和训练,因此,有关专注于建模和训练的完整示例,请参见基本分类示例

本指南使用tf.keras (高级API)在TensorFlow中构建和训练模型。

import sys

# Confirm that we're using Python 3
assert sys.version_info.major is 3, 'Oops, not running Python 3. Use Runtime > Change runtime type'
# TensorFlow and tf.keras
print("Installing dependencies for Colab environment")
!pip install -Uq grpcio==1.26.0

import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
import os
import subprocess

print('TensorFlow version: {}'.format(tf.__version__))
Installing dependencies for Colab environment
[K     |████████████████████████████████| 2.4MB 4.6MB/s 
[?25hInstalling TensorFlow
TensorFlow 2.x selected.
TensorFlow version: 2.1.0-rc1

建立模型

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step

train_images.shape: (60000, 28, 28, 1), of float64
test_images.shape: (10000, 28, 28, 1), of float64

训练和评估模型

让我们使用最简单的CNN,因为我们不关注建模部分。

model = keras.Sequential([
  keras.layers.Conv2D(input_shape=(28,28,1), filters=8, kernel_size=3, 
                      strides=2, activation='relu', name='Conv1'),
  keras.layers.Flatten(),
  keras.layers.Dense(10, activation=tf.nn.softmax, name='Softmax')
])
model.summary()

testing = False
epochs = 5

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 13, 13, 8)         80        
_________________________________________________________________
flatten (Flatten)            (None, 1352)              0         
_________________________________________________________________
Softmax (Dense)              (None, 10)                13530     
=================================================================
Total params: 13,610
Trainable params: 13,610
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 11s 185us/sample - loss: 0.5466 - accuracy: 0.8087
Epoch 2/5
60000/60000 [==============================] - 5s 79us/sample - loss: 0.4032 - accuracy: 0.8580
Epoch 3/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.3613 - accuracy: 0.8712
Epoch 4/5
60000/60000 [==============================] - 5s 75us/sample - loss: 0.3406 - accuracy: 0.8797
Epoch 5/5
60000/60000 [==============================] - 4s 75us/sample - loss: 0.3247 - accuracy: 0.8848
10000/10000 [==============================] - 1s 73us/sample - loss: 0.3510 - accuracy: 0.8747

Test accuracy: 0.8747000098228455

保存模型

要将训练后的模型加载到TensorFlow Serving中,我们首先需要将其保存为SavedModel格式。这将在定义明确的目录层次结构中创建一个protobuf文件,并将包括版本号。 TensorFlow Serving允许我们选择在进行推理请求时要使用的模型版本或“可服务”版本。每个版本将导出到给定路径下的不同子目录。

# Fetch the Keras session and save the model
# The signature definition is defined by the input and output tensors,
# and stored with the default serving key
import tempfile

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}
export_path = /tmp/1

Warning:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /tmp/1/assets

Saved model:
total 84
drwxr-xr-x 2 root root  4096 Jan  7 23:15 assets
-rw-r--r-- 1 root root 74086 Jan  7 23:15 saved_model.pb
drwxr-xr-x 2 root root  4096 Jan  7 23:15 variables

检查您保存的模型

我们将使用命令行实用程序saved_model_clisaved_model_cli中查看MetaGraphDefs (模型)和SignatureDefs (您可以调用的方法)。请参阅TensorFlow指南中有关SavedModel CLI的讨论

saved_model_cli show --dir {export_path} --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['Conv1_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: serving_default_Conv1_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['Softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #3
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #4
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None

  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')

  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #3
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #4
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None

这告诉了我们很多关于我们的模型的信息!在这种情况下,我们只是训练了模型,所以我们已经知道了输入和输出,但是如果不这样做,这将是重要的信息。它并不能告诉我们所有信息,例如,这是灰度图像数据,但这是一个很好的开始。

使用TensorFlow Serving服务模型

将TensorFlow Serving发行版URI添加为软件包来源:

由于该Colab在Debian环境中运行,因此我们准备使用Aptitude安装TensorFlow Serving。我们将tensorflow-model-server软件包添加到Aptitude知道的软件包列表中。请注意,我们以root用户身份运行。

# This is the same as you would do from your command line, but without the [arch=amd64], and no sudo
# You would instead do:
# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2943  100  2943    0     0  11496      0 --:--:-- --:--:-- --:--:-- 11496
OK
Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease [3,626 B]
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Ign:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Get:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release [564 B]
Get:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release.gpg [833 B]
Hit:8 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Hit:9 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:10 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:11 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 Packages [354 B]
Get:12 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ Packages [81.6 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:14 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server-universal amd64 Packages [364 B]
Get:15 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic InRelease [15.4 kB]
Get:17 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Packages [30.4 kB]
Get:18 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:19 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic/main Sources [1,749 kB]
Get:20 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [796 kB]
Get:21 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [1,073 kB]
Get:22 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [776 kB]
Get:23 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [21.3 kB]
Get:24 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [10.8 kB]
Get:25 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [1,324 kB]
Get:26 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [35.5 kB]
Get:27 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic/main amd64 Packages [844 kB]
Fetched 7,019 kB in 4s (1,913 kB/s)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
21 packages can be upgraded. Run 'apt list --upgradable' to see them.

安装TensorFlow服务

这就是您所需要的-一个命令行!

apt-get install tensorflow-model-server
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-430
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  tensorflow-model-server
0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.
Need to get 140 MB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 tensorflow-model-server all 2.0.0 [140 MB]
Fetched 140 MB in 2s (78.8 MB/s)
Selecting previously unselected package tensorflow-model-server.
(Reading database ... 145674 files and directories currently installed.)
Preparing to unpack .../tensorflow-model-server_2.0.0_all.deb ...
Unpacking tensorflow-model-server (2.0.0) ...
Setting up tensorflow-model-server (2.0.0) ...

开始运行TensorFlow服务

这是我们开始运行TensorFlow Serving并加载模型的地方。加载后,我们可以开始使用REST进行推理请求。有一些重要的参数:

  • rest_api_port :用于REST请求的端口。
  • model_name :您将在REST请求的URL中使用它。可以是任何东西。
  • model_base_path :这是保存模型的目录的路径。
os.environ["MODEL_DIR"] = MODEL_DIR
%%bash --bg 
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=fashion_model \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1

Starting job # 0 in a separate thread.

tail server.log
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

在TensorFlow Serving中向您的模型提出请求

首先,让我们看一下测试数据中的一个随机示例。

def show(idx, title):
  plt.figure()
  plt.imshow(test_images[idx].reshape(28,28))
  plt.axis('off')
  plt.title('\n\n{}'.format(title), fontdict={'size': 16})

import random
rando = random.randint(0,len(test_images)-1)
show(rando, 'An Example Image: {}'.format(class_names[test_labels[rando]]))

png

好的,这看起来很有趣。这让您认识有多难?现在让我们为三个推理请求的批处理创建JSON对象,并查看我们的模型对事物的识别程度:

import json
data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
Data: {"signature_name": "serving_default", "instances": ...  [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}

发出REST请求

最新版本

我们将一个预测请求作为POST发送到我们服务器的REST端点,并将其传递给三个示例。我们将要求服务器通过不指定特定版本来提供可服务版本的最新版本。

!pip install -q requests

import requests
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

show(0, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
  class_names[np.argmax(predictions[0])], np.argmax(predictions[0]), class_names[test_labels[0]], test_labels[0]))

png

servable的特定版本

现在,让我们指定我们的servable的特定版本。由于只有一个,所以我们选择版本1。我们还将查看所有三个结果。

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model/versions/1:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

for i in range(0,3):
  show(i, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
    class_names[np.argmax(predictions[i])], np.argmax(predictions[i]), class_names[test_labels[i]], test_labels[i]))

png

png

png

导入Fashion MNIST数据集

本指南使用Fashion MNIST数据集,其中包含10个类别的70,000张灰度图像。图像显示了低分辨率(28 x 28像素)的单个衣​​物,如下所示:

时尚MNIST精灵
图1. Fashion-MNIST示例 (由Zalando,MIT许可)。

Fashion MNIST旨在替代经典MNIST数据集-通常用作计算机视觉的机器学习程序的“ Hello,World”。您可以直接从TensorFlow访问Fashion MNIST,只需导入和加载数据即可。