
Graph regularization for sentiment classification using synthesized graphs


Overview

This notebook classifies movie reviews as positive or negative using the text of the review. This is an example of binary classification, an important and widely applicable kind of machine learning problem.

We will demonstrate the use of graph regularization in this notebook by building a graph from the given input. The general recipe for building a graph-regularized model with the Neural Structured Learning (NSL) framework, when the input does not contain an explicit graph, is as follows:

  1. Create embeddings for each text sample in the input. This can be done using pre-trained models such as word2vec, Swivel, BERT, etc.
  2. Build a graph based on these embeddings by using a similarity metric such as the 'L2' distance, 'cosine' distance, etc. Nodes in the graph correspond to samples, and edges in the graph correspond to the similarity between pairs of samples.
  3. Generate training data from the above synthesized graph and sample features. The resulting training data will contain neighbor features in addition to the original node features.
  4. Create a neural network as a base model using the Keras sequential, functional, or subclass API.
  5. Wrap the base model with the GraphRegularization wrapper class, which is provided by the NSL framework, to create a new graph Keras model. This new model will include a graph regularization loss as the regularization term in its training objective.
  6. Train and evaluate the graph Keras model.

Requirements

  1. Install the Neural Structured Learning package.
  2. Install tensorflow-hub.
pip install --quiet neural-structured-learning
pip install --quiet tensorflow-hub

Dependencies and imports

 import matplotlib.pyplot as plt
import numpy as np

import neural_structured_learning as nsl

import tensorflow as tf
import tensorflow_hub as hub

# Resets notebook state
tf.keras.backend.clear_session()

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print(
    "GPU is",
    "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")
 
Version:  2.3.0
Eager mode:  True
Hub version:  0.8.0
GPU is NOT AVAILABLE

IMDB dataset

The IMDB dataset contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.

In this tutorial, we will use a preprocessed version of the IMDB dataset.

Download the preprocessed IMDB dataset

The IMDB dataset comes packaged with TensorFlow. It has already been preprocessed so that the reviews (sequences of words) have been converted to sequences of integers, where each integer represents a specific word in a dictionary.

The following code downloads the IMDB dataset (or uses a cached copy if it has already been downloaded):

 imdb = tf.keras.datasets.imdb
(pp_train_data, pp_train_labels), (pp_test_data, pp_test_labels) = (
    imdb.load_data(num_words=10000))
 
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step

The argument num_words=10000 keeps the top 10,000 most frequently occurring words in the training data. The rare words are discarded to keep the size of the vocabulary manageable.
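
As a quick sanity check (a minimal sketch, not part of the original notebook), we can confirm that no word index in the loaded data exceeds the vocabulary size:

# Hypothetical sanity check: with num_words=10000, every word index in the
# training data should be below 10000 (rarer words are mapped to the <UNK> index).
max_index = max(max(review) for review in pp_train_data)
print('Largest word index in the training data:', max_index)  # expected < 10000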

Explore the data

Let's take a moment to understand the format of the data. The dataset comes preprocessed: each example is an array of integers representing the words of the movie review. Each label is an integer value of either 0 or 1, where 0 indicates a negative review and 1 indicates a positive review.

 print('Training entries: {}, labels: {}'.format(
    len(pp_train_data), len(pp_train_labels)))
training_samples_count = len(pp_train_data)
 
Training entries: 25000, labels: 25000

The text of the reviews has been converted to integers, where each integer represents a specific word in a dictionary. Here's what the first review looks like:

 print(pp_train_data[0])
 
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

Movie reviews may be of different lengths. The following code shows the number of words in the first and second reviews. Since inputs to a neural network must have the same length, we'll need to resolve this later.

 len(pp_train_data[0]), len(pp_train_data[1])
 
(218, 189)

Convert the integers back to words

It may be useful to know how to convert integers back to the corresponding text. Here, we'll create a helper function to query a dictionary object that contains the integer-to-string mapping:

 def build_reverse_word_index():
  # A dictionary mapping words to an integer index
  word_index = imdb.get_word_index()

  # The first indices are reserved
  word_index = {k: (v + 3) for k, v in word_index.items()}
  word_index['<PAD>'] = 0
  word_index['<START>'] = 1
  word_index['<UNK>'] = 2  # unknown
  word_index['<UNUSED>'] = 3
  return dict((value, key) for (key, value) in word_index.items())

reverse_word_index = build_reverse_word_index()

def decode_review(text):
  return ' '.join([reverse_word_index.get(i, '?') for i in text])
 
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step

Now we can use the decode_review function to display the text of the first review:

 decode_review(pp_train_data[0])
 
"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"

Graph construction

Graph construction involves creating embeddings for the text samples and then using a similarity function to compare the embeddings.
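
As a rough illustration of what comparing embeddings with a similarity function means, here is a minimal sketch of cosine similarity using NumPy. It is for illustration only; the actual graph building below is handled by the NSL graph building library.

# Minimal sketch (illustration only): cosine similarity between two hypothetical
# embedding vectors. Values close to 1.0 indicate highly similar embeddings.
def cosine_similarity(a, b):
  a = np.asarray(a, dtype=np.float32)
  b = np.asarray(b, dtype=np.float32)
  return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([0.1, 0.9, 0.3], [0.2, 0.8, 0.4]))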

Before proceeding further, we first create a directory to store the artifacts created by this tutorial.

mkdir -p /tmp/imdb

Create sample embeddings

We will use pretrained Swivel embeddings to create embeddings in the tf.train.Example format for each sample in the input. We will store the resulting embeddings in the TFRecord format along with an additional feature that represents the ID of each sample. This is important and will allow us to match sample embeddings with corresponding nodes in the graph later.

 pretrained_embedding = 'https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1'

hub_layer = hub.KerasLayer(
    pretrained_embedding, input_shape=[], dtype=tf.string, trainable=True)
 
 def _int64_feature(value):
  """Returns int64 tf.train.Feature."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value.tolist()))


def _bytes_feature(value):
  """Returns bytes tf.train.Feature."""
  return tf.train.Feature(
      bytes_list=tf.train.BytesList(value=[value.encode('utf-8')]))


def _float_feature(value):
  """Returns float tf.train.Feature."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=value.tolist()))


def create_embedding_example(word_vector, record_id):
  """Create tf.Example containing the sample's embedding and its ID."""

  text = decode_review(word_vector)

  # Shape = [batch_size,].
  sentence_embedding = hub_layer(tf.reshape(text, shape=[-1,]))

  # Flatten the sentence embedding back to 1-D.
  sentence_embedding = tf.reshape(sentence_embedding, shape=[-1])

  features = {
      'id': _bytes_feature(str(record_id)),
      'embedding': _float_feature(sentence_embedding.numpy())
  }
  return tf.train.Example(features=tf.train.Features(feature=features))


def create_embeddings(word_vectors, output_path, starting_record_id):
  record_id = int(starting_record_id)
  with tf.io.TFRecordWriter(output_path) as writer:
    for word_vector in word_vectors:
      example = create_embedding_example(word_vector, record_id)
      record_id = record_id + 1
      writer.write(example.SerializeToString())
  return record_id


# Persist TF.Example features containing embeddings for training data in
# TFRecord format.
create_embeddings(pp_train_data, '/tmp/imdb/embeddings.tfr', 0)
 
25000

Build a graph

Now that we have the sample embeddings, we will use them to build a similarity graph: nodes in this graph will correspond to samples, and edges in this graph will correspond to the similarity between pairs of nodes.

Neural Structured Learning provides a graph building library to build a graph based on sample embeddings. It uses cosine similarity as the similarity measure to compare embeddings and build edges between them. It also allows us to specify a similarity threshold, which can be used to discard dissimilar edges from the final graph. In this example, using 0.99 as the similarity threshold, we end up with a graph that has 445,327 bi-directional edges.

 nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],
                      '/tmp/imdb/graph_99.tsv',
                      similarity_threshold=0.99)
 
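
The generated graph is persisted as a TSV file of weighted edges. As a minimal sketch (assuming each line has the form source_id<TAB>target_id<TAB>edge_weight, which is not shown explicitly in this tutorial), we can peek at the first few edges:

# Minimal sketch: print the first few edges of the synthesized graph.
# Assumes each line is "source_id<TAB>target_id<TAB>edge_weight".
with open('/tmp/imdb/graph_99.tsv') as graph_file:
  for _, line in zip(range(5), graph_file):
    print(line.rstrip())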

Sample features

We create sample features for our problem using the tf.train.Example format and persist them in the TFRecord format. Each sample will include the following three features:

  1. id: The node ID of the sample.
  2. words: An int64 list containing word IDs.
  3. label: A singleton int64 identifying the target class of the review.
 def create_example(word_vector, label, record_id):
  """Create tf.Example containing the sample's word vector, label, and ID."""
  features = {
      'id': _bytes_feature(str(record_id)),
      'words': _int64_feature(np.asarray(word_vector)),
      'label': _int64_feature(np.asarray([label])),
  }
  return tf.train.Example(features=tf.train.Features(feature=features))

def create_records(word_vectors, labels, record_path, starting_record_id):
  record_id = int(starting_record_id)
  with tf.io.TFRecordWriter(record_path) as writer:
    for word_vector, label in zip(word_vectors, labels):
      example = create_example(word_vector, label, record_id)
      record_id = record_id + 1
      writer.write(example.SerializeToString())
  return record_id

# Persist TF.Example features (word vectors and labels) for training and test
# data in TFRecord format.
next_record_id = create_records(pp_train_data, pp_train_labels,
                                '/tmp/imdb/train_data.tfr', 0)
create_records(pp_test_data, pp_test_labels, '/tmp/imdb/test_data.tfr',
               next_record_id)
 
50000

Augment training data with graph neighbors

Given the sample features and the synthesized graph, we can generate the augmented training data for Neural Structured Learning. The NSL framework provides a library to combine the graph and the sample features to produce the final training data for graph regularization. The resulting training data will include the original sample features as well as the features of their corresponding neighbors.

In this tutorial, we consider undirected edges and use a maximum of 3 neighbors per sample to augment the training data with graph neighbors.

 nsl.tools.pack_nbrs(
    '/tmp/imdb/train_data.tfr',
    '',
    '/tmp/imdb/graph_99.tsv',
    '/tmp/imdb/nsl_train_data.tfr',
    add_undirected_edges=True,
    max_nbrs=3)
 
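
To see what the augmented training data looks like, we can parse a single record from the output file. This is a minimal sketch; the neighbor feature names follow the NL_nbr_<i>_words / NL_nbr_<i>_weight convention used later in this tutorial:

# Minimal sketch: inspect the feature keys of the first augmented training example.
raw_example = next(iter(tf.data.TFRecordDataset(['/tmp/imdb/nsl_train_data.tfr'])))
parsed_example = tf.train.Example.FromString(raw_example.numpy())
print(sorted(parsed_example.features.feature.keys()))
# Expected keys: 'id', 'label', 'words' and, for samples that have graph neighbors,
# features such as 'NL_nbr_0_words' and 'NL_nbr_0_weight'.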

Base model

We are now ready to build a base model without graph regularization. To build this model, we can either use the embeddings that were used in building the graph, or we can learn new embeddings jointly with the classification task. For the purpose of this notebook, we will do the latter.

Global variables

 NBR_FEATURE_PREFIX = 'NL_nbr_'
NBR_WEIGHT_SUFFIX = '_weight'
 

Hyperparameters

We will use an instance of HParams to include various hyperparameters and constants used for training and evaluation. We briefly describe each of them below:

  • num_classes: There are 2 classes: positive and negative.

  • max_seq_length: This is the maximum number of words considered from each movie review in this example.

  • vocab_size: This is the size of the vocabulary considered for this example.

  • distance_type: This is the distance metric used to regularize a sample with its neighbors.

  • graph_regularization_multiplier: This controls the relative weight of the graph regularization term in the overall loss function.

  • num_neighbors: The number of neighbors used for graph regularization. This value has to be less than or equal to the max_nbrs argument used above when invoking nsl.tools.pack_nbrs.

  • num_fc_units: The number of units in the fully connected layer of the neural network.

  • train_epochs: The number of training epochs.

  • batch_size: The batch size used for training and evaluation.

  • eval_steps: The number of batches to process before deeming evaluation complete. If set to None, all instances in the test set are evaluated.

 class HParams(object):
  """Hyperparameters used for training."""
  def __init__(self):
    ### dataset parameters
    self.num_classes = 2
    self.max_seq_length = 256
    self.vocab_size = 10000
    ### neural graph learning parameters
    self.distance_type = nsl.configs.DistanceType.L2
    self.graph_regularization_multiplier = 0.1
    self.num_neighbors = 2
    ### model architecture
    self.num_embedding_dims = 16
    self.num_lstm_dims = 64
    self.num_fc_units = 64
    ### training parameters
    self.train_epochs = 10
    self.batch_size = 128
    ### eval parameters
    self.eval_steps = None  # All instances in the test set are evaluated.

HPARAMS = HParams()
 

Prepare the data

The reviews (arrays of integers) must be converted to tensors before being fed into the neural network. This conversion can be done in a couple of ways:

  • Convert the arrays into vectors of 0s and 1s indicating word occurrence, similar to a one-hot encoding. For example, the sequence [3, 5] would become a 10000-dimensional vector that is all zeros except for indices 3 and 5, which are ones. Then, make this the first layer in our network (a Dense layer) that can handle floating point vector data. This approach is memory intensive, though, requiring a num_words * num_reviews size matrix.

  • Alternatively, we can pad the arrays so they all have the same length, then create an integer tensor of shape max_length * num_reviews. We can use an embedding layer capable of handling this shape as the first layer in our network.

In this tutorial, we will use the second approach.
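
For comparison, here is a minimal sketch (illustration only, not used in the rest of this tutorial) of what the first, multi-hot approach would look like:

# Minimal sketch of the multi-hot approach (not used below): the sequence [3, 5]
# becomes a 10000-dimensional vector of zeros with ones at indices 3 and 5.
def multi_hot(sequence, dimension=10000):
  vector = np.zeros(dimension, dtype=np.float32)
  vector[np.asarray(sequence)] = 1.0
  return vector

print(multi_hot([3, 5]).sum())  # 2.0, since only indices 3 and 5 are set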

Since the movie reviews must be the same length, we will use the pad_sequence function defined below to standardize the lengths.

 def make_dataset(file_path, training=False):
  """Creates a `tf.data.TFRecordDataset`.

  Args:
    file_path: Name of the file in the `.tfrecord` format containing
      `tf.train.Example` objects.
    training: Boolean indicating if we are in training mode.

  Returns:
    An instance of `tf.data.TFRecordDataset` containing the `tf.train.Example`
    objects.
  """

  def pad_sequence(sequence, max_seq_length):
    """Pads the input sequence (a `tf.SparseTensor`) to `max_seq_length`."""
    pad_size = tf.maximum([0], max_seq_length - tf.shape(sequence)[0])
    padded = tf.concat(
        [sequence.values,
         tf.fill((pad_size), tf.cast(0, sequence.dtype))],
        axis=0)
    # The input sequence may be larger than max_seq_length. Truncate down if
    # necessary.
    return tf.slice(padded, [0], [max_seq_length])

  def parse_example(example_proto):
    """Extracts relevant fields from the `example_proto`.

    Args:
      example_proto: An instance of `tf.train.Example`.

    Returns:
      A pair whose first value is a dictionary containing relevant features
      and whose second value contains the ground truth labels.
    """
    # The 'words' feature is a variable length word ID vector.
    feature_spec = {
        'words': tf.io.VarLenFeature(tf.int64),
        'label': tf.io.FixedLenFeature((), tf.int64, default_value=-1),
    }
    # We also extract corresponding neighbor features in a similar manner to
    # the features above during training.
    if training:
      for i in range(HPARAMS.num_neighbors):
        nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'words')
        nbr_weight_key = '{}{}{}'.format(NBR_FEATURE_PREFIX, i,
                                         NBR_WEIGHT_SUFFIX)
        feature_spec[nbr_feature_key] = tf.io.VarLenFeature(tf.int64)

        # We assign a default value of 0.0 for the neighbor weight so that
        # graph regularization is done on samples based on their exact number
        # of neighbors. In other words, non-existent neighbors are discounted.
        feature_spec[nbr_weight_key] = tf.io.FixedLenFeature(
            [1], tf.float32, default_value=tf.constant([0.0]))

    features = tf.io.parse_single_example(example_proto, feature_spec)

    # Since the 'words' feature is a variable length word vector, we pad it to a
    # constant maximum length based on HPARAMS.max_seq_length
    features['words'] = pad_sequence(features['words'], HPARAMS.max_seq_length)
    if training:
      for i in range(HPARAMS.num_neighbors):
        nbr_feature_key = '{}{}_{}'.format(NBR_FEATURE_PREFIX, i, 'words')
        features[nbr_feature_key] = pad_sequence(features[nbr_feature_key],
                                                 HPARAMS.max_seq_length)

    labels = features.pop('label')
    return features, labels

  dataset = tf.data.TFRecordDataset([file_path])
  if training:
    dataset = dataset.shuffle(10000)
  dataset = dataset.map(parse_example)
  dataset = dataset.batch(HPARAMS.batch_size)
  return dataset


train_dataset = make_dataset('/tmp/imdb/nsl_train_data.tfr', True)
test_dataset = make_dataset('/tmp/imdb/test_data.tfr')
 

Build the model

A neural network is created by stacking layers. This requires two main architectural decisions:

  • How many layers to use in the model?
  • How many hidden units to use for each layer?

In this example, the input data consists of arrays of word indices. The labels to predict are either 0 or 1.

We will use a bi-directional LSTM as our base model in this tutorial.

 # This function exists as an alternative to the bi-LSTM model used in this
# notebook.
def make_feed_forward_model():
  """Builds a simple 2 layer feed forward neural network."""
  inputs = tf.keras.Input(
      shape=(HPARAMS.max_seq_length,), dtype='int64', name='words')
  embedding_layer = tf.keras.layers.Embedding(HPARAMS.vocab_size, 16)(inputs)
  pooling_layer = tf.keras.layers.GlobalAveragePooling1D()(embedding_layer)
  dense_layer = tf.keras.layers.Dense(16, activation='relu')(pooling_layer)
  outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense_layer)
  return tf.keras.Model(inputs=inputs, outputs=outputs)


def make_bilstm_model():
  """Builds a bi-directional LSTM model."""
  inputs = tf.keras.Input(
      shape=(HPARAMS.max_seq_length,), dtype='int64', name='words')
  embedding_layer = tf.keras.layers.Embedding(HPARAMS.vocab_size,
                                              HPARAMS.num_embedding_dims)(
                                                  inputs)
  lstm_layer = tf.keras.layers.Bidirectional(
      tf.keras.layers.LSTM(HPARAMS.num_lstm_dims))(
          embedding_layer)
  dense_layer = tf.keras.layers.Dense(
      HPARAMS.num_fc_units, activation='relu')(
          lstm_layer)
  outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense_layer)
  return tf.keras.Model(inputs=inputs, outputs=outputs)


# Feel free to use an architecture of your choice.
model = make_bilstm_model()
model.summary()
 
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
words (InputLayer)           [(None, 256)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 256, 16)           160000    
_________________________________________________________________
bidirectional (Bidirectional (None, 128)               41472     
_________________________________________________________________
dense (Dense)                (None, 64)                8256      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
=================================================================
Total params: 209,793
Trainable params: 209,793
Non-trainable params: 0
_________________________________________________________________

The layers are effectively stacked sequentially to build the classifier:

  1. The first layer is an Input layer which takes the integer-encoded vocabulary.
  2. The next layer is an Embedding layer, which takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: (batch, sequence, embedding).
  3. Next, a bi-directional LSTM layer returns a fixed-length output vector for each example.
  4. This fixed-length output vector is piped through a fully connected (Dense) layer with 64 hidden units.
  5. The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability, or confidence level.

Hidden units

The above model has two intermediate or "hidden" layers between the input and the output, excluding the Embedding layer. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, it is the amount of freedom the network is allowed when learning an internal representation.

If a model has more hidden units (a higher-dimensional representation space) and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns: patterns that improve performance on the training data but not on the test data. This is called overfitting.

Loss function and optimizer

A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we'll use the binary_crossentropy loss function.

 model.compile(
    optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
 

Create a validation set

When training, we want to check the accuracy of the model on data it hasn't seen before. Create a validation set by setting apart a fraction of the original training data. (Why not use the testing set now? Our goal is to develop and tune our model using only the training data, then use the test data just once to evaluate our accuracy.)

In this tutorial, we take roughly 10% of the initial training samples (10% of 25000) as labeled data for training, and the rest as validation data. Since the initial train/test split was 50/50 (25000 samples each), the effective train/validation/test split we now have is 5/45/50.

Note that 'train_dataset' has already been batched and shuffled.

 validation_fraction = 0.9
validation_size = int(validation_fraction *
                      int(training_samples_count / HPARAMS.batch_size))
print(validation_size)
validation_dataset = train_dataset.take(validation_size)
train_dataset = train_dataset.skip(validation_size)
 
175

Train the model

Train the model in mini-batches. While training, monitor the model's loss and accuracy on the validation set:

 history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=HPARAMS.train_epochs,
    verbose=1)
 
Epoch 1/10

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/functional.py:543: UserWarning: Input dict contained keys ['NL_nbr_0_words', 'NL_nbr_1_words', 'NL_nbr_0_weight', 'NL_nbr_1_weight'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])

21/21 [==============================] - 19s 925ms/step - loss: 0.6930 - accuracy: 0.5092 - val_loss: 0.6924 - val_accuracy: 0.5006
Epoch 2/10
21/21 [==============================] - 19s 894ms/step - loss: 0.6890 - accuracy: 0.5465 - val_loss: 0.7294 - val_accuracy: 0.5698
Epoch 3/10
21/21 [==============================] - 19s 883ms/step - loss: 0.6785 - accuracy: 0.6208 - val_loss: 0.6489 - val_accuracy: 0.7043
Epoch 4/10
21/21 [==============================] - 19s 890ms/step - loss: 0.6592 - accuracy: 0.6400 - val_loss: 0.6523 - val_accuracy: 0.6866
Epoch 5/10
21/21 [==============================] - 19s 883ms/step - loss: 0.6413 - accuracy: 0.6923 - val_loss: 0.6335 - val_accuracy: 0.7004
Epoch 6/10
21/21 [==============================] - 21s 982ms/step - loss: 0.6053 - accuracy: 0.7188 - val_loss: 0.5716 - val_accuracy: 0.7183
Epoch 7/10
21/21 [==============================] - 18s 879ms/step - loss: 0.5204 - accuracy: 0.7619 - val_loss: 0.4511 - val_accuracy: 0.7930
Epoch 8/10
21/21 [==============================] - 19s 882ms/step - loss: 0.4719 - accuracy: 0.7758 - val_loss: 0.4244 - val_accuracy: 0.8094
Epoch 9/10
21/21 [==============================] - 18s 880ms/step - loss: 0.3695 - accuracy: 0.8431 - val_loss: 0.3567 - val_accuracy: 0.8487
Epoch 10/10
21/21 [==============================] - 19s 891ms/step - loss: 0.3504 - accuracy: 0.8500 - val_loss: 0.3219 - val_accuracy: 0.8652

Evaluate the model

Let's see how the model performs. Two values will be returned: loss (a number which represents our error; lower values are better) and accuracy.

 results = model.evaluate(test_dataset, steps=HPARAMS.eval_steps)
print(results)
 
196/196 [==============================] - 17s 85ms/step - loss: 0.4116 - accuracy: 0.8221
[0.4116455018520355, 0.8221200108528137]

Create a graph of accuracy/loss over time

model.fit() returns a History object that contains a dictionary with everything that happened during training:

 history_dict = history.history
history_dict.keys()
 
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

There are four entries: one for each monitored metric during training and validation. We can use these to plot the training and validation loss for comparison, as well as the training and validation accuracy:

 acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

# "-r^" is for solid red line with triangle markers.
plt.plot(epochs, loss, '-r^', label='Training loss')
# "-b0" is for solid blue line with circle markers.
plt.plot(epochs, val_loss, '-bo', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='best')

plt.show()
 

[PNG: training and validation loss plot]

 plt.clf()   # clear figure

plt.plot(epochs, acc, '-r^', label='Training acc')
plt.plot(epochs, val_acc, '-bo', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='best')

plt.show()
 

[PNG: training and validation accuracy plot]

Notice that the training loss decreases with each epoch and the training accuracy increases with each epoch. This is expected when using gradient descent optimization: it should minimize the desired quantity on every iteration.

Graph regularization

We are now ready to try graph regularization using the base model that we built above. We will use the GraphRegularization wrapper class provided by the Neural Structured Learning framework to wrap the base (bi-LSTM) model to include graph regularization. The remaining steps for training and evaluating the graph-regularized model are similar to those of the base model.

Create a graph-regularized model

To assess the incremental benefit of graph regularization, we will create a new base model instance. This is because model has already been trained for a few iterations, and reusing this trained model to create a graph-regularized model would not be a fair comparison for model.

 # Build a new base LSTM model.
base_reg_model = make_bilstm_model()
 
 # Wrap the base model with graph regularization.
graph_reg_config = nsl.configs.make_graph_reg_config(
    max_neighbors=HPARAMS.num_neighbors,
    multiplier=HPARAMS.graph_regularization_multiplier,
    distance_type=HPARAMS.distance_type,
    sum_over_axis=-1)
graph_reg_model = nsl.keras.GraphRegularization(base_reg_model,
                                                graph_reg_config)
graph_reg_model.compile(
    optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
 

Train the model

 graph_reg_history = graph_reg_model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=HPARAMS.train_epochs,
    verbose=1)
 
Epoch 1/10

/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:432: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

21/21 [==============================] - 22s 1s/step - loss: 0.6930 - accuracy: 0.5246 - scaled_graph_loss: 2.9800e-06 - val_loss: 0.6929 - val_accuracy: 0.4998
Epoch 2/10
21/21 [==============================] - 21s 988ms/step - loss: 0.6909 - accuracy: 0.5200 - scaled_graph_loss: 7.8452e-06 - val_loss: 0.6838 - val_accuracy: 0.5917
Epoch 3/10
21/21 [==============================] - 21s 980ms/step - loss: 0.6656 - accuracy: 0.6277 - scaled_graph_loss: 6.1205e-04 - val_loss: 0.6591 - val_accuracy: 0.6905
Epoch 4/10
21/21 [==============================] - 21s 981ms/step - loss: 0.6395 - accuracy: 0.6846 - scaled_graph_loss: 0.0016 - val_loss: 0.5860 - val_accuracy: 0.7171
Epoch 5/10
21/21 [==============================] - 21s 980ms/step - loss: 0.5388 - accuracy: 0.7573 - scaled_graph_loss: 0.0043 - val_loss: 0.4910 - val_accuracy: 0.7844
Epoch 6/10
21/21 [==============================] - 21s 989ms/step - loss: 0.4105 - accuracy: 0.8281 - scaled_graph_loss: 0.0146 - val_loss: 0.3353 - val_accuracy: 0.8612
Epoch 7/10
21/21 [==============================] - 21s 986ms/step - loss: 0.3416 - accuracy: 0.8681 - scaled_graph_loss: 0.0203 - val_loss: 0.4134 - val_accuracy: 0.8209
Epoch 8/10
21/21 [==============================] - 21s 981ms/step - loss: 0.4230 - accuracy: 0.8273 - scaled_graph_loss: 0.0144 - val_loss: 0.4755 - val_accuracy: 0.7696
Epoch 9/10
21/21 [==============================] - 22s 1s/step - loss: 0.4905 - accuracy: 0.7950 - scaled_graph_loss: 0.0080 - val_loss: 0.3862 - val_accuracy: 0.8382
Epoch 10/10
21/21 [==============================] - 21s 978ms/step - loss: 0.3384 - accuracy: 0.8754 - scaled_graph_loss: 0.0215 - val_loss: 0.3002 - val_accuracy: 0.8811

Evaluate the model

 graph_reg_results = graph_reg_model.evaluate(test_dataset, steps=HPARAMS.eval_steps)
print(graph_reg_results)
 
196/196 [==============================] - 16s 84ms/step - loss: 0.3852 - accuracy: 0.8301
[0.385225385427475, 0.830079972743988]

Create a graph of accuracy/loss over time

 graph_reg_history_dict = graph_reg_history.history
graph_reg_history_dict.keys()
 
dict_keys(['loss', 'accuracy', 'scaled_graph_loss', 'val_loss', 'val_accuracy'])

There are five entries in total in the dictionary: training loss, training accuracy, training graph loss, validation loss, and validation accuracy. We can plot them all together for comparison. Note that the graph loss is only computed during training.

 acc = graph_reg_history_dict['accuracy']
val_acc = graph_reg_history_dict['val_accuracy']
loss = graph_reg_history_dict['loss']
graph_loss = graph_reg_history_dict['scaled_graph_loss']
val_loss = graph_reg_history_dict['val_loss']

epochs = range(1, len(acc) + 1)

plt.clf()   # clear figure

# "-r^" is for solid red line with triangle markers.
plt.plot(epochs, loss, '-r^', label='Training loss')
# "-gD" is for solid green line with diamond markers.
plt.plot(epochs, graph_loss, '-gD', label='Training graph loss')
# "-b0" is for solid blue line with circle markers.
plt.plot(epochs, val_loss, '-bo', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='best')

plt.show()
 

[PNG: training, graph, and validation loss plot]

 plt.clf()   # clear figure

plt.plot(epochs, acc, '-r^', label='Training acc')
plt.plot(epochs, val_acc, '-bo', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='best')

plt.show()
 

[PNG: training and validation accuracy plot]

The power of semi-supervised learning

Semi-supervised learning, and more specifically graph regularization in the context of this tutorial, can be really powerful when the amount of training data is small. The lack of training data is compensated for by leveraging similarity among the training samples, which is not possible in traditional supervised learning.

We define the supervision ratio as the ratio of training samples to the total number of samples, which includes training, validation, and test samples. In this notebook, we have used a supervision ratio of 0.05 (i.e., 5% of the labeled data) for training both the base model and the graph-regularized model; with 50,000 total samples, that corresponds to 2,500 labeled training samples, matching the 5/45/50 split above. We illustrate the impact of the supervision ratio on model accuracy in the cell below.

 # Accuracy values for both the Bi-LSTM model and the feed forward NN model have
# been precomputed for the following supervision ratios.

supervision_ratios = [0.3, 0.15, 0.05, 0.03, 0.02, 0.01, 0.005]

model_tags = ['Bi-LSTM model', 'Feed Forward NN model']
base_model_accs = [[84, 84, 83, 80, 65, 52, 50], [87, 86, 76, 74, 67, 52, 51]]
graph_reg_model_accs = [[84, 84, 83, 83, 65, 63, 50],
                        [87, 86, 80, 75, 67, 52, 50]]

plt.clf()  # clear figure

fig, axes = plt.subplots(1, 2)
fig.set_size_inches((12, 5))

for ax, model_tag, base_model_acc, graph_reg_model_acc in zip(
    axes, model_tags, base_model_accs, graph_reg_model_accs):

  # "-r^" is for solid red line with triangle markers.
  ax.plot(base_model_acc, '-r^', label='Base model')
  # "-gD" is for solid green line with diamond markers.
  ax.plot(graph_reg_model_acc, '-gD', label='Graph-regularized model')
  ax.set_title(model_tag)
  ax.set_xlabel('Supervision ratio')
  ax.set_ylabel('Accuracy(%)')
  ax.set_ylim((25, 100))
  ax.set_xticks(range(len(supervision_ratios)))
  ax.set_xticklabels(supervision_ratios)
  ax.legend(loc='best')

plt.show()
 
<Figure size 432x288 with 0 Axes>

[PNG: accuracy vs. supervision ratio for the Bi-LSTM and feed forward NN models]

We can observe that as the supervision ratio decreases, model accuracy also decreases. This is true for both the base model and the graph-regularized model, regardless of the model architecture used. However, notice that the graph-regularized model performs better than the base model for both architectures. In particular, for the Bi-LSTM model, when the supervision ratio is 0.01, the accuracy of the graph-regularized model is about 20% higher than that of the base model. This is primarily because of semi-supervised learning for the graph-regularized model, where structural similarity among training samples is used in addition to the training samples themselves.

Conclusion

We have demonstrated the use of graph regularization with the Neural Structured Learning (NSL) framework even when the input does not contain an explicit graph. We considered the task of sentiment classification of IMDB movie reviews, for which we synthesized a similarity graph based on review embeddings. We encourage users to experiment further by varying the hyperparameters, the amount of supervision, and by using different model architectures.