{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "metadata" }, "source": [ "
For a more advanced text classification tutorial using `tf.keras`, see the [MLCC Text Classification Guide](https://developers.google.com/machine-learning/guides/text-classification/)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2023-11-08T00:26:11.846301Z",
"iopub.status.busy": "2023-11-08T00:26:11.846069Z",
"iopub.status.idle": "2023-11-08T00:26:15.916636Z",
"shell.execute_reply": "2023-11-08T00:26:15.915589Z"
},
"id": "IHTzYqKZ7auw"
},
"outputs": [],
"source": [
"!pip install tensorflow-hub\n",
"!pip install tensorflow-datasets"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2023-11-08T00:26:15.921494Z",
"iopub.status.busy": "2023-11-08T00:26:15.920871Z",
"iopub.status.idle": "2023-11-08T00:26:19.650731Z",
"shell.execute_reply": "2023-11-08T00:26:19.649624Z"
},
"id": "2ew7HTbPpCJH"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-11-08 00:26:16.384421: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-08 00:26:16.384470: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-08 00:26:16.386118: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Version: 2.15.0-rc1\n",
"Eager mode: True\n",
"Hub version: 0.15.0\n",
"GPU is available\n"
]
}
],
"source": [
"import os\n",
"import numpy as np\n",
"\n",
"import tensorflow as tf\n",
"import tensorflow_hub as hub\n",
"import tensorflow_datasets as tfds\n",
"\n",
"print(\"Version: \", tf.__version__)\n",
"print(\"Eager mode: \", tf.executing_eagerly())\n",
"print(\"Hub version: \", hub.__version__)\n",
"print(\"GPU is\", \"available\" if tf.config.list_physical_devices(\"GPU\") else \"NOT AVAILABLE\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iAsKG535pHep"
},
"source": [
"## Download the IMDB dataset\n",
"\n",
"The IMDB dataset is available on [IMDB reviews](https://github.com/tensorflow/datasets) or on [TensorFlow Datasets](https://tensorflow.google.cn/datasets). The following code downloads the IMDB dataset to your machine (or the Colab runtime):"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2023-11-08T00:26:19.655476Z",
"iopub.status.busy": "2023-11-08T00:26:19.654509Z",
"iopub.status.idle": "2023-11-08T00:26:22.327134Z",
"shell.execute_reply": "2023-11-08T00:26:22.326306Z"
},
"id": "zXXx5Oc3pOmN"
},
"outputs": [],
"source": [
"# Split the training set into 60% and 40% to end up with 15,000 examples\n",
"# for training, 10,000 examples for validation and 25,000 examples for testing.\n",
"train_data, validation_data, test_data = tfds.load(\n",
" name=\"imdb_reviews\", \n",
" split=('train[:60%]', 'train[60%:]', 'test'),\n",
" as_supervised=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l50X3GfjpU4r"
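},
"source": [
"## Explore the data\n",
"\n",
"Let's take a moment to understand the format of the data. Each example is a sentence representing a movie review, paired with a corresponding label. The sentence is not preprocessed in any way. The label is an integer value of either 0 or 1, where 0 is a negative review and 1 is a positive review.\n",
"\n",
"Let's print the first 10 examples."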
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2023-11-08T00:26:22.332137Z",
"iopub.status.busy": "2023-11-08T00:26:22.331393Z",
"iopub.status.idle": "2023-11-08T00:26:22.878168Z",
"shell.execute_reply": "2023-11-08T00:26:22.877399Z"
},
"id": "QtTS4kpEpjbi"
},
"outputs": [
{
"data": {
"text/plain": [
"