clinc_oos

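Description:

Task-driven dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes out-of-scope (OOS) queries, i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. It offers a way to benchmark text classification in task-driven dialog systems more rigorously and realistically.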
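The out-of-scope setting described above can be illustrated with a confidence threshold: a query is treated as OOS when the classifier's top probability is too low. This is only a minimal sketch of one common approach, not necessarily the method used by the dataset authors. The names `predict_with_oos` and `OOS_LABEL` and the 0.7 threshold are hypothetical and are not part of the dataset or of TFDS.

```python
import math

OOS_LABEL = -1  # hypothetical sentinel for "out of scope"; not a dataset label id


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_oos(logits, threshold=0.7):
    """Return the top in-scope class index, or OOS_LABEL if confidence is low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else OOS_LABEL
```

A confident score vector such as `[5.0, 0.0, 0.0]` keeps its top class, while a flat one such as `[1.0, 1.0, 1.0]` falls below the threshold and is mapped to `OOS_LABEL`. In practice the threshold would be tuned on the validation splits.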

Additional documentation: Explore on Papers With Code
Homepage: https://github.com/clinc/oos-eval/
Source code: tfds.text.ClincOOS
Versions:
- 0.1.0 (default): No release notes.
Download size: 256.01 KiB
Dataset size: 3.40 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	4,500
`'test_oos'`	1,000
`'train'`	15,000
`'train_oos'`	100
`'validation'`	3,000
`'validation_oos'`	100
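Each in-scope split has a matching `_oos` companion, so an evaluation set that mixes both kinds of query has to combine them explicitly. The sketch below assumes the `tensorflow_datasets` package is installed. The helper names `with_oos`, `combined_size`, and `load_clinc_oos` are illustrative, and the import is deferred so the pure-Python helpers work without TFDS.

```python
# Example counts copied from the split table above.
SPLIT_SIZES = {
    "test": 4500, "test_oos": 1000,
    "train": 15000, "train_oos": 100,
    "validation": 3000, "validation_oos": 100,
}


def with_oos(split):
    """Build a TFDS split string that concatenates a split with its OOS companion."""
    return f"{split}+{split}_oos"


def combined_size(split):
    """Number of examples in a split plus its OOS companion."""
    return SPLIT_SIZES[split] + SPLIT_SIZES[f"{split}_oos"]


def load_clinc_oos(split="train", include_oos=True):
    """Load one split, optionally merged with its OOS companion split."""
    import tensorflow_datasets as tfds  # deferred: requires tensorflow_datasets

    name = with_oos(split) if include_oos else split
    return tfds.load("clinc_oos", split=name)
```

For example, `load_clinc_oos("test")` would request the split string `'test+test_oos'`, which by the table above should contain 4,500 + 1,000 examples.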

Feature structure:

FeaturesDict({
    'domain': int32,
    'domain_name': Text(shape=(), dtype=string),
    'intent': int32,
    'intent_name': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
domain	Tensor	int32
domain_name	Text	string
intent	Tensor	int32
intent_name	Text	string
text	Text	string

Supervised keys (see as_supervised doc): ('text', 'intent')
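With `as_supervised=True`, TFDS yields `(input, label)` tuples built from the supervised keys instead of the full feature dictionary. The sketch below mimics that projection on a plain Python dict. The sample record and the helper `to_supervised` are hypothetical and only illustrate the shape of the data; the values are made up and do not come from the dataset.

```python
SUPERVISED_KEYS = ("text", "intent")  # as listed in the dataset metadata


def to_supervised(example, keys=SUPERVISED_KEYS):
    """Project a feature dict onto an (input, label) tuple."""
    input_key, label_key = keys
    return example[input_key], example[label_key]


# Hypothetical record with the five documented features; values are made up.
sample = {
    "domain": 0,
    "domain_name": "example_domain",
    "intent": 3,
    "intent_name": "example_intent",
    "text": "an example user query",
}

text, intent = to_supervised(sample)
```

Calling `tfds.load("clinc_oos", split="train", as_supervised=True)` should give the same pairing directly, with the values returned as tensors rather than Python objects.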
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:

@inproceedings{larson-etal-2019-evaluation,
    title = "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction",
    author = "Larson, Stefan  and
      Mahendran, Anish  and
      Peper, Joseph J.  and
      Clarke, Christopher  and
      Lee, Andrew  and
      Hill, Parker  and
      Kummerfeld, Jonathan K.  and
      Leach, Kevin  and
      Laurenzano, Michael A.  and
      Tang, Lingjia  and
      Mars, Jason",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1131",
    doi = "10.18653/v1/D19-1131",
    pages = "1311--1316",
}
