ag_news_subset

Description:

AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.

The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. In total, there are 120,000 training samples and 7,600 testing samples.
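The split totals follow directly from the per-class counts, as the short sketch below checks. The constant and function names are illustrative only and are not part of TFDS.

```python
# Per-class counts as stated in the dataset description.
NUM_CLASSES = 4
TRAIN_PER_CLASS = 30_000
TEST_PER_CLASS = 1_900


def split_totals(num_classes=NUM_CLASSES,
                 train_per_class=TRAIN_PER_CLASS,
                 test_per_class=TEST_PER_CLASS):
    """Return the total number of examples per split for a class-balanced dataset."""
    return {
        "train": num_classes * train_per_class,
        "test": num_classes * test_per_class,
    }


totals = split_totals()
print(totals)
```

Because every class contributes the same number of samples, both splits are exactly class-balanced.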

Additional Documentation: Explore on Papers With Code
Homepage: https://arxiv.org/abs/1509.01626
Source code: tfds.datasets.ag_news_subset.Builder
Versions:
- 1.0.0 (default): No release notes.
Download size: 11.24 MiB
Dataset size: 35.79 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	7,600
`'train'`	120,000

Feature structure:

FeaturesDict({
    'description': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'title': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
description	Text	string
label	ClassLabel	int64
title	Text	string

Supervised keys (See as_supervised doc): ('description', 'label')
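As a minimal sketch of what the supervised keys mean in practice: passing `as_supervised=True` to `tfds.load` should yield `(description, label)` tuples instead of feature dictionaries. The helper names `load_ag_news` and `peek` are hypothetical, and the snippet assumes `tensorflow_datasets` and TensorFlow are installed and that the dataset can be downloaded on first use.

```python
# The supervised keys listed for this dataset: (input, target).
SUPERVISED_KEYS = ("description", "label")


def load_ag_news(split="train", as_supervised=True):
    """Load one split of ag_news_subset (hypothetical helper, not a TFDS API).

    With as_supervised=True each element is a (description, label) tuple;
    otherwise it is a dict with 'description', 'label' and 'title' keys.
    """
    # Imported lazily so the module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("ag_news_subset", split=split, as_supervised=as_supervised)


def peek(n=2, split="train"):
    """Print the label and the start of the description for the first n examples."""
    ds = load_ag_news(split)
    for description, label in ds.take(n):
        text = description.numpy().decode("utf-8")
        print(int(label.numpy()), text[:80])
```

Calling `peek()` triggers the download (about 11 MiB) the first time. Note that the `title` feature is dropped in supervised mode; load with `as_supervised=False` if it is needed.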
Figure (tfds.show_examples): Not supported.

Citation:

@misc{zhang2015characterlevel,
    title={Character-level Convolutional Networks for Text Classification},
    author={Xiang Zhang and Junbo Zhao and Yann LeCun},
    year={2015},
    eprint={1509.01626},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
