clue

References:

afqmc

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/afqmc')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 3861
`'train'` | 34334
`'validation'` | 4316

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 2,
        "names": [
            "0",
            "1"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}
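The sketch below shows one way to turn a raw afqmc example into plain Python values. It assumes the example came from iterating the dataset with `tfds.as_numpy`, in which case string features typically arrive as UTF-8 `bytes` and `label` as an integer index into the `names` list above. `decode_pair_example` and `AFQMC_LABEL_NAMES` are hypothetical names, and treating a negative label as "unlabeled" is an assumption, not something this page states.

```python
# Label names copied from the afqmc "label" feature above; position = index.
AFQMC_LABEL_NAMES = ["0", "1"]


def _to_str(value):
    """Decode UTF-8 bytes (as tfds.as_numpy yields strings); pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def decode_pair_example(example: dict) -> dict:
    """Hypothetical helper: raw afqmc-style example -> plain Python values."""
    label = int(example["label"])
    return {
        "sentence1": _to_str(example["sentence1"]),
        "sentence2": _to_str(example["sentence2"]),
        # Assumption: a negative label marks an unlabeled example. Guard it,
        # because Python's negative indexing would otherwise return "1".
        "label": AFQMC_LABEL_NAMES[label] if label >= 0 else None,
        "idx": int(example["idx"]),
    }


# Made-up example for illustration; not taken from the dataset.
raw = {
    "sentence1": "今天天气很好".encode("utf-8"),
    "sentence2": "今天天气不错".encode("utf-8"),
    "label": 1,
    "idx": 7,
}
print(decode_pair_example(raw))
```

The explicit guard on negative labels matters because a plain list lookup with `-1` silently returns the last class name instead of failing.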

tnews

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/tnews')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 10000
`'train'` | 53360
`'validation'` | 10000

Features:

{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 15,
        "names": [
            "100",
            "101",
            "102",
            "103",
            "104",
            "106",
            "107",
            "108",
            "109",
            "110",
            "112",
            "113",
            "114",
            "115",
            "116"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}
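The tnews `names` list is not a contiguous range: `"105"` and `"111"` are absent, so a label index cannot be turned into its category code by adding 100. A minimal sketch of the lookup in both directions, using only the list above; the function names are hypothetical. If a TFDS `DatasetInfo` is at hand, `ClassLabel.int2str` and `ClassLabel.str2int` on the `label` feature should provide the same mapping.

```python
# Category codes copied verbatim from the tnews "label" feature above.
# "105" and "111" are absent, so from index 5 on, index != int(code) - 100.
TNEWS_LABEL_NAMES = [
    "100", "101", "102", "103", "104", "106", "107", "108",
    "109", "110", "112", "113", "114", "115", "116",
]


def tnews_index_to_name(index: int) -> str:
    """Map a ClassLabel index (0..14) to its tnews category code."""
    if not 0 <= index < len(TNEWS_LABEL_NAMES):
        raise ValueError(f"label index {index} is outside 0..{len(TNEWS_LABEL_NAMES) - 1}")
    return TNEWS_LABEL_NAMES[index]


def tnews_name_to_index(name: str) -> int:
    """Map a tnews category code back to its ClassLabel index."""
    return TNEWS_LABEL_NAMES.index(name)


print(tnews_index_to_name(5), tnews_name_to_index("112"))
```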

iflytek

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/iflytek')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 2600
`'train'` | 12133
`'validation'` | 2599

Features:

{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 119,
        "names": [
            "0",
            "1",
            "2",
            "3",
            "4",
            "5",
            "6",
            "7",
            "8",
            "9",
            "10",
            "11",
            "12",
            "13",
            "14",
            "15",
            "16",
            "17",
            "18",
            "19",
            "20",
            "21",
            "22",
            "23",
            "24",
            "25",
            "26",
            "27",
            "28",
            "29",
            "30",
            "31",
            "32",
            "33",
            "34",
            "35",
            "36",
            "37",
            "38",
            "39",
            "40",
            "41",
            "42",
            "43",
            "44",
            "45",
            "46",
            "47",
            "48",
            "49",
            "50",
            "51",
            "52",
            "53",
            "54",
            "55",
            "56",
            "57",
            "58",
            "59",
            "60",
            "61",
            "62",
            "63",
            "64",
            "65",
            "66",
            "67",
            "68",
            "69",
            "70",
            "71",
            "72",
            "73",
            "74",
            "75",
            "76",
            "77",
            "78",
            "79",
            "80",
            "81",
            "82",
            "83",
            "84",
            "85",
            "86",
            "87",
            "88",
            "89",
            "90",
            "91",
            "92",
            "93",
            "94",
            "95",
            "96",
            "97",
            "98",
            "99",
            "100",
            "101",
            "102",
            "103",
            "104",
            "105",
            "106",
            "107",
            "108",
            "109",
            "110",
            "111",
            "112",
            "113",
            "114",
            "115",
            "116",
            "117",
            "118"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

cmnli

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/cmnli')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 13880
`'train'` | 391783
`'validation'` | 12241

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "neutral",
            "entailment",
            "contradiction"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}
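cmnli (and the ocnli and diagnostics configs below, which declare the same `names`) puts `"neutral"` at index 0. When combining these labels with data or a model head that uses a different order, remapping by name rather than by index avoids silent label swaps. A minimal sketch; `remap_label` is a hypothetical helper and the target ordering is chosen purely for illustration.

```python
# Label order copied from the cmnli "label" feature above.
CLUE_NLI_LABEL_NAMES = ["neutral", "entailment", "contradiction"]


def remap_label(index: int, source_names: list, target_names: list) -> int:
    """Re-express a label index from one name ordering in another, by name."""
    if not 0 <= index < len(source_names):
        raise ValueError(f"label index {index} has no name in the source ordering")
    return target_names.index(source_names[index])


# Hypothetical target ordering, e.g. a classifier head trained elsewhere.
OTHER_ORDER = ["entailment", "neutral", "contradiction"]
print([remap_label(i, CLUE_NLI_LABEL_NAMES, OTHER_ORDER) for i in range(3)])
```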

cluewsc2020

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/cluewsc2020')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 2574
`'train'` | 1244
`'validation'` | 304

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 2,
        "names": [
            "true",
            "false"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "target": {
        "span1_text": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "span2_text": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "span1_index": {
            "dtype": "int32",
            "id": null,
            "_type": "Value"
        },
        "span2_index": {
            "dtype": "int32",
            "id": null,
            "_type": "Value"
        }
    }
}
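In cluewsc2020 each example nests two spans under `target`, and the `label` names put `"true"` at index 0. The sketch below checks that each span text actually occurs at its index and names the label. It assumes `span1_index`/`span2_index` are character offsets into the decoded `text` (slicing UTF-8 bytes would give different results for Chinese text) and that strings may arrive as `bytes`; the helper names and the sample sentence are made up.

```python
# Label names copied from the cluewsc2020 "label" feature above: 0 -> "true".
WSC_LABEL_NAMES = ["true", "false"]


def _to_str(value):
    """Decode UTF-8 bytes; pass str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def check_wsc_spans(example: dict) -> dict:
    """Hypothetical helper: verify span offsets and name the label.

    Assumes span indices are character offsets into the decoded text.
    """
    text = _to_str(example["text"])
    target = example["target"]
    result = {}
    for key in ("span1", "span2"):
        span = _to_str(target[f"{key}_text"])
        start = int(target[f"{key}_index"])
        result[f"{key}_matches"] = text[start:start + len(span)] == span
    label = int(example["label"])
    result["label_name"] = WSC_LABEL_NAMES[label] if label >= 0 else None
    return result


# Made-up example: "张三" at character offset 0, the pronoun "他" at offset 6.
example = {
    "idx": 0,
    "text": "张三告诉李四他赢了。".encode("utf-8"),
    "label": 0,
    "target": {
        "span1_text": "张三".encode("utf-8"),
        "span2_text": "他".encode("utf-8"),
        "span1_index": 0,
        "span2_index": 6,
    },
}
print(check_wsc_spans(example))
```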

csl

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/csl')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 3000
`'train'` | 20000
`'validation'` | 3000

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "corpus_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "abst": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 2,
        "names": [
            "0",
            "1"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "keyword": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

cmrc2018

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/cmrc2018')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 2000
`'train'` | 10142
`'trial'` | 1002
`'validation'` | 3219

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "context": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answers": {
        "feature": {
            "text": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "answer_start": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
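cmrc2018 (and drcd below, which uses the same schema) stores answers SQuAD-style: a `Sequence` of `text`/`answer_start` pairs, which typically arrives as a dict of parallel lists. A minimal sketch that recovers each answer span from `context`, assuming `answer_start` is a character offset into the decoded context. It decodes before slicing because, if strings arrive as UTF-8 `bytes`, each Chinese character occupies several bytes and the offsets would not line up. The helper names and the sample passage are hypothetical.

```python
def _to_str(value):
    """Decode UTF-8 bytes; pass str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def answer_spans(example: dict) -> list:
    """Hypothetical helper: return (start, end, matches) for each answer.

    Assumes answer_start is a character offset into the decoded context.
    """
    context = _to_str(example["context"])
    answers = example["answers"]  # dict of parallel lists: text, answer_start
    spans = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        text, start = _to_str(text), int(start)
        end = start + len(text)
        spans.append((start, end, context[start:end] == text))
    return spans


# Made-up passage: "首都" starts at character 6, but at byte 18 in UTF-8.
example = {
    "id": "toy-1",
    "context": "北京是中国的首都。".encode("utf-8"),
    "question": "北京是中国的什么？".encode("utf-8"),
    "answers": {"text": ["首都".encode("utf-8")], "answer_start": [6]},
}
print(answer_spans(example))
```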

drcd

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/drcd')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 3493
`'train'` | 26936
`'validation'` | 3524

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "context": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answers": {
        "feature": {
            "text": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "answer_start": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

chid

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/chid')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 3447
`'train'` | 84709
`'validation'` | 3218

Features:

{
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "candidates": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "content": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "answers": {
        "feature": {
            "text": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "candidate_id": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
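In chid, each example carries a list of idiom `candidates`, and `answers` pairs each gold idiom `text` with a `candidate_id`. A minimal sketch that resolves those pairs, assuming `candidate_id` is a 0-based index into `candidates`; the helper name and the sample idioms are illustrative only, and `content` is omitted from the sample because the helper does not read it.

```python
def _to_str(value):
    """Decode UTF-8 bytes; pass str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def resolve_chid_answers(example: dict) -> list:
    """Hypothetical helper: (candidate_id, candidate, matches_text) per answer.

    Assumes candidate_id is a 0-based index into example["candidates"].
    """
    candidates = [_to_str(c) for c in example["candidates"]]
    answers = example["answers"]  # dict of parallel lists: text, candidate_id
    resolved = []
    for text, cid in zip(answers["text"], answers["candidate_id"]):
        cid = int(cid)
        candidate = candidates[cid]
        resolved.append((cid, candidate, candidate == _to_str(text)))
    return resolved


# Made-up example; "content" omitted because the helper ignores it.
example = {
    "idx": 0,
    "candidates": ["一帆风顺", "画蛇添足", "守株待兔"],
    "answers": {"text": ["画蛇添足"], "candidate_id": [1]},
}
print(resolve_chid_answers(example))
```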

c3

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/c3')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 1625
`'train'` | 11869
`'validation'` | 3816

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "context": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "choice": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
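Unlike the classification configs, c3 stores the gold `answer` as a string rather than a label index, and `context` as a sequence of strings. A minimal sketch that converts an example into a (passage, question, choices, label index) tuple. It assumes the answer string matches one of the `choice` entries exactly and treats an empty or unmatched answer as "no gold label"; joining the context entries with newlines is an arbitrary choice. The helper name and the sample dialogue are made up.

```python
def _to_str(value):
    """Decode UTF-8 bytes; pass str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def c3_to_multiple_choice(example: dict):
    """Hypothetical helper: c3 example -> (passage, question, choices, label).

    label is the position of `answer` in `choice`, or None when the answer
    is empty or does not exactly match any choice.
    """
    passage = "\n".join(_to_str(turn) for turn in example["context"])
    question = _to_str(example["question"])
    choices = [_to_str(c) for c in example["choice"]]
    answer = _to_str(example["answer"])
    label = choices.index(answer) if answer in choices else None
    return passage, question, choices, label


# Made-up example for illustration; not taken from the dataset.
example = {
    "id": 0,
    "context": ["男：你明天几点出发？", "女：早上八点。"],
    "question": "女的明天几点出发？",
    "choice": ["七点", "八点", "九点"],
    "answer": "八点",
}
passage, question, choices, label = c3_to_multiple_choice(example)
print(label)
```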

ocnli

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/ocnli')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 3000
`'train'` | 50437
`'validation'` | 2950

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "neutral",
            "entailment",
            "contradiction"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}

diagnostics

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:clue/diagnostics')

Description:

CLUE, a Chinese Language Understanding Evaluation Benchmark
(https://www.cluebenchmarks.com/), is a collection of resources for training,
evaluating, and analyzing Chinese language understanding systems.

License: No known license
Version: 1.0.0
Splits:

Split | Examples
:--- | ---:
`'test'` | 514

Features:

{
    "sentence1": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence2": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 3,
        "names": [
            "neutral",
            "entailment",
            "contradiction"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    }
}