Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds

ds = tfds.load('huggingface:mac_morpho')
  • Description:
Mac-Morpho is a corpus of Brazilian Portuguese texts annotated with part-of-speech tags.
Its first version was released in 2003 [1], and since then, two revisions have been made in order
to improve the quality of the resource [2, 3].
The corpus is available for download split into train, development and test sections.
These are 76%, 4% and 20% of the corpus total, respectively (the reason for the unusual numbers
is that the corpus was first split into 80%/20% train/test, and then 5% of the train section was
set aside for development). This split was used in [3], and new POS tagging research with Mac-Morpho
is encouraged to follow it in order to make consistent comparisons possible.
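
The split arithmetic described above can be checked with a short sketch. It is only an illustration: the function name `split_fractions` and its parameter names are made up for this example, and the example counts are the ones listed under Splits on this page.

```python
from fractions import Fraction


def split_fractions(test_share=Fraction(20, 100), dev_share_of_train=Fraction(5, 100)):
    """Return (train, dev, test) as fractions of the whole corpus.

    Hypothetical helper: first hold out `test_share` for testing, then set
    aside `dev_share_of_train` of the remaining train section for development.
    """
    initial_train = 1 - test_share            # 80% of the corpus
    dev = initial_train * dev_share_of_train  # 5% of 80% = 4%
    train = initial_train - dev               # 80% - 4% = 76%
    return train, dev, test_share


train, dev, test = split_fractions()
print(train, dev, test)  # 19/25 1/25 1/5, i.e. 76%, 4% and 20%

# Cross-check against the example counts listed under Splits.
counts = {"train": 37948, "validation": 1997, "test": 9987}
total = sum(counts.values())
observed = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(observed)  # {'train': 76.0, 'validation': 4.0, 'test': 20.0}
```

Exact fractions are used rather than floats so that the 76%/4%/20% figures come out without rounding noise.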

  • License: Creative Commons Attribution 4.0 International License
  • Version: 3.0.0
  • Splits:
Split          Examples
'test'         9987
'train'        37948
'validation'   1997
  • Features:
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "pos_tags": {
        "feature": {
            "num_classes": 26,
            "names": [...],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }