TFDS CLI

TFDS CLI is a command-line tool that provides various commands for working easily with TensorFlow Datasets.

Disable TF logs on import
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import

Setup

The CLI tool is installed together with tensorflow-datasets (or tfds-nightly).

pip install -q tfds-nightly
tfds --version

For the list of all CLI commands:

tfds --help
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.

tfds new: Implementing a new Dataset

This command helps you start writing a new Python dataset by creating a <dataset_name>/ directory containing the default implementation files.

Usage:

tfds new my_dataset
2022-02-07 04:04:10.397902: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.

This will create:

ls -1 my_dataset/
__init__.py
checksums.tsv
dummy_data/
my_dataset.py
my_dataset_test.py

See our guide on writing datasets for more information.

Available options:

tfds new --help
usage: tfds new [-h] [--helpfull] [--dir DIR] dataset_name

positional arguments:
  dataset_name  Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help    show this help message and exit
  --helpfull    show full help message and exit
  --dir DIR     Path where the dataset directory will be created. Defaults to
                current directory.
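
The help text above asks for `dataset_name` in snake_case. As a rough pre-check before calling `tfds new`, a name could be validated along the following lines. This is only a sketch: the helper `is_snake_case` and its regular expression are assumptions made for illustration, and TFDS may apply different rules internally.

```python
import re

# Hypothetical pattern (not taken from TFDS): a lowercase letter first,
# then lowercase letters or digits, with single underscores between words.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if `name` looks like a snake_case identifier."""
    return _SNAKE_CASE.fullmatch(name) is not None


print(is_snake_case("my_dataset"))  # True
print(is_snake_case("MyDataset"))   # False
```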

tfds build: Download and prepare a dataset

Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

  • A path to a dataset/ folder or a dataset.py file (empty for the current directory):

    • tfds build datasets/my_dataset/
    • cd datasets/my_dataset/ && tfds build
    • cd datasets/my_dataset/ && tfds build my_dataset
    • cd datasets/my_dataset/ && tfds build my_dataset.py
  • A registered dataset:

    • tfds build mnist
    • tfds build my_dataset --imports my_project.datasets
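
When `tfds build` is driven from a Python script rather than a shell, the invocation forms listed above can be assembled as argument lists. The sketch below only builds and prints the commands. The helper `tfds_build_argv` is hypothetical and not part of TFDS. It uses only the positional dataset argument and the `--imports` flag shown above.

```python
import shlex
from typing import List, Optional


def tfds_build_argv(dataset: Optional[str] = None,
                    imports: Optional[str] = None) -> List[str]:
    """Assemble the argument list for a `tfds build` call.

    `dataset` may be a path, a registered name, or None to build from the
    current directory. `imports` is passed through to `--imports`.
    """
    argv = ["tfds", "build"]
    if dataset is not None:
        argv.append(dataset)
    if imports is not None:
        argv += ["--imports", imports]
    return argv


# The forms listed above, expressed as argument lists.
for argv in (
    tfds_build_argv("datasets/my_dataset/"),
    tfds_build_argv(),  # meant to be run from inside the dataset directory
    tfds_build_argv("mnist"),
    tfds_build_argv("my_dataset", imports="my_project.datasets"),
):
    print(shlex.join(argv))
```

To actually run one of these, the list could be passed to `subprocess.run(argv, check=True)`, with `cwd=` set to the dataset directory for the form that takes no dataset argument.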

Available options:

tfds build --help
usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--config CONFIG]
                  [--config_idx CONFIG_IDX] [--imports IMPORTS]
                  [--register_checksums] [--force_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets [datasets ...]]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current
                        dir. See https://www.tensorflow.org/datasets/cli for
                        accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb Enter post-mortem debugging mode if an exception is raised.

  --overwrite           Delete pre-existing dataset if it exists.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default
                        to 1), rather than the full dataset.If set to 0, only
                        execute the `_split_generators` (which download the
                        original data), but skip `_generator_examples`

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to
                        `~/tensorflow_datasets/` or `TFDS_DATA_DIR`
                        environement variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to
                        `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to
                        `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some
                        datasets). Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir`
                        (e.g. `<download_dir>/manual/<dataset_name>/`. Useful
                        to avoid collisions if many datasets are generated.

Generation:
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set.
  --config_idx CONFIG_IDX
                        Config id to build
                        (`builder_cls.BUILDER_CONFIGS[config_idx]`). Mutually
                        exclusive with `--config`.
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register
                        datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not
                        found.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to
                        `PipelineOptions` when preparing with Apache Beam.
                        (see:
                        https://www.tensorflow.org/datasets/beam_datasets).
                        Example: `--beam_pipeline_options=job_name=my-
                        job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples.
                        Available values: ['tfrecord', 'riegeli'] (see
                        `tfds.core.FileFormat`).

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined
                        here. Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available
                        rather than default version.
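
The Paths options above default to locations derived from one another: downloads go under the data directory, and extracted and manual files go under the download directory. The sketch below reproduces that chain as described in the help text. The helper `default_build_paths` is hypothetical and is not TFDS code. It also assumes that `TFDS_DATA_DIR`, when set, takes precedence over `~/tensorflow_datasets/`, which the help text does not state explicitly.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def default_build_paths(data_dir: Optional[str] = None,
                        download_dir: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Resolve the directories `tfds build` would use, per its help text."""
    env = os.environ if env is None else env
    if data_dir is None:
        # Assumed precedence: environment variable first, then the home default.
        data_dir = env.get("TFDS_DATA_DIR") or os.path.join("~", "tensorflow_datasets")
    data = Path(data_dir).expanduser()
    # --download_dir defaults to <data_dir>/downloads/
    download = Path(download_dir).expanduser() if download_dir else data / "downloads"
    return {
        "data_dir": data,
        "download_dir": download,
        "extract_dir": download / "extracted",  # <download_dir>/extracted/
        "manual_dir": download / "manual",      # <download_dir>/manual/
    }


for name, path in default_build_paths().items():
    print(f"{name}: {path}")
```

Passing `--extract_dir` or `--manual_dir` explicitly would override the last two entries; the sketch leaves those overrides out to keep the defaults visible.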