# qm9
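- **Description**:

QM9 consists of computed geometric, energetic, electronic, and thermodynamic
properties for 134k stable small organic molecules made up of C, H, O, N, and F.
As is common practice, we remove the 3,054 uncharacterized molecules and provide
the remaining 130,831.

- **Homepage**: <https://doi.org/10.6084/m9.figshare.c.978904.v5>

- **Source code**:
  [`tfds.datasets.qm9.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/qm9/qm9_dataset_builder.py)

- **Versions**:

  - **`1.0.0`** (default): Initial release.

- **Download size**: `82.62 MiB`

- **Dataset size**: `177.16 MiB`

- **Feature structure**: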
    FeaturesDict({
        'A': float32,
        'B': float32,
        'C': float32,
        'Cv': float32,
        'G': float32,
        'G_atomization': float32,
        'H': float32,
        'H_atomization': float32,
        'InChI': string,
        'InChI_relaxed': string,
        'Mulliken_charges': Tensor(shape=(29,), dtype=float32),
        'SMILES': string,
        'SMILES_relaxed': string,
        'U': float32,
        'U0': float32,
        'U0_atomization': float32,
        'U_atomization': float32,
        'alpha': float32,
        'charges': Tensor(shape=(29,), dtype=int64),
        'frequencies': Tensor(shape=(None,), dtype=float32),
        'gap': float32,
        'homo': float32,
        'index': int64,
        'lumo': float32,
        'mu': float32,
        'num_atoms': int64,
        'positions': Tensor(shape=(29, 3), dtype=float32),
        'r2': float32,
        'tag': string,
        'zpve': float32,
    })
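The per-atom features `positions`, `charges`, and `Mulliken_charges` all have a leading dimension of 29, which appears to be the size of the largest QM9 molecule, so smaller molecules are presumably padded; `num_atoms` tells how many leading rows are real. A minimal sketch, assuming the real atoms occupy the first `num_atoms` rows and that the example has already been converted to NumPy (for instance with `tfds.as_numpy`). The helper `trim_padding` and the toy values are hypothetical, not part of TFDS.

```python
import numpy as np

# Per-atom features whose leading dimension is the padded size 29.
PER_ATOM_KEYS = ("positions", "charges", "Mulliken_charges")


def trim_padding(example: dict) -> dict:
    """Return a shallow copy of `example` with per-atom arrays cut to `num_atoms` rows.

    Hypothetical helper. Assumes the real atoms come first and any
    remaining rows are padding.
    """
    n = int(example["num_atoms"])
    trimmed = dict(example)
    for key in PER_ATOM_KEYS:
        trimmed[key] = np.asarray(example[key])[:n]
    return trimmed


# Toy record shaped like one QM9 example: methane (1 C + 4 H) padded to 29 atoms.
# Assumes `charges` holds atomic numbers; values are illustrative only.
charges = np.zeros(29, dtype=np.int64)
charges[:5] = [6, 1, 1, 1, 1]
example = {
    "num_atoms": np.int64(5),
    "charges": charges,
    "positions": np.zeros((29, 3), dtype=np.float32),
    "Mulliken_charges": np.zeros(29, dtype=np.float32),
}

trimmed = trim_padding(example)
print(trimmed["charges"])          # [6 1 1 1 1]
print(trimmed["positions"].shape)  # (5, 3)
```

Trimming per example keeps the original padded arrays untouched, which is convenient if batched, fixed-shape tensors are still needed elsewhere.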
- **Feature documentation**:

| Feature          | Class        | Shape   | Dtype   | Description |
|------------------|--------------|---------|---------|-------------|
|                  | FeaturesDict |         |         |             |
| A                | Tensor       |         | float32 |             |
| B                | Tensor       |         | float32 |             |
| C                | Tensor       |         | float32 |             |
| Cv               | Tensor       |         | float32 |             |
| G                | Tensor       |         | float32 |             |
| G_atomization    | Tensor       |         | float32 |             |
| H                | Tensor       |         | float32 |             |
| H_atomization    | Tensor       |         | float32 |             |
| InChI            | Tensor       |         | string  |             |
| InChI_relaxed    | Tensor       |         | string  |             |
| Mulliken_charges | Tensor       | (29,)   | float32 |             |
| SMILES           | Tensor       |         | string  |             |
| SMILES_relaxed   | Tensor       |         | string  |             |
| U                | Tensor       |         | float32 |             |
| U0               | Tensor       |         | float32 |             |
| U0_atomization   | Tensor       |         | float32 |             |
| U_atomization    | Tensor       |         | float32 |             |
| alpha            | Tensor       |         | float32 |             |
| charges          | Tensor       | (29,)   | int64   |             |
| frequencies      | Tensor       | (None,) | float32 |             |
| gap              | Tensor       |         | float32 |             |
| homo             | Tensor       |         | float32 |             |
| index            | Tensor       |         | int64   |             |
| lumo             | Tensor       |         | float32 |             |
| mu               | Tensor       |         | float32 |             |
| num_atoms        | Tensor       |         | int64   |             |
| positions        | Tensor       | (29, 3) | float32 |             |
| r2               | Tensor       |         | float32 |             |
| tag              | Tensor       |         | string  |             |
| zpve             | Tensor       |         | float32 |             |
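QM9 molecules contain only C, H, O, N, and F, so a molecule's composition can be read off the per-atom `charges` feature. A minimal sketch, assuming `charges` stores atomic numbers (1 = H, 6 = C, 7 = N, 8 = O, 9 = F) and that entries past `num_atoms` are padding; the helper `hill_formula` is hypothetical and not part of TFDS. It writes the formula in Hill order: carbon first, then hydrogen, then the remaining elements alphabetically, or every element alphabetically when no carbon is present.

```python
from collections import Counter

# Atomic number -> element symbol for the five elements that occur in QM9.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F"}


def hill_formula(charges, num_atoms):
    """Build a Hill-system formula from the first `num_atoms` atomic numbers.

    Hypothetical helper; assumes `charges` holds atomic numbers and that
    entries past `num_atoms` are padding (they are never read).
    """
    counts = Counter(SYMBOLS[int(z)] for z in charges[:num_atoms])
    if "C" in counts:
        # Hill order: carbon, then hydrogen, then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in counts else []) + sorted(
            s for s in counts if s not in ("C", "H"))
    else:
        # Without carbon, every element (including H) is listed alphabetically.
        order = sorted(counts)
    # Omit the count when an element appears exactly once.
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


# Toy inputs padded to length 29 (values are illustrative).
methane = [6, 1, 1, 1, 1] + [0] * 24
water = [8, 1, 1] + [0] * 26
print(hill_formula(methane, 5))  # CH4
print(hill_formula(water, 3))    # H2O
```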
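- **Supervised keys** (see
  [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
  `None`

- **Figure**
  ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):
  Not supported.

- **Citation**:

    @article{ramakrishnan2014quantum,
      title={Quantum chemistry structures and properties of 134 kilo molecules},
      author={Ramakrishnan, Raghunathan and Dral, Pavlo O and Rupp, Matthias and von Lilienfeld, O Anatole},
      journal={Scientific Data},
      volume={1},
      year={2014},
      publisher={Nature Publishing Group}
    }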
qm9/original (default config)
-----------------------------
- **Config description**: QM9 does not define any splits, so this variant puts
  the full QM9 dataset in the train split, in the original order (no
  shuffling).
- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  Only when `shuffle_files=False` (train)
- **Splits**:

| Split     | Examples |
|-----------|----------|
| `'train'` | 130,831  |
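Because `qm9/original` has a single `train` split in file order, any train/validation/test partition of it has to be derived by the user. A minimal sketch of one common approach, shuffling the example indices with a fixed seed and then slicing, using NumPy's `default_rng`. The helper `seeded_split` is hypothetical; the sizes in the usage line mirror the `qm9/dimenet` split counts, but the resulting indices are not expected to match that config or the DimeNet code, whose random number generation may differ.

```python
import numpy as np

NUM_EXAMPLES = 130_831  # size of the qm9/original train split


def seeded_split(num_examples, num_train, num_val, seed):
    """Shuffle indices 0..num_examples-1 with a fixed seed, then slice into three parts.

    Hypothetical helper: whatever remains after train and validation becomes test.
    """
    if num_train + num_val > num_examples:
        raise ValueError("train + validation sizes exceed the dataset size")
    perm = np.random.default_rng(seed).permutation(num_examples)
    train = perm[:num_train]
    val = perm[num_train:num_train + num_val]
    test = perm[num_train + num_val:]
    return train, val, test


train_idx, val_idx, test_idx = seeded_split(NUM_EXAMPLES, 110_000, 10_000, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))  # 110000 10000 10831
```

Since the original order is preserved, an index in these arrays refers to an example's position in the `train` split, so the same seed always reproduces the same partition.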
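qm9/cormorant
-------------

- **Config description**: Dataset split used by Cormorant: 100,000 train,
  17,748 validation, and 13,083 test samples. Splitting happens after
  shuffling with seed 0. Paper: <https://arxiv.org/abs/1906.04015>. Split:
  <https://github.com/risilab/cormorant/blob/master/src/cormorant/data/prepare/qm9.py>

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  Yes (test, validation), Only when `shuffle_files=False` (train)

- **Splits**: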
| Split          | Examples |
|----------------|----------|
| `'test'`       | 13,083   |
| `'train'`      | 100,000  |
| `'validation'` | 17,748   |
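The Cormorant counts are consistent with reserving 10% of the 130,831 molecules (rounded down) for test and giving validation whatever remains after the 100,000 training molecules. This rule is inferred from the numbers alone and may not be how the Cormorant preparation code actually computes the sizes; a quick check of the arithmetic:

```python
TOTAL = 130_831      # molecules left after removing the uncharacterized ones
NUM_TRAIN = 100_000  # Cormorant training-set size

# Assumption inferred from the counts: test = 10% of TOTAL, rounded down.
num_test = TOTAL // 10
# Assumption: validation receives everything not used for train or test.
num_val = TOTAL - NUM_TRAIN - num_test

print(NUM_TRAIN, num_val, num_test)  # 100000 17748 13083
```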
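qm9/dimenet
-----------

- **Config description**: Dataset split used by DimeNet: 110,000 train, 10,000
  validation, and 10,831 test samples. Splitting happens after shuffling with
  seed 42. Paper: <https://arxiv.org/abs/2003.03123>. Split:
  <https://github.com/gasteigerjo/dimenet/blob/master/dimenet/training/data_provider.py>

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  Yes (test, validation), Only when `shuffle_files=False` (train)

- **Splits**: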
| Split          | Examples |
|----------------|----------|
| `'test'`       | 10,831   |
| `'train'`      | 110,000  |
| `'validation'` | 10,000   |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-12-11 UTC.