białko_sieć
Zadbaj o dobrą organizację dzięki kolekcji
Zapisuj i kategoryzuj treści zgodnie ze swoimi preferencjami.
ProteinNet to znormalizowany zestaw danych do uczenia maszynowego struktury białek. Zapewnia sekwencje białek, struktury (drugorzędowe i trzeciorzędowe), dopasowania wielu sekwencji (MSA), macierze punktacji specyficzne dla pozycji (PSSM) oraz standaryzowane podziały treningu / walidacji / testu. ProteinNet opiera się na przeprowadzanych co dwa lata ocenach CASP, które przeprowadzają ślepe prognozy niedawno rozwiązanych, ale publicznie niedostępnych struktur białek, aby zapewnić zestawy testów, które przesuwają granice metodologii obliczeniowej. Jest zorganizowana jako seria zestawów danych, obejmująca CASP 7 do 12 (obejmujących okres dziesięciu lat), aby zapewnić zakres rozmiarów zestawów danych, które umożliwiają ocenę nowych metod w systemach stosunkowo ubogich w dane i bogatych w dane.
FeaturesDict({
'evolutionary': Tensor(shape=(None, 21), dtype=float32),
'id': Text(shape=(), dtype=string),
'length': int32,
'mask': Tensor(shape=(None,), dtype=bool),
'primary': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=20)),
'tertiary': Tensor(shape=(None, 3), dtype=float32),
})
Funkcja | Klasa | Kształt | Typ D | Opis |
---|
| FunkcjeDict | | | |
ewolucyjny | Napinacz | (Brak, 21) | pływak32 | |
ID | Tekst | | strunowy | |
długość | Napinacz | | int32 | |
maska | Napinacz | (Nic,) | bool | |
podstawowy | Sekwencja (etykieta klasy) | (Nic,) | int64 | |
trzeciorzędowy | Napinacz | (Brak, 3) | pływak32 | |
@article{ProteinNet19,
title = { {ProteinNet}: a standardized data set for machine learning of protein structure},
author = {AlQuraishi, Mohammed},
journal = {BMC bioinformatics},
volume = {20},
number = {1},
pages = {1--10},
year = {2019},
publisher = {BioMed Central}
}
protein_net/casp7 (domyślna konfiguracja)
Rozdzielać | Przykłady |
---|
'test' | 93 |
'train_100' | 34557 |
'train_30' | 10333 |
'train_50' | 13024 |
'train_70' | 15207 |
'train_90' | 17611 |
'train_95' | 17 938 |
'validation' | 224 |
protein_net/casp8
Rozdzielać | Przykłady |
---|
'test' | 120 |
'train_100' | 48087 |
'train_30' | 13881 |
'train_50' | 17 970 |
'train_70' | 21191 |
'train_90' | 24556 |
'train_95' | 25035 |
'validation' | 224 |
protein_net/casp9
Rozdzielać | Przykłady |
---|
'test' | 116 |
'train_100' | 60350 |
'train_30' | 16973 |
'train_50' | 22172 |
'train_70' | 26263 |
'train_90' | 30513 |
'train_95' | 31128 |
'validation' | 224 |
protein_net/casp10
Rozdzielać | Przykłady |
---|
'test' | 95 |
'train_100' | 73116 |
'train_30' | 19495 |
'train_50' | 25 897 |
'train_70' | 31 001 |
'train_90' | 36258 |
'train_95' | 37033 |
'validation' | 224 |
protein_net/casp11
Rozdzielać | Przykłady |
---|
'test' | 81 |
'train_100' | 87573 |
'train_30' | 22344 |
'train_50' | 29 936 |
'train_70' | 36 005 |
'train_90' | 42507 |
'train_95' | 43544 |
'validation' | 224 |
protein_net/casp12
Rozdzielać | Przykłady |
---|
'test' | 40 |
'train_100' | 104 059 |
'train_30' | 25299 |
'train_50' | 34039 |
'train_70' | 41522 |
'train_90' | 49600 |
'train_95' | 50 914 |
'validation' | 224 |
O ile nie stwierdzono inaczej, treść tej strony jest objęta licencją Creative Commons – uznanie autorstwa 4.0, a fragmenty kodu są dostępne na licencji Apache 2.0. Szczegółowe informacje na ten temat zawierają zasady dotyczące witryny Google Developers. Java jest zastrzeżonym znakiem towarowym firmy Oracle i jej podmiotów stowarzyszonych.
Ostatnia aktualizacja: 2022-12-16 UTC.
[null,null,["Ostatnia aktualizacja: 2022-12-16 UTC."],[],[],null,["# protein_net\n\n\u003cbr /\u003e\n\n- **Description**:\n\nProteinNet is a standardized data set for machine learning of protein structure.\nIt provides protein sequences, structures (secondary and tertiary), multiple\nsequence alignments (MSAs), position-specific scoring matrices (PSSMs), and\nstandardized training / validation / test splits. ProteinNet builds on the\nbiennial CASP assessments, which carry out blind predictions of recently solved\nbut publicly unavailable protein structures, to provide test sets that push the\nfrontiers of computational methodology. It is organized as a series of data\nsets, spanning CASP 7 through 12 (covering a ten-year period), to provide a\nrange of data set sizes that enable assessment of new methods in relatively data\npoor and data rich regimes.\n\n- **Homepage** :\n \u003chttps://github.com/aqlaboratory/proteinnet\u003e\n\n- **Source code** :\n [`tfds.datasets.protein_net.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/protein_net/protein_net_dataset_builder.py)\n\n- **Versions**:\n\n - **`1.0.0`** (default): Initial release.\n- **Auto-cached**\n ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):\n No\n\n- **Feature structure**:\n\n FeaturesDict({\n 'evolutionary': Tensor(shape=(None, 21), dtype=float32),\n 'id': Text(shape=(), dtype=string),\n 'length': int32,\n 'mask': Tensor(shape=(None,), dtype=bool),\n 'primary': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=20)),\n 'tertiary': Tensor(shape=(None, 3), dtype=float32),\n })\n\n- **Feature documentation**:\n\n| Feature | Class | Shape | Dtype | Description |\n|--------------|----------------------|------------|---------|-------------|\n| | FeaturesDict | | | |\n| evolutionary | Tensor | (None, 21) | float32 | |\n| id | Text | | string | |\n| length | Tensor | | int32 | |\n| mask | Tensor | (None,) | bool | |\n| primary | Sequence(ClassLabel) | (None,) | int64 | |\n| tertiary | Tensor | (None, 3) | float32 | |\n\n- **Supervised keys** (See\n [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):\n `('primary', 'tertiary')`\n\n- **Figure**\n ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):\n Not supported.\n\n- **Citation**:\n\n @article{ProteinNet19,\n title = { {ProteinNet}: a standardized data set for machine learning of protein structure},\n author = {AlQuraishi, Mohammed},\n journal = {BMC bioinformatics},\n volume = {20},\n number = {1},\n pages = {1--10},\n year = {2019},\n publisher = {BioMed Central}\n }\n\nprotein_net/casp7 (default config)\n----------------------------------\n\n- **Download size** : `3.18 GiB`\n\n- **Dataset size** : `2.53 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 93 |\n| `'train_100'` | 34,557 |\n| `'train_30'` | 10,333 |\n| `'train_50'` | 13,024 |\n| `'train_70'` | 15,207 |\n| `'train_90'` | 17,611 |\n| `'train_95'` | 17,938 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp8\n-----------------\n\n- **Download size** : `4.96 GiB`\n\n- **Dataset size** : `3.55 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 120 |\n| `'train_100'` | 48,087 |\n| `'train_30'` | 13,881 |\n| `'train_50'` | 17,970 |\n| `'train_70'` | 21,191 |\n| `'train_90'` | 24,556 |\n| `'train_95'` | 25,035 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp9\n-----------------\n\n- **Download size** : `6.65 GiB`\n\n- **Dataset size** : `4.54 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 116 |\n| `'train_100'` | 60,350 |\n| `'train_30'` | 16,973 |\n| `'train_50'` | 22,172 |\n| `'train_70'` | 26,263 |\n| `'train_90'` | 30,513 |\n| `'train_95'` | 31,128 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp10\n------------------\n\n- **Download size** : `8.65 GiB`\n\n- **Dataset size** : `5.57 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 95 |\n| `'train_100'` | 73,116 |\n| `'train_30'` | 19,495 |\n| `'train_50'` | 25,897 |\n| `'train_70'` | 31,001 |\n| `'train_90'` | 36,258 |\n| `'train_95'` | 37,033 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp11\n------------------\n\n- **Download size** : `10.81 GiB`\n\n- **Dataset size** : `6.72 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 81 |\n| `'train_100'` | 87,573 |\n| `'train_30'` | 22,344 |\n| `'train_50'` | 29,936 |\n| `'train_70'` | 36,005 |\n| `'train_90'` | 42,507 |\n| `'train_95'` | 43,544 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp12\n------------------\n\n- **Download size** : `13.18 GiB`\n\n- **Dataset size** : `8.05 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 40 |\n| `'train_100'` | 104,059 |\n| `'train_30'` | 25,299 |\n| `'train_50'` | 34,039 |\n| `'train_70'` | 41,522 |\n| `'train_90'` | 49,600 |\n| `'train_95'` | 50,914 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples..."]]