protein_net
Koleksiyonlar ile düzeninizi koruyun
İçeriği tercihlerinize göre kaydedin ve kategorilere ayırın.
ProteinNet, protein yapısının makine öğrenimi için standartlaştırılmış bir veri setidir. Protein dizileri, yapıları (ikincil ve üçüncül), çoklu dizi hizalamaları (MSA'lar), konuma özgü puanlama matrisleri (PSSM'ler) ve standartlaştırılmış eğitim / doğrulama / test bölmeleri sağlar. ProteinNet, hesaplama metodolojisinin sınırlarını zorlayan test setleri sağlamak için yakın zamanda çözülmüş ancak halka açık olmayan protein yapılarının kör tahminlerini gerçekleştiren iki yılda bir yapılan CASP değerlendirmelerine dayanıyor. Nispeten veri fakiri ve veri zengini rejimlerde yeni yöntemlerin değerlendirilmesine olanak tanıyan bir dizi veri seti boyutu sağlamak için CASP 7'den 12'ye uzanan (on yıllık bir dönemi kapsayan) bir dizi veri seti olarak düzenlenmiştir.
FeaturesDict({
'evolutionary': Tensor(shape=(None, 21), dtype=float32),
'id': Text(shape=(), dtype=string),
'length': int32,
'mask': Tensor(shape=(None,), dtype=bool),
'primary': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=20)),
'tertiary': Tensor(shape=(None, 3), dtype=float32),
})
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|
| ÖzelliklerDict | | | |
evrimsel | tensör | (Yok, 21) | şamandıra32 | |
İD | Metin | | sicim | |
uzunluk | tensör | | int32 | |
maske | tensör | (Hiçbiri,) | bool | |
öncelik | Sıra(SınıfEtiketi) | (Hiçbiri,) | int64 | |
üçüncül | tensör | (Yok, 3) | şamandıra32 | |
@article{ProteinNet19,
title = { {ProteinNet}: a standardized data set for machine learning of protein structure},
author = {AlQuraishi, Mohammed},
journal = {BMC bioinformatics},
volume = {20},
number = {1},
pages = {1--10},
year = {2019},
publisher = {BioMed Central}
}
protein_net/casp7 (varsayılan yapılandırma)
Bölmek | örnekler |
---|
'test' | 93 |
'train_100' | 34.557 |
'train_30' | 10.333 |
'train_50' | 13.024 |
'train_70' | 15.207 |
'train_90' | 17.611 |
'train_95' | 17.938 |
'validation' | 224 |
protein_net/casp8
Bölmek | örnekler |
---|
'test' | 120 |
'train_100' | 48.087 |
'train_30' | 13.881 |
'train_50' | 17.970 |
'train_70' | 21.191 |
'train_90' | 24.556 |
'train_95' | 25.035 |
'validation' | 224 |
protein_net/casp9
Bölmek | örnekler |
---|
'test' | 116 |
'train_100' | 60.350 |
'train_30' | 16.973 |
'train_50' | 22.172 |
'train_70' | 26.263 |
'train_90' | 30.513 |
'train_95' | 31.128 |
'validation' | 224 |
protein_net/casp10
Bölmek | örnekler |
---|
'test' | 95 |
'train_100' | 73.116 |
'train_30' | 19.495 |
'train_50' | 25.897 |
'train_70' | 31.001 |
'train_90' | 36.258 |
'train_95' | 37.033 |
'validation' | 224 |
protein_net/casp11
Bölmek | örnekler |
---|
'test' | 81 |
'train_100' | 87.573 |
'train_30' | 22.344 |
'train_50' | 29.936 |
'train_70' | 36.005 |
'train_90' | 42.507 |
'train_95' | 43.544 |
'validation' | 224 |
protein_net/casp12
Bölmek | örnekler |
---|
'test' | 40 |
'train_100' | 104.059 |
'train_30' | 25.299 |
'train_50' | 34.039 |
'train_70' | 41.522 |
'train_90' | 49.600 |
'train_95' | 50.914 |
'validation' | 224 |
Aksi belirtilmediği sürece bu sayfanın içeriği Creative Commons Atıf 4.0 Lisansı altında ve kod örnekleri Apache 2.0 Lisansı altında lisanslanmıştır. Ayrıntılı bilgi için Google Developers Site Politikaları'na göz atın. Java, Oracle ve/veya satış ortaklarının tescilli ticari markasıdır.
Son güncelleme tarihi: 2022-12-16 UTC.
[null,null,["Son güncelleme tarihi: 2022-12-16 UTC."],[],[],null,["# protein_net\n\n\u003cbr /\u003e\n\n- **Description**:\n\nProteinNet is a standardized data set for machine learning of protein structure.\nIt provides protein sequences, structures (secondary and tertiary), multiple\nsequence alignments (MSAs), position-specific scoring matrices (PSSMs), and\nstandardized training / validation / test splits. ProteinNet builds on the\nbiennial CASP assessments, which carry out blind predictions of recently solved\nbut publicly unavailable protein structures, to provide test sets that push the\nfrontiers of computational methodology. It is organized as a series of data\nsets, spanning CASP 7 through 12 (covering a ten-year period), to provide a\nrange of data set sizes that enable assessment of new methods in relatively data\npoor and data rich regimes.\n\n- **Homepage** :\n \u003chttps://github.com/aqlaboratory/proteinnet\u003e\n\n- **Source code** :\n [`tfds.datasets.protein_net.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/protein_net/protein_net_dataset_builder.py)\n\n- **Versions**:\n\n - **`1.0.0`** (default): Initial release.\n- **Auto-cached**\n ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):\n No\n\n- **Feature structure**:\n\n FeaturesDict({\n 'evolutionary': Tensor(shape=(None, 21), dtype=float32),\n 'id': Text(shape=(), dtype=string),\n 'length': int32,\n 'mask': Tensor(shape=(None,), dtype=bool),\n 'primary': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=20)),\n 'tertiary': Tensor(shape=(None, 3), dtype=float32),\n })\n\n- **Feature documentation**:\n\n| Feature | Class | Shape | Dtype | Description |\n|--------------|----------------------|------------|---------|-------------|\n| | FeaturesDict | | | |\n| evolutionary | Tensor | (None, 21) | float32 | |\n| id | Text | | string | |\n| length | Tensor | | int32 | |\n| mask | Tensor | (None,) | bool | |\n| primary | Sequence(ClassLabel) | (None,) | int64 | |\n| tertiary | Tensor | (None, 3) | float32 | |\n\n- **Supervised keys** (See\n [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):\n `('primary', 'tertiary')`\n\n- **Figure**\n ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):\n Not supported.\n\n- **Citation**:\n\n @article{ProteinNet19,\n title = { {ProteinNet}: a standardized data set for machine learning of protein structure},\n author = {AlQuraishi, Mohammed},\n journal = {BMC bioinformatics},\n volume = {20},\n number = {1},\n pages = {1--10},\n year = {2019},\n publisher = {BioMed Central}\n }\n\nprotein_net/casp7 (default config)\n----------------------------------\n\n- **Download size** : `3.18 GiB`\n\n- **Dataset size** : `2.53 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 93 |\n| `'train_100'` | 34,557 |\n| `'train_30'` | 10,333 |\n| `'train_50'` | 13,024 |\n| `'train_70'` | 15,207 |\n| `'train_90'` | 17,611 |\n| `'train_95'` | 17,938 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp8\n-----------------\n\n- **Download size** : `4.96 GiB`\n\n- **Dataset size** : `3.55 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 120 |\n| `'train_100'` | 48,087 |\n| `'train_30'` | 13,881 |\n| `'train_50'` | 17,970 |\n| `'train_70'` | 21,191 |\n| `'train_90'` | 24,556 |\n| `'train_95'` | 25,035 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp9\n-----------------\n\n- **Download size** : `6.65 GiB`\n\n- **Dataset size** : `4.54 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 116 |\n| `'train_100'` | 60,350 |\n| `'train_30'` | 16,973 |\n| `'train_50'` | 22,172 |\n| `'train_70'` | 26,263 |\n| `'train_90'` | 30,513 |\n| `'train_95'` | 31,128 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp10\n------------------\n\n- **Download size** : `8.65 GiB`\n\n- **Dataset size** : `5.57 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 95 |\n| `'train_100'` | 73,116 |\n| `'train_30'` | 19,495 |\n| `'train_50'` | 25,897 |\n| `'train_70'` | 31,001 |\n| `'train_90'` | 36,258 |\n| `'train_95'` | 37,033 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp11\n------------------\n\n- **Download size** : `10.81 GiB`\n\n- **Dataset size** : `6.72 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 81 |\n| `'train_100'` | 87,573 |\n| `'train_30'` | 22,344 |\n| `'train_50'` | 29,936 |\n| `'train_70'` | 36,005 |\n| `'train_90'` | 42,507 |\n| `'train_95'` | 43,544 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\nprotein_net/casp12\n------------------\n\n- **Download size** : `13.18 GiB`\n\n- **Dataset size** : `8.05 GiB`\n\n- **Splits**:\n\n| Split | Examples |\n|----------------|----------|\n| `'test'` | 40 |\n| `'train_100'` | 104,059 |\n| `'train_30'` | 25,299 |\n| `'train_50'` | 34,039 |\n| `'train_70'` | 41,522 |\n| `'train_90'` | 49,600 |\n| `'train_95'` | 50,914 |\n| `'validation'` | 224 |\n\n- **Examples** ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples..."]]