kddcup99
This is the data set used for the Third International Knowledge Discovery and
Data Mining Tools Competition, which was held in conjunction with KDD-99, the
Fifth International Conference on Knowledge Discovery and Data Mining. The
competition task was to build a network intrusion detector, a predictive model
capable of distinguishing between 'bad' connections, called intrusions or
attacks, and 'good' normal connections. This database contains a standard set of
data to be audited, which includes a wide variety of intrusions simulated in a
military network environment.
| Split   | Examples  |
|---------|-----------|
| 'test'  | 311,029   |
| 'train' | 4,898,431 |
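The split names and sizes above can be used to request either a full split or a small slice for quick experiments. The following is a minimal sketch, assuming the `tensorflow_datasets` package is installed and that the dataset is registered under the name `kddcup99`. `SPLIT_SIZES`, `split_spec`, and `load_kddcup99` are hypothetical helper names introduced here for illustration; only `tfds.load` and the `'train[:1%]'` slicing syntax come from TFDS itself.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"test": 311_029, "train": 4_898_431}


def split_spec(split, percent=None):
    """Build a TFDS split string such as 'train' or 'train[:1%]'."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    if percent is None:
        return split
    # TFDS percent slicing expects whole-number percentages.
    if not isinstance(percent, int) or not 0 < percent <= 100:
        raise ValueError("percent must be an integer in 1..100")
    return f"{split}[:{percent}%]"


def load_kddcup99(split="train", percent=None):
    """Load a split (or a leading slice of it) together with its metadata.

    The import is deferred so the helpers above work without TFDS installed.
    """
    import tensorflow_datasets as tfds

    return tfds.load("kddcup99", split=split_spec(split, percent), with_info=True)


if __name__ == "__main__":
    print(split_spec("train", 1))
    # Approximate number of examples in a 1% slice of 'train'.
    print(SPLIT_SIZES["train"] * 1 // 100)
```

Because the dataset defines no supervised keys, examples arrive as feature dictionaries rather than `(input, label)` pairs, so the sketch does not pass `as_supervised=True`.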
FeaturesDict({
'count': int32,
'diff_srv_rate': float32,
'dst_bytes': int32,
'dst_host_count': int32,
'dst_host_diff_srv_rate': float32,
'dst_host_rerror_rate': float32,
'dst_host_same_src_port_rate': float32,
'dst_host_same_srv_rate': float32,
'dst_host_serror_rate': float32,
'dst_host_srv_count': int32,
'dst_host_srv_diff_host_rate': float32,
'dst_host_srv_rerror_rate': float32,
'dst_host_srv_serror_rate': float32,
'duration': int32,
'flag': ClassLabel(shape=(), dtype=int64, num_classes=11),
'hot': int32,
'is_guest_login': bool,
'is_hot_login': bool,
'label': ClassLabel(shape=(), dtype=int64, num_classes=40),
'land': bool,
'logged_in': bool,
'num_access_files': int32,
'num_compromised': int32,
'num_failed_logins': int32,
'num_file_creations': int32,
'num_outbound_cmds': int32,
'num_root': int32,
'num_shells': int32,
'protocol_type': ClassLabel(shape=(), dtype=int64, num_classes=3),
'rerror_rate': float32,
'root_shell': bool,
'same_srv_rate': float32,
'serror_rate': float32,
'service': ClassLabel(shape=(), dtype=int64, num_classes=71),
'src_bytes': int32,
'srv_count': int32,
'srv_diff_host_rate': float32,
'srv_rerror_rate': float32,
'srv_serror_rate': float32,
'su_attempted': int32,
'urgent': int32,
'wrong_fragment': int32,
})
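One way to read the structure above is as 18 integer counters, 15 float rates, 5 boolean flags, and 4 categorical `ClassLabel` fields, of which `label` is the target. The sketch below shows one possible way to flatten a decoded example into a single numeric vector by one-hot encoding the three categorical inputs. It is plain Python with no TFDS dependency; the helper names `one_hot` and `encode_example` and the all-zero `dummy` record are assumptions made for illustration, not part of the dataset or its builder.

```python
# Feature groups copied from the FeaturesDict above.
INT_FEATURES = [
    "count", "dst_bytes", "dst_host_count", "dst_host_srv_count", "duration",
    "hot", "num_access_files", "num_compromised", "num_failed_logins",
    "num_file_creations", "num_outbound_cmds", "num_root", "num_shells",
    "src_bytes", "srv_count", "su_attempted", "urgent", "wrong_fragment",
]
FLOAT_FEATURES = [
    "diff_srv_rate", "dst_host_diff_srv_rate", "dst_host_rerror_rate",
    "dst_host_same_src_port_rate", "dst_host_same_srv_rate",
    "dst_host_serror_rate", "dst_host_srv_diff_host_rate",
    "dst_host_srv_rerror_rate", "dst_host_srv_serror_rate", "rerror_rate",
    "same_srv_rate", "serror_rate", "srv_diff_host_rate", "srv_rerror_rate",
    "srv_serror_rate",
]
BOOL_FEATURES = ["is_guest_login", "is_hot_login", "land", "logged_in", "root_shell"]
# Categorical inputs and their num_classes; 'label' (40 classes) is the target.
CATEGORICAL_SIZES = {"flag": 11, "protocol_type": 3, "service": 71}
SCALAR_FEATURES = INT_FEATURES + FLOAT_FEATURES + BOOL_FEATURES


def one_hot(index, num_classes):
    """Return a list of num_classes floats with a single 1.0 at index."""
    if not 0 <= index < num_classes:
        raise ValueError(f"class id {index} outside [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def encode_example(example):
    """Return (feature_vector, label_id) for one decoded example dict."""
    vector = [float(example[name]) for name in SCALAR_FEATURES]
    for name, size in CATEGORICAL_SIZES.items():
        vector.extend(one_hot(int(example[name]), size))
    return vector, int(example["label"])


# Synthetic all-zero record, used only to check the output shape (not real data).
dummy = {name: 0 for name in SCALAR_FEATURES}
dummy.update({"flag": 0, "protocol_type": 0, "service": 0, "label": 0})
features, label = encode_example(dummy)
print(len(features))  # 38 scalar columns + 85 one-hot columns = 123
```

One-hot encoding is only one option here; with 71 `service` classes an embedding or a tree-based model that consumes the integer ids directly may be preferable.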
| Feature                     | Class        | Shape | Dtype   | Description |
|-----------------------------|--------------|-------|---------|-------------|
|                             | FeaturesDict |       |         |             |
| count                       | Tensor       |       | int32   |             |
| diff_srv_rate               | Tensor       |       | float32 |             |
| dst_bytes                   | Tensor       |       | int32   |             |
| dst_host_count              | Tensor       |       | int32   |             |
| dst_host_diff_srv_rate      | Tensor       |       | float32 |             |
| dst_host_rerror_rate        | Tensor       |       | float32 |             |
| dst_host_same_src_port_rate | Tensor       |       | float32 |             |
| dst_host_same_srv_rate      | Tensor       |       | float32 |             |
| dst_host_serror_rate        | Tensor       |       | float32 |             |
| dst_host_srv_count          | Tensor       |       | int32   |             |
| dst_host_srv_diff_host_rate | Tensor       |       | float32 |             |
| dst_host_srv_rerror_rate    | Tensor       |       | float32 |             |
| dst_host_srv_serror_rate    | Tensor       |       | float32 |             |
| duration                    | Tensor       |       | int32   |             |
| flag                        | ClassLabel   |       | int64   |             |
| hot                         | Tensor       |       | int32   |             |
| is_guest_login              | Tensor       |       | bool    |             |
| is_hot_login                | Tensor       |       | bool    |             |
| label                       | ClassLabel   |       | int64   |             |
| land                        | Tensor       |       | bool    |             |
| logged_in                   | Tensor       |       | bool    |             |
| num_access_files            | Tensor       |       | int32   |             |
| num_compromised             | Tensor       |       | int32   |             |
| num_failed_logins           | Tensor       |       | int32   |             |
| num_file_creations          | Tensor       |       | int32   |             |
| num_outbound_cmds           | Tensor       |       | int32   |             |
| num_root                    | Tensor       |       | int32   |             |
| num_shells                  | Tensor       |       | int32   |             |
| protocol_type               | ClassLabel   |       | int64   |             |
| rerror_rate                 | Tensor       |       | float32 |             |
| root_shell                  | Tensor       |       | bool    |             |
| same_srv_rate               | Tensor       |       | float32 |             |
| serror_rate                 | Tensor       |       | float32 |             |
| service                     | ClassLabel   |       | int64   |             |
| src_bytes                   | Tensor       |       | int32   |             |
| srv_count                   | Tensor       |       | int32   |             |
| srv_diff_host_rate          | Tensor       |       | float32 |             |
| srv_rerror_rate             | Tensor       |       | float32 |             |
| srv_serror_rate             | Tensor       |       | float32 |             |
| su_attempted                | Tensor       |       | int32   |             |
| urgent                      | Tensor       |       | int32   |             |
| wrong_fragment              | Tensor       |       | int32   |             |
@misc{Dua:2019 ,
author = "Dua, Dheeru and Graff, Casey",
year = 2017,
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and
Computer Sciences"
}
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-01-04 UTC.
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/kdd-cup-1999-data-data-set)
Homepage: https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Source code: tfds.datasets.kddcup99.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/kddcup99/kddcup99_dataset_builder.py)
Versions:
1.0.0: Initial release.
1.0.1 (default): Fixes parsing of boolean fields land, logged_in, root_shell, is_hot_login and is_guest_login.
Download size: 18.62 MiB
Dataset size: 5.25 GiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): No
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.