Semantic and hyper-parameters for a single feature.
```python
tfdf.keras.FeatureUsage(
    name: str,
    semantic: Optional[tfdf.keras.FeatureSemantic] = None,
    num_discretized_numerical_bins: Optional[int] = None,
    max_vocab_count: Optional[int] = None,
    min_vocab_frequency: Optional[int] = None,
    override_global_imputation_value: Optional[str] = None,
    monotonic: tfdf.keras.core.MonotonicConstraint = None
)
```
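This class allows you to:

1. Limit the input features of the model.
2. Manually specify the semantic of a feature.
3. Specify feature-specific hyper-parameters.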
Note that the model's "features" argument is optional. If it is not specified, all available features are used. See the "CoreModel" class documentation for more details.
Usage example:
```python
import tensorflow_decision_forests as tfdf

# Short aliases for the names used below.
FeatureUsage = tfdf.keras.FeatureUsage
Semantic = tfdf.keras.FeatureSemantic
CoreModel = tfdf.keras.CoreModel

# A feature named "A". The semantic will be detected automatically. The
# global hyper-parameters of the model will be used.
feature_a = FeatureUsage(name="A")

# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is correctly detected.
# In this case, the feature might be stored as an integer, and would otherwise
# be detected as NUMERICAL.
feature_b = FeatureUsage(name="B", semantic=Semantic.CATEGORICAL)

# A feature with a specific maximum dictionary size.
feature_c = FeatureUsage(name="C",
                         semantic=Semantic.CATEGORICAL,
                         max_vocab_count=32)

model = CoreModel(features=[feature_a, feature_b, feature_c])
```
Attributes
`name`: The name of the feature. Used as an identifier when the dataset is a dictionary of tensors.
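To illustrate how `name` acts as a key into a dictionary dataset, and how an omitted "features" argument means "use every available column", here is a minimal pure-Python sketch. `FeatureSpec` and `select_features` are hypothetical helpers written for this illustration; they are not part of the TF-DF API, and the real library also handles details (such as excluding the label column) that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical stand-in for FeatureUsage; only `name` is modeled."""
    name: str


def select_features(dataset: Dict[str, List[object]],
                    features: Optional[Sequence[FeatureSpec]] = None
                    ) -> Dict[str, List[object]]:
    """Return the columns a model would consume, keyed by feature name."""
    if features is None:
        # No feature list given: every available column is used.
        return dict(dataset)
    missing = [f.name for f in features if f.name not in dataset]
    if missing:
        raise KeyError(f"features not found in dataset: {missing}")
    # The feature name is the key used to look up each column.
    return {f.name: dataset[f.name] for f in features}


data = {"A": [1, 2], "B": [3, 4], "C": ["x", "y"]}
print(select_features(data, [FeatureSpec("A"), FeatureSpec("C")]))
# {'A': [1, 2], 'C': ['x', 'y']}
```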
`semantic`: Semantic of the feature. If None, the semantic is determined automatically. The semantic controls how the model interprets a feature; using the wrong semantic (e.g. numerical instead of categorical) will hurt the model. See "FeatureSemantic" and "Semantic" for the definition of the available semantics.
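The usage example notes that an integer-coded categorical feature may be detected as NUMERICAL. The sketch below shows why an explicit semantic matters, using a deliberately simplified, hypothetical detection rule (all numbers means NUMERICAL, anything else means CATEGORICAL). The local `Semantic` enum and both functions are illustrative assumptions, not the library's actual detection logic.

```python
from enum import Enum
from typing import Iterable, Optional


class Semantic(Enum):
    """Hypothetical stand-in for tfdf.keras.FeatureSemantic (two values only)."""
    NUMERICAL = "NUMERICAL"
    CATEGORICAL = "CATEGORICAL"


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python; do not treat it as a number here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_semantic(values: Iterable[object]) -> Semantic:
    """Simplified rule: all numbers -> NUMERICAL, otherwise CATEGORICAL."""
    values = list(values)
    if values and all(_is_number(v) for v in values):
        return Semantic.NUMERICAL
    return Semantic.CATEGORICAL


def resolve_semantic(values: Iterable[object],
                     semantic: Optional[Semantic] = None) -> Semantic:
    """An explicitly set semantic wins; None falls back to detection."""
    if semantic is not None:
        return semantic
    return infer_semantic(values)


# Zip codes stored as integers look numerical unless the semantic is forced.
zip_codes = [94103, 10001, 94103]
print(resolve_semantic(zip_codes))                        # Semantic.NUMERICAL
print(resolve_semantic(zip_codes, Semantic.CATEGORICAL))  # Semantic.CATEGORICAL
```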
`num_discretized_numerical_bins`: For DISCRETIZED_NUMERICAL features only. Number of bins used to discretize DISCRETIZED_NUMERICAL features.
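As a rough picture of what a bin count means, the following sketch discretizes values into `num_bins` equal-frequency (quantile) bins. Equal-frequency binning is an assumption made for illustration; the library's exact choice of bin boundaries may differ. `quantile_boundaries` and `discretize` are hypothetical helpers.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_boundaries(values: Sequence[float], num_bins: int) -> List[float]:
    """Pick up to num_bins - 1 cut points so each bin holds about the same count."""
    if num_bins < 2:
        raise ValueError("num_bins must be >= 2")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    boundaries: List[float] = []
    for i in range(1, num_bins):
        cut = ordered[min(n - 1, (i * n) // num_bins)]
        # Skip duplicate cut points, which occur when many values are equal.
        if not boundaries or cut > boundaries[-1]:
            boundaries.append(cut)
    return boundaries


def discretize(value: float, boundaries: Sequence[float]) -> int:
    """Return the bin index of value, from 0 to len(boundaries)."""
    return bisect_right(boundaries, value)


b = quantile_boundaries([1, 2, 3, 4, 5, 6, 7, 8], num_bins=4)
print(b)                                            # [3, 5, 7]
print([discretize(v, b) for v in [1, 4, 5, 8]])     # [0, 1, 2, 3]
```

Fewer bins make the feature cheaper to split on but coarser; the cut-point deduplication means a column with many repeated values can end up with fewer bins than requested.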
`max_vocab_count`: For CATEGORICAL and CATEGORICAL_SET features only. Maximum number of unique categorical values stored as strings. If more categorical values are present, the least frequent values are grouped into an out-of-vocabulary item. Reducing the value can improve or hurt the model.
`min_vocab_frequency`: For CATEGORICAL and CATEGORICAL_SET features only. Minimum number of occurrences of a categorical value. Values present fewer than "min_vocab_frequency" times in the training dataset are treated as out-of-vocabulary.
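The interplay of `max_vocab_count` and `min_vocab_frequency` can be sketched as a frequency-sorted vocabulary with both filters applied, where anything left out maps to a single out-of-vocabulary item. Reserving index 0 for that item and breaking frequency ties alphabetically are assumptions of this sketch; `build_vocabulary` and `encode` are hypothetical helpers, not library functions.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional

OOV_INDEX = 0  # Assumption: index 0 is reserved for the out-of-vocabulary item.


def build_vocabulary(values: Iterable[Hashable],
                     max_vocab_count: Optional[int] = None,
                     min_vocab_frequency: Optional[int] = None
                     ) -> Dict[Hashable, int]:
    counts = Counter(values)
    # Most frequent first; ties broken by the value's string form for determinism.
    items = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    if min_vocab_frequency is not None:
        # Drop values seen fewer than min_vocab_frequency times.
        items = [(v, c) for v, c in items if c >= min_vocab_frequency]
    if max_vocab_count is not None:
        # Keep only the max_vocab_count most frequent values.
        items = items[:max_vocab_count]
    return {value: i + 1 for i, (value, _) in enumerate(items)}


def encode(value: Hashable, vocab: Dict[Hashable, int]) -> int:
    """Map a value to its index, or to OOV_INDEX if it is not in the vocabulary."""
    return vocab.get(value, OOV_INDEX)


colors = ["red"] * 5 + ["blue"] * 3 + ["green", "teal"]
vocab = build_vocabulary(colors, max_vocab_count=2)
print(vocab)                   # {'red': 1, 'blue': 2}
print(encode("green", vocab))  # 0
```

Because the list is sorted by decreasing frequency, the frequency filter removes a suffix and the count limit keeps a prefix, so applying them in either order yields the same vocabulary.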
`override_global_imputation_value`: For CATEGORICAL and CATEGORICAL_SET features only. If set, replaces the global imputation value used to handle missing values; that is, at inference time, missing values are treated as "override_global_imputation_value". This option can only be used on categorical features and on columns that contain no missing values in the training dataset. If the algorithm used to handle missing values is not "GLOBAL_IMPUTATION" (the default algorithm), this value is ignored.
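A minimal sketch of the described behavior, assuming that without an override the global imputation value for a categorical column is its most frequent training value. It also enforces the documented restriction that an override requires a column with no missing training values. `None` stands for a missing value; `imputation_value` and `impute` are hypothetical helpers, and the case where another missing-value algorithm is selected (and the override is ignored) is not modeled.

```python
from collections import Counter
from typing import Optional, Sequence


def imputation_value(training_values: Sequence[Optional[str]],
                     override: Optional[str] = None) -> str:
    """Value substituted for a missing categorical input at inference time."""
    has_missing = any(v is None for v in training_values)
    if override is not None:
        # Mirrors the documented restriction on override_global_imputation_value.
        if has_missing:
            raise ValueError("override requires a column with no missing training values")
        return override
    present = [v for v in training_values if v is not None]
    if not present:
        raise ValueError("cannot impute a column with no observed values")
    # Assumption: the global imputation value is the most frequent training value.
    return Counter(present).most_common(1)[0][0]


def impute(value: Optional[str], fill: str) -> str:
    """Replace a missing value (None) with the imputation value."""
    return fill if value is None else value


train = ["red", "red", "blue"]
print(impute(None, imputation_value(train)))                      # red
print(impute(None, imputation_value(train, override="unknown")))  # unknown
```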
`monotonic`: Monotonic constraint between the feature and the model output. Use None (default) for a feature without a monotonic constraint. Monotonic.INCREASING ensures the model output is monotonically increasing with the feature, and Monotonic.DECREASING ensures it is monotonically decreasing with the feature. Alternatively, you can use 0, +1, and -1 to define an unconstrained, a monotonically increasing, and a monotonically decreasing feature, respectively.
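The accepted constraint values and the property they guarantee can be sketched as follows: None, 0, +1, and -1 are normalized to a direction, and a prediction function is then checked on a grid of feature values. The local `Monotonic` enum (including its `UNCONSTRAINED` member) and both helpers are hypothetical illustrations, not the library's types; the check treats "increasing" as non-decreasing, allowing flat segments.

```python
from enum import IntEnum
from typing import Callable, Optional, Sequence, Union


class Monotonic(IntEnum):
    """Hypothetical mirror of the documented values; 0 means no constraint."""
    UNCONSTRAINED = 0
    INCREASING = 1
    DECREASING = -1


def normalize_constraint(constraint: Union[None, int, Monotonic]) -> Monotonic:
    """Map None / 0 / +1 / -1 (or an enum member) to a Monotonic value."""
    if constraint is None:
        return Monotonic.UNCONSTRAINED
    return Monotonic(constraint)  # Raises ValueError for any other integer.


def satisfies_constraint(predict: Callable[[float], float],
                         xs: Sequence[float],
                         constraint: Union[None, int, Monotonic]) -> bool:
    """True if the output never moves against the constraint across sorted xs."""
    direction = normalize_constraint(constraint)
    if direction is Monotonic.UNCONSTRAINED:
        return True
    outputs = [predict(x) for x in sorted(xs)]
    return all(direction * (b - a) >= 0 for a, b in zip(outputs, outputs[1:]))


grid = [0.0, 1.0, 2.0, 3.0]
print(satisfies_constraint(lambda x: 2 * x, grid, +1))  # True
print(satisfies_constraint(lambda x: 2 * x, grid, -1))  # False
```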
`guide`