Semantic and hyper-parameters for a single feature.
tfdf.keras.FeatureUsage(
    name: str,
    semantic: Optional[tfdf.keras.FeatureSemantic] = None,
    num_discretized_numerical_bins: Optional[int] = None,
    max_vocab_count: Optional[int] = None,
    min_vocab_frequency: Optional[int] = None,
    override_global_imputation_value: Optional[str] = None,
    monotonic: tfdf.keras.core.MonotonicConstraint = None
)
This class allows you to:
- Limit the input features of the model.
- Manually set the semantic of a feature.
- Specify feature-specific hyper-parameters.
Note that the model's "features" argument is optional. If it is not specified,
all available features will be used. See the "CoreModel" class
documentation for more details.
# A feature named "A". The semantic will be detected automatically. The
# global hyper-parameters of the model will be used.
feature_a = FeatureUsage(name="A")

# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is correctly detected.
# In this case, the feature might be stored as an integer, and would
# otherwise have been detected as NUMERICAL.
feature_b = FeatureUsage(name="B", semantic=Semantic.CATEGORICAL)

# A feature named "C" with a specific maximum dictionary size.
# The value 32 is illustrative.
feature_c = FeatureUsage(name="C",
                         semantic=Semantic.CATEGORICAL,
                         max_vocab_count=32)

model = CoreModel(features=[feature_a, feature_b, feature_c])
name: The name of the feature. Used as an identifier if the dataset is a
dictionary of tensors.
semantic: Semantic of the feature. If None, the semantic is determined
automatically. The semantic controls how a model interprets the feature.
Using the wrong semantic (e.g. numerical instead of categorical) will hurt
your model. See "FeatureSemantic" and "Semantic" for the definitions of the
available semantics.
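To illustrate why the semantic matters, here is a minimal sketch in plain Python. It does not use the library: the data, the integer coding of the colors, and the helper names are all hypothetical. It compares a single threshold split, which is how a numerical feature is typically tested, with a single set-membership split, which is how a categorical feature is typically tested.

```python
# Hypothetical integer-coded colors: 0 = red, 1 = green, 2 = blue.
codes = [0, 1, 2, 0, 2, 1]
# The label is 1 for red and blue, and 0 for green.
labels = [1, 0, 1, 1, 1, 0]


def accuracy(predictions):
    """Fraction of predictions equal to the labels."""
    hits = sum(int(p == y) for p, y in zip(predictions, labels))
    return hits / len(labels)


def best_numerical_split():
    """Best accuracy of one threshold split on the code, in either direction."""
    best = 0.0
    for threshold in (0.5, 1.5):
        above = [int(c >= threshold) for c in codes]
        below = [1 - p for p in above]
        best = max(best, accuracy(above), accuracy(below))
    return best


def categorical_split(positive_codes):
    """Accuracy of one set-membership split on the code."""
    return accuracy([int(c in positive_codes) for c in codes])


numerical_accuracy = best_numerical_split()
categorical_accuracy = categorical_split({0, 2})
print(numerical_accuracy, categorical_accuracy)
```

In this toy dataset no single threshold separates green from the other two colors, because the integer codes impose an order that has no meaning. A tree could still recover with additional splits, but the categorical interpretation gets there in one.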
num_discretized_numerical_bins: For DISCRETIZED_NUMERICAL features only.
Number of bins used to discretize the feature.
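As a rough illustration of what discretizing a numerical feature into a fixed number of bins means, the sketch below uses equal-frequency boundaries computed from the training values. This is an assumption made for the example: the library may select its boundaries differently, and the function names here are hypothetical.

```python
import bisect


def compute_boundaries(values, num_bins):
    """Returns up to num_bins - 1 cut points taken from the sorted values."""
    ordered = sorted(values)
    boundaries = []
    for i in range(1, num_bins):
        # Position of the i-th quantile in the sorted training values.
        index = i * len(ordered) // num_bins
        boundaries.append(ordered[index])
    # Drop duplicate cut points.
    return sorted(set(boundaries))


def discretize(value, boundaries):
    """Index of the bin containing value, in [0, len(boundaries)]."""
    return bisect.bisect_right(boundaries, value)


training_values = [0.1, 0.4, 0.5, 1.2, 3.0, 3.1, 7.5, 9.9]
boundaries = compute_boundaries(training_values, num_bins=4)
bins = [discretize(v, boundaries) for v in training_values]
print(boundaries, bins)
```

After discretization the model only sees the bin index, so fewer bins mean a coarser view of the feature.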
max_vocab_count: For CATEGORICAL and CATEGORICAL_SET features only. Maximum
number of unique categorical values stored as strings. If more categorical
values are present, the least frequent values are grouped into an
out-of-vocabulary item. Reducing the value can improve or hurt the model.
min_vocab_frequency: For CATEGORICAL and CATEGORICAL_SET features only.
Minimum number of occurrences of a categorical value. Values present less
than "min_vocab_frequency" times in the training dataset are treated as
out-of-vocabulary.
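The two vocabulary parameters can be pictured with a small plain-Python sketch. It is not the library's implementation: the "<OOV>" token name, the tie-breaking rule, and the helper names are assumptions, and whether the out-of-vocabulary item itself counts toward the limit is not modeled here.

```python
from collections import Counter

OOV = "<OOV>"  # Hypothetical name for the out-of-vocabulary item.


def build_vocabulary(values, max_vocab_count=None, min_vocab_frequency=1):
    """Keeps the most frequent values that occur often enough."""
    counts = Counter(values)
    # Most frequent first; ties are broken alphabetically to stay deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = [value for value, count in ranked if count >= min_vocab_frequency]
    if max_vocab_count is not None:
        kept = kept[:max_vocab_count]
    return set(kept)


def encode(value, vocabulary):
    """Maps values outside the vocabulary to the out-of-vocabulary item."""
    return value if value in vocabulary else OOV


values = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
by_count = build_vocabulary(values, max_vocab_count=2)
by_frequency = build_vocabulary(values, min_vocab_frequency=2)
print(sorted(by_count), sorted(by_frequency), encode("d", by_frequency))
```

Both parameters shrink the dictionary: one caps its size, the other drops rare values regardless of how many distinct values remain.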
override_global_imputation_value: For CATEGORICAL and CATEGORICAL_SET
features only. If set, replaces the global imputation value used to handle
missing values. That is, at inference time, missing values will be treated
as "override_global_imputation_value". "override_global_imputation_value"
can only be used on categorical features and on columns not containing
missing values in the training dataset. If the algorithm used to handle
missing values is not "GLOBAL_IMPUTATION" (default algorithm), this value
is ignored.
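The following plain-Python sketch shows the idea of overriding a global imputation value. It assumes that, for a categorical feature, the global imputation value is the most frequent training value; that assumption, the sample data, and the helper names are illustrative and not taken from the library.

```python
from collections import Counter


def global_imputation_value(training_values, override=None):
    """Returns the value used to replace missing entries at inference time."""
    if override is not None:
        return override
    # Assumption: without an override, the most frequent training value is used.
    return Counter(training_values).most_common(1)[0][0]


def impute(values, replacement):
    """Replaces missing entries (None) with the replacement value."""
    return [replacement if value is None else value for value in values]


# Training column without missing values, as required for the override.
training_values = ["red", "blue", "red", "green"]
serving_values = ["blue", None, "green"]

default_fill = impute(serving_values, global_imputation_value(training_values))
custom_fill = impute(
    serving_values, global_imputation_value(training_values, override="blue"))
print(default_fill, custom_fill)
```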
monotonic: Monotonic constraint between the feature and the model output.
Use None (default) for a feature without a monotonic constraint.
Monotonic.INCREASING ensures the model is monotonically increasing with
the feature. Monotonic.DECREASING ensures the model is monotonically
decreasing with the feature. Alternatively, you can use 0, +1 and
-1 to respectively define a non-constrained, monotonically
increasing, and monotonically decreasing feature.
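To make the meaning of a monotonic constraint concrete, the sketch below checks whether a prediction function respects a given direction on a set of feature values. It is a plain-Python illustration, not part of the library: the helper name is hypothetical, and the integer encoding (0 for unconstrained, +1 for increasing, -1 for decreasing) is assumed.

```python
def is_monotonic(predict, feature_values, direction):
    """Checks a direction: +1 increasing, -1 decreasing, 0 unconstrained."""
    if direction == 0:
        return True
    # Evaluate the model on the feature values in increasing order.
    outputs = [predict(x) for x in sorted(feature_values)]
    # Every consecutive change must have the sign of the direction, or be zero.
    return all(direction * (b - a) >= 0 for a, b in zip(outputs, outputs[1:]))


grid = [3, 1, 2, 0, 4]
linear_up = is_monotonic(lambda x: 2 * x + 1, grid, +1)
linear_down = is_monotonic(lambda x: 2 * x + 1, grid, -1)
bowl_up = is_monotonic(lambda x: (x - 2) ** 2, grid, +1)
bowl_free = is_monotonic(lambda x: (x - 2) ** 2, grid, 0)
print(linear_up, linear_down, bowl_up, bowl_free)
```

A model trained with an increasing constraint on a feature should pass this kind of check for that feature when all other features are held fixed.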