This implements the Deep Residual Network from:
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
Deep Residual Learning for Image Recognition.
(https://arxiv.org/pdf/1512.03385) and
Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas,
Tsung-Yi Lin, Jonathon Shlens, Barret Zoph.
Revisiting ResNets: Improved Training and Scaling Strategies.
(https://arxiv.org/abs/2103.07579).
A float of the depth multiplier used to uniformly scale up the channel
size of all layers. This argument is also referred to as
width_multiplier in Revisiting ResNets (https://arxiv.org/abs/2103.07579).
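The following is a minimal sketch of how a depth multiplier could scale per-stage channel counts. The helper name scaled_stage_filters, the use of the standard ResNet stage widths (64, 128, 256, 512) as the base, and truncation with int() are assumptions for illustration. They are not necessarily what this implementation does.

```python
from typing import List, Sequence


def scaled_stage_filters(depth_multiplier: float,
                         base_filters: Sequence[int] = (64, 128, 256, 512)
                         ) -> List[int]:
    """Hypothetical helper: scale every stage's channel count by one factor."""
    # Assumed rounding: truncate toward zero with int().
    return [int(f * depth_multiplier) for f in base_filters]


print(scaled_stage_filters(1.0))   # [64, 128, 256, 512]
print(scaled_stage_filters(1.5))   # [96, 192, 384, 768]
print(scaled_stage_filters(0.25))  # [16, 32, 64, 128]
```

Because every stage is multiplied by the same factor, the relative widths between stages are preserved. Only the overall width changes.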
stem_type
A str of the stem type of ResNet. Defaults to v0. If set to
v1, use the ResNet-D type stem (https://arxiv.org/abs/1812.01187).
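The two stem types can be contrasted as layer lists in a short sketch. It assumes v0 is the original ResNet stem: a 7x7 stride-2 conv followed by a 3x3 stride-2 max pool. It also assumes v1 is the deep stem from the paper cited above: three 3x3 convs with 32, 32 and 64 filters, then the same max pool. The function names and the (kind, kernel, stride, filters) tuple format are hypothetical. Under 'same' padding, both stems reduce a 224x224 input to 56x56.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical layer description: (kind, kernel_size, stride, filters).
LayerSpec = Tuple[str, int, int, Optional[int]]


def stem_layers(stem_type: str = "v0") -> List[LayerSpec]:
    if stem_type == "v0":
        # Original ResNet stem: one large 7x7 conv, then max pooling.
        return [("conv", 7, 2, 64), ("maxpool", 3, 2, None)]
    if stem_type == "v1":
        # Deep stem (assumed widths): three 3x3 convs replace the 7x7 conv.
        return [("conv", 3, 2, 32), ("conv", 3, 1, 32), ("conv", 3, 1, 64),
                ("maxpool", 3, 2, None)]
    raise ValueError(f"Unsupported stem_type: {stem_type!r}")


def output_size(input_size: int, layers: List[LayerSpec]) -> int:
    # With 'same' padding, each layer maps n pixels to ceil(n / stride).
    size = input_size
    for _, _, stride, _ in layers:
        size = math.ceil(size / stride)
    return size


print(output_size(224, stem_layers("v0")))  # 56
print(output_size(224, stem_layers("v1")))  # 56
```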
resnetd_shortcut
A bool of whether to use the ResNet-D shortcut (a stride-2 average pool
followed by a stride-1 1x1 projection conv) in downsampling blocks.
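The following is a minimal NumPy sketch contrasting the two shortcut variants on one (H, W, C) feature map. It assumes a channels-last layout and leaves out the batch norm. The function names are hypothetical. The original shortcut downsamples inside the 1x1 conv. The ResNet-D variant of the paper at https://arxiv.org/abs/1812.01187 moves the downsampling into a 2x2 average pool.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray, stride: int = 1) -> np.ndarray:
    """1x1 conv on an (H, W, C_in) map with weights of shape (C_in, C_out).

    A 1x1 conv is a per-pixel matrix multiply; a stride just subsamples pixels.
    """
    return x[::stride, ::stride, :] @ w


def avg_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2; assumes even height and width."""
    height, width, channels = x.shape
    return x.reshape(height // 2, 2, width // 2, 2, channels).mean(axis=(1, 3))


def projection_shortcut(x: np.ndarray, w: np.ndarray,
                        resnetd_shortcut: bool) -> np.ndarray:
    if resnetd_shortcut:
        # ResNet-D: average-pool first, then a stride-1 1x1 projection,
        # so every input pixel contributes to the shortcut.
        return conv1x1(avg_pool_2x2(x), w, stride=1)
    # Original ResNet: a stride-2 1x1 conv that skips 3 of every 4 pixels.
    return conv1x1(x, w, stride=2)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w = rng.standard_normal((16, 32))
print(projection_shortcut(x, w, resnetd_shortcut=False).shape)  # (4, 4, 32)
print(projection_shortcut(x, w, resnetd_shortcut=True).shape)   # (4, 4, 32)
```

Both variants produce the same output shape. They differ only in which input pixels reach the projection.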
replace_stem_max_pool
A bool of whether to replace the max pool in the stem
with a stride-2 conv.
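Swapping the stem max pool for a stride-2 conv keeps the spatial downsampling the same, but it turns a parameter-free layer into a learned one. The worked arithmetic below assumes the replacement is a 3x3 conv with no bias that keeps the channel count, and that 'same' padding is used. The helper names are hypothetical.

```python
import math


def same_padding_output_size(size: int, stride: int) -> int:
    # With 'same' padding, a stride-s layer maps n pixels to ceil(n / s).
    return math.ceil(size / stride)


def stem_downsample_params(channels: int, replace_stem_max_pool: bool,
                           kernel_size: int = 3) -> int:
    """Trainable weights in the stem's final downsampling layer."""
    if replace_stem_max_pool:
        # Assumed: a kernel_size x kernel_size stride-2 conv that keeps the
        # channel count and has no bias (a batch norm usually follows it).
        return kernel_size * kernel_size * channels * channels
    return 0  # max pooling has no trainable weights


print(same_padding_output_size(112, 2))   # 56, for either choice
print(stem_downsample_params(64, False))  # 0
print(stem_downsample_params(64, True))   # 36864
```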
se_ratio
A float or None. Squeeze ratio of the Squeeze-and-Excitation (SE) layer,
i.e. the fraction of channels kept in its bottleneck. None disables SE.
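The following is a minimal NumPy sketch of a Squeeze-and-Excitation layer on one (H, W, C) feature map. It assumes se_ratio sets the bottleneck width as max(1, int(C * se_ratio)), and it omits biases for brevity. The function names are hypothetical, and the real layer may round the reduced width differently.

```python
import numpy as np


def se_weights(channels: int, se_ratio: float, rng: np.random.Generator):
    # Assumed bottleneck width: at least one channel.
    reduced = max(1, int(channels * se_ratio))
    return (rng.standard_normal((channels, reduced)),
            rng.standard_normal((reduced, channels)))


def squeeze_excitation(x: np.ndarray, w_reduce: np.ndarray,
                       w_expand: np.ndarray) -> np.ndarray:
    # Squeeze: global average pool to one value per channel.
    s = x.mean(axis=(0, 1))                        # (C,)
    # Excitation: bottleneck with ReLU, then a sigmoid gate per channel.
    z = np.maximum(s @ w_reduce, 0.0)              # (C_reduced,)
    gate = 1.0 / (1.0 + np.exp(-(z @ w_expand)))   # (C,), values in (0, 1)
    return x * gate                                # broadcast over H and W


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 64))
w_r, w_e = se_weights(64, 0.25, rng)
print(w_r.shape)                               # (64, 16)
print(squeeze_excitation(x, w_r, w_e).shape)   # (4, 4, 64)
```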
init_stochastic_depth_rate
A float of the initial stochastic depth rate.
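Stochastic depth randomly skips a block's residual branch during training. The following NumPy sketch makes two assumptions. First, the per-block rate scales the initial rate linearly with block depth, so the last block gets the full rate; block_drop_rate is a hypothetical schedule, and the actual one may differ. Second, dropping is done per example with inverted scaling, so the expected output is unchanged.

```python
import numpy as np


def block_drop_rate(init_rate: float, block_index: int, num_blocks: int) -> float:
    # Hypothetical linear schedule: deeper blocks are dropped more often.
    return init_rate * block_index / num_blocks


def drop_path(residual: np.ndarray, rate: float, rng: np.random.Generator,
              training: bool = True) -> np.ndarray:
    # At inference, or with a zero rate, the residual branch is always kept.
    if not training or rate == 0.0:
        return residual
    keep_prob = 1.0 - rate
    batch = residual.shape[0]
    # One keep/drop decision per example, broadcast over the other axes.
    mask = rng.random(batch) < keep_prob
    mask = mask.reshape((batch,) + (1,) * (residual.ndim - 1))
    # Rescale the survivors so the expected value matches the input.
    return residual * mask / keep_prob


rng = np.random.default_rng(0)
rates = [block_drop_rate(0.2, i, 4) for i in range(1, 5)]
out = drop_path(np.ones((8, 2, 2, 3)), rates[-1], rng)
print(out[:, 0, 0, 0])  # each entry is 0.0 or about 1.25
```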
scale_stem
A bool of whether to scale the channel size of the stem layers
(by the depth multiplier).
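The following sketch assumes that scale_stem decides whether the stem convs are widened by the same depth multiplier as the rest of the network. The base widths (32, 32, 64) of a three-conv deep stem and the helper name stem_filters are assumptions for illustration.

```python
from typing import List, Sequence


def stem_filters(depth_multiplier: float, scale_stem: bool,
                 base_filters: Sequence[int] = (32, 32, 64)) -> List[int]:
    """Hypothetical helper: stem conv widths, scaled only if scale_stem."""
    multiplier = depth_multiplier if scale_stem else 1.0
    return [int(f * multiplier) for f in base_filters]


print(stem_filters(2.0, scale_stem=True))   # [64, 64, 128]
print(stem_filters(2.0, scale_stem=False))  # [32, 32, 64]
```

Keeping the stem unscaled limits the extra compute at full input resolution when the rest of the network is widened.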
activation
A str name of the activation function.
use_sync_bn
If True, use synchronized batch normalization.
norm_momentum
A float of normalization momentum for the moving average.
norm_epsilon
A small float added to the variance to avoid dividing by zero.
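The following NumPy sketch shows where norm_momentum and norm_epsilon enter batch normalization, over one training step on an (N, C) batch. It uses the Keras momentum convention, new = momentum * old + (1 - momentum) * batch, and the biased batch variance. The function name and the defaults 0.99 and 1e-3 are assumptions for illustration.

```python
import numpy as np


def batch_norm_train_step(x, moving_mean, moving_var, gamma, beta,
                          norm_momentum=0.99, norm_epsilon=1e-3):
    # Normalize with the current batch statistics (biased variance).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # norm_epsilon keeps the denominator away from zero.
    y = gamma * (x - mean) / np.sqrt(var + norm_epsilon) + beta
    # Exponential moving averages, later used in place of batch statistics
    # at inference time; a higher momentum makes them change more slowly.
    moving_mean = norm_momentum * moving_mean + (1.0 - norm_momentum) * mean
    moving_var = norm_momentum * moving_var + (1.0 - norm_momentum) * var
    return y, moving_mean, moving_var


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4)) * 3.0 + 5.0
y, mm, mv = batch_norm_train_step(x, np.zeros(4), np.ones(4),
                                  np.ones(4), np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))
```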
kernel_initializer
A str naming the kernel initializer of the convolutional layers.