vllm.model_executor.layers.quantization.compressed_tensors.transform.schemes.linear_qutlass_nvfp4
QutlassNvFP4LinearMethod
Bases: CompressedTensorsLinearTransformMethod
Source code in vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py
apply
create_weights
create_weights(
    layer,
    input_size_per_partition,
    output_partition_sizes,
    input_size,
    output_size,
    params_dtype,
    **extra_weight_attrs,
)
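The size arguments above are not documented on this page. As a rough illustration only, the sketch below assumes they follow the common tensor-parallel convention for a column-parallel linear layer: each logical output is split evenly across ranks and the input dimension is left whole. The helper `column_parallel_args` and the example sizes are hypothetical and are not part of vLLM.

```python
def column_parallel_args(input_size, output_sizes, tp_size):
    """Hypothetical helper (not a vLLM API): derive create_weights-style
    size arguments for a column-parallel layer sharded over tp_size ranks.

    Assumption: each logical output (e.g. the two halves of a merged
    projection) is divided evenly across ranks; the input is not sharded.
    """
    for size in output_sizes:
        if size % tp_size != 0:
            raise ValueError(f"output size {size} is not divisible by {tp_size}")

    return {
        # Column-parallel layers keep the full input dimension on every rank.
        "input_size_per_partition": input_size,
        # One entry per logical output, each already divided by tp_size.
        "output_partition_sizes": [size // tp_size for size in output_sizes],
        # Unsharded (global) sizes.
        "input_size": input_size,
        "output_size": sum(output_sizes),
    }


# Example with made-up sizes: two merged outputs of 11008 on 2 ranks.
args = column_parallel_args(4096, [11008, 11008], tp_size=2)
```

Under these assumptions each rank would own `sum(args["output_partition_sizes"])` output rows. How `QutlassNvFP4LinearMethod` packs or scales the stored weights is not covered by this sketch.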
is_qutlass_fp4_scheme
is_qutlass_fp4_scheme(
    quant_scheme: CompressedTensorsScheme | None,
    input_tfms: dict[int, TransformTuple],
) -> bool