  1. Quantization — PyTorch 2.11 documentation

    Oct 9, 2019 · pt2e quantization has been migrated to torchao (pytorch/ao); see pytorch/ao#2259 for more details. We plan to delete torch.ao.quantization in 2.10 if there are no blockers, or in the earliest …

  2. Neural Network Quantization in PyTorch | Practical ML

    Apr 16, 2025 · What is Quantization? Quantization is a model optimization technique that reduces the numerical precision used to represent weights and activations in deep learning models. Its primary …
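The reduced-precision mapping that snippet describes can be sketched without any framework. Below is a minimal, illustrative pure-Python version of per-tensor affine (asymmetric) int8 quantization; the function names are hypothetical, not part of any library API:

```python
# Affine int8 quantization: q = clamp(round(x / scale) + zero_point),
# recovered approximately as x ≈ (q - zero_point) * scale.

def quantize(xs, qmin=-128, qmax=127):
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)
    qs = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return qs, scale, zero_point

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

weights = [-1.0, -0.25, 0.0, 0.5, 1.5]
qs, scale, zp = quantize(weights)
recovered = dequantize(qs, scale, zp)
```

The round-trip error per element is bounded by half the scale, which is why quantization trades a small amount of accuracy for memory and compute savings.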

  3. Quantization API Reference — PyTorch 2.11 documentation

    Jul 25, 2020 · torch.ao.quantization.backend_config — This module contains BackendConfig, a config object that defines how quantization is supported in a backend. Currently only used by FX Graph …

  4. Welcome to PyTorch Tutorials — PyTorch Tutorials 2.11.0+cu130 …

    Learn how to use torch.nn.utils.prune to sparsify your neural networks, and how to extend it to implement your own custom pruning technique.
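The default method in that tutorial, L1-unstructured pruning, keeps only the largest-magnitude weights and zeroes the rest via a binary mask. A framework-free sketch of the idea (the function name is illustrative, not the PyTorch API):

```python
def l1_unstructured_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest |w|,
    mirroring the binary mask applied by torch.nn.utils.prune."""
    n_prune = int(len(weights) * amount)
    # Indices of the n_prune smallest-magnitude entries.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1.0] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0.0
    return [w * m for w, m in zip(weights, mask)], mask

pruned, mask = l1_unstructured_prune([0.9, -0.05, 0.4, -0.7, 0.01, 0.3], amount=0.5)
```

In PyTorch the mask is kept as a separate buffer so pruning can be refined or made permanent later; the sketch above only shows the selection rule.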

  5. TensorRT-LLM/examples/quantization/README.md at main - GitHub

    TensorRT-LLM Quantization Toolkit Installation Guide. Introduction: this document introduces the steps to install the TensorRT-LLM quantization toolkit, the Python APIs to quantize the models, and the …

  6. PyTorch Quantization - Zhihu - Zhihu Column

    PyTorch 1.1 began adding the torch.qint8 dtype and the torch.quantize_linear conversion function, offering limited experimental support for quantization. PyTorch 1.3 introduced official quantization support; beyond quantizable Tensors, PyTorch began supporting, in CNNs, the …

  7. GitHub - NVIDIA/Model-Optimizer: A unified library of SOTA model ...

    Mar 23, 2026 · NVIDIA Model Optimizer (referred to as Model Optimizer, or ModelOpt) is a library comprising state-of-the-art model optimization techniques including quantization, distillation, pruning, …

  8. GitHub - amd/Quark

    AMD Quark provides examples of Language Model and Image Classification model quantization, which can be found under examples/torch/ and examples/onnx/. These examples are documented here:

  9. Automatic Mixed Precision package - torch.amp — PyTorch 2.11 …

    Jun 12, 2025 · Ordinarily, “automatic mixed precision training” with the torch.float16 datatype uses torch.autocast and torch.amp.GradScaler together, as shown in the Automatic Mixed Precision …
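The reason autocast is paired with GradScaler is that small fp16 gradients underflow to zero, so the loss is scaled up before the backward pass and gradients are unscaled in full precision before the optimizer step. A sketch of the underflow problem using NumPy's float16 (torch is not required; the scale factor here is an illustrative choice, not GradScaler's default):

```python
import numpy as np

grad = 1e-8                       # a gradient magnitude too small for fp16
naive = np.float16(grad)          # underflows to 0.0 in half precision

scale = 2.0 ** 16                 # GradScaler-style loss scale factor
scaled = np.float16(grad * scale) # scaled value is representable in fp16
recovered = float(scaled) / scale # unscale in full precision before the step
```

GradScaler additionally skips optimizer steps and shrinks the scale when scaled gradients overflow to inf, but the core mechanism is the multiply-then-divide shown above.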

  10. Quantization - Hugging Face

    Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (int8) instead …
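The memory side of that claim is simple arithmetic: int8 weights take one byte each versus four for float32. A back-of-the-envelope sketch, assuming a hypothetical 7-billion-parameter model and counting weights only (activations, KV cache, and quantization metadata such as scales are ignored):

```python
params = 7_000_000_000
fp32_gib = params * 4 / 2**30   # 4 bytes per float32 weight ≈ 26 GiB
int8_gib = params * 1 / 2**30   # 1 byte per int8 weight ≈ 6.5 GiB
ratio = fp32_gib / int8_gib     # weights shrink by exactly 4x
```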