FlexAttention LLM first token processing on X86 CPUs
FlexAttention LLM throughput mode optimization on X86 CPUs
Foreach Map
Flex Attention for Inference
Prologue Fusion Support in Inductor
For more details about these highlighted features, you can look at the release blogpost.
Below are the full release notes for this release.
Tracked Regressions
NCCL init hits CUDA failure 'invalid argument' on 12.2 driver
Some users with 12.2 CUDA driver (535 version) report seeing "CUDA driver error: invalid argument" during NCCL or Symmetric Memory initialization. This issue is currently under investigation, see #150852. If you use PyTorch from source, a known workaround is to rebuild PyTorch with CUDA 12.2 toolkit. Otherwise, you can try upgrading the CUDA driver on your system.
Backwards Incompatible Changes
Dropped support for Triton < 2.2.0. Removed support for CUDA 12.4 and Anaconda in CI/CD.
Removed CUDA 12.4 support in CI/CD in favor of 12.8 (#148895, #142856, #144118, #145566, #145844, #148602, #143076, #148717)
Removed Anaconda support in CI/CD (#144870, #145015, #147792)
Dropped support for Triton < 2.2.0 (versions without ASTSource) (#143817)
C++ Extensions py_limited_api=True is now built with -DPy_LIMITED_API (#145764)
We formally began respecting the py_limited_api=True kwarg in 2.6 and stopped linking libtorch_python.so when the flag was specified, as libtorch_python.so does not guarantee using APIs from the stable Python limited API. In 2.7, we go further by specifying the -DPy_LIMITED_API flag, which enforces that the extension is buildable with the limited API. As a result of this enforcement, custom extensions that set py_limited_api=True but do not abide by the limited API may fail to build. For an example, see #152243.
This is strictly better behavior, as it is sketchy to claim CPython agnosticism without enforcing it with the flag. If you run into this issue, please ensure that the extension you are building does not use any APIs outside of the Python limited API, e.g., pybind.
Change torch.Tensor.new_tensor() to be on the given Tensor's device by default (#144958)
This function previously always created the new Tensor on the "cpu" device; it now uses the same device as the current Tensor object. This behavior is now consistent with the other .new_* methods.
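A minimal illustration on CPU (the behavior change is only observable when the source tensor lives on a non-CPU device):

```python
import torch

t = torch.tensor([1.0, 2.0])    # source tensor, on CPU here
new = t.new_tensor([3.0, 4.0])
# In 2.7 `new` is created on t's device; previously it was always "cpu".
print(new.device)  # cpu
```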
Use Manylinux 2.28 and CXX11_ABI=1 for future Linux wheel builds.
With the migration to manylinux_2_28 (AlmaLinux 8 based), we can no longer support OS distros with glibc older than 2.28. These include the popular Amazon Linux 2 and CentOS 7. (#143423, #146200, #148028, #148135, #148195, #148129)
torch.onnx.dynamo_export now uses the ExportedProgram logic path (#137296)
Users of the torch.onnx.dynamo_export API may see some ExportOptions become unsupported due to an internal switch to use torch.onnx.export(..., dynamo=True): diagnostic_options, fake_context and onnx_registry are removed/ignored by ExportOptions; only dynamic_shapes is retained. Users should move to the dynamo=True option on torch.onnx.export, as torch.onnx.dynamo_export is now deprecated. Use the dynamic_shapes argument of torch.onnx.export to specify dynamic shapes on the model.
Finish deprecation of LRScheduler.print_lr() along with the verbose kwarg to the LRScheduler constructor. (#147301)
Both APIs have been deprecated since 2.2. print_lr and verbose were confusing, not properly documented and little used, as described in #99270, so we deprecated them in 2.2. Now, we complete the deprecation by removing them entirely. Please use LRScheduler.get_last_lr() to access the learning rate instead. To access and print the learning rate of an LRScheduler:
Version 2.6.0
optim = ...
lrsched = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, verbose=True)
# lrsched will internally call print_lr() and print the learning rate
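Version 2.7.0 equivalent (a minimal sketch; StepLR stands in here for any LRScheduler subclass):

```python
import torch

model = torch.nn.Linear(2, 2)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
lrsched = torch.optim.lr_scheduler.StepLR(optim, step_size=10)
# get_last_lr() returns one learning rate per param group
print(lrsched.get_last_lr())  # [0.1]
```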
libtorch_python.so symbols are now invisible by default on all platforms except Apple (#142214)
Previously, the symbols in libtorch_python.so were exposed with default visibility. We have transitioned to being more intentional about what we expose as public symbols for our python API in C++. After #142214, public symbols will be marked explicitly while everything else will be hidden. Some extensions using private symbols will see linker failures with this change.
Please use torch.export.export instead of capture_pre_autograd_graph to export the model for pytorch 2 export quantization (#139505)
capture_pre_autograd_graph was a temporary API in torch.export. Now that the better, longer-term API torch.export.export is available, we are deprecating it.
Version 2.6.0
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = capture_pre_autograd_graph(m, *example_inputs)
m = prepare_pt2e(m, quantizer)
Version 2.7.0
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
# please get the XNNPACK quantizer from executorch (https://github.com/pytorch/executorch/)
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = export(m, *example_inputs)
m = prepare_pt2e(m, quantizer)
New interface for torch.fx.passes.graph_transform_observer.GraphTransformObserver to enable Node Level provenance tracking (#144277)
We now track a mapping between the nodes in the pre-grad and post-grad graphs. See the issue for an example frontend to visualize the transformations. To update your GraphTransformObserver subclasses: instead of overriding on_node_creation and on_node_erase, there are new functions get_node_creation_hook, get_node_erase_hook, get_node_replace_hook and get_deepcopy_hook. These are registered on the GraphModule member of the GraphTransformObserver upon entry and exit of a with block.
Version 2.6.0
class MyPrintObserver(GraphTransformObserver):
    def on_node_creation(self, node: torch.fx.Node):
        print(node)
Version 2.7.0
class MyPrintObserver(GraphTransformObserver):
    def get_node_creation_hook(self):
        def hook(node: torch.fx.Node):
            print(node)
        return hook
torch.ao.quantization.pt2e.graph_utils.get_control_flow_submodules is no longer public (#141612)
We are planning to make all functions under torch.ao.quantization.pt2e.graph_utils private. This update marks get_control_flow_submodules as a private API. If you need to keep using get_control_flow_submodules, call the private variant _get_control_flow_submodules instead.
Example:
Version 2.6:
>>> from torch.ao.quantization.pt2e.graph_utils import get_control_flow_submodules
Version 2.7:
>>> from torch.ao.quantization.pt2e.graph_utils import get_control_flow_submodules
ImportError: cannot import name 'get_control_flow_submodules' from 'torch.ao.quantization.pt2e.graph_utils'
>>> from torch.ao.quantization.pt2e.graph_utils import _get_control_flow_submodules # Note: Use _get_control_flow_submodules for private access
Deprecations
torch.onnx.dynamo_export is deprecated (#146425, #146639, #146923)
Users should use the dynamo=True option on torch.onnx.export.
XNNPACKQuantizer is deprecated in PyTorch and moved to ExecuTorch; please use it from executorch.backends.xnnpack.quantizer.xnnpack_quantizer instead of torch.ao.quantization.quantizer.xnnpack_quantizer. (#144940)
XNNPACKQuantizer is a quantizer for XNNPACK that was added to pytorch/pytorch for initial development. However, as it is not related to our core quantization workflow, it has been moved to ExecuTorch.
Version 2.6.0
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = capture_pre_autograd_graph(m, *example_inputs)
m = prepare_pt2e(m, quantizer)
Version 2.7.0
# we also updated the export call
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
# please get the XNNPACK quantizer from executorch (https://github.com/pytorch/executorch/)
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = export(m, *example_inputs)
m = prepare_pt2e(m, quantizer)
New features
Release Engineering
Added support for CUDA 12.8 in CI/CD (#145567, #145789, #145792, #145765, #146019, #146378, #146957, #147037, #146265, #147607, #148000, #149584)
Added Python 3.13 and 3.13t support in CI/CD (#144698, #143078, #144697, #143074, #141806, #146614)
Added aarch64 support for pytorch-triton package (#148768, #148705)
Added support for Windows XPU in CI/CD (#148755, #147637, #148313, #143185, #148319, #144316, #144644, #144034, #145255)
Added support for ROCm MI300 CI/CD (#143673, #145504, #146675, #147904, #145398, #145621, #145829, #145790, #144594)
Added support for PEP 585, Type Hinting Generics in Standard Collections (#145707, #145177, #145708, #145342, #145101)
Added Windows Arm64 Nightly Builds (#139760)
Python Frontend
Introduce a new torch.utils.serialization.config namespace for all serialization related configurations (#143324)
Add torch.serialization.config.save.use_pinned_memory_for_d2h to speed up torch.save when passed gpu devices (#143342)
Add torch.utils.serialization.config.load.calculate_storage_offsets to reduce random reads and significantly improve performance for storage with bad random access performance (#143880)
Add support for __torch_function__ handler on dtype arguments, similar to subclass objects (#145085)
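Taken together, the new namespace groups these toggles; a minimal sketch (attribute paths as listed above, available starting in 2.7):

```python
from torch.utils.serialization import config

# speed up torch.save of GPU tensors via pinned-memory device-to-host copies
config.save.use_pinned_memory_for_d2h = True
# reduce random reads in torch.load for storage with poor random-access performance
config.load.calculate_storage_offsets = True
```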
C++ Extensions
Support libtorch-agnostic extensions with stable torch ABI (#148892, #148832, #148124, #149208, #149052)
Distributed
Context Parallel
We provided a Context Parallel API (#131351) for users to parallelize torch.nn.functional.scaled_dot_product_attention over the sequence dimension. We implemented Ring Attention (#131351) and an AllGather-based approach (#132820), where the all-gather is issued before the first local SDPA and the subsequent local SDPAs must wait until the all-gather completes, and offered a user API (#142093) to select the desired approach. The implementation currently supports three SDPA kernels: SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, and SDPBackend.CUDNN_ATTENTION (#148537). We also verified that our Context Parallel implementation is compatible with other parallelisms and torch.compile.
Cache save plans: to mitigate overhead from planning steps (#147116, #147343)
Build a storage reader/writer to write checkpoints in HF format (#148089)
CUDA
Blackwell support added across native kernels, CUDA math libraries, and torch.compile (#145270)
Make torch.cuda.gds APIs public (#147120)
MPS
Prototype of torch.compile for Metal (#143893)
Provide Metal kernel authoring via Python (#148972)
ROCm
CK Memory-Efficient Attention (attention bias support) (#147778)
CK Flash Attention Backend (#143695)
Enhanced Windows support for PyTorch on ROCm (#148563, #144098)
Support for gfx1102 arch (Navi33) in wheel builds (#147761)
hipblaslt rowwise f8 gemm (#144432)
XPU
Add AOT Inductor support for Intel GPU (#140269, #140664, #149175)
Support torch.compile on Windows Platform for XPU (#147637, #144316, #149511)
Support SYCL with torch.utils.cpp_extension APIs (#132945)
Enhance Intel GPU performance on PyTorch 2 Export Post Training Quantization (#136753, #135465, #135337, #135189)
Enable Windows Kineto profiler (#148319)
Enable TF32 support for XPU based on oneDNN backend (#137570)
torch.compile
Dynamo
Support tracing contextlib.contextmanager in Dynamo (#136033)
nonstrict_trace escape hatch to apply non-strict tracing to difficult-to-compile code (#146367)
Delayed compile for dynamic shapes (#147983)
Support tracing generators (#141055)
Whitelist of source files to apply dynamic shapes to (#147979)
Support tracing list subclasses (#146819)
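A small sketch of the contextlib.contextmanager support (backend="eager" keeps the example self-contained; the context manager doubled is hypothetical):

```python
import contextlib

import torch

@contextlib.contextmanager
def doubled():
    # Dynamo can now trace through generator-based context managers
    yield 2

@torch.compile(backend="eager")
def f(x):
    with doubled() as factor:
        return x * factor

print(f(torch.ones(3)))  # tensor([2., 2., 2.])
```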
Inductor
Enable non power-of-2 head_dim for FlexAttention (#133495).
Add FlexAttention kernel parameter tuning options: num_warps and num_stages (#139639).
Support vectorization for score and mask in FlexAttention CPU (#143638).
ConfigFuzzer: a new debugging tool designed to fuzz Torch compile configurations. Given a test function, it will identify combinations of configs that throw errors during compilation and execution (#139736) (#145565).
Support fusion of pointwise ops into Template Prologues. TORCHINDUCTOR_PROLOGUE_FUSION enables this feature (#147008).
Add instantiation level for generating configs in the CUTLASS backend. Set TORCHINDUCTOR_CUTLASS_INSTANTIATION_LEVEL. Consult config.py for information (#146230).
Add L2 Swizzle config for CUTLASS backend: cuda.cutlass_max_profiling_swizzle_options (#146088).
Emit a CMakeLists.txt when package_cpp_only is specified in AOTI (#143352).
One Dynamo graph can now map to multiple inductor graphs with different graph_partition functions. Set the graph_partition in inductor config to enable (#147038).
Profiler
Add overload names to profiler (#143114)
Enable profiling on all threads via experimentalConfig (#143659)
Quantization
Enables kernels from KleidiAI to run models whose weights are quantized to int4 (symmetric quantization, either channel-wise or group-wise with a group size that is a multiple of 32). At runtime, the activations are dynamically quantized from fp32 to int8 and the weights are upcast from int4 to int8 so that an int8 matrix multiplication is executed. The dynamic quantization of activations and the matrix multiplication are performed inside torch.ops.aten._dyn_quant_matmul_4bit, while the weights, scales and optional bias are packed by torch.ops.aten._dyn_quant_pack_4bit_weight. To use it on your model, quantize it with the following example that leverages torchao:
from torchao.dtypes import PlainLayout
from torchao.experimental.packed_linear_int8_dynamic_activation_intx_weight_layout import (
    PackedLinearInt8DynamicActivationIntxWeightLayout,
)
from torchao.experimental.quant_api import (
    int8_dynamic_activation_intx_weight,
)
from torchao.quantization.granularity import (
    PerGroup,
    PerRow,
)
from torchao.quantization.quant_api import quantize_
from torchao.quantization.quant_primitives import MappingType

my_model = Model()

quantize_(
    my_model,
    int8_dynamic_activation_intx_weight(
        weight_dtype=torch.int4,
        granularity=PerGroup(32),  # PerRow() is also supported
        has_weight_zeros=True,  # Should be True
        weight_mapping_type=MappingType.SYMMETRIC_NO_CLIPPING_ERR,  # MappingType.SYMMETRIC can also be used but increases error
        layout=PackedLinearInt8DynamicActivationIntxWeightLayout(target="aten"),
    ),
)
ONNX
A new verification API torch.onnx.verification.verify_onnx_program can now be used to verify the numerical accuracy of the exported ONNX model. Users can use the compare_intermediates option to identify any operator that causes numerical discrepancies in intermediate tensors. A tool like model-explorer can be used to visualize the verification results.
Support custom axis name through dynamic_shapes (#146321)
torch.onnx.export(dynamo=True) now optimizes the output model by default (#146187)
Release Engineering
Upgrade CD to ROCm 6.3 (#142152, #142151, #143613)
Add cufile to a dependency list for CUDA 12.x builds and enable use by default (#145748, #148465, #148137)
Add support for gfx1102 and gfx12 to ROCm wheel and libtorch builds (#147761, #148562)
Python Frontend
Add support for CPU scalar in torch.addcmul (#143264)
Set -DPy_LIMITED_API flag for py_limited_api=True cpp_extensions (#145764)
Add support for serialization for uintx/intx in weights_only (#147500)
Add warning to torch.jit.load (#143403)
Make record/storage alignment in torch.save configurable (#147788)
Support with statement on torch.Stream (#140138)
Autograd
Allow torch.autograd.graph.GradientEdge as torch.autograd.backward outputs #144744
Implement gradient for the residuals of torch.linalg.lstsq #148526
Add deterministic kernel for reflection_pad2d_backward (#136241)
Improve softmax backward pass native CUDA implementation (#145866)
Improve Pareto frontier plot for AutoAC (#148678)
Dataloader
Dataloader distributes tasks to workers as they become available when in_order is False (#142324)
Update pin memory related APIs to not pass device argument. device and pin_memory_device are discouraged and will be deprecated in the future. (#131858)
Linear Algebra
Improve dim argument validation for empty inputs for torch.cum{min,max}. (#143920)
Properly throw an error when trying to sort complex numbers. (#144113)
Nested Tensor (NJT)
Support NJT chunk() backward on batch dim (#144584)
Support remaining *_like factory functions for NJT (#144889)
Improve matmul with NJTs via backward support and composition with dense tensors (#144587, #146405)
torch.nn
Add strict kwarg to nn.Module.set_submodule and fix bug for non dot-delineated strings (#143455)
Improve input dimensions check for reflection_pad1d, reflection_pad2d and reflection_pad3d (#141670)
torch.optim
Refactor AdamW to subclass Adam (#143710, #144972)
Add support for differentiable LR and weight_decay in SGD, Adam(W) (#143510, #143679, #143726)
Build Frontend
Make PyTorch buildable with Homebrew-installed OpenMP (#145870)
Enable onednn in pytorch for ppc64le architecture (#143743)
Enable build for Blackwell GPU family (#145436)
Fix OOM while building on Raspberry Pi by sharding code-generated files (#144364)
C++ Frontend
Introduce a new API isAcceleratorExcluded (#144959)
Distributed
c10d
Simplified abort and shutdown by adding both to Backend and ProcessGroup objects (#148798)
Used new_group instead of split_group on non-CUDA device (#141469)
Removed call_guard in pybind object init of c10d (#143598)
Enabled coalescing path on XPU and dispatch to XPU tensor barrier if XCCL backend is specified. (#143735)
Preserved PyWork's Python reference counting when used in functional collectives (#146376)
Enabled soft fail bind when agent store active inside TCPStore (#147465)
Made getDefaultBackend more fault tolerant (#148596)
DistributedDataParallel (DDP)
Added init_sync option to control collectives during initialization (#142824)
Decoupled python reducer from compilation mode (#147123)
FullyShardedDataParallel2 (FSDP2)
Clamp reduce_dtype in lazy init (#143297)
Enabled FSDP2 on XPU device (#143737)
Made post-backward condition more robust (#144781)
Enabled MTIA device in FSDP2 library code (#145842)
Avoided resetting version counter of all_gather_output in inference_mode (#146709)
Supported ignoring parameters in FSDP2 (#146631)
Enabled FSDP tests on XPU device (#147518)
Enabled FSDP2 on HPU device (#148667)
DTensor
Added aten.amin/amax to linear_reduction_strategy (#143747)
Added src_data_rank to distribute_tensor API (#143883)
Added strategy for _scaled_mm (#143760)
Added aten.view.dtype op support (#144404)
Enabled sharding prop to handle cross mesh computation (#147869)
Added CuDNN SDPA op support to DTensor (#148537)
Optimized shard_dim_alltoall to use alltoall_single (#148868)
Deprecated _shard_tensor to use src_data_rank=None (#144171)
Added pointwise ops strategy for aten.minimum (#145816)
TensorParallel
Propagated src_data_rank kwarg in TP API (#144005)
Torch Elastic
Added kill logic for current process when killing a worker (#141060)
Made etcd_rendezvous publicly importable (#145396)
Exposed the rendezvous keepalive arguments (#145228)
Removed stage_index_to_group_rank from schedule (#146217)
CPU
General
Implement blend operation for float, double, int in VEC ATen backend for SVE (#146479)
Upgrade submodule oneDNN to v3.7.1 (#148293)
x86
Add support for int8 brgemm (#143384)
CUDA
Refine CUDA Stream priority (#143849)
Expose sharedMemPerMultiprocessor device property to python (#143119)
Expose remaining sharedMem cudaDeviceProps to python (#143226)
Add range check for embedding_bag on input index >= 0 of cuda device (#140791)
Fix linter warnings (#147386)
Change behavior of pinning memory so it does not init a cuda context if one is not already present (#145752, #149033)
Add cutlass kernel for rowwise scaled mm on SM 10.0 (blackwell) (#148421)
Add get_stream_from_external API for CUDA backend (#143799)
Update cuDNN-frontend submodule to 1.10.0, used by cuDNN convolution and SDPA integrations (#145780)
MPS
Adding support to MPS for operators: angle, entr, spherical_bessel_j0, xlog1py, sinc, round.decimals, linalg.det, cholesky.ex, bilineard2d_aa, linalg.solve, zeta, cholesky, fused_rms_norm, lu_unpack, lu_factor_ex, slogdet and logdet (#143449, #147948, #146818, #147687, #146539, #147266, #146279, #146799, #145526, #146531, #146465, #145701, #145301, #146681, #144651, #145341, #146771, #147914)
Extending data type support: angle and atan2 to long type, torch.special.sinc to complex, and torch.mm / torch.bmm to integral types (#149017, #146648, #145809, #147526)
Support torch.accelerator.synchronize() on MPS (#143171)
Add error checking when dispatching kernel (#146458)
For MPSInductor
Fix index generation for transpose (#143973)
Fix multi rangevar kernel invocation (#144050)
Better error when kernel fails to compile (#144649)
Fix large prod and sum reductions (#148975)
Adding support to MPSInductor for operators: gamma, zeta, sinc, spherical_bessel_j0, entr (#145341, #146465, #146539, #147650, #148128)
ROCm
Fix TunableOp UTs: Rotating Buffer (#143172)
Enable *_load_dwordx4 ISA for BFloat16 and Half. (#141397)
Inductor
Add Inductor support for non-power-of-2 cooperative RSPLIT (#145689).
Remove runtime dependency on packaging (#149125)
Add Cutlass support for runtime param choices, starting with swizzle (#147223).
Make Inductor cpp backend enable_floating_point_contract_flag take a string. Previously, the only options were "on" or "off". Now the value of INDUCTOR_CPP_ENABLE_FLOATING_POINT_CONTRACT_FLAG will be passed to -ffp-contract (#143450).
Add upcasting FP16/BF16 math reductions to FP32 in Triton (#141052).
Support for more types of async_compile pools. Set variable TORCHINDUCTOR_WORKER_START to one of "subprocess", "fork", or "spawn" (#144491).
Create a new benchmarker to replace Triton's do_bench (#133058).
Inplace-padding support for cpp-wrapper (#145325).
New environment variable for emulate_precision_casts: TORCHINDUCTOR_EMULATE_PRECISION_CASTS (#145948).
New environment variables to filter cutlass kernels: TORCHINDUCTOR_CUTLASS_ALLOWLIST and TORCHINDUCTOR_CUTLASS_DENYLIST (#148161).
Add option to disable runtime scalar assertions: TORCHINDUCTOR_SCALAR_ASSERTS (#146462).
Add new inductor configs to compiler bisector: layout_optimization and comprehensive_padding (#148450).
Add an option to skip optimizing generated wrapper code. Set AOT_INDUCTOR_COMPILE_WRAPPER_WITH_O0=1 (#144866).
Support dynamic shape constraints in Export (#146044).
Handle MLIR scf.yield more accurately in user Triton code (#147762).
Support Triton 3.3: add a global_scratch arg, fix cpp_wrapper (#148051, #149973).
Removed an unnecessary struct runtime alignment assertion, allowing more flexible use cases of AOTI (#143236).
Support _int_mm in AOTI (#144571).
Support AOTI + CUDAGraphs when calling from Python (#148601).
New post grad pass to remove torch.ops.aten._assert_tensor_metadata.default for AOTI (#145028).
Support basic TorchBind in aot_compile and aoti_compile_and_package (#148506).
Profiler
Fix device setting error of other backends in torch.profiler (#144237)
Fix assertion failure in PyTorch profiler (#143940)
torch.compile
Do not depend on numpy during torch._functorch import (#149683)
Dynamo
Guard on global autocast state (#143592)
Fix some internal crashes involving undefined names (#144784)
Multiple silent incorrectness fixes for Compiled Autograd (#144707)
Fix graph break in FlexAttention when using Compiled Autograd (#144533)
Inductor
Fix a bug where the options dictionary on torch.compile calls was ignored (#145131).
Inductor now supports nanj in cpp wrapper CPU (#144064).
Fix a bug in the fractional_max_pool lowering in Inductor (#144395).
FlexAttention: Fix a few more symbolic shape issues (#142816).
Fix a bug in associative_scan (#143048).
Fix the index_put lowering when self and values are the same input (#139366).
Fix a bug in torch.polygamma(n) when n == 0 (#144058).
Fix bug in integer avg_pool that was causing 0 rounding (#144059).
Change avg_pool with uint to match eager (#144313).
Fix bug in max-autotune on smaller GPUs (<68 SMs) (#145133).
Fix bug in torch.logit decomposition (#145576).
Fix bug in the strides when lowering custom op (#148367).
Update triton support to account for changes in AttrsDescriptor (#145051, #145348, #145575, #145583, #145515).
Fix bug where the benchmark_harness isn't generated, but is called in some cases (#145532).
Make sure cpp wrapper is not used when setting nvtx training annotations (#145538).
Fix bug where SVE256 features were run on SVE128 systems (#146207).
Fix an unaligned memory access issue in mm_template (#146293).
Fix intermediate debug information with cpp_wrapper (#145527).
Fix bug where inductor was codegen-ing wrong shapes for bucketize when it was fused as an epilogue (#148769).
Fix bug in AOTI one-pass codegen when max-autotune is turned on (#143098).
Fix a memory leak in package AOTIModelPackageLoaderPybind::boxed_run (#146100).
Fix None and equal_to_1 arguments issue in Triton kernel generated by AOTI (#148102)
Fix backwards compatibility for AOTIModelPackageLoader() constructor defaults (#149082)
Fix Windows file paths broken by blank spaces (#149388)
Fix inductor windows linker error (#150256)
torch.fx
Fix get_source_partitions when weights are tied (#142446)
Prevent DCE of ATen rng nodes (#144319)
Fix incorrect type comparison (#145449)
Fix DCE of setitem node (#145714)
Fix pytree.register_constant to be usable in export (#147533)
Fix edge case in translation validation bisector (#145414)
torch.export
serialization
Rewrite the export schema format to archive without BC-breakage (#142511)
Serialize all dataclass fields, including default-valued members, in export schema (#142286)
Fix SymBool incorrectly serialized as bools (#144295)
Fix serialization roundtrippability for nodes with default arguments (#144686)
Fix deserializing bool graph outputs (#144791)
Fix deserialization for and_ operator (#145506)
Explicitly serialize unbacked_bindings (#144894)
Relax serialization assertion to warning for unbacked_bindings keys (#145777)
Avoid always printing GraphModule in de/serialization logging (#145857)
Bump ShapeEnv unbacked symbol counters for unbacked_bindings in deserialization (#145882)
Fix serialization for nested terms in nn_module_stack (#145901)
Fix typo in SymFloat serialization (#146112)
Fix deserialization for .requires_grad field (#146351)
Support math.trunc ops for serialization (#146715)
Serialize math.inf and NaN as strings (#146490)
Loosen SymInt input serialization for Inductor (#147237)
draft export
Fix dense-in-memory check for fake-kernel inference, for draft export (#145653)
Fix lazy_trace_handler bug in draft export logging (#146106)
Only clear pending unbacked symbols for overwritten fake-kernels for draft export (#147427)
Ignore when real-tensor fallback fails in draft export (#147779)
miscellaneous
Fix dynamic shape constraint checking when non-strict retracing (#143442)
Fix ._modules corner case for nn_module_stack metadata in strict-mode (#142823)
Fix placeholder name ordering for kwargs in non-strict mode (#144278)
Extend support for distributed ops (all_reduce, all_gather, all_gather_into_tensor, all_to_all_single, reduce_scatter_tensor) in non-strict mode (#147133, #147417)
Fix error with unflattener submodule reordering (#146181)
Make stack_trace field optional in insert_custom_op_guards pass (#146438)
Differentiate ScriptModules and ScriptObjects for TorchBind (#147399)
Restore lost input mutations with export_tracepoint (#148709)
Fix missed None type support in dynamic_shapes string cases (#148025)
Performance
Release Engineering
Add perf testing on H100 (#146868, #147947)
Sparse Frontend
Remove unnecessary tensor clones throughout codebase (#148159)
Distributed
Distributed Checkpoint (DCP)
Introduce process based async checkpointing (#147039)
c10d
Changed ALLOC_BUFFER_SIZE from 4000 to 4096 to be a power of 2 for TCPStore (#145759)
Improved IPC tensor release performance by releasing the IpcMutex when deleting the ExpandableSegments object and the GIL in WorkNCCL destructor (#148805)
CPU
General
Simplify vec128 bfloat16/half fmadds (#144486)
Parallelize sort (#142391)
x86
Set prop_kind to forward_inference when grad is not needed for mkldnn_convolution_pointwise (#142855)
Support reduce ops for add and max (#144065)
Use zero-point to decide conv src zp mask (#149473)
CUDA
Make PYTORCH_NO_CUDA_MEMORY_CACHING take effect only when its value is 1 (#145905)
Fix race condition in cuda initialization (#143238)
Fix a few 64-bit indexing issues, account for number of threads in complex128 scan (#143401)
Fix acquire pattern (correctness with respect to memory model) in topk (#144945)
Int64 indexing fix for UpSampleNearest3D (#144865)
Fix printing of the number of GPUs when certain asserts are raised (#146838)
Update the number of threads in avg_pool2d backward for SM 10.0 to prevent runtime crash (#145669)
Only use f8f8bf16 rowwise scaled matmul on SM 9.0 (precedes the kernel added in #148421) (#145728)
Fix 64-bit indexing for Upsample2D (#141923)
MPS
Faster integer batched matmul (#147877)
Implement linear1d as shader (#148154)
Metal unary kernel for sqrt (#148272)
Faster unary operations for strided tensors (#148350)
Introduce strides unary op (#148468)
Implemented masked_fill_scalar as shader (#147369)
Implement bilineard2d as shader (#145581)
Optimize Cholesky (#145722)
Speedup interpolation (#148277)
ROCm
Improve backwards indexing when stride is not one (#147630)
Improvements for vectorized elementwise kernels (#143269)
Skip L1 cache for single-use buffers in tl.load (#143115)
Improve performance of reduce sum for 3D shapes (#143137)
Enable _load_dwordx4 ISA for BFloat16 and Half (#141397)
Improve reduce sum calculation for low CU count (#141378)
Tune 3d tensor sums when not using fastest dimension (#146170)
Optimize the stride one indexing backwards kernel (#146420)
Use IPT=8 for block radix sort (#147657)
Change preferred BLAS lib defaults (#150212)
XPU
Optimize SDPA Inference Performance for XPU (#147614, #147612)
Improve zero-point memory creation (#148640)
Avoid unnecessary copy when the destination tensor of Matmul is non-contiguous or input is broadcasted (#144759, #143784)
torch.compile
Dynamo
Implement dynamic shape guards in C++ (#139899)
Directly access Python frame locals in guard checks (#140063)
Misc. Dynamo tracing time improvements (#143066)
Inductor
Support Arm NEON and SVE for the FP32 GEMM wrapper (#144327).
New GEMM kernel: persistent_tma (#142101).
Enable CPP Grouped GEMM Template (#143796).
Auto-tuning support for i8 x i8 -> i32 GEMM kernel on AMX ISA (#143187).
Add new GEMM templates for CPU AVX512: _weight_int4pack_mm_for_cpu (#146756).
Fuse SmoothQuant int8 linear pattern (#142036).
Add torchao da8w8 pattern with symmetric quantized activations and weights (#142110).
Support tiling reduction dimensions: Instead of having a single reduction dimension called "r", we can now support 2D reductions with "r0_" and "r1_" dimensions. 2D reductions generate two nested loops, with different block pointer advancements in each loop body (#137243).
New config to skip L1 cache for single-use buffers in triton codegen (#143115).
Implement max_pool2d_with_indices as a reduction for large window sizes (#147876).
Optimize the heuristics of outer loop fusion in Inductor CPU backend (#147523).
Support parallel reduction for GroupNorm in Inductor CPU backend (#144020).
Add support for online softmax. Online softmax uses a customized reduction to compute max and sum at the same time by accessing the data in one pass (#127011).
Add ROCm specific matmul tuning parameters (#148437).
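The online softmax idea above can be sketched in plain Python (illustrative only; the actual implementation is a fused Inductor reduction):

```python
import math

def online_softmax(xs):
    """Single-pass softmax: track the running max and rescale the
    accumulated exp-sum whenever the max changes."""
    m = float("-inf")  # running maximum
    s = 0.0            # running sum of exp(x - m)
    for x in xs:
        new_m = max(m, x)
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
    return [math.exp(x - m) / s for x in xs]

print(online_softmax([1.0, 3.0, 2.0]))
```

The rescaling step keeps the result identical to the usual two-pass (max, then sum) formulation while reading the data only once.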
torch.fx
Micro-optimization in Graph.nodes.__iter__ (#144631)
Micro-optimization in map_aggregate(immutable_dict) (#147691)
Move DCE rand check to import time (#145118)
Quantization
Enable fast qlinear static/dynamic path for AArch64 through ACL directly (#148585)
Improve KleidiAI 4 bit kernel performance (#146476)
Add NEON implementation for 8 bit quantized embedding bag on AArch64 to improve performance by ~5.5x on Neoverse V1 cores (#147322)
Documentation
Python Frontend
Fix description of input in torch.addbmm() (#146664)
Fix numpy docs reference (#147697)
Add torch.cat type promotion documentation (#141339)
Add details on torch.topk indices stability when there are duplicate values (#143736)
Add overloads to torch.diagonal documentation (#144214)
Remove incorrect warnings from torch.{min,max} documentation (#146725)
Update addbmm, addmm, addmv and baddbmm description (#146689)
Add cachebench to operator benchmarks for PT2 caching (#147537)
torch.compile
Dynamo
New internal graph break API that enforces better error messages (#146525)
Replace internal calls to torch._dynamo.optimize() with torch.compile() (#142451)
Inductor
Support export to unwrap/wrap subclasses AOT; resolves a UX issue in torchao where users had to manually unwrap their subclasses before calling export (#141941).
Autotuning logs will now show up in TORCH_LOGs under the name "autotuning" (#147222).
Replace set by OrderedSet: only use OrderedSet in the Inductor codebase (#138466).
Now MPS is considered a GPU_TYPE (#143634).
Separate unary post op fusion and lowering for qlinear (#143903).
New classes to help with kernel memory analysis in heuristics (#142026).
Move ir_pre_fusion.txt and ir_post_fusion.txt from TORCH_COMPILE_DEBUG to TORCH_LOGS. For example, TORCH_LOGS="+ir_pre_fusion" (#147248).
Implement deepcopy for AOTICompiledModel (#145423)