PyG 2.4.0: Model compilation, on-disk datasets, hierarchical sampling
We are excited to announce the release of PyG 2.4 🎉🎉🎉
PyG 2.4 is the culmination of work from 62 contributors who have worked on features and bug-fixes for a total of over 500 commits since torch-geometric==2.3.1.
torch.compile(dynamic=True) support

The long wait is over! With the release of PyTorch 2.1, PyG 2.4 now brings full support for torch.compile on graphs of varying size via the dynamic=True option, which is especially useful for use-cases that involve DataLoader or NeighborLoader. Examples and tutorials have been updated to reflect this support accordingly (#8134), and models and layers in torch_geometric.nn have been tested to produce zero graph breaks:
import torch_geometric
model = torch_geometric.compile(model, dynamic=True)
When the dynamic=True option is enabled, PyTorch will attempt up-front to generate a kernel that is as dynamic as possible, avoiding recompilations when graph sizes change across mini-batches. As such, you should only omit dynamic=True when graph sizes are guaranteed to never change. Note that dynamic=True requires PyTorch >= 2.1.0 to be installed.
PyG 2.4 is fully compatible with PyTorch 2.1, and supports the following combinations:
| PyTorch 2.1 | cpu | cu118 | cu121 |
|--------------|-------|---------|---------|
| Linux | ✅ | ✅ | ✅ |
| macOS | ✅ | | |
| Windows | ✅ | ✅ | ✅ |
You can still install PyG 2.4 on older PyTorch releases, going back as far as PyTorch 1.11, in case you are not eager to update your PyTorch version.
OnDiskDataset Interface

We added the OnDiskDataset base class for creating large graph datasets (e.g., molecular databases with billions of graphs) that do not easily fit into CPU memory at once (#8028, #8044, #8046, #8051, #8052, #8054, #8057, #8058, #8066, #8088, #8092, #8106). OnDiskDataset leverages our newly introduced Database backend (sqlite3 by default) for on-disk storage and retrieval of graphs, supports DataLoader out-of-the-box, and is optimized for maximum performance.
OnDiskDataset utilizes a user-specified schema to store data as efficiently as possible (instead of relying on Python pickling). The schema can take int, float, str, object, or a dictionary with dtype and size keys (for specifying tensor data) as input, and can be nested as a dictionary. For example,
dataset = OnDiskDataset(root, schema={
'x': dict(dtype=torch.float, size=(-1, 16)),
'edge_index': dict(dtype=torch.long, size=(2, -1)),
'y': float,
})
creates a database with three columns, where x and edge_index are stored as binary data, and y is stored as a float.
Afterwards, you can append data to the OnDiskDataset and retrieve data from it via dataset.append()/dataset.extend(), and dataset.get()/dataset.multi_get(), respectively. We added a fully working example on how to set up your own OnDiskDataset here (#8102). You can also convert in-memory dataset instances to an OnDiskDataset instance by running InMemoryDataset.to_on_disk_dataset() (#8116).
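To illustrate the idea behind the sqlite3-backed Database, here is a minimal, self-contained sketch. Note that MiniGraphDB is a hypothetical class written for this post and is not PyG's actual implementation; it only demonstrates the principle of serializing graphs on insert and lazily loading them on access, so the full dataset never has to reside in memory at once:

```python
import pickle
import sqlite3

# Minimal sketch (NOT PyG's actual implementation) of an sqlite3-backed
# graph store: rows are serialized on insert and lazily loaded on access.
class MiniGraphDB:
    def __init__(self, path=':memory:'):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS graphs (id INTEGER PRIMARY KEY, data BLOB)')

    def append(self, graph: dict) -> None:
        self.conn.execute('INSERT INTO graphs (data) VALUES (?)',
                          (pickle.dumps(graph), ))

    def get(self, idx: int) -> dict:
        # sqlite3 assigns ids starting at 1, hence the `idx + 1` offset:
        row = self.conn.execute(
            'SELECT data FROM graphs WHERE id = ?', (idx + 1, )).fetchone()
        return pickle.loads(row[0])

db = MiniGraphDB()
db.append({'edge_index': [[0, 1], [1, 0]], 'y': 0.5})
db.append({'edge_index': [[0, 2], [2, 0]], 'y': 1.0})
print(db.get(1)['y'])  # -> 1.0
```

The real OnDiskDataset additionally uses the user-defined schema to store tensor columns as raw binary data rather than pickles, which is what makes it markedly faster than naive serialization.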
Hierarchical Neighborhood Sampling

One drawback of NeighborLoader is that it computes representations for all sampled nodes at all depths of the network. However, nodes sampled in later hops no longer contribute to the node representations of the seed nodes in later GNN layers, so computing their embeddings is wasted work: NeighborLoader is marginally slower because we compute node embeddings for nodes that are no longer needed. This is a trade-off we made to obtain a clean, modular and experimentation-friendly GNN design, which does not tie the definition of the model to its utilized data loader routine.
With PyG 2.4, we introduce the option to eliminate this overhead and further speed up training and inference of mini-batch GNNs, which we call "Hierarchical Neighborhood Sampling" (see here for the full tutorial) (#6661, #7089, #7244, #7425, #7594, #7942). Its main idea is to progressively trim the adjacency matrix of the returned subgraph before inputting it to each GNN layer, and it works seamlessly across several models, both in the homogeneous and heterogeneous graph setting. To support this trimming and implement it effectively, the NeighborLoader implementations in PyG and pyg-lib additionally return the number of nodes and edges sampled in each hop, which are then used on a per-layer basis to trim the adjacency matrix and the various feature matrices so that only the required amounts are maintained (see the trim_to_layer method):
from typing import List

import torch
from torch import Tensor
from torch.nn import Linear, ModuleList

from torch_geometric.nn import SAGEConv
from torch_geometric.utils import trim_to_layer

class GNN(torch.nn.Module):
    def __init__(self, in_channels: int, hidden_channels: int,
                 out_channels: int, num_layers: int):
        super().__init__()
        self.convs = ModuleList([SAGEConv(in_channels, hidden_channels)])
        for _ in range(num_layers - 1):
            self.convs.append(SAGEConv(hidden_channels, hidden_channels))
        self.lin = Linear(hidden_channels, out_channels)

    def forward(
        self,
        x: Tensor,
        edge_index: Tensor,
        num_sampled_nodes_per_hop: List[int],
        num_sampled_edges_per_hop: List[int],
    ) -> Tensor:
        for i, conv in enumerate(self.convs):
            # Trim edge and node information to the current layer `i`:
            x, edge_index, _ = trim_to_layer(
                i, num_sampled_nodes_per_hop, num_sampled_edges_per_hop,
                x, edge_index)
            x = conv(x, edge_index).relu()
        return self.lin(x)
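The trimming itself boils down to simple slicing: before each layer, the nodes and edges contributed by the outermost remaining hop are dropped, since they can no longer influence the seed nodes. Here is a framework-free sketch of that logic; trim_to_layer_sketch is an illustrative stand-in written for this post, not PyG's actual trim_to_layer:

```python
# Illustrative sketch (NOT PyG's trim_to_layer): given per-hop node/edge
# counts, progressively drop the outermost hop before each layer.
def trim_to_layer_sketch(layer, nodes_per_hop, edges_per_hop, x, edge_list):
    if layer == 0:  # First layer: everything is still needed.
        return x, edge_list
    # Drop the nodes/edges added by the outermost remaining hop:
    x = x[:len(x) - nodes_per_hop[-layer]]
    edge_list = edge_list[:len(edge_list) - edges_per_hop[-layer]]
    return x, edge_list

# 2 seed nodes, 4 nodes sampled in hop 1, 8 nodes sampled in hop 2:
nodes_per_hop = [2, 4, 8]
edges_per_hop = [4, 8]   # hypothetical per-hop edge counts
x = list(range(14))      # stand-in for a feature matrix over 14 nodes
edges = list(range(12))  # stand-in for 12 sampled edges

for layer in range(2):   # a 2-layer GNN
    x, edges = trim_to_layer_sketch(layer, nodes_per_hop, edges_per_hop,
                                    x, edges)
    print(len(x), len(edges))
# layer 0 sees 14 nodes / 12 edges; layer 1 only 6 nodes / 4 edges.
```

In the real implementation, the same per-hop counts returned by NeighborLoader drive this slicing over the node feature matrix and the (sorted) edge_index tensor.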
Additionally, we added support for weighted/biased sampling in NeighborLoader/LinkNeighborLoader scenarios. For this, store edge weights on your data object and pass the attribute name via weight_attr during loader initialization, and PyG will pick up these weights to perform weighted/biased sampling (#8038):
data = Data(num_nodes=5, edge_index=edge_index, edge_weight=edge_weight)
loader = NeighborLoader(
data,
num_neighbors=[10, 10],
weight_attr='edge_weight',
)
batch = next(iter(loader))
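Conceptually, weighted sampling simply draws each node's neighbors with probability proportional to the corresponding edge weights. The snippet below sketches this behavior in plain Python; sample_neighbors is a hypothetical helper for illustration, not the actual pyg-lib sampler:

```python
import random

# Illustrative sketch (NOT the pyg-lib sampler): draw `k` neighbors of a
# node with probability proportional to the corresponding edge weights.
def sample_neighbors(neighbors, weights, k, seed=0):
    rng = random.Random(seed)
    return rng.choices(neighbors, weights=weights, k=k)

neighbors = [1, 2, 3, 4]
weights = [0.0, 0.0, 0.0, 1.0]  # all probability mass on node 4
print(sample_neighbors(neighbors, weights, k=3))  # -> [4, 4, 4]
```

With uniform weights this reduces to the default (unweighted) neighbor sampling.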
As part of our algorithm and documentation sprints (#7892), we have added:
- MixHopConv: “MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing” (examples/mixhop.py) (#8025)
- LCMAggregation: “Learnable Commutative Monoids for Graph Neural Networks” (examples/lcm_aggr_2nd_min.py) (#7976, #8020, #8023, #8026, #8075)
- DirGNNConv: “Edge Directionality Improves Learning on Heterophilic Graphs” (examples/dir_gnn.py) (#7458)
- Performer in GPSConv: “Recipe for a General, Powerful, Scalable Graph Transformer” (examples/graph_gps.py) (#7465)
- PMLP: “Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs” (examples/pmlp.py) (#7470, #7543)
- RotatE: “RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space” (examples/kge_fb15k_237.py) (#7026)
- NeuralFingerprint: “Convolutional Networks on Graphs for Learning Molecular Fingerprints” (#7919)
- HM (#7515), BrcaTcga (#7994), MyketDataset (#7959), Wikidata5M (#7864), OSE_GVCS (#7811), (#7479), (#7483), (#7442), (#7441), (#7398), (#7011), (#8112), (#8102)
- CaptumExplainer (examples/captum_explainer_hetero_link.py) (#7096)
- LightGCN on AmazonBook for recommendation () (#7603)

Join our Slack here if you're interested in joining community sprints in the future!
Data.keys() is now a method instead of a property (#7629):

| <=2.3 | 2.4 |
|---|---|
| data.keys | data.keys() |
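For downstream code, this change means adding parentheses at every call site. The toy classes below (written for this post, not PyG code) illustrate the before/after behavior and why the method form matches the built-in dict API:

```python
# Illustrative migration sketch (plain Python, no PyG required): a `keys`
# property becoming a `keys()` method means call sites add parentheses,
# mirroring the built-in dict.keys() API.
class OldData:           # <= 2.3 style: `keys` is a property
    @property
    def keys(self):
        return ['x', 'edge_index']

class NewData:           # 2.4 style: `keys` is a method
    def keys(self):
        return ['x', 'edge_index']

assert OldData().keys == ['x', 'edge_index']    # attribute access
assert NewData().keys() == ['x', 'edge_index']  # method call
```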
- Deprecated FastHGTConv in favor of HGTConv (#7117)
- Removed the layer_type argument from GraphMaskExplainer (#7445)
- Renamed the dest argument to dst in utils.geodesic_distance (#7708)
- Deprecated contrib.explain.GraphMaskExplainer in favor of explain.algorithm.GraphMaskExplainer (#7779)

Data and HeteroData improvements
- HeteroData.validate() (#7995)
- HeteroData support in to_networkx (#7713)
- Data.sort() and HeteroData.sort() (#7649)
- HeteroData.to_homogeneous() in case feature dimensionalities do not match (#7374)
- torch.nested_tensor support in Data and Batch (#7643, #7647)
- keep_inter_cluster_edges option to ClusterData to support inter-subgraph edge connections when doing graph partitioning (#7326)

Data-loading improvements
- Dataset, e.g., dataset[:0.9] (#7915)
- save and load methods to InMemoryDataset (#7250, #7413)
- IBMBNodeLoader and IBMBBatchLoader data loaders (#6230)
- HyperGraphData to support hypergraphs (#7611)
- CachedLoader (#7896, #7897)
- NodeLoader and LinkLoader (#7572)
- PrefetchLoader capabilities (#7376, #7378, #7383)
- NodeLoader and LinkLoader (#7197)

Better support for sparse tensors
- SparseTensor support to WLConvContinuous, GeneralConv, PDNConv and ARMAConv (#8013)
- torch_sparse.SparseTensor logic to utilize torch.sparse_csr instead (#7041)
- torch.sparse.Tensor in DataLoader (#7252)
- torch.jit.script within MessagePassing layers without torch_sparse being installed (#7061, #7062)
- torch.sparse.Tensor (#7037)
- Data.num_edges for native torch.sparse.Tensor adjacency matrices (#7104)
- cross_entropy implementation (#7447, #7466)

Integration with 3rd-party libraries
torch_geometric.transforms
- HalfHop graph upsampling augmentation (#7827)
- Cartesian, LocalCartesian and Distance transformations (#7533, #7614, #7700)
- add_pad_mask argument to the Pad transform (#7339)
- NodePropertySplit transformation for creating node-level splits using structural node properties (#6894)
- AddRemainingSelfLoops transformation (#7192)
- HeteroConv for layers that have a non-default argument order, e.g., GCN2Conv (#8166)
- ModuleDict and ParameterDict (#8163)
- DynamicBatchSampler.__len__ to raise an error in case num_steps is undefined (#8137)
- DimeNet models (#8019)
- batch.e_id was not correctly computed on unsorted graph inputs (#7953)
- from_networkx conversion from nx.stochastic_block_model graphs (#7941)
- bias_initializer in HeteroLinear (#7923)
- HGBDataset (#7907)
- SetTransformerAggregation produced NaN values for isolated nodes (#7902)
- summary on modules with uninitialized parameters (#7884)
- add_self_loops for a dynamic number of nodes (#7330)
- PNAConv.get_degree_histogram (#7830)
- edge_label_time when using temporal sampling on homogeneous graphs (#7807)
- edge_label_index computation in LinkNeighborLoader for the homogeneous+disjoint mode (#7791)
- CaptumExplainer for binary classification tasks (#7787)
- HeteroData (#7714)
- get_mesh_laplacian for normalization="sym" (#7544)
- dim_size to initialize output size of the EquilibriumAggregation layer (#7530)
- SparseTensor (#7519)
- scaler tensor in GeneralConv to the correct device (#7484)
- HeteroLinear bug when used via mixed precision (#7473)
- utils.spmm (#7428)
- QuantileAggregation when dim_size is passed (#7407)
- LightGCN.recommendation_loss() to only use the embeddings of the nodes involved in the current mini-batch (#7384)
- to_hetero_with_bases (#7363)
- node_default and edge_default attributes in from_networkx (#7348)
- HGTConv utility function _construct_src_node_feat (#7194)
- subgraph on unordered inputs (#7187)
- HeteroDictLinear (#7185)
- numpy incompatibility when reading files for Planetoid datasets (#7141)
- CaptumExplainer to be called multiple times in a row (#7391)
- AddLaplacianEigenvectorPE for small-scale graphs (#8143)
- top_k computation in TopKPooling (#7737)
- GIN implementation in benchmarks to apply sequential batch normalization (#7955)
- QM9 data pre-processing to include the SMILES string (#7867)
- training flag in to_hetero modules (#7772)
- add_random_edge to only add true negative edges (#7654)
- BasicGNN models in DeepGraphInfomax (#7648)
- num_edges parameter to the forward method of HypergraphConv (#7560)
- max_num_elements parameter to the forward method of GraphMultisetTransformer, GRUAggregation, LSTMAggregation, SetTransformerAggregation and (#7529, #7367)
- ClusterLoader to integrate pyg-lib METIS routine (#7416)
- filter_per_worker option will not get automatically inferred by default based on the device of the underlying data (#7399)
- fill_value as a torch.tensor to utils.to_dense_batch (#7367)
- NeighborLoader instead of NeighborSampler (#7152)
- batch_size argument to avg_pool_x and max_pool_x (#7216)
- from_networkx memory footprint by reducing unnecessary copies (#7119)
- batch_size argument to LayerNorm, GraphNorm, InstanceNorm, GraphSizeNorm and PairNorm (#7135)
- MultiAggregation (#7077)
- HeterophilousGraphDataset are now undirected by default (#7065)
- batch_size and max_num_nodes arguments to MemPooling layer (#7239)

Full Changelog: https://github.com/pyg-team/pytorch_geometric/compare/2.3.0...2.4.0
- MovieLens1M (examples/lightgcn.py)
- FeatureStore (examples/kuzu) (#7298)
- ogbn-papers100M (examples/papers100m_multigpu.py) (#7921)
- OGC model on Cora (examples/ogc.py) (#8168)
- graphlearn-for-pytorch (examples/distributed/graphlearn_for_pytorch) (#7402)
- SortAggregation