# Graph Conversions

Conversions are **structural graph rewrites** that normalize the computation graph before fusion patterns are applied. They change the graph topology, replacing one set of operators with a functionally equivalent set that downstream patterns can match. Conversion patterns have `is_conversion = True` and are applied **iteratively** by `transform()` until no new matches are found.

## Built-in TensorRT conversions

### FlattenLinearToConv1x1Pattern

**Matches:** `Flatten (4D→2D) → [Dropout/Activation]* → Linear`

**Replaces with:** `[Dropout/Activation]* → Conv2d(1×1) → Flatten`

Many classification networks end with:

```
AdaptiveAvgPool2d → Flatten → Linear(1000)
```

The `Linear` layer cannot be fused with preceding operations in TensorRT. After this conversion it becomes:

```
AdaptiveAvgPool2d → Conv2d(1×1, out=1000) → Flatten
```

The `Conv2d(1×1)` can then be matched by downstream fusion patterns (`ConvBNPattern`, etc.) and benefits from INT8 quantization.

**Weight conversion:** The `Linear` weight matrix of shape `(out_features, in_features)` is reshaped to `(out_features, in_features, 1, 1)` for the equivalent `Conv2d`.

**Element-wise ops:** Activation or dropout layers between `Flatten` and `Linear` are absorbed by a `Wildcard` and moved before the `Conv2d` in the replacement graph.

**Affected architectures:** ResNet, ConvNeXt, EfficientNet, MobileNet, and most classification backbones with a `Flatten → Linear` classifier head.

### RemoveIdentityAdaptiveAvgPoolPattern

**Matches:** `AdaptiveAvgPool2d` where `output_size == input spatial dims`

**Replaces with:** nothing (erases the node)

In ConvNeXt-style architectures, `AdaptiveAvgPool2d(output_size=(7, 7))` is applied to a 7×7 feature map, making the op a mathematical identity. Removing it simplifies the graph and prevents it from blocking fusion of surrounding operators.
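The identity case can be demonstrated directly in plain PyTorch. A minimal sketch (the 768-channel 7×7 tensor is an illustrative assumption, loosely modeled on a ConvNeXt stage output):

```python
import torch
import torch.nn as nn

# Assumed ConvNeXt-like feature map: 7x7 spatial dims
x = torch.randn(1, 768, 7, 7)

# output_size == input spatial dims, so the pattern would erase this node
pool = nn.AdaptiveAvgPool2d(output_size=(7, 7))

# Each output element averages exactly one input element: an exact identity
assert torch.equal(pool(x), x)
```

Because each pooling window covers a single element, the equality is exact, not merely within floating-point tolerance.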
:::{note}
Requires shape metadata (via `torch.fx.passes.shape_prop.ShapeProp`) to determine that input and output spatial dimensions match. Nodes without shape metadata are skipped with a warning.
:::

### DecomposeMultiheadAttentionPattern

**Matches:** `nn.MultiheadAttention` (self-attention, `batch_first=True`)

**Replaces with:** three explicit modules:

1. `MHAInProjection` — the combined Q/K/V linear projection
2. `ScaledDotProductAttention` — the attention computation
3. `nn.Linear` — the output projection

PyTorch's `nn.MultiheadAttention` is a monolithic module, so TensorRT cannot fuse or quantize its internal operations. Decomposing it into visible sub-modules in the FX graph lets each component be independently fused and quantized.

**Restrictions:** Only self-attention (`_qkv_same_embed_dim=True`, `batch_first=True`) without masks or `add_zero_attn` is supported. Unsupported configurations are skipped with a warning.

**Affected architectures:** Vision Transformer (ViT), DeiT, and any model using `nn.MultiheadAttention`.

## When conversions matter

### ResNet50

The `FlattenLinearToConv1x1Pattern` converts the final classifier:

```
avgpool (AdaptiveAvgPool2d) → flatten → fc (Linear)
```

becomes:

```
avgpool (AdaptiveAvgPool2d) → fc (Conv2d 1×1) → flatten
```

The `Conv2d` is then picked up by `ConvBNPattern` during the fusion stage.

### ConvNeXt

ConvNeXt uses `AdaptiveAvgPool2d` at several points where the spatial dimensions do not change. The `RemoveIdentityAdaptiveAvgPoolPattern` cleans these up, and `FlattenLinearToConv1x1Pattern` converts the head.
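The weight reshape behind `FlattenLinearToConv1x1Pattern` can be checked numerically in plain PyTorch. A minimal sketch, with ResNet-50-like sizes (2048 → 1000) assumed purely for illustration:

```python
import torch
import torch.nn as nn

linear = nn.Linear(2048, 1000)

# Reshape (out_features, in_features) -> (out_features, in_features, 1, 1)
conv = nn.Conv2d(2048, 1000, kernel_size=1)
with torch.no_grad():
    conv.weight.copy_(linear.weight.reshape(1000, 2048, 1, 1))
    conv.bias.copy_(linear.bias)

x = torch.randn(1, 2048, 1, 1)  # post-avgpool feature map

y_before = linear(torch.flatten(x, 1))  # original: Flatten -> Linear
y_after = torch.flatten(conv(x), 1)     # rewritten: Conv2d(1x1) -> Flatten
assert torch.allclose(y_before, y_after, atol=1e-5)
```

A 1×1 convolution over a 1×1 spatial map computes exactly the same matrix product as the `Linear` layer, so the outputs agree up to floating-point rounding.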
For ConvNeXt models, shape propagation is required before running conversions:

```python
from torch.fx.passes.shape_prop import ShapeProp

graph_module = torch.fx.symbolic_trace(model)
ShapeProp(graph_module).propagate(torch.randn(1, 3, 224, 224))
result = transform(graph_module, patterns=TENSORRT_PATTERNS)
```

### Vision Transformer (ViT)

`DecomposeMultiheadAttentionPattern` expands each `nn.MultiheadAttention` into three sub-modules. After decomposition, the fusion pass applies `MHAInProjectionPattern` and `ScaledDotProductAttentionPattern` to wrap them in quantization-aware fused modules. The output projection `nn.Linear` is matched by `LinearPattern`.

## Running conversions only

```python
from embedl_deploy import transform
from embedl_deploy.tensorrt import TENSORRT_CONVERSION_PATTERNS

# Apply only structural conversions
result = transform(model, patterns=TENSORRT_CONVERSION_PATTERNS)
converted_model = result.model
```

This is useful for debugging: you can inspect the graph after conversions to verify the structural rewrites before running fusions.
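One way to do that inspection is to walk the FX graph node by node. A minimal sketch that traces a toy classifier head standing in for a converted model (any converted `torch.fx.GraphModule` can be inspected the same way):

```python
import torch.nn as nn
from torch.fx import symbolic_trace

# Toy classifier head; stands in for the GraphModule produced by transform()
model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))
gm = symbolic_trace(model)

# Print each node's opcode and target to verify which operators remain
for node in gm.graph.nodes:
    print(node.op, node.target)
```

After the conversion stage you would expect, for example, the `Flatten → Linear` pair of a classifier head to appear as `Conv2d` followed by `Flatten` in this listing.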