==== Intel XMX Supported Combinations
Intel XMX is currently available on devices with the following architectures:
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_dg2_g10`, `architecture::intel_gpu_dg2_g11`,
`architecture::intel_gpu_dg2_g12`, `architecture::intel_gpu_arl_h`,
`architecture::intel_gpu_ptl_h`, and `architecture::intel_gpu_ptl_u`.

[frame="none",options="header"]
|======================
| A type | B type | C type | D type | M | N | K | device
.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10`,
`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10`,
`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10`,
`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10`,
`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.8+|`matrix_type::fp16` .8+| `matrix_type::fp16` .8+|
`matrix_type::fp32` .8+|`matrix_type::fp32` .1+| 16 .1+| 16 | 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
.2+| 1 .2+| 64 | 16 |32
.2+| 32 .2+| 64 | 16 |32
.2+| +<=+ 8 | 16 .2+| 16
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
`matrix_type::fp16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16
.6+| `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 | 16 | 32
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
`matrix_type::fp32` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
`matrix_type::fp16` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 | 16 | 32
.8+| `matrix_type::bf16` .8+| `matrix_type::bf16` .8+|
`matrix_type::fp32` .8+| `matrix_type::fp32` | 16 | 16 | 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
.2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 | 16 |32
.2+| +<=+ 8 | 16 .2+| 16
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
`matrix_type::bf16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16 .6+|
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
`matrix_type::fp32` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
`matrix_type::bf16` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
| `matrix_type::tf32` | `matrix_type::tf32` |
`matrix_type::fp32` | `matrix_type::fp32` | +<=+ 8 | 16 | 8 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`,
`architecture::intel_gpu_ptl_h`, `architecture::intel_gpu_ptl_u`
|======================

===== Restrictions on `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`, `architecture::intel_gpu_bmg_g31`, `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`, and `architecture::intel_gpu_ptl_u`

- The `stride` parameter to `joint_matrix_load` and
`joint_matrix_store` has the following restrictions:
|11 |2024-04-29 |Yury Plyakhin | Add 1x64x16 supported combination for
Intel XMX (intel_gpu_pvc)
|12 |2024-06-14 |Jack Kirk | Add note on sm version device matching issue.
|======================