Skip to content

Conversation

@christiangnrd
Copy link
Member

@christiangnrd christiangnrd commented Sep 30, 2025

The goal is to allow for kernels to be written without relying on KernelAbstractions macros

See #562 for initial discussion

@vchuravy @maleadt

@github-actions
Copy link
Contributor

github-actions bot commented Sep 30, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Click here to view the suggested changes.
diff --git a/src/intrinsics.jl b/src/intrinsics.jl
index e0b2c46a..2025bf5e 100644
--- a/src/intrinsics.jl
+++ b/src/intrinsics.jl
@@ -159,7 +159,6 @@ end
 function _print end
 
 
-
 """
     Kernel{Backend, Kern}
 

@christiangnrd christiangnrd force-pushed the intrinsics branch 3 times, most recently from c16a665 to b166baa Compare October 2, 2025 19:58
@vchuravy
Copy link
Member

vchuravy commented Oct 6, 2025

Can you rebase?

@christiangnrd

This comment was marked as outdated.

@vchuravy vchuravy changed the title More KernelIntrinsics KernelIntrinsics API Oct 6, 2025
@vchuravy vchuravy added this to the 0.10.0 milestone Oct 6, 2025
@vchuravy vchuravy mentioned this pull request Oct 6, 2025
@vchuravy vchuravy linked an issue Oct 6, 2025 that may be closed by this pull request
@vchuravy
Copy link
Member

vchuravy commented Oct 6, 2025

@christiangnrd do you think we need a lower-level kernel launch interface? Otherwise the three-dimensional indices would be superflous.

@christiangnrd
Copy link
Member Author

christiangnrd commented Oct 6, 2025

do you think we need a lower-level kernel launch interface?

I'd been thinking about that and I think so. Would something that, assuming you wrote the whole kernel with KernelIntrinsics, could be used interchangeably with the backend's kernel launch macro (@cuda, @metal, etc.) make sense/be feasible?

Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Maybe we can also add some simple shuffle operations?

Comment on lines 107 to 125
"""
barrier()
After a `barrier()` call, all read and writes to global and local memory
from each thread in the workgroup are visible in from all other threads in the
workgroup.
!!! note
`barrier()` must be encountered by all workitems of a work-group executing the kernel or by none at all.
!!! note
Backend implementations **must** implement:
```
@device_override barrier()
```
"""
function barrier()
error("Group barrier used outside kernel or not captured")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specify the memory semantics that are required by the barrier implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we should specify the semantics in the error message or that the semantics described in the docstring be more specific?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics in the docs string should be more specific

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


return quote
$SharedMemory($(esc(T)), Val($(esc(dims))), Val($(QuoteNode(id))))
$SharedMemory($(esc(T)), Val($(esc(dims))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically an ABI break, which I had avoided so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ca11482 sufficient or do we still need to require backends to take in a third unused argument on the KI side?

@vchuravy vchuravy requested a review from maleadt November 6, 2025 19:43
@@ -0,0 +1,342 @@
module KernelIntrinsics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe add a little module-level docstring elaborating the positioning of KernelIntrinsics vs. the KernelAbstractions intrinsics? AFAIU, everything here operating in the launch domain vs. KA.jl returning positions in the domain of the array or problem being launched over (i.e. ndrange).

Copy link
Member Author

@christiangnrd christiangnrd Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thrown up something. Very open to feedback

@christiangnrd
Copy link
Member Author

I'm thinking we should either directly rename KernelIntrinsics to KI or provide the alias.

@vchuravy
Copy link
Member

vchuravy commented Nov 7, 2025

I am okay with providing the alias!

@vchuravy
Copy link
Member

vchuravy commented Nov 7, 2025

One additional thing to consider is what intrinsics do we need to implement something like #559

Can we add some primitive shuffles? Or add a test kernel that implements a reduction correctly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lower-level kernel form?

3 participants