Discussion about this post

Rainbow Roxy

Hey, great read as always. It’s fascinating how you break this down, but I was wondering if you could elaborate a bit on the actual masking implementation that stops the model from “peeking” at future tokens, as that always strikes me as the clever bit.
