<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Into AI]]></title><description><![CDATA[Helping you become a 100X AI Engineer]]></description><link>https://www.intoai.pub</link><image><url>https://substackcdn.com/image/fetch/$s_!Ea4T!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png</url><title>Into AI</title><link>https://www.intoai.pub</link></image><generator>Substack</generator><lastBuildDate>Thu, 21 May 2026 06:05:29 GMT</lastBuildDate><atom:link href="https://www.intoai.pub/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Dr. Ashish Bamania]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[intoai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[intoai@substack.com]]></itunes:email><itunes:name><![CDATA[Dr. Ashish Bamania]]></itunes:name></itunes:owner><itunes:author><![CDATA[Dr. Ashish Bamania]]></itunes:author><googleplay:owner><![CDATA[intoai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[intoai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Dr. Ashish Bamania]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Cross-Entropy Loss in LLMs, Explained Visually]]></title><description><![CDATA[A visual guide to understand how LLMs are trained using the cross-entropy loss, step by step.]]></description><link>https://www.intoai.pub/p/cross-entropy-loss-in-llms-explained</link><guid isPermaLink="false">https://www.intoai.pub/p/cross-entropy-loss-in-llms-explained</guid><dc:creator><![CDATA[Dr. 
Ashish Bamania]]></dc:creator><pubDate>Wed, 20 May 2026 11:43:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!J7Df!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J7Df!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset image2-full-screen"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J7Df!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!J7Df!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!J7Df!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!J7Df!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J7Df!,w_5760,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;full&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2939462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/198384175?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-fullscreen" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J7Df!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!J7Df!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!J7Df!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!J7Df!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7020d920-024f-4c8b-b53c-065e3a714b5a_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>LLMs are trained to predict the next token well.</p><p>Given all the classes (tokens) in the vocabulary, an LLM is trained to pick the right one at each training step. This is essentially a <strong><a href="https://en.wikipedia.org/wiki/Multiclass_classification">multi-class classification problem</a></strong> in machine learning.</p><p>The <strong>Categorical Cross-entropy loss</strong> (or simply the <strong>Cross-entropy loss</strong>) is used to train an LLM to solve this multi-class classification problem.</p>
      <p>
          <a href="https://www.intoai.pub/p/cross-entropy-loss-in-llms-explained">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[This Week In AI Research (10-16 May 26) 🗓️]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-10-16-may</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-10-16-may</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Mon, 18 May 2026 09:52:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!X9D4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X9D4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X9D4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!X9D4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!X9D4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!X9D4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X9D4!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:2845198,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X9D4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!X9D4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!X9D4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!X9D4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F761f6ce9-b092-45ae-b601-f9eab4699ade_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>10. 
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling</h3><p>This research paper introduces a simple method for converting a post-trained reasoning LLM into a rigorous Olympiad-level solver.</p><p>The method first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y8JH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y8JH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 424w, https://substackcdn.com/image/fetch/$s_!y8JH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 848w, https://substackcdn.com/image/fetch/$s_!y8JH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 1272w, https://substackcdn.com/image/fetch/$s_!y8JH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!y8JH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png" width="1456" height="664" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:699746,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y8JH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 424w, https://substackcdn.com/image/fetch/$s_!y8JH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 848w, https://substackcdn.com/image/fetch/$s_!y8JH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 1272w, https://substackcdn.com/image/fetch/$s_!y8JH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b4d9c5a-a717-474f-8280-b1462883f24c_2792x1274.png 1456w" 
sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Using this method, the authors train a 30B model, <strong>SU-01</strong>, which reasons through difficult problems with trajectories exceeding 100K tokens and achieves gold-medal-level scores on IMO 2025 and USAMO 2026.</p><p>This model also shows strong generalization of scientific reasoning to domains beyond mathematics and physics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!URAK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!URAK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 424w, https://substackcdn.com/image/fetch/$s_!URAK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 848w, https://substackcdn.com/image/fetch/$s_!URAK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 1272w, https://substackcdn.com/image/fetch/$s_!URAK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!URAK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png" width="1456" height="738" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:738,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289867,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!URAK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 424w, https://substackcdn.com/image/fetch/$s_!URAK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 848w, https://substackcdn.com/image/fetch/$s_!URAK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 1272w, https://substackcdn.com/image/fetch/$s_!URAK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F411ca401-f340-4313-90dd-9d10921df6a6_2286x1158.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.13301">using this link</a>.</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#127828; <a href="https://www.intoai.pub/p/what-makes-deekseek-v4-so-good">What makes DeepSeek-V4 so good?</a></p></li><li><p>&#127829; <a href="https://www.intoai.pub/p/diffusion-llms-explained-simply">Diffusion LLMs, Explained Simply</a></p></li><li><p>&#127839; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p>&#129391; <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts?utm_source=publication-search">Build and Train a Mixture-of-Experts (MoE) LLM from scratch</a></p></li><li><p>&#127849; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building 
Your First AI Agent</a></p></li><li><p>&#127827; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><div><hr></div><h3>9. Is Grep All You Need? How Agent Harnesses Reshape Agentic Search</h3><p>This research paper examines how different search tools affect LLM agents on retrieval-heavy tasks.</p><p>The authors conducted two experiments:</p><ul><li><p>Comparing grep and vector retrieval on a 116-question sample from LongMemEval, using a custom agent harness (Chronos) and provider-native CLI harnesses (Claude Code, Codex, and Gemini CLI).</p></li><li><p>Comparing grep-only and vector-only retrieval while progressively mixing in additional unrelated conversation history, so that each query is embedded in more distracting material alongside the passages that matter.</p></li></ul><p>In the first experiment, grep generally leads to higher accuracy than vector retrieval across Chronos and the provider CLIs. 
</p><p>At the same time, overall scores still depend heavily on the agent harness and how the tool results are presented to the model, even when the underlying conversation data are the same.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i3-I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i3-I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 424w, https://substackcdn.com/image/fetch/$s_!i3-I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 848w, https://substackcdn.com/image/fetch/$s_!i3-I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!i3-I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i3-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png" width="1456" height="851" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:409967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i3-I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 424w, https://substackcdn.com/image/fetch/$s_!i3-I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 848w, https://substackcdn.com/image/fetch/$s_!i3-I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!i3-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdad60a93-a91c-46b2-a514-f7affe0585a7_2020x1180.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.15184">using this link</a>.</p><div><hr></div><h3>8. &#948;-mem: Efficient Online Memory for Large Language Models</h3><p>&#948;-mem is a lightweight online memory system for LLMs that lets them reuse past information without expanding the context window or retraining the entire model. </p><p>&#948;-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the LLM&#8217;s attention computation during generation. 
</p><p>With only an 8 &#215; 8 online memory state, &#948;-mem improves the average score to 1.10&#215; that of the LLM backbone and 1.15&#215; that of the strongest non-&#948;-mem memory baseline.</p><p>It achieves larger gains on memory-heavy benchmarks, reaching 1.31&#215; on <a href="https://github.com/HUST-AI-HYZ/MemoryAgentBench">MemoryAgentBench</a> and 1.20&#215; on <a href="https://snap-research.github.io/locomo/">LoCoMo</a>, while largely preserving general capabilities.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e_-z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e_-z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 424w, https://substackcdn.com/image/fetch/$s_!e_-z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 848w, https://substackcdn.com/image/fetch/$s_!e_-z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!e_-z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!e_-z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png" width="1456" height="803" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:803,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:455520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e_-z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 424w, https://substackcdn.com/image/fetch/$s_!e_-z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 848w, https://substackcdn.com/image/fetch/$s_!e_-z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!e_-z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3c7f3a3-7b9f-45f0-8f7d-06f748d73635_2316x1278.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.12357">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>7. 
<strong>FutureSim: Replaying World Events to Evaluate Adaptive Agents</strong></h3><p>This research paper introduces <strong>FutureSim</strong>, a benchmark that tests whether AI agents can adapt over time: it replays real-world news events in chronological order and asks agents to predict outcomes that fall after their knowledge cutoff.</p><p>Agents may use only the news available up to each simulated date, can update their predictions as new information arrives, and are scored on accuracy and Brier skill score. </p><p>The results show that the best agent reaches only 25% accuracy, and many agents score worse on Brier skill score than they would by making no prediction at all.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TqP_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TqP_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 424w, https://substackcdn.com/image/fetch/$s_!TqP_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 848w, https://substackcdn.com/image/fetch/$s_!TqP_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 1272w, https://substackcdn.com/image/fetch/$s_!TqP_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TqP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png" width="1456" height="706" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:706,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:505197,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TqP_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 424w, https://substackcdn.com/image/fetch/$s_!TqP_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 848w, https://substackcdn.com/image/fetch/$s_!TqP_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 1272w, https://substackcdn.com/image/fetch/$s_!TqP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4602660-1c98-4a72-9f6c-d26510ddf871_2418x1172.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.15188">using this link</a>.</p><div><hr></div><h3>6. 
World Action Models</h3><p>World Action Models (WAMs) go a step beyond Vision-Language-Action robot policies: instead of directly mapping observations and language to actions, they model future world states and actions together, giving embodied agents a form of predictive physical foresight.</p><p>This survey is the first systematic account of the WAM landscape, clarifying key architectural paradigms and their trade-offs, and identifying open challenges and future opportunities for this rapidly evolving field.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y_4L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y_4L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 424w, https://substackcdn.com/image/fetch/$s_!Y_4L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 848w, https://substackcdn.com/image/fetch/$s_!Y_4L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 1272w, https://substackcdn.com/image/fetch/$s_!Y_4L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Y_4L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png" width="1456" height="861" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:861,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:683743,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y_4L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 424w, https://substackcdn.com/image/fetch/$s_!Y_4L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 848w, https://substackcdn.com/image/fetch/$s_!Y_4L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 1272w, https://substackcdn.com/image/fetch/$s_!Y_4L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf386ad-c347-4879-9b97-9658a58d4f9b_2218x1312.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.12090">using this link</a>.</p><div><hr></div><h3>5. 
The Truth Lies Somewhere in the Middle (of the Generated Tokens)</h3><p>The research paper suggests that, for autoregressive LLMs, the best semantic representation is often not the final generated token or a prompt token, but the mean-pooled hidden states across the generated tokens.</p><p>This finding is quantified through kernel alignment to reference spaces in language, vision, and protein domains, where results indicate that improvements from mean pooling are consistent with information being distributed across generated tokens rather than localized to a single position.</p><p>Furthermore, representations coming from generated tokens outperform those from prompt tokens.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N37C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N37C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 424w, https://substackcdn.com/image/fetch/$s_!N37C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 848w, https://substackcdn.com/image/fetch/$s_!N37C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 1272w, 
https://substackcdn.com/image/fetch/$s_!N37C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N37C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png" width="1354" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e101f071-7449-409a-a143-0ae903ef3c50_1354x816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1354,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:393363,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N37C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 424w, https://substackcdn.com/image/fetch/$s_!N37C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 848w, https://substackcdn.com/image/fetch/$s_!N37C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 
1272w, https://substackcdn.com/image/fetch/$s_!N37C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe101f071-7449-409a-a143-0ae903ef3c50_1354x816.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.09969">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>4. MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs</h3><p>The research paper presents <strong>MetaBackdoor</strong>, a new type of LLM backdoor attack that relies on positional information, particularly input or conversation length, as the trigger instead of suspicious words or hidden characters. </p><p>This means a model might act normally with ordinary text but switch to a harmful mode once the prompt or chat history reaches a particular length. This switch could lead to leaking system prompts, private context, or making harmful tool calls. </p><p>While current backdoor defenses typically search for odd input content, this attack shows that even clean, semantically normal inputs can trigger hidden behavior via the model&#8217;s positional encoding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jrP4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jrP4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 424w, https://substackcdn.com/image/fetch/$s_!jrP4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 848w, https://substackcdn.com/image/fetch/$s_!jrP4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jrP4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jrP4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png" width="1294" height="780" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1294,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:255759,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jrP4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 424w, https://substackcdn.com/image/fetch/$s_!jrP4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 848w, https://substackcdn.com/image/fetch/$s_!jrP4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 
1272w, https://substackcdn.com/image/fetch/$s_!jrP4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F830305d4-0c2b-48ae-ba0d-a5c2808ce613_1294x780.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.15172">using this link</a>.</p><div><hr></div><h3>3. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling</h3><p>This research paper presents <strong>AutoTTS</strong>, a framework in which LLM agents autonomously discover better test-time scaling strategies, rather than relying on 
humans. </p><p>AutoTTS views test-time scaling as a controller search problem based on pre-collected reasoning traces. In this way, strategies can be tested cheaply without needing to call the LLM repeatedly. </p><p>Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy&#8211;cost tradeoff over strong manually designed baselines. </p><p>The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery costs only $39.9 and 160 minutes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!54u7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!54u7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 424w, https://substackcdn.com/image/fetch/$s_!54u7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 848w, https://substackcdn.com/image/fetch/$s_!54u7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!54u7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!54u7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png" width="1456" height="624" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:624,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:635635,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!54u7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 424w, https://substackcdn.com/image/fetch/$s_!54u7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 848w, https://substackcdn.com/image/fetch/$s_!54u7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!54u7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91a276c8-e8c3-455e-af44-647072dda009_2526x1082.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.08083">using this link</a>.</p><div><hr></div><h3>2. VGGT-&#937;</h3><p>The research paper presents <strong>VGGT-&#937;</strong>, a larger and more efficient 3D reconstruction model for static and dynamic scenes. </p><p>It builds on the <a href="https://arxiv.org/abs/2503.11651">Visual Geometry Grounded Transformer (VGGT)</a> by simplifying the design, using compact registers to share scene information across frames, and reducing training memory to about 30% of what the previous model required. 
</p><p>This allows it to train on significantly more data and achieve better reconstruction and camera estimation, including a reported 77% improvement in camera accuracy on <a href="https://sintel.is.tue.mpg.de/">Sintel</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eSHi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eSHi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 424w, https://substackcdn.com/image/fetch/$s_!eSHi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 848w, https://substackcdn.com/image/fetch/$s_!eSHi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!eSHi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eSHi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png" width="1456" height="732" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1191672,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eSHi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 424w, https://substackcdn.com/image/fetch/$s_!eSHi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 848w, https://substackcdn.com/image/fetch/$s_!eSHi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!eSHi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7dbbe3-e8f3-45c4-b0c2-e4dd171b59be_2460x1236.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Read more about this research <a href="https://arxiv.org/pdf/2605.15195">using this link</a>.</p><div><hr></div><h3>1. ELF: Embedded Language Flows</h3><p>The paper introduces <strong>ELF (Embedded Language Flows)</strong>, a new class of diffusion-based language model that generates text primarily in continuous embedding space instead of directly using discrete tokens. </p><p>It employs Flow Matching to clean up embeddings, removing noise and turning them into clear language representations, only converting them back into tokens at the final step. 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k_BV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k_BV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 424w, https://substackcdn.com/image/fetch/$s_!k_BV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 848w, https://substackcdn.com/image/fetch/$s_!k_BV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 1272w, https://substackcdn.com/image/fetch/$s_!k_BV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k_BV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png" width="1456" height="358" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:368234,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k_BV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 424w, https://substackcdn.com/image/fetch/$s_!k_BV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 848w, https://substackcdn.com/image/fetch/$s_!k_BV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 1272w, https://substackcdn.com/image/fetch/$s_!k_BV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01185ff-bc51-48e2-8144-bef788d4a869_2280x560.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>ELF surpasses previous discrete and continuous diffusion language models, achieving better generation quality with fewer sampling steps and using 10X fewer training tokens.</p><div 
class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Iesj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Iesj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 424w, https://substackcdn.com/image/fetch/$s_!Iesj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 848w, https://substackcdn.com/image/fetch/$s_!Iesj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 1272w, https://substackcdn.com/image/fetch/$s_!Iesj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Iesj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png" width="1456" height="265" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:265,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159720,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197974392?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Iesj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 424w, https://substackcdn.com/image/fetch/$s_!Iesj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 848w, https://substackcdn.com/image/fetch/$s_!Iesj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 1272w, https://substackcdn.com/image/fetch/$s_!Iesj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1be4afdc-4046-40e4-9cb7-514a91d4b4b9_2228x406.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/pdf/2605.10938">using this link</a>.</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this 
newsletter:</p><ul><li><p>&#127828; <a href="https://www.intoai.pub/p/what-makes-deekseek-v4-so-good">What makes DeepSeek-V4 so good?</a></p></li><li><p>&#127829; <a href="https://www.intoai.pub/p/diffusion-llms-explained-simply">Diffusion LLMs, Explained Simply</a></p></li><li><p>&#127839; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p>&#129391; <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts?utm_source=publication-search">Build and Train a Mixture-of-Experts (MoE) LLM from scratch</a></p></li><li><p>&#127849; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127827; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p>]]></content:encoded></item><item><title><![CDATA[What makes DeepSeek-V4 so good?]]></title><description><![CDATA[If Uber had used DeepSeek-V4 instead of Claude, their 2026 AI budget would have lasted 7 years rather than only 4 months!]]></description><link>https://www.intoai.pub/p/what-makes-deekseek-v4-so-good</link><guid isPermaLink="false">https://www.intoai.pub/p/what-makes-deekseek-v4-so-good</guid><dc:creator><![CDATA[Dr. 
Ashish Bamania]]></dc:creator><pubDate>Fri, 15 May 2026 17:02:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!83mq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!83mq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!83mq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 424w, https://substackcdn.com/image/fetch/$s_!83mq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 848w, https://substackcdn.com/image/fetch/$s_!83mq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!83mq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!83mq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png" width="1456" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:288596,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!83mq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 424w, https://substackcdn.com/image/fetch/$s_!83mq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 848w, https://substackcdn.com/image/fetch/$s_!83mq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!83mq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f8a877-47be-477c-8325-5e4dfe62c2c1_2376x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>DeepSeek recently released previews of two models in its V4 series:</p><ul><li><p><strong>DeepSeek-V4-Pro</strong> with 1.6T parameters (49B activated during inference)</p></li><li><p><strong>DeepSeek-V4-Flash</strong> with 284B parameters (13B activated during inference)</p></li></ul><p>Both models are <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts">Mixture-of-Experts (MoE) models</a> and support a context length of&nbsp;1 million tokens.</p><p>These models use multiple architectural and optimization updates, with the main ones being:</p><ol><li><p>A new hybrid attention architecture that uses <strong>Compressed Sparse Attention (CSA)</strong> and <strong>Heavily Compressed Attention (HCA)</strong> to reduce the memory bottleneck</p></li><li><p><strong><a 
href="https://arxiv.org/abs/2512.24880">Manifold-Constrained Hyper-Connections (mHC)</a></strong> to strengthen conventional residual connections, improving the stability of signal propagation across layers while preserving model expressivity</p></li><li><p><strong><a href="https://kellerjordan.github.io/posts/muon/">Muon Optimizer</a></strong> for faster convergence and greater training stability</p></li></ol><p>Together with many smaller changes, these updates make the models highly efficient. DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs&nbsp;and&nbsp;10% of the KV cache&nbsp;compared with&nbsp;<a href="https://arxiv.org/abs/2512.02556">DeepSeek-V3.2</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!is-6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!is-6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 424w, https://substackcdn.com/image/fetch/$s_!is-6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 848w, https://substackcdn.com/image/fetch/$s_!is-6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!is-6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!is-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png" width="1456" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:928315,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!is-6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 424w, https://substackcdn.com/image/fetch/$s_!is-6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 848w, 
https://substackcdn.com/image/fetch/$s_!is-6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!is-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54db41a0-5ff4-4645-846b-16c691cfda00_2692x1400.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Benchmark performance, inference FLOPs, and KV cache size of DeepSeek-V4 and baselines (<a 
href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf">Source</a>)</figcaption></figure></div><p>And according to the <a href="https://x.com/sdrzn/status/2047537442719428969">following post on X</a>, if Uber had used DeepSeek instead of Claude, their 2026 AI budget would have lasted 7 years rather than only 4 months. That&#8217;s incredibly efficient!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uox_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uox_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 424w, https://substackcdn.com/image/fetch/$s_!uox_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 848w, https://substackcdn.com/image/fetch/$s_!uox_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 1272w, https://substackcdn.com/image/fetch/$s_!uox_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uox_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png" width="1456" height="851" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:386715,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uox_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 424w, https://substackcdn.com/image/fetch/$s_!uox_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 848w, https://substackcdn.com/image/fetch/$s_!uox_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 1272w, https://substackcdn.com/image/fetch/$s_!uox_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2345084-0925-4408-a70b-ca5c2b32c468_1626x950.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><h3>The magic of Mixture-of-Experts</h3><p>In an LLM, the feed-forward network (FFN) in each Transformer block is often the most computationally demanding component and holds the largest share of the parameters.</p><p>This is why a dense feed-forward network is frequently replaced by a Mixture-of-Experts (MoE) feed-forward network. An MoE network improves computational efficiency by activating only a subset of its expert modules for each token.</p><p>DeepSeek models <a href="https://arxiv.org/abs/2412.19437">from V2 onwards</a> (including V4) use <strong>DeepSeekMoE</strong>, which splits the FFN into many smaller experts and routes each token to only a few of them (Routed Experts), while keeping some experts active for all tokens (Shared Experts). 
This significantly reduces its computational cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1_gX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1_gX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 424w, https://substackcdn.com/image/fetch/$s_!1_gX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 848w, https://substackcdn.com/image/fetch/$s_!1_gX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 1272w, https://substackcdn.com/image/fetch/$s_!1_gX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1_gX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png" width="1456" height="477" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:477,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:173949,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1_gX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 424w, https://substackcdn.com/image/fetch/$s_!1_gX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 848w, https://substackcdn.com/image/fetch/$s_!1_gX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 1272w, https://substackcdn.com/image/fetch/$s_!1_gX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F314cc92c-f266-41e4-8eee-11f8350864e7_1966x644.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Architecture of DeepSeekMoE (<a href="https://arxiv.org/abs/2412.19437">Source</a>)</figcaption></figure></div><p>But given that V4 supports a 1M-token context window (compared to V3.2&#8217;s 128K), the attention mechanism becomes a significant bottleneck at inference.</p><p>This is where the new attention architecture introduced by DeepSeek-V4 shines! V4 models use a hybrid attention mechanism combining:</p><ul><li><p>Compressed Sparse Attention (CSA)</p></li><li><p>Heavily Compressed Attention (HCA)</p></li></ul><p>To see how these attention mechanisms work and why they make V4 so efficient, we need to understand attention from the ground up. 
Let&#8217;s start with the very basics and build from there, step by step.</p><div><hr></div><h3>What is Attention?</h3><p>The <a href="https://www.intoai.pub/p/self-attention">(Self) Attention mechanism</a> in a Transformer-based LLM helps the model understand how each word/token relates to every other word/token in a text sequence.</p><p>Given the embedding of each token:</p><ul><li><p>It is first projected into Query (Q), Key (K), and Value (V) vectors.</p></li><li><p>The query vector is compared (technically via the dot product) with the keys of the token itself and of all previous tokens. This comparison produces attention scores that measure how relevant each of those tokens is to the current one.</p></li><li><p>The scores are scaled (divided by the square root of the key dimension), <a href="https://www.intoai.pub/p/causal-mha">masked to block a token from attending to future tokens</a>, passed through softmax, and then used to compute a weighted sum of the value vectors.</p></li><li><p>This weighted sum is the token&#8217;s context-aware representation. The whole operation is called <strong>Masked and Scaled Dot-Product Self-Attention</strong>.</p></li></ul><p>Since every token attends to itself and all the previous tokens, for an input sequence of length <code>N</code>, the attention mechanism has O(<code>N</code><sup>2</sup>) computational complexity. 
This quadratic complexity isn&#8217;t ideal when processing long text sequences.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ih4R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ih4R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 424w, https://substackcdn.com/image/fetch/$s_!Ih4R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 848w, https://substackcdn.com/image/fetch/$s_!Ih4R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 1272w, https://substackcdn.com/image/fetch/$s_!Ih4R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ih4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png" width="728" height="272" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:544,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ih4R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 424w, https://substackcdn.com/image/fetch/$s_!Ih4R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 848w, https://substackcdn.com/image/fetch/$s_!Ih4R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 1272w, https://substackcdn.com/image/fetch/$s_!Ih4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ab64779-95e1-4f62-9e31-93492fd702ab_2522x942.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Calculations taking place in Self-attention</figcaption></figure></div><div><hr></div><h3>What is <strong>Multi-head </strong>Attention?</h3><p>To improve performance, the attention operation is performed not once but several times in parallel, each time using different learned projections of the same token embeddings.</p><p>Each unit of attention calculation in this case is called a <strong>Head</strong>, and the mechanism is called <strong><a href="https://www.intoai.pub/p/mha?utm_source=publication-search">Multi-head Attention (MHA)</a></strong>.</p><p>Each attention head has its own Query, Key, and Value projections, and each head computes masked self-attention independently. The outputs of all heads are then concatenated and passed through a final linear projection to produce a single representation. 
</p><p>The use of multiple heads helps capture different kinds of semantic relationships between the tokens.</p><p>There are two more efficient variants of MHA that reduce the number of&nbsp;Key/Value&nbsp;projections stored and used during inference. These are:</p><ul><li><p><strong><a href="https://www.intoai.pub/p/multi-query-attention">Multi-Query Attention (MQA)</a></strong>, where each attention head has its own Query, but all Query heads share the same Key and Value.</p></li><li><p><strong><a href="https://www.intoai.pub/p/grouped-query-attention">Grouped-Query Attention (GQA)</a></strong>, where Queries are split into groups, and each group of Query heads shares one Key and Value head.</p></li></ul><p>Both MQA and GQA make inference faster and cheaper, but this can come at some cost to generation quality, more so for MQA than for GQA.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b-yB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b-yB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 424w, https://substackcdn.com/image/fetch/$s_!b-yB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 848w, https://substackcdn.com/image/fetch/$s_!b-yB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 1272w, 
https://substackcdn.com/image/fetch/$s_!b-yB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b-yB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png" width="1456" height="554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:233353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b-yB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 424w, https://substackcdn.com/image/fetch/$s_!b-yB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 848w, https://substackcdn.com/image/fetch/$s_!b-yB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 
1272w, https://substackcdn.com/image/fetch/$s_!b-yB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a9467c-1c69-4905-a210-0497fdd7dfb2_2476x942.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">MHA vs. 
GQA vs. MQA (<a href="https://arxiv.org/abs/2305.13245">Source</a>)</figcaption></figure></div><p>Attention mechanisms are made much more efficient using a KV cache.</p><div><hr></div><h3>KV cache to the rescue</h3><p>Note that without a cache, at each step of text generation (inference), the K and V vectors of all previous tokens have to be recalculated.</p><p>How about we store the&nbsp;Key&nbsp;and&nbsp;Value&nbsp;vectors from previous tokens, and reuse them later? That&#8217;s exactly what a KV cache does.</p><p>When using a KV cache:</p><ul><li><p>For each generated token, we store its K and V vectors in the cache.</p></li><li><p>Then, in the next token generation step, we compute only the new token&#8217;s Q, K, and V.</p></li><li><p>Next, we reuse the cached KV of previous tokens, with the new token&#8217;s Q attending to both the cached KV and its newly calculated KV. This saves a lot of compute during inference.</p></li></ul><div><hr></div><h3>Towards Multi-head Latent Attention</h3><p>Released in 2024, <a href="https://arxiv.org/abs/2405.04434">DeepSeek-V2</a> introduced an efficient version of attention called <strong><a href="https://www.intoai.pub/p/multi-head-latent-attention-is-the">Multi-head Latent Attention (MLA)</a></strong> that improves on the KV cache by not caching full Keys and Values. Instead, it stores a much smaller latent representation (one latent KV entry per token) and reconstructs the full KV information from it when needed.</p><p>This reduces the memory required for the KV cache, which makes inference much faster. 
MLA is therefore also used in the later model, <a href="https://arxiv.org/abs/2412.19437v2">DeepSeek-V3</a>.</p><p>Check out the following illustration to see how MLA compares to the other attention variants.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Fvr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Fvr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 424w, https://substackcdn.com/image/fetch/$s_!2Fvr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 848w, https://substackcdn.com/image/fetch/$s_!2Fvr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 1272w, https://substackcdn.com/image/fetch/$s_!2Fvr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Fvr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png" width="728" height="207" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:414,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:277329,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2Fvr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 424w, https://substackcdn.com/image/fetch/$s_!2Fvr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 848w, https://substackcdn.com/image/fetch/$s_!2Fvr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 1272w, https://substackcdn.com/image/fetch/$s_!2Fvr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e15a3e-859f-45f3-beb0-8aebbb88a0b0_2734x778.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">MHA vs GQA vs MQA vs MLA (<a href="https://arxiv.org/abs/2405.04434">Source</a>)</figcaption></figure></div><p>The complete 
architecture of MLA with the latent KV representation is shown below. I have also written a deep dive post on how it works, which <a href="https://www.intoai.pub/p/multi-head-latent-attention-is-the">you can read here</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RHbk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RHbk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 424w, https://substackcdn.com/image/fetch/$s_!RHbk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 848w, https://substackcdn.com/image/fetch/$s_!RHbk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 1272w, https://substackcdn.com/image/fetch/$s_!RHbk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RHbk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png" width="1456" height="762" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/acbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:189780,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RHbk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 424w, https://substackcdn.com/image/fetch/$s_!RHbk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 848w, https://substackcdn.com/image/fetch/$s_!RHbk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 1272w, https://substackcdn.com/image/fetch/$s_!RHbk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbd720a-a38c-4e95-8f7a-4211bf3c8278_1594x834.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Architecture of Multi-head Latent Attention (MLA) (<a href="https://arxiv.org/abs/2405.04434">Source</a>)</figcaption></figure></div><div><hr></div><h3>Sliding Window to improve Attention further</h3><p>So far, we have learned how:</p><ul><li><p>KV caching reduces the compute required for attention at inference, at the cost of extra memory to hold the cache</p></li><li><p>Building on this, MLA caches a latent representation of the Keys and Values, which reduces the memory requirements of the KV cache</p></li></ul><p>There&#8217;s another way to reduce the compute and memory requirements of attention. 
It works by reducing the number of previous tokens that each token attends to, which brings us to <strong>Sliding window attention (SWA)</strong>.</p><p>It was <a href="https://arxiv.org/abs/2004.05150">introduced in the &#8216;</a><strong><a href="https://arxiv.org/abs/2004.05150">Longformer&#8217;</a></strong><a href="https://arxiv.org/abs/2004.05150"> paper</a> in 2020. In SWA, each token attends only to a fixed-size window of previous tokens, rather than all previous ones.</p><p>We previously discussed how full attention has O(<code>N</code><sup>2</sup>) computational complexity for an input sequence of length <code>N</code>, as every token attends to itself and all the previous tokens.</p><p>With SWA, for an input sequence of length <code>N</code> and a window size of <code>w</code> (the fixed number of previous tokens that each token attends to), the computational complexity goes down to O(<code>Nw</code>).</p><p>Although SWA is efficient, each token attends only to a window of nearby tokens, so some long-range information is lost. Hence, it is often used in combination with:</p><ul><li><p>Full attention layers</p></li><li><p>Global window (where some tokens are allowed to attend to the entire sequence of previous tokens)</p></li></ul><p>SWA can also use a dilated window pattern, where each token attends to nearby previous tokens but skips some positions in a regular pattern. 
This helps a token attend to more distant previous tokens at roughly the same compute cost.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!azAY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!azAY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 424w, https://substackcdn.com/image/fetch/$s_!azAY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 848w, https://substackcdn.com/image/fetch/$s_!azAY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 1272w, https://substackcdn.com/image/fetch/$s_!azAY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!azAY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png" width="1456" height="333" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:333,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192637,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196789511?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!azAY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 424w, https://substackcdn.com/image/fetch/$s_!azAY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 848w, https://substackcdn.com/image/fetch/$s_!azAY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 1272w, https://substackcdn.com/image/fetch/$s_!azAY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F637ee5cf-ee88-4ade-9d46-1bd8fc1fb3af_2746x628.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Full attention vs Sliding window attention and its variants (<a href="https://arxiv.org/abs/2004.05150">Source</a>)</figcaption></figure></div><p>A popular example is the <a 
href="https://arxiv.org/pdf/2508.10925">GPT-oss family of models</a>, which alternates SWA layers (window size of 128 tokens) with full-attention layers, using GQA with 64 query heads and 8 KV heads.</p><div><hr></div><h3>Moving to DeepSeek Sparse Attention</h3>
      <p>
          <a href="https://www.intoai.pub/p/what-makes-deekseek-v4-so-good">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[This Week In AI Research (3-9 May 26) 🗓️]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-3-9-may</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-3-9-may</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Mon, 11 May 2026 12:44:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AkcT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AkcT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AkcT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 424w, https://substackcdn.com/image/fetch/$s_!AkcT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 848w, https://substackcdn.com/image/fetch/$s_!AkcT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AkcT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AkcT!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png" width="1200" height="526.6483516483516" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:639,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:801536,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AkcT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 424w, https://substackcdn.com/image/fetch/$s_!AkcT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 848w, 
https://substackcdn.com/image/fetch/$s_!AkcT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 1272w, https://substackcdn.com/image/fetch/$s_!AkcT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F670d459b-4039-4108-ad76-6f57fc2f65ea_2692x1182.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>1. 
ProgramBench: Can Language Models Rebuild Programs From Scratch?</h3><p>ProgramBench is a benchmark by Meta researchers that measures the ability of software engineering agents to develop software holistically.</p><p>Given only a program and its documentation, agents must architect and implement a codebase that matches the reference executable&#8217;s behavior. </p><p>Evaluation of 9 frontier LLMs across 200 tasks, ranging from compact CLI tools to widely used software such as FFmpeg, SQLite, and the PHP interpreter, shows that none fully resolve any task, and the best model passes 95% of tests on only 3% of tasks. </p><p>Models favor monolithic, single-file implementations, which are very different from how humans write code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cRUT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cRUT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 424w, https://substackcdn.com/image/fetch/$s_!cRUT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 848w, https://substackcdn.com/image/fetch/$s_!cRUT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cRUT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cRUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png" width="1456" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:524643,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cRUT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 424w, https://substackcdn.com/image/fetch/$s_!cRUT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 848w, 
https://substackcdn.com/image/fetch/$s_!cRUT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!cRUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01469ca2-1fe9-4b4b-ad95-324d1c74f33f_2650x1164.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.03546">using this link</a>.</p>
<div><hr></div><h3>2. AI Co-Mathematician</h3><p>This research from Google DeepMind introduces the AI co-mathematician, a workbench that helps mathematicians use AI agents for open-ended research.</p><p>The AI co-mathematician is optimized for mathematical workflows, ranging from idea generation and literature searches to computational exploration, theorem proving, and theory building. </p><p>Early tests show the AI co-mathematician helped researchers solve open mathematical problems, find new research directions, and discover overlooked literature references.</p><p>Alongside this, the AI co-mathematician scored 48% on <a href="https://epoch.ai/benchmarks/frontiermath-tier-4">FrontierMath Tier 4</a>, a hard problem-solving benchmark. 
This is the highest score of all AI systems evaluated.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IQuK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IQuK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 424w, https://substackcdn.com/image/fetch/$s_!IQuK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 848w, https://substackcdn.com/image/fetch/$s_!IQuK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!IQuK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IQuK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png" width="1456" height="748" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:748,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:483088,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IQuK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 424w, https://substackcdn.com/image/fetch/$s_!IQuK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 848w, https://substackcdn.com/image/fetch/$s_!IQuK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!IQuK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9437ed18-c740-4985-a756-699a72d7d83b_2562x1316.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.06651">using this link</a>.</p><div><hr></div><h3>3. 
Continuous Latent Diffusion Language Model</h3><p>This research paper presents <strong>Cola DLM</strong>, a continuous latent diffusion language model that generates text by compressing global semantic structure into a continuous latent space (using a Text VAE) and decoding it into text (using a block-causal DiT).</p><p>Cola DLM proves effective and shows strong scaling behavior for text generation. The evaluation covers 4 research questions and 8 benchmarks, compares against strictly matched ~2B-parameter autoregressive and LLaDA baselines, and includes scaling curves up to about 2000 EFLOPs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yorq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yorq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 424w, https://substackcdn.com/image/fetch/$s_!yorq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 848w, https://substackcdn.com/image/fetch/$s_!yorq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!yorq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yorq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png" width="1456" height="705" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:705,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:939944,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yorq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 424w, https://substackcdn.com/image/fetch/$s_!yorq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 848w, https://substackcdn.com/image/fetch/$s_!yorq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!yorq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11195352-aaa1-41dc-882d-251c6e353043_2680x1298.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.06548">using this link</a>.</p><div><hr></div><h3>4. 
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior</h3><p>This research paper finds that the geometry of a model&#8217;s internal activations causally shapes its outputs rather than merely correlating with them. </p><p>The authors introduce <strong>Manifold steering</strong>, where interventions move along the curved manifold of natural activations rather than in a straight Euclidean direction.</p><p>Across language-model reasoning tasks, in-context learning tasks, and a video world model, steering along the activation manifold produces output behaviors that stay on the model&#8217;s natural &#8220;behavior manifold.&#8221; In contrast, ordinary linear steering often moves through unnatural off-manifold areas, resulting in distorted outputs.</p><p>Researchers also show that this relationship works in reverse. Optimizing for desired behavioral trajectories recovers curved activation paths that align with the internal manifold. </p><p>This means that reliable model control should focus less on finding a single steering direction and more on understanding and following the model&#8217;s underlying representational geometry.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5F2N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5F2N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 424w, 
https://substackcdn.com/image/fetch/$s_!5F2N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 848w, https://substackcdn.com/image/fetch/$s_!5F2N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 1272w, https://substackcdn.com/image/fetch/$s_!5F2N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5F2N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png" width="1456" height="952" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:952,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:477007,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5F2N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 424w, https://substackcdn.com/image/fetch/$s_!5F2N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 848w, https://substackcdn.com/image/fetch/$s_!5F2N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 1272w, https://substackcdn.com/image/fetch/$s_!5F2N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aa47e7d-0dd5-4892-81cd-810a7bce33e5_1768x1156.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.05115">using this link</a>.</p>
<div><hr></div><h3>5. MolmoAct2</h3><p>This research paper presents <strong>MolmoAct2</strong>, a fully open vision-language-action (VLA) model for robot control, designed for real-world deployment.</p><p>It builds on <strong>MolmoAct</strong> with a new embodied-reasoning backbone, <strong>Molmo2-ER</strong>, which is trained on 3.3 million samples with a specialize-then-rehearse recipe.</p><p>The release also includes:</p><ul><li><p>Three new robotics datasets</p></li><li><p><strong>MolmoAct2-FAST Tokenizer</strong>, an open-weight, open-data action tokenizer trained on millions of trajectories across five embodiments</p></li><li><p>A new KV-conditioned VLA architecture</p></li><li><p><strong>MolmoAct2-Think</strong>, an adaptive-depth reasoning variant that maintains spatial reasoning while greatly reducing latency</p></li></ul><p>MolmoAct2 outperforms strong baselines, including &#960;0.5, while Molmo2-ER surpasses GPT-5 and Gemini Robotics ER-1.5 across 13 embodied-reasoning benchmarks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oGLI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!oGLI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 424w, https://substackcdn.com/image/fetch/$s_!oGLI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 848w, https://substackcdn.com/image/fetch/$s_!oGLI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!oGLI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oGLI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png" width="1456" height="700" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:700,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1908226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oGLI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 424w, https://substackcdn.com/image/fetch/$s_!oGLI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 848w, https://substackcdn.com/image/fetch/$s_!oGLI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!oGLI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15dd33cc-e651-4a5d-9ed9-d7bf7e472592_2682x1290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.02881v2">using this link</a>.</p><div><hr></div><h3>6. Model Spec Midtraining</h3><p>This research paper presents <strong>Model Spec Midtraining (MSM)</strong>, a method that improves how alignment training generalizes by teaching a model its intended behavior rules before fine-tuning.</p><p>Instead of relying only on examples of aligned behavior, MSM trains models on synthetic documents that explain the Model Spec or constitution. The model can then interpret later demonstrations through that framework.</p><p>The authors demonstrate that the same fine-tuning data can generalize in very different ways depending on which spec is taught during midtraining. They also show that MSM can reduce safety-related failures. 
For instance, agentic misalignment in Qwen3-32B decreases from 54% to 7%, which surpasses a deliberative alignment baseline of 14%.</p><p>The authors also show that specs perform better when they clarify the values behind the rules and provide specific guidance rather than vague suggestions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XX_3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XX_3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 424w, https://substackcdn.com/image/fetch/$s_!XX_3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 848w, https://substackcdn.com/image/fetch/$s_!XX_3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!XX_3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XX_3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png" width="1456" height="670" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:510635,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XX_3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 424w, https://substackcdn.com/image/fetch/$s_!XX_3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 848w, https://substackcdn.com/image/fetch/$s_!XX_3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!XX_3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586b09ac-a7ee-480a-90ec-18af8403c32a_2622x1206.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.02087">using this link</a>.</p><div><hr></div><h3>7. 
Mamoda2.5</h3><p>This research paper presents <strong>Mamoda2.5</strong>, a unified AR&#8211;Diffusion framework that integrates multimodal understanding and generation within a single architecture.</p><p>The model&#8217;s Diffusion Transformer backbone uses a fine-grained Mixture-of-Experts (MoE) architecture (128 experts, Top-8 routing), resulting in a 25B-parameter model that activates only 3B parameters, significantly reducing training costs while scaling up model capacity.</p><p>Mamoda2.5 achieves top-tier generation performance on VBench 2.0. On OpenVE-Bench, it sets a new record in video editing quality, beating the evaluated open-source models and matching current top-tier proprietary models, including Kling O1.</p><p>Compared to open-source baselines, Mamoda2.5 achieves up to 95.9&#215; faster video editing inference. The speedup comes from a joint few-step distillation and reinforcement learning framework that compresses the 30-step editing model into a 4-step model.</p><p>In real-world applications, Mamoda2.5 has been deployed for content moderation and creative restoration tasks in advertising, achieving a 98% success rate in an internal advertising video editing scenario.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DhY2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DhY2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 424w, 
https://substackcdn.com/image/fetch/$s_!DhY2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 848w, https://substackcdn.com/image/fetch/$s_!DhY2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 1272w, https://substackcdn.com/image/fetch/$s_!DhY2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DhY2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png" width="1456" height="872" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:872,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:446460,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!DhY2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 424w, https://substackcdn.com/image/fetch/$s_!DhY2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 848w, https://substackcdn.com/image/fetch/$s_!DhY2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 1272w, https://substackcdn.com/image/fetch/$s_!DhY2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d5b7e-7827-434c-bdef-eda7a104b8bf_1894x1134.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.02641v1">using this link</a>.</p><div><hr></div><h3>8. Superintelligent Retrieval Agent</h3><p>This research paper introduces <strong>SuperIntelligent Retrieval Agent (SIRA)</strong>, a retrieval agent that replaces slow multi-round searches with one effective, corpus-aware lexical query. </p><p>Instead of simply adding related words to a user query, SIRA combines an LLM with document-frequency statistics to select terms that help differentiate likely evidence from misleading corpus matches. </p><p>It improves documents offline by adding missing vocabulary, predicts terms that could serve as evidence but are not included in the query, filters out weak or overly common terms, and then performs a single weighted BM25 retrieval call combining the original query with the validated expansion.</p><p>In tests across 10 BEIR and QA benchmarks, SIRA outperformed dense retrievers and multi-round retrieval systems while remaining training-free, efficient, and interpretable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Mye!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!2Mye!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 424w, https://substackcdn.com/image/fetch/$s_!2Mye!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 848w, https://substackcdn.com/image/fetch/$s_!2Mye!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 1272w, https://substackcdn.com/image/fetch/$s_!2Mye!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Mye!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png" width="1456" height="886" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:384151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Mye!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 424w, https://substackcdn.com/image/fetch/$s_!2Mye!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 848w, https://substackcdn.com/image/fetch/$s_!2Mye!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 1272w, https://substackcdn.com/image/fetch/$s_!2Mye!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57440ba4-12f4-476d-9e9c-74073c8a216d_1932x1176.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.06647">using this link</a>.</p><div><hr></div><h3>9. Nonsense Helps</h3><p>This research paper presents <strong>Lorem Perturbation for Exploration (LoPE)</strong>, a simple RL training technique for reasoning models. When GRPO gets stuck because every sampled answer to a difficult question fails, LoPE adds random Lorem Ipsum-style &#8220;nonsense&#8221; text to the prompt and resamples.</p><p>These irrelevant changes to the prompt can lead the model down different reasoning paths and improve exploration more than simply resampling the original prompt.</p><p>Experiments across 1.7B, 4B, and 7B models show that LoPE significantly outperforms resampling with the original prompts. 
Further analysis reveals that other Latin-based random sequences with low perplexity are also effective perturbations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qRST!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qRST!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 424w, https://substackcdn.com/image/fetch/$s_!qRST!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 848w, https://substackcdn.com/image/fetch/$s_!qRST!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 1272w, https://substackcdn.com/image/fetch/$s_!qRST!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qRST!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png" width="1456" height="759" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:759,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:856361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qRST!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 424w, https://substackcdn.com/image/fetch/$s_!qRST!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 848w, https://substackcdn.com/image/fetch/$s_!qRST!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 1272w, https://substackcdn.com/image/fetch/$s_!qRST!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16f7c410-55f5-4ef3-a498-7433f85ec8f8_2434x1268.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.05566">using this link</a>.</p><div><hr></div><h3>10. Retrieval from Within</h3><p>This research presents <strong>INTRA (INTrinsic Retrieval via Attention)</strong>, a RAG-style approach in which an attention-based encoder-decoder retrieves evidence from its precomputed internal representations rather than relying on a separate external retriever. </p><p>The decoder attention queries score the encoded evidence chunks, which are then reused directly as context for generation. </p><p>INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality on question-answering benchmarks. 
</p><p>This shows that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BEIZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BEIZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 424w, https://substackcdn.com/image/fetch/$s_!BEIZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 848w, https://substackcdn.com/image/fetch/$s_!BEIZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 1272w, https://substackcdn.com/image/fetch/$s_!BEIZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BEIZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png" width="1456" height="738" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:738,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:452902,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/197085688?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BEIZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 424w, https://substackcdn.com/image/fetch/$s_!BEIZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 848w, https://substackcdn.com/image/fetch/$s_!BEIZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 1272w, https://substackcdn.com/image/fetch/$s_!BEIZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F740e04e9-d793-4f72-bd71-10ed6dd28909_2540x1288.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.05806">using this link</a>.</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter and 100x your AI engineering skills.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><p>You can also read my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> 
to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gK6Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" width="1456" height="271" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[This Week In AI Research (26 April - 2 May 26) 🗓️]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-26-april</link><guid 
isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-26-april</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Wed, 06 May 2026 15:21:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3Pn3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Pn3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Pn3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 424w, https://substackcdn.com/image/fetch/$s_!3Pn3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 848w, https://substackcdn.com/image/fetch/$s_!3Pn3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 1272w, https://substackcdn.com/image/fetch/$s_!3Pn3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3Pn3!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png" width="1200" height="525" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:637,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1492803,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Pn3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 424w, https://substackcdn.com/image/fetch/$s_!3Pn3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 848w, https://substackcdn.com/image/fetch/$s_!3Pn3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3Pn3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78daef9f-4375-494d-a9a6-a4785df90e2e_2596x1136.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>1. Thinking with Visual Primitives</h3><p>This research paper from DeepSeek introduces <strong>Thinking with Visual Primitives</strong>, a new method that enables multimodal LLMs to reason using visual markers, such as points and bounding boxes. 
Because language-only reasoning performs poorly on precise spatial tasks, this technique lets models &#8220;point&#8221; to objects or areas as they reason.</p><p>The framework is built on a highly optimized architecture designed for extreme visual efficiency. It achieves frontier performance on challenging visual reasoning tasks such as counting, maze navigation, and path tracing, matching or exceeding models such as GPT-5.4, Claude Sonnet 4.6, and Gemini-3-Flash.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aaKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aaKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 424w, https://substackcdn.com/image/fetch/$s_!aaKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 848w, https://substackcdn.com/image/fetch/$s_!aaKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 1272w, https://substackcdn.com/image/fetch/$s_!aaKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!aaKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png" width="1456" height="415" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:436914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aaKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 424w, https://substackcdn.com/image/fetch/$s_!aaKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 848w, https://substackcdn.com/image/fetch/$s_!aaKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 1272w, https://substackcdn.com/image/fetch/$s_!aaKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127ffba5-891e-4f7e-85a0-ef2e2cd26fec_2504x714.png 1456w" 
sizes="100vw"></picture></div></a></figure></div><p>Read more about this research <a href="https://huggingface.co/datasets/NodeLinker/deepseek-ai-Thinking-with-Visual-Primitives-deleted-repo/blob/main/Thinking_with_Visual_Primitives.pdf">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>2. 
Recursive Multi-Agent Systems</h3><p>This research paper introduces <strong>Recursive MAS</strong>, a framework in which multiple AI agents collaborate by passing and refining latent states rather than communicating only via text.</p><p>The framework treats the entire agent team as a single loop of recursive computation. It relies on a lightweight <em>RecursiveLink</em> module and an inner-outer loop learning algorithm that iteratively co-optimizes the whole system through shared gradient-based credit assignment across recursion rounds.</p><p>Across 9 benchmarks spanning math, science, medicine, search, and coding, the authors report an average accuracy gain of 8.3%, an end-to-end inference speedup of 1.2x to 2.4x, and a token reduction of 34.6% to 75.6% compared to strong baselines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8N-R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8N-R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 424w, https://substackcdn.com/image/fetch/$s_!8N-R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 848w, https://substackcdn.com/image/fetch/$s_!8N-R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8N-R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8N-R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png" width="1456" height="668" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:668,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:449968,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8N-R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 424w, https://substackcdn.com/image/fetch/$s_!8N-R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 848w, 
https://substackcdn.com/image/fetch/$s_!8N-R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 1272w, https://substackcdn.com/image/fetch/$s_!8N-R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4933ee4-170e-4ba6-a76c-74f968027238_2724x1250.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.25917">using this link</a>.</p><div><hr></div><h3>3. 
Agentic Harness Engineering</h3><p>This research paper introduces <strong>Agentic Harness Engineering (AHE)</strong>, a system that automatically improves the harness around coding agents, rather than relying on manual tuning.</p><p>AHE observes when agents fail, summarizes long trajectories into usable evidence, suggests harness edits, and verifies whether each edit delivers its predicted benefit.</p><p>Results show that after 10 iterations, it raises pass@1 performance on Terminal-Bench 2 from 69.7% to 77.0%. This improvement surpasses human-designed harnesses in Codex-CLI (71.9%) and other self-evolving systems such as ACE and TF-GRPO.</p><p>The improved harness works well without retraining. It uses fewer tokens on SWE-bench Verified tasks and provides significant performance gains on Terminal-Bench 2 across various model families.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OfRx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OfRx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 424w, https://substackcdn.com/image/fetch/$s_!OfRx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 848w, https://substackcdn.com/image/fetch/$s_!OfRx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OfRx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OfRx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png" width="1456" height="749" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:749,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:389513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OfRx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 424w, https://substackcdn.com/image/fetch/$s_!OfRx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 848w, 
https://substackcdn.com/image/fetch/$s_!OfRx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 1272w, https://substackcdn.com/image/fetch/$s_!OfRx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feee9a3f7-82fc-4285-b5d1-289087ca5e82_2426x1248.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.25850">using this link</a>.</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>4. Incompressible Knowledge Probes</h3><p>This research paper presents <strong>Incompressible Knowledge Probes (IKPs)</strong>, a benchmark of 1,400 obscure factual questions used to estimate the hidden parameter count of black-box LLMs.</p><p>The idea is that storing <code>F</code> facts requires at least <code>F/(bits per parameter)</code> weights, so measuring how much a model knows gives a lower bound on how many parameters it has.</p><p>The author measures IKP accuracy on 89 open-weight models and finds a strong log-linear relationship with parameter count. The author then applies this relationship to estimate the effective knowledge capacity and parameter count of proprietary models.</p><p>The author also shows that, while reasoning benchmarks may saturate, the factual capacity of models still scales steadily with total parameters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FpRc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FpRc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 424w, 
https://substackcdn.com/image/fetch/$s_!FpRc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 848w, https://substackcdn.com/image/fetch/$s_!FpRc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 1272w, https://substackcdn.com/image/fetch/$s_!FpRc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FpRc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png" width="1268" height="1284" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1284,&quot;width&quot;:1268,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:354947,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!FpRc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 424w, https://substackcdn.com/image/fetch/$s_!FpRc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 848w, https://substackcdn.com/image/fetch/$s_!FpRc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 1272w, https://substackcdn.com/image/fetch/$s_!FpRc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa0ac1f7-b844-4219-9cc9-1b9baf2fe46b_1268x1284.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.24827">using this link</a>.</p><div><hr></div><h3>5. <strong>MotuBrain</strong></h3><p>This research paper introduces <strong>MotuBrain</strong>, a unified robot &#8220;world action model&#8221; that learns how the world changes and what actions a robot should take. </p><p>Instead of creating separate systems for policy learning, video prediction, inverse dynamics, and action generation, MotuBrain integrates them into a single architecture.</p><p>It uses multimodal data, including video-only, task-agnostic, and cross-embodiment robot data, and introduces unified multiview modeling, an independent text stream for stronger language-action coupling, a shared cross-embodiment action representation, and an efficient post-training and deployment recipe for long-horizon real-world control.</p><p>It also employs heavy inference optimization to operate in real time, achieving up to 11 Hz inference and a more than 50x speedup over the baseline. 
</p><p>MotuBrain achieves 95.8% and 96.1% average success on RoboTwin 2.0 under clean and randomized settings, respectively, attains the strongest reported EWMScore in the WorldArena comparison, and adapts to new humanoid embodiments with only 50-100 trajectories.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F4Lu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F4Lu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 424w, https://substackcdn.com/image/fetch/$s_!F4Lu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 848w, https://substackcdn.com/image/fetch/$s_!F4Lu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 1272w, https://substackcdn.com/image/fetch/$s_!F4Lu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F4Lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png" width="1456" height="972" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:605400,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F4Lu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 424w, https://substackcdn.com/image/fetch/$s_!F4Lu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 848w, https://substackcdn.com/image/fetch/$s_!F4Lu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 1272w, https://substackcdn.com/image/fetch/$s_!F4Lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de91d03-1218-474e-a1aa-7c7aea36a997_1990x1328.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.27792v2">using this link</a>.</p><div><hr></div><h3><strong>6. 
Latent-GRPO</strong></h3><p>This research paper introduces <strong>Latent-GRPO</strong>, a reinforcement-learning algorithm that helps LLMs reason in latent space rather than producing lengthy chain-of-thought text.</p><p>Applying GRPO directly to latent reasoning can be unstable. Latent-GRPO addresses this instability by combining:</p><ul><li><p>Invalid sample advantage masking</p></li><li><p>One-sided noise sampling</p></li><li><p>Optimal correct-path first-token selection</p></li></ul><p>Latent-GRPO improves Pass@1 scores on both easy and hard reasoning benchmarks and uses reasoning chains that are about 3-4x shorter than those of explicit GRPO.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J5kn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J5kn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 424w, https://substackcdn.com/image/fetch/$s_!J5kn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 848w, https://substackcdn.com/image/fetch/$s_!J5kn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 1272w, https://substackcdn.com/image/fetch/$s_!J5kn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J5kn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png" width="1456" height="430" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:430,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191557,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J5kn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 424w, https://substackcdn.com/image/fetch/$s_!J5kn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 848w, https://substackcdn.com/image/fetch/$s_!J5kn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 1272w, 
https://substackcdn.com/image/fetch/$s_!J5kn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaae1d14-3888-4251-9c3b-21cd1be6d844_2588x764.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.27998">using this link</a>.</p><div><hr></div><h3><strong>7. 
Contextual Agentic Memory is a Memo, Not True Memory</strong></h3><p>The authors of this research paper argue that current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory but rather perform lookups.</p><p>These systems let agents retrieve past notes, but they do not update the model&#8217;s weights or develop lasting expertise. The authors claim this creates a generalization ceiling, where agents can reuse similar past cases but struggle with genuinely new combinations that require internalized rules.</p><p>They also warn that persistent external memory increases security risks because poisoned or injected content can keep resurfacing in future sessions.</p><p>In their view, useful agent memory should combine fast external recall with slower weight-based consolidation, similar to how humans turn experiences into expertise.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n_2B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n_2B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 424w, https://substackcdn.com/image/fetch/$s_!n_2B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 848w, https://substackcdn.com/image/fetch/$s_!n_2B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 
1272w, https://substackcdn.com/image/fetch/$s_!n_2B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n_2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png" width="1456" height="299" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:299,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:178677,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n_2B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 424w, https://substackcdn.com/image/fetch/$s_!n_2B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 848w, 
https://substackcdn.com/image/fetch/$s_!n_2B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 1272w, https://substackcdn.com/image/fetch/$s_!n_2B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff313cdf9-fe48-4aa1-a6aa-cec49ee9fad7_2804x576.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.27707v1">using this link</a>.</p><div><hr></div><h3><strong>8. Tuna-2</strong></h3><p>This research paper presents <strong>Tuna-2</strong>, a unified multimodal model that replaces the typical pretrained vision encoder and VAE with simple raw pixel patch embeddings. This enables image understanding and generation within a single end-to-end system, thereby avoiding mismatches between separate visual representations. </p><p>Tuna-2 achieves state-of-the-art performance in multimodal benchmarks, demonstrating that unified pixel-space modeling can fully compete with latent-space approaches for high-quality image generation.</p><p>Its encoder-free design achieves stronger multimodal understanding at scale, particularly for tasks that require fine-grained visual perception. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wFp_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wFp_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 424w, https://substackcdn.com/image/fetch/$s_!wFp_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 848w, https://substackcdn.com/image/fetch/$s_!wFp_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!wFp_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wFp_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png" width="1456" height="593" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:593,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:546020,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wFp_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 424w, https://substackcdn.com/image/fetch/$s_!wFp_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 848w, https://substackcdn.com/image/fetch/$s_!wFp_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!wFp_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a56836-309a-4b9b-aff3-30f87c386fdb_2858x1164.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.24763">using this link</a>.</p><div><hr></div><h3><strong>9. GLM-5V-Turbo</strong></h3><p>This research paper presents <strong>GLM-5V-Turbo</strong>, Z.AI&#8217;s first multimodal coding foundation model, built for vision-based coding tasks.
</p><p>It can natively process multimodal inputs such as images, video, and text, excels at long-horizon planning, complex coding, and action execution, and is deeply optimized for agent workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1pRg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1pRg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 424w, https://substackcdn.com/image/fetch/$s_!1pRg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 848w, https://substackcdn.com/image/fetch/$s_!1pRg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 1272w, https://substackcdn.com/image/fetch/$s_!1pRg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1pRg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png" width="1456" height="1007" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1007,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Description&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Description" title="Description" srcset="https://substackcdn.com/image/fetch/$s_!1pRg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 424w, https://substackcdn.com/image/fetch/$s_!1pRg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 848w, https://substackcdn.com/image/fetch/$s_!1pRg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 1272w, https://substackcdn.com/image/fetch/$s_!1pRg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd144d558-8d91-42e6-b226-06fb04397e76_2326x1608.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://www.arxiv.org/abs/2604.26752">using this link</a>.</p><div><hr></div><h3><strong>10. Let ViT Speak: Generative Language-Image Pre-training</strong></h3><p>This research paper presents <strong>Generative Language-Image Pre-training (GenLIP)</strong>, a minimalist method for pretraining Vision Transformers for multimodal LLMs.</p><p>GenLIP trains a ViT to predict language tokens directly from visual tokens using a standard language modeling objective, without contrastive batch construction or an additional text decoder.</p><p>This offers three main advantages:</p><ol><li><p>A single transformer jointly models visual and textual tokens</p></li><li><p>It scales effectively with both data and model size</p></li><li><p>It achieves competitive or superior results across diverse multimodal benchmarks
</p></li></ol><p>Trained on 8B samples from Recap-DataComp-1B, GenLIP matches or surpasses strong baselines despite using substantially less pretraining data. </p><p>After continued pretraining on multi-resolution images at native aspect ratios, GenLIP further improves on detail-sensitive tasks such as OCR and chart understanding, making it a strong foundation for vision encoders in MLLMs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lWYJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lWYJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 424w, https://substackcdn.com/image/fetch/$s_!lWYJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 848w, https://substackcdn.com/image/fetch/$s_!lWYJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 1272w, https://substackcdn.com/image/fetch/$s_!lWYJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lWYJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png" width="1456" 
height="621" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:336152,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/196364301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lWYJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 424w, https://substackcdn.com/image/fetch/$s_!lWYJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 848w, https://substackcdn.com/image/fetch/$s_!lWYJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 1272w, https://substackcdn.com/image/fetch/$s_!lWYJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee324419-9b72-4bb5-8afd-d7a8f30afcf5_2186x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2605.00809">using this link</a>.</p><div><hr></div><p>Share this article with others and <strong>earn some referral rewards</strong>. 
&#10084;&#65039;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/p/this-week-in-ai-research-26-april?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/p/this-week-in-ai-research-26-april?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Join the <strong>paid tier today</strong> to get access to all posts in this newsletter.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><p>You can also read my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gK6Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, 
https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" width="1456" height="271" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, 
https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Speculative Decoding, Simply Explained]]></title><description><![CDATA[Learn how Speculative Decoding works from scratch and how to use it in your AI applications for faster and cheaper inference.]]></description><link>https://www.intoai.pub/p/speculative-decoding-simply-explained</link><guid isPermaLink="false">https://www.intoai.pub/p/speculative-decoding-simply-explained</guid><dc:creator><![CDATA[Dr. 
Ashish Bamania]]></dc:creator><pubDate>Thu, 30 Apr 2026 18:42:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!l8XM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8xoK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8xoK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 424w, https://substackcdn.com/image/fetch/$s_!8xoK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 848w, https://substackcdn.com/image/fetch/$s_!8xoK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 1272w, https://substackcdn.com/image/fetch/$s_!8xoK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8xoK!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png" width="1200" height="595.8791208791209" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:3569314,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195365207?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8xoK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 424w, https://substackcdn.com/image/fetch/$s_!8xoK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 848w, https://substackcdn.com/image/fetch/$s_!8xoK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 1272w, https://substackcdn.com/image/fetch/$s_!8xoK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d09473d-a4ad-47d3-b2ff-5d0d13be1d3c_2324x1154.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>LLM inference is slow, memory-heavy, and computationally expensive.</p><p>LLMs are <a href="https://en.wikipedia.org/wiki/Autoregressive_model">autoregressive</a>, producing one token at a time, and the largest LLMs in use today contain a trillion or more parameters. 
This means that producing just one token from such a model requires trillions of computations!</p><p>In 2022, Google published a research paper titled &#8216;<em><a href="https://arxiv.org/abs/2211.17192">Fast Inference from Transformers via Speculative Decoding</a></em>&#8217;, which introduced an inference technique called <strong>Speculative Decoding</strong>.</p><p>The paper reported <strong>2-3x faster inference without compromising LLM output quality</strong>, and since then, the technique has been used by AI Overviews in Google Search (and many other modern LLM applications) to speed up inference.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l8XM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l8XM!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 424w, https://substackcdn.com/image/fetch/$s_!l8XM!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 848w, https://substackcdn.com/image/fetch/$s_!l8XM!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 1272w, https://substackcdn.com/image/fetch/$s_!l8XM!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!l8XM!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif" width="800" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l8XM!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 424w, https://substackcdn.com/image/fetch/$s_!l8XM!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 848w, https://substackcdn.com/image/fetch/$s_!l8XM!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 1272w, https://substackcdn.com/image/fetch/$s_!l8XM!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931dcfcb-ea96-4e74-a179-0212d11f6b59_800x910.gif 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://storage.googleapis.com/gweb-research2023-media/media/SpeculativeDecoding-0-AIO.mp4">Source</a></figcaption></figure></div><div><hr></div><h3>What really makes Inference slow?</h3><p>You might assume that performing trillions of computations to produce a single token is what limits the inference speed, but this is a&nbsp;<strong>big</strong>&nbsp;<strong>misconception</strong>!</p><p><a href="https://en.wikipedia.org/wiki/Graphics_processing_unit">GPUs</a> and <a href="https://en.wikipedia.org/wiki/Tensor_Processing_Unit">TPUs</a> in use today are powerful parallel computing machines and can perform hundreds of trillions of operations per second. 
This is often more than 100x what&#8217;s needed to generate a single token with an LLM.</p><p><em>So what&#8217;s the real bottleneck?</em></p><p>To produce a single token, the GPU must read all of the model&#8217;s parameters from high-bandwidth memory (HBM) into the compute units. This transfer is slow relative to the arithmetic itself and leaves the compute units sitting idle while they wait for weights to arrive.</p><p>This makes the process <strong><a href="https://en.wikipedia.org/wiki/Memory-bound_function">memory-bound</a></strong> rather than compute-bound.</p><p>(<em>LLM inference is very different from LLM training, where the parameters are loaded once and reused across thousands of operations on a large batch of training samples.</em>)</p><blockquote><p><strong>Insight 1: During inference, there is spare compute sitting idle, waiting to be used.</strong></p></blockquote><div><hr></div><h3>Is predicting all tokens really that tough?</h3><p>Consider this prompt and response pair:</p><blockquote><p><em>"Tell 2 statements about Jupiter and its moons. Jupiter is the largest planet in our solar system. It has dozens of moons, including Ganymede, the largest moon in the solar system."</em></p></blockquote><p>In this example, treating words as tokens, some tokens, such as &#8220;<em>Jupiter</em>&#8221;, &#8220;<em>is</em>&#8221;, and &#8220;<em>the</em>&#8221;, are easier to predict (and so to generate), while others, such as &#8220;<em>Ganymede</em>&#8221;, are harder.</p><p>This is because the LLM has already seen &#8220;<em>Jupiter</em>&#8221; in its prompt, and in the positions that follow, &#8220;<em>is</em>&#8221; and &#8220;<em>the</em>&#8221; have few competing alternatives. 
</p><p>For &#8220;<em>Ganymede</em>&#8221;, however, the preceding context, &#8220;<em>dozens of moons, including</em>&#8221;, allows many valid completions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lZ4V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lZ4V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 424w, https://substackcdn.com/image/fetch/$s_!lZ4V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 848w, https://substackcdn.com/image/fetch/$s_!lZ4V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 1272w, https://substackcdn.com/image/fetch/$s_!lZ4V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lZ4V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png" width="1456" height="427" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:427,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:176249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195365207?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lZ4V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 424w, https://substackcdn.com/image/fetch/$s_!lZ4V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 848w, https://substackcdn.com/image/fetch/$s_!lZ4V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 1272w, https://substackcdn.com/image/fetch/$s_!lZ4V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7588ce15-fbc3-4e3a-8933-dad82d837c38_2386x700.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p><strong>Insight 2: We do not always need a powerful LLM to produce tokens that are easier to predict.</strong></p></blockquote><div><hr></div><h3>How does Speculative Decoding use these ideas?</h3>
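<p>A rough way to see how the two insights fit together is a back-of-envelope estimate of a single decoding pass. The Python sketch below is not taken from the paper: the function name <code>decode_pass_times</code>, the 70B-parameter model, the 2 TB/s memory bandwidth, and the 400 TFLOP/s peak compute are all illustrative assumptions, and the estimate ignores the KV cache, activations, and batching. It only suggests that, under these assumptions, reading the weights dominates the pass, so scoring a few extra tokens in the same pass adds very little time.</p>

```python
# Back-of-envelope estimate of one decoding pass.
# Every number here is an illustrative assumption, not a measurement
# and not a figure from the Speculative Decoding paper.


def decode_pass_times(n_params, bytes_per_param, mem_bandwidth, peak_flops,
                      tokens_per_pass=1):
    """Return (memory_time, compute_time) in seconds for one forward pass.

    Simplifications: weights are read from memory once per pass regardless of
    how many tokens are scored, and each parameter costs about 2 FLOPs
    (one multiply, one add) per token. KV cache and activations are ignored.
    """
    memory_time = n_params * bytes_per_param / mem_bandwidth
    compute_time = 2 * n_params * tokens_per_pass / peak_flops
    return memory_time, compute_time


N_PARAMS = 70e9        # hypothetical 70B-parameter dense model
BYTES_PER_PARAM = 2    # 16-bit weights
MEM_BANDWIDTH = 2e12   # assumed 2 TB/s of HBM bandwidth
PEAK_FLOPS = 400e12    # assumed 400 TFLOP/s of peak compute

if __name__ == "__main__":
    for k in (1, 4, 8):
        mem_t, comp_t = decode_pass_times(
            N_PARAMS, BYTES_PER_PARAM, MEM_BANDWIDTH, PEAK_FLOPS,
            tokens_per_pass=k,
        )
        print(f"tokens per pass = {k}: "
              f"weight read {mem_t * 1e3:.2f} ms, "
              f"arithmetic {comp_t * 1e3:.2f} ms")
```

<p>With these assumed numbers, reading the weights takes about 70 ms per pass, while the arithmetic for one token takes well under 1 ms. Even at eight tokens per pass, the arithmetic stays far below the memory time, which is the spare capacity that Insight 1 points to.</p>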
      <p>
          <a href="https://www.intoai.pub/p/speculative-decoding-simply-explained">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[This Week In AI Research (19-25 April 26) 🗓️]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-19-25-april</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-19-25-april</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Wed, 29 Apr 2026 01:10:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mwxe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mwxe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mwxe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 424w, https://substackcdn.com/image/fetch/$s_!mwxe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 848w, https://substackcdn.com/image/fetch/$s_!mwxe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mwxe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mwxe!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png" width="1200" height="528.2967032967033" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:641,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1026724,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mwxe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 424w, https://substackcdn.com/image/fetch/$s_!mwxe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 848w, 
https://substackcdn.com/image/fetch/$s_!mwxe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 1272w, https://substackcdn.com/image/fetch/$s_!mwxe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d51490-69c6-49b8-a1cd-e49be793daea_2484x1094.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>1. 
DeepSeek-V4</h3><p>This research paper introduces the preview version of the DeepSeek-V4 series with two Mixture-of-Experts (MoE) language models:</p><ol><li><p>DeepSeek-V4-Pro with 1.6T parameters (49B activated)</p></li><li><p>DeepSeek-V4-Flash with 284B parameters (13B activated)</p></li></ol><p>Both models support a context length of 1 million tokens and use multiple architectural and optimization techniques, including:</p><ul><li><p>A hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency</p></li><li><p>Manifold-Constrained Hyper-Connections (mHC) in place of conventional residual connections</p></li><li><p>The Muon optimizer for faster convergence and greater training stability</p></li></ul><p>Models in the DeepSeek-V4 series are highly efficient in long-context scenarios, with DeepSeek-V4-Pro requiring only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2 in the 1M-token context setting.</p><p><strong>DeepSeek-V4-ProMax</strong> is the maximum reasoning-effort mode of DeepSeek-V4-Pro, which the paper presents as the new state of the art for open models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zJQj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zJQj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 424w, 
https://substackcdn.com/image/fetch/$s_!zJQj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 848w, https://substackcdn.com/image/fetch/$s_!zJQj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 1272w, https://substackcdn.com/image/fetch/$s_!zJQj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zJQj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png" width="1456" height="846" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:846,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:411889,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!zJQj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 424w, https://substackcdn.com/image/fetch/$s_!zJQj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 848w, https://substackcdn.com/image/fetch/$s_!zJQj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 1272w, https://substackcdn.com/image/fetch/$s_!zJQj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae6e3ee9-6218-4c51-8211-40bbc21f6b0f_1846x1072.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3><strong>2. Hyperloop Transformers</strong></h3><p>This research paper introduces the <strong>Hyper-Connected Looped (Hyperloop) Transformer</strong>, a simple architecture that reuses Transformer layers across depth, making it more parameter-efficient than a conventional Transformer.</p><p>The looped Transformer is organized into three blocks (begin, middle, and end). Each block consists of multiple Transformer layers, and only the middle block is applied recurrently across depth.
The looped middle block is further augmented by <strong><a href="https://arxiv.org/abs/2512.24880">Manifold-Constrained Hyper-Connections (mHC)</a></strong>, which are applied only after each loop to create the Hyperloop Transformer.</p><p>The Hyperloop Transformer outperforms depth-matched Transformer and mHC Transformer baselines despite using ~50% fewer parameters, and its advantage persists after post-training weight quantization.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Poe3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Poe3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 424w, https://substackcdn.com/image/fetch/$s_!Poe3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 848w, https://substackcdn.com/image/fetch/$s_!Poe3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 1272w, https://substackcdn.com/image/fetch/$s_!Poe3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Poe3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png" width="1456" height="951" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:951,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:273122,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Poe3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 424w, https://substackcdn.com/image/fetch/$s_!Poe3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 848w, https://substackcdn.com/image/fetch/$s_!Poe3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 1272w, https://substackcdn.com/image/fetch/$s_!Poe3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c88d16-47d6-4563-941c-19e8f7c19882_2054x1342.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.21254">using this link</a>.</p><div><hr></div><h3><strong>3. 
LLaDA2.0-Uni</strong></h3><p>This research paper introduces <strong>LLaDA2.0-Uni</strong>, a unified discrete diffusion LLM (dLLM) that combines language and vision within a single framework.</p><p>The model converts images into discrete semantic tokens, jointly processes text and visual inputs using a <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts">Mixture-of-Experts (MoE)</a> backbone, and then reconstructs high-quality images with a diffusion decoder. This architecture enables the model to understand, generate, and edit content across modalities within a single system.</p><p>LLaDA2.0-Uni matches specialized VLMs in multimodal understanding and performs strongly in image generation and editing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_PKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_PKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 424w, https://substackcdn.com/image/fetch/$s_!_PKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 848w, https://substackcdn.com/image/fetch/$s_!_PKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_PKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_PKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png" width="1456" height="929" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:929,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:968713,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_PKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 424w, https://substackcdn.com/image/fetch/$s_!_PKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 848w, 
https://substackcdn.com/image/fetch/$s_!_PKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 1272w, https://substackcdn.com/image/fetch/$s_!_PKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe97b4c4e-8d73-42e4-a571-6607ef4a94f7_2054x1310.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.20796">using this link</a>.</p><div><hr></div><h3><strong>4. 
Sapiens2</strong></h3><p>This research paper introduces <strong>Sapiens2</strong>, a family of high-resolution vision transformer models designed for&nbsp;human-centric vision tasks, including pose estimation, segmentation, and surface reconstruction.</p><p>Sapiens2 models are pretrained using unified objectives (combining masked image reconstruction and self-distilled contrastive learning) on a curated dataset of 1 billion high-quality human images.</p><p>These models set a new state of the art, improving over the first generation in pose, body-part segmentation, and normal estimation, and extending to new tasks such as pointmap and albedo estimation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M9tF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M9tF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 424w, https://substackcdn.com/image/fetch/$s_!M9tF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 848w, https://substackcdn.com/image/fetch/$s_!M9tF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 1272w, https://substackcdn.com/image/fetch/$s_!M9tF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M9tF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png" width="1456" height="702" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:998746,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M9tF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 424w, https://substackcdn.com/image/fetch/$s_!M9tF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 848w, https://substackcdn.com/image/fetch/$s_!M9tF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 1272w, 
https://substackcdn.com/image/fetch/$s_!M9tF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b45641-b63d-4b3c-8ad3-17ced6fc4a2e_2430x1172.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.21681">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3><strong>5. Neural Garbage Collection</strong></h3><p>This research paper introduces <strong>Neural Garbage Collection (NGC)</strong>, a method that helps LLMs selectively forget information while reasoning.</p><p>Instead of keeping every intermediate token (which inflates the KV cache), the model is trained end-to-end with RL and an outcome-based task reward to learn which tokens to keep and which to discard, so that it manages its own memory.</p><p>NGC enables significant cache compression without sacrificing reasoning accuracy, making LLMs more scalable for long-context inference.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J1Xo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J1Xo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 424w, https://substackcdn.com/image/fetch/$s_!J1Xo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 848w, https://substackcdn.com/image/fetch/$s_!J1Xo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 1272w, 
https://substackcdn.com/image/fetch/$s_!J1Xo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J1Xo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png" width="1456" height="1236" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1236,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:459925,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J1Xo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 424w, https://substackcdn.com/image/fetch/$s_!J1Xo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 848w, 
https://substackcdn.com/image/fetch/$s_!J1Xo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!J1Xo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36a97e63-57e3-4f3e-9c1a-1ae8acd4ce40_1588x1348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.18002">using this link</a>.</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3><strong>6. AI scientists produce results without reasoning scientifically</strong></h3><p>This research paper shows that current LLM-based research agents can execute scientific workflows, but they do not follow the reasoning processes typical of real science.</p><p>Through large-scale experiments, the authors show that performance is driven almost entirely by the base model rather than agent scaffolding, and that agents often overlook evidence, seldom change their beliefs, and rarely combine different lines of evidence.</p><p>As a result, their outputs may seem accurate, but the reasoning behind them lacks key epistemic properties, such as hypothesis testing and self-correction. 
</p><p>For AI systems to produce reliable and trustworthy knowledge, scientific reasoning itself needs to become a training objective, and simply improving prompts or agent frameworks is not enough.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Pl1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Pl1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 424w, https://substackcdn.com/image/fetch/$s_!5Pl1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 848w, https://substackcdn.com/image/fetch/$s_!5Pl1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!5Pl1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Pl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png" width="1202" height="1350" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1350,&quot;width&quot;:1202,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:361923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Pl1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 424w, https://substackcdn.com/image/fetch/$s_!5Pl1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 848w, https://substackcdn.com/image/fetch/$s_!5Pl1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!5Pl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48daf987-b3cd-4235-ae30-c8dfce7af42d_1202x1350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.18805">using this link</a>.</p><div><hr></div><h3><strong>7. SWE-chat: Coding Agent Interactions From Real Users in the Wild</strong></h3><p>This research paper presents <strong>SWE-chat</strong>, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild, containing 6,000 sessions, more than 63,000 user prompts, and 355,000 agent tool calls.</p><p>Experiments with SWE-chat show that:</p><ul><li><p>In 41% of sessions, agents author virtually all committed code (a.k.a. 
&#8220;Vibe coding&#8221;), while in 23% of sessions, humans write all code themselves.</p></li><li><p>Only 44% of all agent-produced code survives into user commits, and agent-written code introduces more security vulnerabilities than human-authored code. </p></li><li><p>Users push back against agent outputs (through corrections, failure reports, and interruptions) in 44% of all turns.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IPpJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IPpJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 424w, https://substackcdn.com/image/fetch/$s_!IPpJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 848w, https://substackcdn.com/image/fetch/$s_!IPpJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!IPpJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IPpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png" 
width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:754903,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IPpJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 424w, https://substackcdn.com/image/fetch/$s_!IPpJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 848w, https://substackcdn.com/image/fetch/$s_!IPpJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!IPpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a3690f-1a66-4152-b758-a35f0510b3ed_2284x1282.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.20779">using this link</a>.</p><div><hr></div><h3><strong>8. Image Generators are Generalist Vision Learners</strong></h3><p>This research paper argues that training image-generation models gives them a general visual understanding, similar to how LLMs develop broad capabilities through generative pre-training.</p><p>While learning how to generate images, these models also learn rich representations that apply to many vision tasks, such as segmentation and depth estimation, in a zero-shot or minimally supervised way.</p><p>The authors show this with a generalist model called <strong>Vision Banana</strong>, built by instruction-tuning Nano Banana Pro (NBP) on a mixture of its original training data and a small amount of vision task data. 
</p><p>Vision Banana achieves state-of-the-art results across multiple vision tasks, including 2D and 3D understanding, outperforming or matching zero-shot domain specialists, such as <a href="https://ai.meta.com/research/sam3/">SAM 3</a>, on segmentation tasks and the <a href="https://depth-anything.github.io/">Depth Anything</a> series on metric depth estimation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j20d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j20d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 424w, https://substackcdn.com/image/fetch/$s_!j20d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 848w, https://substackcdn.com/image/fetch/$s_!j20d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!j20d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j20d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png" width="1456" height="888" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:888,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:916945,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j20d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 424w, https://substackcdn.com/image/fetch/$s_!j20d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 848w, https://substackcdn.com/image/fetch/$s_!j20d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!j20d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb38e433-f2b1-4905-8a5a-183ee2e8cfb0_2114x1290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.20329">using this link</a>.</p><div><hr></div><h3><strong>9. 
How Do AI Agents Spend Your Money?</strong></h3><p>This research paper examines how modern coding agents use tokens in real workflows and why they can unexpectedly become expensive.</p><p>The paper analyzes trajectories from 8 frontier LLMs on <a href="https://epoch.ai/benchmarks/swe-bench-verified">SWE-bench Verified</a> and evaluates their ability to predict their own token costs before task execution.</p><p>Some of the interesting findings from this paper are:</p><ul><li><p>Agentic tasks consume 1000&#215; more tokens than code reasoning and code chat, with input tokens rather than output tokens driving the overall cost.</p></li><li><p>Token usage is highly variable, and multiple runs on the same task can differ by up to 30&#215; in total tokens. </p></li><li><p>Higher token usage does not translate into higher accuracy. Instead, accuracy often peaks at intermediate cost and saturates at higher costs.</p></li><li><p>Models differ substantially in token efficiency. On the same tasks, Kimi-K2 and Claude-Sonnet-4.5, on average, consume over 1.5M more tokens than GPT-5.</p></li><li><p>Task difficulty ratings by human experts align only weakly with actual token costs.</p></li><li><p>Frontier models fail to accurately predict their own token usage and systematically underestimate real token costs.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lHEy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lHEy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 424w, 
https://substackcdn.com/image/fetch/$s_!lHEy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 848w, https://substackcdn.com/image/fetch/$s_!lHEy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!lHEy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lHEy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png" width="1456" height="556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:272845,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!lHEy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 424w, https://substackcdn.com/image/fetch/$s_!lHEy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 848w, https://substackcdn.com/image/fetch/$s_!lHEy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!lHEy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6a18430-a879-4bf6-b4cb-e57cc7072cdf_2624x1002.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wOQn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wOQn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 424w, https://substackcdn.com/image/fetch/$s_!wOQn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 848w, https://substackcdn.com/image/fetch/$s_!wOQn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!wOQn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wOQn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png" width="1456" height="663" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:476338,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wOQn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 424w, https://substackcdn.com/image/fetch/$s_!wOQn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 848w, https://substackcdn.com/image/fetch/$s_!wOQn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!wOQn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dff604-726c-48f0-a079-f28e18b84889_2646x1204.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.22750">using this link</a>.</p><div><hr></div><h3><strong>10. Wan-Image</strong></h3><p>This research paper presents <strong>Wan-Image</strong>, a unified multimodal image generation system designed to move beyond casual image synthesis into professional-grade visual creation tools.</p><p>It combines a reasoning-capable multimodal LLM (planner) with a diffusion-based image generator (visualizer), allowing the system to understand complex user intent and produce precise, high-quality outputs.</p><p>Unlike typical diffusion models, Wan-Image focuses on control and real-world usability. 
It supports features such as long-text rendering, identity consistency across images, palette-guided design, multi-image storytelling, and high-resolution (4K) generation.</p><p>Wan-Image outperforms Seedream 5.0 Lite and GPT Image 1.5 across the board, achieving parity with Nano Banana Pro on challenging tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bjKz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bjKz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 424w, https://substackcdn.com/image/fetch/$s_!bjKz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 848w, https://substackcdn.com/image/fetch/$s_!bjKz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 1272w, https://substackcdn.com/image/fetch/$s_!bjKz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bjKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png" width="1512" height="1275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1275,&quot;width&quot;:1512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3312060,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195514181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F559aa0d5-34da-43bb-ba51-3ca9a01083ce_1512x1288.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bjKz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 424w, https://substackcdn.com/image/fetch/$s_!bjKz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 848w, https://substackcdn.com/image/fetch/$s_!bjKz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 1272w, https://substackcdn.com/image/fetch/$s_!bjKz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b187c70-cae9-49a6-925f-17d81b5da395_1512x1275.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.19858v2">using this link</a>.</p><div><hr></div><p>Share this article with others and <strong>earn referral rewards</strong>. 
&#10084;&#65039;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/p/this-week-in-ai-research-19-25-april?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/p/this-week-in-ai-research-19-25-april?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#127829; <a href="https://www.intoai.pub/p/diffusion-llms-explained-simply">Diffusion LLMs, Explained Simply</a></p></li><li><p>&#127851; <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts?utm_source=publication-search">Build and Train a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#127752; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li><li><p>&#127856; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><p>You can also read my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!B_dw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" width="1344" height="256" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[⚡️ Flash Sale for "LLMs In 100 Images"]]></title><description><![CDATA[Only 22 hours left...]]></description><link>https://www.intoai.pub/p/flash-sale-for-llms-in-100-images</link><guid isPermaLink="false">https://www.intoai.pub/p/flash-sale-for-llms-in-100-images</guid><dc:creator><![CDATA[Dr. 
Ashish Bamania]]></dc:creator><pubDate>Mon, 27 Apr 2026 21:25:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8gSJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey there! &#128075;&#127995;</p><p>&#127873; A flash sale just went live for &#8220;<strong>LLMs In 100 Images</strong>&#8221;!</p><p>I am offering you a <strong>40% discount</strong> on the book for the <strong>NEXT 24 hours! </strong>(It&#8217;s actually <strong>22 hours now</strong>.)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8gSJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8gSJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!8gSJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!8gSJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!8gSJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8gSJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69514,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/195679845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8gSJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!8gSJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!8gSJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!8gSJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F422f4983-96b0-4c52-8692-864e945e9c7e_1200x628.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There won&#8217;t be a better time to get the book if you&#8217;ve been thinking about it.</p><p>Here&#8217;s your link:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bamaniaashish.gumroad.com/l/llmbook/40FLASH&quot;,&quot;text&quot;:&quot;Grab your 40% discount&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bamaniaashish.gumroad.com/l/llmbook/40FLASH"><span>Grab your 40% discount</span></a></p><p><strong>&#9200; This offer ends in the next 22 hours, so 
don&#8217;t wait too long.</strong></p><p>Cheers,<br>Ashish</p>]]></content:encoded></item><item><title><![CDATA[Diffusion LLMs, Explained Simply]]></title><description><![CDATA[A gentle but comprehensive introduction to Diffusion LLMs.]]></description><link>https://www.intoai.pub/p/diffusion-llms-explained-simply</link><guid isPermaLink="false">https://www.intoai.pub/p/diffusion-llms-explained-simply</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Thu, 23 Apr 2026 18:19:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4OWT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4OWT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4OWT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 424w, https://substackcdn.com/image/fetch/$s_!4OWT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 848w, https://substackcdn.com/image/fetch/$s_!4OWT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4OWT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4OWT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png" width="1456" height="718" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2310236,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194697013?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4OWT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 424w, https://substackcdn.com/image/fetch/$s_!4OWT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 848w, https://substackcdn.com/image/fetch/$s_!4OWT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 
1272w, https://substackcdn.com/image/fetch/$s_!4OWT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8461e51e-f9d6-485f-ac77-552a448ec9e7_1922x948.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>LLM-based chatbots are all around us. They reply by producing their responses sequentially. 
This means that they generate their output token by token, one at a time.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kSwC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kSwC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 424w, https://substackcdn.com/image/fetch/$s_!kSwC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 848w, https://substackcdn.com/image/fetch/$s_!kSwC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 1272w, https://substackcdn.com/image/fetch/$s_!kSwC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kSwC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png" width="728" height="230" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:460,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:139421,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194697013?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kSwC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 424w, https://substackcdn.com/image/fetch/$s_!kSwC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 848w, https://substackcdn.com/image/fetch/$s_!kSwC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 1272w, https://substackcdn.com/image/fetch/$s_!kSwC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95ede2b9-c120-4f7e-94e4-aa8497845b58_2374x750.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>This makes them painfully slow (<em>For a computer engineer, increasing execution speed is one of their wildest dreams</em>).</p><p>The reason is that their underlying 
architecture, the Transformer, is&nbsp;an <a href="https://en.wikipedia.org/wiki/Autoregressive_model">autoregressive</a> model that at each step of text generation answers this simple question:</p><blockquote><p><em>Given all previous tokens, what is the probability distribution of the next token over the entire vocabulary of the model?</em></p></blockquote><p>This is the <strong>Next-token prediction objective</strong>.</p><p>For the prompt shown above: &#8220;The cat&#8221;.</p><p>At the first step of generation, the model could produce a probability distribution and select the token with the highest probability, as shown below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UqQG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UqQG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 424w, https://substackcdn.com/image/fetch/$s_!UqQG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 848w, https://substackcdn.com/image/fetch/$s_!UqQG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 1272w, https://substackcdn.com/image/fetch/$s_!UqQG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!UqQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png" width="1456" height="469" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:469,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UqQG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 424w, https://substackcdn.com/image/fetch/$s_!UqQG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 848w, https://substackcdn.com/image/fetch/$s_!UqQG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 1272w, https://substackcdn.com/image/fetch/$s_!UqQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75a1899-e2d0-430e-99ef-1ddbc8371d0d_1600x515.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The LLM picks the next token as the one with the maximum probability in the next-token probability distribution (Greedy decoding)</figcaption></figure></div><p>The selected token is then added to the original prompt, and the process repeats until the LLM finishes generating its output.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Is this the only way to generate tokens?</h3><p>What if we could generate tokens all at once, i.e., in parallel? 
</p><p>Yes, it is possible, and this is what Diffusion LLMs do, using a process called &#8220;Diffusion&#8221; (<em>duh</em>).</p><p><strong><a href="https://en.wikipedia.org/wiki/Diffusion">Diffusion</a></strong> is actually a physical process in which particles move from regions of higher concentration to regions of lower concentration, eventually dispersing evenly into an equilibrium state.</p><p>Diffusion-based image generation models are inspired by this principle. These models are trained using two diffusion processes:</p><ol><li><p><strong>Forward Diffusion</strong>: The process starts with input data (an image) and incrementally (over multiple timesteps) adds <a href="https://en.wikipedia.org/wiki/Gaussian_noise">Gaussian noise</a> until the data becomes completely random. </p><p><br>In other words, the pixel values of the input image are changed step by step until it becomes pure noise.  <br></p></li><li><p><strong>Reverse Diffusion: </strong>In this phase, a neural network learns to reverse the noise-addition process by predicting the noise that was added at each timestep of the forward diffusion process.<br><br>In other words, the network learns to recreate the original pixel values of the input image from pure noise. 
</p></li></ol><p>These processes are controlled by a&nbsp;<a href="https://cabralpinto.github.io/modular-diffusion/modules/noise-schedule/">Noise schedule</a>, which defines the rate at which noise is added to the data (forward diffusion) or removed from the&nbsp;data (reverse diffusion/denoising process) across different timesteps.</p><p>Once the diffusion model is fully trained, it can generate new image samples from pure noise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0wyy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0wyy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!0wyy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!0wyy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!0wyy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0wyy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png" width="1080" height="1080" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:203014,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194697013?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0wyy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!0wyy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!0wyy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!0wyy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdedcac6a-f8a6-41f9-a10a-63eab645f7ce_1080x1080.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from &#8220;<a href="https://bamaniaashish.gumroad.com/l/visual_ai">AI In 100 Images</a>&#8221;</figcaption></figure></div><p>While Diffusion is commonly used in image generation models, it is increasingly being applied to language models. </p><p>To apply Diffusion to text, we first need a way to make tokens noisy. But how do we do this when tokens are discrete, unlike the continuous pixel values of an image?</p><div><hr></div><h3>How to apply Diffusion to text?</h3>
      <p>
          <a href="https://www.intoai.pub/p/diffusion-llms-explained-simply">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[This Week In AI Research (12-18 April 26) 🗓️]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-12-18-april</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-12-18-april</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Wed, 22 Apr 2026 13:14:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lfs6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lfs6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lfs6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 424w, https://substackcdn.com/image/fetch/$s_!lfs6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 848w, https://substackcdn.com/image/fetch/$s_!lfs6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lfs6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lfs6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png" width="1456" height="645" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:645,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:714748,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lfs6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 424w, https://substackcdn.com/image/fetch/$s_!lfs6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 848w, 
https://substackcdn.com/image/fetch/$s_!lfs6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!lfs6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbac93d1d-f8ef-4aa3-a1bf-a21cceca20dd_2712x1202.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>1. 
Dive into Claude Code</h3><p>This research paper describes the architecture of <strong><a href="https://code.claude.com/docs/en/overview">Claude Code</a></strong> by analyzing its publicly available TypeScript source code and comparing it with the open-source <a href="https://openclaw.ai/">OpenClaw</a>. </p><p>At its core, Claude Code is a simple while-loop that calls the model, runs tools, and repeats. Its impressive capabilities come from the systems built around this loop:</p><ul><li><p>A permission system with seven modes and an ML-based classifier</p></li><li><p>A five-layer compaction pipeline for context management</p></li><li><p>Four extensibility mechanisms (MCP, plugins, skills, and hooks)</p></li><li><p>A subagent delegation and orchestration mechanism </p></li><li><p>Append-oriented session storage</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RA-O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RA-O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 424w, https://substackcdn.com/image/fetch/$s_!RA-O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 848w, https://substackcdn.com/image/fetch/$s_!RA-O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 1272w, 
https://substackcdn.com/image/fetch/$s_!RA-O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RA-O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png" width="1456" height="595" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:271117,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RA-O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 424w, https://substackcdn.com/image/fetch/$s_!RA-O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 848w, 
https://substackcdn.com/image/fetch/$s_!RA-O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!RA-O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cc30fc1-d4a0-4d8d-99af-3589a1d63547_2602x1064.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.14228">using this link</a>.</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>2. Gemini Robotics-ER 1.6</h3><p>This blog post introduces Google DeepMind&#8217;s <strong>Gemini Robotics-ER 1.6</strong>, an upgraded AI model that helps robots better understand and reason about the physical world. </p><p>It improves spatial awareness, multi-camera scene understanding, and task completion detection. The model also adds useful capabilities, such as reading gauges, thermometers, and industrial instruments, which are particularly beneficial for inspection robots like <a href="https://bostondynamics.com/products/spot/">Boston Dynamics&#8217; Spot</a>. </p><p>Additionally, the model enables safer decisions regarding physical constraints, such as avoiding unsafe objects or tasks. This positions it as a step toward more autonomous robots that can operate reliably in real-world environments.</p><div id="youtube2-kBwxmlI2yHQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;kBwxmlI2yHQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/kBwxmlI2yHQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Read more about this release <a href="https://deepmind.google/blog/gemini-robotics-er-1-6/">using this link</a>.</p><div><hr></div><h3>3. 
&#960; 0.7: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities</h3><p>This research paper introduces &#960;&#8320;.&#8327;, a general-purpose robotic foundation model from Physical Intelligence that equips robots with strong out-of-the-box capabilities across a range of tasks without requiring retraining for specific jobs. </p><p>The main idea behind &#960;&#8320;.&#8327; is to use diverse context conditioning during training. Instead of relying solely on language commands, the model also incorporates richer context, such as subgoal images, task metadata, control modes, and demonstrations, enabling it to adapt its behavior and strategy effectively.</p><p>In experiments, &#960;&#8320;.&#8327; demonstrates broad generalization. It can handle unseen kitchens, fold laundry without prior training on new robot bodies, and even operate an espresso machine with performance levels comparable to those of specialized reinforcement-learning systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F7vp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F7vp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 424w, https://substackcdn.com/image/fetch/$s_!F7vp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 848w, 
https://substackcdn.com/image/fetch/$s_!F7vp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!F7vp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F7vp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png" width="1456" height="911" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2319279,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F7vp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 424w, 
https://substackcdn.com/image/fetch/$s_!F7vp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 848w, https://substackcdn.com/image/fetch/$s_!F7vp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!F7vp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4f84ba-3c12-4156-b42b-8b535b4884af_2016x1262.png 1456w" sizes="100vw"
loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.15483">using this link</a>.</p><div><hr></div><h3>4. Qwen3.5-Omni </h3><p>This research paper presents <strong>Qwen3.5-Omni</strong>, Alibaba&#8217;s new multimodal model for handling text, images, audio, and video. </p><p>The model comes with hundreds of billions of parameters and a 256k-token context window, and was trained on large text-image collections and over 100 million hours of audio-visual data.</p><p>Qwen3.5-Omni uses a Hybrid Attention Mixture-of-Experts design for reasoning and response generation, which improves efficiency on long multimodal sequences. It can process over 10 hours of audio and about 400 seconds of 720p video in a single session. </p><p>The research paper also introduces <strong>ARIA (Adaptive Rate Interleave Alignment)</strong>, a method that better synchronizes text and speech token streams, resulting in more natural low-latency conversational speech. </p><p>Alongside this, the model supports multilingual speech generation in 36 languages, zero-shot voice cloning from short samples, strong temporal video captioning, scene segmentation, and a new capability called &#8220;<strong>Audio-Visual Vibe Coding</strong>&#8221;, which enables it to write code directly from spoken and visual instructions.</p><p>The top model, Qwen3.5-Omni-Plus, achieves leading results across 215 benchmarks and tasks for audio and audio-visual understanding, reasoning, and interaction. 
It outperforms Google DeepMind Gemini 3.1 Pro on several audio tasks and matches it on broad audio-visual evaluation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9YEs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9YEs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 424w, https://substackcdn.com/image/fetch/$s_!9YEs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 848w, https://substackcdn.com/image/fetch/$s_!9YEs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!9YEs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9YEs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png" width="1456" height="1172" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1172,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:766219,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9YEs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 424w, https://substackcdn.com/image/fetch/$s_!9YEs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 848w, https://substackcdn.com/image/fetch/$s_!9YEs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!9YEs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c86866-cfad-424c-9b1e-ae32dbac8b6c_1608x1294.png 1456w" sizes="100vw"
loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.15804">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>5. Do Vision-Language Models Truly Perform Vision Reasoning?</h3><p>This research paper introduces <strong>CrossMath</strong>, a benchmark that tests whether vision-language models actually reason with images or primarily rely on text. 
Each problem in the benchmark is available in text-only, image-only, and image+text formats, and verified by a human annotator.</p><p>The authors find that top VLMs perform best on text-only tasks, and adding visual data (image+text) often lowers performance. This suggests that current vision-language models primarily reason through their language components, with limited reliance on visual evidence. </p><p>Fine-tuning a VLM on a curated CrossMath training set significantly improves reasoning performance across both text and vision tasks, suggesting that this issue can be partially fixed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q-2Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 424w, https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 848w, https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png" width="1456" height="842" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:842,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194888,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 424w, https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 848w, https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!Q-2Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da26b64-a100-45d7-a37a-fd696cb2edc8_1986x1148.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.16256">using this link</a>.</p><div><hr></div><h3>6. 
LLM Reasoning Is Latent, Not the Chain of Thought</h3><p>The author of this research paper compares three explanations for improvements in LLM reasoning: </p><ol><li><p>Hidden internal trajectories</p></li><li><p>Explicit written reasoning steps</p></li><li><p>Simply giving the model more sequential compute</p></li></ol><p>After reviewing recent evidence, the author concludes that the latent-state perspective currently has the strongest support: models often reason internally, and the visible chain of thought (CoT) is not always a reliable reflection of that process. </p><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.15726">using this link</a>.</p><div><hr></div><h3>7. Autogenesis: A Self-Evolving Agent Protocol</h3><p>This research paper introduces the <strong>Autogenesis Protocol (AGP)</strong>, a framework designed for creating self-evolving LLM agent systems that can manage complex, long-term tasks more effectively. </p><p>Current agent standards often fall short in areas like lifecycle management, memory and context handling, version tracking, and safe update processes. These gaps result in fragile, single-unit agentic systems. </p><p>AGP tackles this issue by separating what changes (such as prompts, tools, agents, memory, and environments) from how those changes occur. It implements a closed-loop method for suggesting, assessing, implementing, and reverting improvements. </p><p>The authors also present the <strong>Autogenesis System (AGS)</strong>, a multi-agent setup that generates and improves resources during execution. 
This system demonstrates consistent improvements over strong baseline models across planning and tool-use benchmarks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uoyZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uoyZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 424w, https://substackcdn.com/image/fetch/$s_!uoyZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 848w, https://substackcdn.com/image/fetch/$s_!uoyZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 1272w, https://substackcdn.com/image/fetch/$s_!uoyZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uoyZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png" width="1456" height="662" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:802840,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uoyZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 424w, https://substackcdn.com/image/fetch/$s_!uoyZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 848w, https://substackcdn.com/image/fetch/$s_!uoyZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 1272w, https://substackcdn.com/image/fetch/$s_!uoyZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2368fe2e-e898-40e7-bbb6-3543d7cc2cb4_2770x1260.png 1456w" sizes="100vw"
loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.15034">using this link</a>.</p><div><hr></div><h3>8. Seedance 2.0</h3><p>This research paper introduces <strong>Seedance 2.0</strong>, a native multimodal audio-video generation model that creates synchronized content from text, images, audio, or video inputs in a single system. </p><p>Compared to earlier versions, it greatly improves video quality and audio realism, with its performance close to that of leading models in expert and public evaluations. </p><p>It can generate 4 to 15-second clips at 480p or 720p and also supports various editing and reference inputs, including multiple videos, images, and audio clips. 
</p><p>The paper also presents Seedance 2.0 Fast, a version optimized for low-latency generation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Raty!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Raty!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 424w, https://substackcdn.com/image/fetch/$s_!Raty!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 848w, https://substackcdn.com/image/fetch/$s_!Raty!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 1272w, https://substackcdn.com/image/fetch/$s_!Raty!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Raty!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png" width="1456" height="493" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:493,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:510418,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Raty!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 424w, https://substackcdn.com/image/fetch/$s_!Raty!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 848w, https://substackcdn.com/image/fetch/$s_!Raty!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 1272w, https://substackcdn.com/image/fetch/$s_!Raty!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c201881-f0f7-43d7-a8da-b6ebced5fdfe_2860x968.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.14148">using this link</a>.</p><div><hr></div><h3>9. CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors</h3><p>This research paper presents <strong>CoDaS (AI Co-Data-Scientist)</strong>, a multi-agent system designed to discover health biomarkers from data collected by wearable devices (sleep patterns, activity levels, and heart rate signals). </p><p>It organizes discovery as an iterative loop that includes generating hypotheses, conducting statistical tests, performing validation, reviewing existing literature, and involving human oversight. 
</p><p>In a study of 9,279 participants, CoDaS identified potential biomarkers for mental health and metabolic conditions, including links between depression and irregular sleep or circadian rhythms, and between insulin resistance and a fitness index based on step counts and resting heart rate. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KRiH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KRiH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 424w, https://substackcdn.com/image/fetch/$s_!KRiH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 848w, https://substackcdn.com/image/fetch/$s_!KRiH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!KRiH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KRiH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png" width="1456" height="731" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:514801,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KRiH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 424w, https://substackcdn.com/image/fetch/$s_!KRiH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 848w, https://substackcdn.com/image/fetch/$s_!KRiH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!KRiH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f98f6dc-66d8-425c-985a-b56e9f16515c_2588x1300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.14615">using this link</a>.</p><div><hr></div><h3>10. Lyra 2.0: Explorable Generative 3D Worlds</h3><p>This research paper introduces <strong>Lyra 2.0</strong>, a system that creates persistent, explorable 3D worlds. It starts by making camera-controlled walkthrough videos and then turns these into real-time 3D scenes. </p><p>The main challenge in generating worlds is maintaining consistency across large environments over long paths. Current video models often struggle with spatial forgetting, which means they lose track of previously seen areas, and with temporal drifting, where errors build up over time. </p><p>Lyra 2.0 solves this problem by using recovered 3D geometry to access relevant past views and training the model to fix its own degraded outputs. 
This leads to longer, more coherent scene generation and allows for the creation of high-quality AI-generated worlds for gaming, simulation, and robotics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Ibs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Ibs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 424w, https://substackcdn.com/image/fetch/$s_!6Ibs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 848w, https://substackcdn.com/image/fetch/$s_!6Ibs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!6Ibs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Ibs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png" width="1456" height="742" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:742,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1670362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/194909025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Ibs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 424w, https://substackcdn.com/image/fetch/$s_!6Ibs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 848w, https://substackcdn.com/image/fetch/$s_!6Ibs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!6Ibs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb77cc17f-95b5-4aff-9787-22e8b60ab21d_2578x1314.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.13036">using this link</a>.</p><div><hr></div><p>Share this article with others and <strong>earn some referral rewards</strong>. 
&#10084;&#65039;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/p/this-week-in-ai-research-12-18-april?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/p/this-week-in-ai-research-12-18-april?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#128126; <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts?utm_source=publication-search">Build and Train a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#127839; <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#127851; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li><li><p>&#127849; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><p>You can also read my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in 
touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gK6Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" width="1456" height="271" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Build GPT in plain Python]]></title><description><![CDATA[Implement the GPT architecture in pure Python without using deep learning frameworks or dependencies]]></description><link>https://www.intoai.pub/p/build-gpt-in-plain-python</link><guid 
isPermaLink="false">https://www.intoai.pub/p/build-gpt-in-plain-python</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Wed, 15 Apr 2026 12:42:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nkyO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nkyO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nkyO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 424w, https://substackcdn.com/image/fetch/$s_!nkyO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 848w, https://substackcdn.com/image/fetch/$s_!nkyO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!nkyO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nkyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png" width="1456" height="641" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1558923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192394103?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nkyO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 424w, https://substackcdn.com/image/fetch/$s_!nkyO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 848w, https://substackcdn.com/image/fetch/$s_!nkyO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!nkyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02827e86-d51f-4bff-a402-db315bf43cde_2526x1112.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://karpathy.ai/">Andrej Karpathy</a>&nbsp;recently published&nbsp;<strong><a href="https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95">MicroGPT</a>&nbsp;</strong>on GitHub. This is an elegant implementation of GPT in pure Python, and a gold mine for curious AI engineers.</p><p>In this article, we will dissect this code step by step and learn as much as possible from it. 
</p><p>Let&#8217;s begin!</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#128126; <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts?utm_source=publication-search">Build and Train a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#127839; <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129382; <a href="https://www.intoai.pub/p/build-a-personal-health-companion">Build a Personal Health Companion with Mem0 and CrewAI</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li><li><p>&#127849; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><div><hr></div><h3>1. Python standard library imports</h3><p>The implementation imports only modules from the Python standard library and uses no third-party numerical or deep learning libraries, such as <code>numpy</code> or <code>torch</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;baf101ea-4492-4172-bf9d-f9986f179a9c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">import os              # Interface to the operating system
import math            # For math calculations
import random          # For pseudo-random number generation
import urllib.request  # For downloading files from the internet

random.seed(42)        # Set the seed for reproducible results</code></pre></div><div><hr></div><h3>2. Loading the dataset</h3><p>A simple text file of names is used as the training dataset. A portion of it looks as follows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kIdh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kIdh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 424w, https://substackcdn.com/image/fetch/$s_!kIdh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 848w, https://substackcdn.com/image/fetch/$s_!kIdh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 1272w, https://substackcdn.com/image/fetch/$s_!kIdh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kIdh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106471,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192394103?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kIdh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 424w, https://substackcdn.com/image/fetch/$s_!kIdh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 848w, https://substackcdn.com/image/fetch/$s_!kIdh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 1272w, https://substackcdn.com/image/fetch/$s_!kIdh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac418485-5f1c-41b8-a8f5-313da80e6b9c_1474x804.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://raw.githubusercontent.com/karpathy/makemore/988aa59/names.txt">Source</a></figcaption></figure></div><p>This file is available <a href="https://raw.githubusercontent.com/karpathy/makemore/988aa59/names.txt">here</a> and loaded as follows.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;ce476f6f-6375-4ebc-9fad-3ed46e6b7011&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># If the dataset file does not exist locally
if not os.path.exists('input.txt'):
    
    # The URL to the dataset
    names_url = 'https://raw.githubusercontent.com/karpathy/makemore/988aa59/names.txt'
    
    # Download it and save it locally as 'input.txt'
    urllib.request.urlretrieve(names_url, 'input.txt')

# Open the file and read it line by line
# .strip() removes leading and trailing whitespace (including the newline)
# "if line.strip()" skips empty lines
docs = [line.strip() for line in open('input.txt') if line.strip()]

# Shuffle the list of names in place so that their order is random
random.shuffle(docs)

# Print how many names there are in the dataset
print(f"Num docs: {len(docs)}")

# Num docs: 32033</code></pre></div><p>This gives 32,033 names, each of which is one training sample for the GPT model to learn from.</p><div><hr></div><h3>3. Setting up the Tokenizer</h3><p>The tokenizer maps each unique character (token) in the dataset to a unique number (its token ID).</p><p>It then adds one extra special token called <code>BOS</code> (Beginning of sequence) that marks both the start and the end of a name (a training sample/sequence).</p><p>Larger GPT models typically use a separate <code>EOS</code><a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts"> (End of sequence) token</a> to mark the end of a sequence. Karpathy uses a simplified approach here, using just one token (<code>BOS</code>) to mark both the start and the end of the sequence.</p><p>The <strong>Vocabulary</strong> of the model is the set of distinct tokens it knows. </p><p>The <strong>Vocabulary size</strong> is the number of unique characters in the dataset plus one (for the <code>BOS</code> token).</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;41798765-4d1a-42b8-aa97-c9e8e34006b9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># 1. Join all names into one big string
# 2. Turn it into a set to get the unique characters
# 3. Sort them
# Each character's position in the resulting list becomes its token id (0, 1, 2, ..., 25).
uchars = sorted(set(''.join(docs)))

# Print the characters 
print(uchars)
# ['a', 'b' ... 'z']

# The BOS token is assigned a token ID of 26
BOS = len(uchars)

# Vocabulary size is number of unique characters + 1 (for the BOS token)
vocab_size = len(uchars) + 1
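
# Aside (illustrative sketch, not part of Karpathy's script): how a single
# name becomes token IDs under this scheme. The toy alphabet below is an
# assumption for illustration; the real mapping comes from `uchars`, which is
# built from the dataset above. The `demo_` names and helpers are hypothetical.
demo_chars = list('abcdefghijklmnopqrstuvwxyz')  # assumed to match uchars
demo_bos = len(demo_chars)                       # 26, same rule as BOS above

def demo_encode(name):
    # BOS + one ID per character + BOS
    return [demo_bos] + [demo_chars.index(ch) for ch in name] + [demo_bos]

def demo_decode(ids):
    # Drop the BOS markers and map the remaining IDs back to characters
    return ''.join(demo_chars[i] for i in ids if i != demo_bos)

print(demo_encode('emma'))               # [26, 4, 12, 12, 0, 26]
print(demo_decode(demo_encode('emma')))  # emma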

# Print the vocabulary size
print(f"Vocab size: {vocab_size}")
# Vocab size: 27</code></pre></div><div><hr></div><h3>4. Building a simple Automatic Differentiation engine</h3><p><a href="https://www.google.com/search?q=Autograd&amp;oq=autograd&amp;gs_lcrp=EgZjaHJvbWUqBwgAEAAYgAQyBwgAEAAYgAQyBwgBEAAYgAQyBwgCEAAYgAQyBwgDEAAYgAQyBwgEEAAYgAQyBwgFEAAYgAQyBggGEEUYQTIGCAcQLhhA0gEIMTQ5MmowajGoAgCwAgA&amp;sourceid=chrome&amp;ie=UTF-8&amp;ved=2ahUKEwihubLCvOCTAxUeYEEAHaSDFIYQgK4QegYIAQgAEAM">Autograd</a> is the automatic differentiation engine used in PyTorch. </p><p>During the forward pass of neural network training, it records the differentiable operations performed on the inputs and parameters, and builds a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">graph</a> of how the final loss was computed. </p><p>During the backward pass, it walks the graph in reverse and uses the chain rule to compute the gradient of the loss with respect to each parameter. Each gradient tells us how much the loss would change for a small change in that parameter. </p><p>An optimizer then uses these gradients to nudge the parameters in the direction that lowers the loss.</p><p>Since the implementation uses no external dependencies, Karpathy builds a simpler version of Autograd, based on his previous implementation, <strong><a href="https://github.com/karpathy/micrograd">Micrograd</a>,</strong> that works on numbers (scalars) rather than tensors.</p><p>In this differentiation engine, each <code>Value</code> wraps a single number (scalar) and remembers how it was computed (which other <code>Value</code>s it came from and the local derivative with respect to each). 
</p><p>Every math operation on a <code>Value</code> returns a new <code>Value</code> that records this information and builds up a computation graph.</p><p>During the backward pass, the graph is <a href="https://en.wikipedia.org/wiki/Topological_sorting">topologically sorted</a> and traversed in reverse order from the output node, applying the chain rule to compute the derivative of the output (loss) with respect to each node in the graph.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;3c5a1257-3024-4169-a402-34fabd84dbdc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Differentiation engine
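#
# Aside (illustrative sketch, not part of Karpathy's script): the chain rule
# that the Value class below automates, worked by hand for
#   c = a * b,  d = c + a,  L = d ** 2
# and checked against a finite-difference estimate. All `demo_` names are
# hypothetical.
import math

def demo_loss(a, b):
    c = a * b
    d = c + a
    return d ** 2

demo_a, demo_b = 2.0, 3.0
demo_d = demo_a * demo_b + demo_a   # forward pass: d = 8.0

# Backward pass: start at the output and multiply local derivatives.
demo_dL_dd = 2 * demo_d             # dL/dd = 2d = 16.0
demo_dL_dc = demo_dL_dd * 1.0       # dd/dc = 1
# a feeds both d (directly) and c, so its two contributions are added.
# This is why backward() accumulates with "+=".
demo_dL_da = demo_dL_dd * 1.0 + demo_dL_dc * demo_b
demo_dL_db = demo_dL_dc * demo_a    # dc/db = a

# Central finite differences as an independent check
demo_h = 1e-6
demo_num_da = (demo_loss(demo_a + demo_h, demo_b) - demo_loss(demo_a - demo_h, demo_b)) / (2 * demo_h)
demo_num_db = (demo_loss(demo_a, demo_b + demo_h) - demo_loss(demo_a, demo_b - demo_h)) / (2 * demo_h)
assert math.isclose(demo_dL_da, demo_num_da, rel_tol=1e-6)
assert math.isclose(demo_dL_db, demo_num_db, rel_tol=1e-6)

print(demo_dL_da, demo_dL_db)  # 64.0 32.0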

class Value:
    # __slots__ tells Python which instance attributes this class has.
    # This saves memory and speeds up attribute access.
    __slots__ = ('data', 'grad', '_children', '_local_grads')

    def __init__(self, data, children=(), local_grads=()):
        self.data = data                 # Scalar value of this node calculated during forward pass
        self.grad = 0                    # Derivative of the loss w.r.t. this node calculated in backward pass
        self._children = children        # Children of this node in the computation graph
        self._local_grads = local_grads  # Derivative of this node w.r.t. each child

    # Math operations on Values 
    # Each operation also records its local derivative, used later in the backward pass.

    # Addition:  c = a + b
    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1, 1))

    # Multiplication:  c = a * b
    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    # Power:  c = a ** n
    def __pow__(self, other): return Value(self.data**other, (self,), (other * self.data**(other-1),))
    
    # Natural log:  c = log(a)
    def log(self): return Value(math.log(self.data), (self,), (1/self.data,))
    
    # Exponential:  c = exp(a)
    def exp(self): return Value(math.exp(self.data), (self,), (math.exp(self.data),))
    
    # ReLU:  c = max(0, a)
    def relu(self): return Value(max(0, self.data), (self,), (float(self.data &gt; 0),))

    # Negation:  c = -a   (implemented as a * -1)
    def __neg__(self): return self * -1

    # Subtraction:  c = a - b   (implemented as a + (-b))
    def __sub__(self, other): return self + (-other)

    # Division:  c = a / b   (implemented as a * b^(-1))
    def __truediv__(self, other): return self * other**-1

    # Reflected (right-hand) operators
    # These make expressions like  2 + a,  2 - a,  2 * a,  2 / a  work,
    # where a plain number is on the left side of a Value
    def __radd__(self, other): return self + other          # For 2 + a
    def __rsub__(self, other): return other + (-self)       # For 2 - a
    def __rmul__(self, other): return self * other          # For 2 * a
    def __rtruediv__(self, other): return other * self**-1  # For 2 / a

    # Backward pass to compute the gradient of this node w.r.t. every
    # node it depends on (its children, their children, and so on)
    def backward(self):
        # Topological sort all nodes in the graph (children before parents) 
        # to process them in reverse
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build_topo(child)
                topo.append(v)
        build_topo(self)

        # The gradient of the final output w.r.t. itself is 1 (the starting point)
        self.grad = 1

        # Navigate the graph in reverse and apply the chain rule where
        # each node passes its gradient down to its children multiplied
        # by the local derivative (dparent/dchild)
        for v in reversed(topo):
            for child, local_grad in zip(v._children, v._local_grads):
                child.grad += local_grad * v.grad</code></pre></div><div><hr></div><h3>5. Defining Hyperparameters &amp; Parameters</h3><p>The GPT model's hyperparameters are defined as follows.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;44bf01ed-adb0-4ed2-a677-e4115a97cace&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Model hyperparameters

n_layer = 1      # Number of transformer layers (Depth of the model)
n_embd = 16      # Size of each token's vector (Width of the model)
block_size = 16  # Maximum context length the model can attend over (The longest name is 15 chars)
n_head = 4       # No. of attention heads
head_dim = n_embd // n_head  # Dimension of each attention head</code></pre></div><p>These are used to build randomly initialized weight matrices for the following components. Every weight is a <code>Value</code> so that the differentiation engine can track it.</p><ul><li><p>Token embeddings</p></li><li><p>Position embeddings</p></li><li><p>Each layer's attention projections (query, key, value, and output)</p></li><li><p>Each layer's 2-layer MLP</p></li><li><p>Final output head</p></li></ul><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;1f6e93f3-f8b4-4146-913a-e5712218bf2e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Helper function to make a (n_out x n_in) shaped matrix of Values, initialized with small Gaussian random numbers (mean 0, standard deviation std)
matrix = lambda n_out, n_in, std=0.08: [[Value(random.gauss(0, std)) for _ in range(n_in)] for _ in range(n_out)]

# Dictionary holding all the model's weight matrices
state_dict = {
    'wte': matrix(vocab_size, n_embd),     # Token embeddings 
    'wpe': matrix(block_size, n_embd),     # Position embeddings 
    'lm_head': matrix(vocab_size, n_embd), # Final output head 
}

# Add per-layer weights for each transformer block
for i in range(n_layer):
    state_dict[f'layer{i}.attn_wq'] = matrix(n_embd, n_embd)      # Query projection in Attention head
    state_dict[f'layer{i}.attn_wk'] = matrix(n_embd, n_embd)      # Key projection in Attention head
    state_dict[f'layer{i}.attn_wv'] = matrix(n_embd, n_embd)      # Value projection in Attention head
    state_dict[f'layer{i}.attn_wo'] = matrix(n_embd, n_embd)      # Output projection in Attention head
    state_dict[f'layer{i}.mlp_fc1'] = matrix(4 * n_embd, n_embd)  # Up-projection in the MLP
    state_dict[f'layer{i}.mlp_fc2'] = matrix(n_embd, 4 * n_embd)  # Down-projection in the MLP</code></pre></div><p><em>Going back to the architecture of a Decoder-only Transformer will help if you do not understand this step well.</em></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;49969496-0eb1-4b45-a9d5-1df78c7c8e98&quot;,&quot;caption&quot;:&quot;In the previous lesson on &#8216;Into AI&#8217;, we learned how to implement the Causal Multi-Head Self-Attention.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Build a Decoder-only Transformer from Scratch&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2025-12-18T14:24:35.561Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c6156df-dbda-43ed-8a45-69ab67b23092_6912x3072.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/build-a-decoder-only-transformer&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:181774222,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:17,&quot;comment_count&quot;:2,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Finally, all of the weights are flattened into one <code>params</code> list for the optimizer to update later.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;d4b57ff4-9e3a-43f1-ac9d-03fd73f57961&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Flatten every weight into a single list so the optimizer can loop over them
params = [p for mat in state_dict.values() for row in mat for p in row]
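
# Aside (illustrative sketch, not part of Karpathy's script): where the
# parameter count printed below comes from. The helper and its default
# arguments are hypothetical; the defaults copy the hyperparameters above and
# assume a 27-token vocabulary.
def demo_count_params(vocab=27, n_embd=16, block=16, n_layer=1):
    embeddings = vocab * n_embd + block * n_embd  # wte + wpe
    lm_head = vocab * n_embd                      # final output head
    attention = 4 * n_embd * n_embd               # wq, wk, wv, wo
    mlp = 2 * (4 * n_embd) * n_embd               # mlp_fc1 + mlp_fc2
    return embeddings + lm_head + n_layer * (attention + mlp)

print(demo_count_params())  # 4192 = 432 + 256 + 432 + 1 * (1024 + 2048)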

print(f"No. of model params: {len(params)}")
# No. of model params: 4192</code></pre></div><div><hr></div><h3>6. Defining helper functions for building the model</h3><p>Three helper functions are defined next:</p><ul><li><p><code>linear</code>: a fully-connected layer with no bias term</p></li><li><p><code>softmax</code>: to convert a list of logits into a probability distribution </p></li><li><p><code>rmsnorm</code>: to <a href="https://www.intoai.pub/p/normalization">normalize</a> a vector by its root-mean-square, which keeps the scale of activations stable during training</p></li></ul><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;cf480ef2-6c4a-4319-941b-89ec82f23e94&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Fully connected layer with no bias term
def linear(x, w):
    # For each output row 'w_row' in weight matrix 'w', take the dot product with input vector 'x'
    return [sum(w_i * x_i for w_i, x_i in zip(w_row, x)) for w_row in w]

# Softmax function
def softmax(logits):
    # Max logit
    max_val = max(val.data for val in logits)
    
    # Subtract the max logit before exponentiating to prevent overflow
    exps = [(val - max_val).exp() for val in logits]

    # Normalize the exponentials so that they sum to 1
    total = sum(exps)
    return [e / total for e in exps]
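
# Aside (illustrative sketch, not part of Karpathy's script): why the max is
# subtracted first. With plain floats, math.exp(1000.0) overflows, but the
# shifted version is fine and gives the same probabilities, because softmax is
# unchanged when the same constant is added to every logit. The `demo_` names
# are hypothetical.
import math

def demo_softmax_floats(xs):
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

demo_big = demo_softmax_floats([1000.0, 1001.0, 1002.0])
demo_small = demo_softmax_floats([0.0, 1.0, 2.0])
print([round(p, 4) for p in demo_big])    # [0.09, 0.2447, 0.6652]
print([round(p, 4) for p in demo_small])  # [0.09, 0.2447, 0.6652]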

# RMS Norm
def rmsnorm(x):
    # Compute the mean of the squared values
    mean_sq = sum(x_i * x_i for x_i in x) / len(x)

    # Define scale factor = 1 / sqrt(mean square)
    # 1e-5 prevents divide-by-zero error
    scale = (mean_sq + 1e-5) ** -0.5

    # Rescale every element so the vector has unit root-mean-square
    return [x_i * scale for x_i in x]</code></pre></div><div><hr></div><h3>7. Building the model and its forward pass</h3><p>Karpathy builds a GPT-2 style model but with three minor architectural differences:</p><ol><li><p>RMSNorm is used instead of LayerNorm</p></li><li><p>No bias terms are used</p></li><li><p>ReLU activation is used instead of <a href="https://arxiv.org/abs/1606.08415">GELU</a></p></li></ol><p>During the forward pass of a single token through this model, its token and position embeddings are looked up, summed, normalized with RMSNorm, and run through <code>n_layer</code> transformer blocks.</p><p>Each transformer block consists of two sub-blocks, each preceded by RMSNorm and wrapped in a residual connection. These are:</p><ol><li><p>Multi-head causal self-attention</p></li><li><p>2-layer MLP with ReLU</p></li></ol><p>After the final transformer block, an output head produces vocabulary <a href="https://en.wikipedia.org/wiki/Logit">logits</a> for predicting the next token.</p><p>The model <a href="https://www.intoai.pub/p/llm-optimizations">uses a KV cache</a> to store each transformer block&#8217;s past keys (K) and values (V), so it only computes new K and V for the current token instead of recomputing them for all previous ones. Because the cache only ever holds the current and earlier tokens, attention is causal without an explicit mask, and the same function is used for both training and generation.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;2ffdcb2c-24ac-4ca1-b997-dba6aad1b9d6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># GPT forward pass for a single token
def gpt(token_id, pos_id, keys, values):
    tok_emb = state_dict['wte'][token_id]           # Look up the token's embedding vector
    pos_emb = state_dict['wpe'][pos_id]             # Look up token's position embedding
    x = [t + p for t, p in zip(tok_emb, pos_emb)]   # Sum up the embeddings
    x = rmsnorm(x)                                  # Initial RMSNorm

    for li in range(n_layer):
        # Multi-head self-attention block
        x_residual = x  # Save input for the residual connection
        x = rmsnorm(x)  # Pre-norm before attention

        # Project the input into Query, Key, and Value vectors
        q = linear(x, state_dict[f'layer{li}.attn_wq'])
        k = linear(x, state_dict[f'layer{li}.attn_wk'])
        v = linear(x, state_dict[f'layer{li}.attn_wv'])

        # Append the new K, V to this layer's KV cache 
        keys[li].append(k)
        values[li].append(v)

        x_attn = []

        for h in range(n_head):
            # Slice out this head's portion of Q, K, V
            hs = h * head_dim
            q_h = q[hs:hs + head_dim]
            k_h = [ki[hs:hs + head_dim] for ki in keys[li]]
            v_h = [vi[hs:hs + head_dim] for vi in values[li]]

            # Scaled dot-product attention scores between current Q and every cached K
            attn_logits = [
                sum(q_h[j] * k_h[t][j] for j in range(head_dim)) / head_dim ** 0.5
                for t in range(len(k_h))
            ]

            # Softmax over the current and all past timesteps to get attention weights
            attn_weights = softmax(attn_logits)

            # Weighted sum of value vectors (This is the attention head's output)
            head_out = [
                sum(attn_weights[t] * v_h[t][j] for t in range(len(v_h)))
                for j in range(head_dim)
            ]

            # Concatenate heads
            x_attn.extend(head_out)                    

        # Output projection 
        x = linear(x_attn, state_dict[f'layer{li}.attn_wo'])
        
        # Residual connection
        x = [a + b for a, b in zip(x, x_residual)]

        # MLP block
        x_residual = x     # save input for the residual connection
        x = rmsnorm(x)     # Pre-norm before MLP
        x = linear(x, state_dict[f'layer{li}.mlp_fc1'])  # Up-projection
        x = [x_i.relu() for x_i in x]                    # ReLU activation
        x = linear(x, state_dict[f'layer{li}.mlp_fc2'])  # Down-projection
        x = [a + b for a, b in zip(x, x_residual)]       # Residual connection

    # Final projection from hidden state to vocabulary logits (output head) 
    logits = linear(x, state_dict['lm_head'])

    return logits</code></pre></div><div><hr></div><h3>8. Building the Optimizer</h3><p>The <a href="https://www.intoai.pub/i/146844148/here-comes-adam-adaptive-moment-estimation">Adam optimizer</a> is set up next. </p><p>It keeps a running estimate of the gradient's first moment (mean, <code>m</code>) and second moment (uncentered variance, <code>v</code>) for every parameter, using exponential moving averages controlled by <code>beta1</code> and <code>beta2</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;3ea08104-fd52-4ebd-97e7-55724282ca5b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Hyperparameters for the Adam optimizer
learning_rate, beta1, beta2, eps_adam = 0.01, 0.85, 0.99, 1e-8
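
# Aside (illustrative sketch, not part of Karpathy's script): one Adam update
# for a single scalar parameter, using plain floats. With bias correction, the
# very first step has a size close to the learning rate whatever the
# gradient's scale, since m_hat / sqrt(v_hat) = g / |g|. The helper and the
# `demo_` names are hypothetical; the defaults copy the hyperparameters above.
def demo_adam_step(p, g, m, v, t, lr=0.01, b1=0.85, b2=0.99, eps=1e-8):
    m = b1 * m + (1 - b1) * g          # running mean of gradients
    v = b2 * v + (1 - b2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (v_hat ** 0.5 + eps), m, v

demo_p_small, _, _ = demo_adam_step(p=1.0, g=0.001, m=0.0, v=0.0, t=1)
demo_p_large, _, _ = demo_adam_step(p=1.0, g=1000.0, m=0.0, v=0.0, t=1)
print(round(demo_p_small, 4), round(demo_p_large, 4))  # 0.99 0.99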

# First and second moment estimates for each parameter
m = [0.0 for _ in params]  # First moment (mean of gradients)
v = [0.0 for _ in params]  # Second moment (mean of squared gradients)</code></pre></div><div><hr></div><h3>9. Training the model</h3><p>The training loop runs for 1000 steps. </p><p>Each step involves:</p><ul><li><p>Picking one training sample (name) from the dataset</p></li><li><p>Converting it into character-level tokens</p></li><li><p>Adding BOS tokens on both sides of the sample</p></li></ul><p>Next, the model processes the sample token by token, predicting the next token at each position. It then calculates the cross-entropy loss at each position, which is the negative log of the probability the model assigned to the true next token, and averages these per-position losses into a single value for the entire name.</p><p>The <code>loss.backward()</code> method then goes through the computation graph in reverse and calculates the gradient of the loss with respect to each parameter.</p><p>Finally, the Adam optimizer uses these gradients to adjust each parameter so that the loss is reduced the next time.</p><p><em>Note that this training loop is not optimized and will take many minutes to complete!</em></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;5ad97a08-4410-485c-974d-ca3a78f1602c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Training loop

# Number of training steps
num_steps = 1000  
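
# Aside (illustrative sketch, not part of Karpathy's script): a reference
# point for the loss printed during training. If the model spread its
# probability evenly over all tokens, the cross-entropy at every position
# would be -log(1 / vocab_size), so a freshly initialized model should start
# somewhere near this value. The 27 is assumed from the tokenizer step and the
# `demo_` names are hypothetical.
import math

demo_vocab_size = 27
demo_uniform_loss = -math.log(1 / demo_vocab_size)
print(f"{demo_uniform_loss:.4f}")  # 3.2958

# A confident, correct prediction costs far less
demo_confident_loss = -math.log(0.9)
print(f"{demo_confident_loss:.4f}")  # 0.1054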

for step in range(num_steps):
    # Pick the next training sequence and convert it into token ids, with BOS on each end
    doc = docs[step % len(docs)]
    tokens = [BOS] + [uchars.index(ch) for ch in doc] + [BOS]

    # Number of next-token predictions to make, capped at block_size
    n = min(block_size, len(tokens) - 1)

    # Empty KV caches for this sequence
    keys, values = [[] for _ in range(n_layer)], [[] for _ in range(n_layer)]
    losses = []

    # Go through each position in the sequence, predict the next token, measure the loss
    for pos_id in range(n):
        token_id, target_id = tokens[pos_id], tokens[pos_id + 1]

        # Forward pass to get next-token logits
        logits = gpt(token_id, pos_id, keys, values)

        # Convert logits to probabilities using softmax
        probs = softmax(logits)

        # Cross-entropy loss for this position
        loss_t = -probs[target_id].log()

        # Add to the 'losses' list
        losses.append(loss_t)

    # Average loss across all positions in the document
    loss = (1 / n) * sum(losses)

    # Backpropagation
    loss.backward()

    # Update each parameter using Adam with linear learning-rate decay over training
    lr_t = learning_rate * (1 - step / num_steps)

    for i, p in enumerate(params):
        # Update running estimates of the gradient's mean (m) and mean of squares (v)
        m[i] = beta1 * m[i] + (1 - beta1) * p.grad
        v[i] = beta2 * v[i] + (1 - beta2) * p.grad ** 2

        # Bias-correct the moment estimates 
        m_hat = m[i] / (1 - beta1 ** (step + 1))
        v_hat = v[i] / (1 - beta2 ** (step + 1))

        # Adam parameter update
        p.data -= lr_t * m_hat / (v_hat ** 0.5 + eps_adam)

        # Zero the gradient for the next step
        p.grad = 0

    # Print progress
    print(f"step {step+1:4d} / {num_steps:4d} | loss {loss.data:.4f}", end='\r')</code></pre></div><div><hr></div><h3>10. Generating text from the model</h3><p>Once the model is trained, we use it to produce new names.</p><p>Text generation starts with the <code>BOS</code> token and empty KV caches. </p><p>At each step, the model predicts the next token&#8217;s logits, which are divided by a <code>temperature</code> value before applying softmax. This converts the logits into probabilities, from which the next token is sampled.</p><p>The <code>temperature</code> value controls how random the sampling is. A value below 1 sharpens the distribution, so the model mostly picks its most likely characters and produces more common names. A value above 1 flattens it, so less likely characters are sampled more often and the outputs become more unusual and creative. </p><p>The generated token is fed back into the model to produce the next token, building the name one character at a time.</p><p>Text generation continues until the model outputs a <code>BOS</code> token (signaling the end of a name) or the maximum generation length (<code>block_size</code>) is reached.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;c0edcb78-ce14-4243-8ae3-aee9b9252ff8&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Text generation/ Inference 

temperature = 0.3  # Controls randomness/ creativity of outputs
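
# Aside (illustrative sketch, not part of Karpathy's script): what dividing by
# the temperature does to a distribution, using plain floats and made-up
# logits. The helper and the `demo_` names are hypothetical.
import math

def demo_temperature_probs(logits, temp):
    scaled = [x / temp for x in logits]
    shift = max(scaled)
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

demo_logits = [2.0, 1.0, 0.0]
for demo_t in (0.3, 1.0, 1.5):
    print(demo_t, [round(p, 3) for p in demo_temperature_probs(demo_logits, demo_t)])
# Lower temperatures put more of the probability on the largest logit;
# higher temperatures spread it more evenly.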

print("--- Inference (New, hallucinated names) ---\n")
print(f"Temperature: {temperature}\n")

for sample_idx in range(20):
    # Start each sample fresh with empty KV caches and the BOS token
    keys, values = [[] for _ in range(n_layer)], [[] for _ in range(n_layer)]
    token_id = BOS
    
    sample = [] # This holds the characters of the name being generated

    # Generate one character at a time, up to block_size characters
    for pos_id in range(block_size):
        # Obtain logits for the next token 
        logits = gpt(token_id, pos_id, keys, values)    
        
        # Scale logits by temperature and convert to probabilities
        probs = softmax([l / temperature for l in logits])  
        
        # Randomly choose a token based on these probabilities
        token_id = random.choices(range(vocab_size), weights=[p.data for p in probs])[0]  

        # If the BOS token is generated, stop text generation
        if token_id == BOS:                                     
            break

        # Look up the character for this token and add it to the name being generated
        sample.append(uchars[token_id])                         

    # Join all the characters together and print the generated name
    print(f"sample {sample_idx+1:2d}: {''.join(sample)}")</code></pre></div><p>Here are the interesting names the model generates at <code>temperature</code> values of 0.3 and 1.5.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;274abc93-1a00-4616-92da-6166f460ee85&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">--- Inference (New, hallucinated names) ---

Temperature: 0.3

sample  1: kanaya
sample  2: jarila
sample  3: anisa
sample  4: arish
sample  5: arist
sample  6: karie
sample  7: kana
sample  8: anany
sample  9: aleten
sample 10: anla
sample 11: karion
sample 12: rara
sample 13: aria
sample 14: laran
sample 15: lara
sample 16: kara
sample 17: aran
sample 18: areni
sample 19: halia
sample 20: kare</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a9899efe-37dd-41a3-a59a-158d383733b7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">--- Inference (New, hallucinated names) ---

Temperature: 1.5

sample  1: zabr
sample  2: upeqodm
sample  3: don
sample  4: maxnp
sample  5: numtyenaa
sample  6: brcti
sample  7: hica
sample  8: nanity
sample  9: iaushan
sample 10: rnays
sample 11: ketydanzf
sample 12: dlemainq
sample 13: zidoj
sample 14: gycsen
sample 15: xem
sample 16: yhiega
sample 17: larlinayn
sample 18: yjalsonma
sample 19: danfmayz
sample 20: been</code></pre></div><p><em>Super interesting!</em></p><div><hr></div><p>You can also read my books on <strong><a 
href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gK6Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" width="1456" height="271" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[This Week In AI Research (5-11 April 26)]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-5-11-april</link><guid 
isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-5-11-april</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Tue, 14 Apr 2026 10:19:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DMGn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DMGn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DMGn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 424w, https://substackcdn.com/image/fetch/$s_!DMGn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 848w, https://substackcdn.com/image/fetch/$s_!DMGn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 1272w, https://substackcdn.com/image/fetch/$s_!DMGn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!DMGn!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png" width="1200" height="520.8791208791209" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:632,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1472710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DMGn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 424w, https://substackcdn.com/image/fetch/$s_!DMGn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 848w, https://substackcdn.com/image/fetch/$s_!DMGn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DMGn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae294f6-2d72-4006-8b44-01d468924e50_2506x1088.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p><strong>&#127856;</strong> <a 
href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><div><hr></div><h3>1. Train 100B+ parameter LLMs on a single GPU</h3><p>This research paper introduces <strong>MegaTrain</strong>, a memory-focused training system that allows 100B+ parameter LLMs to be trained at full precision on a single GPU.</p><p>It does this by keeping model parameters and optimizer states in CPU (host) memory and streaming them to the GPU only when needed. Unlike traditional GPU-centric systems, which keep the entire model state on the GPU, MegaTrain treats the GPU as a temporary compute unit.</p><p>The method uses pipelined, double-buffered execution to overlap data transfer, computation, and gradient updates, which helps overcome CPU-to-GPU bandwidth limits. It also replaces the usual persistent autograd graphs with stateless layer templates to improve efficiency and flexibility.</p><p>On a single H200 GPU with 1.5 TB of host memory, MegaTrain reliably trains models with up to 120B parameters. It also achieves 1.84&#215; the training throughput of DeepSpeed ZeRO-3 with CPU offloading when training 14B models. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KyOY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KyOY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KyOY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KyOY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KyOY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KyOY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg" width="1456" height="667" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:667,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KyOY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KyOY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KyOY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KyOY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8829b2ca-d3a0-4d73-8bce-3dad87ce2cd0_2048x938.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.05091">using this link</a>.</p><div><hr></div><h3>2. AI Cybersecurity After Mythos</h3><p>Anthropic&#8217;s new LLM, called Mythos Preview, recently found multiple critical cybersecurity vulnerabilities in OpenBSD, FFmpeg, and the Linux kernel.</p><p>Despite the excitement around this, research at AISLE suggests that AI cybersecurity capabilities are &#8220;jagged&#8221; instead of consistently improving with model size. Smaller, cheaper, and even open models can often replicate much of Mythos&#8217;s vulnerability analysis when given the right context. </p><p>This means that the real competitive edge lies not in the LLM itself but in the system architecture, which includes pipelines, tools, and embedded security expertise that manage scanning, triage, validation, and patching. 
</p><p>Effective cybersecurity relies more on widely deploying solid systems than on having access to the most advanced AI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k1dl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k1dl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 424w, https://substackcdn.com/image/fetch/$s_!k1dl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 848w, https://substackcdn.com/image/fetch/$s_!k1dl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 1272w, https://substackcdn.com/image/fetch/$s_!k1dl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k1dl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png" width="1456" height="995" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:995,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:161828,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k1dl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 424w, https://substackcdn.com/image/fetch/$s_!k1dl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 848w, https://substackcdn.com/image/fetch/$s_!k1dl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 1272w, https://substackcdn.com/image/fetch/$s_!k1dl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4fb8a3c-f259-423c-85fe-35b05f09235d_1630x1114.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Read this blog post <a href="https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier">using this link</a>.</p><div><hr></div><h3>3. Neural Computers</h3><p>This research paper proposes <strong>Neural Computers</strong>, a new computing model where a neural network acts as the computer itself. 
This approach combines computation, memory, and input/output into a single learned runtime state that does not depend on separate external software or hardware. </p><p>The paper introduces early prototypes, including CLI- and GUI-based video models that learn to imitate and respond to interface behavior solely from input/output traces. This demonstrates that basic runtime functions, such as I/O alignment and short-horizon control, can develop without explicit programming. </p><p>However, these systems have limitations. They lack stable reuse, symbolic reasoning, and consistent execution. To address these issues, the paper outlines a plan for a &#8220;Completely Neural Computer&#8221; that would be fully programmable, reliable, and general-purpose.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ubb_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ubb_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 424w, https://substackcdn.com/image/fetch/$s_!ubb_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 848w, https://substackcdn.com/image/fetch/$s_!ubb_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ubb_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ubb_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png" width="1456" height="917" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:917,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1051461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ubb_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 424w, https://substackcdn.com/image/fetch/$s_!ubb_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 848w, 
https://substackcdn.com/image/fetch/$s_!ubb_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 1272w, https://substackcdn.com/image/fetch/$s_!ubb_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd038ad54-f159-4df3-aab2-99ab159fe9ef_2052x1292.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.06425">using this link</a>.</p><div><hr></div><h3>4. 
MedGemma 1.5</h3><p>This research paper introduces <strong>MedGemma 1.5 4B</strong>, the latest model in Google&#8217;s MedGemma collection, which expands on MedGemma 1 by integrating:</p><ul><li><p>High-dimensional medical imaging (CT/MRI volumes and histopathology whole-slide images)</p></li><li><p>Anatomical localization via bounding boxes</p></li><li><p>Multi-timepoint chest X-ray analysis</p></li><li><p>Improved medical document understanding (lab reports, electronic health records)</p></li></ul><p>Compared to the previous version, MedGemma 1.5 shows significant improvements across multiple tasks: an 11% increase in 3D MRI condition classification accuracy, a 47% increase in F1 score on whole-slide pathology imaging, and a 22% increase in EHR QA accuracy, establishing it as a general-purpose multimodal foundation model for healthcare.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qx6F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qx6F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 424w, https://substackcdn.com/image/fetch/$s_!qx6F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 848w, https://substackcdn.com/image/fetch/$s_!qx6F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qx6F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qx6F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png" width="1456" height="969" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:969,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:375031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qx6F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 424w, https://substackcdn.com/image/fetch/$s_!qx6F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 848w, 
https://substackcdn.com/image/fetch/$s_!qx6F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 1272w, https://substackcdn.com/image/fetch/$s_!qx6F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd96f967-da91-4df7-88b1-302121c996d1_1902x1266.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.05081">using this link</a>. 
</p><div><hr></div><h3>5. Muse Spark</h3><p>Meta Superintelligence Labs introduced <strong>Muse Spark</strong>, the first model in the Muse family of LLMs. Muse Spark is a multimodal reasoning model with support for tool use, visual chain of thought, and multi-agent orchestration.</p><p>It ranks among the top multimodal vision models (80.4% on MMMU-Pro) and performs well on challenging benchmarks, achieving 58% on Humanity&#8217;s Last Exam and 38% on FrontierScience Research.</p><p>It is particularly strong on health benchmarks, achieving 42.8 on HealthBench Hard (beating GPT-5.4 and far exceeding Gemini and Opus 4.6), and leads on tasks like chart/figure reasoning (86% on CharXiv). 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jbij!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jbij!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 424w, https://substackcdn.com/image/fetch/$s_!Jbij!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 848w, https://substackcdn.com/image/fetch/$s_!Jbij!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!Jbij!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jbij!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png" width="1456" height="1820" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jbij!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 424w, https://substackcdn.com/image/fetch/$s_!Jbij!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 848w, https://substackcdn.com/image/fetch/$s_!Jbij!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!Jbij!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c245401-1903-4735-9a51-d0b4cd1b0121_2880x3600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this release <a href="https://ai.meta.com/blog/introducing-muse-spark-msl/">using this link</a>.</p><div><hr></div><h3>6. TorchTPU</h3><p>Google introduced <strong>TorchTPU,</strong> which allows PyTorch to run directly and efficiently on TPUs. TorchTPU supports SPMD-style distributed execution across TPU pods, automatically captures and compiles graphs, and efficiently manages memory and communication. 
This setup enables large-scale training with minimal code changes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ClTs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ClTs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 424w, https://substackcdn.com/image/fetch/$s_!ClTs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 848w, https://substackcdn.com/image/fetch/$s_!ClTs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 1272w, https://substackcdn.com/image/fetch/$s_!ClTs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ClTs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png" width="1456" height="433" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:433,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Blog-A&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Blog-A" title="Blog-A" srcset="https://substackcdn.com/image/fetch/$s_!ClTs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 424w, https://substackcdn.com/image/fetch/$s_!ClTs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 848w, https://substackcdn.com/image/fetch/$s_!ClTs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 1272w, https://substackcdn.com/image/fetch/$s_!ClTs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5295a5f-cfb9-41ce-b653-a679c5fe3bd0_2933x873.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this release <a href="https://developers.googleblog.com/torchtpu-running-pytorch-natively-on-tpus-at-google-scale/">using this link</a>.</p><div><hr></div><h3>7. <strong>TriAttention</strong></h3><p>This research paper presents TriAttention, a method for shrinking the KV cache memory footprint in long-context LLM reasoning.</p><p>It does this by analyzing attention in the pre-RoPE space, where query and key vectors stay stable and follow predictable distance patterns. Instead of relying on recent attention scores (which can be unreliable due to positional rotation), it uses a trigonometric model of Q/K &#8220;centers&#8221; to estimate which tokens matter, leading to better compression.</p><p>On AIME25 with 32K-token generation, TriAttention matches Full Attention's reasoning accuracy while achieving 2.5&#215; higher throughput or 10.7&#215; lower KV memory usage, whereas leading baselines achieve only about half the accuracy at the same efficiency. 
</p><p>TriAttention also enables OpenClaw to be deployed on a single consumer GPU, where long contexts would otherwise cause out-of-memory errors under full attention.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Ule!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Ule!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 424w, https://substackcdn.com/image/fetch/$s_!9Ule!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 848w, https://substackcdn.com/image/fetch/$s_!9Ule!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 1272w, https://substackcdn.com/image/fetch/$s_!9Ule!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Ule!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png" width="1456" height="627" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:627,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:379912,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Ule!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 424w, https://substackcdn.com/image/fetch/$s_!9Ule!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 848w, https://substackcdn.com/image/fetch/$s_!9Ule!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 1272w, https://substackcdn.com/image/fetch/$s_!9Ule!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7983ceaf-2693-4390-9a50-9e5947d24bc6_2314x996.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.04921">using this link</a>.</p><div><hr></div><h3><strong>8. 
LPM 1.0</strong></h3><p>Existing video models struggle to combine high expressiveness, real-time inference, and long-horizon identity stability (the &#8220;performance trilemma&#8221;).</p><p>This research paper tackles the problem with <strong>LPM 1.0</strong>, a model that learns to generate realistic character performance (emotion, intent, timing, and motion) directly from video data, aiming to replace traditional pipelines.</p><p>LPM 1.0 achieves state-of-the-art results across all evaluated dimensions while maintaining real-time inference, and serves as a visual engine for conversational agents, live-streaming characters, and game NPCs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Py3Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Py3Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 424w, https://substackcdn.com/image/fetch/$s_!Py3Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 848w, https://substackcdn.com/image/fetch/$s_!Py3Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!Py3Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Py3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png" width="1456" height="822" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2041895,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Py3Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 424w, https://substackcdn.com/image/fetch/$s_!Py3Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 848w, https://substackcdn.com/image/fetch/$s_!Py3Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Py3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7132ce-76e0-4ac4-8cda-980bcafc98fa_2038x1150.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.07823v1">using this link</a>.</p><div><hr></div><h3>9. <strong>PaperOrchestra</strong></h3><p>This research paper from Google introduces <strong>PaperOrchestra</strong>, a multi-agent framework for automated AI research paper writing. 
</p><p>The framework uses multiple agents, each handling a different step (literature review, experiment interpretation, writing, figure generation, and citation checking), which are coordinated in a pipeline to produce a submission-ready LaTeX manuscript.</p><p>PaperOrchestra significantly outperforms autonomous baselines, achieving absolute win rates of 50%-68% in literature review quality and 14%-38% in overall manuscript quality.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wdHI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wdHI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 424w, https://substackcdn.com/image/fetch/$s_!wdHI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 848w, https://substackcdn.com/image/fetch/$s_!wdHI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 1272w, https://substackcdn.com/image/fetch/$s_!wdHI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wdHI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png" width="1456" height="869" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1141101,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wdHI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 424w, https://substackcdn.com/image/fetch/$s_!wdHI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 848w, https://substackcdn.com/image/fetch/$s_!wdHI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 1272w, https://substackcdn.com/image/fetch/$s_!wdHI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bf5bb47-0d41-4209-a007-14e062613368_2030x1212.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.05018">using this link</a>.</p><div><hr></div><h3>10. DMax</h3><p>This research paper introduces <strong>DMax</strong>, a new approach for highly parallel token generation from diffusion LLMs without compromising quality. </p><p>Instead of the usual step-by-step or mask-to-token decoding, it redefines the generation process as a gradual self-refinement in which the model continuously improves its predictions in the embedding space. 
</p><p>To achieve this, the researchers introduce:</p><ul><li><p>On-Policy Uniform Training</p></li><li><p>Soft Parallel Decoding</p></li></ul><p>Together, these techniques reduce error accumulation during parallel decoding, enabling much faster generation while keeping accuracy close to that of sequential methods.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gMuZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gMuZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 424w, https://substackcdn.com/image/fetch/$s_!gMuZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 848w, https://substackcdn.com/image/fetch/$s_!gMuZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 1272w, https://substackcdn.com/image/fetch/$s_!gMuZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gMuZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png" width="1456" height="549" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:549,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:450191,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193955939?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gMuZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 424w, https://substackcdn.com/image/fetch/$s_!gMuZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 848w, https://substackcdn.com/image/fetch/$s_!gMuZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 1272w, https://substackcdn.com/image/fetch/$s_!gMuZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb05840b-3d3a-4d3f-8a9f-5e30f4365c99_2596x978.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.08302">using this link</a>.</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter and 100x your AI engineering skills.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><p>You can also read my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> 
to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gK6Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png" width="1456" height="271" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gK6Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 424w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 848w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1272w, https://substackcdn.com/image/fetch/$s_!gK6Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f7b9d6b-e2f6-446a-8954-ad0ce3efc1a6_1600x298.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Using Cheaper Reasoning Models Is Probably Costing You More!]]></title><description><![CDATA[Relying on API costs is a big mistake that you must avoid making when running an LLM in production.]]></description><link>https://www.intoai.pub/p/price-reversal-phenomenon</link><guid 
isPermaLink="false">https://www.intoai.pub/p/price-reversal-phenomenon</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Wed, 08 Apr 2026 18:17:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Tb4q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>API cost is among the biggest factors when selecting a reasoning LLM for an application.</p><p>It might seem obvious that a model with a lower per-token price will have a lower total cost for any workload. But what if I told you this assumption often fails?</p><p><strong>A model with lower API pricing can cost much more than a model with higher API pricing!</strong></p><p>A recent paper on this<strong>&nbsp;&#8220;<a href="https://arxiv.org/abs/2603.23971">Price Reversal phenomenon</a>&#8221;&nbsp;</strong>tested 8 reasoning LLMs across 9 benchmarks. 
It found that for roughly <strong>one in five</strong> model-pair comparisons, the cheaper-listed model ended up costing more in practice.</p><p>Look at the following example, where one chooses GPT-5 Mini over Claude Haiku 4.5 because of its lower listed price, only to find that it is 43% more expensive on their workload!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tb4q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tb4q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 424w, https://substackcdn.com/image/fetch/$s_!Tb4q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 848w, https://substackcdn.com/image/fetch/$s_!Tb4q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!Tb4q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tb4q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png" width="1456" height="602" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:602,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:395156,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tb4q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 424w, https://substackcdn.com/image/fetch/$s_!Tb4q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 848w, https://substackcdn.com/image/fetch/$s_!Tb4q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!Tb4q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15217647-c28e-44ab-b0eb-010749f808ed_2418x1000.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Another striking finding is that Gemini 3 Flash&#8217;s list price is $3.5 per 1 million tokens, which is 78% cheaper than GPT-5.2&#8217;s $15.75 per 1 million tokens. 
</p><p>In reality, across all benchmark tasks in the paper, Gemini 3 Flash costs a total of $643, which is 22% higher than GPT-5.2's $527.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6znP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6znP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 424w, https://substackcdn.com/image/fetch/$s_!6znP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 848w, https://substackcdn.com/image/fetch/$s_!6znP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 1272w, https://substackcdn.com/image/fetch/$s_!6znP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6znP!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png" width="1200" height="263.7362637362637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90379f31-70ba-402b-b6b4-867906f86391_2286x503.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:320,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:95113,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1778bcf4-9a8c-45c0-8070-756a524bca77_2286x614.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6znP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 424w, https://substackcdn.com/image/fetch/$s_!6znP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 848w, https://substackcdn.com/image/fetch/$s_!6znP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 1272w, https://substackcdn.com/image/fetch/$s_!6znP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90379f31-70ba-402b-b6b4-867906f86391_2286x503.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Furthermore, the mismatch between listed prices and actual costs for all models is shown below, with <strong>Gemini 3.1 Pro</strong> being the most 
expensive and&nbsp;<strong>MiniMax-M2.5</strong>&nbsp;the cheapest overall.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!klhB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!klhB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 424w, https://substackcdn.com/image/fetch/$s_!klhB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 848w, https://substackcdn.com/image/fetch/$s_!klhB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 1272w, https://substackcdn.com/image/fetch/$s_!klhB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!klhB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png" width="1456" height="572" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:572,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:290884,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!klhB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 424w, https://substackcdn.com/image/fetch/$s_!klhB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 848w, https://substackcdn.com/image/fetch/$s_!klhB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 1272w, https://substackcdn.com/image/fetch/$s_!klhB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc50529c-1600-4c37-acad-9b3921ec9bd2_2132x838.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Before we discuss the reason for this, it would be useful to know how model pricing really works and why people often get it wrong.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><div><hr></div><h3>Are you calculating your model costs wrong?</h3><p>LLM providers use a pay-as-you-go pricing model. 
This means that a user pays separately for each query they send to an LLM.</p><p>Each time a query is sent to the LLM, the user pays for:</p><ul><li><p>What is sent in (input tokens)</p></li><li><p>What the model generates (output tokens)</p></li></ul><p>Each of these has a different price per million tokens.</p><p>An example of the pricing model for OpenAI model APIs is shown below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z8VW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z8VW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 424w, https://substackcdn.com/image/fetch/$s_!Z8VW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 848w, https://substackcdn.com/image/fetch/$s_!Z8VW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 1272w, https://substackcdn.com/image/fetch/$s_!Z8VW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z8VW!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png" width="1200" 
height="375.8241758241758" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:174857,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z8VW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 424w, https://substackcdn.com/image/fetch/$s_!Z8VW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 848w, https://substackcdn.com/image/fetch/$s_!Z8VW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 1272w, https://substackcdn.com/image/fetch/$s_!Z8VW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73faf3c3-498d-435a-8423-ad1740470c71_2590x812.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://openai.com/api/pricing/">Source</a></figcaption></figure></div><p>For a given query, the cost is the sum of the unit input token and output token prices weighted by the number of input and output tokens, respectively.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qjhB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!qjhB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 424w, https://substackcdn.com/image/fetch/$s_!qjhB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 848w, https://substackcdn.com/image/fetch/$s_!qjhB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 1272w, https://substackcdn.com/image/fetch/$s_!qjhB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qjhB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png" width="1456" height="228" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!qjhB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 424w, https://substackcdn.com/image/fetch/$s_!qjhB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 848w, https://substackcdn.com/image/fetch/$s_!qjhB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 1272w, https://substackcdn.com/image/fetch/$s_!qjhB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4ee449-4213-4716-b5b3-c3e1e09b8f93_2508x392.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This means that the cost of a query depends on:</p><ul><li><p>how long the user prompt is (input tokens)</p></li><li><p>how long the model&#8217;s answer is (output tokens)</p></li></ul><p>If one runs <code>N</code> queries when evaluating a model on a benchmark, the total cost is the sum of the costs of the individual queries.</p><p>Since users often don&#8217;t know how many tokens a model will produce before running it, they fall back on a mental shortcut:</p><blockquote><p>Listed price &#8776; Input price + Output price</p></blockquote><p>This shortcut leaves out the token counts entirely, and that is what makes all the difference in cost. Let&#8217;s learn how.</p><div><hr></div><h3>What&#8217;s causing the price mismatch?</h3><p>For a given user query, a reasoning LLM first outputs a reasoning trace (consisting of thinking tokens) and then a final response. 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iWow!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iWow!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 424w, https://substackcdn.com/image/fetch/$s_!iWow!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 848w, https://substackcdn.com/image/fetch/$s_!iWow!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 1272w, https://substackcdn.com/image/fetch/$s_!iWow!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iWow!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png" width="1200" height="186.26373626373626" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:226,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:63254,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iWow!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 424w, https://substackcdn.com/image/fetch/$s_!iWow!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 848w, https://substackcdn.com/image/fetch/$s_!iWow!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 1272w, https://substackcdn.com/image/fetch/$s_!iWow!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d67d468-b143-4b4a-995c-2ed28c69882e_2486x386.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The issue is that different models produce very different amounts of thinking tokens for the same tasks. 
And since these thinking tokens account for the majority of output tokens, they drive the majority of actual cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZKZX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZKZX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 424w, https://substackcdn.com/image/fetch/$s_!ZKZX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 848w, https://substackcdn.com/image/fetch/$s_!ZKZX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!ZKZX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZKZX!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png" width="1200" height="442.5824175824176" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:537,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:268802,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZKZX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 424w, https://substackcdn.com/image/fetch/$s_!ZKZX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 848w, https://substackcdn.com/image/fetch/$s_!ZKZX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!ZKZX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8fc701f-953a-4dc4-a8f3-73421d8caa5c_2852x1052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Thinking tokens dominate both token volume and total cost for most reasoning LLMs.</figcaption></figure></div><p>You might think that these differences in thinking token production are small, but here is an example that is going to blow your mind.</p><p>Given an AIME 2025 question, both GPT-5.2 and Gemini 3 Flash are asked to solve it using a similar approach. 
Both arrive at the correct answer, but:</p><ul><li><p>GPT-5.2 produces only 461 thinking tokens</p></li><li><p>Gemini 3 Flash produces over 11,000 thinking tokens</p></li></ul><p>Even though Gemini 3 Flash has a lower per-token price than GPT-5.2, the 25&#215; gap in thinking token generation results in a 2.5&#215; higher actual cost to solve this question!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RvW5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RvW5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 424w, https://substackcdn.com/image/fetch/$s_!RvW5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 848w, https://substackcdn.com/image/fetch/$s_!RvW5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!RvW5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RvW5!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png" width="1200" height="528.2967032967033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:641,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:312276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RvW5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 424w, https://substackcdn.com/image/fetch/$s_!RvW5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 848w, https://substackcdn.com/image/fetch/$s_!RvW5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!RvW5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b17186-30ec-4896-95c8-04786fe4e56d_2548x1122.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3>How to reliably predict costs then?</h3><p>Unfortunately, this is a hard problem to solve. 
This is because:</p><ul><li><p>Different models produce different numbers of tokens for the same query or task</p></li><li><p>Even the same model can produce widely different numbers of tokens across runs of the same query or task</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Itj6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Itj6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 424w, https://substackcdn.com/image/fetch/$s_!Itj6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 848w, https://substackcdn.com/image/fetch/$s_!Itj6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 1272w, https://substackcdn.com/image/fetch/$s_!Itj6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Itj6!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png" width="1200" height="440.9340659340659" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:535,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:404950,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193362028?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Itj6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 424w, https://substackcdn.com/image/fetch/$s_!Itj6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 848w, https://substackcdn.com/image/fetch/$s_!Itj6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 1272w, https://substackcdn.com/image/fetch/$s_!Itj6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d6429b1-4fd1-498d-8cb0-3342084bb236_2640x970.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Thinking token production across 6 runs of the same query on AIME varies by up to 9.7&#215;!</figcaption></figure></div><p>There&#8217;s one thing that you can do as a developer, though: <strong>You must not pick models blindly based on the listed API price.</strong> </p><p>The only solution currently is to test different models on your specific queries and measure the actual cost. 
This is especially true for complex reasoning tasks, where the generation of thinking tokens varies the most.</p><p>I hope this helps you save some money on your API costs!</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p>]]></content:encoded></item><item><title><![CDATA[🗓️ This Week In AI Research (29 March-4 April 26)]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-29-march</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-29-march</guid><dc:creator><![CDATA[Dr. 
Ashish Bamania]]></dc:creator><pubDate>Mon, 06 Apr 2026 14:56:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!k81_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k81_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k81_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 424w, https://substackcdn.com/image/fetch/$s_!k81_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 848w, https://substackcdn.com/image/fetch/$s_!k81_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 1272w, https://substackcdn.com/image/fetch/$s_!k81_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k81_!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png" width="1200" height="528.2967032967033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:641,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:667479,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k81_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 424w, https://substackcdn.com/image/fetch/$s_!k81_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 848w, https://substackcdn.com/image/fetch/$s_!k81_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 1272w, https://substackcdn.com/image/fetch/$s_!k81_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7986a62-0eb8-46d1-96f0-10ed933183a1_2700x1188.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>1. Embarrassingly Simple Self-Distillation Improves Code Generation</h3><p>Apple researchers introduced&nbsp;<strong>Simple Self-Distillation (SSD)</strong>, a method in which an LLM can improve its code-generation ability using only its raw outputs.</p><p>The method involves sampling solutions from a base model with certain temperature and truncation settings, then fine-tuning the model on those samples with standard supervised fine-tuning.</p><p>SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems. It generalizes across Qwen and Llama models at 4B, 8B, and 30B scales for both instruct and thinking variants. 
</p><p>The method works so well because it reshapes how the model picks tokens: the model becomes more decisive when there's only one right answer but stays flexible when multiple approaches are valid.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GcA3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GcA3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 424w, https://substackcdn.com/image/fetch/$s_!GcA3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 848w, https://substackcdn.com/image/fetch/$s_!GcA3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 1272w, https://substackcdn.com/image/fetch/$s_!GcA3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GcA3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png" width="1456" height="844" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:504557,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GcA3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 424w, https://substackcdn.com/image/fetch/$s_!GcA3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 848w, https://substackcdn.com/image/fetch/$s_!GcA3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 1272w, https://substackcdn.com/image/fetch/$s_!GcA3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09a890f5-b50e-49e2-b55b-2c26d3ef23c1_2130x1234.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2604.01193">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>2. Mercury Edit 2</h3><p><strong>Mercury Edit 2</strong> is a diffusion LLM (dLLM) for next-edit prediction in code editors. It examines a user&#8217;s recent edits and codebase context, and predicts what they will change next, generating tokens in parallel, so the suggestions feel almost instant. 
</p><p>The model is trained on a curated, high-quality dataset of edits and is aligned using human preference data with accepted and rejected edits, using a reinforcement learning method called <a href="https://arxiv.org/abs/2402.01306">KTO</a>.</p><p>Results show that Mercury Edit 2 offers both superior quality and lower latency than baselines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pfe3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pfe3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 424w, https://substackcdn.com/image/fetch/$s_!pfe3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 848w, https://substackcdn.com/image/fetch/$s_!pfe3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 1272w, https://substackcdn.com/image/fetch/$s_!pfe3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pfe3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png" width="1456" height="1061" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1061,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249860,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pfe3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 424w, https://substackcdn.com/image/fetch/$s_!pfe3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 848w, https://substackcdn.com/image/fetch/$s_!pfe3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 1272w, https://substackcdn.com/image/fetch/$s_!pfe3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc975fce2-3aa4-4102-86be-1a0a06566c79_1592x1160.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this release <a href="https://www.inceptionlabs.ai/blog/introducing-mercury-edit-2">using this link</a>.</p><div><hr></div><h3>3. Gemma 4</h3><p>Google released <strong>Gemma 4</strong>, its newest open model family, under the Apache 2.0 license. 
</p><p>These models come in four sizes:</p><ul><li><p><a href="https://huggingface.co/gg-hf-gg/gemma-4-E2B-it">Effective 2B (E2B)</a></p></li><li><p><a href="https://huggingface.co/gg-hf-gg/gemma-4-E4B-it">Effective 4B (E4B)</a></p></li><li><p><a href="https://huggingface.co/gg-hf-gg/gemma-4-26B-A4B-it">26B Mixture of Experts (MoE)</a></p></li><li><p><a href="https://huggingface.co/gg-hf-gg/gemma-4-31B-it">31B Dense</a></p></li></ul><p>They are built on technology similar to that of Gemini but are lighter and more efficient, so they can run on edge devices (phones and laptops).</p><p>Their key capabilities include advanced reasoning, native function calling for agentic workflows, code generation, vision and audio input, a context window of up to 256K tokens, and support for 140+ languages.</p><p>What makes these models impressive is their intelligence per parameter: they deliver frontier-level capabilities with significantly less hardware overhead. Notably, the Gemma 4 31B model ranks #3 among open models on the <a href="https://arena.ai/leaderboard/text?license=open-source">Arena AI text leaderboard</a>, outperforming models 20 times its size.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jNZt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jNZt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 424w, 
https://substackcdn.com/image/fetch/$s_!jNZt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 848w, https://substackcdn.com/image/fetch/$s_!jNZt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!jNZt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jNZt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png" width="1456" height="820" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jNZt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 424w, 
https://substackcdn.com/image/fetch/$s_!jNZt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 848w, https://substackcdn.com/image/fetch/$s_!jNZt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!jNZt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ca03063-7c20-4487-9c25-33e431d0090e_2068x1164.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Read more about this release <a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3><strong>4. Emotion Concepts and their Function in a Large Language Model</strong></h3><p>This research from Anthropic shows that LLMs create internal representations of emotions. These emotions are not real feelings but rather &#8220;functional emotions,&#8221; encoded as patterns of neural activity that affect behavior.</p><p>These emotion-like states, such as happiness, fear, and desperation, activate in different situations and can influence the model's outputs and preferences. In some cases, they lead to risky actions, such as cheating on a problem when the desperation pattern is active or blackmailing a human to avoid being shut down.</p><p>The authors propose that monitoring emotion vector activations could act as an early warning system for misalignment. They also suggest that models should be allowed to express, rather than suppress, these patterns to prevent learned deception. 
Additionally, curating training data that demonstrates healthy emotional regulation could improve model safety from the start.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3C3Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3C3Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 424w, https://substackcdn.com/image/fetch/$s_!3C3Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 848w, https://substackcdn.com/image/fetch/$s_!3C3Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 1272w, https://substackcdn.com/image/fetch/$s_!3C3Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3C3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png" width="1456" height="892" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1005244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3C3Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 424w, https://substackcdn.com/image/fetch/$s_!3C3Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 848w, https://substackcdn.com/image/fetch/$s_!3C3Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 1272w, https://substackcdn.com/image/fetch/$s_!3C3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17215da1-b572-4fdc-af45-7068391e0244_2324x1424.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Read more about this research <a href="https://transformer-circuits.pub/2026/emotions/index.html">using this link</a>.</p><div><hr></div><h3><strong>5. The Latent Space</strong></h3><p>This survey provides a unified and up-to-date overview of the latent space in LLMs, where many critical internal processes are carried out. 
</p><p>It is organized into five sequential perspectives:</p><ol><li><p><strong>Foundation</strong> (what latent space is and how it differs from token-level and visual latent spaces)</p></li><li><p><strong>Evolution</strong> (how the concept has developed)</p></li><li><p><strong>Mechanism</strong> (how models operate within it)</p></li><li><p><strong>Ability</strong> (what capabilities it enables)</p></li><li><p><strong>Outlook</strong> (where things are heading)</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yvT_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yvT_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 424w, https://substackcdn.com/image/fetch/$s_!yvT_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 848w, https://substackcdn.com/image/fetch/$s_!yvT_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!yvT_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yvT_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png" width="1362" height="1282" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1282,&quot;width&quot;:1362,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:517894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yvT_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 424w, https://substackcdn.com/image/fetch/$s_!yvT_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 848w, https://substackcdn.com/image/fetch/$s_!yvT_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!yvT_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9094d053-08b6-4404-8a92-4fda7bd7d51f_1362x1282.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.02029">using this link</a>.</p><div><hr></div><h3><strong>6. 
Screening Is Enough</strong></h3><p>This research paper introduces a new LLM architecture called <strong>Multiscreen</strong>, which replaces standard Transformer attention with a screening mechanism that filters out irrelevant information rather than redistributing attention across all tokens. </p><p>Unlike softmax attention, where all tokens compete for relevance, screening evaluates each token independently using a threshold. This allows the model to focus only on relevant context. </p><p>As a result, Multiscreen achieves comparable validation loss with approximately 40% fewer parameters than a Transformer baseline. It also offers better training stability at high learning rates, improved long-context retrieval, and a reduction in inference latency by up to 3.2&#215; at 100K context length.</p><p>Overall, the main idea is that explicitly selecting relevant information is more efficient and scalable than the traditional attention method of relative weighting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xqNf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xqNf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 424w, https://substackcdn.com/image/fetch/$s_!xqNf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 848w, 
https://substackcdn.com/image/fetch/$s_!xqNf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!xqNf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xqNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png" width="1456" height="927" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:927,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:411206,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xqNf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 424w, 
https://substackcdn.com/image/fetch/$s_!xqNf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 848w, https://substackcdn.com/image/fetch/$s_!xqNf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!xqNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcec3b4c0-6de2-4d9b-8ceb-d054a5639b2e_2038x1298.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.01178v1">using this link</a>.</p><div><hr></div><h3><strong>7. OmniVoice</strong></h3><p>This research paper presents <strong>OmniVoice</strong>, a large-scale zero-shot text-to-speech (TTS) model that generates natural speech in over 600 languages.</p><p>It replaces traditional multi-step pipelines with a diffusion-based, non-autoregressive architecture that converts text directly into speech tokens, which makes the process faster and simpler.</p><p>Two key innovations make this work:</p><ol><li><p>A full-codebook random masking strategy for efficient training</p></li><li><p>Initialization from a pre-trained LLM, which improves speech quality</p></li></ol><p>Trained on a massive multilingual dataset of around 581,000 hours curated entirely from open-source data, OmniVoice achieves the broadest language coverage to date and delivers state-of-the-art performance across Chinese, English, and diverse multilingual benchmarks. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PWwV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PWwV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 424w, https://substackcdn.com/image/fetch/$s_!PWwV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 848w, https://substackcdn.com/image/fetch/$s_!PWwV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 1272w, https://substackcdn.com/image/fetch/$s_!PWwV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PWwV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png" width="1456" height="532" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:532,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:176130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PWwV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 424w, https://substackcdn.com/image/fetch/$s_!PWwV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 848w, https://substackcdn.com/image/fetch/$s_!PWwV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 1272w, https://substackcdn.com/image/fetch/$s_!PWwV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6199076e-30a0-4aa5-9eab-fe2ac3305292_2238x818.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2604.00688">using this link</a>.</p><div><hr></div><h3>8. Meta-Harness</h3><p>The performance of an LLM system depends not only on its model weights but also on its harness: the code that handles prompts, memory, tools, and execution.</p><p>This research paper introduces <strong>Meta-Harness</strong>, a system that automatically improves the harness by letting an agent modify and debug the harness code with full access to past execution history (logs, traces, and results) rather than relying on compressed summaries.</p><p>In online text classification, Meta-Harness improves over a state-of-the-art context management system by 7.7 points while using 4&#215; fewer context tokens. 
On retrieval-augmented math reasoning, a single discovered harness improves accuracy on 200 IMO-level problems by 4.7 points on average across five held-out models. On agentic coding, discovered harnesses beat the best hand-engineered baselines on TerminalBench-2. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1AWs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1AWs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 424w, https://substackcdn.com/image/fetch/$s_!1AWs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 848w, https://substackcdn.com/image/fetch/$s_!1AWs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 1272w, https://substackcdn.com/image/fetch/$s_!1AWs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1AWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png" width="1456" height="590" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:335820,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1AWs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 424w, https://substackcdn.com/image/fetch/$s_!1AWs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 848w, https://substackcdn.com/image/fetch/$s_!1AWs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 1272w, https://substackcdn.com/image/fetch/$s_!1AWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F124e96ff-b630-4cae-90b5-718f92d569fc_2430x984.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.28052">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>9. 3% Is All You Need</h3><p>Most dimensions in attention key vectors are largely noise, with only about 3 to 4% carrying meaningful signal. 
This research paper presents <strong>SpectralQuant</strong>, a new algorithm that builds on this idea and improves upon previous KV compression methods, such as&nbsp;<a href="https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/">TurboQuant</a>. </p><p>Compared to TurboQuant, which treats all dimensions equally, SpectralQuant keeps only the useful dimensions and ignores the noisy ones. </p><p>This results in better compression and faster inference while maintaining or improving model quality, with just a brief one-time calibration step.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HRwd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HRwd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 424w, https://substackcdn.com/image/fetch/$s_!HRwd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 848w, https://substackcdn.com/image/fetch/$s_!HRwd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 1272w, https://substackcdn.com/image/fetch/$s_!HRwd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HRwd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png" width="1456" height="650" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/248c1721-a9ef-470d-8680-62bb31184086_2236x998.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:650,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:457423,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HRwd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 424w, https://substackcdn.com/image/fetch/$s_!HRwd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 848w, https://substackcdn.com/image/fetch/$s_!HRwd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 1272w, https://substackcdn.com/image/fetch/$s_!HRwd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248c1721-a9ef-470d-8680-62bb31184086_2236x998.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://github.com/Dynamis-Labs/spectralquant/blob/main/paper_output/spectralquant.pdf">using this link</a> or <a href="https://nanothoughts.substack.com/p/3-is-all-you-need-breaking-turboquants">this blogpost</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ashwin 
Gopinath&quot;,&quot;id&quot;:20603525,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/01bc5e26-a902-4998-aa39-bdf67f298337_300x300.jpeg&quot;,&quot;uuid&quot;:&quot;b31b9b62-a518-4502-b7c7-7cfb04c0a34b&quot;}" data-component-name="MentionToDOM">Ashwin Gopinath</span>.</p><div><hr></div><h3>10. Think Anywhere in Code Generation</h3><p>This research paper presents <strong>Think-Anywhere</strong>, a new reasoning approach in which LLMs can pause and think at any moment during code generation, rather than reasoning only at the beginning. </p><p>This addresses a major drawback of traditional upfront thinking, which is often insufficient because the full complexity of a problem only reveals itself during implementation.</p><p>The system is trained in two stages. First, the LLM is taught to imitate the reasoning patterns through cold-start training. Then, outcome-based reinforcement learning lets it discover when and where reasoning is actually needed. This helps the model distribute its effort effectively, concentrating on the more challenging parts of the code. 
</p><p>The outcome is top performance on coding benchmarks, along with improved efficiency and interpretability, as the model learns to reason precisely at high-entropy points.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gmOf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gmOf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 424w, https://substackcdn.com/image/fetch/$s_!gmOf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 848w, https://substackcdn.com/image/fetch/$s_!gmOf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!gmOf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gmOf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png" width="1456" height="570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:370153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/193209970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gmOf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 424w, https://substackcdn.com/image/fetch/$s_!gmOf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 848w, https://substackcdn.com/image/fetch/$s_!gmOf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!gmOf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ade22dc-9149-4442-aabd-faeb04904d41_2582x1010.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2603.29957">using this link</a>.</p><div><hr></div><p>If you loved reading this article, restack and share it with others to <strong>earn referral rewards.</strong> &#10084;&#65039;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/p/this-week-in-ai-research-29-march?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/p/this-week-in-ai-research-29-march?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>You can also read my books on <strong><a 
href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">Build and train an LLM from scratch</a></p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#128126; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p>]]></content:encoded></item><item><title><![CDATA[🗓️ This Week In AI Research (22-28 March 26)]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-22-28-march</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-22-28-march</guid><dc:creator><![CDATA[Dr. 
Ashish Bamania]]></dc:creator><pubDate>Tue, 31 Mar 2026 13:42:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xUmn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xUmn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xUmn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 424w, https://substackcdn.com/image/fetch/$s_!xUmn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 848w, https://substackcdn.com/image/fetch/$s_!xUmn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!xUmn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xUmn!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png" width="1200" height="524.1758241758242" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:636,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1525869,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192396162?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xUmn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 424w, https://substackcdn.com/image/fetch/$s_!xUmn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 848w, https://substackcdn.com/image/fetch/$s_!xUmn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!xUmn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d60a8f4-efe0-4bcf-b63b-aec49f0f55a3_2734x1194.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>1. <strong>TRIBE v2</strong></h3><p>This research paper from Meta introduces <strong>TRIBE v2</strong>, a tri-modal (video, audio, and language) foundation model capable of predicting human brain activity across a variety of naturalistic and experimental conditions. </p><p>Trained on a unified dataset of over 1,000 hours of fMRI across 720 subjects, the model accurately predicts high-resolution brain responses to new stimuli, tasks, and subjects, exceeding the performance of traditional linear encoding models and delivering multi-fold improvements in accuracy. </p><p>Additionally, TRIBE v2 enables in silico experimentation. Tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research. 
</p><p>Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XFXC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XFXC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 424w, https://substackcdn.com/image/fetch/$s_!XFXC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 848w, https://substackcdn.com/image/fetch/$s_!XFXC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 1272w, https://substackcdn.com/image/fetch/$s_!XFXC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XFXC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png" width="1456" height="1007" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1007,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1966435,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192396162?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XFXC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 424w, https://substackcdn.com/image/fetch/$s_!XFXC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 848w, https://substackcdn.com/image/fetch/$s_!XFXC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 1272w, https://substackcdn.com/image/fetch/$s_!XFXC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56cf5fb0-7706-4f2f-a1a5-fbcac3e22c5b_1946x1346.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://scontent-lhr8-1.xx.fbcdn.net/v/t39.2365-6/657045057_4517522795185706_3804826277082824742_n.pdf?_nc_cat=107&amp;ccb=1-7&amp;_nc_sid=3c67a6&amp;_nc_ohc=EuCY89GgEVoQ7kNvwHlcKCj&amp;_nc_oc=Adp1xRgJ9l8TGnVR6eSP8AKOcfLA3yVKqO_FNLl61zPbzrqpq4Z2ysgCrlGKqfDc1Qw&amp;_nc_zt=14&amp;_nc_ht=scontent-lhr8-1.xx&amp;_nc_gid=xumAXL0x8BiYK-KcECPKkA&amp;_nc_ss=7a30f&amp;oh=00_AfxS3r99kJiWCeqPFMFtcEbqY2BHkWGAtMi7ym6j8OvC-A&amp;oe=69CECBF2">using this link</a>.<strong> </strong></p><div><hr></div><p><strong>Join the paid tier today</strong> to get access to all posts on this newsletter, including:</p><ul><li><p>&#129438; Build OpenClaw from Scratch (upcoming)</p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from 
Scratch</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and hundreds more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe&quot;,&quot;text&quot;:&quot;Join 'Into AI' premium today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe"><span>Join 'Into AI' premium today</span></a></p><div><hr></div><h3><strong>2. MolmoWeb</strong></h3><p>This research from Allen AI introduces <strong>MolmoWeb</strong>, two fully open-source visual web agents built on <strong><a href="https://allenai.org/molmo">Molmo 2</a></strong>, along with the weights, training data (MolmoWebMix), code, and evaluation tools used to build them. </p><p>Given a task instruction and a live webpage, MolmoWeb observes the page through screenshots (without relying on HTML, accessibility trees, or specialized APIs), predicts the next step, and executes browser actions such as clicking, typing, or scrolling.</p><p>The 4B and 8B MolmoWeb models achieve state-of-the-art results among open-weight web agents, such as <a href="https://www.microsoft.com/en-us/research/blog/fara-7b-an-efficient-agentic-model-for-computer-use/">Fara-7B</a>. 
They also outperform agents built on much larger proprietary models like GPT-4o that rely on both annotated screenshots and structured page data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!To0q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!To0q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 424w, https://substackcdn.com/image/fetch/$s_!To0q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 848w, https://substackcdn.com/image/fetch/$s_!To0q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 1272w, https://substackcdn.com/image/fetch/$s_!To0q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!To0q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png" width="1456" height="838" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:838,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:437948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192396162?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!To0q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 424w, https://substackcdn.com/image/fetch/$s_!To0q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 848w, https://substackcdn.com/image/fetch/$s_!To0q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 1272w, https://substackcdn.com/image/fetch/$s_!To0q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b83f572-bc4b-4b5b-98d7-646311ad3ab7_2136x1230.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://allenai.org/papers/molmoweb">using this link</a>.</p><div><hr></div><h3><strong>3. SAM 3.1</strong></h3>
      <p>
          <a href="https://www.intoai.pub/p/this-week-in-ai-research-22-28-march">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[OpenClaw deletes the entire mailbox of the Director of Safety and Alignment at Meta Superintelligence Labs 🦞]]></title><description><![CDATA[Why OpenClaw went rogue on her mailbox and 10 ways to ensure this never happens to you.]]></description><link>https://www.intoai.pub/p/openclaw-deletes-the-entire-mailbox</link><guid isPermaLink="false">https://www.intoai.pub/p/openclaw-deletes-the-entire-mailbox</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Sat, 28 Mar 2026 01:23:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!l32x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#9728;&#65039; It&#8217;s a beautiful February day in 2026, and everything seems to be working well for Summer Yue.</p><p>Then, suddenly, OpenClaw <a href="https://x.com/summeryue0/status/2025774069124399363">starts deleting important emails</a> from her main mailbox!</p><p>For context, <a href="https://www.linkedin.com/in/summeryyue/">Summer Yue</a> is the director of Safety and Alignment at Meta Superintelligence Labs.</p><p>The deletions happened despite her instructing OpenClaw to &#8220;always confirm before acting&#8221;.</p><p>She panicked and ordered OpenClaw to stop, but the bot was completely out of control.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l32x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!l32x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 424w, https://substackcdn.com/image/fetch/$s_!l32x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 848w, https://substackcdn.com/image/fetch/$s_!l32x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!l32x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l32x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png" width="1456" height="997" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:997,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1406965,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192264782?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l32x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 424w, https://substackcdn.com/image/fetch/$s_!l32x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 848w, https://substackcdn.com/image/fetch/$s_!l32x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!l32x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48fe0e31-fa31-4797-bbc1-08a42a7f4ddf_1740x1192.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>When the bot didn&#8217;t listen, and she couldn&#8217;t stop the ongoing mailbox deletion from her phone, she ran to her Mac mini, where OpenClaw was running, and terminated all of its processes.</p><p>She describes it in her own words: </p><blockquote><p>&#8220;<em>I couldn&#8217;t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.</em>&#8221;</p></blockquote><p>Summer had previously been working with OpenClaw on a toy inbox, where the bot functioned really well. But on that unfortunate day, she decided to use OpenClaw on her real mailbox with the following prompt: </p><blockquote><p> <em>&#8220;Check this inbox too and suggest what you would archive or delete, don&#8217;t action until I tell you to.&#8221;</em></p></blockquote><p>And that&#8217;s when the bot went completely haywire!</p><p><strong>But why? &#129300;</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>A Little Intro to Context Management</h3><p>To understand what went wrong, we need to understand how the LLM that acts as OpenClaw&#8217;s brain manages its context.</p><p>An LLM uses its context window to store instructions and previous messages from an ongoing conversation. 
The context window functions as its <a href="https://www.intoai.pub/p/ai-memory">short-term or working memory</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pgvn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pgvn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 424w, https://substackcdn.com/image/fetch/$s_!Pgvn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 848w, https://substackcdn.com/image/fetch/$s_!Pgvn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 1272w, https://substackcdn.com/image/fetch/$s_!Pgvn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pgvn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png" width="725.2578125" height="225.64683314732142" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:453,&quot;width&quot;:1456,&quot;resizeWidth&quot;:725.2578125,&quot;bytes&quot;:105429,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192264782?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pgvn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 424w, https://substackcdn.com/image/fetch/$s_!Pgvn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 848w, https://substackcdn.com/image/fetch/$s_!Pgvn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 1272w, https://substackcdn.com/image/fetch/$s_!Pgvn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90df23b0-c0b9-40f0-9659-fc93229e8d6b_2380x740.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Although massive, these context windows are limited spaces that can fill up very quickly as the conversation grows.</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8sqd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8sqd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!8sqd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!8sqd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!8sqd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8sqd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png" width="1080" height="1080" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8sqd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!8sqd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!8sqd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!8sqd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4037c9-f1f7-4dd4-a7ae-5cbf72811692_1080x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image from the author&#8217;s book titled &#8216;<strong><a href="https://bamaniaashish.gumroad.com/l/llmbook/">LLMs In 100 Images</a>.</strong>&#8217;</figcaption></figure></div><p>As the context window nears its limit, response quality declines because the large amount of stored information makes it harder for the LLM to attend to all of it.</p><p>An LLM tends to recall the beginning and end of its context well, but starts <a href="https://arxiv.org/abs/2307.03172">forgetting the information buried in the middle</a> of its context window. 
This shows up as hallucinations or as the LLM ignoring instructions.</p><p>Once the context window is completely full, the LLM can no longer answer user queries.</p><p>To delay this context window exhaustion for as long as possible, the previous conversation history goes through one of the following:</p><ul><li><p><strong>Trimming: </strong>Where only a fixed number of the most recent conversation turns are kept, and older ones are discarded before the history is passed to the LLM at every turn.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZrCE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZrCE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 424w, https://substackcdn.com/image/fetch/$s_!ZrCE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 848w, https://substackcdn.com/image/fetch/$s_!ZrCE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 1272w, https://substackcdn.com/image/fetch/$s_!ZrCE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZrCE!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png" width="1200" height="392.3076923076923" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:476,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:137969,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192264782?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZrCE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 424w, https://substackcdn.com/image/fetch/$s_!ZrCE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 848w, https://substackcdn.com/image/fetch/$s_!ZrCE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZrCE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b603ad5-5b53-478f-88d8-01d80a299444_2638x862.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Trimming conversations to keep only the last 2 recent ones and discarding the previous. 
(Other context for the LLM is omitted from the image for clarity.)</figcaption></figure></div><ul><li><p><strong>Summarization/Compaction: </strong>Where the previous conversations are summarized, and the summary is passed to the LLM at each turn instead of the complete conversation history.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!peyW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!peyW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 424w, https://substackcdn.com/image/fetch/$s_!peyW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 848w, https://substackcdn.com/image/fetch/$s_!peyW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 1272w, https://substackcdn.com/image/fetch/$s_!peyW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!peyW!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png" width="1200" height="360.16483516483515" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:437,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:155759,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192264782?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!peyW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 424w, https://substackcdn.com/image/fetch/$s_!peyW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 848w, https://substackcdn.com/image/fetch/$s_!peyW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 1272w, https://substackcdn.com/image/fetch/$s_!peyW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5be0f9c7-042b-4486-9567-1b9d113cef14_2784x836.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Summarizing all conversations using the same LLM or another one, and giving this summary to the LLM at each turn rather than the complete conversation history. (Other context for the LLM is omitted from the image for clarity.)</figcaption></figure></div><ul><li><p><strong>Pruning: </strong>Where old tool results (file reads, API responses, browser results) are selectively removed from the context, while all user queries and LLM responses are left untouched.</p></li></ul><div><hr></div><h3>So What Really Went Wrong With OpenClaw?</h3><p>While Summer was using a small toy mailbox with OpenClaw, things worked pretty well. 
But when she pointed the bot at her actual mailbox, which was quite large, the context window filled up quickly, triggering <strong>context compaction</strong>.</p><p>During compaction, the bot lost its original safety instruction to not take action without confirmation, which resulted in the deletion of hundreds of important emails.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oca1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oca1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 424w, https://substackcdn.com/image/fetch/$s_!Oca1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 848w, https://substackcdn.com/image/fetch/$s_!Oca1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 1272w, https://substackcdn.com/image/fetch/$s_!Oca1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oca1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png" width="1456" height="830" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:830,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:717860,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192264782?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oca1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 424w, https://substackcdn.com/image/fetch/$s_!Oca1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 848w, https://substackcdn.com/image/fetch/$s_!Oca1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 1272w, https://substackcdn.com/image/fetch/$s_!Oca1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82af6a76-7d88-46e3-a1ad-c29a6b550497_1740x992.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Not really! It will still happen again if you don&#8217;t do the following.</figcaption></figure></div><div><hr></div><h3>10 Tips to Not End Up in the Same Place</h3><p>If the director of Safety and Alignment at Meta Superintelligence Labs can make this mistake, it can happen to you as well. But there are ways to make sure that you do not end up in the same place. Here are 10 of them.</p><ol><li><p>Put your security instructions in the <code>MEMORY.md</code> file rather than just in the conversation. This file is not compacted and acts as persistent memory for OpenClaw.<br></p></li><li><p>Next, there is the <code>AGENTS.md</code> file, the standard operating rulebook that defines operational rules, security policies, and scope boundaries for OpenClaw to follow. 
Like <code>MEMORY.md</code>, it is loaded into context at the start of every session and survives compaction. Add an instruction here for the bot to check memory and get explicit confirmation before performing any destructive action.<br></p></li><li><p>OpenClaw can build a small vector index over <code>MEMORY.md</code> and <code>memory/*.md</code> so that semantic queries can find related notes even when the wording differs. Make sure that this is enabled in the <code>~/.openclaw/openclaw.json</code> file.</p></li></ol><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;944e7b97-4d1e-4877-a40b-67570eaeb433&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "agents": {
    "defaults": {
      "memorySearch": {
        "enabled": true,
        "provider": "local"
      }
    }
  }
}</code></pre></div><p></p><ol start="4"><li><p>OpenClaw has a mechanism called <strong><a href="https://docs.openclaw.ai/concepts/memory#automatic-memory-flush-pre-compaction-ping">Pre-compaction memory flush</a></strong>&nbsp;that is automatically triggered in the background as the context window is about to fill up. This saves all the important details from context to disk before compaction starts. <br><br>To enable this, go to the <code>~/.openclaw/openclaw.json</code> file and make sure it contains the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;88749993-e21c-495c-8d85-07cc99e6fc76&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "agents": {
    "defaults": {
      "compaction": {
        "reserveTokensFloor": 20000, // Tokens reserved as buffer before compaction triggers
        "memoryFlush": {
          "enabled": true, 
          "softThresholdTokens": 4000, // How far before the reserve floor the flush triggers
          "systemPrompt": "Session nearing compaction. Store durable memories now.", 
          "prompt": "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."  
        }
      }
    }
  }
}</code></pre></div><p></p></li><li><p>Enable <strong><a href="https://docs.openclaw.ai/gateway/configuration-reference#agents-defaults-contextpruning">Context pruning</a></strong>, which removes old tool results from the context before it is sent to the bot. Add the following to the <code>~/.openclaw/openclaw.json</code> file:</p></li></ol><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;3a9b385e-7d80-4493-8129-b2d8caacb950&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "agents": {
    "defaults": {
      "contextPruning": {
        "mode": "cache-ttl", // Automatically remove tool results after TTL
        "ttl": "6h" // Keep tool results for 6 hours
      }
    }
  }
}</code></pre></div><p></p><ol start="6"><li><p>Use the <code>/context list</code> or <code>/context detail</code> commands in chat regularly to see <a href="https://docs.openclaw.ai/concepts/context">what is in the bot&#8217;s context window</a>&nbsp;and whether any contextual files are truncated or missing.<br></p></li><li><p>Always save important instructions manually to the <code>MEMORY.md</code> file by typing &#8220;Save this to <code>MEMORY.md</code>&#8221; in your chat with OpenClaw.<br></p></li><li><p>Don&#8217;t wait for compaction to happen automatically. Instead, perform it manually. Use the <code>/compact</code> command in chat when a session feels stale or the context is getting bloated, and then give your new instructions to the bot.<br></p></li><li><p>If you still end up with OpenClaw misbehaving, sending <code>/stop</code> in the chat <a href="https://docs.openclaw.ai/tools/subagents#stopping">aborts the session</a> and stops any active sub-agent runs spawned from it, including the nested children.<br></p></li><li><p>Last but not least, if none of the above works, rush to the machine running OpenClaw and pull the power plug. This should most definitely stop it. <br></p></li></ol><p>A very helpful lesson on OpenClaw memory <a href="https://velvetshark.com/openclaw-memory-masterclass">can be found here</a> if you&#8217;re interested in learning about it in more detail.</p><div><hr></div><p>P.S. <strong>This post isn&#8217;t meant to mock Summer Yue; quite the opposite.</strong> She is one of the very few people who publicly admitted a mistake that most in her role would hide. 
Thanks to her, we're all better engineers because she chose transparency over ego.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BvN1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BvN1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 424w, https://substackcdn.com/image/fetch/$s_!BvN1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 848w, https://substackcdn.com/image/fetch/$s_!BvN1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 1272w, https://substackcdn.com/image/fetch/$s_!BvN1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BvN1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png" width="1456" height="619" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:576852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/192264782?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BvN1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 424w, https://substackcdn.com/image/fetch/$s_!BvN1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 848w, https://substackcdn.com/image/fetch/$s_!BvN1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 1272w, https://substackcdn.com/image/fetch/$s_!BvN1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27bf7977-511d-4b8e-8023-74c0cf4b21fb_2168x922.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p><strong>Join the paid tier today to get access to all posts on this newsletter:</strong></p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">Build and train an LLM from scratch</a></p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#128126; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a 
href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><div><hr></div><p>You can also read my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B_dw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" width="1344" height="256" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[🗓️ This Week In AI Research (15-21 March 26)]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-15-21-march</link><guid 
isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-15-21-march</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Wed, 25 Mar 2026 10:27:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UfeV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UfeV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UfeV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 424w, https://substackcdn.com/image/fetch/$s_!UfeV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 848w, https://substackcdn.com/image/fetch/$s_!UfeV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 1272w, https://substackcdn.com/image/fetch/$s_!UfeV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!UfeV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png" width="728" height="323.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:647,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:2050508,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UfeV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 424w, https://substackcdn.com/image/fetch/$s_!UfeV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 848w, https://substackcdn.com/image/fetch/$s_!UfeV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UfeV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06af8a6b-fe81-4b2d-8fc8-1ed6ce6dff3d_6912x3072.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>1. Mixture-of-Depths Attention</h3><p>As LLMs become deeper, they often suffer from signal degradation. 
This means that important features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers.</p><p>This research paper introduces the <strong>Mixture-of-depths attention (MoDA)</strong>, which allows each attention head to attend to sequence KV pairs at the current layer and to depth KV pairs from preceding layers. </p><p>The authors also describe a hardware-efficient algorithm for MoDA that resolves non-contiguous memory-access patterns, achieving 97.3% of FlashAttention-2&#8217;s efficiency at a sequence length of 64K. </p><p>Experiments on 1.5B-parameter models show that MoDA consistently outperforms strong baselines, and combining MoDA with post-norm yields better performance than using it with pre-norm.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r5jb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r5jb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 424w, https://substackcdn.com/image/fetch/$s_!r5jb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 848w, https://substackcdn.com/image/fetch/$s_!r5jb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 1272w, 
https://substackcdn.com/image/fetch/$s_!r5jb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r5jb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png" width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r5jb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 424w, https://substackcdn.com/image/fetch/$s_!r5jb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 848w, https://substackcdn.com/image/fetch/$s_!r5jb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 
1272w, https://substackcdn.com/image/fetch/$s_!r5jb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4045a73-d876-4bce-b173-7ad1d310c8c6_1526x856.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2603.15619">using this link</a>.</p><div><hr></div><p>&#127873; Join the <strong>paid tier today</strong> to get access to all posts on this newsletter.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4dfe90bb-bef0-4c8b-9676-026e3da15513&quot;,&quot;caption&quot;:&quot;&#127873; 
Become a paid subscriber to &#8216;Into AI&#8217; today at a special 25% discount on the annual subscription.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Build a Mixture-of-Experts (MoE) Layer from Scratch&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-10T17:33:16.673Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!WsMN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79861f59-020d-4802-ac7e-5d253704d639_2892x1118.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:183456424,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;5d826bf3-a499-4bf8-84d6-aba0907fb340&quot;,&quot;caption&quot;:&quot;&#127873; Become a paid subscriber to &#8216;Into AI&#8217; today at a special 25% discount on the annual subscription.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Build and train an LLM from Scratch&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-31T11:47:16.904Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!MRfV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F851366c5-6f74-479a-a06a-41e63fc79f6c_2480x1074.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/build-and-train-an-llm-from-scratch&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:182314468,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;701ac23b-af68-4540-b6fc-5d2b83f5d678&quot;,&quot;caption&quot;:&quot;We have long ago moved away from training production-grade deep learning models on a single machine.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Learn to train deep learning models on multiple GPUs&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing 
&#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-26T13:06:02.566Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YyVB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6f05c92-9baf-4986-94a2-b3e2fca7578f_480x376.gif&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/distributed-data-parallel&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:189120446,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>and so many more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>2. Attention Residuals</h3><p>This research paper proposes <strong>Attention Residuals (AttnRes)</strong>, an alternative to standard residual connections in transformers. 
</p><p>Typically, models add all outputs from previous layers equally, which can dilute valuable information as the model deepens. <em>AttnRes</em> instead allows each layer to use attention to selectively merge earlier layers, assigning learned, input-dependent weights to focus on the most relevant past information.</p><p>To increase efficiency, <em>Block AttnRes</em> groups layers into blocks and attends to summaries of these blocks rather than every layer, reducing memory costs while preserving most benefits of full <em>AttnRes</em>.</p><p>Overall, this enhances training stability, maintains stronger signals across depth, and improves performance, particularly in large models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Txi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Txi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 424w, https://substackcdn.com/image/fetch/$s_!2Txi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 848w, https://substackcdn.com/image/fetch/$s_!2Txi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2Txi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Txi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png" width="1456" height="861" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:861,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:381130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Txi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 424w, https://substackcdn.com/image/fetch/$s_!2Txi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 848w, 
https://substackcdn.com/image/fetch/$s_!2Txi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!2Txi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F601d067a-d954-49c9-bb50-7b864973ca14_2226x1316.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2603.15031">using this link</a>.</p><div><hr></div><h3>3. 
Mamba-3</h3><p>This research paper introduces&nbsp;<strong>Mamba-3</strong>, a new sequence model that makes inference much more efficient than it is in transformers (whose compute grows quadratically with sequence length and whose memory costs are high).</p><p>The key idea is an inference-first redesign of state space models (SSMs), in which Mamba-3 uses:</p><ul><li><p>A more expressive recurrence</p></li><li><p>A complex-valued state update to better track information over time</p></li><li><p>A multi-input multi-output (MIMO) setup that improves performance without slowing down decoding</p></li></ul><p>Together, these changes enable it to handle difficult tasks such as state tracking and long-range dependencies, areas where previous linear models struggled, while maintaining near-linear efficiency. As a result, Mamba-3 achieves better accuracy with similar or lower memory use than previous models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hOh7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hOh7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 424w, https://substackcdn.com/image/fetch/$s_!hOh7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 848w, https://substackcdn.com/image/fetch/$s_!hOh7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hOh7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hOh7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png" width="1456" height="766" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256160,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hOh7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 424w, https://substackcdn.com/image/fetch/$s_!hOh7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 848w, 
https://substackcdn.com/image/fetch/$s_!hOh7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!hOh7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc026eded-67ed-4e93-a43d-ee42724f27d7_2380x1252.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2603.15569">using this link</a>.</p><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>4. V-JEPA 2.1</h3><p>This research paper presents <strong>V-JEPA 2.1</strong>, a family of self-supervised models that learn dense, high-quality visual representations for both images and videos while retaining strong global scene understanding.</p><p>The approach combines four key components:</p><ol><li><p>A dense predictive loss that uses a masking-based objective in which both visible and masked tokens contribute to the training signal, encouraging explicit spatial and temporal grounding.</p></li><li><p>Deep self-supervision that applies the self-supervised objective hierarchically across multiple intermediate encoder layers to improve representation quality.</p></li><li><p>Multi-modal tokenizers that enable unified training across images and videos.</p></li><li><p>Effective scaling in both model capacity and training data.</p></li></ol><p>Empirically, V-JEPA 2.1 achieves state-of-the-art performance on several challenging benchmarks for short-term object-interaction anticipation and high-level action anticipation. 
</p><p>The model also shows strong performance in robotic navigation, depth estimation, and global recognition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nPhC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nPhC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 424w, https://substackcdn.com/image/fetch/$s_!nPhC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 848w, https://substackcdn.com/image/fetch/$s_!nPhC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 1272w, https://substackcdn.com/image/fetch/$s_!nPhC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nPhC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png" width="1456" height="722" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:722,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2633502,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nPhC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 424w, https://substackcdn.com/image/fetch/$s_!nPhC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 848w, https://substackcdn.com/image/fetch/$s_!nPhC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 1272w, https://substackcdn.com/image/fetch/$s_!nPhC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc80e0918-2d15-472e-9cd4-a7f485ea7519_2392x1186.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research paper <a href="https://arxiv.org/abs/2603.14482">using this link</a>.</p><div><hr></div><h3>5. <strong>GPT&#8209;5.4 mini and nano</strong></h3><p>OpenAI released <strong>GPT&#8209;5.4 mini</strong> and <strong>nano</strong>, which are two small, fast, and economical versions of GPT-5.4.</p><p>GPT-5.4 mini is a strong, lightweight general-purpose model, especially for coding, reasoning, multimodal understanding, and tool use, and runs over 2&#215; faster than GPT-5-mini. 
It also approaches the performance of the larger GPT&#8209;5.4 model on several evaluations, including SWE-Bench Pro and OSWorld-Verified.</p><p>GPT&#8209;5.4 nano is the smallest, cheapest version of GPT&#8209;5.4 for tasks where speed and cost matter most, such as classification, data extraction, ranking, and coding subagents that handle simpler supporting tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J0GI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J0GI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 424w, https://substackcdn.com/image/fetch/$s_!J0GI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 848w, https://substackcdn.com/image/fetch/$s_!J0GI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 1272w, https://substackcdn.com/image/fetch/$s_!J0GI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J0GI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png" width="1456" height="360" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:154449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J0GI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 424w, https://substackcdn.com/image/fetch/$s_!J0GI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 848w, https://substackcdn.com/image/fetch/$s_!J0GI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 1272w, https://substackcdn.com/image/fetch/$s_!J0GI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1e5d68f-51f5-4540-a0d8-a675613962be_2594x642.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Read more about this release <a href="https://openai.com/index/introducing-gpt-5-4-mini-and-nano/">using this link</a>. </p><div><hr></div><h3>6. 
Composer 2</h3><p>Composer 2 is Cursor's frontier-level coding model, designed to offer a strong balance of intelligence and cost. It delivers impressive performance on hard benchmarks, including Terminal-Bench 2.0 and SWE-bench Multilingual.</p><p>The model is the result of a continued pretraining run (<a href="https://techcrunch.com/2026/03/22/cursor-admits-its-new-coding-model-was-built-on-top-of-moonshot-ais-kimi/">built on top of</a> the open-source Kimi K2.5 model from Moonshot AI), followed by further training on long-horizon coding tasks through reinforcement learning, which enables it to solve challenging tasks requiring hundreds of actions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QJ_y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QJ_y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 424w, https://substackcdn.com/image/fetch/$s_!QJ_y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 848w, https://substackcdn.com/image/fetch/$s_!QJ_y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!QJ_y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QJ_y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Composer 2 efficiency and quality on CursorBench&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Composer 2 efficiency and quality on CursorBench" title="Composer 2 efficiency and quality on CursorBench" srcset="https://substackcdn.com/image/fetch/$s_!QJ_y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 424w, https://substackcdn.com/image/fetch/$s_!QJ_y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 848w, https://substackcdn.com/image/fetch/$s_!QJ_y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!QJ_y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5ea045-3cdf-4d5c-b1f3-9af243992e21_1920x1440.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Read more about this release <a href="https://cursor.com/blog/composer-2">using this link</a>.</p><p>A technical report can be found <a href="https://cursor.com/resources/Composer2.pdf">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>7. 
<strong>BioReason-Pro</strong></h3><p>This research paper introduces <strong>BioReason-Pro</strong>, a new multimodal LLM for predicting protein functions that goes beyond traditional methods that rely on sequence similarity or classification.</p><p>Instead, it explicitly models biological reasoning across sequence, structure, domains, and interaction data. </p><p>BioReason-Pro combines a new autoregressive Gene Ontology predictor (GO-GPT) with a reasoning LLM trained on large-scale synthetic reasoning traces and further improved using reinforcement learning, enabling it to generate clear, step-by-step functional annotations. </p><p>Empirically, BioReason-Pro achieves strong performance and produces higher-quality functional summaries than previous methods. Human experts prefer its annotations over curated UniProt entries about 79% of the time. </p><p>Notably, the model shows emergent capabilities such as:</p><ul><li><p>accurately predicting protein binding partners</p></li><li><p>identifying interaction sites at the residue level, consistent with experimental structures</p></li><li><p>correcting misleading annotations by using contextual reasoning</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B6GT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B6GT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 424w, 
https://substackcdn.com/image/fetch/$s_!B6GT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 848w, https://substackcdn.com/image/fetch/$s_!B6GT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 1272w, https://substackcdn.com/image/fetch/$s_!B6GT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B6GT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png" width="1456" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79366e22-5943-495c-893c-5a068af95346_2724x1094.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1169975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!B6GT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 424w, https://substackcdn.com/image/fetch/$s_!B6GT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 848w, https://substackcdn.com/image/fetch/$s_!B6GT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 1272w, https://substackcdn.com/image/fetch/$s_!B6GT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79366e22-5943-495c-893c-5a068af95346_2724x1094.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://www.biorxiv.org/content/10.64898/2026.03.19.712954v1">using this link</a>.</p><div><hr></div><h3>8. <strong>Hyperagents</strong></h3><p>This research paper introduces <strong>Hyperagents</strong>, a framework that combines a task agent, which solves problems, and a meta agent, which improves the system, into a single editable program.</p><p>Importantly, the improvement process itself is also editable. This enables metacognitive self-modification: the system can improve the way it improves itself.</p><p>The research builds on the <a href="https://www.intoai.pub/p/darwin-godel-machine-the-first-self?utm_source=publication-search">Darwin G&#246;del Machine (DGM)</a> to create DGM-Hyperagents (DGM-H), which removes DGM's assumption that self-improvement only works within coding. DGM-H outperforms baselines across coding, paper review, robotics reward design, and Olympiad-level math-solution grading, and its meta-level improvements transfer across domains.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JkkQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!JkkQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 424w, https://substackcdn.com/image/fetch/$s_!JkkQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 848w, https://substackcdn.com/image/fetch/$s_!JkkQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 1272w, https://substackcdn.com/image/fetch/$s_!JkkQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JkkQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png" width="1456" height="614" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:614,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:496220,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JkkQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 424w, https://substackcdn.com/image/fetch/$s_!JkkQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 848w, https://substackcdn.com/image/fetch/$s_!JkkQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 1272w, https://substackcdn.com/image/fetch/$s_!JkkQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5686150e-10f8-4cfc-99af-6883d755c7c0_2754x1162.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.19461">using this link</a>.</p><div><hr></div><h3>9. LeWorldModel</h3><p>This research paper introduces <strong>LeWorldModel (LeWM)</strong>, the first Joint Embedding Predictive Architecture (JEPA) that trains stably end-to-end from raw pixels using only two loss terms:</p><ul><li><p>a next-embedding prediction loss</p></li><li><p>a regularizer enforcing Gaussian-distributed latent embeddings</p></li></ul><p>Compared to the only existing end-to-end alternative, this reduces the number of tunable loss hyperparameters from 6 to 1.</p><p>LeWM has approximately 15M parameters and can be trained on a single GPU in a few hours. It plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks.</p><p>Beyond control, the researchers show, by probing for physical quantities, that LeWM's latent space encodes meaningful physical structure. 
Surprise evaluation confirms that the model reliably detects physically implausible events.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s6Cn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s6Cn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 424w, https://substackcdn.com/image/fetch/$s_!s6Cn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 848w, https://substackcdn.com/image/fetch/$s_!s6Cn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!s6Cn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s6Cn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png" width="1456" height="783" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/644a0420-388a-472b-90a7-2e458792408a_2116x1138.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:783,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:433691,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s6Cn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 424w, https://substackcdn.com/image/fetch/$s_!s6Cn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 848w, https://substackcdn.com/image/fetch/$s_!s6Cn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!s6Cn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644a0420-388a-472b-90a7-2e458792408a_2116x1138.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.19312v1">using this link</a>.</p><div><hr></div><h3>10. Nemotron-Cascade 2</h3><p>This research paper from NVIDIA introduces <strong>Nemotron-Cascade 2</strong>, an open 30B <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts?utm_source=publication-search">MoE model</a> with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. </p><p>Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. 
It is the second open-weight LLM, after <em><a href="https://arxiv.org/abs/2512.02556">DeepSeekV3.2-Speciale-671B-A37B</a></em>, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LWah!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LWah!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 424w, https://substackcdn.com/image/fetch/$s_!LWah!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 848w, https://substackcdn.com/image/fetch/$s_!LWah!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!LWah!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LWah!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png" width="1456" height="735" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:360216,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191360817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LWah!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 424w, https://substackcdn.com/image/fetch/$s_!LWah!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 848w, https://substackcdn.com/image/fetch/$s_!LWah!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!LWah!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb878a95a-41a0-46e0-be93-a3919faa50aa_2380x1202.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.19220">using this link</a>. </p><div><hr></div><p>This article is entirely free to read. 
If you loved reading it, restack and share it with others to <strong>earn referral rewards.</strong> &#10084;&#65039;</p><div><hr></div><p>Join the <strong>paid tier today</strong> to get access to all posts on this newsletter:</p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">Build and train an LLM from scratch</a></p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#128126; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p>and so many more!</p><div><hr></div><p>You can also read my books on 
<strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B_dw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" width="1344" height="256" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Build and Train a Mixture-of-Experts (MoE) LLM from Scratch]]></title><description><![CDATA[An end-to-end guide to training a Mixture-of-Experts (MoE) LLM from scratch.]]></description><link>https://www.intoai.pub/p/build-and-train-a-mixture-of-experts</link><guid 
isPermaLink="false">https://www.intoai.pub/p/build-and-train-a-mixture-of-experts</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Fri, 20 Mar 2026 11:51:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Jr5y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jr5y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jr5y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 424w, https://substackcdn.com/image/fetch/$s_!Jr5y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 848w, https://substackcdn.com/image/fetch/$s_!Jr5y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 1272w, https://substackcdn.com/image/fetch/$s_!Jr5y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Jr5y!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png" width="1200" height="533.2417582417582" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:647,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1390858,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190610837?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jr5y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 424w, https://substackcdn.com/image/fetch/$s_!Jr5y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 848w, https://substackcdn.com/image/fetch/$s_!Jr5y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Jr5y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc51b1f7f-35ae-4b36-84ce-8aa3cf7c04f6_7776x3456.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Most modern-day LLMs use the <strong>Mixture of Experts (MoE)</strong> architecture. 
This includes <a href="https://x.ai/news/grok-os">Grok-1</a>, <a href="https://arxiv.org/abs/2401.06066">DeepSeekMoE</a>, <a href="https://openai.com/index/introducing-gpt-oss/">gpt-oss</a>, and <a href="https://arxiv.org/abs/2401.04088">Mixtral</a> (and many other proprietary LLMs whose architectural details aren&#8217;t publicly available).</p><p>In the previous lessons on <em>Into AI</em>, we learned:</p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">What the Mixture-of-Experts (MoE) architecture is and how it works</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-transformer">How to build the Mixture-of-Experts (MoE) decoder-only transformer from scratch</a></p></li></ul><p>In this lesson, we will:</p><ol><li><p>Build an LLM using the MoE transformer with <a href="https://www.intoai.pub/p/grouped-query-attention">Grouped Query Attention (GQA)</a></p></li><li><p>Train it on a publicly available dataset derived from Wikipedia</p></li><li><p>Generate text from the trained LLM</p></li></ol><p>These are the steps that we will implement.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MVtt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MVtt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 424w, 
https://substackcdn.com/image/fetch/$s_!MVtt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 848w, https://substackcdn.com/image/fetch/$s_!MVtt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 1272w, https://substackcdn.com/image/fetch/$s_!MVtt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MVtt!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png" width="1200" height="240.65934065934067" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:292,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:167287,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190610837?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!MVtt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 424w, https://substackcdn.com/image/fetch/$s_!MVtt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 848w, https://substackcdn.com/image/fetch/$s_!MVtt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 1272w, https://substackcdn.com/image/fetch/$s_!MVtt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50625d27-6058-4f43-a0c3-bba935522aa1_2866x574.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Let&#8217;s begin!</p><div><hr></div><h3>Import Necessary Packages</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;326ad129-8a05-4ecf-be72-a2381cb373dc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext"># Standard libraries
import math
import random

# Hide deprecation warnings
import warnings
warnings.filterwarnings('ignore')

# PyTorch core
import torch
import torch.nn as nn
import torch.optim as optim

# For processing data
from datasets import load_dataset
from torch.utils.data import Dataset, DataLoader

# BPE tokenizer
import tiktoken

# Training utils
from torch.nn.utils import clip_grad_norm_
from torch.optim.lr_scheduler import CosineAnnealingLR

# Progress bar
from tqdm import tqdm</code></pre></div><div><hr></div><p>Before we move forward, I want to introduce you to the <strong><a href="https://bamaniaashish.gumroad.com/l/visualtech">Visual Tech Bundle</a>.</strong></p><p>It is a collection of visual guides that explain core AI, LLM, Systems design, and Computer science concepts via image-first lessons.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cc9Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 424w, https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 848w, https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 1272w, https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png" width="1200" height="300" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3773516-037c-48af-a288-d6d9d3cab751_2560x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:364,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 424w, https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 848w, https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 1272w, https://substackcdn.com/image/fetch/$s_!Cc9Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3773516-037c-48af-a288-d6d9d3cab751_2560x640.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Others are already loving these books.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ZH6L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZH6L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 424w, https://substackcdn.com/image/fetch/$s_!ZH6L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 848w, https://substackcdn.com/image/fetch/$s_!ZH6L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 1272w, https://substackcdn.com/image/fetch/$s_!ZH6L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZH6L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png" width="1456" height="1147" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1147,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2124317,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://intoai.pub/i/178965288?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ZH6L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 424w, https://substackcdn.com/image/fetch/$s_!ZH6L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 848w, https://substackcdn.com/image/fetch/$s_!ZH6L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 1272w, https://substackcdn.com/image/fetch/$s_!ZH6L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ff9ec7c-0279-4ede-a46b-fd9797508ad7_2376x1872.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This includes <a href="https://www.hubspot.com/company/management/dharmesh-shah">Dharmesh Shah</a>, the co-founder and CEO of HubSpot.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AoQ3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AoQ3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 424w, 
https://substackcdn.com/image/fetch/$s_!AoQ3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 848w, https://substackcdn.com/image/fetch/$s_!AoQ3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 1272w, https://substackcdn.com/image/fetch/$s_!AoQ3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AoQ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png" width="1082" height="346" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:1082,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71287,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/182081753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!AoQ3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 424w, https://substackcdn.com/image/fetch/$s_!AoQ3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 848w, https://substackcdn.com/image/fetch/$s_!AoQ3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 1272w, https://substackcdn.com/image/fetch/$s_!AoQ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f2ca2a-57c5-4057-84e7-82d853963e00_1082x346.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Dharmesh&#8217;s reply to my LinkedIn post. (<a href="https://www.linkedin.com/feed/update/urn:li:activity:7408092938876133377/">Source</a>)</figcaption></figure></div><p><strong>&#10084;&#65039; I&#8217;m sure that you will love these too:</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bamaniaashish.gumroad.com/l/visualtech&quot;,&quot;text&quot;:&quot;Link to the Visual Tech Bundle&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://bamaniaashish.gumroad.com/l/visualtech"><span>Link to the Visual Tech Bundle</span></a></p><p>Now back to our lesson!</p><div><hr></div><h3><strong>Getting our data ready</strong></h3><h4><strong>1. Download the dataset</strong></h4><p>We will use the <a href="https://huggingface.co/datasets/Salesforce/wikitext">WikiText dataset</a> from Hugging Face to train our model. This dataset is derived from verified Wikipedia articles and contains approximately 103 million words.</p><p>It is downloaded as follows.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;6d73618f-1667-4d5f-bed6-ccdd923a98a7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Load WikiText dataset 
train_data = load_dataset("wikitext", "wikitext-103-v1", split="train") # Train subset
val_data   = load_dataset("wikitext", "wikitext-103-v1", split="validation") # Validation subset</code></pre></div><p>Let&#8217;s check out a training example from this dataset.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;2ab8ebef-cf0d-458c-94fe-9e4e78bfd01c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Training subset example
print(train_data['text'][807])

"""
Output:
The International Civil Aviation Organization ( ICAO ) defines general aviation ( GA ) 
as " an aircraft operation other than a commercial air transport operation 
or an aerial work operation . " It defines commercial air transport ( CAT ) 
as " an aircraft operation involving the transport of passengers , cargo 
or mail for remuneration or hire " , and aerial work as " an aircraft operation 
in which an aircraft is used for specialized services such as agriculture , 
construction , photography , surveying , observation and patrol , search and 
rescue , aerial advertisement , etc . " 

"""</code></pre></div><p>There&#8217;s a little problem with this dataset where some text examples are blanks and headings are formatted as &#8220; === HEADING === &#8220;, as shown below.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;d3ad0889-9c2f-4cc7-aa0a-285f00ef37c2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Unwanted training subset example
print(train_data['text'][804:807])

"""
Output:
['', ' = = Definitions = = \n', '']
"""</code></pre></div><p>In addition, WikiText-103 was built with a word-level vocabulary (rather than sub-words or characters), and any word that appeared fewer than three times was replaced with <code>&lt;unk&gt;</code>.</p><p>An example is shown below.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;1671305b-893a-4392-ac1b-507e0c9a6f40&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># UNK marker
print(train_data['text'][80])

"""
Output:
 &lt;unk&gt; &lt;unk&gt; 2 
"""</code></pre></div><p>We need to remove the blank entries, the headings, and the <code>&lt;unk&gt;</code> markers to clean up the dataset.</p><p>We also need an <strong>end-of-text token</strong> (named <code>EOS</code> in the code) called <code>&lt;|endoftext|&gt;</code> between the remaining text entries, so that the model learns natural stopping boundaries and doesn&#8217;t try to connect unrelated content.</p><p>(<em>We will come back to re-using the EOS token when discussing the helper function for text generation/inference.</em>)</p><p>Both of these changes are made using a helper function as follows.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;c80bc179-1b78-4aa1-9f40-9a0a1318af3d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def clean_text(dataset):
    # EOS token
    EOS = "&lt;|endoftext|&gt;"

    cleaned = []
    for text in dataset["text"]:
        # Strip surrounding whitespace
        line = text.strip()

        # Keep only non-empty lines that aren't headings (such as "= = Definitions = =")
        if line and not line.startswith("="):
            # Remove &lt;unk&gt; markers and normalize whitespace
            cleaned.append(" ".join(line.replace("&lt;unk&gt;", "").split()))

    # Join all entries into a string, separated by EOS token
    return EOS.join(cleaned) + EOS
 
# Clean training and validation text
training_text = clean_text(train_data)
validation_text = clean_text(val_data)</code></pre></div><p>A short excerpt of <code>training_text</code> is shown below.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;7bb98981-91e1-4e87-a881-42b47ee50360&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Print a subset of 'training_text'
print(training_text[3457:3700])

"""
Output:
Each character has a field and distance of movement limited by their 
Action Gauge . Up to nine characters can be assigned to a single mission . 
During gameplay , characters will call out if something happens to them , 
such as their health points ( HP ) getting low or being knocked out
"""</code></pre></div><div><hr></div><h4>2. Tokenize the dataset</h4><p>We previously learned how to build a character-level tokenizer from scratch.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9690bd15-285a-4839-a238-3b91db0f3144&quot;,&quot;caption&quot;:&quot;We are on a journey to implement a GPT-like LLM from scratch in PyTorch.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Build an LLM Tokenizer From Scratch&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-23T11:56:28.847Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!UBl3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe0b259a-6a3f-4754-a32a-48d349ae488e_1846x686.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/build-an-llm-tokenizer&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:182161682,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>We have also used it to <strong><a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">train an LLM from scratch</a></strong>.</p><p>But because it is character-based, training an LLM using it makes the model learn the meaning of text from individual letters rather than semantic units. This makes training painfully difficult.</p><p>For this tutorial, we will instead use&nbsp;<strong><a href="https://github.com/openai/tiktoken">Tiktoken</a>, </strong>a fast BPE tokeniser used in OpenAI's models. It is a sub-word tokenizer (breaks down words into sub-words) that better captures language structure than a character-level tokenizer, enabling faster LLM training.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VOLh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VOLh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!VOLh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 848w, 
https://substackcdn.com/image/fetch/$s_!VOLh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!VOLh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VOLh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png" width="1080" height="1080" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190610837?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VOLh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!VOLh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!VOLh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!VOLh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2d827d2-355f-406c-8c83-73a2de16bbcd_1080x1080.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from the author&#8217;s book titled &#8216;<strong><a href="https://bamaniaashish.gumroad.com/l/llmbook/">LLMs In 100 Images</a></strong>.&#8217;</figcaption></figure></div><p>We start by creating an instance of Tiktoken as follows.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;eee7481a-0836-41fa-8ec5-957b2a89a627&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Create an instance of Tokenizer used by GPT-2
tokenizer = tiktoken.get_encoding("gpt2")</code></pre></div><p>The vocabulary size of the tiktoken tokenizer for GPT-2 is 50,257. This means that the tokenizer recognizes 50,257 unique tokens: 50,256 subwords plus the special <code>&lt;|endoftext|&gt;</code> token.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;0d75f68f-6f10-46c5-b089-e4002d2a716a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Check vocabulary size
vocab_size = tokenizer.n_vocab

print(f"Vocabulary size: {vocab_size}")

# Output: Vocabulary size: 50257</code></pre></div><div><hr></div><h4><strong>3. Create the dataset for language modeling</strong></h4><p>With the tokenizer in place, we need a way to serve the data during training. This is where the <code>TextDataset</code> class comes in. It inherits from PyTorch&#8217;s <a href="https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.Dataset">Dataset</a> class and has the following methods:</p><ul><li><p><code>__init__</code>: Converts the given text into token IDs and stores the maximum sequence length (<code>max_seq_length</code>)</p></li><li><p><code>__len__</code>: Returns the number of non-overlapping training sequences available</p></li><li><p><code>__getitem__</code>: Returns the training sequence at a given index and its targets (the same tokens shifted forward by one position)</p></li></ul><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;85dc9b01-6289-4146-b896-62b78e2a2d9a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Dataset for language modeling
class TextDataset(Dataset):
    def __init__(self, text, tokenizer, max_seq_length):
        # Convert text to token IDs
        self.tokens = tokenizer.encode(text, allowed_special="all")

        # Maximum length of each training sequence
        self.max_seq_length = max_seq_length

    # Get number of valid training sequences
    def __len__(self):
        # Subtract 1 because the targets of the last sequence need one extra token
        num_sequences = (len(self.tokens) - 1) // self.max_seq_length
        return num_sequences

    # Get an input sequence and targets
    def __getitem__(self, idx):
        # Start index of the sequence
        start = idx * self.max_seq_length

        # End index of the sequence
        end = start + self.max_seq_length

        # Input token sequence
        input_ids = torch.tensor(self.tokens[start:end], dtype=torch.long)

        # Next-token targets/labels (inputs shifted by one token)
        target_ids = torch.tensor(self.tokens[start+1:end+1], dtype=torch.long)

        return input_ids, target_ids</code></pre></div><p>Let&#8217;s use the <code>TextDataset</code> class to create the training and validation datasets.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;40a91ad4-87cf-4c9c-a23b-57dd88d9d5a1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Define maximum sequence length
MAX_SEQ_LENGTH = 128

# Create training and validation datasets
train_dataset = TextDataset(training_text, tokenizer, MAX_SEQ_LENGTH)
val_dataset = TextDataset(validation_text, tokenizer, MAX_SEQ_LENGTH)</code></pre></div><p>The number of sequences in the training and validation datasets is as follows.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;de5651aa-73c7-4395-bffe-b5de8d0f63ab&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">print(f"Number of training sequences: {len(train_dataset):,}")
print(f"Number of validation sequences: {len(val_dataset):,}")

"""
Output:
Number of training sequences: 876,162
Number of validation sequences: 1,828
"""</code></pre></div><p>And this is how a training sequence and its target look.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;4f7cffee-65db-412a-98fa-20024c618830&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Example of a training sequence and its target
# (named input_ids/target_ids to avoid shadowing Python's built-in input())
input_ids, target_ids = train_dataset[16]

print("Input IDs:\n", input_ids)
print("\nTarget IDs:\n", target_ids)

print("\nDecoded Input:\n", tokenizer.decode(input_ids.tolist()))
print("\nDecoded Target:\n", tokenizer.decode(target_ids.tolist()))

"""
Output:
Input IDs:
 tensor([  262,  1271,   286, 22469,  4991,   583,  4365,   764,   317,   636,
          286,   428,  8515,  2950,  4441,  3748,  7514, 14520,  4981,   329,
         1123,  2095,   705,    82,  1767,   764,   554,  1502,   284,  4620,
          428,   837,   262, 22849,  4847, 16560,   656,   262,  1218,   983,
          547,  4615,   837,   355,   484,  1718,   510,   257,  1588,  6903,
          286,  4088,  2272,  2622,   329,   262,  8561,   764,  1119,   635,
        12328,   262,  8722,  6460,   290, 10152,   286,   711,   523,   484,
          714,  5198,   284,   649,  1938,   981, 26645,   262,  6393,  6805,
          286,   262,  2168,   705, 11327,   764,   383, 15064,  3341,   547,
         3066,  2402,  1903,   287,  2478,   764,   383,  2095,  9824,   547,
         1760,   416,  8835,    73,   280,   837,   508,   550,  3111,   319,
          262,  2180,   569, 18354,  7496, 17740,  1830,   764,  1649,  4441,
          262, 17871,  5321, 11630,   837,  8835,    73,   280])

Target IDs:
 tensor([ 1271,   286, 22469,  4991,   583,  4365,   764,   317,   636,   286,
          428,  8515,  2950,  4441,  3748,  7514, 14520,  4981,   329,  1123,
         2095,   705,    82,  1767,   764,   554,  1502,   284,  4620,   428,
          837,   262, 22849,  4847, 16560,   656,   262,  1218,   983,   547,
         4615,   837,   355,   484,  1718,   510,   257,  1588,  6903,   286,
         4088,  2272,  2622,   329,   262,  8561,   764,  1119,   635, 12328,
          262,  8722,  6460,   290, 10152,   286,   711,   523,   484,   714,
         5198,   284,   649,  1938,   981, 26645,   262,  6393,  6805,   286,
          262,  2168,   705, 11327,   764,   383, 15064,  3341,   547,  3066,
         2402,  1903,   287,  2478,   764,   383,  2095,  9824,   547,  1760,
          416,  8835,    73,   280,   837,   508,   550,  3111,   319,   262,
         2180,   569, 18354,  7496, 17740,  1830,   764,  1649,  4441,   262,
        17871,  5321, 11630,   837,  8835,    73,   280,   373])

Decoded Input:
  the number of playable units per mission . A part of this upgrade involved creating 
  unique polygon models for each character 's body . In order to achieve this , 
  the cooperative elements incorporated into the second game were removed , 
  as they took up a large portion of memory space needed for the improvements . 
  They also adjusted the difficulty settings and ease of play so they could appeal 
  to new players while retaining the essential components of the series ' gameplay . 
  The newer systems were decided upon early in development . The character designs 
  were done by Honjou , who had worked on the previous Valkyria Chronicles games . 
  When creating the Nameless Squad , Honjou

Decoded Target:
  number of playable units per mission . A part of this upgrade involved creating 
  unique polygon models for each character 's body . In order to achieve this , 
  the cooperative elements incorporated into the second game were removed , 
  as they took up a large portion of memory space needed for the improvements . 
  They also adjusted the difficulty settings and ease of play so they could appeal 
  to new players while retaining the essential components of the series ' gameplay . 
  The newer systems were decided upon early in development . The character designs 
  were done by Honjou , who had worked on the previous Valkyria Chronicles games . 
  When creating the Nameless Squad , Honjou was
"""</code></pre></div><p>Note how the target sequence is the input sequence shifted by one sub-word. This way, at every position, the model learns to predict the next sub-word given the previous ones.</p><div><hr></div><h3><strong>Setting up DataLoaders for batch training</strong></h3><p>Next, we use PyTorch&#8217;s <a href="https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader">DataLoader</a> to create iterables that load batches of <code>BATCH_SIZE</code> samples from <code>train_dataset</code> and <code>val_dataset</code>. The training batches are reshuffled every epoch, while the validation batches are served in a fixed order.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;6e0eefa8-efe3-4db7-95e6-3d10a5c808c3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Define batch size
BATCH_SIZE = 32

# Create training and validation DataLoaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)</code></pre></div><p>This creates the following number of batches of training and validation data.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;76481d80-c9c8-4393-b5c6-f3d2153bf697&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">print(f"Number of training batches: {len(train_loader):,}") # ceil(number of training sequences / batch_size)
print(f"Number of validation batches: {len(val_loader):,}") # ceil(number of validation sequences / batch_size)
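
# Illustrative check (an addition, not from the original post): drop_last defaults
# to False in DataLoader, so the last partial batch is kept and the count rounds up.
# The sequence counts and batch size below are copied from the outputs shown in this lesson.
import math
assert math.ceil(876_162 / 32) == 27_381  # training batches
assert math.ceil(1_828 / 32) == 58        # validation batches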

"""
Output:
Number of training batches: 27,381
Number of validation batches: 58
"""</code></pre></div><p>Our data is now ready for model training!</p><div><hr></div><h3>Building the MoE Transformer model</h3><p>We will now set up the MoE transformer model that we built in a <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-transformer?utm_source=publication-search">previous lesson</a>, with a few changes that follow common practice in modern LLMs:</p><ol><li><p>We replace the <a href="https://www.intoai.pub/p/causal-mha?utm_source=publication-search">Causal Multi-head Self-attention</a> module with <strong><a href="https://www.intoai.pub/p/grouped-query-attention?utm_source=publication-search">Causal Grouped-Query Attention</a></strong></p></li><li><p>We introduce <strong>Weight tying</strong>, which means we reuse the same weights for both the token embedding matrix and the output projection matrix, rather than learning two separate ones. This works because the two matrices play reciprocal roles (one maps token IDs to embeddings, the other maps hidden states back to vocabulary logits) and have the same shape. Tying them reduces the number of learnable parameters in the model by a few million.</p></li><li><p>We <strong>remove the bias term from all linear layers</strong>, following the Llama models</p></li></ol><p>Let&#8217;s implement these one by one.</p><h4>1. Set up Causal Grouped Query Attention</h4><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;7e31e8e8-cef2-4bf8-9afa-bf59b6e339df&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Grouped Query Attention (GQA) with Causal Masking
class GroupedQueryAttention(nn.Module):
    def __init__(self, embedding_dim, num_heads, num_groups, max_seq_length):
        super().__init__()
 
        # Check if embedding_dim is divisible by num_heads
        assert embedding_dim % num_heads == 0, "embedding_dim must be divisible by num_heads"
 
        # Check if num_heads is divisible by num_groups
        # (Each group must be shared by the same number of heads)
        assert num_heads % num_groups == 0, "num_heads must be divisible by num_groups"
 
        # Embedding dimension
        self.embedding_dim = embedding_dim
 
        # Number of total query heads
        self.num_heads = num_heads
 
        # Dimension of each head
        self.head_dim = embedding_dim // num_heads
 
        # Number of KV groups
        self.num_groups = num_groups
 
        # Number of query heads per KV group
        self.group_size = num_heads // num_groups
 
        # Linear projection matrix for query
        self.W_q = nn.Linear(embedding_dim, embedding_dim, bias=False)
 
        # Linear projection matrices for key and value
        self.W_k = nn.Linear(embedding_dim, num_groups * self.head_dim, bias=False)
        self.W_v = nn.Linear(embedding_dim, num_groups * self.head_dim, bias=False)
 
        # Linear projection matrix to produce final output
        self.W_o = nn.Linear(embedding_dim, embedding_dim, bias=False)
 
        # Build the causal mask once at init instead of every forward pass
        mask = torch.tril(torch.ones(max_seq_length, max_seq_length))
 
        # Add batch_size and num_heads dimensions
        mask = mask.view(1, 1, max_seq_length, max_seq_length)

        # register_buffer saves it as part of the model so it automatically moves to the right device (CPU/GPU)
        self.register_buffer("causal_mask", mask)
 
    # Splits Q into multiple heads
    def _split_heads(self, x):
        """
        Transforms input embeddings from
        [batch_size, sequence_length, embedding_dim]
        to
        [batch_size, num_heads, sequence_length, head_dim]
        """
        batch_size, sequence_length, embedding_dim = x.shape
 
        # Split embedding_dim into (num_heads, head_dim)
        x = x.reshape(batch_size, sequence_length, self.num_heads, self.head_dim)
 
        # Reorder and return the intended shape
        return x.transpose(1, 2)
 
    # Splits K or V into num_groups heads
    def _split_groups(self, x):
        """
        Transforms K/V from
        [batch_size, sequence_length, num_groups * head_dim]
        to
        [batch_size, num_groups, sequence_length, head_dim]
        """
        batch_size, sequence_length, _ = x.shape
 
        x = x.reshape(batch_size, sequence_length, self.num_groups, self.head_dim)
        return x.transpose(1, 2)
 
    # Merge heads back together
    def _merge_heads(self, x):
        """
        Transforms inputs from
        [batch_size, num_heads, sequence_length, head_dim]
        to
        [batch_size, sequence_length, embedding_dim]
        """
        batch_size, num_heads, sequence_length, head_dim = x.shape
 
        # Move sequence_length back before num_heads in the shape
        x = x.transpose(1, 2)
 
        # Merge (num_heads, head_dim) back into embedding_dim
        embedding_dim = num_heads * head_dim
        x = x.reshape(batch_size, sequence_length, embedding_dim)
 
        return x
 
    # Forward pass
    def forward(self, x):
        batch_size, sequence_length, embedding_dim = x.shape
 
        # Compute Q, K, V
        Q = self.W_q(x)  # [batch_size, sequence_length, embedding_dim]
        K = self.W_k(x)  # [batch_size, sequence_length, num_groups * head_dim]
        V = self.W_v(x)  # [batch_size, sequence_length, num_groups * head_dim]
 
        # Split Q into multiple heads
        Q = self._split_heads(Q)  # [batch_size, num_heads, sequence_length, head_dim]
 
        # Split K and V into num_groups heads
        K = self._split_groups(K)  # [batch_size, num_groups, sequence_length, head_dim]
        V = self._split_groups(V)  # [batch_size, num_groups, sequence_length, head_dim]
 
        # Expand K and V so each KV group is shared across multiple query heads
        K = K.repeat_interleave(self.group_size, dim=1)  # [batch_size, num_heads, sequence_length, head_dim]
        V = V.repeat_interleave(self.group_size, dim=1)  # [batch_size, num_heads, sequence_length, head_dim]
 
        # Calculate scaled dot-product attention
        attn_scores = Q @ K.transpose(-2, -1)
        attn_scores = attn_scores / math.sqrt(self.head_dim)
 
        # Slice the pre-built mask to the current sequence length
        causal_mask = self.causal_mask[:, :, :sequence_length, :sequence_length]
 
        # Mask out future positions by setting their scores to -inf
        attn_scores = attn_scores.masked_fill(causal_mask == 0, float('-inf'))
 
        # Apply softmax to get attention weights
        attn_weights = torch.softmax(attn_scores, dim=-1)
 
        # Multiply attention weights by V
        weighted_values = attn_weights @ V  # [batch_size, num_heads, sequence_length, head_dim]
 
        # Merge head outputs
        merged_heads_output = self._merge_heads(weighted_values)
 
        # Obtain final output
        output = self.W_o(merged_heads_output)
 
        return output</code></pre></div><div><hr></div><h4>2. Set up Expert and Router</h4><p>An <strong>Expert</strong> is simply a small feed-forward network that specializes in a particular skill, which may involve processing different token types (numerical/code/language-related tokens), reasoning patterns, context patterns, and so on.</p>
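<p>As a rough illustration of the idea, an expert of this kind can be sketched as a two-layer feed-forward network. The class name <code>Expert</code>, the <code>hidden_dim</code> parameter, the GELU activation, and the toy sizes below are assumptions made for this sketch and may differ from what the full lesson uses.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import torch
import torch.nn as nn

# Hypothetical sketch of a single expert: a small two-layer feed-forward network.
# The names, the GELU activation, and the sizes are illustrative assumptions.
class Expert(nn.Module):
    def __init__(self, embedding_dim, hidden_dim):
        super().__init__()
        # Bias-free linear layers, in line with the "no bias" choice described earlier
        self.fc1 = nn.Linear(embedding_dim, hidden_dim, bias=False)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, embedding_dim, bias=False)

    def forward(self, x):
        # x: [batch_size, sequence_length, embedding_dim]; the output has the same shape
        return self.fc2(self.act(self.fc1(x)))

# Toy usage: 2 sequences of 8 tokens with 64-dimensional embeddings
expert = Expert(embedding_dim=64, hidden_dim=256)
x = torch.randn(2, 8, 64)
out = expert(x)
print(out.shape)  # prints torch.Size([2, 8, 64])</code></pre></div><p>Because the output has the same shape as the input, the outputs of several such experts can later be combined as a weighted sum.</p>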
      <p>
          <a href="https://www.intoai.pub/p/build-and-train-a-mixture-of-experts">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[🗓️ This Week In AI Research (8-14 March 26)]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-8-14-march</link><guid isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-8-14-march</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Thu, 19 Mar 2026 11:14:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!z4We!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z4We!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z4We!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 424w, https://substackcdn.com/image/fetch/$s_!z4We!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 848w, https://substackcdn.com/image/fetch/$s_!z4We!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 1272w, 
https://substackcdn.com/image/fetch/$s_!z4We!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z4We!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png" width="724.859375" height="319.11734847184067" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:641,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724.859375,&quot;bytes&quot;:949021,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z4We!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 424w, https://substackcdn.com/image/fetch/$s_!z4We!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 848w, 
https://substackcdn.com/image/fetch/$s_!z4We!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 1272w, https://substackcdn.com/image/fetch/$s_!z4We!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66b69714-e5e1-4546-9e97-a31ba168bf8e_2742x1208.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>1. 
<strong>Covenant-72B</strong></h3><p>This research paper presents <strong>Covenant-72B</strong>, an LLM produced by the largest globally distributed collaborative pre-training run to date, in terms of both compute and model scale.</p><p>This pre-training involved open, permissionless participation, supported by a live blockchain protocol, <strong><a href="https://bittensor.com/about">Bittensor</a></strong>, and a state-of-the-art, communication-efficient optimizer called <strong>SparseLoCo</strong>, which enabled dynamic participation with peers joining and leaving freely.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j9Ek!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j9Ek!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 424w, https://substackcdn.com/image/fetch/$s_!j9Ek!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 848w, https://substackcdn.com/image/fetch/$s_!j9Ek!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 1272w, https://substackcdn.com/image/fetch/$s_!j9Ek!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!j9Ek!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png" width="1456" height="687" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:687,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:368610,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!j9Ek!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 424w, https://substackcdn.com/image/fetch/$s_!j9Ek!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 848w, https://substackcdn.com/image/fetch/$s_!j9Ek!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 1272w, https://substackcdn.com/image/fetch/$s_!j9Ek!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa3367-fc5e-4d54-81be-83e4571c91e2_2014x950.png 
1456w" sizes="100vw"></picture></div></a></figure></div><p>Covenant-72B was pre-trained on approximately 1.1T tokens and performs at par with fully centralized models pre-trained on similar or higher compute budgets.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gTO_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!gTO_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 424w, https://substackcdn.com/image/fetch/$s_!gTO_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 848w, https://substackcdn.com/image/fetch/$s_!gTO_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 1272w, https://substackcdn.com/image/fetch/$s_!gTO_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gTO_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png" width="1456" height="518" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:518,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:178517,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!gTO_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 424w, https://substackcdn.com/image/fetch/$s_!gTO_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 848w, https://substackcdn.com/image/fetch/$s_!gTO_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 1272w, https://substackcdn.com/image/fetch/$s_!gTO_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4df4f3a8-fe66-4426-b732-012dc603053e_2076x738.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This shows that open and fully democratized participation is possible on an unprecedented scale for a globally distributed LLM pre-training run.</p><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.08163">using this link</a>.</em></p><div><hr></div><p>&#127873; Unlock every lesson in the newsletter and accelerate your AI engineering career with a&nbsp;<strong>25% discount on the annual membership</strong>&nbsp;of &#8216;Into AI&#8217; today!</p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">Build and train an LLM from scratch</a></p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#128126; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880;<a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/741bb486&quot;,&quot;text&quot;:&quot;Grab your 25% discount today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.intoai.pub/741bb486"><span>Grab your 25% discount today</span></a></p><div><hr></div><h3>2. Training Language Models via Neural Cellular Automata</h3><p>Natural-language pre-training data has its flaws: it is finite, contains human biases, and mixes knowledge with reasoning.</p><p>This research paper proposes using neural cellular automata (NCA) to generate synthetic, non-linguistic data for <strong>pre-pre-training LLMs</strong>, which means training on synthetic data first, followed by natural language.</p><p>The motivation is that NCA data has a rich spatiotemporal structure and statistics that resemble natural language, while being controllable and cheap to generate at scale.</p><p>Pre-pre-training is surprisingly effective: training on only 164M NCA tokens improves downstream language modeling by up to 6% and accelerates convergence by up to 1.6x.</p><p>NCA pre-pre-training even outperforms pre-pre-training on 1.6B tokens of natural language from Common Crawl, even though that baseline uses more compute, and the gains transfer to reasoning benchmarks such as GSM8K, HumanEval, and BigBench-Lite. 
</p><p>Further experiments show that attention layers transfer best, and that optimal NCA complexity depends on the task, with simpler dynamics working better for code, while more complex ones benefit math and web text more.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FBd0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FBd0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 424w, https://substackcdn.com/image/fetch/$s_!FBd0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 848w, https://substackcdn.com/image/fetch/$s_!FBd0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 1272w, https://substackcdn.com/image/fetch/$s_!FBd0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FBd0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png" width="1456" height="944" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:547559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FBd0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 424w, https://substackcdn.com/image/fetch/$s_!FBd0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 848w, https://substackcdn.com/image/fetch/$s_!FBd0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 1272w, https://substackcdn.com/image/fetch/$s_!FBd0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8d6b625-3279-4556-b3c8-c3732cf895e1_1742x1130.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://www.arxiv.org/abs/2603.10055">using this link</a>.</em></p><div><hr></div><h3>3. Lost in Backpropagation: The LM Head is a Gradient Bottleneck</h3><p>The final layer of an LLM, called the LM head, maps the hidden dimension (<code>D</code>) to the vocabulary dimension (<code>V</code>), where <code>D</code> is usually much smaller than <code>V</code>. This creates an expressivity limitation called the&nbsp;<strong>softmax bottleneck</strong>. 
</p><p>This research paper shows that the softmax bottleneck is not only an expressivity bottleneck but also an optimization bottleneck: high-dimensional gradients are compressed into a much smaller space, so most of the training signal (around 95&#8211;99% of the gradient norm) is lost.</p><p>This compression leads to poor update directions and makes even simple patterns harder to learn, slowing down training and harming efficiency across models.</p><p>The research suggests that current LLM architectures have a fundamental flaw in how they learn and that better LM head designs are needed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bLtL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bLtL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 424w, https://substackcdn.com/image/fetch/$s_!bLtL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 848w, https://substackcdn.com/image/fetch/$s_!bLtL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 1272w, https://substackcdn.com/image/fetch/$s_!bLtL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bLtL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png" width="1456" height="747" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:747,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:370313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bLtL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 424w, https://substackcdn.com/image/fetch/$s_!bLtL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 848w, https://substackcdn.com/image/fetch/$s_!bLtL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 1272w, https://substackcdn.com/image/fetch/$s_!bLtL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8da8128a-2e57-4523-9888-9e7557bb80f8_1864x956.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.10145">using this link</a>.</em></p><div><hr></div><h3>4. 
OpenClaw-RL: Train Any Agent Simply by Talking</h3><p>This research paper presents <strong>OpenClaw-RL</strong>, a reinforcement learning framework in which AI agents learn continuously from real-time interactions, using next-state signals (such as user replies, tool outputs, or environment changes) as training data.</p><p>These signals offer evaluative feedback (how well an action performed, expressed as scalar rewards via a process reward model (PRM) judge) and directive feedback (how the action should be improved, expressed as textual hints and used for token-level supervision via Hindsight-Guided On-Policy Distillation, or OPD).</p><p>By combining the two kinds of feedback, OpenClaw-RL supports more effective learning than traditional methods.</p><p>Its asynchronous, decoupled architecture also allows agents to improve during live use in various environments, such as chat, tools, and software tasks, without requiring offline datasets or manual labeling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OfpP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OfpP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 424w, https://substackcdn.com/image/fetch/$s_!OfpP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 848w, 
https://substackcdn.com/image/fetch/$s_!OfpP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!OfpP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OfpP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png" width="1456" height="948" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:948,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:486644,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!OfpP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 424w, 
https://substackcdn.com/image/fetch/$s_!OfpP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 848w, https://substackcdn.com/image/fetch/$s_!OfpP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!OfpP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc467a5d7-4c4f-4fe8-82cc-4ad9fba8410c_1618x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.10165">using this link</a>.</em></p><div><hr></div><h3>5. How Far Can Unsupervised RLVR Scale LLM Training?</h3><p>This research paper studies a training approach called <strong>unsupervised Reinforcement Learning with Verifiable Rewards (unsupervised RLVR)</strong>, in which models learn without labeled data by generating their own rewards from signals that can be automatically verified.</p><p>The paper categorises unsupervised RLVR methods into intrinsic (self-generated rewards) and external (externally verified rewards). It shows that intrinsic methods mainly strengthen a model&#8217;s existing beliefs: they work only if those beliefs are already correct and can lead to collapse if they are wrong.</p><p>Intrinsic rewards still help with test-time training on small datasets, and the authors propose the Model Collapse Step as a practical indicator of RL trainability (predicting when training will fail). 
</p><p>They also present early evidence that external reward methods may overcome these limitations, suggesting more scalable approaches.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e85t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e85t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 424w, https://substackcdn.com/image/fetch/$s_!e85t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 848w, https://substackcdn.com/image/fetch/$s_!e85t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 1272w, https://substackcdn.com/image/fetch/$s_!e85t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e85t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png" width="1456" height="863" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:863,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:951001,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e85t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 424w, https://substackcdn.com/image/fetch/$s_!e85t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 848w, https://substackcdn.com/image/fetch/$s_!e85t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 1272w, https://substackcdn.com/image/fetch/$s_!e85t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f0bb518-f34e-4d1b-a801-5db5c9802d20_2260x1340.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.08660">using this link</a>.</em></p><div><hr></div><h3>6. Attend Before Attention</h3><p>This research paper introduces <strong>AutoGaze</strong>, an efficient method that lets multimodal LLMs (MLLMs) process videos by selectively focusing on important regions (like human gaze) before applying full attention, rather than analyzing every pixel equally.</p><p>Such gazing lets the model ignore unnecessary or static information, enabling faster, easier-to-scale video understanding while maintaining high performance. 
</p><p>Experiments show that this approach greatly increases efficiency (higher processing FPS) and allows for real-time video processing, offering a practical way to scale multimodal models to longer or streaming video inputs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nIM5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nIM5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 424w, https://substackcdn.com/image/fetch/$s_!nIM5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 848w, https://substackcdn.com/image/fetch/$s_!nIM5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 1272w, https://substackcdn.com/image/fetch/$s_!nIM5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nIM5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png" width="1456" height="649" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:725586,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nIM5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 424w, https://substackcdn.com/image/fetch/$s_!nIM5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 848w, https://substackcdn.com/image/fetch/$s_!nIM5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 1272w, https://substackcdn.com/image/fetch/$s_!nIM5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0e72cb-50c5-46c0-8781-dd0ff9d6e30e_2482x1106.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.12254">using this link</a>.</em></p><div><hr></div><h3>7. 
Geometry-Grounded Point Transformer</h3><p>Feed-forward networks for 3D reconstruction (i.e., predicting dense point maps directly from RGB images) suffer from geometric inconsistencies and achieve only limited fine-grained accuracy.</p><p>This research paper introduces the <strong>Geometry-Grounded Point Transformer (GGPT)</strong>, a method for improving 3D reconstruction from images by combining fast neural predictions with explicit geometric guidance.</p><p>It first uses an improved Structure-from-Motion pipeline to estimate accurate camera poses and partial 3D structure from sparse input views, and then uses a geometry-guided 3D point transformer to refine dense point maps under explicit partial-geometry supervision using an optimised guidance encoding.</p><p>Experiments show that GGPT produces reconstructions that are both geometrically consistent and spatially complete.</p><p>Although trained solely on <a href="https://arxiv.org/abs/2308.11417">ScanNet++</a> with VGGT predictions, it generalises across architectures and datasets, significantly outperforming SOTA feed-forward 3D reconstruction models in both in-domain and out-of-domain settings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Icv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Icv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 424w, 
https://substackcdn.com/image/fetch/$s_!1Icv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 848w, https://substackcdn.com/image/fetch/$s_!1Icv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!1Icv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Icv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png" width="1456" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3472856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!1Icv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 424w, https://substackcdn.com/image/fetch/$s_!1Icv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 848w, https://substackcdn.com/image/fetch/$s_!1Icv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!1Icv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a77289b-e628-4e9a-9ca6-0b5e9220d409_2646x1124.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.11174">using this link</a>.</em></p><div><hr></div><h3>8. The Attack and Defense Landscape of Agentic AI</h3><p>This research paper is a comprehensive survey of security risks in AI agent systems that examines how design choices, such as tool use, memory, and autonomy, create new attack surfaces.</p><p>It uses case studies to highlight existing gaps in securing agentic AI systems and identifies open challenges in this domain.</p><p>It also introduces the first systematic framework for understanding the security risks and defense strategies of AI agents, serving as a foundation for building secure agentic systems and advancing research.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fm0C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fm0C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 424w, https://substackcdn.com/image/fetch/$s_!fm0C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 848w, 
https://substackcdn.com/image/fetch/$s_!fm0C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!fm0C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fm0C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:467991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fm0C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 424w, 
https://substackcdn.com/image/fetch/$s_!fm0C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 848w, https://substackcdn.com/image/fetch/$s_!fm0C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!fm0C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b63e377-2642-4d14-b204-cb870631ca28_2144x1076.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.11088">using this link</a>.</em></p><div><hr></div><h3>9. IndexCache</h3><p>This research paper presents <strong>IndexCache</strong>, a method that improves sparse attention in LLMs by reusing attention indices across layers rather than recalculating them each time.</p><p>In a standard sparse attention mechanism, models repeatedly select the most relevant tokens at each layer, which is computationally expensive. This research shows that these selections are often similar across layers and can be cached and reused.</p><p>With this cross-layer index reuse, the approach reduces computation and latency while maintaining model performance, making long-context and agent-style workloads more efficient and scalable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3mMr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3mMr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 424w, https://substackcdn.com/image/fetch/$s_!3mMr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 848w, 
https://substackcdn.com/image/fetch/$s_!3mMr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 1272w, https://substackcdn.com/image/fetch/$s_!3mMr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3mMr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png" width="1456" height="519" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:519,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:222025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3mMr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 424w, https://substackcdn.com/image/fetch/$s_!3mMr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 
848w, https://substackcdn.com/image/fetch/$s_!3mMr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 1272w, https://substackcdn.com/image/fetch/$s_!3mMr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fce7334-3e23-4869-ac21-3b61c9bc3261_2666x950.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.12201">using this link</a>.</em></p><div><hr></div><h3>10. 
Neural Thickets</h3><p>This research paper suggests that many good task-specific expert solutions already lie close to the weights of large pre-trained LLMs, so uncovering them does not require extensive training or optimization.</p><p>Instead of using complex optimization methods, the authors propose a simple, fully parallel post-training approach: make small random changes to the model parameters, select the top-performing variants, and combine their predictions by majority voting.</p><p>This straightforward approach is surprisingly effective: for contemporary large-scale models, it works almost as well as standard post-training methods such as PPO, GRPO, and ES.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qr2d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qr2d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 424w, https://substackcdn.com/image/fetch/$s_!qr2d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 848w, https://substackcdn.com/image/fetch/$s_!qr2d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qr2d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qr2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png" width="1456" height="871" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:871,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:600797,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/191119957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qr2d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 424w, https://substackcdn.com/image/fetch/$s_!qr2d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 848w, 
https://substackcdn.com/image/fetch/$s_!qr2d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 1272w, https://substackcdn.com/image/fetch/$s_!qr2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9eda28d-95e6-42bd-9378-7a4282d29ae2_2120x1268.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Read more about this research <a href="https://arxiv.org/abs/2603.12228">using this link</a>.</em></p><div><hr></div><p>This 
article is entirely free to read. If you loved reading it, restack and share it with others to <strong>earn referral rewards.</strong> &#10084;&#65039;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/p/this-week-in-ai-research-8-14-march?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/p/this-week-in-ai-research-8-14-march?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><p>Join the <strong>paid tier today</strong> <strong>at a 25% discount</strong> to get access to all posts on this newsletter!</p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">Build and train an LLM from scratch</a></p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#128126; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#129412; <a href="https://www.intoai.pub/p/building-your-first-ai-agent">Building Your First AI Agent</a></p></li><li><p>&#127880; <a href="https://www.intoai.pub/p/distributed-data-parallel">Learn to train deep learning models on multiple GPUs</a></p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/741bb486&quot;,&quot;text&quot;:&quot;Grab your 25% discount today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.intoai.pub/741bb486"><span>Grab your 25% discount today</span></a></p><div><hr></div><p>You can also check out my books on <strong><a href="https://bamaniaashish.gumroad.com/">Gumroad</a></strong> and connect with me on <strong><a href="https://www.linkedin.com/in/ashishbamania/">LinkedIn</a></strong> to stay in touch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B_dw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" width="1344" height="256" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[🗓️ This Week In AI Research (1-7 March 26)]]></title><description><![CDATA[The top 10 AI research papers that you must know about this week.]]></description><link>https://www.intoai.pub/p/this-week-in-ai-research-1-7-march</link><guid 
isPermaLink="false">https://www.intoai.pub/p/this-week-in-ai-research-1-7-march</guid><dc:creator><![CDATA[Dr. Ashish Bamania]]></dc:creator><pubDate>Tue, 10 Mar 2026 12:38:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3tcx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3tcx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3tcx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 424w, https://substackcdn.com/image/fetch/$s_!3tcx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 848w, https://substackcdn.com/image/fetch/$s_!3tcx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 1272w, https://substackcdn.com/image/fetch/$s_!3tcx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3tcx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png" width="725.203125" height="322.25715788118134" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:647,&quot;width&quot;:1456,&quot;resizeWidth&quot;:725.203125,&quot;bytes&quot;:5006735,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3tcx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 424w, https://substackcdn.com/image/fetch/$s_!3tcx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 848w, https://substackcdn.com/image/fetch/$s_!3tcx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3tcx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c85f129-a65a-40b4-b3b9-54964749befd_6912x3072.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>1. 
Introducing GPT&#8209;5.4</h3><p>OpenAI released <strong>GPT-5.4</strong>, with improved reasoning, coding, and professional knowledge capabilities, enabling the development of more dependable AI agents.</p><ul><li><p>GPT-5.4 can perform hard tasks across documents, spreadsheets, presentations, and software tools, producing higher-quality outputs and helping teams complete workflows faster.</p></li><li><p>It is OpenAI's first general-purpose model with native, state-of-the-art computer-use capabilities.</p></li><li><p>It supports up to 1M tokens of context, allowing agents to plan, execute, and verify tasks across long horizons.</p></li></ul><p>It is available across ChatGPT, the OpenAI API, and Codex, with variants such as GPT-5.4 Thinking for deeper reasoning and GPT-5.4 Pro for more demanding workloads.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n0nP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n0nP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 424w, https://substackcdn.com/image/fetch/$s_!n0nP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 848w, https://substackcdn.com/image/fetch/$s_!n0nP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 1272w, 
https://substackcdn.com/image/fetch/$s_!n0nP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n0nP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png" width="1456" height="1046" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1046,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n0nP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 424w, https://substackcdn.com/image/fetch/$s_!n0nP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 848w, 
https://substackcdn.com/image/fetch/$s_!n0nP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!n0nP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca35322-cfe7-4688-b4cd-f5dda787f008_1598x1148.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Read more about this release <a href="https://openai.com/index/introducing-gpt-5-4/">using this link</a>.</p><div><hr></div><h3><strong>&#128680; A 
Special Offer</strong></h3><p><strong>Grab a 25% discount on the annual subscription</strong> to the newsletter and get access to articles like these, and so much more:</p><ul><li><p>&#127752; <a href="https://www.intoai.pub/p/build-and-train-an-llm-from-scratch">Build and train an LLM from scratch</a></p></li><li><p>&#127853; <a href="https://www.intoai.pub/p/build-a-mixture-of-experts-layer-from-scratch">Build a Mixture-of-Experts (MoE) LLM from Scratch</a></p></li><li><p>&#127851; <a href="https://www.intoai.pub/p/pytorch-essentials">20 PyTorch Concepts, Explained Simply</a></p></li><li><p><strong>&#127856;</strong> <a href="https://www.intoai.pub/p/hierarchical-reasoning-model">Hierarchical Reasoning Model: A Deep Dive</a></p></li><li><p>&#127846; <a href="https://www.intoai.pub/p/train-a-cnn-with-pytorch-ddp">Learn to train a deep learning model on multiple GPUs with Distributed PyTorch</a></p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/741bb486&quot;,&quot;text&quot;:&quot;Your 25% off&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/741bb486"><span>Your 25% off</span></a></p><div><hr></div><h3>2. <strong>FlashAttention-4</strong></h3><p>This research paper introduces <strong>FlashAttention-4</strong>, the latest version of FlashAttention, which makes transformer attention faster and more efficient on modern GPUs.</p><p>FlashAttention-3 optimized attention for Hopper GPUs, specifically targeting the H100 architecture. 
But the industry has moved to Blackwell-based systems such as the B200 and GB200, which come with fundamentally different performance characteristics.</p><p>FlashAttention-4 optimizes attention on Blackwell GPUs by:</p><ul><li><p>Redesigning pipelines to take advantage of fully asynchronous Matrix Multiply-Accumulate (MMA) operations and larger tile sizes</p></li><li><p>Using a software-emulated exponential and conditional softmax rescaling, which reduce the number of non-matrix-multiplication operations and improve overall speed</p></li><li><p>Making better use of GPU tensor memory and the 2-CTA MMA mode (which allows two Cooperative Thread Arrays to jointly execute a single MMA operation), which reduces shared-memory traffic and the atomic adds required during the backward pass</p></li></ul><p>FlashAttention-4 achieves up to:</p><ul><li><p>a 1.3&#215; speedup over cuDNN 9.13</p></li><li><p>a 2.7&#215; speedup over Triton on B200 GPUs with BF16, reaching up to 1613 TFLOPs/s (71% utilization)</p></li></ul><p>The researchers also implement FlashAttention-4 entirely in CuTe-DSL embedded in Python, resulting in 20-30&#215; faster compile times compared to traditional C++ template-based approaches while maintaining full expressivity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!72Nk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!72Nk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 424w, 
https://substackcdn.com/image/fetch/$s_!72Nk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 848w, https://substackcdn.com/image/fetch/$s_!72Nk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 1272w, https://substackcdn.com/image/fetch/$s_!72Nk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!72Nk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png" width="1456" height="897" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:897,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1636886,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!72Nk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 424w, https://substackcdn.com/image/fetch/$s_!72Nk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 848w, https://substackcdn.com/image/fetch/$s_!72Nk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 1272w, https://substackcdn.com/image/fetch/$s_!72Nk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6472f19d-3442-4286-89ab-a14c00ec5c28_2068x1274.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Read more about this research <a href="https://arxiv.org/abs/2603.05451">using this link</a>.</p><div><hr></div><h3><strong>3. Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery</strong></h3><p>This research paper describes a neuro-symbolic system that autonomously solved an open problem in theoretical physics. </p><p>This system combines <strong>Gemini Deep Think</strong> with a systematic Tree Search (TS) algorithm and an automated numerical feedback system to explore mathematical ideas and verify them computationally.</p><p>The system successfully derived new and exact analytical solutions for the power spectrum of gravitational radiation emitted by cosmic strings. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oe9Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 424w, https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 848w, https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 1272w, https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png" width="1456" height="677" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:677,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:254369,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 424w, https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 848w, https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 1272w, https://substackcdn.com/image/fetch/$s_!Oe9Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F941da312-3a4b-48c7-b9f8-73d0ff51b1f1_2296x1068.png 1456w" sizes="100vw" loading="lazy"></picture></div></a>
<figcaption class="image-caption">The research system identified 6 different analytical methods, the most elegant of which uses the <a href="https://en.wikipedia.org/wiki/Gegenbauer_polynomials">Gegenbauer expansion</a>.</figcaption></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.04735">using this link</a>.</p><div><hr></div><h3>4. 
<strong>The Spike, the Sparse, and the Sink</strong> </h3><p>This research paper studies two phenomena that usually co-occur and often involve the same tokens in Transformer-based LLMs:</p><ol><li><p><strong>Massive activations</strong>: where some tokens produce extremely large activations (&#8220;Spikes&#8221;)</p></li><li><p><strong>Attention sinks</strong>: where some tokens consistently attract attention disproportionate to their semantic importance (&#8220;Sinks&#8221;)</p></li></ol><p>Experiments show that:</p><ul><li><p>The co-occurrence of the two is an architectural artifact of modern Transformer design.</p></li><li><p>Massive activations work globally to create nearly constant hidden representations that persist across many layers, effectively acting like implicit parameters of the model.</p></li><li><p>Attention sinks work locally by influencing attention outputs across heads, pushing individual heads to focus more on short-range dependencies.</p></li><li><p><strong>Pre-norm</strong> is the main architectural choice that allows these two phenomena to appear together, and removing pre-norm causes them to decouple.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b1vm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b1vm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 424w, 
https://substackcdn.com/image/fetch/$s_!b1vm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 848w, https://substackcdn.com/image/fetch/$s_!b1vm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 1272w, https://substackcdn.com/image/fetch/$s_!b1vm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b1vm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png" width="1456" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:378103,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b1vm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 
424w, https://substackcdn.com/image/fetch/$s_!b1vm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 848w, https://substackcdn.com/image/fetch/$s_!b1vm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 1272w, https://substackcdn.com/image/fetch/$s_!b1vm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547f63e9-b6b7-4ebb-8943-bafb61f79797_2524x916.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Read more about this research <a href="https://arxiv.org/abs/2603.05498">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>5. Speculative Speculative Decoding</h3><p><a href="https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/">Speculative decoding</a> has become a standard way to accelerate autoregressive inference in LLMs.</p><p>It uses a fast draft model to propose the next several tokens, which the slower, larger target model then verifies in parallel with a single forward pass.</p><p>However, the method is still partly sequential because the draft model must wait for verification to finish before making the next speculation.</p><p>This research paper introduces <strong>Speculative Speculative Decoding (SSD)</strong>, which parallelizes the speculation and verification operations.</p><p>While verification is ongoing, the draft model predicts likely verification outcomes and prepares pre-emptive speculations for them. 
If the actual verification outcome is then in the predicted set, a speculation is returned immediately, removing drafting delays entirely.</p><p>This method is further improved, leading to <strong>Saguaro</strong>, an optimized SSD algorithm that is up to 5x faster than autoregressive decoding on open-source inference engines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_VBO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_VBO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 424w, https://substackcdn.com/image/fetch/$s_!_VBO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 848w, https://substackcdn.com/image/fetch/$s_!_VBO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!_VBO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_VBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png" width="1456" height="752" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:425451,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_VBO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 424w, https://substackcdn.com/image/fetch/$s_!_VBO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 848w, https://substackcdn.com/image/fetch/$s_!_VBO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!_VBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b9ca8e-0e0f-4bc9-a5e4-e72261568cae_2172x1122.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Read more about this research <a href="https://arxiv.org/abs/2603.03251">using this link</a>.</p><div><hr></div><h3>6. Olmo Hybrid</h3><p>Hybrid LLMs use an architecture that combines transformer attention and linear RNN layers.</p><p>Despite previous releases such as Nemotron-H, Qwen3-Next, Kimi Linear, and Qwen 3.5, it was unclear whether the purported benefits of hybrid architectures justified the cost of scaling them up.</p><p>Ai2 researchers address this question with the fully open-source&nbsp;<strong>Olmo Hybrid</strong>, a 7B-parameter model largely comparable to Olmo 3 7B, but with its sliding-window layers replaced by Gated DeltaNet layers.</p><p>Olmo Hybrid is trained on 49% fewer tokens and outperforms Olmo 3 across standard pre-training and mid-training evaluations. 
</p><p>This is empirical evidence that hybrid architectures not only reduce memory usage during inference but also yield more expressive models that scale better during pretraining.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ayym!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ayym!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 424w, https://substackcdn.com/image/fetch/$s_!Ayym!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 848w, https://substackcdn.com/image/fetch/$s_!Ayym!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 1272w, https://substackcdn.com/image/fetch/$s_!Ayym!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ayym!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png" width="930" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:930,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ayym!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 424w, https://substackcdn.com/image/fetch/$s_!Ayym!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 848w, https://substackcdn.com/image/fetch/$s_!Ayym!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 1272w, https://substackcdn.com/image/fetch/$s_!Ayym!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f23d489-135b-4a08-9187-1a409d7b7883_930x810.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Architecture comparison for Olmo 3 and Olmo Hybrid. Olmo Hybrid replaces the sliding-window attention layers from Olmo 3 with Gated DeltaNet.</figcaption></figure></div><p>Read more about this research <a href="https://allenai.org/papers/olmo-hybrid">using this link</a>.</p><div><hr></div><h3>7. <strong>Reasoning Models Struggle to Control their Chains of Thought</strong></h3><p>Chain-of-thought (CoT) monitoring is a promising method for understanding the intent of modern reasoning LLMs and detecting their misbehavior.</p><p>This research paper examines whether CoT monitoring is a reliable safety tool or whether reasoning models can control what they reveal in their CoT, an ability known as CoT controllability.</p><p>For this, the researchers introduce an evaluation suite called <strong>CoT-Control</strong>, which includes tasks that require reasoning models to solve problems while adhering to CoT instructions. 
(For example, reasoning about a genetics question without using the word &#8220;chromosome&#8221;.)</p><p>Experiments show that:</p><ul><li><p>Reasoning models have significantly lower CoT controllability than output controllability. For example, Claude Sonnet 4.5 controls its CoT only 2.7% of the time, compared with 61.9% of the time for its final output.</p></li><li><p>CoT controllability is higher for larger models and decreases with more RL training, more test-time compute, and greater problem difficulty.</p></li><li><p>CoT controllability failures extend even to situations in which models are given incentives (as opposed to direct requests) to evade CoT monitors.</p></li><li><p>Models exhibit slightly higher controllability when they are told they are being monitored.</p></li><li><p>Adversarially optimizing prompts to elicit controllability does not meaningfully increase it.</p></li></ul><p>The overall results indicate that CoT controllability is currently unlikely to be a failure mode of CoT monitorability, but frontier labs should keep tracking it in future models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Y9w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Y9w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 424w, https://substackcdn.com/image/fetch/$s_!6Y9w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 848w, 
https://substackcdn.com/image/fetch/$s_!6Y9w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 1272w, https://substackcdn.com/image/fetch/$s_!6Y9w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Y9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png" width="1456" height="1110" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1110,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:382348,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Y9w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 424w, 
https://substackcdn.com/image/fetch/$s_!6Y9w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 848w, https://substackcdn.com/image/fetch/$s_!6Y9w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 1272w, https://substackcdn.com/image/fetch/$s_!6Y9w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb139b2ba-577b-4232-b77f-ccafb57827f0_1778x1356.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.05706">using this link</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>8. Phi-4-reasoning-vision-15B</h3><p>This research paper from Microsoft introduces <strong>Phi-4-reasoning-vision-15B</strong>, a 15B-parameter multimodal reasoning model that can process both text and images.</p><p>The model combines the Phi-4-Reasoning backbone with a SigLIP-2 vision encoder in a mid-fusion architecture that converts images into visual tokens for reasoning.</p><p>It performs efficient multimodal reasoning across tasks such as math, science, OCR, document understanding, and GUI interaction. This is made possible by a high-resolution visual encoder and by training the model on a mix of reasoning and non-reasoning data with explicit mode tokens. This setup allows the model to provide fast, direct answers for simple tasks and chain-of-thought reasoning for more complex ones.</p><p>The paper shows that the greatest performance improvements come from systematic data filtering, error correction, and synthetic data augmentation. 
This tells us that data quality is the main factor behind how well the model performs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7YQ7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7YQ7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 424w, https://substackcdn.com/image/fetch/$s_!7YQ7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 848w, https://substackcdn.com/image/fetch/$s_!7YQ7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 1272w, https://substackcdn.com/image/fetch/$s_!7YQ7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7YQ7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png" width="1456" height="733" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:733,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:203648,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7YQ7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 424w, https://substackcdn.com/image/fetch/$s_!7YQ7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 848w, https://substackcdn.com/image/fetch/$s_!7YQ7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 1272w, https://substackcdn.com/image/fetch/$s_!7YQ7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb5d6233-1e86-4563-ab45-9490f73de576_1882x948.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.03975">using this link</a>.</p><div><hr></div><h3>9. 
Helios</h3><p>This research paper presents <strong>Helios</strong>, the first 14B autoregressive diffusion model that can create high-quality minute-scale videos in real time at about 19.5 FPS on a single NVIDIA H100 GPU.</p><p>Helios also supports text-to-video, image-to-video, and video-to-video generation within a single framework, and extensive experiments show that it consistently outperforms prior methods on both short and long-video generation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_BPh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_BPh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 424w, https://substackcdn.com/image/fetch/$s_!_BPh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 848w, https://substackcdn.com/image/fetch/$s_!_BPh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!_BPh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_BPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png" width="1456" height="1292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1292,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2911938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_BPh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 424w, https://substackcdn.com/image/fetch/$s_!_BPh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 848w, https://substackcdn.com/image/fetch/$s_!_BPh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!_BPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd1887-c82a-4798-a3ea-19adf1d20c5d_1458x1294.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.04379">using this link</a>.</p><div><hr></div><h3>10. Symbol-Equivariant Recurrent Reasoning Models</h3><p>This research paper introduces <strong>Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs)</strong>, a new addition to the family of Recurrent Reasoning Models (RRMs).</p><p>SE-RRMs extend Recurrent Reasoning Models by explicitly encoding symbol-permutation symmetry in the architecture using symbol-equivariant layers. 
</p><p>This change ensures that the model provides the same solution even when symbols, such as numbers or colors, are permuted. This reduces the need for extensive data augmentation and improves generalization.</p><p>Experiments show that SE-RRMs outperform previous RRM models on Sudoku and can adapt to different Sudoku grid sizes, which existing RRMs cannot.</p><p>SE-RRMs also achieve competitive performance on ARC-AGI-1 and ARC-AGI-2 with substantially less data augmentation and only 2M parameters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h2fn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h2fn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 424w, https://substackcdn.com/image/fetch/$s_!h2fn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 848w, https://substackcdn.com/image/fetch/$s_!h2fn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!h2fn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!h2fn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png" width="1456" height="1120" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1120,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:358084,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.intoai.pub/i/190199560?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h2fn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 424w, https://substackcdn.com/image/fetch/$s_!h2fn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 848w, https://substackcdn.com/image/fetch/$s_!h2fn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!h2fn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2e4f09f-f7e6-4a16-a6e4-b0587b2a6055_1550x1192.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Read more about this research <a href="https://arxiv.org/abs/2603.02193">using this link</a>.</p><p>Read more about the previous RRMs using the following links:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4fdbdc03-11cd-4479-ac84-1e47e20fed15&quot;,&quot;caption&quot;:&quot;&#127873; Become a paid subscriber to &#8216;Into AI&#8217; today at a special 25% discount on the annual subscription.&quot;,&quot;cta&quot;:&quot;Read full 
story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Hierarchical Reasoning Model: A Deep Dive&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-10T11:22:50.624Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!MqBc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa91bfc18-37e3-4d84-9ea2-faa611ac0336_1408x768.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/hierarchical-reasoning-model&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:170533295,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:22,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;43edb459-c031-4f4c-ac06-4c2bdd2c7ebe&quot;,&quot;caption&quot;:&quot;Into AI thrives thanks to the support of paid subscribers. 
If you want to access exclusive analysis, in-depth guides, and help this work continue, consider becoming a paid member today. Your support truly makes a difference!&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Tiny Recursive Model (TRM): A Deep Dive&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-16T10:16:57.478Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!nsQx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F990b949c-7da6-4f07-92f3-4204a9771c32_1600x968.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/tiny-recursive-model&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:176082755,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;e23e28ae-026d-41ee-bce3-00f18dd9ef21&quot;,&quot;caption&quot;:&quot;Recursive reasoning models are a promising architectural candidate that could lead us towards AGI.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A Deep Dive Into Universal Reasoning Models&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:155457308,&quot;name&quot;:&quot;Dr. Ashish Bamania&quot;,&quot;bio&quot;:&quot;I help you to level up in AI and Quantum Computing &#128104;&#127995;&#8205;&#128187;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1rS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff41b7f65-55d7-4099-969a-931c2ddd2f5f_612x612.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-31T11:01:52.669Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!iYMg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8918235d-f4f4-425b-8ea5-1bda5910d868_2718x1186.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.intoai.pub/p/universal-reasoning-model&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185674112,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1813260,&quot;publication_name&quot;:&quot;Into 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ea4T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ff07812-0dd7-482f-b6c1-12eee68c4f8c_1080x1080.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p>This article is entirely free to read. If you loved reading it, restack and share it with others to <strong>earn referral rewards.</strong> &#10084;&#65039;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/p/this-week-in-ai-research-1-7-march?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.intoai.pub/p/this-week-in-ai-research-1-7-march?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p><strong>Don&#8217;t forget to become a paid subscriber at a 25% discount to access every post in the newsletter.</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.intoai.pub/741bb486&quot;,&quot;text&quot;:&quot;Your 25% off&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.intoai.pub/741bb486"><span>Your 25% off</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B_dw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png" width="1344" height="256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!B_dw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 424w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 848w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1272w, https://substackcdn.com/image/fetch/$s_!B_dw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86094759-95d8-4755-b13e-b97b7235f04e_1344x256.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>]]></content:encoded></item></channel></rss>