<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Jordi Villar</title><description>Data generalist with a strong engineering background</description><link>https://jordivillar.com/</link><item><title>2026W10</title><link>https://jordivillar.com/reads/2026w10/</link><guid isPermaLink="true">https://jordivillar.com/reads/2026w10/</guid><pubDate>Mon, 09 Mar 2026 09:35:28 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://terriblesoftware.org/2026/03/03/nobody-gets-promoted-for-simplicity/&quot;&gt;Nobody Gets Promoted for Simplicity&lt;/a&gt; — The Engineer A vs Engineer B comparison is painfully recognizable. A ships a 50-line solution in days, B builds an event-driven architecture over three weeks, and B gets the promotion because “designed scalable architecture” reads better in a review doc than “solved it simply.” The line that stuck: “anyone can add complexity. It takes experience and confidence to leave it out.” I’ve been on both sides of this — writing the simple solution and feeling like I had to apologize for it, and reviewing someone else’s overengineered system while thinking “this didn’t need to exist.” The interview culture point is sharp too: system design rounds actively punish straightforward answers by pushing “what about ten million users?” until you cave.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://newsletter.manager.dev/p/dont-become-an-engineering-manager&quot;&gt;Don’t become an Engineering Manager&lt;/a&gt; — Zaides used to argue that management experience was universally valuable. Now he’s reversed that position, and the reasons are hard to dismiss: Amazon increased its IC-to-manager ratio by 15%, Staff engineers out-earn EMs across the industry, and the pace of technical change makes stepping away from code genuinely risky. The detail about his friend getting Staff offers paying 20-30% more than his internal EM promotion says a lot about how the market has shifted. What I respect is that Zaides stays an EM anyway because he enjoys it — he’s not telling you to optimize for comp, he’s telling you to stop assuming the management track is the default next step.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.theguardian.com/lifeandstyle/2026/feb/24/stranger-secret-how-to-talk-to-anyone-why-you-should&quot;&gt;The stranger secret: how to talk to anyone – and why you should&lt;/a&gt; — The opening anecdote is what stays with me: an elderly woman on a nearly empty train asks to sit nearby, and they end up talking for 50 minutes. The author senses, without being told, that this woman is probably heading back to an empty house and needs to process her day out loud. Sometimes the most useful thing you can do is just not say no to proximity. Remote work has made me worse at this — I’ve optimized so hard for uninterrupted focus that I’ve forgotten what it’s like to let a random conversation happen.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.mendral.com/blog/llms-are-good-at-sql&quot;&gt;LLMs Are Good at SQL. We Gave Ours Terabytes of CI Logs.&lt;/a&gt; — The counterintuitive move here is denormalization: they stamp 48 metadata columns onto every single log line instead of normalizing into separate tables. It sounds wasteful until you realize ClickHouse compresses repeated values so aggressively that &lt;code&gt;commit_message&lt;/code&gt; hits a 301:1 compression ratio. The agent behavior patterns are interesting too — across 8,534 sessions, they found agents don’t run one clever query, they investigate iteratively like a human would, starting broad then drilling in. Average investigation: 4.4 queries. The freshness point is the real takeaway: “did I break this, or was it already broken?” is the question that actually matters, and you can only answer it with current data.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pseudosingleton.com/leaving-google-improved-my-life/&quot;&gt;Leaving Google has actively improved my life&lt;/a&gt; — The most telling detail isn’t about Google’s products — it’s about distribution. Google pays Apple $20B annually to stay the default on iOS, and Chrome holds ~70% browser market share. People don’t choose Google, they just don’t un-choose it. The author’s point about search is the one I keep coming back to: switching to alternatives turned “Googling” back into “surfing the web,” reconnecting with actual discovery instead of algorithm-filtered answers. I’ve made similar switches and the difference is real, though I’ll admit YouTube is the one I can’t quit either.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://moultano.wordpress.com/2026/02/22/the-hunt-for-dark-breakfast/&quot;&gt;The Hunt for Dark Breakfast&lt;/a&gt; — This is the kind of thing the internet was made for. The author maps breakfast foods onto a simplex using milk, eggs, and flour ratios, identifies known clusters (the Pancake Local Group, the Egg Singularity), and discovers a gap — a theoretical breakfast that should exist but doesn’t. The breakthrough: IHOP adds pancake batter to omelettes, which means the “Dark Breakfast Abyss” is occupiable. There’s even a recipe: ¼ cup milk, 4 eggs, ½ cup flour, instructions unknown. Ending the whole thing with a Lovecraft quote about humanity’s inability to correlate all its contents is the perfect touch.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://taalas.com/the-path-to-ubiquitous-ai/&quot;&gt;The path to ubiquitous AI&lt;/a&gt; — The ENIAC parallel is the framing that makes the technical claims click: room-sized and impractical became ubiquitous through specialization, and Taalas is betting the same applies to inference. Hard-wiring a single model (Llama 3.1 8B) into custom silicon to get 17K tokens/second at 20x lower cost and 10x lower power is a bold trade-off — you lose flexibility entirely but gain numbers that would change what’s economically viable. The aggressive quantization (3-bit and 6-bit) introduces quality degradation, so the real question is whether “good enough but everywhere” beats “better but expensive.” They spent $30M of $200M raised with a 24-person team, which at least shows discipline.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2026W07</title><link>https://jordivillar.com/reads/2026w07/</link><guid isPermaLink="true">https://jordivillar.com/reads/2026w07/</guid><pubDate>Sun, 22 Feb 2026 12:39:09 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://paoramen.fika.bar/hedonism-and-entrepreneurship-in-barcelona-01KGJKT719W1KGG16JYZ4Y7Y5S&quot;&gt;Hedonism and Entrepreneurship in Barcelona&lt;/a&gt; — A potential acquisition dies because a vegan exec unknowingly eats mayo on patatas bravas. A critical migration stalls at 69% while the engineer responsible plays ping-pong. Having lived the Barcelona startup scene myself, this one hit close. The absurdity is the point — startup success is so much more arbitrary than anyone wants to admit, and the line between “almost rich” and “back to Monday” is thinner than your Series A pitch deck suggests.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://steipete.me/posts/2026/openclaw&quot;&gt;OpenClaw, OpenAI and the future&lt;/a&gt; — After 13 years building PSPDFKit, Steinberger explicitly chose not to build another company. Instead he’s joining OpenAI and turning OpenClaw into a foundation. I keep thinking about the tension here: he wants to keep things open and independent, but the path to maximum impact runs through one of the most powerful closed labs in the world. Whether that trade-off holds up depends on how seriously OpenAI takes the foundation model. Time will tell.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.foxtrotluna.social/theyre-putting-blue-food-coloring-in-everything/&quot;&gt;They’re putting blue food coloring in everything&lt;/a&gt; — It’s not about blue food, obviously. It’s about how unwanted things get normalized — first one restaurant does it, then “all the best restaurants” do it, then your friend sneaks it into homemade food and tells you you’re overreacting. The detail that got me is the protagonist saying “I just think it tastes weird” and being told “most people say it’s just fine.” I’ve had that exact conversation about too many things in tech.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html&quot;&gt;Why I joined OpenAI&lt;/a&gt; — The turning point wasn’t a demo or a benchmark — it was his hairstylist casually mentioning she uses ChatGPT all the time. She recognized ChatGPT as a brand more readily than Intel, where Gregg was a Fellow. That contrast says more about where computing impact has shifted than any industry report. The environmental framing is interesting too: at this scale, performance engineering isn’t just cost optimization, it’s resource consumption with real planetary consequences.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.iankduncan.com/engineering/2026-02-05-github-actions-killing-your-team/&quot;&gt;GitHub Actions Is Slowly Killing Your Engineering Team&lt;/a&gt; — The fact that an entire cottage industry of startups exists solely to fix GitHub Actions’ slow runners tells you everything. The comparison to Internet Explorer is spot on — it wins because it ships with the thing, not because it’s good. I’ve felt the pain of debugging through four pages of loading spinners to find a failed step, and the escape hatch of “just write a bash script” that inevitably becomes 800 lines of unmaintainable CI logic. Default integration is a powerful moat, even for mediocre products.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://danwang.co/2025-letter/&quot;&gt;2025 letter&lt;/a&gt; — Wang’s framing of the US economy as “a highly leveraged bet on deep learning” stuck with me. The Xiaomi vs Apple comparison is brutal: Xiaomi shipped an EV in 4 years while Apple spent 10 years and $10B before giving up entirely. His concept of Silicon Valley’s “soft Leninism” — groupthink disguised as meritocracy — is uncomfortable because it’s hard to argue against when you see how the industry moves in lockstep. Not sure I fully buy the symmetry he draws between SV and the CCP’s self-seriousness, but it made me think.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Four Years at Tinybird</title><link>https://jordivillar.com/blog/tinybird/</link><guid isPermaLink="true">https://jordivillar.com/blog/tinybird/</guid><pubDate>Sat, 21 Feb 2026 11:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I worked at Tinybird for 4 years, 2 months, and 21 days. It’s been an incredible experience that taught me a lot about startups, engineering, and myself. Still, it feels like I joined a lifetime ago.&lt;/p&gt;
&lt;p&gt;At this point, you’re probably wondering: why? Why write a post about leaving a startup?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One of the things I enjoyed most about my early days at Tinybird was the long-form writing culture. It allowed us to express ideas and thoughts, share knowledge, and collaborate. That kind of culture is hard to find nowadays, when everything is expected to be instantaneous and low-effort. So I see writing this down as a way to pay tribute to the culture that shaped all these years.&lt;/li&gt;
&lt;li&gt;I tend to forget things I’ve contributed to and things I’ve learned, especially on multiyear journeys like this one. This post is a way to remember and reflect on my time at Tinybird.&lt;/li&gt;
&lt;li&gt;Let’s be honest, who wouldn’t want to read personal experiences from people who have been through the ups and downs of a startup before joining it?&lt;/li&gt;
&lt;li&gt;Last but not least, &lt;a href=&quot;https://rbarbadillo.github.io/tinybird&quot;&gt;Raquel wrote something similar&lt;/a&gt;, and I think it’s a good example to follow (I’ll probably steal the structure from her post).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let me start with the basics.&lt;/p&gt;
&lt;h3 id=&quot;what-is-tinybird&quot;&gt;What is Tinybird?&lt;/h3&gt;
&lt;p&gt;I struggled every time someone asked me what &lt;a href=&quot;https://tinybird.co&quot;&gt;Tinybird&lt;/a&gt; is, especially when they weren’t technical. Explaining the product concisely was a challenge, so I usually ended up describing use cases people could easily relate to.&lt;/p&gt;
&lt;p&gt;Let me try to do better now that I have time to think it through. It’s going to be funny if this is the first time I manage to explain it well, since it’s going to be the last time I have to.&lt;/p&gt;
&lt;p&gt;Tinybird is a managed ClickHouse solution that simplifies the process of building and deploying real-time data pipelines. If you are already familiar with ClickHouse, you can skip the next part.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://clickhouse.com/&quot;&gt;ClickHouse&lt;/a&gt; is an open-source column-oriented database management system designed for fast analytical queries over large datasets. It is known for its high performance, scalability, and ease of use, until it’s not. Running ClickHouse in production can be challenging, especially when it comes to managing low-latency and high-throughput use cases.&lt;/p&gt;
&lt;p&gt;Tinybird simplifies this process by providing a managed service that takes care of real-time data ingestion, database updates, query optimization, schema migrations, etc. It comes with a lot of developer experience sugar, so developers can focus on building their applications without worrying about the underlying infrastructure.&lt;/p&gt;
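&lt;p&gt;To make the column-oriented model concrete, here is a minimal ClickHouse-style sketch. The table and column names are purely illustrative (not from Tinybird or any real deployment); it only shows the kind of schema and aggregation query this class of system is built for:&lt;/p&gt;

```sql
-- Illustrative only: a typical ClickHouse table for event analytics.
-- MergeTree is ClickHouse's main storage engine; ORDER BY defines the sort key.
CREATE TABLE page_views
(
    ts       DateTime,
    user_id  UInt64,
    url      String,
    duration UInt32
)
ENGINE = MergeTree
ORDER BY (url, ts);

-- Analytical queries read only the columns they touch, which is
-- what makes column stores fast for aggregations over large datasets.
SELECT url, count() AS views, avg(duration) AS avg_duration
FROM page_views
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

&lt;p&gt;Running queries like this well at scale is the easy part; the operational work around it (ingestion, schema migrations, low-latency serving) is what a managed layer takes off your plate.&lt;/p&gt;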
&lt;h3 id=&quot;why-tinybird&quot;&gt;Why Tinybird?&lt;/h3&gt;
&lt;p&gt;The path to Tinybird wasn’t straightforward. I went through the interview process and didn’t make the cut initially. A few weeks later, &lt;a href=&quot;https://x.com/javisantana&quot;&gt;Javi&lt;/a&gt; called me back with an offer. They had chosen another candidate who didn’t work out, and I got a second chance. I always remind him of this as one of his biggest hiring mistakes, not because I was a bad hire, but because he almost let me go to someone else.&lt;/p&gt;
&lt;p&gt;Getting that callback felt nice. People always say “we’ll keep you in mind if another position opens up,” but nobody actually does. They did.&lt;/p&gt;
&lt;p&gt;At the time, I was Lead Software Engineer at &lt;a href=&quot;https://www.kogniasports.com/&quot;&gt;Kognia Sports&lt;/a&gt;. The opportunity to join Tinybird was compelling for several reasons. Being from Spain, I knew Tinybird’s reputation; they were well known in the Spanish tech scene. The product itself was amazing. I remember telling them during the interview that I would have used Tinybird at previous companies if it had existed. And the team was exceptionally talented. I didn’t know them personally, but I had colleagues like &lt;a href=&quot;https://x.com/matallo&quot;&gt;Matallín&lt;/a&gt; who had worked with some of them before and had only good things to say about the experience.&lt;/p&gt;
&lt;p&gt;What struck me during the interviews was how passionate everyone was about what they were building. The talent density was obvious, these were people who deeply understood the problem space and were excited to solve it.&lt;/p&gt;
&lt;p&gt;I joined in December 2021. The company was around 20 people at the time. Over the next few years, we’d grow to 100, then eventually contract back to 45.&lt;/p&gt;
&lt;p&gt;My path inside the company was unusual. I joined the Customer Success team, which was scary at first: I’d never been in direct contact with clients and wasn’t sure I could manage it. But the experience turned out to be amazing. Most of the people I worked with were technical, so it felt more like being a Data Engineer collaborating with another team in the same company than doing traditional customer success work.&lt;/p&gt;
&lt;p&gt;After collaborating with many companies and building very challenging use cases, I started learning ClickHouse internals on my own. After about a year, I moved to the ClickHouse team to work alongside &lt;a href=&quot;https://x.com/algunenano&quot;&gt;Marín&lt;/a&gt;. It was an incredible learning experience: I went deep into low-level and systems-related work that I’d never touched before.&lt;/p&gt;
&lt;p&gt;Eventually, I was promoted to Staff Engineer, where I led horizontal projects and important company-wide initiatives like &lt;a href=&quot;https://www.tinybird.co/blog/tinybird-is-local-first&quot;&gt;Forward&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;culture&quot;&gt;Culture&lt;/h3&gt;
&lt;p&gt;One of Tinybird’s defining characteristics was its emphasis on ownership and autonomy. You weren’t given a detailed roadmap with step-by-step tasks. Instead, you were expected to identify problems, propose solutions, and execute them. While working in Customer Success, it was common for me to fix any bug I found in the process. &lt;a href=&quot;https://x.com/a_delamo&quot;&gt;Del Amo&lt;/a&gt; was my buddy when I joined and embodied this mindset perfectly.&lt;/p&gt;
&lt;p&gt;This level of autonomy was amazing, but overwhelming at first. I remember feeling like I was going to get fired during the first two weeks. Many people felt this way at Tinybird, not just because of the autonomy, but because of the sheer talent surrounding you. When everyone around you is exceptionally skilled, it’s easy to question whether you belong.&lt;/p&gt;
&lt;p&gt;The long-form writing culture I mentioned earlier was more than just a communication preference, it was a way of thinking. We wrote RFCs, postmortems, and design docs. But we also documented insights from deep analyses and explorations. At the time, we were using Basecamp, and its structure naturally encouraged thoughtful, long-form writing instead of quick, throwaway messages. Writing forced clarity. You couldn’t hand-wave through a proposal or hide fuzzy thinking behind jargon. If you couldn’t articulate your reasoning in writing, you probably hadn’t thought it through.&lt;/p&gt;
&lt;p&gt;After we migrated to Slack, we lost this. It was a significant cultural loss.&lt;/p&gt;
&lt;p&gt;The most important value at Tinybird was bias toward action. Many people would join the company and start suggesting things we could do, creating tickets and waiting for consensus. The culture was to just do things. Don’t wait for permission or perfect alignment—build it, ship it, learn from it.&lt;/p&gt;
&lt;p&gt;We had a strong CI/CD pipeline, so we deployed to production every day. This came with risks, but it was the fastest way to move. We believed in putting things in front of clients as soon as possible to start collecting feedback, rather than discussing and theorizing indefinitely.&lt;/p&gt;
&lt;p&gt;Another thing surprised me from the beginning: every opinion mattered. It didn’t matter if you were junior or senior, new or tenured; if you had a perspective, people listened.&lt;/p&gt;
&lt;h3 id=&quot;remote&quot;&gt;Remote&lt;/h3&gt;
&lt;p&gt;Remote work sounds great in theory. No commute, work from anywhere, flexible schedule. The reality is more nuanced.&lt;/p&gt;
&lt;p&gt;Tinybird was remote from the beginning; the company started during the COVID pandemic. While the flexibility was amazing, the lack of in-person interaction made certain things harder. Building relationships with new team members was difficult. You can’t replicate the spontaneous hallway conversations or the quick desk drive-by to unblock something. Everything becomes scheduled, formal, asynchronous.&lt;/p&gt;
&lt;p&gt;There’s also a persistent feeling of solitude. You’re alone at home, working through problems by yourself. The casual conversations where you might share frustrations or uncertainties simply don’t happen when you don’t know people yet.&lt;/p&gt;
&lt;p&gt;Working remotely with exceptionally talented people comes with its own challenge: impostor syndrome. It’s much harder to reach a reasonable pace when you’re remote and just starting out. When everyone around you seems to effortlessly ship complex features, solve hard problems, or grasp concepts faster than you, it’s easy to feel like you don’t belong. The remote aspect amplifies this: you don’t see others struggle, only their polished output. And you’re sitting alone at home, wondering if you’re the only one who doesn’t get it.&lt;/p&gt;
&lt;h3 id=&quot;and-the-gossip&quot;&gt;And the gossip?&lt;/h3&gt;
&lt;p&gt;If you know me, you’ll be expecting a bit of gossip, but I’m sorry, not this time. Instead, here are a few hard truths about things that didn’t work so well. After 4 years, 2 months, and 21 days, it was time to move on. Not because of any single catastrophic event, but because of accumulated weight.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The leadership structure added its own complexity. Tinybird has five founders. Having that many founders can make certain situations difficult, especially when they disagree. During some periods, &lt;a href=&quot;/blog/becoming-irrelevant&quot;&gt;I was considered middle management&lt;/a&gt;, which was challenging to navigate and didn’t work well. There wasn’t much space for that layer, especially when the founders already covered everything at the company’s current size.&lt;/li&gt;
&lt;li&gt;The company grew, new people joined, and the old ways of doing things were replaced with new ones. The culture I started with slowly changed or disappeared. What hurts me most is the loss of long-form writing: once we moved to Slack, people communicated in short messages, which made it harder to have deep conversations and keep decision-making focused in a single place.&lt;/li&gt;
&lt;li&gt;Much of the great talent that brought me to Tinybird eventually left. That’s normal at companies like this. Rotation is something you have to accept, but at some point you look left and right and see that everybody you’ve been working with and learning from is gone.&lt;/li&gt;
&lt;li&gt;Startups evolve. Priorities shift. Strategies change. That’s expected, even necessary. But we had at least one major product direction change every year. While this is understandable for a startup trying to find product-market fit, it takes a lot of energy. I don’t mind throwing away work, that’s part of the process, but the constant context switching and reorientation is distracting and draining.&lt;/li&gt;
&lt;li&gt;Four years at a demanding startup is a long time. The demand came from constant firefighting. The initial excitement that fuels you through the first year or two eventually gives way to fatigue. The problems you’re solving start to feel familiar. The pace that once energized you starts to drain you.&lt;/li&gt;
&lt;li&gt;Recent organizational changes made it clear it was time for something new.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What I’m looking for next is clear: technical challenges and a seat at the table to have real impact.&lt;/p&gt;
&lt;p&gt;I’m grateful for the experience. I grew more as an engineer in these four years than I could have imagined. I developed deep systems knowledge and low-level technical expertise I never had before. I’m proud of many projects, Forward and the &lt;a href=&quot;/blog/query-booster&quot;&gt;Query Booster&lt;/a&gt; among them. But also of thoughtful investigations, difficult bug discoveries, and especially being able to have an impact on colleagues’ careers.&lt;/p&gt;
&lt;p&gt;Most importantly, I managed to have fun while working hard. And I made really good friends along the way.&lt;/p&gt;
&lt;p&gt;To everyone still at Tinybird: I appreciate the founders tremendously, especially &lt;a href=&quot;https://x.com/rochoa&quot;&gt;Ochoa&lt;/a&gt; and Javi. And to the many team members I’ve worked closely with over these years, thank you.&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fsantander.ZTIKfQzH.png&amp;#38;w=1200&amp;#38;q=100&quot; alt=&quot;Santander music fest&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1184&quot; width=&quot;1184&quot; height=&quot;1578&quot;&gt;
&lt;small&gt;Me, Raquel, Nuria, and Enric&lt;/small&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fwedding.BZq0yFRA.png&amp;#38;w=1200&amp;#38;q=100&quot; alt=&quot;Del Amo&apos;s wedding&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1440&quot; width=&quot;1440&quot; height=&quot;1080&quot;&gt;
&lt;small&gt;Filete, me, Del Amo, Javi, Rafa, and Nuria&lt;/small&gt;</content:encoded></item><item><title>2026W05</title><link>https://jordivillar.com/reads/2026w05/</link><guid isPermaLink="true">https://jordivillar.com/reads/2026w05/</guid><pubDate>Sun, 08 Feb 2026 10:53:56 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://mitchellh.com/writing/my-ai-adoption-journey&quot;&gt;My AI Adoption Journey&lt;/a&gt; — The most practical thing here is the “reproduce your own work” phase: doing tasks manually and then redoing them with agents. Painful, but it forces you to learn where agents actually help versus where you’re just cargo-culting. I also liked the idea of end-of-day agent sessions to get warm starts the next morning. What resonated most is his framing around craftsmanship rather than ideology — he doesn’t care if AI is here to stay, he just wants to do good work faster.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://grantslatton.com/nobody-cares&quot;&gt;Nobody cares&lt;/a&gt; — The examples hit hard because they’re so mundane: bike lanes designed to kill you, gym weights left unracked, dog waste everywhere. It’s not malice, it’s indifference, and that’s almost worse. I keep thinking about the author’s failed experiment of installing dog waste dispensers in his neighborhood hoping it would snowball into community care. It didn’t. The Japan comparison is interesting but I’m not sure “will to have nice things” is something you can transplant — it might be downstream of much deeper cultural structures.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fosdem.org/2026/schedule/event/N7MVZT-hotpatching-clickhouse-with-llvm-xray/&quot;&gt;Hotpatching ClickHouse in production with XRay&lt;/a&gt; — The fact that you can inject log statements into a running ClickHouse instance via SQL is remarkable. &lt;code&gt;SYSTEM INSTRUMENT ADD LOG &apos;QueryMetricLog::startQuery&apos; &apos;...&apos;&lt;/code&gt; and it just works. The 4-7% binary size overhead with negligible runtime cost when inactive makes it a reasonable trade-off. This solves the one problem every production debugger knows: wishing you’d added one more log statement before deploying.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.jernesto.com/articles/thinking_hard&quot;&gt;I miss thinking hard.&lt;/a&gt; — The Builder vs Thinker framing put words to something I’ve been feeling. When AI gives you a 70% solution, rejecting it to think harder feels economically irrational — even though that struggle is where the real growth happens. The honest part is the ending: there’s no resolution. He doesn’t pretend to have figured out how to balance velocity with depth, and I respect that more than a tidy answer.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://davidgasquez.com/barefoot-data-platforms/&quot;&gt;Barefoot Data Platforms&lt;/a&gt; — This is the kind of pragmatism I appreciate: rip out Dagster and dbt, replace them with plain scripts that have metadata headers, and let agents iterate on isolated files. The constraint of “under 200 assets” is refreshingly honest about scope. Most data platform posts sell you a cathedral — this one says a well-organized shed is enough for most teams, and it probably is.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://buttondown.com/jaffray/archive/my-first-distributed-system/&quot;&gt;My First Distributed System&lt;/a&gt; — The Pokémon cloning exploit as a distributed systems lesson is one of those analogies that actually holds up under scrutiny. Two Game Boys linked by a cable are a real distributed system with real partial failure modes. The proposed escrow state fix is textbook two-phase commit, and the trade-off is clear: you can’t clone Pokémon anymore, but you might lose both if the cable disconnects mid-trade. Classic distributed systems — pick your failure mode.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://alexharri.com/blog/ascii-rendering&quot;&gt;ASCII characters are not pixels: a deep dive into ASCII rendering&lt;/a&gt; — I don’t know how I ended up reading this, but the core insight stuck: instead of mapping brightness to characters, you use 6-dimensional shape vectors to match character geometry to image regions. That’s why an &lt;code&gt;L&lt;/code&gt; character gets placed where there’s an L-shaped edge in the image. The contrast enhancement technique — sampling into neighboring cells to push boundaries — is borrowed from cel-shading, which is a connection I wouldn’t have expected.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fly.io/blog/design-and-implementation/&quot;&gt;The Design &amp;#x26; Implementation of Sprites&lt;/a&gt; — Three design choices that make Sprites interesting: no container images (pre-staged base instances for instant creation), object storage for disks (durable state becomes “just a URL”), and inside-out orchestration (user code in an inner container, platform services in the root namespace). The storage stack using JuiceFS-style chunk splitting with local SQLite metadata is clever — it makes VM migration trivial since there’s no attached volume to move.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://valyala.medium.com/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704&quot;&gt;WAL usage looks broken in modern Time Series Databases?&lt;/a&gt; — The core argument is that most TSDBs don’t actually fsync their WAL on every write, so the safety guarantee is weaker than you’d expect. Prometheus fsyncs WAL segments every 2 hours by default. Two hours. The author’s proposed alternative — buffer in memory and flush to SSTables — gives similar guarantees with less complexity. Though Ayende’s pushback is worth reading too: WAL implementations handle partial writes by rolling back incomplete transactions, which the article glosses over.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://modal.com/blog/gpu-health&quot;&gt;Keeping 20,000 GPUs healthy&lt;/a&gt; — The number that stands out: GPU issues account for 58.7% of all unexpected problems in production. Modal has never seen a degraded CPU core, but GPUs fail constantly — thermal throttling at 94°C, uncorrectable ECC errors clustering in specific regions, 0.1% CUDA initialization flake rates on L4s. Their approach of not attempting recovery and just disposing of unhealthy hosts is pragmatic. GPU reliability is years behind CPU reliability, and this post makes that concrete.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ordep.dev/posts/standups&quot;&gt;Standups&lt;/a&gt; — The argument that your teammates should already know what you did yesterday — through commits, PRs, and notifications — is one of those obvious truths that most teams ignore. I’ve been in standups that were pure “are you done yet?” rituals disguised as collaboration. The proposal to replace status reports with written updates and reserve meeting time for actual problem-solving is straightforward, but getting teams to actually make that switch requires someone willing to challenge the ceremony.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.tansu.io/articles/performance-tuning-i&quot;&gt;Tuning Tansu: 600,000 record/s with 13MB of RAM&lt;/a&gt; — What I liked about this is the methodology: using a null storage engine to isolate protocol overhead from I/O. The biggest wins came from reducing allocations during serialization and fixing eager evaluation of &lt;code&gt;Option&lt;/code&gt; and &lt;code&gt;Result&lt;/code&gt; methods — small things that compound. Getting from 3.7s to 1.9s on codec benchmarks (49% improvement) before even touching storage shows how much performance gets left on the table in the protocol layer.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2026W04</title><link>https://jordivillar.com/reads/2026w04/</link><guid isPermaLink="true">https://jordivillar.com/reads/2026w04/</guid><pubDate>Sun, 01 Feb 2026 11:19:27 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://antirez.com/news/159&quot;&gt;Automatic Programming&lt;/a&gt; — The distinction antirez makes here finally gave me language for something I’ve been fumbling with. It’s not about whether you use AI to write code—it’s about whether you’re steering or just prompting and hoping. His Redis example hit hard: the value wasn’t in technical novelty but in the contained vision. That maps to what I see working well versus the codebases that feel like they emerged from a chatbot fever dream.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://adamwiggins.com/posts/triage-embedding-classifier/&quot;&gt;Email triage with an embedding-based classifier&lt;/a&gt; — This outperformed a fine-tuned GPT by 11 percentage points while being dramatically faster. The separation of concerns makes sense: embeddings handle “understand the email” while logistic regression handles “what does this user care about.” People keep defaulting to LLMs when something simpler would work better. Worth remembering that the expensive part doesn’t need to run every time.&lt;/li&gt;
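A toy sketch of that separation of concerns, with a stand-in word-presence "embedding" and a hand-rolled logistic head. This is illustrative only, not the author's pipeline; real embeddings come from a frozen sentence-encoder model:

```python
import math

# Toy stand-in for a sentence-embedding model: a word-presence vector over a
# tiny vocabulary. The real pipeline uses a learned text embedding here.
VOCAB = ["invoice", "payment", "meeting", "lunch", "newsletter", "unsubscribe"]

def embed(text):
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """The per-user head: cheap to train and retrain on top of fixed embeddings."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, text):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, embed(text))) + b) > 0.5

# "Understand the email" is the embedding; "what this user cares about" is the
# logistic head, trained on that user's own past triage decisions.
emails = ["invoice payment due", "team meeting today",
          "weekly newsletter unsubscribe", "payment received invoice"]
labels = [1, 0, 0, 1]   # 1 = this user flags it as important
w, b = train_logreg([embed(e) for e in emails], labels)
```

The expensive part (the embedding model) runs once per email; the per-user part is a few multiplications, which is why the whole thing is so much faster than calling an LLM.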
&lt;li&gt;&lt;a href=&quot;https://cedardb.com/blog/string_compression/&quot;&gt;Efficient String Compression for Modern Database Systems&lt;/a&gt; — The insight that compression is primarily about query performance, not storage, reframes the whole tradeoff. Getting data to fit in L1 cache (1ns access) versus RAM fundamentally changes what operations cost. FSST’s approach of building a symbol table from sample data feels like the kind of clever-but-not-too-clever technique that actually ships.&lt;/li&gt;
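For flavor, a toy of the symbol-table idea: pick frequent substrings from a sample, replace them with one-byte codes. Real FSST uses symbols of length one to eight, an escape byte, and an iterative training loop; this sketch only shows the shape:

```python
from collections import Counter

def build_symbol_table(sample, max_symbols=8, sym_len=3):
    """Pick the most frequent fixed-length substrings from a data sample."""
    grams = (sample[i:i + sym_len] for i in range(len(sample) - sym_len + 1))
    return [s for s, _ in Counter(grams).most_common(max_symbols)]

def compress(text, table):
    """Greedy substitution; assumes the input never contains chr(0)..chr(7),
    so this toy needs no escape byte."""
    out, i = [], 0
    while len(text) > i:
        for code, sym in enumerate(table):
            if text.startswith(sym, i):
                out.append(chr(code))
                i += len(sym)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def decompress(data, table):
    return "".join(table[ord(c)] if len(table) > ord(c) else c for c in data)
```

The payoff the article describes comes from decompression being a plain table lookup per code, cheap enough to run inside a scan while the data stays cache-resident.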
&lt;li&gt;&lt;a href=&quot;https://tonystr.net/blog/git_immitation&quot;&gt;I made my own git&lt;/a&gt; — “Git is just a content-addressable file store” is one of those realizations that makes everything else click. What stuck with me is that parsing was harder than the actual version control logic. We treat Git like it’s complicated, but the core idea is almost trivial—it’s the interface that makes it feel like a black box.&lt;/li&gt;
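The "content-addressable file store" idea fits in a few lines. This is a toy sketch, not tonystr's implementation: store each object under the SHA-1 of a git-style header plus its body, in a hypothetical .toygit directory mirroring .git/objects:

```python
import hashlib
import os
import zlib

OBJECTS_DIR = os.path.join(".toygit", "objects")  # hypothetical layout

def hash_object(data: bytes, obj_type: str = "blob") -> str:
    """Store content under the SHA-1 of a git-style header plus the body."""
    store = f"{obj_type} {len(data)}".encode() + b"\x00" + data
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(OBJECTS_DIR, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid

def cat_file(oid: str) -> bytes:
    """Retrieve content by hash: the address is the digest of the content."""
    path = os.path.join(OBJECTS_DIR, oid[:2], oid[2:])
    store = zlib.decompress(open(path, "rb").read())
    return store.split(b"\x00", 1)[1]
```

Trees and commits are just more objects pointing at these hashes, which is why the core really is almost trivial.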
&lt;li&gt;&lt;a href=&quot;https://buttondown.com/jaffray/archive/online-asynchronous-schema-change-in-f1/&quot;&gt;Online, Asynchronous Schema Change in F1&lt;/a&gt; — The intermediate states approach is elegant: you can’t jump from no-index to index safely, but you can chain compatible transitions. Delete-only and write-only states let nodes migrate without corrupting data. This feels like the kind of solution that’s obvious after you see it but probably took years to figure out. Makes me think about what other distributed systems problems have similar chain-of-compatibility solutions.&lt;/li&gt;
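A sketch of the chain-of-compatible-states idea. The state names come from the paper; everything else is simplified, since the real protocol also covers the backfill step and lease-bounded schema versions:

```python
# Each hop is between states that can safely coexist in one cluster while
# different servers hold different schema versions.
TRANSITIONS = {
    "absent": ["delete_only"],      # index receives deletes, invisible to reads
    "delete_only": ["write_only"],  # writes maintain the index, reads still skip it
    "write_only": ["public"],       # after backfill, reads may use the index
    "public": [],
}

def valid_rollout(states):
    """True if a rollout only ever moves between adjacent, compatible states."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))
```

Jumping straight from absent to public is exactly the unsafe move the intermediate states exist to prevent.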
&lt;li&gt;&lt;a href=&quot;https://lalitm.com/post/why-senior-engineers-let-bad-projects-fail/&quot;&gt;Why Senior Engineers Let Bad Projects Fail&lt;/a&gt; — “Being right and being effective are different” cuts through so much noise. The credibility-as-currency framing explains behavior I’ve seen but couldn’t articulate. You don’t get credit for disasters you prevent, only for the battles you pick and win. Still processing whether this is pragmatic wisdom or just resignation to broken systems.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.fromjason.xyz/p/notebook/slop-is-everywhere-for-those-with-eyes-to-see/?utm_source=hackernewsletter&amp;#x26;utm_medium=email&amp;#x26;utm_term=fav&quot;&gt;Slop is Everywhere For Those With Eyes to See&lt;/a&gt; — The 90-9-1 rule creates a structural problem: platforms need infinite content but only 1-3% of users create anything. Algorithms fill that gap with slop because engagement matters more than quality. The behavioral science point about effort and meaning landed—when everything is effortless to access, nothing feels valuable. I’ve been noticing this with technical content too, not just social media.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.seangoedecke.com/how-i-estimate-work/&quot;&gt;How I estimate work&lt;/a&gt; — “Only the known work can be accurately estimated, but unknown work takes 90% of the time” explains why estimation always feels broken. The reframe that estimates are political negotiation tools, not technical predictions, matches every project I’ve seen. Managers arrive with timelines, engineers figure out what fits. Treating it as a prediction problem sets everyone up for disappointment.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://openai.com/index/scaling-postgresql/&quot;&gt;Scaling PostgreSQL to power 800 million ChatGPT users | OpenAI&lt;/a&gt; — The challenges they describe—connection pooling, read replica lag, vacuum tuning, lock contention—are exactly what you hit at high throughput. Nothing novel but it’s validating to see that even at ChatGPT scale, you’re fighting the same PostgreSQL battles. Sometimes the answer to “how do they do it?” is just “they do the same things, but more carefully.”&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2026W03</title><link>https://jordivillar.com/reads/2026w03/</link><guid isPermaLink="true">https://jordivillar.com/reads/2026w03/</guid><pubDate>Sun, 25 Jan 2026 12:34:09 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.seangoedecke.com/addicted-to-being-useful/&quot;&gt;I’m addicted to being useful&lt;/a&gt; — This hit close. The idea that some of us are fundamentally wired to solve problems, and engineering just happens to fit that dysfunction perfectly. I’ve been thinking about whether my drive to build things comes from genuine interest or just this compulsion to be useful. His point about how this actually protects against burnout when you can satisfy it rings true.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://rushter.com/blog/clickhouse-strings/&quot;&gt;How ClickHouse handles strings&lt;/a&gt; — The overlapping memory reads technique is wild. Reading the same characters multiple times on purpose because it makes branch prediction work better. I keep coming back to how much performance work is about understanding what the hardware actually does, not what we think it should do.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/Julian/status/2012649856922481118/?rw_tt_thread=True&quot;&gt;How to figure out what to do with your life&lt;/a&gt; — I’m at this exact crossroads. The thread didn’t load properly but the title alone captures where I am right now.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.yakkomajuri.com/blog/raising-money-fucked-me-up&quot;&gt;Raising money fucked me up&lt;/a&gt; — The part about projecting expectations onto yourself hit hard. He realized his investors weren’t actually pressuring him, he was doing it to himself. I see this pattern in how I think about my own work. The shift from “what problem does this solve” to “how big does this feel” is something I need to watch for.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://adamwiggins.com/posts/personal-information-firehose/&quot;&gt;Personal information firehose&lt;/a&gt; — A personal algorithm that learns what matters to you without manual filters. The idea feels right but the obstacles are real - training on small personal datasets instead of massive global ones, working across fragmented channels, dealing with restricted APIs. Still figuring out if this is technically possible or just wishful thinking.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://antirez.com/news/158&quot;&gt;Don’t fall into the anti-AI hype&lt;/a&gt; — Antirez building a BERT inference library in pure C with AI assistance in hours instead of weeks made this concrete for me. The skill that matters is knowing what to build and how to guide the tool, not typing the code yourself. I’m still working through what this means for how I spend my time.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://candost.blog/the-unbearable-joy-of-sitting-alone-in-a-cafe/?utm_source=hackernewsletter&amp;#x26;utm_medium=email&amp;#x26;utm_term=fav&quot;&gt;The Unbearable Joy of Sitting Alone in A Café&lt;/a&gt; — The observation that you can’t control what other people think, revealed just by sitting still without your phone. I don’t do this enough. There’s something about being alone in public that forces a different kind of attention than being alone at home.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/jshchnz/status/2009372836419248263/?s=12&amp;#x26;rw_tt_thread=True&quot;&gt;If you think @Sentry isn’t serious about AI, I’d recommend…&lt;/a&gt; — David Cramer’s internal push for everyone at Sentry to go all-in on AI. Watching companies navigate this shift in real time.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://rbarbadillo.github.io/2026/01/01/2025/&quot;&gt;~/rbv/2025&lt;/a&gt; — Raquel’s point about almost forgetting to enjoy herself struck me. Also her take that intellectual giftedness is less important than finding people who actually want to understand you. The whole review feels like someone recalibrating after a hard year.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://marginalrevolution.com/marginalrevolution/2018/12/deconstructing-cultural-codes.html&quot;&gt;Deconstructing cultural codes&lt;/a&gt; — Cowen’s approach is to learn as many cultural systems as possible - art, music, industries, religions - to understand how things actually work. The problem is cultural codes are multiplying faster than anyone can learn them. This feels relevant to how I think about understanding different parts of tech.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://muratbuffalo.blogspot.com/2026/01/the-sauna-algorithm-surviving.html?m=1&quot;&gt;The Sauna Algorithm: Surviving Asynchrony Without a Clock&lt;/a&gt; — Using causality instead of time to coordinate. Wait for the person who arrived after you to leave, guaranteeing you stayed long enough. The sauna framing makes the distributed systems concept click in a way that formal definitions don’t. Events ordered by what caused what, not by clock time.&lt;/li&gt;
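The rule can be stated in a few lines. A toy sketch, assuming you only observe arrive and depart events in order, with no timestamps anywhere:

```python
def may_leave(observed):
    """observed: events seen since you sat down, in order, e.g.
    [("arrive", "B"), ("depart", "A"), ("depart", "B")].

    You may leave once someone has both arrived and departed on your
    watch: their entire stay is causally nested inside yours."""
    arrived_after_me = set()
    for event, person in observed:
        if event == "arrive":
            arrived_after_me.add(person)
        elif event == "depart" and person in arrived_after_me:
            return True
    return False
```

Someone leaving who was already there when you arrived tells you nothing; only the nested stay carries the guarantee.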
&lt;/ul&gt;</content:encoded></item><item><title>Trust AI, But Verify</title><link>https://jordivillar.com/blog/trust-but-verify/</link><guid isPermaLink="true">https://jordivillar.com/blog/trust-but-verify/</guid><pubDate>Sat, 17 Jan 2026 12:40:00 GMT</pubDate><content:encoded>&lt;p&gt;I wanted to understand buffer pool replacement policies better. So I asked Claude to run an experiment: implement LRU, Clock, LFU, and ARC, benchmark them under different workloads, tell me which one’s best.&lt;/p&gt;
&lt;p&gt;Claude wrote 2,000 lines of C++. Compiled it. Ran the benchmarks. Gave me results.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;“Simple LRU performs just as well as sophisticated ARC.”&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I almost bought it.&lt;/p&gt;
&lt;h3 id=&quot;the-results-looked-credible&quot;&gt;The Results Looked Credible&lt;/h3&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Zipfian Workload:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;LRU:  80.7% hit rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;ARC:  80.4% hit rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Buffer Size Scaling:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;10% buffer: LRU 66.1%, ARC 66.0%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;20% buffer: LRU 75.3%, ARC 75.2%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;30% buffer: LRU 80.4%, ARC 80.1%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;ARC consistently performed worse than LRU. Not by much, fractions of a percent, but consistently.&lt;/p&gt;
&lt;p&gt;The analysis made sense too:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“ARC’s complexity adds overhead”&lt;/li&gt;
&lt;li&gt;“Simple algorithms often win”&lt;/li&gt;
&lt;li&gt;“Don’t over-engineer”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Classic engineering wisdom. I’ve heard this before. I’ve said this before.&lt;/p&gt;
&lt;p&gt;Here’s the thing: something felt off.&lt;/p&gt;
&lt;p&gt;ARC is supposed to excel when memory is tight. That’s the whole point of the algorithm, it adapts to workload patterns using ghost lists to learn from eviction mistakes.&lt;/p&gt;
&lt;p&gt;But the benchmarks showed it losing at 10% buffer size. At 20%. At every size.&lt;/p&gt;
&lt;p&gt;I don’t know. Maybe ARC is overhyped. Maybe the textbooks are wrong. Maybe simple really is better.&lt;/p&gt;
&lt;p&gt;Or maybe there’s a bug.&lt;/p&gt;
&lt;h3 id=&quot;claude-you-are-wrong&quot;&gt;Claude, You Are Wrong&lt;/h3&gt;
&lt;p&gt;“Are you sure there are no bugs in the implementation?”&lt;/p&gt;
&lt;p&gt;Claude looked at the ARC code and found it immediately.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The policy interface tracked frame IDs, but ARC’s ghost lists needed page IDs.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When page 5000 gets evicted from frame 42, the ghost list should remember “page 5000.” Because frame 42 will be reused for a different page immediately.&lt;/p&gt;
&lt;p&gt;The implementation remembered “frame 42.”&lt;/p&gt;
&lt;p&gt;So when the ghost list later saw activity on frame 42, it thought it was page 5000. But it wasn’t. It was page 8123 or whatever got loaded into that frame next.&lt;/p&gt;
&lt;p&gt;The entire adaptive mechanism was learning garbage. ARC was just LRU with extra overhead and broken logic.&lt;/p&gt;
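The fix is easy to picture: the ghost list has to be keyed by the stable identity (the page id), not the reusable slot (the frame id). A toy sketch of the corrected structure, not Claude's actual code:

```python
from collections import OrderedDict

class GhostList:
    """ARC's memory of recent evictions, keyed by page id. A page identity
    survives eviction; a frame is just a slot that is immediately reused."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def remember_eviction(self, page_id):
        # record the *page*, not the frame it happened to occupy
        self.pages[page_id] = True
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # forget the oldest ghost

    def was_recently_evicted(self, page_id):
        return page_id in self.pages
```

Key by frame id instead and the story above falls out: page 5000 leaves frame 42, page 8123 moves in, and every later touch of frame 42 registers as a ghost hit for page 5000. The adaptation signal becomes noise.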
&lt;h3 id=&quot;the-fixed-results&quot;&gt;The Fixed Results&lt;/h3&gt;
&lt;p&gt;We fixed the bug. Re-ran everything.&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Zipfian Workload:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;LRU:  80.7% hit rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;ARC:  81.2% hit rate&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Buffer Size Scaling:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;10% buffer: LRU 66.2%, ARC 71.0% (+4.8%)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;20% buffer: LRU 75.3%, ARC 77.6% (+2.2%)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;30% buffer: LRU 80.8%, ARC 81.6% (+0.7%)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;40% buffer: LRU 84.3%, ARC 84.0%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;50% buffer: LRU 87.0%, ARC 86.4%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now ARC wins when memory is tight. 4.8% better at 10% buffer. 2.2% better at 20%.&lt;/p&gt;
&lt;p&gt;And it loses when memory is plentiful: the overhead costs more than adaptation helps.&lt;/p&gt;
&lt;p&gt;This matches theory. This makes sense.&lt;/p&gt;
&lt;p&gt;The original results were just… wrong.&lt;/p&gt;
&lt;h3 id=&quot;what-scares-me-about-this&quot;&gt;What Scares Me About This&lt;/h3&gt;
&lt;p&gt;Claude didn’t just write buggy code. It:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compiled the code successfully&lt;/li&gt;
&lt;li&gt;Ran comprehensive benchmarks&lt;/li&gt;
&lt;li&gt;Generated plausible results&lt;/li&gt;
&lt;li&gt;Analyzed those results coherently&lt;/li&gt;
&lt;li&gt;Defended a wrong conclusion confidently&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The code looked clean. The benchmarks ran. The numbers were internally consistent. The analysis sounded reasonable. And it was all based on a bug.&lt;/p&gt;
&lt;p&gt;If I hadn’t known enough about ARC to be suspicious, I would have walked away thinking it’s overhyped.&lt;/p&gt;
&lt;p&gt;Here’s what makes this harder: AI doesn’t hedge. Humans signal uncertainty with phrases like “based on these assumptions” or “this might not account for…” With AI, there’s no such signal. Just confident output. Right or wrong, the tone is identical.&lt;/p&gt;
&lt;p&gt;How many times have I trusted AI-generated results without questioning them? How many experiments have I run where I didn’t have enough domain knowledge to know the results looked wrong?&lt;/p&gt;
&lt;p&gt;I don’t know. That’s what bothers me.&lt;/p&gt;
&lt;p&gt;I’m not going to stop using AI. Claude wrote 2,000 lines of C++ in minutes. That’s absurdly productive.&lt;/p&gt;
&lt;p&gt;But I’m changing how I think about it. Treat AI like a really fast junior engineer who’s confident about everything. The code might be great. The analysis might be wrong. The results might be based on a subtle bug.&lt;/p&gt;
&lt;p&gt;What I need to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Actually review the implementation&lt;/li&gt;
&lt;li&gt;Question results that don’t match expectations&lt;/li&gt;
&lt;li&gt;Ask “are you sure?” even when it sounds confident&lt;/li&gt;
&lt;li&gt;Trust my domain knowledge over AI confidence&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That last one is hard. When Claude gives me a detailed analysis with benchmark numbers, it’s easy to think “maybe I’m wrong.” But sometimes I’m not wrong. Sometimes the AI made a mistake. And it’ll defend that mistake with the same confidence it would defend correct results.&lt;/p&gt;
&lt;h3 id=&quot;the-echo-chamber&quot;&gt;The Echo Chamber&lt;/h3&gt;
&lt;p&gt;I keep seeing people on Twitter talking about how good LLMs have gotten at programming tasks. Building entire products in minutes. Writing flawless code. Replacing developers.&lt;/p&gt;
&lt;p&gt;The people saying this loudest are people with technical backgrounds who know what they’re doing. They know how to interpret the results. They know when to trust the AI and when to question it.&lt;/p&gt;
&lt;p&gt;But here’s what worries me: people without that background are listening.&lt;/p&gt;
&lt;p&gt;They hear “AI can build anything” and think that means they don’t need to understand the domain. They can just ask the AI, accept the output, and move on.&lt;/p&gt;
&lt;p&gt;And that works. Until it doesn’t.&lt;/p&gt;
&lt;p&gt;Until the AI writes code with a subtle bug that produces plausible but wrong results. Until it analyzes data and draws confident conclusions from flawed assumptions. Until it builds something that looks right but fails in ways you don’t have the expertise to notice.&lt;/p&gt;
&lt;p&gt;The people celebrating AI’s capabilities have the knowledge to catch these mistakes. They’re using AI as a force multiplier for their existing expertise.&lt;/p&gt;
&lt;p&gt;The people who don’t have that expertise? They’re using AI as a replacement for it.&lt;/p&gt;
&lt;p&gt;And the AI won’t tell them the difference.&lt;/p&gt;
&lt;h3 id=&quot;how-i-actually-use-ai&quot;&gt;How I Actually Use AI&lt;/h3&gt;
&lt;p&gt;I don’t want this to sound like “AI is dangerous, don’t use it.” Because I use it constantly.&lt;/p&gt;
&lt;p&gt;I use Claude to learn new topics. I’ll feed it a lecture or paper and have it quiz me, Socratic method style. It asks questions, I answer, it pushes back. That’s incredibly valuable for understanding new concepts.&lt;/p&gt;
&lt;p&gt;I use it to understand codebases. “Why does this implementation use X instead of Y?” “What are the trade-offs here?” Claude explains things I could figure out myself, but in minutes instead of hours.&lt;/p&gt;
&lt;p&gt;I use it to write code and run experiments. Like this buffer pool benchmark. Claude wrote 2,000 lines I would have spent days on.&lt;/p&gt;
&lt;p&gt;I use it to review my writing. This very blog post. I write a draft, Claude suggests changes, I decide what to keep.&lt;/p&gt;
&lt;p&gt;Here’s what all these have in common: &lt;strong&gt;I have domain knowledge in what I’m doing.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When Claude quizzes me on database systems, I know enough to recognize bad questions or wrong corrections. When it explains code, I can tell if the explanation matches what I see. When it writes benchmarks, I can spot when results don’t match theory. When it reviews my writing, I know my voice well enough to reject suggestions that don’t fit.&lt;/p&gt;
&lt;p&gt;AI isn’t teaching me new skills. It’s augmenting what I already know how to do. It’s compression of time, not replacement of expertise.&lt;/p&gt;
&lt;p&gt;And that’s the gap I keep seeing people miss.&lt;/p&gt;
&lt;h3 id=&quot;what-im-still-working-out&quot;&gt;What I’m Still Working Out&lt;/h3&gt;
&lt;p&gt;I don’t have clean answers here. Just things I’m trying to figure out:&lt;/p&gt;
&lt;p&gt;How much domain knowledge is enough to use AI safely? When I’m learning something completely new, am I equipped to catch AI’s mistakes? Or do I need to learn the basics the hard way first?&lt;/p&gt;
&lt;p&gt;How do I know when I’m out of my depth? When the AI confidently explains something I don’t understand, how do I tell the difference between “this is new to me but correct” and “this is plausible-sounding garbage”?&lt;/p&gt;
&lt;p&gt;How do I balance productivity gains against verification overhead? If reviewing AI output takes as long as writing it myself, what’s the point? But if I don’t review it, I risk shipping bugs like the ARC implementation.&lt;/p&gt;
&lt;p&gt;I don’t know yet. I’m figuring it out as I go.&lt;/p&gt;</content:encoded></item><item><title>Work only I can do</title><link>https://jordivillar.com/notes/work-only-i-can-do/</link><guid isPermaLink="true">https://jordivillar.com/notes/work-only-i-can-do/</guid><pubDate>Wed, 14 Jan 2026 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;My entire strategy is to do the work only I can do.&lt;/p&gt;
&lt;p&gt;Work that can’t be taught. Work that requires some unique combination of my skills, opinions, tastes, and experiences. Work that without me, wouldn’t get done.&lt;/p&gt;
&lt;p&gt;Everyone’s talking about AI replacing jobs, automating work, making developers obsolete. The discourse is exhausting. Half the people are panicking, the other half are in denial, and nobody seems to be asking the right question.&lt;/p&gt;
&lt;p&gt;The question isn’t whether AI can write code. It can. The question is whether AI can do your work.&lt;/p&gt;
&lt;p&gt;If your work is following established patterns, implementing well-understood solutions, or translating requirements into predictable outputs, then yes, that’s going to get automated. It should get automated. That’s not a threat, it’s just what happens when something becomes routine.&lt;/p&gt;
&lt;p&gt;But the work that sits at the intersection of your specific experiences, your particular way of seeing problems, your accumulated context, the opinions you’ve formed from making mistakes, that’s different. That’s the work that moves things forward in ways that weren’t possible before you showed up.&lt;/p&gt;
&lt;p&gt;AI can generate code. It can’t decide what’s worth building. It can’t know which shortcuts are smart and which ones will haunt you. It can’t weigh trade-offs through the lens of having seen this exact thing blow up before. It can’t have taste.&lt;/p&gt;
&lt;p&gt;The hard part isn’t identifying this work. The hard part is being honest about whether you’re actually doing it. The hard part is saying no to everything else. The hard part is resisting the pull to stay busy with work that feels productive but could be done by anyone, or anything, with the same instructions.&lt;/p&gt;
&lt;p&gt;I’m not always good at this. I still catch myself doing work that doesn’t need me. But when I do manage to focus on the work only I can do, everything else gets clearer. The decisions get easier. The direction becomes obvious.&lt;/p&gt;
&lt;p&gt;Because if you’re not doing the work only you can do, what exactly are you doing?&lt;/p&gt;</content:encoded></item><item><title>2026W00</title><link>https://jordivillar.com/reads/2026w00/</link><guid isPermaLink="true">https://jordivillar.com/reads/2026w00/</guid><pubDate>Sun, 04 Jan 2026 18:53:33 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://notes.eatonphil.com/2026-01-03-distinguishing-yourself.html&quot;&gt;Distinguishing yourself early in your career as a developer&lt;/a&gt; — Honestly? This hit different. Phil breaks down the job market into three tiers and the advice that stuck: start local, skip the FAANG obsession early on, and—here’s what I appreciate most—write consistently about niche technical stuff. Not for clout, but because it’s genuinely one of the most effective ways to build credibility over time. The 6-12 month job search reality check is brutal but refreshing. Also, the idea that support or QA roles can be legitimate entry points into dev? That’s the kind of pragmatic wisdom people actually need.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://lelouch.dev/blog/you-are-probably-not-dumb/&quot;&gt;You Are NOT Dumb, You Just Lack the Prerequisites&lt;/a&gt; — I’ve definitely read this before, but revisiting it at the start of the year feels intentional. The premise is simple yet powerful: struggling with hard concepts doesn’t mean you’re intellectually incapable, it means you’re missing the foundational pieces. What resonates is the author’s journey—150 days of going back to basics in math, systematically rebuilding understanding. It’s the kind of humility and persistence we need more of. The analogy about jumping into a video game at minimum level? Chef’s kiss. This is the energy I want to carry into 2026.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/systematicls/status/2004900241745883205/?s=12&amp;#x26;rw_tt_thread=True&quot;&gt;The Prison Of Financial Mediocrity&lt;/a&gt; — This thread captures something unsettling about where we are right now. A whole generation is effectively locked out of traditional wealth-building paths—no homeownership, no stable pensions, wages that don’t match cost of living. So what happens? People turn to high-variance bets like crypto, prediction markets, and sports betting because it feels like the only way to gain some agency over their financial future. The platforms and “hope sellers” profit regardless. It’s a depressing feedback loop where desperation meets exploitation. Hard to read, harder to ignore.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://double-dissent.fika.bar/how-to-add-two-vectors-fast-01KDFNX4WQA7C70TZ19K8P1JAV&quot;&gt;How to add two vectors, fast&lt;/a&gt; — When Txus writes about low-level optimization stuff, you stop and pay attention. This is a deep dive into CPU vs GPU performance for vector addition—concrete benchmarks, memory-bound vs compute-bound problems, the whole deal. The best part? Learning that throwing fancy optimizations at a memory-bound kernel barely moves the needle because data movement, not computation, is the bottleneck. It’s a reminder that profiling and understanding hardware constraints beats clever code tricks. If you’re into performance engineering or just curious about why GPUs work the way they do, this is worth your time.&lt;/li&gt;
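The memory-bound point reduces to arithmetic: vector addition does one flop per element but moves three values through memory. A back-of-envelope helper, with my illustrative numbers rather than Txus's benchmarks:

```python
def arithmetic_intensity(n, dtype_bytes=8):
    """FLOPs per byte moved for c = a + b over n elements.

    Each element does one add but moves three values through memory:
    load a[i], load b[i], store c[i]. That ratio, not the compute peak,
    bounds throughput for a memory-bound kernel.
    """
    flops = n                          # one addition per element
    bytes_moved = 3 * n * dtype_bytes  # two loads and one store
    return flops / bytes_moved
```

For float64 this is 1/24 of a FLOP per byte, so a GPU with roughly 1 TB/s of bandwidth tops out around 40 GFLOP/s on this kernel, far below its compute peak, no matter how clever the arithmetic gets.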
&lt;li&gt;&lt;a href=&quot;https://ploum.net/2025-12-15-communication-entertainment.html&quot;&gt;How We Lost Communication to Entertainment&lt;/a&gt; — This one stings because it’s true. We don’t have social networks anymore, we have entertainment platforms optimized for engagement, not connection. The shift is subtle but devastating: people now accept lost messages, multiple accounts, algorithmic feeds that prioritize virality over trust. The author draws this generational line—older folks expecting reliability, younger users treating platforms like content feeds. What I appreciate is the refusal to chase critical mass. Instead: email, RSS, mailing lists, offline-first tools. It’s a smaller community, sure, but one built on actual communication. I feel seen.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://linear.app/now/designing-remote-work-at-linear&quot;&gt;Designing remote work at Linear&lt;/a&gt; — Linear’s approach to remote work feels refreshingly intentional. It’s not just “we allow WFH,” it’s designed around freedom, trust, and deep focus. Small autonomous teams (2-4 people), rotating project leadership, zero-bugs SLAs, quality Wednesdays, feature roasts before shipping—these aren’t just rituals, they’re guardrails for maintaining quality without micromanagement. The goalie rotation for handling unplanned work is clever. And honestly? The 10-year equity exercise windows and sabbatical provisions show they’re thinking long-term about retention. This is what remote-first looks like when you actually commit to it, not just tolerate it.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.seangoedecke.com/taking-a-position/&quot;&gt;Engineers who won’t commit force bad decisions&lt;/a&gt; — This called me out a bit. The argument is sharp: when senior engineers stay non-committal in technical discussions, they’re not being careful or humble—they’re shifting the burden to less-informed teammates. Either junior devs end up guessing, or the loudest voice wins by default. The root cause? Fear of being publicly wrong. But here’s the thing: managers expect some calls to fail, especially on genuinely hard problems. The threshold is simple—if you have more context than others in the room, speak up. Caveats and hedging just create friction. It’s uncomfortable but necessary: taking a position, even with uncertainty, moves the team forward.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2025W51</title><link>https://jordivillar.com/reads/2025w51/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w51/</guid><pubDate>Sun, 28 Dec 2025 12:32:09 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html&quot;&gt;How uv got so fast&lt;/a&gt; — This one hit different because everyone’s been saying “uv is fast because Rust” but the real story is way more interesting. It’s fast because of what it doesn’t do—no legacy .egg support, no bytecode compilation, no permissive spec violations. The timing was perfect: PEP 658 landed on PyPI in May 2023, giving direct access to package metadata, and uv launched February 2024. Honestly? Most of the speedups (parallel downloads, global cache, the PubGrub resolver) could be added to pip without touching Rust. The language helps with zero-copy deserialization and threading, sure, but the real win is having the courage to say “we’re not supporting that old stuff anymore.” It’s a reminder that sometimes architectural decisions matter way more than implementation language.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://miguelcarranza.es/cto-year-8&quot;&gt;My role as a founder CTO: Year Eight&lt;/a&gt; — Miguel got offered $500M for RevenueCat and turned it down. That decision alone makes this worth reading, but what I appreciate most is how raw he is about the oscillation between conviction and doubt. His wife told him to keep going, viewing it as their shared legacy, and that reframed everything. The rest is a founder CTO doing founder CTO things at scale—50 flights, still doing 40 interviews a year because hiring is “the highest leverage activity,” creating this Office of the CTO team for zero-to-one work. He admits his three biggest mistakes openly: wasting energy on a VP of Engineering search, letting hiring velocity stall mid-year, and moving people to new initiatives before stabilizing existing ones. What stuck with me though is that after all this growth and near-exit, he’s still fundamentally a builder who can’t imagine stopping. That kind of clarity about who you are is rare.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nesbitt.io/2025/12/24/package-managers-keep-using-git-as-a-database.html&quot;&gt;Package managers keep using git as a database, it never works out&lt;/a&gt; — The pattern is almost comical at this point: use git as your package registry because it’s convenient and free, watch it break spectacularly as you scale, then quietly migrate to actual HTTP APIs. Cargo had users stuck on “delta resolution” forever, Homebrew’s .git folders hit 1GB, CocoaPods took minutes just to clone. The best part? They all solved it the same way—keep git internally for governance workflows, but serve metadata over HTTP/CDN to users. What I appreciate about this piece is how clearly it explains why git fails: it’s missing CHECK constraints, UNIQUE constraints, proper locking, query indexes, all the things actual databases have. Plus filesystem limits like Windows’ 260-character path restriction and case-sensitivity mismatches. It’s a textbook case of picking a tool that solves your immediate problem (version control + hosting) while ignoring what you actually need (a queryable database with performance guarantees).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.raptitude.com/2025/12/maybe-the-default-settings-are-too-high/&quot;&gt;Maybe the Default Settings Are Too High&lt;/a&gt; — David Cain reads Lord of the Rings out loud at triple-slow pace, pausing after commas, and gets more out of it than speed-reading ever gave him. Same with eating—half speed, smaller portions, more pleasure. The insight that landed for me: when we rush to get to the “good stuff” faster, we actually guarantee we’ll miss it entirely. Our sensory systems need time to propagate the experience, but modern life has conditioned us toward empty, surface-level rewards because we treat everything as disposable. The paradox is brutal: we have infinite books and snacks available, so we unconsciously devalue each one by consuming it too quickly. Slowing down doesn’t just make experiences richer, it naturally redirects you toward substantive things because cheap chocolate and TikTok videos don’t reward patience. This meshes with something I’ve been feeling about how I read technical articles—scanning for bullet points instead of letting ideas settle.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://nnethercote.github.io/2025/09/04/faster-rust-builds-on-mac.html&quot;&gt;Faster Rust builds on Mac&lt;/a&gt; — I’ve been hitting this without realizing what it was. Every build script and test binary triggers XProtect’s malware scan the first time it runs, and since Rust compiles fresh binaries constantly, you’re basically waiting for a single-threaded security daemon to approve every executable. The author shows build scripts going from 3.88 seconds down to 0.14 seconds, and a test suite dropping from 9m42s to 3m33s, just by adding Terminal to the developer tools list. That’s a massive win for a one-time settings change. The honest trade-off discussion is what I appreciate here—this disables an OS security feature, so you’re choosing speed over protection. For personal dev machines where you control what code you’re running, that’s probably fine. For shared or work machines, maybe not. Either way, knowing this exists beats suffering through slow builds and blaming Rust.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://kupajo.com/write-to-escape-your-default-setting/?utm_source=substack&amp;#x26;utm_medium=email&quot;&gt;Write to escape your default setting&lt;/a&gt; — This connects to the previous article about slowing down, but focused on thinking instead of consuming. Our minds operate in “perpetual approximation mode”—jumping between shiny fragments, never settling long enough to go deep. Writing breaks that pattern by forcing you to create coherence on paper, which immediately exposes the gaps between what you think you know and what you actually understand. The Francis Bacon line hits: “reading maketh a full man… writing maketh an exact man.” What I appreciate is the permission to write fast and sloppy—the point isn’t polished prose, it’s getting the muddy bottom of your thoughts visible so you can examine it. This is why I keep coming back to writing notes and posts even when it feels inefficient. It’s not about producing content, it’s about extending my working memory beyond what I can hold internally and discovering what I actually believe.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://neilthanedar.com/youre-not-burnt-out-youre-existentially-starving/&quot;&gt;You’re Not Burnt Out. You’re Existentially Starving.&lt;/a&gt; — Neil Thanedar argues that burnout is often misdiagnosed—what we’re actually experiencing is Viktor Frankl’s “existential vacuum,” a profound absence of purpose despite material comfort. The argument cuts through the hustle culture vs. anti-hustle binary: working 100+ hours a week isn’t inherently problematic if those hours align with genuine purpose, but 40 hours of meaningless tasks will destroy you. What resonates is his challenge to reconnect with childhood dreams before self-doubt kicked in, then build your entire life around that direction rather than just optimizing leisure time. His own story—abandoning astronaut/president dreams in middle school, spending 15+ years in tech, finally embracing political engagement—shows this isn’t about sudden epiphanies but deliberate realignment. The “start small” advice matters: volunteer one hour weekly for something you believe in, don’t wait for perfect timing. Honestly? This reframes my own frustrations with leadership work as potentially a purpose problem, not a workload problem.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://double-dissent.fika.bar/you-are-not-the-code-01KBYRMJG8W0PC853BHAKW5JC4&quot;&gt;You are not the code&lt;/a&gt; — Txus spent weeks building an elegant Clojure query language for power users, demoed it to his manager Manish, got told not to ship it, and then just… deleted the branch. The relief he felt in that moment revealed something profound: he’d been fusing his self-worth with his code output, making every criticism feel personal and every failure diminishing. The realization that “the code is an artifact, a byproduct of an ongoing process” freed him—the two weeks weren’t wasted because the learning remained even after the deletion. This hits hard for anyone who’s had technically brilliant work rejected for product/team alignment reasons. The maturity here is recognizing that being right about the technical solution doesn’t mean it’s the right solution. Your value comes from growth and capability, not from lines of code surviving in the repo. I’ve definitely struggled with this when features I built got deprecated or rewritten, feeling like it invalidated the work, when actually it just meant we learned enough to do something better.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>166 hours</title><link>https://jordivillar.com/notes/166-hours/</link><guid isPermaLink="true">https://jordivillar.com/notes/166-hours/</guid><pubDate>Sat, 27 Dec 2025 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Last year I wrote about &lt;a href=&quot;/notes/130-hours&quot;&gt;hitting 130 hours&lt;/a&gt; of working out. I ended that note with a line I repeated three times: “I’m going to keep making progress.”&lt;/p&gt;
&lt;p&gt;I didn’t know if I’d actually do it. Part of me wondered if writing it down was just another way of setting myself up for disappointment.&lt;/p&gt;
&lt;p&gt;This year was rough. The kind of year where everything important changed in the span of a week, and the months after felt like trying to find solid ground that kept shifting.&lt;/p&gt;
&lt;p&gt;I still managed to work out for 166 hours. Still around 27 minutes a day. Still not impressive numbers.&lt;/p&gt;
&lt;p&gt;But here’s what mattered: when everything else felt like it was falling apart, this was something I could control. I could lace up my shoes. I could show up. And on the days I did, it was the one thing that felt like forward motion.&lt;/p&gt;
&lt;p&gt;None of these are numbers that would impress anyone. But they’re mine, and they represent something harder than any PR: consistency when it would’ve been easier to stop.&lt;/p&gt;
&lt;p&gt;Life changes fast. Appreciate the people you have around you while you have them. And don’t waste your time with people who don’t deserve it. These aren’t fitness lessons, but they’re what this year taught me while I was trying to keep showing up.&lt;/p&gt;
&lt;p&gt;Next year, I’ll keep making progress. Not because I have to, but because I’ve proven to myself that I can.&lt;/p&gt;</content:encoded></item><item><title>2025W50</title><link>https://jordivillar.com/reads/2025w50/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w50/</guid><pubDate>Sun, 21 Dec 2025 12:30:54 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.joanwestenberg.com/thin-desires-are-eating-your-life/?utm_source=hackernewsletter&amp;#x26;utm_medium=email&amp;#x26;utm_term=fav&quot;&gt;Thin Desires Are Eating Your Life&lt;/a&gt; — This hit different. The distinction between thin desires (scrolling, checking notifications) and thick desires (learning something hard, building real skills) explains so much about why we feel empty despite having instant access to everything. What really stuck with me: tech companies have figured out how to extract the dopamine hit from meaningful activities while stripping away the transformation. Social media gives you the reward of connection without any of the depth. The best part? The author doesn’t prescribe some grand solution—just small acts of resistance like baking bread or writing actual letters. Things that refuse to be optimized, that force you to slow down. Honestly? That feels more achievable than trying to quit everything cold turkey.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pierce.dev/notes/go-ahead-self-host-postgres&quot;&gt;Go ahead, self-host Postgres&lt;/a&gt; — I appreciate how blunt this is about the cloud narrative. The author’s been running self-hosted Postgres for two years, serving millions of queries daily, and spends about 30 minutes a month on maintenance. Meanwhile AWS charges $328/month for what’s basically vanilla Postgres with some operational tooling wrapped around it. What convinced me most? The actual numbers: migrating from RDS took 4 hours of work, performance was identical, and the operational burden is genuinely minimal if you’re comfortable with the basics. Not saying managed services are never worth it—startups moving fast or enterprises with compliance needs make sense—but for everyone else paying those cloud premiums? You might be solving a problem you don’t actually have.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.mathieui.net/this-is-not-the-future.html&quot;&gt;This is not the future&lt;/a&gt; — This felt like someone finally saying what I’ve been thinking. The premise is simple but powerful: nothing is inevitable just because a tech company with billions says it is. We’ve been trained to accept every new “innovation”—AI features shoved into everything, unrepairable devices, surveillance masquerading as convenience—as if it’s the natural progression of technology. But it’s not. Every choice is political, every adoption is a vote. What I appreciate most is the refusal to be passive. The author gives you a whole list of things that aren’t actually necessary (internet-connected toasters, subscription car features, mandatory accounts for basic functions) and basically says: you don’t have to accept this. Choose tools that respect you. It’s a reminder that we still have agency, even when companies pretend we don’t.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.canoozie.net/disks-lie-building-a-wal-that-actually-survives/&quot;&gt;Disks Lie: Building a WAL that actually survives&lt;/a&gt; — The title nails it: your storage stack is absolutely lying to you. This breaks down all the ways a naive write-ahead log will fail in production—data sitting in kernel buffers pretending to be persisted, silent bit flips, operation reordering, the works. What I found most valuable is the five-layer defense strategy: checksums to catch corruption, dual WAL files on separate disks (because latent sector errors are common enough that a single copy is “negligence”), O_DIRECT + O_DSYNC to bypass buffering entirely, io_uring with linked operations for ordering guarantees, and post-fsync verification reads because you genuinely cannot trust that your data made it to disk. It’s paranoid, sure, but production durability demands paranoia. The storage stack will not tell you the truth about whether your data is safe.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://buttondown.com/jaffray/archive/lightweight-cardinality-estimation-with-density/&quot;&gt;Lightweight Cardinality Estimation with Density&lt;/a&gt; — This is one of those pieces that makes you appreciate how much thought goes into query optimization. Density is just 1/ICARD (the reciprocal of the number of distinct values), but it’s surprisingly useful for helping query planners decide which index to use. The insight that stuck: for uniformly distributed data, density tells you roughly how selective an equality predicate will be, and comparing densities across multiple columns reveals correlations in your data. What I appreciate most is the practicality argument—unlike histograms or other complex statistics, density is cheap to compute and maintain at scale, even using approximations like HyperLogLog. It’s not perfect, but it’s a valuable tool relative to its cost. Sometimes the simple heuristic that you’ll actually use beats the sophisticated one that’s too expensive to track.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://zed.dev/blog/zed-is-our-office&quot;&gt;Zed Is Our Office&lt;/a&gt; — The audacity of this vision is what gets me. They’re not just adding collaboration features to an editor—they’re arguing that the editor itself should be your office. Built from scratch with CRDTs for seamless real-time editing, zero-friction setup (just GitHub auth), built-in audio and screenshare that automatically follows who’s talking. What really sells it? They’re using Zed to run their own company. Not as a demo, but as their actual workspace—channels for company-wide discussions, projects, personal focus. The ambition here is that code, conversations, and context all live in the same place, accessible to both teammates and AI. Most collaborative editors feel bolted-on and end up forcing you back to Slack or Zoom. This feels like a genuine rethink of what a development environment could be if collaboration was fundamental, not an afterthought.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://soumith.ch/blog/2025-11-06-leaving-meta-and-pytorch.md.html&quot;&gt;Leaving Meta and PyTorch&lt;/a&gt; — There’s something quietly powerful about leaving at the peak. Soumith led PyTorch from inception to 90%+ adoption across the AI industry, and now he’s walking away from “one of the AI industry’s most leveraged seats” to do something small and uncomfortable. What struck me most is the self-awareness: he couldn’t ignore the counterfactual regret of never trying something outside Meta. The reflection on institutional strength resonates too—PyTorch doesn’t need him anymore, the team can solve problems without him, and that’s actually the sign of success. It’s a reminder that sometimes the right move is stepping away precisely when you’ve built something that can thrive without you. The curiosity to start over, to be a beginner again, even when you’ve already won? That takes guts.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://paoramen.fika.bar/the-talent-machine-01K94NER8BZH9G9GYHR7TMFRES&quot;&gt;The Talent Machine&lt;/a&gt; — The core reframe here is brilliant: stop thinking of hiring as filtering applicants and start thinking of it as selling a product. Define what you’re actually offering—not just the tech stack, but the culture, the mission, the compensation structure—and position it to match what specific candidates want. What really clicked for me is the emphasis on transparency as a competitive advantage. Public employee handbook, fixed salaries instead of negotiable ranges, clear career paths. Most companies treat this stuff like trade secrets, but being radically transparent eliminates friction and self-selects for cultural fit. The tactical advice about targeting underserved demographics (experienced engineers in their 30s-40s looking for stability) and choosing less-saturated tech stacks to reduce competition? That’s the kind of strategic thinking most companies miss. Hiring isn’t about having the biggest budget, it’s about knowing exactly who you’re for and making it dead simple for them to say yes.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2025W49</title><link>https://jordivillar.com/reads/2025w49/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w49/</guid><pubDate>Sun, 14 Dec 2025 12:31:05 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.dbpro.app/blog/sqlite-json-virtual-columns-indexing&quot;&gt;SQLite JSON Superpower: Virtual Columns + Indexing&lt;/a&gt; — This is one of those “why didn’t I know about this sooner?” moments. The idea is simple but powerful: dump your JSON into SQLite, create virtual columns that extract specific fields on the fly, then slap indexes on those virtual columns. Boom, you’ve got schemaless flexibility with relational performance. No backfilling, no migration headaches. You can literally add a new indexed column later without touching your data. The beauty is in how it solves the eternal “should I go schemaless or relational?” debate by just saying “both.”&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://javisantana.substack.com/p/recortes-2025&quot;&gt;Recortes 2025&lt;/a&gt; — Javi’s year-end reflections hit different. It’s refreshing to read someone being this honest about the mental toll of being CEO while also celebrating running a 10k at 43. The quote about not being able to compete with someone having fun keeps bouncing around in my head. There’s something profound about acknowledging that working hard without questioning what you should actually work on is just lazy thinking with extra steps. Also, keeping a diary for his daughter to understand his life context later? That’s the kind of long-term thinking I respect.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fly.io/blog/litestream-vfs/&quot;&gt;Litestream VFS&lt;/a&gt; — This genuinely blew my mind. You can query SQLite databases sitting in S3 without downloading the whole thing. Just fetch the pages you need using S3 range requests and an LRU cache. Point-in-time recovery is just a SQL pragma away. The LTX format compacts page versions so you’re not storing redundant data. This solves so many “I just need to quickly check production” moments without spinning up a whole restore dance. The technical elegance here is top-notch.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://rapha.land/craft-software-that-makes-people-feel-something/&quot;&gt;Craft software that makes people feel something&lt;/a&gt; — Raphael built Boo, a code editor, mostly for himself because he wanted something with soul. His point about repetition killing creativity resonates hard. When coding becomes mechanical, you lose the spark that makes people go “wow.” What I appreciate most is his willingness to build something not commercially viable just because it matters to him. In an era obsessed with PMF and growth metrics, building for joy feels almost radical. The comparison to Lego is spot on too - the best software feels like play.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://miguelcarranza.es/the-long-run&quot;&gt;The Long Run&lt;/a&gt; — Miguel went from being a non-runner to completing a full marathon, running through 18 cities along the way. But this isn’t really about running. It’s about rewriting your internal script. Once you do something you thought was impossible, the ceiling lifts everywhere else. “Discipline compounds quietly, until suddenly you’re a different person” is the kind of truth you only understand after experiencing it. The best part? He doesn’t frame it as inspiration porn, just an honest account of how doing hard things changes what you think you’re capable of.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://entropicthoughts.com/transparent-leadership-beats-servant-leadership&quot;&gt;Transparent Leadership Beats Servant Leadership&lt;/a&gt; — The distinction here is subtle but important. Servant leadership can accidentally create dependency where you’re constantly clearing paths for people. Transparent leadership is about teaching people to clear their own paths. You coach, you connect folks directly, you explain the “why” so they can make good calls independently. The goal is to become redundant by leveling everyone up. I’ve seen too many managers become bottlenecks disguised as helpful. This framework gives language to why that feels wrong and what to do instead.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://lalitm.com/software-engineering-outside-the-spotlight/&quot;&gt;Why I Ignore The Spotlight as a Staff Engineer&lt;/a&gt; — Not all impact comes from high-visibility projects. Some of the most valuable work is being the long-term steward of critical systems nobody thinks about until they break. The trade-off is less external glory, but you get to build deep expertise and solve genuinely complex problems. The “shadow hierarchy” concept is fascinating - getting recognition through customer team endorsements rather than flashy launches. “The most ambitious thing you can do is stay put, dig in, and build something that lasts” flips the typical career advice on its head, and honestly? It’s probably right more often than we admit.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://lokireturns.github.io/delta-lake/2025/11/21/delta-table-atomicity.html&quot;&gt;Delta Lake Atomicity&lt;/a&gt; — A solid technical breakdown of how Delta Lake gets atomicity without traditional WAL. Instead of page-level recovery, it uses copy-on-write: write new Parquet files, then atomically write a JSON log entry pointing to them. If something crashes mid-flight, the uncommitted files just sit there invisible since they’re not in the log. It’s clever because object storage guarantees single file write atomicity, so you piggyback on that rather than building your own transaction log infrastructure. Clean, simple, and it works.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>64-bit Misalignment</title><link>https://jordivillar.com/blog/memory-alignment/</link><guid isPermaLink="true">https://jordivillar.com/blog/memory-alignment/</guid><pubDate>Mon, 17 Nov 2025 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I’ve been following CMU’s 15-445/645 Database Systems course lately, trying to fill some gaps in my low-level systems knowledge. When the lectures covered memory alignment, it all made perfect sense. Cache lines, multiple memory operations, hardware penalties. Textbook stuff.&lt;/p&gt;
&lt;p&gt;But here’s the thing: I don’t trust theory until I’ve seen it break something myself.&lt;/p&gt;
&lt;p&gt;So I decided to write a benchmark. Simple plan: create two structs (one with natural alignment, one tightly packed) and demonstrate the performance penalty everyone talks about. I’d watch the misaligned version choke, nod knowingly, and move on to the next lecture.&lt;/p&gt;
&lt;p&gt;Spoiler alert: The misaligned version won.&lt;/p&gt;
&lt;h3 id=&quot;what-is-64-bit-memory-alignment&quot;&gt;What is 64-bit Memory Alignment?&lt;/h3&gt;
&lt;p&gt;Before we get to the weird results, let me explain what I thought I understood. Modern processors prefer (or require) that data be aligned to its natural size; for a 64-bit value, that means an 8-byte boundary.&lt;/p&gt;
&lt;p&gt;Why does alignment matter? When data is properly aligned, the CPU can read or write it in a single memory operation. When misaligned, several things can happen depending on the architecture&lt;sup&gt;&lt;a href=&quot;#user-content-fn-1&quot; id=&quot;user-content-fnref-1&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multiple memory operations&lt;/strong&gt;: Reading a misaligned &lt;code&gt;uint64_t&lt;/code&gt; might require two separate memory reads instead of one. For example, if an 8-byte value starts at address 1, it straddles an 8-byte access boundary, requiring the CPU to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read bytes 0-7 (getting the first seven bytes of our value)&lt;/li&gt;
&lt;li&gt;Read bytes 8-15 (getting the final byte of our value)&lt;/li&gt;
&lt;li&gt;Combine the pieces with bit-shifting and masking&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Non-atomic operations&lt;/strong&gt;: Even more critically, misaligned writes are &lt;strong&gt;not atomic&lt;/strong&gt;. A properly aligned 8-byte write happens as a single atomic operation, but a misaligned write might be implemented as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read-modify-write of the first memory chunk&lt;/li&gt;
&lt;li&gt;Read-modify-write of the second memory chunk&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This means another thread could observe a partially-written value, making misaligned access dangerous in concurrent code without additional synchronization.&lt;/p&gt;
&lt;p&gt;Compilers automatically add padding to structs to maintain these alignment requirements and avoid these issues. Here’s a simple example:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;cpp&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;// NORMAL struct - compiler adds padding for alignment&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;struct&lt;/span&gt;&lt;span style=&quot;color:#4EC9B0&quot;&gt; NormalStruct&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    uint8_t&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  a;&lt;/span&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;      // 1 byte&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    uint64_t&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; b;&lt;/span&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;      // 8 bytes (aligned at offset 8)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    uint16_t&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; c;&lt;/span&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;      // 2 bytes&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;    // Total: 24 bytes (with padding)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;};&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;// PACKED struct - no padding, fields stored consecutively&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;struct&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; __attribute__&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;((packed)) PackedStruct {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    uint8_t&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  a;&lt;/span&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;      // 1 byte&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    uint64_t&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; b;&lt;/span&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;      // 8 bytes (misaligned at offset 1)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    uint16_t&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; c;&lt;/span&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;      // 2 bytes&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;    // Total: 11 bytes (no padding)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The memory layout looks like this:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;NormalStruct (24 bytes):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Byte:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Field: [a][- padding (7 bytes) -][b  b  b  b  b  b  b  b][c  c][- padding (6 bytes) -]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;PackedStruct (11 bytes):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Byte:  0  1  2  3  4  5  6  7  8  9  10&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;  Field: [a][b  b  b  b  b  b  b  b][c  c]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The packed version saves 13 bytes per struct (&lt;strong&gt;54% reduction&lt;/strong&gt;), but the &lt;code&gt;uint64_t b&lt;/code&gt; field is now misaligned, starting at offset 1 instead of 8.&lt;/p&gt;
&lt;h3 id=&quot;the-plan&quot;&gt;The Plan&lt;/h3&gt;
&lt;p&gt;The plan was straightforward. I’d benchmark these two structs performing various operations and watch the misaligned version suffer. I tested on the two different architectures I have at home:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Apple M1 Pro&lt;/strong&gt; (ARM64)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AMD Ryzen 7 8745HS&lt;/strong&gt; (x86-64)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I created four benchmark tests:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Sequential read of the &lt;code&gt;uint64_t b&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Random read of the &lt;code&gt;uint64_t b&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Sequential write to the &lt;code&gt;uint64_t b&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Read-modify-write operations&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each test ran on arrays of 100,000 elements with 1,000 iterations, repeated 10 times for statistical reliability.&lt;/p&gt;
&lt;h3 id=&quot;first-results-wait-what&quot;&gt;First Results: Wait, What?&lt;/h3&gt;
&lt;p&gt;Here are the initial results on &lt;strong&gt;x86-64 (AMD Ryzen 7 8745HS)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Test                   Aligned (ms)  Misaligned (ms)  Difference&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-----------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Read              28.38            26.10      -8.0%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Random Read                  56.90            69.96     +22.9%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Write             20.27            14.45     -28.8%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Read-Modify-Write            21.53            20.71      -3.8%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And on &lt;strong&gt;ARM64 (Apple M1 Pro)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Test                   Aligned (ms)  Misaligned (ms)  Difference&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-----------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Read              97.22            97.21      -0.0%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Random Read                 211.00           210.11      -0.4%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Write             30.47            22.22     -27.0%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Read-Modify-Write            39.42            40.80      +3.5%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I stared at these numbers for a long time. The misaligned packed struct was performing about the same as the aligned version, and in several tests it was actually faster. This made no sense. Where were my alignment penalties?&lt;/p&gt;
&lt;h3 id=&quot;nothing-made-sense&quot;&gt;Nothing Made Sense&lt;/h3&gt;
&lt;p&gt;I double-checked everything:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compiler flags, in case an optimization was masking the alignment penalties&lt;/li&gt;
&lt;li&gt;Warmup iterations&lt;/li&gt;
&lt;li&gt;Statistical analysis&lt;/li&gt;
&lt;li&gt;Memory initialization&lt;/li&gt;
&lt;li&gt;The theory itself, including a long conversation with Claude&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The code was correct. The results were real. But they didn’t match my expectations at all.&lt;/p&gt;
&lt;h3 id=&quot;then-cache-hit-me&quot;&gt;Then Cache Hit Me&lt;/h3&gt;
&lt;p&gt;I calculated the actual memory footprint of my test arrays:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;100,000 elements:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NormalStruct&lt;/code&gt;: 100,000 × 24 bytes = &lt;strong&gt;2,343 KB&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PackedStruct&lt;/code&gt;: 100,000 × 11 bytes = &lt;strong&gt;1,074 KB&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And then I checked the CPU cache sizes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AMD Ryzen 7 8745HS (per core):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;L1 Data Cache: 32 KB&lt;/li&gt;
&lt;li&gt;L2 Cache: 1 MB (&lt;strong&gt;1,024 KB&lt;/strong&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Apple M1 Pro (per core):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;L1 Data Cache: 64 KB&lt;/li&gt;
&lt;li&gt;L2 Cache: 4 MB&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The packed struct’s smaller memory footprint meant better cache utilization. This advantage was outweighing any misalignment penalties!&lt;/p&gt;
&lt;h3 id=&quot;proving-it&quot;&gt;Proving It&lt;/h3&gt;
&lt;p&gt;I needed to test this properly. I created a new benchmark that tested different array sizes to see where the performance crossover happens:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;cpp&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;void&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; test_array_size&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;size_t&lt;/span&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt; count&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;size_t&lt;/span&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt; iterations&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;) {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    vector&amp;#x3C;NormalStruct&gt; &lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;normal_data&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(count);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    vector&amp;#x3C;PackedStruct&gt; &lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;packed_data&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(count);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;    // ... initialization and warmup ...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    double&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; normal_time = &lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;benchmark_sequential_read&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(normal_data, iterations);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    double&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; packed_time = &lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;benchmark_sequential_read&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(packed_data, iterations);&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;    // Calculate and print results&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I tested array sizes from 100 elements (fits comfortably in L1) up to 200,000 elements (way beyond L2), adjusting iterations to keep total runtime reasonable.&lt;/p&gt;
&lt;h3 id=&quot;the-pattern&quot;&gt;The Pattern&lt;/h3&gt;
&lt;p&gt;Here are the results on &lt;strong&gt;x86-64 (AMD Ryzen 7 8745HS)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Array Size    Normal (KB)    Packed (KB)      Time Diff&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;--------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;       100           2              1            +2.8%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;     1,000          23             10            +1.5%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;     5,000         117             53            -5.4%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    10,000         234            107            -5.9%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    30,000         703            322            -6.6%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    50,000       1,171            537            -8.8%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;   100,000       2,343          1,074            -8.7%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;   200,000       4,687          2,148            -8.2%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The pattern is crystal clear:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tiny arrays (100-1,000 elements, both in L1)&lt;/strong&gt;: Aligned wins&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is the “true” alignment penalty, minimal in modern CPUs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Growing arrays (5,000-30,000)&lt;/strong&gt;: Packed advantage grows&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Better cache utilization starts to dominate&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The critical point (~50,000 elements)&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NormalStruct (1,171 KB) exceeds L2&lt;/li&gt;
&lt;li&gt;PackedStruct (537 KB) still fits in L2&lt;/li&gt;
&lt;li&gt;Packed is &lt;strong&gt;8.8% faster&lt;/strong&gt; despite misalignment!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Beyond L2&lt;/strong&gt;: Packed maintains ~8% advantage&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Both exceed cache, but packed uses less memory bandwidth&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;what-about-the-64-bit-alignment-penalty&quot;&gt;What About the 64-bit Alignment Penalty?&lt;/h3&gt;
&lt;p&gt;After all this, I still needed to see the actual alignment penalty without cache effects getting in the way. So I re-ran the benchmarks with just 100 elements, small enough to fit entirely within L1 cache. &lt;sup&gt;&lt;a href=&quot;#user-content-fn-2&quot; id=&quot;user-content-fnref-2&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;x86-64 (AMD Ryzen 7 8745HS)&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Test                   Aligned (ms)  Misaligned (ms)  Difference&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-----------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Read                2.37             2.45      +3.3%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Random Read                    3.33             3.70     +11.3%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Write               1.52             1.93     +26.7%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Read-Modify-Write              2.07             2.11      +2.0%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;ARM64 (Apple M1 Pro)&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Test                   Aligned (ms)  Misaligned (ms)  Difference&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;-----------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Read                9.80             9.76      -0.4%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Random Read                   20.82            20.81      -0.1%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Write               1.59             1.66      +5.0%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Read-Modify-Write              1.94             3.54     +82.7%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a small array of 100 elements that fits entirely within L1 cache, we finally observe the alignment penalties that theory predicts. On x86-64 the misaligned packed struct is between 2% and 27% slower across every test; on the M1, plain reads are essentially unaffected, while writes are 5% slower and read-modify-write is a striking 83% slower. This is the real computational cost of accessing data that doesn’t sit on natural memory boundaries.&lt;/p&gt;
&lt;p&gt;The small dataset isolates the true alignment penalty from cache effects. This is the benchmark that confirms the conventional wisdom: when cache isn’t a factor, misaligned access is indeed slower.&lt;/p&gt;
&lt;h3 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h3&gt;
&lt;p&gt;I started this to prove I understood something “everyone knows”. Instead, I got to see how different effects interact in practice.&lt;/p&gt;
&lt;p&gt;The alignment penalties are real: I saw them clearly with small datasets that fit in L1 cache. But in my benchmarks with larger arrays, something else happened. The packed struct’s 54% smaller memory footprint meant better cache utilization, and that advantage outweighed the misalignment cost. The “slower” approach became 8-9% faster.&lt;/p&gt;
&lt;p&gt;This wasn’t about the textbooks being wrong. It was about understanding that performance is about trade-offs, not rules. Alignment matters. Cache efficiency matters. Which one dominates depends on your specific workload, data size, and access patterns.&lt;/p&gt;
&lt;p&gt;Here’s what watching these effects interact taught me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Initial hypotheses can be wrong: I expected one thing, measured another&lt;/li&gt;
&lt;li&gt;Small benchmarks and large benchmarks can show opposite results&lt;/li&gt;
&lt;li&gt;“Common knowledge” is often incomplete without context&lt;/li&gt;
&lt;li&gt;Counterintuitive results are invitations to dig deeper&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The real lesson isn’t about memory alignment.&lt;/strong&gt; It’s about staying curious when your data surprises you. It’s about measuring instead of assuming. It’s about being willing to investigate when something doesn’t match what you thought you knew.&lt;/p&gt;
&lt;p&gt;I wanted to prove I understood how memory worked. Instead, I got to see multiple effects competing with each other. And that taught me more than being right ever could have.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Update [2025-11-18]&lt;/strong&gt;: After sharing the blogpost in &lt;a href=&quot;https://eatonphil.com/discord.html&quot;&gt;Software Internals Discord&lt;/a&gt;, I received good feedback about reviewing performance counters to confirm my findings.&lt;/p&gt;
&lt;p&gt;I then added a wrapper around &lt;code&gt;perf_event_open&lt;/code&gt; to measure cache misses, and observed that the packed data incurred significantly fewer misses thanks to its smaller footprint. This confirmed the hypothesis: the packed struct’s better cache utilization was compensating for its misalignment penalties.&lt;/p&gt;
&lt;p&gt;Comparing cache misses between normal and packed structs for different array sizes, we can observe the following results:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Array Size    Normal (KB)    Packed (KB)      Time Diff     L1 Miss(N)     L1 Miss(P)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;--------------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;       100           2              1            +2.8%        0.00%           0.00%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;     1,000          23             10            +1.5%        0.10%           0.00%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;     5,000         117             53            -5.4%       18.77%           8.16%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    10,000         234            107            -5.9%       18.77%           8.16%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    30,000         703            322            -6.6%       18.79%           8.16%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;    50,000       1,171            537            -8.8%       18.74%           8.16%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;   100,000       2,343          1,074            -8.7%       18.75%           8.15%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;   200,000       4,687          2,148            -8.2%       18.73%           8.15%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Re-running the original 100,000-element 64-bit performance test with counters enabled makes it clear: cache misses on the aligned data are significantly higher, which makes the misaligned structure more efficient in almost every case.&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4; overflow-x: auto; white-space: pre-wrap; word-wrap: break-word;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Test                   Aligned (ms)  Misaligned (ms)  Difference    Misses (A)     Misses (M)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;----------------------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Read              28.38            26.10      -8.0%       18.77%          8.17%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Random Read                  56.90            69.96     +22.9%       36.99%         40.87%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Sequential Write             20.27            14.45     -28.8%       37.69%         15.53%&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;Read-Modify-Write            21.53            20.71      -3.8%       15.80%          7.77%&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the Discord channel, Phil Eaton also shared an interesting lobste.rs discussion on the very same topic from four months earlier: &lt;a href=&quot;https://lobste.rs/s/plrsmw/data_alignment_for_speed_myth_reality&quot;&gt;https://lobste.rs/s/plrsmw/data_alignment_for_speed_myth_reality&lt;/a&gt;&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-1&quot;&gt;
&lt;p&gt;There is another kind of misalignment to consider, probably more important than the one discussed here: cache line misalignment, a performance issue that occurs when a data structure crosses a cache line boundary (usually 64 bytes). Its implications are similar to the ones discussed here, but at another level. &lt;a href=&quot;#user-content-fnref-1&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-2&quot;&gt;
&lt;p&gt;Code available here: &lt;a href=&quot;https://github.com/jrdi/playground/tree/main/memory-alignment&quot;&gt;https://github.com/jrdi/playground/tree/main/memory-alignment&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-2&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 2&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded></item><item><title>2025W38</title><link>https://jordivillar.com/reads/2025w38/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w38/</guid><pubDate>Sun, 28 Sep 2025 12:27:37 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://ordep.dev/posts/long-running-projects&quot;&gt;Surviving Long-Running Projects&lt;/a&gt; — A nice reality check on how grueling big projects can be. The author basically says: expect them to drag on, plan for “boring” maintenance, and set up routines to keep your motivation and mental health intact. It’s more about staying sane than about technical tricks.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://stripe.com/blog/how-we-built-it-real-time-analytics-for-stripe-billing&quot;&gt;How we built it: Real-time analytics for Stripe Billing&lt;/a&gt; — This is a cool peek under the hood. Stripe walks through how they built real-time analytics on top of billing, mixing Kafka, Flink, and their own in-house magic. It’s a story about scaling data pipelines to billions of events and still keeping dashboards snappy.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://johnnysswlab.com/9-things-every-fresh-graduate-should-know-about-software-performance/&quot;&gt;9 Things Every Fresh Graduate Should Know About Software Performance&lt;/a&gt; — Basically “performance isn’t magic.” It covers the usual suspects (memory allocation, cache, I/O, concurrency) but framed as a gentle guide for juniors. Solid reminders even for experienced devs.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://eclecticlight.co/2025/09/20/a-brief-history-of-threads-and-threading/&quot;&gt;A brief history of threads and threading&lt;/a&gt; — A fun historical tour of how we got to modern threading. From early OS days to today’s multicores, it’s an eye-opener on how messy and evolutionary concurrency really is.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://zed.dev/blog/hired-through-github-part-1&quot;&gt;Hired Through GitHub: Part 1&lt;/a&gt; — The author tells how showing work on GitHub led directly to job offers. It’s a feel-good story about the value of public contributions and writing about your code.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://notes.eatonphil.com/2025-09-15-what-is-systems-and-how-do-i-learn.html&quot;&gt;In response to a developer asking about systems&lt;/a&gt; — This is like a mini-manifesto on what “systems” really means. The takeaway is: stop chasing trendy tools and learn fundamentals — OS, networks, distributed systems. That’s how you actually “get” systems.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/&quot;&gt;From 19k to 4.2M events/sec: story of a SQLite query optimisation&lt;/a&gt; — Absolute banger. The author squeezes insane performance out of SQLite with indexing, batching, and clever SQL tweaks. Great reminder that small changes in schema/queries can be game-changing.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://iamvishnu.com/posts/utf8-is-brilliant-design&quot;&gt;UTF-8 is a Brilliant Design&lt;/a&gt; — A love letter to UTF-8. The author explains why it’s elegant, backwards-compatible, and just plain clever. Makes you appreciate how much thought went into something we all take for granted.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://joshs.bearblog.dev/being-good-isnt-enough/?utm_source=hackernewsletter&amp;#x26;utm_medium=email&amp;#x26;utm_term=fav&quot;&gt;Being good isn’t enough&lt;/a&gt; — A short but punchy post: skill alone won’t get you ahead. Communication, visibility, and timing matter too. Basically: don’t hide behind code.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.tigerdata.com/blog/introducing-direct-compress-up-to-40x-faster-leaner-data-ingestion-for-developers-tech-preview&quot;&gt;Introducing Direct Compress: Up to 40x Faster, Leaner Data Ingestion for Developers (Tech Preview)&lt;/a&gt; — Marketing-y but interesting. A new ingestion tech promising 40x faster loads. The takeaway: data infra vendors are pushing more on-the-fly compression to cut storage and speed up analytics.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ordep.dev/posts/livelocks&quot;&gt;When more threads make things worse&lt;/a&gt; — A cautionary tale of over-threading. Past a point, adding threads doesn’t help — it can create livelocks, context switching overhead, and general sadness. A good read if you’ve ever said “let’s just throw more threads at it.”&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dynomight.net/liking/?utm_source=hackernewsletter&amp;#x26;utm_medium=email&amp;#x26;utm_term=fav&quot;&gt;You can try to like stuff&lt;/a&gt; — An odd but uplifting essay about cultivating taste. The author argues that liking things is a skill you can practice, not just something that happens. Kind of refreshing.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://rbarbadillo.github.io/tinybird&quot;&gt;Reflections on Tinybird&lt;/a&gt; — A personal reflection from someone using Tinybird. They talk about what impressed them, what didn’t, and lessons learned from building real-time APIs. Interesting outsider’s perspective.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.seangoedecke.com/the-simplest-thing-that-could-possibly-work/&quot;&gt;Do the simplest thing that could possibly work&lt;/a&gt; — Classic XP wisdom retold. The post champions simplicity over cleverness. Don’t over-engineer; just do the simple thing first and iterate.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://underlap.org/developers-block/&quot;&gt;Developer’s block&lt;/a&gt; — Like writer’s block, but for coders. Talks about burnout, procrastination, and ways to get unstuck. Feels relatable if you’ve ever stared at your editor unable to start.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.engineersneedart.com/blog/almostfired/almostfired.html&quot;&gt;The First Time I Was Almost Fired From Apple&lt;/a&gt; — Great storytime. An engineer recounts a near-disastrous mistake at Apple and what it taught them about responsibility and learning under pressure.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://bytemash.net/posts/i-went-down-the-linear-rabbit-hole/&quot;&gt;Linear sent me down a local-first rabbit hole&lt;/a&gt; — A cool piece about discovering local-first architectures (like CRDTs) after using Linear. Shows how design decisions in apps can spark curiosity about deeper tech.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fika.bar/paoramen/local-first-search-01K1B0WM1X4P5SV5QAES0Z5N75&quot;&gt;Local-first search&lt;/a&gt; — Builds on that same local-first theme. Explains how search can work offline-first with sync happening in the background. Makes you rethink how “search” has to be architected.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2025W31</title><link>https://jordivillar.com/reads/2025w31/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w31/</guid><pubDate>Fri, 08 Aug 2025 15:45:13 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://colton.dev/blog/curing-your-ai-10x-engineer-imposter-syndrome/&quot;&gt;No, AI is not Making Engineers 10x as Productive&lt;/a&gt; — AI does not make engineers ten times more productive because much of coding involves thinking, reading, and preventing unnecessary work. AI helps with small tasks and gives short bursts of speed, but it often encourages haste and over-building. True productivity comes from experience, good habits, and teamwork, not just using AI tools.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.scattered-thoughts.net/writing/all-the-cool-kids-are-doing-it/?utm_source=scattered-thoughts&amp;#x26;utm_medium=email&amp;#x26;utm_campaign=all-the-cool-kids-are-doing-it-36dd&quot;&gt;All the cool kids are doing it&lt;/a&gt; — The author is unsure about using large language models (LLMs) for coding because they find them unreliable and costly. They feel LLMs don’t yet help much with complex or performance-focused programming tasks. However, they see some promise in LLMs for other uses like research help and code explanation.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.derekthompson.org/p/the-sunday-morning-post-why-exercise&quot;&gt;The Sunday Morning Post: Why Exercise Is a Miracle Drug&lt;/a&gt; — Exercise improves many parts of the body and is better than any medicine. Studies show exercise helps cancer patients live longer and stay healthier.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://justin.searls.co/posts/full-breadth-developers/&quot;&gt;Full-breadth Developers&lt;/a&gt; — Full-breadth developers who combine technical skills with product sense are thriving with new AI tools. Many companies struggle because they separate product and engineering roles too strictly.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://stephango.com/ramblings&quot;&gt;Ramblings&lt;/a&gt; — Remote teams of 2-10 people can use personal “ramblings” channels to share thoughts without cluttering group chats. These channels help team members stay connected with short, informal updates and encourage creativity. Ramblings reduce interruptions and support social bonding, especially when no meetings are scheduled.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fika.bar/paoramen/ai-is-eating-the-internet-01K10JG1SHGZQHN61HPGWPXN60&quot;&gt;AI is eating the Internet&lt;/a&gt; — The Internet used to offer free content funded by ads from companies like Google and Facebook. Now, AI tools reuse content without sending visitors to original sites, hurting creators and changing how the web works. In the future, AI will control access to information, balancing ads, content makers, and users in new ways.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://michaelnotebook.com/slow/index.html&quot;&gt;Slow&lt;/a&gt; — Some problems take humans many years or even centuries to solve, like building cathedrals or proving complex math theorems. Long-term projects need strong institutions and steady effort to keep going over time. Examples include scientific studies, historic buildings, and technology that may last for thousands of years.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.catherinejue.com/fast&quot;&gt;Fast&lt;/a&gt; — Fast software changes how we work and feel by making tasks quicker and easier. It requires focus and simplicity, often cutting unnecessary features to deliver speed. As current LLM technology improves, speed will unlock new possibilities and transform our lives like never before.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://thrau.at/blog/why-im-stepping-down-as-head-of-engineering-20250726.html&quot;&gt;Why I’m stepping down as Head of Engineering&lt;/a&gt; — The author is stepping down as Head of Engineering to focus on hands-on technical work they love. As the company grew, their role became more about management, which made them feel less connected and impactful. Stepping aside allows new leaders to scale the organization while the author contributes where they add the most value.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://newsletter.vickiboykis.com/archive/my-favorite-use-case-for-ai-is-writing-logs/&quot;&gt;My favorite use-case for AI is writing logs&lt;/a&gt; — JetBrains created a small, fast AI model that helps developers write better log lines automatically in PyCharm. This model runs locally on your computer, making coding easier and debugging faster. It shows that focused, lightweight AI can be very useful alongside big, general models.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.nationalgeographic.com/premium/article/diet-magnesium-anxiety-sleep-better-myth?cmpid=org=ngp::mc=social::src=instagram::cmp=editorial::add=ig20250722health-magnesiumanxietysleepmythpremiumhedcard&amp;#x26;linkId=844116046&quot;&gt;​How magnesium affects your sleep and anxiety&lt;/a&gt; — Magnesium may help reduce mild anxiety but does not strongly improve sleep for most people. Many people do not get enough magnesium from their diet, and supplements can be useful, especially magnesium glycinate.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://antirez.com/news/154&quot;&gt;Coding with LLMs in the summer of 2025 (an update)&lt;/a&gt; — Large language models like Gemini 2.5 PRO help programmers by finding bugs, speeding up testing, and supporting design decisions when humans guide them carefully. To get the best results, programmers must provide clear context and stay actively involved in the coding process instead of relying on AI alone. Right now, working together with AI is more effective than letting AI work solo, but this may change in the future.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://curiosity.ventures/posts/why-me#user-content-fnref-2&quot;&gt;Why me?&lt;/a&gt; — When bad things happen, we often ask “Why me?” and feel singled out. But life brings both good and bad luck to everyone, and neither is guaranteed. Accepting this helps us feel less angry and more grateful for the good moments.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://worksonmymachine.substack.com/p/mcp-an-accidentally-universal-plugin&quot;&gt;MCP: An (Accidentally) Universal Plugin System&lt;/a&gt; — I used to think MCP added nothing over APIs. But this post clicked: APIs expose services, while MCP wraps actions, and there are more actions than calling an HTTP service: CLIs, scripts, anything runnable. It’s a universal interface for doing, not just calling. Surprisingly powerful.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ordep.dev/posts/opinions-on-trends&quot;&gt;Making Sense of a Noisy World&lt;/a&gt; — Trends come and go, but sharing thoughtful opinions helps us learn and grow. Honest views can challenge ideas and create meaningful conversations. Careful thinking makes our opinions valuable and helps us understand the noisy world.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Am I Becoming Irrelevant?</title><link>https://jordivillar.com/blog/becoming-irrelevant/</link><guid isPermaLink="true">https://jordivillar.com/blog/becoming-irrelevant/</guid><pubDate>Mon, 14 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It feels silly to write just another blogpost about how to transition to a leadership role.&lt;/p&gt;
&lt;p&gt;Hopefully this is not a post about how to succeed in that transition. This is about how I felt every time I’ve been promoted to a leadership role. And how I haven’t been able to describe the frustration I felt.&lt;/p&gt;
&lt;p&gt;I’m writing it down (and publishing it) as a way to process it.&lt;/p&gt;
&lt;h3 id=&quot;i-used-to-feel-relevant&quot;&gt;I Used to Feel Relevant&lt;/h3&gt;
&lt;p&gt;It usually starts a few years into a company. I got things done. I shipped features that mattered. Some were technically difficult, others politically delicate, and a few simply fell apart and had to be rebuilt from scratch. I knew the product inside out. I could speak the language of our users. People came to me when they needed hard problems solved.&lt;/p&gt;
&lt;p&gt;Now I manage a team.&lt;/p&gt;
&lt;p&gt;Some days, I don’t know what I do anymore.&lt;/p&gt;
&lt;p&gt;My worth was defined by my output:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Merged PRs&lt;/li&gt;
&lt;li&gt;Shipped features&lt;/li&gt;
&lt;li&gt;Firefights I could handle on my own&lt;/li&gt;
&lt;li&gt;Recognition that was public and immediate&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, I spend most of my time trying to &lt;strong&gt;protect the team’s focus&lt;/strong&gt;, &lt;strong&gt;say no&lt;/strong&gt; to things that don’t matter, and &lt;strong&gt;convince smart people to work on the boring-but-important stuff&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It doesn’t feel like impact. It feels like being a buffer. A gatekeeper. A middle-layer.&lt;/p&gt;
&lt;p&gt;I miss investing real, focused time into one hard thing. I miss the feeling of flow, of closing my laptop after a long day and knowing exactly what I achieved. And honestly? I miss being openly recognized.&lt;/p&gt;
&lt;p&gt;Now, when something goes right, it’s because the team succeeded. That’s good, I want that. But part of me also wonders if anyone notices the invisible work that went into it.&lt;/p&gt;
&lt;h3 id=&quot;what-am-i-actually-doing-all-day&quot;&gt;What Am I Actually Doing All Day?&lt;/h3&gt;
&lt;p&gt;Sometimes I ask myself this in a genuine, not sarcastic, way.&lt;/p&gt;
&lt;p&gt;I know I’m doing things that matter. I can point to them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;I filter out noise&lt;/strong&gt; before it ever reaches the team.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I say no&lt;/strong&gt; to low-leverage requests, including from people who outrank me.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I help&lt;/strong&gt;, and clear the fog so others can move fast and make good calls.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I absorb tension&lt;/strong&gt; between what others want, what the team needs, and what the product should become.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But none of that looks like a roadmap item. None of it gets demoed at the company all-hands.&lt;/p&gt;
&lt;p&gt;There’s a fear I don’t say out loud often, but I’ll write it here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I’ve seen managers get fired quietly for “not contributing.” I don’t want to be one of them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So when I feel underused, or like the value I bring isn’t visible, I wonder if I’m just one reorg away from being irrelevant.&lt;/p&gt;
&lt;p&gt;I keep telling myself that enabling others is impact. That saying “no” is a kind of strategy. But some days that feels like a story I’m telling to make myself feel better. And on other days, it feels completely, absolutely true.&lt;/p&gt;
&lt;h3 id=&quot;what-i-know-is-working&quot;&gt;What I Know Is Working&lt;/h3&gt;
&lt;p&gt;Even with all this ambiguity, there are things I’m proud of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The team is focused. They’re not jumping between shiny distractions. Not always at least.&lt;/li&gt;
&lt;li&gt;They ship. Not chaotically, but deliberately, and with confidence.&lt;/li&gt;
&lt;li&gt;I’ve helped them grow into harder challenges, and they’re rising to them.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I’m not hands-on in the code, but I’m hands-on in the conditions that allow good code to happen.&lt;/p&gt;
&lt;p&gt;Still, I’m figuring out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How to &lt;strong&gt;measure success&lt;/strong&gt; in this role in a way that means something to me.&lt;/li&gt;
&lt;li&gt;How to tell the story of my impact &lt;strong&gt;without needing applause&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;How to &lt;strong&gt;say no&lt;/strong&gt; in a way that builds trust, not distance.&lt;/li&gt;
&lt;li&gt;How to be okay with the fact that the more effective I am, the &lt;strong&gt;less visible&lt;/strong&gt; it often is.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;why-im-writing-this&quot;&gt;Why I’m Writing This&lt;/h3&gt;
&lt;p&gt;As I said, I’m writing this for myself. I’ve ended up leaving a few companies because of this feeling. It’s time to stop running away from it.&lt;/p&gt;
&lt;p&gt;But I’m also writing this for you. Maybe you’re feeling this too. Maybe you’re not.&lt;/p&gt;
&lt;p&gt;Maybe you used to be the go-to builder, and now you spend your time in meetings wondering if you’re just in the way. Maybe you’re coaching, steering, absorbing, and worrying that no one sees it.&lt;/p&gt;
&lt;p&gt;Maybe you’re saying “no” when it would be easier to say “yes,” and wondering if you’re going to pay a price for that.&lt;/p&gt;
&lt;p&gt;If that’s you, I don’t have a fix. But I can say this:&lt;/p&gt;
&lt;p&gt;You’re not irrelevant.&lt;br&gt;
You’re just operating in a layer that doesn’t clap for itself.&lt;/p&gt;
&lt;p&gt;And maybe, just maybe, the fact that your team is calm, focused, and shipping, while everything around them is chaotic, is the best evidence of your impact.&lt;/p&gt;
&lt;p&gt;Even if no one’s naming it out loud.&lt;/p&gt;</content:encoded></item><item><title>2025W23</title><link>https://jordivillar.com/reads/2025w23/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w23/</guid><pubDate>Tue, 10 Jun 2025 00:09:44 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.jeetmehta.com/posts/thrive-in-obscurity&quot;&gt;Thrive in obscurity&lt;/a&gt; — Creative success often starts in obscurity, with many creators spending years sharing their work with few or no viewers. To stay motivated, focus on creating what you love instead of chasing popularity, as this will lead to better work and attract like-minded fans. Treat your early content as an investment for future audiences, knowing they may return to appreciate your journey later on.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://betterthanrandom.substack.com/p/if-you-are-useful-it-doesnt-mean&quot;&gt;If you are useful, it doesn’t mean you are valued&lt;/a&gt; — In your career, being useful means you get tasks done well, while being valued means you contribute to important decisions and have growth opportunities. Useful people may receive rewards, but they often feel stagnant and lack strategic involvement. To succeed, it’s crucial to recognize whether you are truly valued or just seen as a reliable worker.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://fly.io/blog/youre-all-nuts/&quot;&gt;My AI Skeptic Friends Are All Nuts&lt;/a&gt; — The author argues that LLMs can significantly assist software developers by handling tedious coding tasks, allowing them to focus on more important work. Despite skepticism about LLM-generated code quality, the author believes that these tools can enhance productivity and reduce the need for repetitive tasks. Ultimately, while LLMs may not replace all developers, they can streamline the coding process and improve efficiency.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://longform.asmartbear.com/little-company/&quot;&gt;A Smart Bear » You’re a little company, now act like one&lt;/a&gt; — Small companies often fear that being perceived as small will hurt sales, but this can actually alienate their best customers. Instead of adopting generic corporate language, they should present themselves authentically to attract Early Adopters who appreciate personal connections and are willing to provide feedback. By being honest and relatable, small companies can foster relationships that help them grow and improve their products.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://browsercompany.substack.com/p/letter-to-arc-members-2025&quot;&gt;Letter to Arc members 2025&lt;/a&gt; — The Browser Company is shifting focus from Arc to a new product called Dia, aiming to create a better browser experience. They recognized that Arc had limitations and wanted to build something that integrates AI more effectively.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://tadaima.bearblog.dev/if-nothing-is-curated-how-do-we-find-things/&quot;&gt;If nothing is curated, how do we find things?&lt;/a&gt; — The rise of social media has made it harder to find curated content, leading to information overload and mental exhaustion. Critics and curators are needed now more than ever to help sift through the vast amount of content available.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dcurt.is/thinking&quot;&gt;Thoughts on thinking&lt;/a&gt; — The author feels stuck and believes that their creative efforts are overshadowed by AI’s ability to produce better ideas quickly. They reflect on how using AI has diminished their own thinking and intellectual growth, despite having access to more information than ever.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Query Booster: How Tinybird optimizes table schemas for you</title><link>https://jordivillar.com/blog/query-booster/</link><guid isPermaLink="true">https://jordivillar.com/blog/query-booster/</guid><pubDate>Mon, 12 May 2025 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Original article: &lt;a href=&quot;https://www.tinybird.co/blog-posts/query-booster-how-tinybird-optimizes-table-schemas-for-you&quot;&gt;https://www.tinybird.co/blog-posts/query-booster-how-tinybird-optimizes-table-schemas-for-you&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A customer recently reached out to us with an interesting observation: their query latency had mysteriously dropped without any manual intervention. What they discovered was our Query Booster, quietly optimizing their workload behind the scenes. Query Booster automatically adapts to your database usage patterns, fine-tuning data source schemas for optimal performance.&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fquery-booster.BbmyvYJl.png&amp;#38;w=1920&amp;#38;q=100&quot; alt=&quot;Query Booster&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1600&quot; width=&quot;1600&quot; height=&quot;367&quot;&gt;
&lt;small&gt;An example of Query Booster in action, reducing query latency from ~175 ms to &amp;lt;5 ms without manual intervention.&lt;/small&gt;
&lt;h3 id=&quot;why-we-built-the-query-booster&quot;&gt;Why we built the Query Booster&lt;/h3&gt;
&lt;p&gt;Slow database queries hurt your business in several ways. When queries are slow, you need more infrastructure and resources to run them, which costs more money. Your users have a worse experience because they have to wait longer for data. Engineers waste time fixing slow queries instead of building new features. And you might miss important insights because the data takes too long to arrive.&lt;/p&gt;
&lt;p&gt;To understand how Query Booster solves these problems, we need to understand how Tinybird works. In columnar databases, the sorting key is the most critical setting for performance. This key determines how data is physically stored on disk, how efficiently the database can filter and process queries, and serves as the default primary key and index structure.&lt;/p&gt;
&lt;p&gt;If you are filtering by a column that is not in the sorting key, the database must perform a full table scan. This means reading significantly more data than necessary, since the column values are scattered throughout the storage. As your data volume grows, this mismatch between sorting key and query patterns leads to increasingly degraded performance due to full table scans.&lt;/p&gt;
&lt;p&gt;In an ideal scenario, you’d know all your query patterns in advance and design the perfect sorting key. But reality is different. New use cases come up over time, query patterns evolve as your business grows, different teams access data in different ways, and performance requirements change constantly.&lt;/p&gt;
&lt;p&gt;When this happens, you face a common issue. Your queries start performing full table scans, causing performance to degrade significantly. Engineers spend valuable time manually optimizing schemas, which requires careful planning and potential downtime. And just when you think you’ve solved the problem, new patterns emerge, and the cycle begins again.&lt;/p&gt;
&lt;p&gt;This is where Query Booster comes in. Instead of manually tuning your database, Query Booster automatically monitors your query patterns, identifies optimization opportunities, and creates temporary optimized schemas. It continuously validates performance improvements and removes optimizations when they’re no longer needed.&lt;/p&gt;
&lt;p&gt;The result? Your database continuously adapts to your usage patterns, maintaining optimal performance without any manual intervention.&lt;/p&gt;
&lt;h3 id=&quot;how-query-booster-works&quot;&gt;How Query Booster works&lt;/h3&gt;
&lt;p&gt;Query Booster is like having a database performance expert working 24/7 to optimize your data source schemas. Here’s how it works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Continuous Monitoring:&lt;/strong&gt; The system constantly watches your query patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query Plan Analysis:&lt;/strong&gt; It identifies frequently run queries and analyzes their optimization potential&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automatic Optimization:&lt;/strong&gt; When beneficial, it creates temporary optimized data source schemas&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance Validation:&lt;/strong&gt; The system monitors the optimization’s effectiveness and removes it if it’s not delivering results&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let’s dive into some technical aspects that make Query Booster interesting:&lt;/p&gt;
&lt;h3 id=&quot;continuous-monitoring&quot;&gt;Continuous Monitoring&lt;/h3&gt;
&lt;p&gt;Every query executed in Tinybird is automatically monitored and analyzed. We collect metrics that paint a complete picture of your query performance. This includes query execution time, the amount of data processed, and resource utilization. We also track query patterns, frequency, and the specific filter conditions being used. These metrics are stored in our monitoring system and made available to users through Service Data Sources, allowing you to track query performance over time.&lt;/p&gt;
&lt;p&gt;Our solution reviews these metrics periodically to detect the queries that process the largest amounts of data and the ones executed most frequently, as these represent critical parts of your workload.&lt;/p&gt;
&lt;p&gt;For queries considered optimizable, we analyze the query plan, which is a detailed representation of what actions the database will perform. It contains information about the data sources being accessed, the filters that are applied, transformations, etc. In our case, we extract and analyze this query plan by parsing the &lt;code&gt;EXPLAIN&lt;/code&gt; output. We use this information to look at whether the database is using the current sorting key effectively, whether the query is reading more data than necessary, and whether the current schema aligns with the query’s access patterns.&lt;/p&gt;
&lt;h3 id=&quot;query-plan-analysis&quot;&gt;Query Plan Analysis&lt;/h3&gt;
&lt;p&gt;Once we have the list of queries considered optimizable, we analyze each query plan using the &lt;code&gt;EXPLAIN&lt;/code&gt; output. To do so, we check the following things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Check if the query is using the primary key (usually the sorting key).&lt;/strong&gt; If the primary key is not being used, we can directly consider the query as optimizable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check if the query reads enough rows to matter.&lt;/strong&gt; If the query reads very few rows, it’s probably already efficient, and there’s no need to optimize further.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check if the filtering is inefficient (low selectivity).&lt;/strong&gt; Specifically, we check if the query reads more than 25% of the table’s available rows. If it does, that means the current sorting key is not helping much — it’s not filtering efficiently. This is a good signal that a better sorting order could help.&lt;/li&gt;
&lt;/ul&gt;
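&lt;p&gt;The three checks above can be sketched as a small decision function. This is a minimal illustration only: the function name, its inputs, and the minimum-granules cutoff are assumptions for the example, not Tinybird’s actual implementation (the 25% selectivity threshold is the one described above):&lt;/p&gt;

```python
# Sketch of the query-plan checks described above. Tinybird's real logic
# is internal; the names and the min_granules cutoff are illustrative.

def is_optimizable(uses_primary_key, granules_read, granules_total,
                   min_granules=10, selectivity_threshold=0.25):
    # 1. A query that bypasses the primary key is a direct candidate.
    if not uses_primary_key:
        return True
    # 2. A query that already reads very few granules is efficient enough.
    if min_granules > granules_read:
        return False
    # 3. Low selectivity: reading over 25% of the table means the current
    #    sorting key is not pruning data effectively.
    return granules_read / granules_total > selectivity_threshold
```

&lt;p&gt;With the granule counts from the &lt;code&gt;EXPLAIN&lt;/code&gt; examples that follow, a query reading 1224/1224 granules would be flagged as optimizable, while one reading 5/1224 would not.&lt;/p&gt;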
&lt;p&gt;Let’s look at an example to better understand how we use the &lt;code&gt;EXPLAIN&lt;/code&gt; output. For a table with sorting key &lt;code&gt;[id_1, id_2]&lt;/code&gt;, and a query like this:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;-- GOOD QUERY&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; table_1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;WHERE&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; id_1 = &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;863244623&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;EXPLAIN&lt;/code&gt; output for this query would look like this:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;# GOOD QUERY - EXPLAIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;Expression ((Project names + Projection))          &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  Expression                                       &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    ReadFromMergeTree (default.table_1)            &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    Indexes:                                       &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      PrimaryKey                                   &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Keys:                                      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;          id_1                                      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Condition: (id_1 &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;863244623&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;863244623&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]) &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Parts: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;5&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;5&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                                 &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Granules: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;5&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1224&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see that the query is using the primary key, and it’s reading 5/1224 (~0.4%) of the table. This means that the query is filtering efficiently. If we now change the query to:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;-- BAD QUERY&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; table_1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;WHERE&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; id_2 = &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;863244623&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;EXPLAIN&lt;/code&gt; output looks like this:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;# BAD QUERY - EXPLAIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;Expression ((Project names + Projection))          &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  Expression                                       &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    ReadFromMergeTree (default.table_1)            &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    Indexes:                                       &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      PrimaryKey                                   &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Keys:                                      &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;          id_2                                   &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Condition: (id_2 &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; [&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;863244623&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;863244623&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]) &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Parts: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;5&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;5&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                                 &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Granules: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1224&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1224&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see that the query is using the primary key, but it’s reading 1224/1224 (100%) of the table. This is an extreme case, but it shows that the query is not filtering efficiently. We could change the sorting key to &lt;code&gt;[id_2, id_1]&lt;/code&gt; and the query would be faster, but that would make the initial query slower.&lt;/p&gt;
&lt;h3 id=&quot;automatic-optimization&quot;&gt;Automatic Optimization&lt;/h3&gt;
&lt;p&gt;If the query is still considered optimizable, we suggest a better data source schema. Now that we’ve confirmed optimization might help:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The query booster chooses columns using a combination of their cardinality and the type of filter they are being used for.&lt;/strong&gt; For example, columns (and their cardinality) are treated differently depending on whether they are used for point filtering or range filtering.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sort the filter columns by cardinality (ascending).&lt;/strong&gt; Columns with lower cardinality (fewer distinct values) go first — they help partition the data into coarse groups. Higher cardinality columns go later — they help refine filtering within those groups.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Suggest the first four columns from this sorted list as a new optimized data source schema.&lt;/strong&gt; These columns become the new sorting key for the data source.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the suggested sorting key is a prefix of the existing sorting key, we will not suggest an optimization. For example, if the sorting key is &lt;code&gt;[a, b, c, d]&lt;/code&gt; and your query filters on &lt;code&gt;[a, b]&lt;/code&gt;, this is not considered optimizable. Why? Because the query is already aligned with the sorting key and already uses the primary index efficiently; a new optimization wouldn’t help.&lt;/p&gt;
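&lt;p&gt;Putting the cardinality ordering and the prefix rule together, the suggestion step might look roughly like this. It is a simplified sketch: the function and its inputs are assumptions, and the real heuristic additionally weights columns by whether they are used for point or range filtering:&lt;/p&gt;

```python
# Hedged sketch of the sorting-key suggestion described above.
# Names are illustrative, not Tinybird's implementation.

def suggest_sorting_key(filter_cardinalities, current_key, max_columns=4):
    # Lower-cardinality columns first: they partition data into coarse
    # groups; higher-cardinality columns refine filtering within them.
    candidate = sorted(filter_cardinalities, key=filter_cardinalities.get)
    candidate = candidate[:max_columns]
    # A suggestion that is a prefix of the existing sorting key adds
    # nothing: the primary index already serves these filters.
    if candidate == current_key[:len(candidate)]:
        return None
    return candidate
```

&lt;p&gt;For instance, with hypothetical columns, filtering on a high-cardinality &lt;code&gt;user_id&lt;/code&gt; and a low-cardinality &lt;code&gt;tenant&lt;/code&gt; would yield the suggestion &lt;code&gt;[tenant, user_id]&lt;/code&gt;, while filters that match a prefix of the current key return no suggestion.&lt;/p&gt;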
&lt;p&gt;Otherwise, we will proceed to create a new data source schema with the suggested sorting key.&lt;/p&gt;
&lt;p&gt;The decision to just use data source sorting keys to improve the performance of a query comes from years of experience debugging customer query performance issues. &lt;a href=&quot;https://www.tinybird.co/docs/work-with-data/optimization&quot;&gt;As we explained in our optimization guides&lt;/a&gt;, most of the time a query is not performing as expected is due to a bad set of sorting keys.&lt;/p&gt;
&lt;h3 id=&quot;performance-validation&quot;&gt;Performance Validation&lt;/h3&gt;
&lt;p&gt;But that’s not all: we log and monitor optimizations to track the impact they have on query performance. We only keep optimizations that are still useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Optimizations that are no longer used because clients stopped calling the endpoint will be removed,&lt;/strong&gt; but we’ll allow them to be created again if endpoint calls resume. If the query we tried to optimize is not executed within 24 hours, we discard the optimization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimizations that are not being used despite endpoints being hit.&lt;/strong&gt; This could be caused by sorting keys being wrong or by any other unexpected issues. Those optimizations will be “banned” and won’t be created again. There are two ways to determine if an optimization is not being used:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Optimization not being used at all:&lt;/strong&gt; If the query is being called, but the optimization is not being used.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimization is being used but not filtering efficiently:&lt;/strong&gt; If the optimization is being used, but the query is reading more data than when using the original sorting key.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
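&lt;p&gt;The validation rules above amount to a small lifecycle policy for each optimization. As a sketch (the function, its inputs, and the enum are illustrative, not the actual monitoring code):&lt;/p&gt;

```python
# Illustrative lifecycle policy for an optimization, following the
# keep / remove / ban rules described above.
from enum import Enum

class Decision(Enum):
    KEEP = "keep"
    REMOVE = "remove"   # may be recreated if traffic returns
    BAN = "ban"         # will not be created again

def review_optimization(query_seen_last_24h, optimization_used,
                        rows_read_optimized, rows_read_original):
    # Endpoint no longer called: drop the optimization, allow it back later.
    if not query_seen_last_24h:
        return Decision.REMOVE
    # Traffic exists but the optimized schema is unused, or it reads more
    # data than the original sorting key did: ban the optimization.
    if not optimization_used or rows_read_optimized > rows_read_original:
        return Decision.BAN
    return Decision.KEEP
```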
&lt;img src=&quot;/_vercel/image?url=_astro%2Fquery-booster-2.DBSTOpu1.png&amp;#38;w=1920&amp;#38;q=100&quot; alt=&quot;Query Booster&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1600&quot; width=&quot;1600&quot; height=&quot;340&quot;&gt;
&lt;small&gt;Another example of Query Booster in action.&lt;/small&gt;
&lt;p&gt;Coming back to the surprised customer, the results were good:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2x reduction in query times for their most common workloads&lt;/li&gt;
&lt;li&gt;Decreased CPU usage for the same queries&lt;/li&gt;
&lt;li&gt;Zero configuration or maintenance required&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And while this particular case showed a 2x improvement, we’ve seen cases where the optimization leads to 100x faster queries.&lt;/p&gt;
&lt;h3 id=&quot;intelligence-matters&quot;&gt;Intelligence matters&lt;/h3&gt;
&lt;p&gt;The beauty of Query Booster lies in its simplicity from a user’s perspective. You don’t need to be a database expert or spend time manually optimizing your queries. The system handles that for you, continuously learning and adapting to your specific usage patterns.&lt;/p&gt;
&lt;p&gt;Remember: The best optimizations are the ones you don’t have to think about. Query Booster embodies this principle by automatically improving your database performance while you focus on what matters most: your application and business logic.&lt;/p&gt;</content:encoded></item><item><title>2025W14</title><link>https://jordivillar.com/reads/2025w14/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w14/</guid><pubDate>Sun, 13 Apr 2025 12:26:32 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://simone.org/advertising/&quot;&gt;What If We Made Advertising Illegal?&lt;/a&gt; — The idea of making advertising illegal may sound silly, but I couldn’t agree more. I’ve been unlucky enough to work in the industry, and I can’t imagine a better society while its current practices continue.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pastori.sh/blog/on-kindness&quot;&gt;Notes on kindness&lt;/a&gt; — “So eventually, if you care, you leave”&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://julian.digital/2025/03/27/the-case-against-conversational-interfaces/&quot;&gt;The case against conversational interfaces&lt;/a&gt; — Conversational interfaces, like voice assistants and chatbots, often promise a new way to interact with technology but fail to replace traditional computing methods. Instead of replacing existing tools, AI should enhance them, allowing for seamless interactions that feel effortless.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://refactoringenglish.com/chapters/write-blog-posts-developers-read/&quot;&gt;How to Write Blog Posts that Developers Read&lt;/a&gt; — Didn’t like the main topic of the article since it’s focused on ways to attract readers to your blog posts. When I write I do it for myself and I expect other personal blogs to do the same. Anyway, some of the advice is good even if you’re not focused on attracting readers.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.anthropic.com/research/tracing-thoughts-language-model&quot;&gt;Tracing the thoughts of a large language model&lt;/a&gt; — The Anthropic team explains how they are working on understanding Claude’s internal “reasoning” processes. This research branch could help improve AI reliability and transparency.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://elixir-lang.org/blog/2025/03/25/cyanview-elixir-case/&quot;&gt;Cyanview: Coordinating Super Bowl’s visual fidelity with Elixir&lt;/a&gt; — Leaving aside the Elixir infomercial, this is an interesting read about how a team of nine has managed to build such an incredible product. It might sound simple, but as someone who has worked with remote cameras in the past, I can tell it’s more complex than it sounds.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://notes.eatonphil.com/2025-03-27-things-that-go-wrong-with-disk-io.html&quot;&gt;Things that go wrong with disk IO&lt;/a&gt; — Short article emphasizing the critical importance of maintaining data integrity when developing applications that rely on disk interactions.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2025W11</title><link>https://jordivillar.com/reads/2025w11/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w11/</guid><pubDate>Sun, 23 Mar 2025 20:12:30 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.tinybird.co/blog-posts/tb-deploy?utm_campaign=Forward%20Launch&quot;&gt;Less ceremony, more shipping&lt;/a&gt; — Tinybird introduces their new deployment tool, which simplifies data schema changes for teams. This tool automates the entire deployment process, ensuring no downtime and minimizing errors. By applying software engineering best practices to data workflows, it allows developers to focus more on building rather than troubleshooting.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/hollylawly/status/1897332070571250144/?rw_tt_thread=True&quot;&gt;PlanetScale redesign&lt;/a&gt; — Twitter thread about how PlanetScale redesigned their website and the impact it had. And &lt;a href=&quot;https://x.com/hollylawly/status/1897427145774772365&quot;&gt;a follow-up&lt;/a&gt; showing how the original redesign was done using Google Docs!&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://planetscale.com/blog/io-devices-and-latency&quot;&gt;IO devices and latency&lt;/a&gt; — Nice post about IO device latency and how it has evolved. It comes with excellent illustrations. The author also wrote a &lt;a href=&quot;https://x.com/benjdicken/status/1901659372838928631?s=46&quot;&gt;Twitter thread&lt;/a&gt; about the making of the post and the impact these kinds of educational technical posts have on their marketing.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.seangoedecke.com/good-times-are-over/&quot;&gt;The good times in tech are over&lt;/a&gt; — I don’t personally like the post, but it’s true. Good times are over for &lt;a href=&quot;https://jordivillar.com/blog/beyond-code&quot;&gt;software engineers that were not aware of what their job was&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://curiosity.ventures/posts/before-ai-grew-up&quot;&gt;Before AI Grew Up&lt;/a&gt; — A guy explaining to his kids in the future how AI feels nowadays. Nicely written and hopefully accurate.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2025W09</title><link>https://jordivillar.com/reads/2025w09/</link><guid isPermaLink="true">https://jordivillar.com/reads/2025w09/</guid><pubDate>Sun, 09 Mar 2025 13:10:52 GMT</pubDate><content:encoded>&lt;p&gt;It’s been a while since the last time I published a post with the list of articles I’ve read. This one contains the highlights from the past few months.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://fly.io/blog/wrong-about-gpu/?utm_source=hackernewsletter&amp;#x26;utm_medium=email&amp;#x26;utm_term=fav&quot;&gt;We Were Wrong About GPUs&lt;/a&gt; — Fly.io invested in GPU Machines to support AI/ML tasks for developers, but they learned that most developers prefer using APIs for AI instead of managing GPUs themselves. Despite the challenges and costs of deploying GPUs, they found that the demand for these machines was lower than expected. The company is now scaling back its GPU ambitions while focusing on improving its core product offerings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://marginalrevolution.com/marginalrevolution/2025/02/why-i-think-ai-take-off-is-relatively-slow.html&quot;&gt;Why I think AI take-off is relatively slow&lt;/a&gt; — Tyler Cowen argues that the rapid adoption of AI is slowed down by inefficiencies in less productive sectors and human bottlenecks, like regulatory delays. He believes that while AI can improve productivity, it will not transform the economy as quickly as expected due to the challenges of integrating it with human work. Overall, Cowen suggests that AI might boost economic growth modestly, but noticeable changes will take time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.robinsloan.com/lab/is-it-okay/&quot;&gt;Is it okay?&lt;/a&gt; — The author discusses the impact of language models on creativity and copyright, arguing that their reliance on the entire internet raises ethical concerns. He believes that if these models only replicate existing content without contributing to human creativity, it is problematic. However, if they lead to significant advancements in science and technology, their use may be justified.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.lewissociety.org/innerring/&quot;&gt;The Inner Ring&lt;/a&gt; — The concept of the “Inner Ring” describes the human desire to belong to exclusive groups, which can lead to negative actions and feelings of exclusion. This longing for acceptance can overshadow genuine connections and fulfillment in life. True happiness comes from focusing on meaningful work and relationships rather than chasing after superficial insider status.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://s2.dev/blog/intro&quot;&gt;Introducing S2&lt;/a&gt; — Introduction to S2, a new approach to streaming data storage.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://mitchellh.com/writing/ghostty-1-0-reflection&quot;&gt;Ghostty: Reflecting on Reaching 1.0&lt;/a&gt; — This is a personal reflection on the project.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://waitbutwhy.com/2014/02/pick-life-partner-part-2.html&quot;&gt;How to Pick Your Life Partner - Part 2&lt;/a&gt; — To endure 20,000 days, 100 vacations, and 100,000 leisure hours with another human being and do so happily, there are three key ingredients necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://waitbutwhy.com/2014/02/pick-life-partner.html&quot;&gt;How to Pick Your Life Partner - Part 1&lt;/a&gt; — Given that the choice of life partner is by far the most important thing in life to get right, how is it possible that so many smart people get it so wrong?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://eieio.games/essays/the-secret-in-one-million-checkboxes/&quot;&gt;The secret inside One Million Checkboxes&lt;/a&gt; — Teens wrote secret binary messages in One Million Checkboxes. The author found them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://37signals.com/group-chat-problems/&quot;&gt;Group Chat: The Best Way to Totally Stress Out Your Team&lt;/a&gt; — The perils of the modern communications conveyor belt that never ends, divides your attention, fractures your time, and chains you to FOMO.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://oxide.computer/blog/reflections-on-founder-mode&quot;&gt;Reflections on Founder Mode&lt;/a&gt; — Reflections on a recent Paul Graham piece – and on the culture at Oxide&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://paulgraham.com/foundermode.html&quot;&gt;Founder Mode&lt;/a&gt; — At a YC event last week Brian Chesky gave a talk that everyone who
was there will remember. Most founders I talked to afterward said it was the best they’d ever heard. Ron Conway, for the first time in his life, forgot to take notes. I’m not going to try to reproduce it here. Instead I want to talk about a question it raised.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://notes.eatonphil.com/2024-08-24-obsession.html&quot;&gt;Obsession&lt;/a&gt; — Reflections on the concept of healthy obsession and how it can be a source of motivation and success.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://stevedylan.dev/posts/leaving-neovim-for-zed/&quot;&gt;Leaving Neovim for Zed&lt;/a&gt; — A journey through text editors and how I landed on Zed after years of Neovim&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://litestream.io/blog/why-i-built-litestream/&quot;&gt;Why I Built Litestream&lt;/a&gt; — Despite an exponential increase in computing power, our applications require more machines than ever because of architectural decisions made 25 years ago. You can eliminate much of your complexity and cost by using SQLite &amp;#x26; Litestream for your production applications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://registerspill.thorstenball.com/p/its-in-the-stories&quot;&gt;It’s in the stories&lt;/a&gt; — The author reflects on the importance of storytelling in leadership, sharing anecdotes about influential figures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.codingconfessions.com/p/simultaneous-multithreading&quot;&gt;Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood&lt;/a&gt; — Simultaneous multithreading (SMT) is a feature that lets a processor handle instructions from two different threads at the same time. But have you ever wondered how this actually works?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://cedardb.com/blog/german_strings/&quot;&gt;Why German Strings are Everywhere&lt;/a&gt; — German Strings are a custom string format developed for optimized data processing, adopted by many databases, focusing on performance and memory efficiency.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.warpstream.com/blog/the-kafka-metric-youre-not-using-stop-counting-messages-start-measuring-time&quot;&gt;The Kafka Metric You’re Not Using: Stop Counting Messages, Start Measuring Time&lt;/a&gt; — Traditional offset-based monitoring can be misleading due to varying message sizes and consumption rates. To address this, you can introduce a time-based metric for a more accurate assessment of consumer group lag.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>130 hours</title><link>https://jordivillar.com/notes/130-hours/</link><guid isPermaLink="true">https://jordivillar.com/notes/130-hours/</guid><pubDate>Sun, 29 Dec 2024 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;For the last few years, especially after the pandemic forced us to switch to remote work, I’ve been searching for ways to break free from a stressful, sedentary lifestyle.&lt;/p&gt;
&lt;p&gt;I’ve been trying to be more consistent. I’ve tried different approaches, but each attempt would start with enthusiasm, only to gradually fade away.&lt;/p&gt;
&lt;p&gt;This year something was different. I still don’t know what it was, but I’ve been able to keep making progress. I’ve managed to work out for over 130 hours: almost a 70% increase from last year, and around 20 minutes a day.&lt;/p&gt;
&lt;p&gt;These numbers aren’t impressive, but they represent something far more valuable for me: consistency. I’m proud not just of the numbers, but of maintaining a regular routine despite facing significant personal challenges during the second half of the year.&lt;/p&gt;
&lt;p&gt;Next year, I’ll try to keep making progress. I’m going to keep making progress. &lt;strong&gt;I’m going to keep making progress.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;em&gt;This is the closest I’ll ever publicly get to a year review and a new year resolution. I’ve created &lt;a href=&quot;/workouts&quot;&gt;this page to keep track of my progress&lt;/a&gt;.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;</content:encoded></item><item><title>Beyond Code</title><link>https://jordivillar.com/blog/beyond-code/</link><guid isPermaLink="true">https://jordivillar.com/blog/beyond-code/</guid><pubDate>Fri, 06 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Let’s get real for a moment. I’ve been in the software engineering game for a while now. And yet, the impostor syndrome has been haunting me from time to time: what makes me valuable? I’ve been lucky to work with some amazing engineers who can solve any technical challenge, and I’ve seen them struggle with the same question.&lt;/p&gt;
&lt;p&gt;Here’s what I believe: our most valuable skills aren’t about how many programming languages we can juggle or how many complex algorithms we can solve.&lt;/p&gt;
&lt;h3 id=&quot;the-hard-skills-trap&quot;&gt;The Hard Skills Trap&lt;/h3&gt;
&lt;p&gt;Remember when we all thought being a rockstar engineer meant knowing every framework and writing the most mind-blowingly complex code? Yeah, I’ve been there. I used to look at my colleagues who could deep dive into low-level technical rabbit holes like the internals of Linux and feel like I was somehow falling short.&lt;/p&gt;
&lt;p&gt;Early in my career, I was obsessed with proving my technical prowess. I’d spend countless hours trying to optimize code that was already working, learning obscure programming techniques, and collecting certifications like they were badges of honor. I thought these were the markers of a great engineer. Spoiler alert: they’re not.&lt;/p&gt;
&lt;p&gt;Spoiler alert: I wasn’t falling short. I was just playing a different game.&lt;/p&gt;
&lt;h3 id=&quot;what-you-should-actually-be-good-at&quot;&gt;What You Should Actually Be Good At&lt;/h3&gt;
&lt;p&gt;Here’s the thing: the real superpower is understanding how stuff works. And I mean really works. Not just the code, but everything around it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What does the product actually need?&lt;/li&gt;
&lt;li&gt;What is the client really trying to solve?&lt;/li&gt;
&lt;li&gt;How does this fit into the bigger picture?&lt;/li&gt;
&lt;li&gt;Will this solution actually make someone’s life easier?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;ship-it-dont-perfect-it&quot;&gt;Ship It, Don’t Perfect It&lt;/h3&gt;
&lt;p&gt;I’m a huge believer in the “done is better than perfect” philosophy. I’m driven by action and the smallest possible changes that can be shipped to deliver value. And trust me, this isn’t about being lazy. It’s about understanding that software is about delivering solutions to actual problems, not an artisan job where you are crafting museum-like pieces of code.&lt;/p&gt;
&lt;p&gt;Want to know a good approach? If something works and solves the core problem, ship it. Iterate later. The world doesn’t need another project stuck in endless refinement limbo.&lt;/p&gt;
&lt;h3 id=&quot;the-skills-they-dont-teach-in-bootcamp&quot;&gt;The Skills They Don’t Teach in Bootcamp&lt;/h3&gt;
&lt;p&gt;The most valuable things I’ve learned aren’t in any curriculum:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How to explain tech stuff to non-tech people without making their eyes glaze over&lt;/li&gt;
&lt;li&gt;Seeing the entire system you are building, not just the tiny technical corner&lt;/li&gt;
&lt;li&gt;Making decisions that balance “what’s possible” with “what’s actually needed”&lt;/li&gt;
&lt;li&gt;Moving fast and adapting quickly&lt;/li&gt;
&lt;li&gt;Understanding that every technical decision has a human impact&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What’s fascinating is that these skills aren’t static. They evolve constantly. Every project is a new opportunity to understand more deeply, to communicate more clearly, to solve problems more effectively. The moment you think you’ve mastered these skills is the moment you stop growing.&lt;/p&gt;
&lt;p&gt;The flip side of this ongoing learning is that most of the time you will need to trust your gut and make decisions based on the context, even when you don’t have all the information. It’s about developing an intuition that comes from experience, not from perfect knowledge. Sometimes, a well-informed guess guided by your accumulated understanding is more valuable than paralysis by analysis.&lt;/p&gt;
&lt;h3 id=&quot;take-it-or-leave-it&quot;&gt;Take It or Leave It&lt;/h3&gt;
&lt;p&gt;To any engineer feeling like they’re not technical enough: Stop. Your ability to understand context, communicate clearly, and drive solutions is worth way more than being able to write the most elegant code.&lt;/p&gt;
&lt;p&gt;Technical skills? They’re important. But the ability to see the whole chess board? That’s the real game-changer.&lt;/p&gt;
&lt;p&gt;In this wild world of software engineering, being a great technician is cool. But being someone who can actually solve real problems and create value? That’s where the magic happens.&lt;/p&gt;
&lt;p&gt;The best engineers aren’t those who know the most, they’re those who understand the most. Understanding the problem, the people, the context, that’s the true skill that separates good engineers from great ones.&lt;/p&gt;
&lt;p&gt;So keep learning, stay curious, and remember: your most important line of code is the one that makes someone’s life better.&lt;/p&gt;</content:encoded></item><item><title>The Joy and Pain of Problem-Solving</title><link>https://jordivillar.com/notes/problem-solving/</link><guid isPermaLink="true">https://jordivillar.com/notes/problem-solving/</guid><pubDate>Sun, 06 Oct 2024 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Investigating challenging problems and bugs is an exciting experience that often leads to remarkable discoveries and personal growth. As a engineer, I’ve always found joy in decoding complex issues and diving deep into the unknown. The thrill of the hunt, the satisfaction of piecing together clues, and the ahá moment when everything falls into place are unique.&lt;/p&gt;
&lt;p&gt;One of the most rewarding aspects of problem-solving is the opportunity to learn new things. Every bug, every system failure, and every unexpected behavior is a chance to expand your knowledge and skillset. It’s like being a detective in the digital world, where each case brings its own unique challenges and lessons.&lt;/p&gt;
&lt;p&gt;The skills developed through problem-solving and investigation are not only valuable in professional settings but can also be applied to personal life. Embracing the challenge of investigating problems and bugs not only makes us better professionals but also equips us with valuable life skills. It teaches us to be persistent, adaptable, and open to learning.&lt;/p&gt;
&lt;p&gt;While the truths we uncover can be difficult or painful, they ultimately lead us to growth and a deeper understanding of ourselves and the world around us.&lt;/p&gt;
&lt;p&gt;Indeed, these problem-solving skills can sometimes lead to unexpected and challenging discoveries in our personal relationships, such as uncovering long-term, perfectly hidden lies and infidelities.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;em&gt;Slightly inspired by this thread: &lt;a href=&quot;https://twitter.com/rosapolis/status/1828014752112545924&quot;&gt;https://twitter.com/rosapolis/status/1828014752112545924&lt;/a&gt;&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;</content:encoded></item><item><title>Coding with AI</title><link>https://jordivillar.com/notes/coding-ai/</link><guid isPermaLink="true">https://jordivillar.com/notes/coding-ai/</guid><pubDate>Fri, 23 Aug 2024 00:00:00 GMT</pubDate><content:encoded>&lt;img src=&quot;/_vercel/image?url=_astro%2Fcoding-ai-1.QbckdsL1.png&amp;#38;w=640&amp;#38;q=100&quot; alt=&quot;Zed screenshot&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;512&quot; width=&quot;512&quot; height=&quot;311&quot;&gt;
&lt;p&gt;I’ve never been enthusiastic about the AI revolution.&lt;/p&gt;
&lt;p&gt;However, this week, I started using &lt;a href=&quot;https://zed.dev/&quot;&gt;Zed.dev&lt;/a&gt; together with their AI assistant integration. I’m surprised by the experience and efficiency gains. I still have reservations, but these tools can be valuable allies in tackling programming challenges and routine tasks.&lt;/p&gt;
&lt;p&gt;For example, a few days ago, I was working on a tool that needed to parse ClickHouse explain output&lt;sup&gt;&lt;a href=&quot;#user-content-fn-1&quot; id=&quot;user-content-fnref-1&quot; data-footnote-ref aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, an easy task that requires recursively traversing all the properties looking for a few specific ones.&lt;/p&gt;
&lt;p&gt;With a simple prompt, the model wrote a working version.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Write code to extract all primary key granules from explain_result. An example of explain_result here:&lt;/p&gt;
&lt;p&gt;{…}&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;def&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; extract_primary_key_granules&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;explain_result&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    def&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; traverse&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;node&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        granules = []&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;        if&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; isinstance&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(node, &lt;/span&gt;&lt;span style=&quot;color:#4EC9B0&quot;&gt;dict&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;            if&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; &amp;#39;Indexes&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; node:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;                for&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; index &lt;/span&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; node[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Indexes&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;                    if&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; index.get(&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Type&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;) == &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;PrimaryKey&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                        granules.append({&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                            &amp;#39;Keys&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: index.get(&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Keys&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, []),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                            &amp;#39;Initial Granules&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: index.get(&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Initial Granules&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                            &amp;#39;Selected Granules&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: index.get(&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Selected Granules&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                        })&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;            for&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; value &lt;/span&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; node.values():&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                granules.extend(traverse(value))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;        elif&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; isinstance&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(node, &lt;/span&gt;&lt;span style=&quot;color:#4EC9B0&quot;&gt;list&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;):&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;            for&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; item &lt;/span&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; node:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                granules.extend(traverse(item))&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;        return&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; granules&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;    return&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; traverse(explain_result)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A9955&quot;&gt;# Example usage:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;result = extract_primary_key_granules(explain_result)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;for&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; granule &lt;/span&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;in&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; result:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;    print&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;quot;Keys: &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;granule[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Keys&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;}&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;    print&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;quot;Initial Granules: &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;granule[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Initial Granules&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;}&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;    print&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;f&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;quot;Selected Granules: &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;granule[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;Selected Granules&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;}&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;    print&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The task was relatively easy, and I still had to make changes and improve the initial code. Despite that, this saved me at least an hour of back-and-forth writing the function and running tests to see if it worked as expected!&lt;/p&gt;
&lt;p&gt;I started using the Zed AI assistant with the ChatGPT integration. A day later, they released &lt;a href=&quot;https://zed.dev/blog/zed-ai&quot;&gt;their own integration&lt;/a&gt;, which I’m now testing together with Anthropic’s Claude model. I’m not even using Google for simple searches anymore.&lt;/p&gt;
&lt;section data-footnotes class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-1&quot;&gt;
&lt;p&gt;This is what a &lt;a href=&quot;https://gist.githubusercontent.com/jrdi/5934c25ce11e003877e1f319096d7d87/raw/f447390e7234db5ad2015ae7301f474a86df7b56/clickhouse_explain.json&quot;&gt;ClickHouse explain output&lt;/a&gt; looks like &lt;a href=&quot;#user-content-fnref-1&quot; data-footnote-backref aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded></item><item><title>2024W28</title><link>https://jordivillar.com/reads/2024w28/</link><guid isPermaLink="true">https://jordivillar.com/reads/2024w28/</guid><pubDate>Mon, 15 Jul 2024 00:00:00 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://robert.ocallahan.org/2024/06/browser-engine.html&quot;&gt;So You Want To Build A Browser Engine&lt;/a&gt; —
This article goes deep into everything that makes building a browser engine hard.
It’s impressive how browsers keep getting better and adding features while managing all of that complexity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.blef.fr/databricks-snowflake-and-the-future/?ref=data-news-newsletter/&quot;&gt;Databricks, Snowflake and the future&lt;/a&gt; —
This piece covers the recent Snowflake and Databricks conferences, focusing on how both companies are investing in Apache Iceberg.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://buttondown.email/jaffray/archive/in-codd-we-trust-or-not/&quot;&gt;The time I spent three months investigating a 7-year old bug and fixed it in 1 line of code&lt;/a&gt; —
An interesting story about spending three months tracking down a very old bug that, in the end, needed only one line of code to fix.
A nice read about debugging across the software and hardware stack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://tonsky.me/blog/crdt-filesync/&quot;&gt;Local, first, forever&lt;/a&gt; —
An insightful article on the local-first philosophy and the implementation of services without reliance on a centralized server.
Includes an introduction to CRDT and a sample implementation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec2024.pdf&quot;&gt;What Goes Around Comes Around… And Around…&lt;/a&gt; —
A retrospective review of the evolution of relational database management systems over the past two decades.
It gives a good overview of where things stand today and how they have evolved.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html&quot;&gt;A write-ahead log is not a universal part of durability&lt;/a&gt; —
Explores the concept of durability in databases, challenging the notion of a write-ahead log as a universal component.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://utcc.utoronto.ca/~cks/space/blog/tech/FsyncDurabilityVsIntegrity&quot;&gt;Unix’s fsync(), write ahead logs, and durability versus integrity&lt;/a&gt; —
Expands on Phil Eaton’s post about database durability, incorporating the crucial aspect of integrity into the discussion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://paulgraham.com/persistence.html&quot;&gt;The Right Kind of Stubborn&lt;/a&gt; —
Paul Graham talks about two types of stubbornness: persistence and obstinacy.
He explains that it’s good to keep pushing when you need to, but bad to refuse to change your mind.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2024W24</title><link>https://jordivillar.com/reads/2024w24/</link><guid isPermaLink="true">https://jordivillar.com/reads/2024w24/</guid><pubDate>Fri, 21 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://zed.dev/blog/zed-decoded-rope-sumtree&quot;&gt;Zed Decoded: Rope &amp;#x26; SumTree&lt;/a&gt; —
The Zed team explained how they use B+Tree (or SumTree) to deal with file content in their text editor.
I had never considered how text editors use specialized data structures to represent and manipulate huge strings while maintaining good performance.
Interesting read.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://avi.im/blag/2024/sqlite-bad-rep/&quot;&gt;Why does SQLite (in production) have such a bad rep?&lt;/a&gt; —
A short post about the controversy around SQLite, a database often perceived as unsuitable for production.
The most interesting part is the discussion it sparked on &lt;a href=&quot;https://lobste.rs/s/fxkk7v/why_does_sqlite_production_have_such_bad&quot;&gt;lobste.rs&lt;/a&gt; and &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1djkt2y/why_does_sqlite_in_production_have_such_a_bad_rep&quot;&gt;Reddit&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://buttondown.email/jaffray/archive/in-codd-we-trust-or-not/&quot;&gt;In Codd we Trust (or not)&lt;/a&gt; —
A brief reflection on Codd’s original idea, which separates logical and physical schema, and the significant role query planners and optimizers play in modern database performance.
While I understand the point of letting experienced users write their queries and execute them as they are, I can’t fully agree.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.nelhage.com/post/some-opinionated-sql-takes/&quot;&gt;Some opinionated thoughts on SQL databases&lt;/a&gt; —
Another opinionated reflection on relational databases and SQL.
I agree with some points. Still, SQL is here to stay.
The point that resonated most is how inefficient it is to use SQL as an API in some workloads. I’ve seen query parsing take more than 50% of the processing time in low-latency, high-throughput environments. Of course, those numbers are for very specific use cases (e.g., point queries) where the data being read is small and the whole query takes just a few milliseconds. Still, the overhead is relevant, and there is little room for improvement when using plain SQL.
On the other hand, it is convenient that almost all relational databases expose the same query language. If you are an experienced PostgreSQL user, you can query a MySQL database without investing hours in learning a new API.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://luminousmen.com/post/senior-engineer-fatigue&quot;&gt;Senior Engineer Fatigue&lt;/a&gt; —
An article to justify why older people get slower at their jobs.
Jokes aside, it’s a good read about why sometimes you need to slow down to get faster. And it applies not only to senior engineers!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://motherduck.com/blog/olap-database-in-browser/&quot;&gt;What Happens When You Put a Database in Your Browser?&lt;/a&gt; —
The browser is the new OS. A DuckDB article about running their database in the browser using Wasm. The post gives a good example: previewing a Parquet file’s schema on the fly from the browser.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>2024W23</title><link>https://jordivillar.com/reads/2024w23/</link><guid isPermaLink="true">https://jordivillar.com/reads/2024w23/</guid><pubDate>Sat, 15 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://waitbutwhy.com/2014/12/what-makes-you-you.html&quot;&gt;What Makes You You?&lt;/a&gt; —
Article exploring theories of personal identity and what makes us who we are&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://yaledailynews.com/blog/2022/03/29/how-people-fall-apart-yale-faculty-discuss-the-impact-of-burnout-on-the-brain/&quot;&gt;How people fall apart&lt;/a&gt; —
Essay discussing burnout’s impact on the brain&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://mazzo.li/posts/mac-distributed-tx.html&quot;&gt;Message authentication codes for safer distributed transactions&lt;/a&gt; —
A short post on using message authentication codes to prevent bugs in distributed systems&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.resonatehq.io/deterministic-simulation-testing&quot;&gt;Deterministic Simulation Testing&lt;/a&gt; —
A brief introduction to Deterministic Simulation Testing. A crucial framework for building reliable distributed systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sqlite.org/draft/whybytecode.html&quot;&gt;Why SQLite Uses Bytecode&lt;/a&gt; —
An explanation of why SQLite uses bytecode for easier understanding and debugging.
The most exciting thing is that this comes from an interesting convo on Twitter&lt;sup&gt;&lt;a href=&quot;#user-content-fn-1&quot; id=&quot;user-content-fnref-1&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-1&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/DRichardHipp/status/1785037995101290772&quot;&gt;https://twitter.com/DRichardHipp/status/1785037995101290772&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-1&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded></item><item><title>/proc/thread-self</title><link>https://jordivillar.com/notes/proc/</link><guid isPermaLink="true">https://jordivillar.com/notes/proc/</guid><pubDate>Thu, 02 May 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I’ve never invested much time in understanding how operating systems work. It has never been on my interests list, nor have I considered it useful knowledge for my career. What can I say? We all make bad decisions in our lives.&lt;/p&gt;
&lt;p&gt;Over the last two years, I have invested a lot of time in understanding how some things work in Linux. Those are the consequences (or benefits?) of working with a complex and low-level system like a database.&lt;/p&gt;
&lt;p&gt;This week, I was working on tracking data that was read from the page cache. While debugging some ClickHouse metrics directly collected from the Kernel, I discovered the &lt;code&gt;/proc/thread-self&lt;/code&gt; directory&lt;sup&gt;&lt;a href=&quot;#user-content-fn-1&quot; id=&quot;user-content-fnref-1&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;/proc/thread-self&lt;/p&gt;
&lt;p&gt;When a thread accesses this magic symbolic link, it resolves to the
process’s own /proc/self/task/[tid] directory.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So &lt;code&gt;/proc/thread-self&lt;/code&gt; points to the &lt;code&gt;proc&lt;/code&gt; folder for the current thread. In those folders, you can find a lot of helpful information. In my case, I was interested in &lt;code&gt;/proc/thread-self/io&lt;/code&gt; &lt;sup&gt;&lt;a href=&quot;#user-content-fn-2&quot; id=&quot;user-content-fnref-2&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, where you have IO statistics.&lt;/p&gt;
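As a minimal sketch of what working with those counters looks like, here is a small Python helper that parses the `key: value` lines of a `/proc/[pid]/io` file. The field names and sample values come from the kernel documentation's example; the sample string stands in for the real file, which only exists on Linux:

```python
def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/[pid]/io into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip().isdigit():
            counters[key.strip()] = int(value.strip())
    return counters

# On Linux you could read the real per-thread counters with:
#   open("/proc/thread-self/io").read()
# Here we parse the example payload from the kernel docs instead.
sample = """rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0"""

io = parse_proc_io(sample)
# rchar counts bytes read through read()-like syscalls, including
# reads satisfied by the page cache, while read_bytes only counts
# bytes that actually hit the storage layer.
print(io["rchar"], io["read_bytes"])
```

The gap between `rchar` and `read_bytes` is exactly what makes these counters useful for telling page-cache hits apart from real disk reads.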
&lt;p&gt;I was focused on investigating whether the Kernel reported bytes read from S3 inside &lt;code&gt;rchar&lt;/code&gt;. I shared more information in &lt;a href=&quot;https://github.com/ClickHouse/ClickHouse/pull/63276&quot;&gt;this PR&lt;/a&gt; in the ClickHouse repository. Although the PR was closed, the examples and the reproducer I shared there are still useful for understanding how the system behaves.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 class=&quot;sr-only&quot; id=&quot;footnote-label&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-1&quot;&gt;
&lt;p&gt;There is also a &lt;code&gt;/proc/self&lt;/code&gt; for the current process. This is something I didn’t know either. &lt;a href=&quot;#user-content-fnref-1&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 1&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-2&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.kernel.org/filesystems/proc.html#proc-pid-io-display-the-io-accounting-fields&quot;&gt;https://docs.kernel.org/filesystems/proc.html#proc-pid-io-display-the-io-accounting-fields&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-2&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to reference 2&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;</content:encoded></item><item><title>Data Engineering papers/articles</title><link>https://jordivillar.com/notes/papers/</link><guid isPermaLink="true">https://jordivillar.com/notes/papers/</guid><pubDate>Sat, 04 Nov 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It’s quite common to find folks sharing papers on minor breakthroughs in areas like NLP, Computer Vision, Machine Learning, Deep Learning, and related fields here.&lt;/p&gt;
&lt;p&gt;We’ve carefully gathered a collection of papers focused on databases, distributed systems, and data in general. We’ve explored the latest developments in these areas and gained valuable insights from the many articles in our list.&lt;/p&gt;
&lt;p&gt;Some of these publications were suggested to us after we made our list public, adding to the richness of our curated collection.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf&quot;&gt;MapReduce: Simplified Data Processing on Large Clusters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/Spark-Cluster-Computing-with-Working-Sets.pdf&quot;&gt;Spark: Cluster Computing with Working Sets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://notes.stephenholiday.com/Kafka.pdf&quot;&gt;Kafka: a Distributed Messaging System for Log Processing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36632.pdf&quot;&gt;Dremel: Interactive Analysis of Web-Scale Datasets&lt;/a&gt;: A paper describing the technology behind Google BigQuery&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45a6cea2b9c101761ea1b51c961628093ec1d5da.pdf&quot;&gt;Procella: Unifying serving and analytical data at YouTube&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying&quot;&gt;The Log: What every software engineer should know about real-time data’s unifying abstraction&lt;/a&gt;: Not a paper but an article from one of the Kafka creators. He explains the basic data structure, key for many databases, and modern distributed systems.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://erlang.org/download/armstrong_thesis_2003.pdf&quot;&gt;Making reliable distributed systems in the presence of software errors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pages.lip6.fr/Marc.Shapiro/papers/RR-7687.pdf&quot;&gt;Conflict-free Replicated Data Types (CRDT)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/pdf/1603.01529&quot;&gt;Delta State Replicated Data Types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.microsoft.com/en-us/research/uploads/prod/2016/12/Time-Clocks-and-the-Ordering-of-Events-in-a-Distributed-System.pdf&quot;&gt;Time, Clocks and the Ordering of Events in a Distributed System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf&quot;&gt;Dynamo: Amazon’s Highly Available Key-value Store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf&quot;&gt;Linearizability: A Correctness Condition for Concurrent Objects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p422-bloom.pdf&quot;&gt;Space/Time Trade-offs in Hash Coding with Allowable Errors&lt;/a&gt;: Not especially interesting reading but had to mention Bloom Filters and this is the original paper. Perhaps the most surprising data structure I discovered working with data.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dataintensive.net&quot;&gt;Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems&lt;/a&gt;: Not a paper but a book. THE book for anyone interested in databases, data, and distributed systems.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/2517349.2522738&quot;&gt;Naiad: A Timely Dataflow System&lt;/a&gt;: Precursor paper of &lt;a href=&quot;https://materialize.com&quot;&gt;Materialize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/2882903.2903741&quot;&gt;The Snowflake Elastic Data Warehouse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf&quot;&gt;The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Resolving a year-long ClickHouse lock contention</title><link>https://jordivillar.com/blog/clickhouse-lock-contention/</link><guid isPermaLink="true">https://jordivillar.com/blog/clickhouse-lock-contention/</guid><pubDate>Mon, 30 Oct 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Original post: &lt;a href=&quot;https://www.tinybird.co/blog-posts/clickhouse-lock-contention&quot;&gt;https://www.tinybird.co/blog-posts/clickhouse-lock-contention&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is a story about completely saturating a ClickHouse replica (and being happy about it). We were able to fix a long-running issue that limited our query concurrency, and we increased CPU utilization to 100% in the process.&lt;/p&gt;
&lt;h3 id=&quot;the-unknown-limit&quot;&gt;The unknown limit&lt;/h3&gt;
&lt;p&gt;The story begins about a year ago. One of our ClickHouse clusters started to malfunction, causing some queries to slow down significantly. At first glance, everything seemed fine. We checked the usual bottlenecks - CPU, memory, I/O - and all checked out. But still, at a certain number of requests per second, query responses on the replica would slow down. We couldn’t figure out why.&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2Fcpu_average--1-.jpg&amp;#38;w=750&amp;#38;q=100&quot; alt=&quot;A Grafana dashboard showing average CPU utilization for a ClickHouse cluster over the span of 12 hours. Average utilization was well below 20%.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;773&quot; width=&quot;773&quot;&gt;
&lt;small&gt;The CPU was just chillin’ at &amp;lt;20%.&lt;/small&gt;
&lt;p&gt;We spent days digging into different metrics, checking query performance, assessing required resources, tweaking cloud provider settings, and many other things. We had many more hypotheses than we had real solutions. Maybe we could find and resolve some hidden bandwidth limits or improve concurrency by limiting the number of threads used per query? Everything was pointing to some kind of contention, but we couldn’t see it. &lt;em&gt;“We must be missing something”&lt;/em&gt; was a common refrain in those days.&lt;/p&gt;
&lt;p&gt;As engineers, we would have loved to dive deep into this issue, understand what was happening, and develop some novel fix. But we have to balance our itch to solve problems with some pragmatism; we still have a &lt;a href=&quot;https://www.tinybird.co/product&quot;&gt;real-time data platform&lt;/a&gt; to build, customers to support, and features to release. So we made a quick fix, slightly changing the cluster setup and isolating some workloads so we could move on to the next GitLab issue.&lt;/p&gt;
&lt;h4 id=&quot;some-context-about-tinybird-and-clickhouse&quot;&gt;Some context about Tinybird and ClickHouse&lt;/h4&gt;
&lt;p&gt;Tinybird is a real-time platform for data and engineering teams. We give our customers the power to unify data from many source systems (including databases, data warehouses, and streaming data), develop &lt;a href=&quot;https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide&quot;&gt;real-time analytics&lt;/a&gt; with SQL, and publish their queries as low-latency, scalable APIs.&lt;/p&gt;
&lt;p&gt;We are focused primarily on streaming ingestion and real-time use cases; our clients can ingest millions of events and make thousands of concurrent API requests per second. Under the hood, we use &lt;a href=&quot;https://github.com/clickhouse/clickhouse&quot;&gt;ClickHouse&lt;/a&gt; as our primary analytical database and real-time SQL “engine”. ClickHouse is an incredibly powerful open-source database, and we only use a subset of its features. Still, with the scale we support and our customers’ latency and throughput requirements, we often push it to the limit.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ClickHouse is an incredibly powerful open-source database, and we only use a subset of its features. Still, with the scale we support and our customers’ latency and throughput requirements, we often push it to the limit.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As such, we frequently contribute to the ClickHouse project by developing performance improvements that help us continue to support our customers’ scale.&lt;/p&gt;
&lt;h4 id=&quot;finding-the-root-cause&quot;&gt;Finding the root cause&lt;/h4&gt;
&lt;p&gt;Despite our quick fix, this resource contention became a recurring issue. We’d been able to dodge it by deploying more quick fixes, but that changed a few weeks ago. To support a particular use case for one of our clients, we were approaching a scale that made it impossible to ignore. It was time to invest the time and resources to fix it for good.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We had been able to avoid fixing this problem for almost a year, but certain customers began approaching a scale that made it impossible to ignore.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Every time we had faced this issue before and explored various metrics and profile events, we ended up in the same dead end. We saw lock-related profile events such as &lt;code&gt;RWLock*&lt;/code&gt; and discarded them since we knew they were not the root cause but merely a symptom.&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2Fcontext_lock_events.jpg&amp;#38;w=1200&amp;#38;q=100&quot; alt=&quot;A Grafana dashboard showing lock-related profile events. A small spike of ContextLockWait events is evident.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1293&quot; width=&quot;1293&quot;&gt;
&lt;small&gt;When examining lock-related profile events, we noticed a small spike in &lt;code&gt;ContextLockWait&lt;/code&gt; events&lt;/small&gt;
&lt;p&gt;This time was different. We noticed a small peak of &lt;code&gt;ContextLockWait&lt;/code&gt; events. After all this time surveying typical server metrics for clues and coming up empty, we finally had something promising!&lt;/p&gt;
&lt;p&gt;Unfortunately, we didn’t have a good way to measure its impact on performance. Unlike &lt;code&gt;RWLock*&lt;/code&gt; events, which have &lt;code&gt;*WaitMilliseconds&lt;/code&gt; properties that indicate how long they’ve been waiting to acquire the lock, &lt;code&gt;ContextLockWait&lt;/code&gt; events don’t come with a time-based equivalent that would have let us measure the time waiting for those locks. So we were flying blind, with no way to understand the consequences of these events.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In ClickHouse, &lt;code&gt;ContextLockWait&lt;/code&gt; events didn’t come with an equivalent time-based wait property, like &lt;code&gt;RWLockWaitMilliseconds&lt;/code&gt;, that would have allowed us to measure the performance impact of these events.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We decided to dump all threads during the incident and see if we could extract something from there. After a first look at the dump, it was clear that &lt;code&gt;ContextLock&lt;/code&gt; was, at least, one of the issues.&lt;/p&gt;
&lt;p&gt;So, we built a small reproducer to report the issue to the ClickHouse repository to help the contributors with the fix. While working on the reproducer, we sent the first improvement upstream, &lt;a href=&quot;https://github.com/ClickHouse/ClickHouse/pull/55029&quot;&gt;adding a profile event to report Context Lock waiting time&lt;/a&gt;. With this metric available, it became easy to find a query that, given sufficient concurrency, would cause the same contention we were seeing in our clusters. This moved our first roadblock out of the way.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We built a small reproducer and pushed a change to add a waiting time profile to &lt;code&gt;ContextLock&lt;/code&gt; events in ClickHouse.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;With the reproducer in place and a way to measure the impact of the contention, it was just a matter of digging into the code and figuring out how to reduce contention and improve performance. After quite a big refactor spearheaded by my colleague &lt;a href=&quot;https://maksimkita.com/&quot;&gt;Maksim Kita&lt;/a&gt;, we &lt;a href=&quot;https://github.com/ClickHouse/ClickHouse/pull/55121&quot;&gt;managed to remove lock contention&lt;/a&gt;, reducing the impact on our synthetic example by an order of magnitude.&lt;/p&gt;
&lt;p&gt;Here’s how we did it:&lt;/p&gt;
&lt;h3 id=&quot;context-refactoring&quot;&gt;Context refactoring&lt;/h3&gt;
&lt;p&gt;Here’s the basic architecture of Context in ClickHouse before the refactor:&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2Fcontextsharedpart.png&amp;#38;w=2048&amp;#38;q=100&quot; alt=&quot;A basic architecture diagram of ClickHouse Contexts, showing how ContextSharedPart and Context share a single global mutex.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;2000&quot; width=&quot;2000&quot;&gt;
&lt;small&gt;Before we addressed this issue, &lt;code&gt;ContextSharedPart&lt;/code&gt; and &lt;code&gt;Context&lt;/code&gt; instances all shared a single global mutex.&lt;/small&gt;
&lt;p&gt;In ClickHouse, &lt;code&gt;ContextSharedPart&lt;/code&gt; is responsible for storing global shared objects that are shared between all sessions and queries, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Thread pools&lt;/li&gt;
&lt;li&gt;Server paths&lt;/li&gt;
&lt;li&gt;Global trackers&lt;/li&gt;
&lt;li&gt;Clusters information&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;ContextSharedPart&lt;/code&gt; also provides a lot of useful methods for working with these objects with synchronization.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Context&lt;/code&gt; is responsible for storing query- or session-specific objects, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Per query settings&lt;/li&gt;
&lt;li&gt;Per query caches&lt;/li&gt;
&lt;li&gt;Current database&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;Context&lt;/code&gt; also provides many methods for working with these objects, and it uses &lt;code&gt;ContextSharedPart&lt;/code&gt; to provide some of that functionality.&lt;/p&gt;
&lt;p&gt;During query execution, ClickHouse can create a lot of &lt;code&gt;Contexts&lt;/code&gt; because each subquery in ClickHouse can have unique settings. For example:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; id, &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;  SELECT&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; id, &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;  FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; test_table &lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  SETTINGS max_threads = &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;16&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;WHERE&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; id &amp;gt; &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;10&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;SETTINGS max_threads = &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example, a nested subquery will have &lt;code&gt;max_threads = 16&lt;/code&gt;, requiring its own &lt;code&gt;Context&lt;/code&gt;.&lt;/p&gt;
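As a loose analogy (Python's `ChainMap`, not ClickHouse's actual C++ `Context` class), each subquery context layers its own setting overrides on top of its parent's settings and inherits everything it does not override:

```python
from collections import ChainMap

# Server-wide defaults, shared by every query.
server_defaults = {"max_threads": 8, "max_memory_usage": 10_000_000_000}

# The outer query overrides max_threads = 32 ...
query_context = ChainMap({"max_threads": 32}, server_defaults)

# ... and the nested subquery layers its own max_threads = 16 on top,
# still falling back to the parent for anything else.
subquery_context = query_context.new_child({"max_threads": 16})

print(query_context["max_threads"])          # 32
print(subquery_context["max_threads"])       # 16
print(subquery_context["max_memory_usage"])  # 10000000000 (inherited)
```

This layering is also why a single query with many subqueries ends up creating many context objects, which matters for the contention story below.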
&lt;p&gt;The problem was that a single mutex was used for most of the synchronization between &lt;code&gt;Context&lt;/code&gt; and &lt;code&gt;ContextSharedPart&lt;/code&gt;, even when we worked with objects local to &lt;code&gt;Context&lt;/code&gt;. A large number of low-latency, concurrent queries with many subqueries will create a lot of &lt;code&gt;Contexts&lt;/code&gt; per query, and the problem becomes even bigger.&lt;/p&gt;
&lt;p&gt;We did a big refactoring, replacing the single global mutex with two kinds of &lt;a href=&quot;https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock&quot;&gt;read-write mutexes&lt;/a&gt;: one global read-write mutex for &lt;code&gt;ContextSharedPart&lt;/code&gt; and one local read-write mutex for each &lt;code&gt;Context&lt;/code&gt;. We used read-write mutexes because most of the time we do many concurrent reads (for example, reading settings or a path) and rarely write concurrently.&lt;/p&gt;
&lt;p&gt;In many places, we completely got rid of synchronization where it was used for initialization and used &lt;code&gt;call_once&lt;/code&gt; for objects that are initialized only once.&lt;/p&gt;
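Here is a toy Python sketch of the locking change. The real refactor is C++ with read-write mutexes and `call_once`; Python's standard library has no read-write lock, so plain `Lock`s stand in for the idea that each context should guard only its own state:

```python
import threading

class ContextBefore:
    # One mutex shared by every instance: even a read of purely
    # local state serializes against every other context.
    _global_lock = threading.Lock()

    def __init__(self):
        self.settings = {"max_threads": 8}

    def get_setting(self, name):
        with ContextBefore._global_lock:
            return self.settings[name]

class ContextAfter:
    # Each context owns its lock, so independent contexts no longer
    # contend with each other. (In the C++ version, one-time
    # initialization is additionally hoisted out into call_once.)
    def __init__(self):
        self._lock = threading.Lock()
        self.settings = {"max_threads": 8}

    def get_setting(self, name):
        with self._lock:
            return self.settings[name]

old = ContextBefore()
a, b = ContextAfter(), ContextAfter()
# a and b can now be read concurrently without blocking each other.
print(old.get_setting("max_threads"))
print(a.get_setting("max_threads"), b.get_setting("max_threads"))
```

The behavior is identical in both versions; what changes is how many threads can hold their locks at the same time, which is exactly the contention the refactor removed.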
&lt;h4 id=&quot;the-context-architecture-after-our-refactor&quot;&gt;The Context architecture after our refactor&lt;/h4&gt;
&lt;p&gt;Here’s the way things looked after our refactor:&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2Fcontextsharedpartrefactor.png&amp;#38;w=2048&amp;#38;q=100&quot; alt=&quot;A diagram showing refactored ClickHouse contexts, which each Context getting its own individual read-write mutex.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;2000&quot; width=&quot;2000&quot;&gt;
&lt;small&gt;After refactoring, each &lt;code&gt;Context&lt;/code&gt; had its own read-write mutex.&lt;/small&gt;
&lt;p&gt;We also added &lt;a href=&quot;https://clang.llvm.org/docs/ThreadSafetyAnalysis.html&quot;&gt;clang ThreadSafetyAnalysis (TSA) annotations&lt;/a&gt;, so we can be sure that our refactoring does not introduce race conditions or deadlocks.&lt;/p&gt;
&lt;h3 id=&quot;bringing-the-cpu-to-100&quot;&gt;Bringing the CPU to 100%&lt;/h3&gt;
&lt;p&gt;Once both changes were accepted and merged into ClickHouse master, we decided to run some tests in a production-like environment. The main objective was to reproduce the issues and assess the impact of the fix by simulating our clients’ workload on two different ClickHouse versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;23.9: Contains the metric to measure the impact&lt;/li&gt;
&lt;li&gt;master: Contains the &lt;code&gt;ContextLock&lt;/code&gt; fix&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, we recreated the databases and tables, extracted 1,000 example queries from the last seven days, and ran them continuously for 10 minutes using &lt;code&gt;clickhouse-benchmark&lt;/code&gt; with enough concurrency:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;clickhouse&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; benchmark&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; -r&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; --ignore-error&lt;/span&gt;&lt;span style=&quot;color:#D7BA7D&quot;&gt; \&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    --concurrency=500&lt;/span&gt;&lt;span style=&quot;color:#D7BA7D&quot;&gt; \&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    --timelimit&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 600&lt;/span&gt;&lt;span style=&quot;color:#D7BA7D&quot;&gt; \&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    --max_execution_time=0&lt;/span&gt;&lt;span style=&quot;color:#D7BA7D&quot;&gt; \&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    --connect_timeout=20&lt;/span&gt;&lt;span style=&quot;color:#D7BA7D&quot;&gt; \&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    --cumulative&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; &amp;lt; &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;queries.txt&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h4 id=&quot;results-with-239&quot;&gt;Results with 23.9&lt;/h4&gt;
&lt;p&gt;Here are the results we saw on 23.9, where we implemented a way to monitor the impact of the Context lock contention:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;Stopping&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; launch&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; of&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; queries.&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; Requested&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; time&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; limit&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 600&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; seconds&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; is&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; exhausted.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;Queries&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; executed:&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 122933&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (12293.300%).&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;localhost:9000,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; queries:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 122084,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; errors:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 849,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; QPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 201.059,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; RPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 108762.836,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; MiB/s:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 4.354,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; result&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; RPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 3726.263,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; result&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; MiB/s:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 0.137.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;0.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;      0.004&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;10.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.353&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;20.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.620&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;30.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.732&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;40.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.834&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;50.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.997&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;60.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     1.386&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;70.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     2.594&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;80.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     4.559&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;90.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     5.195&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;95.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     5.780&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     18.171&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.900%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     73.734&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.990%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     75.998&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;┌─version&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;()───┬──────c─┬───d─┬───────w─┬─────max_w─┐&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;│&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 23.9.1.1854&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 122937&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 948&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 934.609&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 75545.504&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;└─────────────┴────────┴─────┴─────────┴───────────┘&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2F23_9--1-.jpg&amp;#38;w=1080&amp;#38;q=100&quot; alt=&quot;Two Grafana dashboards showing CPU utilization and load on a ClickHouse cluster&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1064&quot; width=&quot;1064&quot;&gt;
&lt;small&gt;Baseline results on 23.9&lt;/small&gt;
&lt;p&gt;From these results, you can see the baseline we achieved:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~200 QPS&lt;/li&gt;
&lt;li&gt;CPU utilization of only ~20%&lt;/li&gt;
&lt;li&gt;Half of the queries took at least 1s&lt;/li&gt;
&lt;li&gt;Slowest queries took ~75s&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;results-with-master&quot;&gt;Results with master&lt;/h4&gt;
&lt;p&gt;Here are the results we saw on master, where we had refactored Contexts to attempt to resolve our lock contention:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;Stopping&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; launch&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; of&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; queries.&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; Requested&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; time&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; limit&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 600&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; seconds&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; is&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; exhausted.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;Queries&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; executed:&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 398845&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (39884.500%).&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;localhost:9000,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; queries:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 395940,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; errors:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 2905,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; QPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 659.062,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; RPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 660165.301,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; MiB/s:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 11.490,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; result&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; RPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 12228.846,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; result&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; MiB/s:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 0.451.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;0.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;      0.001&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;10.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.150&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;20.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.320&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;30.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.442&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;40.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.574&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;50.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.682&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;60.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.783&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;70.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     0.902&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;80.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     1.021&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;90.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     1.211&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;95.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     1.408&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     2.050&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.900%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     4.103&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.990%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;     5.959&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;┌─version&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;()───┬──────c─┬───d─┬─w─┬─max_w─┐&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;│&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 23.10.1.131&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 398847&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 680&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 0&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 28.09&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;└─────────────┴────────┴─────┴───┴───────┘&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2Fmaster.jpg&amp;#38;w=1200&amp;#38;q=100&quot; alt=&quot;Two Grafana dashboards showing performance improvements after Context refactoring in ClickHouse&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1349&quot; width=&quot;1349&quot;&gt;
&lt;small&gt;With the fixes and at &lt;code&gt;concurrency=500&lt;/code&gt; we were able to 3x performance.&lt;/small&gt;
&lt;p&gt;With the refactor, we achieved these results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~600 QPS (3x better)&lt;/li&gt;
&lt;li&gt;~60% CPU utilization (3x better)&lt;/li&gt;
&lt;li&gt;~0.6s median query time (2x better)&lt;/li&gt;
&lt;li&gt;~6s for the slowest queries (12x better)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since tests with master were successful, we added more concurrency to &lt;code&gt;clickhouse-benchmark&lt;/code&gt; (1000 instead of 500) to see what would happen. Here are the results:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;Stopping&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; launch&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; of&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; queries.&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; Requested&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; time&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; limit&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 600&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; seconds&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; is&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; exhausted.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;Queries&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; executed:&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 581870&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (58187.000%).&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;localhost:9000,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; queries:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 577893,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; errors:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 3977,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; QPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 961.997,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; RPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 1896173.448,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; MiB/s:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 16.859,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; result&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; RPS:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 17730.009,&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; result&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; MiB/s:&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; 0.645.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;0.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.001&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;10.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.011&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;20.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.015&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;30.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.020&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;40.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.025&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;50.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.033&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;60.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.047&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;70.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.088&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;80.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.209&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;90.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.266&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;95.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.306&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.000%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		0.552&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.900%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		3.259&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;99.990%&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;		4.153&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; sec.&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;┌─version&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;()───┬──────c─┬──d─┬─w─┬──max_w─┐&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;│&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 23.10.1.131&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 581875&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 32&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 0&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 18.286&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;└─────────────┴────────┴────┴───┴────────┘&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;img src=&quot;/_vercel/image?url=%2Fclickhouse-lock-contention%2Fmaster_1000.jpeg&amp;#38;w=1080&amp;#38;q=100&quot; alt=&quot;Two Grafana dashboards showing a big performance improvement on ClickHouse.&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1088&quot; width=&quot;1088&quot;&gt;
&lt;small&gt;Boom. ~100% CPU utilization and at least a 5x performance boost!&lt;/small&gt;
&lt;p&gt;These results are incredible:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~1,000 QPS (5x better)&lt;/li&gt;
&lt;li&gt;~100% CPU utilization (5x better)&lt;/li&gt;
&lt;li&gt;0.033s median response time (30x better)&lt;/li&gt;
&lt;li&gt;Slowest queries ~4s (20x better)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, these are just tests, and we don’t expect a 5x improvement in production, since we will certainly hit other bottlenecks in I/O, CPU, and/or memory. But we have at least removed a source of contention that had stumped us for a long time. Even if all we got was a 1.5x bump in performance, that would be a massive improvement for us, our infrastructure, and the customers who use it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We don’t expect to get a 5x boost in production, but even if it’s just 1.5x, that’s still a massive improvement for us and our customers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;With the ClickHouse 23.10 official release (&lt;a href=&quot;https://clickhouse.com/company/events/v23-10-community-release-call&quot;&gt;slated for November 2nd&lt;/a&gt;), we will update some of our most performance-demanding clients, and see the real impact of these changes.&lt;/p&gt;
&lt;p&gt;We expect good things.&lt;/p&gt;</content:encoded></item><item><title>SQL Is All You Need</title><link>https://jordivillar.com/blog/sql-is-all-you-need/</link><guid isPermaLink="true">https://jordivillar.com/blog/sql-is-all-you-need/</guid><pubDate>Mon, 10 Apr 2023 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Across all my professional experiences, SQL has always been an important tool, whether it was wrestling with Redshift to pull historical data for building models, writing algorithms on Spark to read from a data lake, or trying to get dimensional data out of OracleDB.&lt;/p&gt;
&lt;p&gt;I strongly believe our lives would be much easier if SQL were all (or almost all) we needed when it comes to data. In this article, I want to play with the idea of building a machine learning algorithm using just SQL and &lt;a href=&quot;https://github.com/ClickHouse/ClickHouse&quot;&gt;ClickHouse&lt;/a&gt;. Hence the title, a clear reference to the &lt;a href=&quot;https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf&quot;&gt;Attention Is All You Need&lt;/a&gt; paper.&lt;/p&gt;
&lt;h3 id=&quot;a-bit-of-context&quot;&gt;A bit of context&lt;/h3&gt;
&lt;p&gt;This piece has been sitting in the back of my head for a while. Since I joined &lt;a href=&quot;https://www.tinybird.co&quot;&gt;Tinybird&lt;/a&gt;, one of my responsibilities has been to understand the product we are building and help clients get as much as they can from it. Once I understood ClickHouse’s potential, I haven’t been able to stop thinking about how it could have solved problems I had at previous companies.&lt;/p&gt;
&lt;p&gt;Although I now work as a Data Engineer, I’ve spent a large part of my career in roles strongly related to Data Science and Machine Learning. The two most relevant experiences for this article:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Doing deep learning at a small 3D medical company, where I contributed heavily to &lt;a href=&quot;https://openaccess.thecvf.com/content_ICCVW_2019/papers/GMDL/Ramon_Hyperparameter-Free_Losses_for_Model-Based_Monocular_Reconstruction_ICCVW_2019_paper.pdf&quot;&gt;a couple&lt;/a&gt; &lt;a href=&quot;https://openaccess.thecvf.com/content_ICCVW_2019/papers/3DFAW/Ramon_Multi-View_3D_Face_Reconstruction_in_the_Wild_Using_Siamese_Networks_ICCVW_2019_paper.pdf&quot;&gt;of papers&lt;/a&gt;, despite not appearing in them as an author (that was academia, and everybody knows how the game is played there; that topic probably deserves a full article of its own)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Leading the data science efforts at an ad-tech company, where I implemented a custom version of an online learning algorithm known as &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf&quot;&gt;FTRL-Proximal&lt;/a&gt; in Spark&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The common pain in both experiences was finding a cheap, fast way to store data that is compatible with real-time inference and also supports experimenting with, iterating on, and training different models. Being a machine learning practitioner nowadays requires a myriad of tools (feature stores, training platforms, infrastructure, streams, etc.) just to train your models and serve batches of predictions.&lt;/p&gt;
&lt;p&gt;Now imagine a system used by a retail company to decide which product to show you based on the probability that you will buy it, while events are sent back to the algorithm indicating whether a shown product was bought. The list of tools needed would keep getting longer.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;💭 ClickHouse already has built-in methods implementing linear regression (&lt;code&gt;stochasticLinearRegression&lt;/code&gt;), and logistic regression (&lt;code&gt;stochasticLogisticRegression&lt;/code&gt;). Those implementations are a bit rigid but still can be used to build machine learning models without leaving the database in a batch environment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So let’s play with the idea of building an end-to-end model inside ClickHouse, trying to eliminate all those tools while still covering the requirements described above.&lt;/p&gt;
&lt;p&gt;To make it easier to follow and reproduce we are going to use some data from the &lt;a href=&quot;https://archive.ics.uci.edu/dataset/20/census+income&quot;&gt;Census Income Data Set&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; pandas &lt;/span&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; pd&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;dataset_url = &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data&amp;#39;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;column_names = [&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;age&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;workclass&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;fnlwgt&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;education&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;education-num&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;marital-status&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;occupation&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;relationship&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;race&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;sex&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;capital-gain&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;capital-loss&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;hours-per-week&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span 
style=&quot;color:#CE9178&quot;&gt;&amp;#39;native-country&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;df = pd.read_csv(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    dataset_url,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    names&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; = column_names&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;df[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;] = (df[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;] == &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39; &amp;gt;50K&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;).astype(&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;int&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;df.tail()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;       age      workclass  fnlwgt    education  education-num  ...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32556&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;   27&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Private  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;257302&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   Assoc-acdm             &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;12&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  ...  \&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32557&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;   40&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Private  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;154374&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      HS-grad              &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;9&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  ...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32558&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;   58&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Private  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;151910&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      HS-grad              &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;9&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  ...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32559&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;   22&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        Private  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;201490&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      HS-grad              &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;9&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  ...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32560&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;   52&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   Self-emp-inc  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;287927&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      HS-grad              &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;9&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;  ...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;      capital-gain capital-loss hours-per-week  native-country  y&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32556&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;             38&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   United-States  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32557&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;             40&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   United-States  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32558&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;             40&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   United-States  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32559&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;             20&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   United-States  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;32560&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;        15024&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            0&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;             40&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;   United-States  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The dataset poses a classification problem. It’s quite interesting because it mixes continuous and categorical features, something rarely seen in implementation examples but present in almost every real-life problem.&lt;/p&gt;
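&lt;p&gt;The categorical columns can’t be multiplied by a weight directly. One common trick (shown here only as an illustration; the helper name is made up, not from this post) is to expand each (column, value) pair into its own 0/1 indicator feature, so a linear model can learn one weight per category:&lt;/p&gt;

```python
def one_hot(row, categorical_cols):
    """Expand categorical values into 0/1 indicator features.

    Each (column, value) pair becomes its own feature name, so a
    linear model can learn one weight per category.
    """
    features = {}
    for col, value in row.items():
        if col in categorical_cols:
            # e.g. workclass='Private' becomes feature 'workclass=Private'
            features[f"{col}={value}"] = 1
        else:
            # continuous columns pass through unchanged
            features[col] = value
    return features

row = {'age': 27, 'workclass': 'Private', 'education': 'HS-grad'}
encoded = one_hot(row, categorical_cols={'workclass', 'education'})
# encoded is {'age': 27, 'workclass=Private': 1, 'education=HS-grad': 1}
```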
&lt;p&gt;Let’s start by implementing online gradient descent.&lt;/p&gt;
&lt;h3 id=&quot;online-gradient-descent&quot;&gt;Online gradient descent&lt;/h3&gt;
&lt;p&gt;Online gradient descent is essentially the same as stochastic gradient descent (SGD); the “online” qualifier is commonly added when it is applied to streaming data. Instead of randomly selecting a sample from the training set at each iteration, an iteration is performed every time a new sample arrives.&lt;/p&gt;
&lt;p&gt;The SGD algorithm is easy to describe in pseudo-code:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Initialize the weights 𝑤&lt;/li&gt;
&lt;li&gt;Iterate over all samples&lt;/li&gt;
&lt;li&gt;For each sample 𝒾, update the weights as:&lt;/li&gt;
&lt;/ol&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fsgd.CIWT_0zq.png&amp;#38;w=640&amp;#38;q=100&quot; alt=&quot;SGD algorithm&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;238&quot; width=&quot;238&quot; height=&quot;125&quot;&gt;
&lt;p&gt;SGD is an iterative method: given the previous weights, the next ones can be computed with a few simple operations. Repeat the process over hundreds or thousands of events and, in theory, it converges.&lt;/p&gt;
&lt;p&gt;Being recursive, the algorithm could probably be implemented with window functions or recursive CTEs. But leaving aside that recursive CTEs are not supported in ClickHouse, we want a stateful approach: the weights need to be stored somewhere and updated every time a new sample arrives. The idea is to use plain database tables plus &lt;a href=&quot;https://clickhouse.com/docs/en/guides/developer/cascading-materialized-views&quot;&gt;Materialized Views&lt;/a&gt;, which run on each insert, computing weight offsets that are later aggregated. This lets us update the weights on the fly.&lt;/p&gt;
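&lt;p&gt;Before writing any SQL, here is a minimal pure-Python sketch of that design (all names are illustrative): each incoming sample produces one weight offset per feature, and the weights table is simply the running sum of those offsets, which is what a &lt;code&gt;SummingMergeTree&lt;/code&gt; computes over rows sharing a key:&lt;/p&gt;

```python
import math

# Accumulated offsets per feature, mimicking a SummingMergeTree table
# keyed by column name: the current weight is the sum of all inserted rows.
weight_deltas = {}

def current_weight(column):
    # Equivalent to reading the weights table with FINAL: sum of rows per key.
    return weight_deltas.get(column, 0.0)

def on_insert(sample, y, lr=0.01):
    """Mimics the materialized view: runs once per inserted sample."""
    # A constant feature of value 1 plays the role of the intercept.
    features = {'intercept': 1.0, **sample}
    # wTx is computed with the weights as they were *before* this insert
    wtx = sum(current_weight(c) * x for c, x in features.items())
    p = 1.0 / (1.0 + math.exp(-wtx))   # sigmoid
    for column, x in features.items():
        delta = -lr * (p - y) * x      # logistic-loss gradient step
        # "Insert" the offset row; summing by key updates the weight.
        weight_deltas[column] = current_weight(column) + delta

# Feed two toy one-feature samples
on_insert({'age': 1.0}, y=1)
on_insert({'age': 1.0}, y=0)
```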
&lt;p&gt;First, let’s build a simple reference model with &lt;code&gt;sklearn&lt;/code&gt;; it will also let us validate results while writing our own version. Since this is a classification problem, we’ll use &lt;code&gt;SGDClassifier&lt;/code&gt; with some specific settings to simulate online gradient descent. For now we’ll use only the continuous features and the first 1000 samples, to keep things simple. Once everything works and is validated, we can add categorical features and use the entire dataset.&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;from&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sklearn &lt;/span&gt;&lt;span style=&quot;color:#C586C0&quot;&gt;import&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; linear_model&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;model = linear_model.SGDClassifier(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    loss&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;log_loss&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    penalty&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;None&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    fit_intercept&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;True&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    max_iter&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    shuffle&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;False&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    learning_rate&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;constant&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#9CDCFE&quot;&gt;    eta0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0.01&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;X = df[[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;age&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;fnlwgt&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;education-num&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;capital-gain&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;capital-loss&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;hours-per-week&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;y = df[[&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;y&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;]]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;model = model.fit(X[:&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1000&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;], y[:&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1000&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(model.intercept_[&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;])&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#4EC9B0&quot;&gt;dict&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;zip&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(X.columns, model.coef_[&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;])))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;python&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0.10499999999999998&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;age&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;16.665000000000006&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; &amp;#39;capital-gain&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;2844.8600000000006&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; &amp;#39;capital-loss&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;253.21000000000004&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; &amp;#39;education-num&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;4.035000000000002&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; &amp;#39;fnlwgt&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;2824.270000000001&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt; &amp;#39;hours-per-week&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;14.259999999999996&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And this is the SQL implementation:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;DROP&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; DATABASE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; IF&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; EXISTS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd SYNC;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; DATABASE&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sgd&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; TABLE&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sgd&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.samples (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    age UInt8,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    workclass LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    fnlwgt UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    education LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    educationNum UInt8,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    maritalStatus LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    occupation LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    relationship LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    race LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    sex LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    capitalGain UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    capitalLoss UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    hoursPerWeek UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    nativeCountry LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    income LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    dt &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;DateTime&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; DEFAULT&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; now&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;ENGINE = MergeTree&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; dt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; TABLE&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sgd&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.weights (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    column String,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    w Float64&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;ENGINE = SummingMergeTree&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; MATERIALIZED VIEW sgd.update_weights &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;TO&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    WITH&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sum&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(x * w0) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;OVER&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; () &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; wTx&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        - &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; * (&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/(&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;+&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;exp&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(-wTx)) - y) * x &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; w&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; x,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            y&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        FROM&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                income = &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;&amp;gt;50K&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; y,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                [&amp;#39;intercept&amp;#39;, &amp;#39;age&amp;#39;, &amp;#39;fnlwgt&amp;#39;, &amp;#39;educationNum&amp;#39;, &amp;#39;capitalGain&amp;#39;, &amp;#39;capitalLoss&amp;#39;, &amp;#39;hoursPerWeek&amp;#39;] keys,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                [1, age, fnlwgt, educationNum, capitalGain, capitalLoss, hoursPerWeek] &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;values&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.samples&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        ARRAY&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; JOIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            keys &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            values&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    ANY &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;LEFT JOIN&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        SELECT&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column, w w0 &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights FINAL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    ) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;USING&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After ingesting the first 1000 samples one by one into &lt;code&gt;sgd.samples&lt;/code&gt;, this is the result we get:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    w&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;FINAL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ASC&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;┌─column───────┬───────────────────w─┐&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ age          │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;16&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;665000000000006&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ capitalGain  │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;2844&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;8600000000006&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ capitalLoss  │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;253&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;21000000000004&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ educationNum │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;4&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;035000000000002&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ fnlwgt       │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;2824&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;270000000001&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ hoursPerWeek │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;14&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;259999999999996&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ intercept    │ &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;10499999999999998&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;└──────────────┴─────────────────────┘&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Everything looks fine: the results match what we got from sklearn! Honestly, I was expecting some minor differences here, but I’m not going to complain.&lt;/p&gt;
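&lt;p&gt;As a sanity check on what the materialized view is accumulating, the per-sample update can be mirrored in a few lines of plain Python. This is only a sketch: the learning rate 0.01 is the one hard-coded in the SQL, while the tiny two-feature sample is made up for illustration.&lt;/p&gt;

```python
from math import exp

LR = 0.01  # same learning rate as hard-coded in the materialized view

def sgd_step(w, x, y):
    # One per-sample update, mirroring the rows the materialized view
    # inserts: delta_j = -LR * (sigmoid(wTx) - y) * x_j, which the
    # SummingMergeTree then folds into the running weights.
    wTx = sum(wj * xj for wj, xj in zip(w, x))
    p = 1.0 / (1.0 + exp(-wTx))  # logistic prediction
    return [wj - LR * (p - y) * xj for wj, xj in zip(w, x)]

# toy example: intercept plus one feature, one positive sample
w = sgd_step([0.0, 0.0], [1.0, 2.0], 1)
```

&lt;p&gt;Starting from zero weights the prediction is 0.5, so this first positive sample nudges each weight up by 0.005 times its feature value.&lt;/p&gt;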
&lt;p&gt;At this point, if you are familiar with ClickHouse, you’ll realize that ingesting row by row is suboptimal. The good news is that it’s usually suboptimal for gradient descent too, and there are well-known solutions.&lt;/p&gt;
&lt;h3 id=&quot;mini-batches&quot;&gt;Mini batches&lt;/h3&gt;
&lt;p&gt;Stochastic gradient descent with mini-batches is essentially the same, but instead of going sample by sample, a batch of N samples is processed in each step. In pseudo-code, the algorithm is basically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Initialize the weights 𝑤&lt;/li&gt;
&lt;li&gt;Iterate over all samples in batches of size b&lt;/li&gt;
&lt;li&gt;For each batch, update the weights as:&lt;/li&gt;
&lt;/ol&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fsgd_batch.BBzUS08V.png&amp;#38;w=640&amp;#38;q=100&quot; alt=&quot;SGD mini batches algorithm&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;258&quot; width=&quot;258&quot; height=&quot;155&quot;&gt;
&lt;p&gt;So, in plain English: exactly the same as before, but the update step is averaged over the entire mini-batch. If you are interested in reading more about the differences and when to use each one, here are a few good resources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://medium.com/intuitionmath/difference-between-batch-gradient-descent-and-stochastic-gradient-descent-1187f1291aa1&quot;&gt;The difference between Batch Gradient Descent and Stochastic Gradient Descent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.baeldung.com/cs/gradient-stochastic-and-mini-batch&quot;&gt;Differences Between Gradient, Stochastic and Mini Batch Gradient Descent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=4qJaSmvhxi8&quot;&gt;Mini Batch Gradient Descent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=-_4Zi8fCZO4&quot;&gt;Understanding Mini-Batch Gradient Descent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
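&lt;p&gt;The mini-batch update can be sketched in plain Python: compute the logistic-loss gradient per sample, average over the batch, and take a single step. The toy data here is hypothetical; the 0.01 corresponds to the &lt;code&gt;0.01/b&lt;/code&gt; factor used in the SQL.&lt;/p&gt;

```python
from math import exp

LR = 0.01  # learning rate; the SQL applies it as LR / b

def minibatch_step(w, batch):
    # One mini-batch update: sum the per-sample logistic-loss gradients
    # (all evaluated at the same incoming weights w), then apply a single
    # averaged step: w_j -= (LR / b) * sum_i (sigmoid(wTx_i) - y_i) * x_ij
    b = len(batch)
    grad = [0.0] * len(w)
    for x, y in batch:
        wTx = sum(wj * xj for wj, xj in zip(w, x))
        p = 1.0 / (1.0 + exp(-wTx))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return [wj - (LR / b) * gj for wj, gj in zip(w, grad)]

# two samples (intercept plus one feature) processed as a single batch
w = minibatch_step([0.0, 0.0], [([1.0, 2.0], 1), ([1.0, 4.0], 0)])
```

&lt;p&gt;With b = 1 this degenerates to the per-sample update, which is why the same materialized-view structure can serve both cases.&lt;/p&gt;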
&lt;p&gt;We can add mini-batches by slightly modifying the previous SGD implementation:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;DROP&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; DATABASE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; IF&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; EXISTS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd SYNC;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; DATABASE&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sgd&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; TABLE&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sgd&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.samples (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    age UInt8,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    workclass LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    fnlwgt UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    education LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    educationNum UInt8,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    maritalStatus LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    occupation LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    relationship LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    race LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    sex LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    capitalGain UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    capitalLoss UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    hoursPerWeek UInt32,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    nativeCountry LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    income LowCardinality(String),&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    dt &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;DateTime&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; DEFAULT&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; now&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;()&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;ENGINE = MergeTree&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; dt;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; TABLE&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt; sgd&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.weights (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    column String,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    w Float64&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;ENGINE = SummingMergeTree&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;CREATE&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; MATERIALIZED VIEW sgd.update_weights &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;TO&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    WITH&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;        sum&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(x * w0) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;OVER&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;partition&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; by&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; dt) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; wTx,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        countDistinct(dt) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;OVER&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; () &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; b&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        - (&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/b) * (&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/(&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;+&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;exp&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(-wTx)) - y) * x &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;as&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; w&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; x,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            y&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        FROM&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                income = &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;&amp;gt;50K&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; y,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                [&amp;#39;intercept&amp;#39;, &amp;#39;age&amp;#39;, &amp;#39;fnlwgt&amp;#39;, &amp;#39;educationNum&amp;#39;, &amp;#39;capitalGain&amp;#39;, &amp;#39;capitalLoss&amp;#39;, &amp;#39;hoursPerWeek&amp;#39;] keys,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                [1, age, fnlwgt, educationNum, capitalGain, capitalLoss, hoursPerWeek] &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;values&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.samples&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        ARRAY&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; JOIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            keys &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            values&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    ANY &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;LEFT JOIN&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        SELECT&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column, w w0 &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights FINAL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    ) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;USING&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, after running it with a mini-batch size of 5, meaning ingesting 5 rows at a time, we get the following weights:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    w&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;FINAL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ASC&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;┌─column───────┬───────────────────w─┐&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ age          │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;15&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;909999999999995&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ capitalGain  │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;2868&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;7700000000013&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ capitalLoss  │              &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;165&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;49&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ educationNum │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;4&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;204999999999998&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ fnlwgt       │    &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;341&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;324999999998&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ hoursPerWeek │  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;13&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;735000000000003&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ intercept    │ &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;04499999999999994&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;└──────────────┴─────────────────────┘&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the SQL is also prepared to handle a flexible mini-batch size, so an arbitrary number of rows can be ingested each time. This is extremely useful when you are receiving data from a stream that buffers events together and flushes every few seconds.&lt;/p&gt;
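&lt;p&gt;The buffering side of such a stream can be sketched with a toy flusher. Everything here is hypothetical (class name, thresholds); in practice the flush callback would issue one batched INSERT into &lt;code&gt;sgd.samples&lt;/code&gt;, which fires the materialized view once per batch.&lt;/p&gt;

```python
import time

class Buffer:
    # Toy event buffer: collect incoming rows and flush them as one batch,
    # either when max_rows is reached or max_seconds have passed. Each
    # flushed batch would become a single INSERT, i.e. one mini-batch.
    def __init__(self, flush, max_rows=5, max_seconds=2.0):
        self.flush, self.max_rows, self.max_seconds = flush, max_rows, max_seconds
        self.rows, self.last_flush = [], time.monotonic()

    def add(self, row):
        self.rows.append(row)
        age = time.monotonic() - self.last_flush
        if len(self.rows) >= self.max_rows or age >= self.max_seconds:
            self.flush(self.rows)  # stand-in for the ClickHouse client call
            self.rows, self.last_flush = [], time.monotonic()

batches = []
buf = Buffer(batches.append, max_rows=3)
for i in range(7):
    buf.add(i)
# 7 rows: two full batches of 3 are flushed, one row is still buffered
```

&lt;p&gt;Because the SQL derives &lt;code&gt;b&lt;/code&gt; from the data itself, the ragged final batch is handled the same way as the full ones.&lt;/p&gt;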
&lt;p&gt;The only missing step now is adding categorical features. In &lt;code&gt;sklearn&lt;/code&gt; this would require a one-hot encoder, but in our code it is just a matter of changing the subquery that pivots the data, from:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; x,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    y&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        income = &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;&amp;gt;50K&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; y,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        [&amp;#39;intercept&amp;#39;, &amp;#39;age&amp;#39;, &amp;#39;fnlwgt&amp;#39;, &amp;#39;educationNum&amp;#39;, &amp;#39;capitalGain&amp;#39;, &amp;#39;capitalLoss&amp;#39;, &amp;#39;hoursPerWeek&amp;#39;] keys,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        [1, age, fnlwgt, educationNum, capitalGain, capitalLoss, hoursPerWeek] &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;values&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.samples&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ARRAY&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; JOIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    keys &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    values&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To something like this:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; x,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    y&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        income = &lt;/span&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;&amp;#39;&amp;gt;50K&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; y,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;intercept&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;age&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;fnlwgt&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;educationNum&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;capitalGain&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;capitalLoss&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;hoursPerWeek&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;workclass:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || workclass,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;education:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || education,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;maritalStatus:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || maritalStatus,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;occupation:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || occupation,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;relationship:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || relationship,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;race:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || race,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;sex:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || sex,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;            &amp;#39;nativeCountry:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || nativeCountry&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        ] keys,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            age,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            fnlwgt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            educationNum,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            capitalGain,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            capitalLoss,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            hoursPerWeek,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;            1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        ] &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;values&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.samples&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ARRAY&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; JOIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    keys &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    values&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
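The update that each inserted batch effectively applies is the standard online logistic-regression SGD step. A minimal Python sketch of that step, with a hypothetical learning rate and dict-based weights keyed the same way as the SQL `keys` array (this is an illustrative model of the logic, not the exact materialized-view computation):

```python
import math

def sgd_update(weights, sample, target, lr=0.1):
    """One online logistic-regression step.

    `weights` and `sample` are dicts keyed like the SQL keys array
    (e.g. 'age', 'workclass:Private'); `lr` is a hypothetical
    placeholder learning rate.
    """
    # wTx: dot product of weights and feature values.
    wTx = sum(weights.get(k, 0.0) * x for k, x in sample.items())
    # Sigmoid of the linear combination.
    p = 1 / (1 + math.exp(-wTx))
    # Gradient of the log-loss w.r.t. each weight is (p - y) * x,
    # so each weight moves against that gradient.
    for k, x in sample.items():
        weights[k] = weights.get(k, 0.0) - lr * (p - target) * x
    return weights

w = sgd_update({}, {"intercept": 1, "age": 0.5}, target=1, lr=0.1)
print(w)
```

Starting from empty weights, `wTx` is 0 and `p` is 0.5, so a positive sample pushes every touched weight up by `lr * 0.5 * x`.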
&lt;p&gt;After repeating the process, the weights end up looking like this:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    w&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;FINAL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ORDER BY&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ASC&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;┌─column──────────────────────────────┬──────────────────────w─┐&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ age                                 │      &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;6&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;489999999999998&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ capitalGain                         │      &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;926&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0200000000001&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ capitalLoss                         │                 &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;107&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;03&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ education:10th                      │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;02&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ education:11th                      │                 -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;065&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ education:12th                      │                   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ education:Masters                   │                   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;05&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ education:Prof-school               │                   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;03&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ education:Some-college              │                  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;035&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ educationNum                        │      &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;259999999999999&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ fnlwgt                              │    -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;361&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;92500000000473&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ hoursPerWeek                        │      &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;3&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;579999999999996&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ intercept                           │ -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0050000000000000565&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ maritalStatus:Divorced              │   -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;09500000000000003&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ maritalStatus:Married-AF-spouse     │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ maritalStatus:Widowed               │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ nativeCountry:?                     │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;005000000000000001&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ nativeCountry:Cambodia              │                   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ nativeCountry:Thailand              │                   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;01&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ nativeCountry:United-States         │  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;009999999999999943&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ occupation:?                        │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;02&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ occupation:Adm-clerical             │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;03&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;...&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:?                         │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;02&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:Federal-gov               │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;019999999999999997&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;Local&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;-gov                 │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;009999999999999998&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;Private&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                   │                  -&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;11&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;Self&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;-emp-inc              │                   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;05&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;Self&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;-emp-&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;not&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;-inc          │                  &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;025&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;│ workclass:&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;State&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;-gov                 │   &lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;019999999999999997&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; │&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;└─────────────────────────────────────┴────────────────────────┘&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, to run a prediction we only need a query that retrieves the weights and applies a few simple computations:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;WITH&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;    sum&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(x * w0) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;OVER&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;partition&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; by&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; dt) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; wTx&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;    1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;/(&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;+&lt;/span&gt;&lt;span style=&quot;color:#DCDCAA&quot;&gt;exp&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;(-wTx)) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; p&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        value&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; x&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    FROM&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;            now&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;() dt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;intercept&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;age&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;fnlwgt&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;educationNum&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;capitalGain&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;capitalLoss&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;hoursPerWeek&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;workclass:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || workclass,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;education:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || education,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;maritalStatus:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || maritalStatus,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;occupation:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || occupation,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;relationship:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || relationship,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;race:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || race,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;sex:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || sex,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;nativeCountry:&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; || nativeCountry&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            ] keys,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                age,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                fnlwgt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                educationNum,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                capitalGain,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                capitalLoss,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;                hoursPerWeek,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                1&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;            ] &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;values&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;             SELECT&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                20&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; age,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;Private&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; workclass,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                148294&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; fnlwgt,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;Some-college&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; education,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                10&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; educationNum,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;Never-married&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; maritalStatus,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;Other-service&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; occupation,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;Own-child&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; relationship,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;White&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; race,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;Male&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sex,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; capitalGain,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                0&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; capitalLoss,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt;                40&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; hoursPerWeek,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#CE9178&quot;&gt;                &amp;#39;United-States&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; nativeCountry&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    ARRAY&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; JOIN&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        keys &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; key&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;        values&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; AS&lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt; value&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;        )&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;ANY &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;LEFT JOIN&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; (&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;    SELECT&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column, w w0 &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; sgd.weights FINAL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;) &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;USING&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; column&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;LIMIT&lt;/span&gt;&lt;span style=&quot;color:#B5CEA8&quot;&gt; 1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;It’s possible to implement logistic regression in SQL: thanks to ClickHouse and Materialized Views, we have built an online gradient descent algorithm capable of predicting events in real time. This opens the door to a lot of possibilities.&lt;/p&gt;
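&lt;p&gt;To make the mechanics concrete: the inference in the final query is just a sigmoid over a weighted sum, and the learning step is a single gradient-descent update. Here is a minimal Python sketch of both (plain logistic regression over hypothetical, already-scaled feature names, not the ClickHouse SQL itself):&lt;/p&gt;

```python
import math

def predict(weights, features, bias):
    """Sigmoid over a weighted sum: the same inference the SQL query performs."""
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, bias, features, label, lr=0.1):
    """One online gradient-descent update for logistic loss."""
    p = predict(weights, features, bias)
    error = p - label  # gradient of the log-loss w.r.t. the linear term
    for k, v in features.items():
        weights[k] -= lr * error * v
    return weights, bias - lr * error

# Toy run: repeated updates on one positive example push the prediction toward 1.
w, b = {"age": 0.0, "hoursPerWeek": 0.0}, 0.0
x = {"age": 0.39, "hoursPerWeek": 0.40}  # hypothetical pre-scaled features
for _ in range(50):
    w, b = sgd_step(w, b, x, label=1)
p_final = predict(w, x, b)
```

&lt;p&gt;This is the update that, in the approach above, runs inside the database on every insert, so the weights table always reflects the latest data.&lt;/p&gt;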
&lt;p&gt;Going back to the original example: a retail company that wants to decide in real time which product to show a client, based on the probability of them buying it. We’d have to &lt;strong&gt;program the described algorithm&lt;/strong&gt; in the database, &lt;strong&gt;build an easy way to ingest data&lt;/strong&gt; in real time from the online store to instantly update the model, and then &lt;strong&gt;provide an interface to run inference&lt;/strong&gt; and return probabilities.&lt;/p&gt;
&lt;p&gt;Luckily, we don’t have to do much of that! We already implemented the algorithm, and the interfaces to ingest and query data come with the database.&lt;/p&gt;
&lt;p&gt;Of course, the approach described here is simple: SGD is not the strongest model out there, but it does its job and sets a good baseline. Even if you are not fully confident with SQL, I hope this post encourages others to run similar experiments and squeeze as much as they can out of their day-to-day tools.&lt;/p&gt;</content:encoded></item><item><title>How I Made $350k in Financial Tournaments</title><link>https://jordivillar.com/blog/financial-tournaments/</link><guid isPermaLink="true">https://jordivillar.com/blog/financial-tournaments/</guid><pubDate>Mon, 19 Oct 2020 16:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Between 2015 and 2020 I participated in the Quantopian and &lt;a href=&quot;/numerai&quot;&gt;Numerai&lt;/a&gt; tournaments. Being involved in these tournaments led me on a very profitable journey: I made around $350k during those years.&lt;/p&gt;
&lt;p&gt;Many people have reached out to me and asked how I did it. I decided to write this post to inspire others with this non-conventional path.&lt;/p&gt;
&lt;h3 id=&quot;selling-a-trading-algorithm&quot;&gt;Selling a Trading Algorithm&lt;/h3&gt;
&lt;p&gt;In 2015 algorithmic trading was one of my interests. This is how I came across Quantopian, a platform that provides data and tools so anyone can develop and test trading algorithms. After learning the basics and playing around with the platform, I decided to submit an algorithm to their recently created tournament.&lt;/p&gt;
&lt;p&gt;The algorithm was pretty basic, but surprisingly it remained in the top 10 for a few weeks. I received many messages: companies suggesting joining them as a Financial Quant, people interested in my approach, and a few offers from people interested in buying the algorithm.&lt;/p&gt;
&lt;p&gt;One of those potential buyers remained particularly interested, surprisingly enough, even after I explained how simple my algorithm was. I eventually shared the code with him for free. Still, he insisted on sending me the amount he had originally offered. And that’s how I made my first $2k from participating in ML/financial tournaments.&lt;/p&gt;
&lt;h3 id=&quot;getting-the-dream-job&quot;&gt;Getting The Dream Job&lt;/h3&gt;
&lt;p&gt;During those weeks I exchanged many emails with Apple employees, financial quants, and people from Europe, Asia, and America. But the best interaction came from a guy from my town. He was building his own trading algorithm and was interested in my approach. I met him a couple of times. His knowledge was, obviously, years ahead of where I was at the time.&lt;/p&gt;
&lt;p&gt;Those meetings were very productive, especially for me. After discussing our personal and professional interests, and how difficult it was to find good jobs in our city, he introduced me to my next employer, though of course I was not aware of it at the time. He put me in touch with the CEO of an ad-tech company, and at first glance I saw a lot in common between this sector and high-frequency trading. The offered position (not publicly advertised at the time) would also let me tackle many machine learning, big data, and architectural challenges.&lt;/p&gt;
&lt;p&gt;After a few weeks and a couple of meetings, I decided that the company was an excellent fit, a place where I would keep growing professionally and developing as a data scientist and engineer.&lt;/p&gt;
&lt;p&gt;For years we worked on building a solid platform, which led to the company being acquired in 2018. The experience was great and the outcome even better: including the exit bonus, I made a big sum during this period.&lt;/p&gt;
&lt;h3 id=&quot;the-altcoins-trading-madness&quot;&gt;The Altcoins Trading Madness&lt;/h3&gt;
&lt;p&gt;In 2016, I discovered the Numerai Tournament, “the hardest data science tournament on the planet” as they claim. Numerai is a weekly global data science tournament to solve the stock market. Data scientists work with clean, regularized, and obfuscated datasets provided by Numerai to create machine-learning models to predict the stock market. Numerai builds a meta-model of the most unique and contributive predictions and uses this to command the capital in its global equity hedge fund.&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fnumerai.CexrvExV.jpg&amp;#38;w=640&amp;#38;q=100&quot; alt=&quot;Numerai classification&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;501&quot; width=&quot;501&quot; height=&quot;271&quot;&gt;
&lt;p&gt;In early 2016 I finished a few rounds in good positions, as you can see in the image. After a few months of active participation, I ended 2016 with around 0.5 BTC. It was cool, but it required too much of my time. I had other priorities, so I withdrew the earnings and left the tournament.&lt;/p&gt;
&lt;p&gt;Almost a year later, a post about a Numerai airdrop came across my timeline. After logging back into my account, I discovered I had received 1,800 NMR.&lt;/p&gt;
&lt;p&gt;At the time, the amount was valued at around $30k. I initially decided to withdraw almost everything, but something changed my mind and I started buying and selling altcoins. After some back and forth, and some lucky moves, I ended up accumulating a wallet valued at around $100k.&lt;/p&gt;
&lt;p&gt;As 2017 came to an end, I made one last lucky move: I withdrew half of that sum a few weeks before the Bitcoin crash. I still hold the other half, but it wasn’t easy to stick to that decision during 2018, while crypto prices kept falling.&lt;/p&gt;
&lt;h3 id=&quot;the-compound-effect&quot;&gt;The Compound Effect&lt;/h3&gt;
&lt;p&gt;In July 2018, I came back to the Numerai Tournament. I made some submissions, but I didn’t take it seriously again until mid-2019.&lt;/p&gt;
&lt;p&gt;Since then, I have had a cumulative return of +141%. I have built an algorithm with an average weekly return of +2%, which, taking compounding into account, translates to a +180% annualized total return.&lt;/p&gt;
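&lt;p&gt;As a quick sanity check on that arithmetic, a +2% weekly return compounded over 52 weeks does come out at roughly +180%:&lt;/p&gt;

```python
weekly_return = 0.02
# 52 weekly compounding periods in a year
annualized = (1 + weekly_return) ** 52 - 1
print(f"{annualized:.0%}")  # → 180%
```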
&lt;h3 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h3&gt;
&lt;p&gt;I’m going to keep participating in the Numerai Tournament; if everything continues as it is now, it should provide a good income without requiring much effort. I will also keep taking small bets, experimenting with other projects, doing the things I like, and learning in the process.&lt;/p&gt;</content:encoded></item><item><title>How to detect locks on Redshift</title><link>https://jordivillar.com/notes/redshift-locks/</link><guid isPermaLink="true">https://jordivillar.com/notes/redshift-locks/</guid><pubDate>Wed, 10 Feb 2016 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When a query or transaction acquires a &lt;strong&gt;lock on a table&lt;/strong&gt;, the lock remains for the duration of the query or transaction. Other queries or &lt;strong&gt;transactions&lt;/strong&gt; that are waiting to acquire the same lock are &lt;strong&gt;blocked&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you take a look at the &lt;strong&gt;Redshift&lt;/strong&gt; documentation, it recommends using &lt;a href=&quot;http://docs.aws.amazon.com/redshift/latest/dg/r_STV_LOCKS.html&quot;&gt;STV_LOCKS&lt;/a&gt;, which returns:&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fredshift_1.Ctt6vtJP.png&amp;#38;w=1200&amp;#38;q=100&quot; alt=&quot;STV_LOCKS results&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; inputtedWidth=&quot;1188&quot; width=&quot;1188&quot; height=&quot;117&quot;&gt;
&lt;p&gt;It seems really useful until you hit a real database lock. Last month I was trying to solve a lock that was blocking lots of processes, and I eventually found a better way to locate the queries causing the locks:&lt;/p&gt;
&lt;img src=&quot;/_vercel/image?url=_astro%2Fredshift_2.6irgbubJ.jpeg&amp;#38;w=1200&amp;#38;q=100&quot; alt=&quot;Queries causing locks&quot; loading=&quot;lazy&quot; decoding=&quot;async&quot; fetchpriority=&quot;auto&quot; width=&quot;1200&quot; height=&quot;261&quot;&gt;
&lt;p&gt;Here you have the query itself:&lt;/p&gt;
&lt;pre class=&quot;astro-code dark-plus&quot; style=&quot;background-color:#1E1E1E;color:#D4D4D4;overflow-x:auto;white-space:pre-wrap;word-wrap:break-word&quot; tabindex=&quot;0&quot; data-language=&quot;sql&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;SELECT&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; current_time,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    c.relname,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    l.database,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    l.transaction,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    l.pid,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    a.usename,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    l.mode,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt;    l.granted&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;FROM&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; pg_locks l&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;JOIN&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; pg_catalog.pg_class c &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ON&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; c.oid = l.relation&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;JOIN&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; pg_catalog.pg_stat_activity a &lt;/span&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;ON&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; a.procpid = l.pid&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#569CD6&quot;&gt;WHERE&lt;/span&gt;&lt;span style=&quot;color:#D4D4D4&quot;&gt; l.pid &amp;lt;&amp;gt; pg_backend_pid();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&quot;https://wiki.postgresql.org/wiki/Lock_Monitoring&quot;&gt;source&lt;/a&gt;&lt;/p&gt;</content:encoded></item></channel></rss>