Travis Head — The Expert Layer You Activate When the World Is On Fire

Growing up in India in the ’90s, everybody had one religion: Tendulkar. And yet there I was, the lone kid in a room full of India fans cheering for Australia. Not because I wanted India to lose, but because something about that Aussie side felt like pure, ruthless excellence:

  • Warne’s sorcery

  • Gilchrist’s audacity

  • McGrath’s metronome

  • Ponting’s cold authority

It felt like watching a team that had found a higher gear. A different plane of performance. A machine tuned to borderline perfection.

And then I grew up. Life got heavier with career, kids, responsibilities. India got strong.
Australia’s golden generation retired. And that inner fan in me went quiet.

Until a man named Travis Head walked into a World Cup final and played like he was batting in a video game. Head doesn’t “settle in.” He doesn’t “get his eye in.” He arrives fully loaded. He assesses nothing. He just destroys. And when he does it in big matches, something inside me wakes up again. The boy who loved the Aussie bulldozer style opens his eyes. He recently destroyed England in the first Test of the Ashes, and that is the real inspiration for this post.

That feeling of a sudden ignition is exactly what helped me understand something in AI called Mixture of Experts.

MoE = Travis Head

In a Mixture of Experts model, you don’t fire the whole network for every input.
You only activate the right expert for the right situation. Most experts sit idle.
Most neurons rest. Only the specialist steps forward.

That’s Travis Head. He’s the “expert layer” you only activate when you need surgical violence.

He’s not the batter for all conditions.
But when the model (or the team) faces chaos, the router, the gate, says: “Load Head.exe. Activate the destruction module.”

And then he does things no algorithm can predict.

AI and Sports. Sports and AI

People ask me why I mix sports with AI. Why I write with this blend of emotion, humor, nostalgia, and curiosity. It’s because this is how I learn. This is how I make sense of the world.
This is how I connect the engineer in me with the kid I once was.

AI research gives me the intellectual spark. Sports gives me the emotional spark. Writing ties both sparks together into a voice that feels like mine.

And somewhere between Transformers, MoE, and Travis Head, I’m rediscovering what it feels like to be awake, curious, and fully alive again.

Mixture of Experts — The Batting Order Inside the Transformer

After learning about the Transformer architecture, here’s the part that really made something click for me: Not every part of the model needs to think for every token.

Sometimes, you only need the right specialist at the right moment, running at full tilt. Enter Mixture of Experts (MoE). If the Transformer is the stadium and the pitch, MoE is the batting order inside it.

What Is Mixture of Experts (In Human Words)?

An MoE layer is basically:

  • A bunch of experts (tiny specialized neural networks)

  • A gate (a router that chooses which experts to activate)

  • Sparse activation (only 1–2 experts run for each token)

Instead of a dense model where every neuron fires for every word, MoE works like a famous cricket cliché: “Horses for courses.”

Some experts learn math.
Some learn jokes.
Some learn Tamil lyrics.
Some learn how to code.
Some learn how to reason.

And the gate learns, over millions of training steps, which specialist to call upon at what moment.

It’s specialization, efficiency, and ruthless matching of talent to situation.
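
To make the routing concrete, here’s a minimal sketch of an MoE layer in Python. Everything in it is a toy: the “experts” are random linear transforms and the gate is an untrained matrix, but the shape of the idea (score all experts, run only the top two) is the real mechanism.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(token, experts, gate_weights, top_k=2):
    """Route one token through only the top-k experts (sparse activation)."""
    scores = softmax(gate_weights @ token)       # the gate: which experts fit this token?
    chosen = np.argsort(scores)[-top_k:]         # keep only the best top_k; the rest sit idle
    output = np.zeros_like(token)
    for i in chosen:
        output += scores[i] * experts[i](token)  # weighted blend of the chosen specialists
    return output, chosen

# Toy setup: 4 "experts", each just a different random linear transform.
rng = np.random.default_rng(0)
experts = [lambda t, W=rng.standard_normal((4, 4)): W @ t for _ in range(4)]
gate_weights = rng.standard_normal((4, 4))
token = rng.standard_normal(4)

out, chosen = moe_layer(token, experts, gate_weights)
print(f"experts activated: {len(chosen)} of {len(experts)}")  # experts activated: 2 of 4
```

In a real model the gate and the experts are trained together; here they only illustrate the sparse-activation pattern.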

The Cricket Version (because my brain only learns with sports)

In a dense neural network: Every player on the team walks out for every delivery.
Chaos. Heat. No runs. Nobody wins.

In MoE:
When the ball is spinning, you send your best spin player.
When the ball is short, you send the pull-shot specialist.
When the bowler is rattled, you send someone who can finish the job.

That gate, the router, becomes the biggest difference between brilliance and burnout.

Why MoE Is a Breakthrough

MoE gives you:

  • Bigger capacity (many experts = huge brain)

  • Lower compute cost (only a few experts activate)

  • Faster inference (because most of the model is sleeping)

  • Smarter specialization (each expert gets extremely good at one slice of the world)

This is how modern large models scale without melting the electricity grid. GPUs already run hot enough; MoE prevents them from becoming volcanic.

The Emotional Part

The idea that intelligence emerges not from everyone doing everything, but from the right expert stepping in at the right moment, hit me personally.

It reminded me of something from my childhood. Growing up in India, everyone supported Tendulkar. I was the boy openly cheering for the Australians. Years later, Travis Head would walk into the opening game of the Ashes and do exactly what MoE does: activate only when needed and then unleash absolute destruction.

But that’s for the next post.

MoE woke up the engineer in me. Travis Head woke up the boy in me. And somewhere between those two, I found the voice that’s writing these posts today.

Next up:
Travis Head — The Expert Layer You Activate When Everything Is On Fire.

RNN, LSTM, CNN — The Models That Ruled Before Transformers Took Over

Before Transformers became the Rajinikanth of AI architectures (before anyone learns AI, they should first know the man who said, “If I say something once, it is like me saying it a hundred times.” That’s clarity. No hallucination.), the field ran on three big families of models. Each one tried to solve a different piece of the “how do we make machines understand patterns?” puzzle.

Let’s break them down in the simplest possible way.

RNN (Recurrent Neural Network)

What it is:

A model that processes sequences one step at a time, remembering what came before.

How it thinks:

“I saw this word earlier… let me keep that in mind.”

What it was used for:

  • Early language models

  • Time-series prediction

  • Simple speech tasks

Why it struggled:

Memory fades fast. RNNs forget long sentences. If a sentence was 20 words long, it remembered… maybe 5. Think of it like a student who remembers the line “The woods are lovely, dark and deep” but forgets “And miles to go before I sleep.”
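
The fading memory is easy to see in code. This is a toy sketch (random vectors standing in for words, untrained weights), not a real language model: nudge the first word of a 20-word sentence and the final hidden state barely notices; nudge the last word and it reacts.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    """One RNN step: new memory = squash(old memory + current word)."""
    return np.tanh(W_h @ h + W_x @ x)

def read_sentence(words, W_h, W_x):
    h = np.zeros(8)                  # start with a blank memory
    for x in words:
        h = rnn_step(h, x, W_h, W_x)
    return h                         # whatever memory survives the whole sentence

rng = np.random.default_rng(1)
W_h = 0.1 * rng.standard_normal((8, 8))  # recurrent weights: old info shrinks every step
W_x = 0.5 * rng.standard_normal((8, 8))
words = [rng.standard_normal(8) for _ in range(20)]  # a 20-"word" sentence

# Nudge the FIRST word vs the LAST word, and see which change the final memory noticed.
first_changed = [words[0] + 1.0] + words[1:]
last_changed = words[:-1] + [words[-1] + 1.0]
h = read_sentence(words, W_h, W_x)
first_effect = np.linalg.norm(h - read_sentence(first_changed, W_h, W_x))
last_effect = np.linalg.norm(h - read_sentence(last_changed, W_h, W_x))
# first_effect is far smaller than last_effect: the opening of the sentence has faded.
```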

LSTM (Long Short-Term Memory)

What it is:

An advanced RNN that uses “gates” to decide what to remember and what to forget.

How it thinks:

“I’ll store important stuff and throw away the junk.”

What it was used for:

  • Speech recognition

  • Machine translation

  • Predictive text

  • Music generation

Why it was better than RNNs:

LSTMs could remember longer sequences. Instead of forgetting after 5 words, they could recall 20, maybe 30. They were like the topper kid who remembered the whole poem and recited it proudly on stage.

Why they still fell short:

They processed everything sequentially. No parallelism → slow, expensive, doesn’t scale.
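
Here’s what those “gates” look like in a minimal sketch, with made-up weights rather than anything trained. Note the loop at the bottom: even with better memory, every word still has to wait for the previous one, which is exactly the sequential bottleneck above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(c, h, x, W):
    """One LSTM step: three gates decide what to forget, store, and reveal."""
    z = W @ np.concatenate([h, x])
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget / input / output gates, each in (0, 1)
    g = np.tanh(g)                                # candidate new memory
    c = f * c + i * g                             # keep some old memory, write some new
    h = o * np.tanh(c)                            # decide what to show the next step
    return c, h

rng = np.random.default_rng(5)
hidden, inp = 4, 4
W = 0.5 * rng.standard_normal((4 * hidden, hidden + inp))
c, h = np.zeros(hidden), np.zeros(hidden)
for _ in range(30):                               # a 30-word "sentence": fine for an LSTM...
    c, h = lstm_step(c, h, rng.standard_normal(inp), W)
    # ...but each word still waits for the previous one. No parallelism.
```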

CNN (Convolutional Neural Network)

What it is:

A network that looks for patterns in small patches of an image or signal and stitches them into bigger patterns.

How it thinks:

“I’ll check every small window, find edges, curves, textures, and build up the picture.”

What it’s used for:

  • Image classification (cat/dog)

  • Object detection

  • Facial recognition

  • Early medical imaging tasks

Why CNNs were kings for 10 years:

They are insanely good at visual patterns:

  • they’re fast

  • they’re parallel

  • they reuse filters efficiently

They were the Aussie cricket team of computer vision – dominant, ruthless, unbeatable.

Why they didn’t become LLMs:

CNNs don’t have a natural way to handle long-range relationships in text. They’re great for “here’s a face,” but bad at “here’s a paragraph with meaning spread across 80 words.”
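
The “small patch” worldview fits in a few lines. This sketch hand-writes one classic filter (a vertical-edge detector) instead of learning it, and slides it over a toy image whose left half is dark and right half is bright:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over every patch; each output pixel scores one patch."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

# A tiny image: dark left half, bright right half -> a single vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])
response = convolve2d(image, vertical_edge)
print(response[0])  # [0. 3. 3. 0.] -- the filter fires only at the dark/bright boundary
```

A real CNN learns thousands of such kernels and stacks them in layers; the mechanics of “score every small window” stay the same.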

Why Transformers Replaced All Three

Transformers brought:

  • Attention (focus on what matters)

  • Parallelism (look at everything at once)

  • Long-context handling

  • Scalability

  • MoE compatibility

  • Cleaner training dynamics

RNNs forgot too easily. LSTMs remembered but were slow. CNNs saw patterns but not meaning. Transformers merged the best of all worlds. They became universal: text, vision, audio, protein folding, coding, reasoning… all with one architecture.

CNNs and Harari: Why Machines Saw Patterns but Couldn’t Gossip

CNNs were brilliant at one thing: detecting patterns in small windows.
Edges. Corners. Curves. Textures.

They could look at a tiny 3×3 patch of pixels and say, “Oh, this looks like an eyebrow” or “This is definitely a wheel.” Then they stitched together first edges, then shapes, then whole objects.

But here’s the catch: CNNs only look locally. Their entire worldview is “small patch → next small patch → next small patch.” They missed the long-range connections. In Sapiens, Harari argues that what made humans dominate the planet was our ability to connect abstract ideas and gossip across tribes.

We’re not powerful because we see edges and textures.
We’re powerful because we can connect:

  • stories

  • beliefs

  • rumors

  • meanings

  • relationships

  • consequences that unfold over time

We can remember something someone said three weeks ago and use it to interpret something said today.

CNNs?
Nope. They’re the quiet, hardworking kid who studies hard but has no tea to spill.

Why This Leads Naturally to Transformers

Transformers introduced attention, which gives a model the ability to say: “This word from the beginning matters to this word at the end, so let me connect them.”

That’s gossip.
That’s meaning-spreading.
That’s long-range dependency.
That’s human-style cognition.
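
Here’s the gossip mechanism in miniature: scaled dot-product attention over three toy “tokens,” where the first and last token are deliberately made near-identical. The attention weights show token 0 caring about token 2 at the far end, no matter the distance. (Random vectors, no training, single head; just the core formula.)

```python
import numpy as np

def attention(Q, K, V):
    """Every token scores its relevance to every other token, then blends values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # how much does token i care about token j?
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)                  # softmax: scores become proportions
    return w @ V, w

# Three toy "tokens"; the first and the last carry nearly identical content.
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4))
X[2] = X[0] + 0.01 * rng.standard_normal(4)
out, weights = attention(X, X, X)
# weights[0] shows token 0 attending to token 2 almost exactly as strongly as to itself:
# a long-range link between the start and the end of the sequence, in one step.
```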

The Architecture of AI — The Blueprint of the Brain

When I started writing about AI, I thought the “model” was whatever lived on the GPU and spat out answers.
Simple.
Done.
Onward.

Turns out, not quite. There’s something that sits before training, before inference, and before any cleverness shows up. It’s called architecture, and it’s the part we almost never talk about outside research circles.

Architecture = The Brain’s Blueprint

Architecture is just the design of how the model thinks.

Not the data. Not the training. Not the GPUs. Not the math.

Just the layout.
The wiring diagram.

For decades, this wiring diagram came in different shapes:

  • RNNs (models that remembered yesterday)

  • LSTMs (models that remembered yesterday slightly better)

  • CNNs (models that inspected images like bouncers checking IDs)

One architecture changed everything: The Transformer.

What Makes a Transformer a Transformer?

It has two big superpowers:

  1. Attention: the ability to look at every part of a sentence and decide what matters

  2. Parallelism: the ability to think about many things at once, without step-by-step bottlenecks

And inside each Transformer block, you mostly have two components:

  • Multi-Head Attention

  • Feed Forward Networks (FFN)

Stack a number of those and suddenly you’ve built a modern LLM. This stack of blocks is called the architecture. Think of it as the playing conditions before the cricket match even starts:

  • the pitch

  • the boundaries

  • the field layout

  • the weather

The players (your neurons) and the coach (the optimizer) are important, but the ground determines how the match will flow.
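Putting the two components together, here’s a sketch of one Transformer block (single-head attention plus a feed-forward network, with residual connections and layer norm), stacked twice. All weights are random; this is the wiring diagram, not a trained model:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    """One block = attention (who matters?) + feed-forward network (think about it)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V
    X = layer_norm(X + attn)                 # residual: keep the original signal too
    ffn = np.maximum(0.0, X @ W1) @ W2       # a tiny two-layer MLP applied to each token
    return layer_norm(X + ffn)

rng = np.random.default_rng(3)
d = 8
X = rng.standard_normal((5, d))              # 5 tokens, 8 dimensions each
for _ in range(2):                           # "stack a number of those"
    Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
    W1 = 0.1 * rng.standard_normal((d, 4 * d))
    W2 = 0.1 * rng.standard_normal((4 * d, d))
    X = transformer_block(X, Wq, Wk, Wv, W1, W2)
```

Real LLMs use many attention heads, far bigger dimensions, and dozens of blocks, but every block has this same two-part shape.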

Why This Matters for My Writing Journey

I realized that if I wanted to understand AI deeply enough to explain it to my kids one day, I needed to understand the blueprint first. Everything else (training, inference, MoE, GPUs, CUDA, Randy Johnson and Curt Schilling, and yes, Travis Head) sits on top of this architecture. This is the moment in the story where the camera pans out and you finally see the whole cricket ground.

Next up:
Mixture of Experts, the specialist lineup inside the Transformer.

And after that?
A man named Travis Head walks into a game and becomes an expert layer all by himself.

Stay tuned.

The LLM Era — When the Toy Soldiers Started to Sing

A few years ago, training AI models felt less like engineering and more like lining up those little green army soldiers from Toy Story. Hundreds of them, identical but slightly crooked, waiting for orders. Each one labeled optimizer, loss function, GPU, or dataset, standing shoulder to shoulder, disciplined but lifeless until you gave them purpose.

Now those tiny soldiers have come to life. They write, draw, argue, and even sing back to us.
And sometimes, when I’m chatting with ChatGPT late at night, it almost feels like it’s humming “You’ve Got a Friend in Me.”

Welcome to the Large Language Model (LLM) era, the moment the toy soldiers started thinking (and singing) for themselves.

From Soldiers to Symphony

Everything we’ve talked about (training, inference, loss, compute, communication) still applies. An LLM is just all those pieces assembled at breathtaking scale:

  • Supervised and self-supervised learning for the foundations.
  • GPUs, networks, and HBM for the muscles.
  • Reinforcement Learning from Human Feedback (RLHF) for refinement.

The result is a model with billions of parameters that can predict the next word with eerie fluency. It’s not intelligence; it’s pattern completion with perfect pitch.

LLMs as the General Manager

If Steph Curry taught us training, Peyton Manning taught us inference, and Randy Johnson taught us power, then LLMs are the general managers (think Bob Myers or Theo Epstein): the architects of the franchise, deciding which players, datasets, and strategies to bring together.

They don’t call plays; they shape the roster. They’re built on everything the coaching staff provides (I’m calling the software stack the coaching staff that actually runs the team):

  • The compute runtime frameworks coordinate every move, assigning roles and drawing up plays.
  • The collective communication libraries synchronize the squad, ensuring gradients and signals flow perfectly across GPUs.
  • The transport layer keeps the sideline chatter crisp, moving data packets at near-light speed so nothing gets lost in translation.

Together, they form the coaching staff. The LLM simply decides who to hire and what philosophy to play by. That’s why modern LLMs are systems-of-systems, directing compute, memory, and communication the way a general manager manages rosters, contracts, and chemistry. To take the Toy Story analogy a bit further: Woody would be the software stack, and Andy would be the LLM.

⚙️ What’s Actually Going On Under the Hood

When you type a prompt, here’s the quick replay:

  1. Your text is broken into tokens, which are nothing but fragments of words.
  2. Each token passes through layers of neurons that compute probabilities for what should come next.
  3. The model picks the most likely next token and repeats that millions of times per second.
  4. The whole thing runs on GPUs, coordinated by software, and kept alive by HBM, the same trio that powered your baseball heroes in the last post.

    Massive-scale next-word prediction, performed beautifully and at breathtaking speed.
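
Steps 1–3 can be caricatured in a few lines. The score table here is hand-written (a stand-in for billions of learned weights), but the loop (score the vocabulary, softmax, pick the likeliest token, append) is the actual shape of decoding:

```python
import numpy as np

def next_token(scores, vocab):
    """Turn raw scores into probabilities, then greedily pick the likeliest word."""
    probs = np.exp(scores) / np.exp(scores).sum()
    return vocab[int(np.argmax(probs))]

# A hand-written score table stands in for billions of learned weights.
vocab = ["way", "llama", "home"]
scores_after = {"my": np.array([2.5, -1.0, 0.5])}   # after "my": "way" scores highest

prompt = ["I'm", "on", "my"]
sentence = " ".join(prompt + [next_token(scores_after[prompt[-1]], vocab)])
print(sentence)  # I'm on my way
```

A real LLM computes those scores with stacked Transformer blocks and repeats the pick-and-append loop once per generated token.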

The Infrastructure Reality Check

Every new model you hear about (GPT-5, Claude 3, Gemini 2, Llama) hides a staggering infrastructure story:

  • Hundreds of thousands of GPUs wired together like neurons.
  • Petabytes of text scraped, filtered, and tokenized.
  • Energy footprints that rival small cities.

The next frontier after bigger models is better orchestration: models that learn faster, infer locally, and waste less.

The Summer of Sosa, McGwire, and a Few Terabytes of Hype

I didn’t live in the U.S. when Sammy Sosa and Mark McGwire were trading moonshots in ’98, but I’ve seen the ESPN 30 for 30 about the nightly cut-ins, the flashbulbs, the disbelief. Later, when I moved to the Bay Area, the steroid era was still echoing across McCovey Cove.
Barry Bonds was parking baseballs in the water and making everyone tune in just to see how far physics could bend.

That’s what the LLM era feels like. Every few weeks, a new model drops and we get more parameters, more context, more compute. The crowd (that’s us) cheers from the digital bleachers. We’re living through an arms race worthy of ESPN’s highlight reel: bigger models, faster inference, brighter lights… and maybe a little “extra training” behind the scenes.

Like baseball then, AI now will have to find its balance: precision over power, discipline over spectacle. Because in the end, every home-run race ends the same way: the noise fades, the lights dim, and someone quietly starts training for the next season.

Where We Go from Here

LLMs are what happens when every Lego brick, every green soldier, every data packet finally clicks into formation. They’re the first orchestra built from all the right instruments.

And maybe that’s the real lesson: AI doesn’t grow by replacing people. It grows by learning from us. Because sometimes, late at night, when it autocompletes my thoughts just right, I swear I can almost hear it whisper, “You’ve got a friend in me.”

⚙️ Box Score for the Curious

  • Transformer Architecture (team formation): How LLMs manage attention across every token.
  • Parameters (players’ muscle memory): The learned weights that define behavior.
  • Training Data (game-film library): Everything the model studies to predict patterns.
  • Inference (game-day execution): Running the trained model on new prompts.
  • RLHF (coaching feedback): Human reinforcement that refines tone and accuracy.

The Shohei Ohtani of AI Infrastructure — When Your NIC Starts Hitting Home Runs

If Randy Johnson and Curt Schilling were the perfect 1–2 punch, the GPU and network fabric working in rhythm, then Shohei Ohtani breaks the mold entirely. He doesn’t need a partner. He pitches and hits. He’s both the weapon and the engine.

And that’s exactly what’s happening in AI infrastructure right now. We’re watching the rise of the Shohei Ohtani of computing: the modern-day NIC.

Shohei, Meet the Modern-Day NIC

For decades, the playbook was simple:

  • The CPU handled the logic, orchestration, and control: the brains.
  • The NIC (Network Interface Card) handled the brawn, moving packets in and out.

Then workloads exploded: AI models, data center traffic, disaggregated architectures. We couldn’t afford to have the CPU pitching while the NIC just stood around chewing sunflower seeds.

Enter the modern-day NIC: a card that not only moves data but processes it, secures it, and accelerates it. It’s both pitcher and slugger. It offloads tasks like encryption, telemetry, and storage directly from the CPU, freeing up compute while adding intelligence right at the edge.

That’s Shohei Ohtani in silicon form.

Dual Threat, Dual Benefit

When Ohtani steps on the mound, he delivers 100 mph fastballs. When he steps up to the plate, he hits 450-foot home runs. Every team dreams of having one player who can do both.

Similarly, every AI infrastructure team dreams of hardware that can both move data and make sense of it.
A modern-day NIC can:

  • Perform packet filtering, encryption, and telemetry inline.
  • Run microservices or inference models locally.
  • Reduce CPU overhead by up to 30–40%.

It’s not just acceleration; it’s role compression.

Baseball’s Future and AI’s Frontier

Shohei isn’t just a player; he’s a category redefinition. He made the baseball world rethink what a roster spot means. Modern-day NICs do the same: they make data centers rethink what a “network card” is.

The boundaries between compute, communication, and security are dissolving. Soon, we won’t just have GPUs talking to NICs; we’ll have NICs that learn, offload, and decide.

⚙️ Box Score for the Curious

  • CPU (the manager calling plays from the dugout): Orchestrates overall workload strategy.
  • Traditional NIC (players who either hit or pitch): Moves data between nodes or systems.
  • Modern-Day NIC (Shohei Ohtani pitching and hitting): Offloads, processes, and secures data simultaneously.
  • Compute Offload (Ohtani covering two lineup spots): Reduces CPU overhead and improves efficiency.
  • Programmability (adjusting mid-game): NICs can run firmware and logic for evolving workloads.

Randy Johnson, Curt Schilling and What They Can Teach Us About AI’s Compute and Communication Game

If you don’t instantly know who Randy Johnson and Curt Schilling are, that’s okay; my daughters thought they were YouTubers too. But in 2001, those two men turned the Arizona desert electric, pitching the Diamondbacks past the mighty Yankees and straight into baseball history.

I was at ASU back then, newly arrived and newly addicted. Johnson, the Big Unit (this was my Yahoo password for a long time), threw 100 mph fastballs that looked unfair. Schilling (not a fan of his politics) was his mirror image: precise, methodical, strategic. Together, they were baseball’s perfect 1–2 punch.
Years later, as I got pulled into the world of AI infrastructure, I realized those two aces were basically the GPU and the network fabric that power today’s data centers.

Randy Johnson = The GPU

All power, no hesitation, pure velocity. That’s compute. GPUs deliver trillions of operations per second—mathematical heat that can flatten any workload. But raw power alone doesn’t win championships; it needs rhythm and coordination.

Curt Schilling = The Network Fabric

Schilling obsessed over sequencing and timing. He knew how to sync with his catcher, coaches, and even Johnson. That’s what network fabrics do in AI: they connect GPUs, synchronize gradients, and make sure every node knows what the others are doing. Without that orchestration, you’d have a dugout full of Randy Johnsons throwing in different directions.

Compute + Communication = Efficiency.
Otherwise, all that speed just hits the backstop.

Luis González = The HBM (High-Bandwidth Memory)

And then there was Luis González, the quiet hero who drove in the winning run in Game 7, his bat snapping as the ball blooped over Derek Jeter’s head. That’s HBM in the cluster: not flashy, but the moment it matters most, it delivers. It feeds data to GPUs fast enough to keep them humming, turning potential into performance. Without HBM, the whole system stalls; without Gonzo, there’s no parade at Bank One Ballpark.

The 2001 Diamondbacks and the 2025 Datacenter

That team alternated their aces to perfection. Modern AI systems do the same dance between compute, communication, and memory. If GPUs pitch too many gradients before the network or memory catch up, efficiency tanks. That’s why engineers obsess over latency and throughput: it’s not about throwing harder, it’s about staying in rhythm.

⚙️ Box Score for the Curious

  • GPU / Compute (Randy Johnson’s 100 mph fastball): Raw processing power driving training and inference.
  • Network Fabric (Curt Schilling calling the pitch sequence): Synchronizes data and gradients between GPUs.
  • HBM / Memory (Luis González’s perfectly timed bloop single): Feeds data fast enough to keep compute in motion.
  • Latency (time from pitch to catcher’s mitt): How quickly data moves between nodes.
  • Throughput (pitches per inning): Total data or computation volume handled at once.
  • Cluster Efficiency (Johnson–Schilling–González rhythm): Balanced compute, communication, and memory with no bottlenecks.

 

Omaha! Omaha!! — Peyton Manning and the Art of Edge vs. Cloud Inference

If Steph Curry taught me about training, Peyton Manning taught me about inference.

When I think of edge vs. cloud inference in AI, I can’t help but picture Peyton standing at the line of scrimmage, scanning the defense, shouting “Omaha! Omaha!!” before changing the play.

That moment, that audible, is edge inference in its purest form.

Edge Inference: The Quarterback on the Field

Edge inference means AI decisions made locally, right where the data is created: no waiting, no network round trips, no cloud overhead.

That’s Peyton at the line. He sees the pad level of the linebackers, the corner creeping up, and instantly adjusts the play. He doesn’t radio the offensive coordinator in the booth to ask, “Hey, do you think they’re blitzing?” He makes a fast, decisive, low-latency call right there.

That’s exactly what your Tesla does when it sees a flashing red light or a cyclist in the crosswalk. It doesn’t send video to a data center and wait for permission; it decides locally.

Edge inference is all about latency and autonomy. The decision happens where the action is.

And sometimes it’s messy, just like a quarterback improvising mid-play. But it’s the only way to play the game at full speed.

Cloud Inference: The Coordinator in the Booth

Now imagine the offensive coordinator sitting high above the field, watching multiple camera angles, spotting patterns the quarterback can’t. That’s cloud inference. It’s slower, but it has a panoramic view.

When the play is over, the coordinator reviews the footage, analyzes coverages, and updates the strategy for the next drive. In AI terms, the cloud is where heavy, compute-hungry inference happens.

ChatGPT is an example of cloud inference. You type your prompt in the browser, and the text is sent to OpenAI’s servers in the cloud. A large model running across thousands of GPUs generates the response, and the result is streamed back to you. The benefit: you can ask whether it’s Rumi or Zoey who has long hair in K-Pop Demon Hunters.

Cloud inference is strategic.
Edge inference is instinctive.
Both are essential.

Why It Matters

AI systems, like football teams, win because of coordination between the booth and the field.

  • The cloud handles the big picture: aggregating insights, running analytics, feeding new strategies.
  • The edge executes in real time: responding to what’s right in front of it.

Without the booth, the quarterback loses perspective. Without the quarterback, the booth is just a collection of PowerPoints.

AI’s challenge, and its beauty, lies in balancing both.

The Infrastructure Behind the Play

Underneath all this football talk is a serious technical story. Edge devices (cars, phones, cameras) run lightweight, optimized models tuned for speed, not size. The cloud runs the giant language models, video analytics, and retraining pipelines. The connection between them, the network, is like the headset in Manning’s helmet. Too slow, and the play breaks down. Too noisy, and the message gets garbled.

That’s why companies obsess over the holy trinity of AI communication: latency, bandwidth, and reliability.

Wrapping Up — The “Omaha!” of AI

Peyton Manning didn’t need more compute. He needed better timing, cleaner communication, and trust in his instincts. Edge inference is that: the art of decision-making without waiting for approval. Cloud inference is the strategy room that keeps the playbook evolving.

Together, they make AI feel human: fast, aware, and occasionally screaming “Omaha! Omaha!!” before doing something brilliant.

 

  • Edge Inference (Peyton Manning reading the defense and audibling at the line): AI decisions made locally for real-time action with minimal latency.
  • Cloud Inference (the offensive coordinator in the booth watching all 22 players): Centralized processing with larger models running in data centers, using full-field (global) context but higher latency.
  • Latency (the delay between the snap and the QB’s release): The time it takes a model to process input and return a prediction. Edge minimizes it; cloud trades it for more insight.
  • Bandwidth (the playbook thickness, how much can be communicated before the snap): The volume of data that can move between edge devices and the cloud. Higher bandwidth means richer coordination.
  • Model Size (the QB’s mental playbook): Edge runs lighter, optimized models; cloud handles large, memory-hungry LLMs.
  • Network Fabric (the headset between QB and coordinator): The connectivity layer (Ethernet, 5G, Wi-Fi) that links edge and cloud.
  • Synchronization (QB and receiver timing their routes): Coordinating model updates and telemetry across edge nodes and cloud servers.
  • Trade-off (choosing between a quick slant and a deep route): Balancing speed (edge) vs. context and compute (cloud). The best systems know when to switch.

How AI Learns

When I first started exploring AI, I pictured mysterious servers glowing somewhere in the desert, humming equations I’d never understand. Turns out, AI learning isn’t that different from how we learn, just with fewer snack breaks and more GPUs. Here’s my running commentary on the ways machines (and sometimes humans) learn.

Supervised Learning — The Teacher’s Pet

In supervised learning, the model gets examples with the answers attached.
It’s like flashcards for robots:

“This is a cat. This is also a cat. That’s a dog. No, still a cat.”

Over time it learns the pattern. And me? I’ve been part of this system all along, squinting at CAPTCHAs, labeling traffic lights, crosswalks, and the poor soul with the polka-dot umbrella.
I thought I was proving I wasn’t a robot. Turns out, I was training one.

That’s supervised learning: humans labeling the world so machines can catch up.

 

Unsupervised Learning 

When I once typed “Diane Lane” into Netflix (purely for cinematic reasons, of course), my feed suddenly filled with movies about infidelity. That’s unsupervised learning in action. It simply looked at the watch patterns of millions of people who also searched “Diane Lane” and discovered a hidden cluster: these users often end up watching “Unfaithful,” “Eyes Wide Shut,” and other midlife-crisis cinema. It didn’t know the meaning of fidelity or who Diane Lane is; it just noticed patterns.

That’s what unsupervised learning does:
it finds associations in the data, whether or not we meant to reveal them.

The model doesn’t judge. It just groups.
We’re the ones who have to live with the awkward recommendations.

Semi-Supervised Learning 

There’s an old Tamil saying: “A pot of rice can be judged by a single grain.”
That’s semi-supervised learning in a sentence.

In medicine, it’s not always possible to label millions of scans; each one needs a radiologist’s eye and hours of expertise. So instead, a model learns from a small set of labeled cancer images, then applies what it’s learned to a much larger pool of unlabeled ones.
It starts spotting subtle textures, light gradients, and tissue densities that even trained eyes can miss.

That’s why semi-supervised learning is quietly powering breakthroughs in cancer detection, diabetic retinopathy, and lung nodule screening.

Self-Supervised Learning 

Here, the AI hides part of the input and tries to predict it. That’s how language models like ChatGPT train: by filling in blanks billions of times. Think of autocorrect. It has read oceans of text, learning that “I’m on my” is usually followed by “way,” not “llama.” That’s self-supervised learning.
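
A caricature of the fill-in-the-blank trick, using a five-line toy corpus instead of oceans of text. The key point: nobody labeled anything; the hidden word is its own answer.

```python
from collections import Counter

# A five-line toy corpus; in a real model this is a large slice of the internet.
corpus = [
    "i'm on my way", "i'm on my way", "i'm on my way",
    "i'm on my break", "i'm on my llama",
]

def predict_blank(prefix, corpus):
    """Self-supervision: the hidden word is its own training label."""
    n = len(prefix.split())
    continuations = Counter(
        line.split()[n]                        # the word that fills the blank
        for line in corpus
        if line.startswith(prefix + " ") and len(line.split()) > n
    )
    return continuations.most_common(1)[0][0]  # the most frequent continuation

print(predict_blank("i'm on my", corpus))  # way
```

Real models replace this counting with neural networks and probabilities over a whole vocabulary, but the supervision signal is the same: the text itself.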

Reinforcement Learning 

Reinforcement learning is all about feedback. The model acts, gets rewarded or punished, and updates its playbook. That’s how AlphaGo mastered Go.

Transfer Learning 

Self-driving car models are often trained in simulation environments first (cheap, safe, controlled), then fine-tuned on real-world traffic data.
The pattern recognition of lanes, pedestrians, intersections transfers from virtual to reality.

Federated Learning 

Most people think Siri just listens, answers, and sometimes gets things hilariously wrong.
What they don’t see is that Siri, like millions of her clones on iPhones worldwide, is quietly learning every day.

Each iPhone keeps track of your accent, your tone, your phrasing.
It trains a tiny model locally on your device to better understand your speech patterns, without sending the raw audio to Apple.

Once in a while, your phone sends back small mathematical changes to the model, not your recordings or transcripts.
Apple’s servers then aggregate those updates from millions of users, average them out, and push back a smarter, global Siri model to every device.

No one outside ever hears your voice.
But Siri still gets better because everyone contributes, privately.
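The aggregation step can be sketched like this; the local training rule and the numbers are invented, but the shape is the point: devices send back deltas, the server averages them, and no raw data ever leaves a phone.

```python
# Federated averaging sketch: each "phone" trains locally and returns
# only a small weight update; the server averages updates, never data.

global_weight = 0.0

def local_update(global_w, private_examples):
    """Train locally; return only the weight delta, not the data."""
    local_w = global_w
    for x in private_examples:          # raw audio stays on the device
        local_w += 0.1 * (x - local_w)  # tiny gradient-style step
    return local_w - global_w

# Three devices, three private datasets the server never sees.
deltas = [
    local_update(global_weight, [1.0, 1.2]),
    local_update(global_weight, [0.8]),
    local_update(global_weight, [1.1, 0.9, 1.0]),
]

# Server aggregates: average the updates, push the result back out.
global_weight += sum(deltas) / len(deltas)
print(round(global_weight, 3))
```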

| Type of Learning | How It Learns | Everyday Example | Real-World Application |
|---|---|---|---|
| Supervised Learning | Learns from labeled data | Clicking all the traffic lights in a captcha | Image recognition |
| Unsupervised Learning | Finds hidden patterns | Netflix showing you infidelity dramas after one “Diane Lane” search | Recommendation engines |
| Semi-Supervised Learning | A few labels teach the rest | “Taste one grain, judge the pot of rice.” | Cancer detection |
| Self-Supervised Learning | Predicts what’s missing | Autocorrect turning “I dare you” → “I date you” | GPT-style language models |
| Reinforcement Learning | Trial, error, reward | Learning not to shop online at 2 AM | Autonomous driving |
| Transfer Learning | Applies past experience | Thinking you can fly a plane because you ride an e-bike | Multilingual translation |
| Federated Learning | Learns together, keeps data private | Gboard learning your slang without sending your messages | Keyboard prediction |
| Bribed Learning | Motivation through rewards | “Finish math homework, get a cookie.” | Education gamification, sales incentives (and parenting!) |

The Steph Curry Guide to Learning, Inference, and Loss Functions

When I started learning about AI, it all felt abstract: gradients, weights, backpropagation. None of it clicked until I realized Steph Curry has been doing machine learning his whole career.

Stay with me.


Training = The Reps Nobody Sees

Training is when an AI model learns. It runs through data again and again, adjusting itself until it gets better.
That’s Steph in an empty gym. Thousands of threes, same motion, tiny corrections each time.

Each shot is a data point.
Each miss is a feedback signal.
Each make slightly adjusts the “weights”: his form, release angle, foot position.

Curry’s dataset is the rim. His loss function is the clang of the miss.
The coach doesn’t have to yell – the ball bouncing off the iron is the gradient telling him what to fix.

Training is painful, repetitive, and invisible.
But that’s where the magic happens – in the noise of trial and error.
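That training loop, stripped down to a single weight, looks like this in Python (toy data and an invented learning rate, but a real gradient-descent update):

```python
# A minimal "empty gym" training loop: gradient descent on one weight.
# The model predicts y = w * x; each miss (error) adjusts w slightly.

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # the truth here: y = 3x
w = 0.0                                      # untrained form
lr = 0.05                                    # size of each tiny correction

for epoch in range(200):                     # thousands of reps, same motion
    for x, y in data:
        pred = w * x
        error = pred - y                     # the clang off the rim
        w -= lr * error * x                  # adjust the "release angle"

print(round(w, 3))  # close to 3.0
```

Two hundred passes over three points, and the weight settles on the pattern. That repetition is the whole job of training.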

Inference = Game Time Decisions

Inference is what happens once the model is trained: it applies what it’s learned to new data.

That’s Steph in a playoff game. The defense takes away the three-point line.
He doesn’t retrain. He adapts: steps inside, hits a mid-range jumper, or finds a teammate for the assist.


No more weight updates, no feedback loops. Just applying learned patterns in real time.

That’s inference: fast, confident, and built on a foundation of hours of training.

In infrastructure terms: training happens on GPU clusters with massive data pipelines; inference happens courtside, at the edge, with low latency and high precision.
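Inference, in the same toy spirit as the training sketch above: a frozen weight (assume it was learned earlier) and a plain forward pass, with no updates anywhere.

```python
# Inference: the trained weight is frozen; we only apply it to new input.
w_trained = 3.0  # assume this was learned in the "gym"

def infer(x):
    """No weight updates, no feedback loop: just a fast forward pass."""
    return w_trained * x

print(infer(4.0))  # 12.0
```

Notice there is no loop over data and no error term. That’s the entire difference between game time and practice.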


Loss Functions = The Coach That Yells (and the Clang That Teaches)

Every AI model has a loss function: a way to measure how wrong it is. Without it, the model never knows it’s off target.

For Steph, the loss function is simple: did the shot go in? A miss triggers an update. The sound of the rim is the model’s “you messed up” alert.
That’s how the model learns: by minimizing loss over time.

And like a good coach, the loss function doesn’t lie. It doesn’t care if you’re tired, famous, or just broke the single-season record. A miss is a miss. Backpropagate, adjust, try again.
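A loss function really is just a scoring rule. Here’s mean squared error, one of the most common ones, applied to two invented “shooting sessions”:

```python
# A loss function scores how wrong the model is; training minimizes it.

def mse(preds, targets):
    """Mean squared error: big misses hurt more than small ones."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Two sessions: the loss doesn't lie about which one was better.
print(mse([2.9, 3.1], [3.0, 3.0]))  # small misses, small loss
print(mse([1.0, 5.0], [3.0, 3.0]))  # big misses, big loss
```

Squaring the error is a design choice: it punishes one wild miss far more than several near-misses, which is often what you want a coach to do.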


Generalization = Defenses You’ve Never Seen

Every great player, and every great model, eventually faces a situation it wasn’t trained on. For AI, that’s called generalization: performing well on data it’s never seen before.
For Steph, it’s when the defense throws a new scheme at him in the Finals.

A model that memorized its training data would crumble. But one that truly learned the game? It adapts, finds patterns, and still scores.

 

Paolo Maldini once said, “If I have to make a tackle, I’ve already made a mistake.” Steph Curry plays the same way: he anticipates, reads, and adjusts before the problem even appears. That’s generalization in motion.
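A tiny contrast between memorizing and generalizing, built on an invented pattern (y = 3x):

```python
# Memorization vs. generalization: a lookup table aces training data
# and crumbles on anything new; a model that learned the rule adapts.

train = {1: 3, 2: 6, 3: 9}   # training "schemes" and answers

def memorizer(x):
    return train.get(x)      # returns None off the training set

def learner(x):
    return 3 * x             # learned the underlying pattern

unseen = 10                  # a defense it has never faced
print(memorizer(unseen))     # None: crumbles
print(learner(unseen))       # 30: still scores
```

Both get every training question right. Only one survives the Finals.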


Wrapping Up: The AI of Joy

Steph Curry isn’t just a player; he’s an algorithm of joy.
He trained on repetition, infers under pressure, minimizes his loss, and generalizes beautifully.
Every swish is an inference executed perfectly; every miss is backpropagation in disguise.

And for the rest of us, learning AI or just life, that’s the lesson:
Keep training. Learn from the clangs. Trust your dataset.
Because when you finally hit that shot that makes the world go quiet, that’s what real inference feels like.

#30 forever on the court, and now in my AI playbook.

 

The Red Light That Stopped Me (and Started Me)

The other day, my Tesla on Full Self-Driving rolled up to a flashing red light. It slowed down. Stopped. Waited for the car across from me. Then pulled ahead. That wasn’t me driving. That was the car making a decision.
And in that moment, I thought: Wait… how does AI actually do this?

I’ve worked around technology for years, but I’ve never really taken the time to unpack what “AI” means. So I’ve started this blog, not as an expert, but as a student. My goal is to learn AI piece by piece — like Lego blocks — and share that journey with you. If you’ve ever been curious but intimidated, maybe we can learn together.

What Is AI?

Artificial Intelligence is the broadest umbrella. It’s any attempt to make machines act in ways that we’d normally call “intelligent” if a human did them. That could be as simple as a chess program or as complex as a self-driving car.

But here’s the catch: AI doesn’t always mean “learning.” Old-school AI was full of rules and logic. If-then statements written by humans. If the light is red, stop. If it’s green, go. That’s AI too — but brittle. It only works in situations you’ve anticipated and coded.
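That brittleness is easy to show. Here’s a sketch of a rules-only “AI,” with a deliberately unhandled case:

```python
# Old-school, rules-only "AI": every case hand-written, nothing learned.
def traffic_rule(light):
    if light == "red":
        return "stop"
    if light == "green":
        return "go"
    return "???"  # brittle: no rule for flashing red, yellow, outages...

print(traffic_rule("red"))           # stop
print(traffic_rule("flashing red"))  # ??? (the unanticipated case)
```

Every situation the programmer didn’t think of falls through to “???”. That gap is exactly what learning-based systems try to close.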

What Is Machine Learning?

Machine Learning is a subset of AI. Instead of hand-coding every rule, you let the system learn patterns from data.

Example: instead of telling the computer exactly what makes an email spam, you feed it thousands of emails labeled “spam” or “not spam.” The model finds the patterns on its own.

So ML = AI that learns from examples, not just rules.
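A toy version of the spam example (dataset invented for illustration): no hand-written rules, just counting which labeled pile a message’s words resemble.

```python
from collections import Counter

# A toy spam filter: patterns learned from labeled examples,
# not rules written by hand.
spam = ["win money now", "free money win"]
ham = ["lunch at noon", "meeting notes attached"]

spam_words = Counter(w for msg in spam for w in msg.split())
ham_words = Counter(w for msg in ham for w in msg.split())

def classify(msg):
    """Score a message by which labeled pile its words resemble."""
    s = sum(spam_words[w] for w in msg.split())
    h = sum(ham_words[w] for w in msg.split())
    return "spam" if s > h else "not spam"

print(classify("win free money"))  # spam
print(classify("lunch meeting"))   # not spam
```

Real filters use the same idea with better statistics (and far more data), but the shift is identical: the examples define the rules, not the programmer.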

What Is Deep Learning?

Deep Learning is a further subset of ML. It uses neural networks with many layers (hence “deep”) to learn incredibly complex patterns.

This is the engine behind modern breakthroughs: image recognition, speech recognition, and language models like ChatGPT.

When my Tesla recognized the flashing red, that was deep learning vision models at work — neural networks trained on millions of traffic light images.
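What “deep” means mechanically: layers stacked on layers, each transforming the previous one’s output. A two-layer forward pass, with weights invented purely for illustration:

```python
import math

# "Deep" = stacked layers. Each layer is weighted sums passed
# through a nonlinearity; depth lets patterns build on patterns.

def layer(inputs, weights):
    """One layer: weighted sums squashed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

x = [0.5, -0.2]                            # e.g. pixel-derived features
h = layer(x, [[1.0, -1.0], [0.5, 0.5]])    # hidden layer
y = layer(h, [[1.0, 1.0]])                 # output layer
print(round(y[0], 3))
```

Real vision models stack dozens of such layers (with millions of learned weights); the early ones pick up edges, the later ones whole objects like a flashing red light.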

Wrapping Up

So here’s my Lego box so far:

  • AI = the big umbrella.
  • ML = machines learning from data.
  • DL = neural networks stacked deep.

I’m still squinting into the AI sun — sometimes literally, when I forget my sunglasses and rely on my Tesla screen to tell me when the light has turned green. But I’m starting to see the pieces more clearly.

This blog is my way of stacking those pieces. If you’re curious too, stick around. We’ll build together. And I’m publishing this here so that anyone can leave comments anonymously, without fear of shame or employer backlash — a safe space to learn out loud.

What’s Next

This first post was my Lego starter kit: AI, ML, DL, and reinforcement learning.

But the real magic happens when we look inside the box:

  • Training → how models “practice” with data.
  • Inference → how they “play the game” in real time.
  • Loss functions → the yelling coaches keeping score.
  • Generalization → why Paolo Maldini said, “If I have to make a tackle, I’ve already made a mistake.”

That’s where we’re headed next.