Origin and Mission

kdough01

Published March 26, 2026

FORGE DEVELOPMENT BLOG · VOL. 01

Built in the heat of every rep.

How we built a computer-vision coaching platform that watches you lift, thinks like a coach, and earns its name every session.

JUNE 2025 · 12 MIN READ · PLATFORM & ENGINEERING


Iron doesn’t know what it wants to become. That only happens in a forge — under heat, pressure, and the repeated impact of something that knows exactly what it’s shaping.

The fitness app landscape is full of logging tools. You tap in your sets and reps, maybe snap a photo, and a number in a database goes up. That was never what we were building. From the first commit, Forge was conceived as something that watches — a system that understands the difference between a good rep and a dangerous one, that tracks fatigue before you feel it, and that speaks back to you in the language of an experienced coach standing two feet away.

This is the story of how we got there.


01 — Origin: The Problem with “Did You Work Out?”

The initial frustration was simple: every fitness platform we looked at asked only one question. Did you work out? Sets, reps, weight — the what, never the how. But anyone who has trained seriously knows that two people lifting identical weights for identical reps can be moving in entirely different directions. One is building. The other is grinding themselves down.

We kept coming back to the same scenario: an athlete finishes a set of squats, logs “5 × 100 kg,” and gets credit for it. But on rep three, their knees caved. On rep four, their heels rose. By rep five, they were grinding through spinal flexion that would catch up with them in three months. The data said: great session. The truth said: early injury in progress.

We wanted to build the coach that doesn’t look away. The one who watches every rep, and tells you the truth — even when the truth is uncomfortable.

The technical question followed immediately: how do you watch a rep and understand it? The answer required solving several hard problems simultaneously — computer vision, biomechanics, velocity-based training theory, fatigue modelling, and natural language coaching that doesn’t sound like a disclaimer.


02 — Architecture: From Pixels to Coaching, in One Pipeline

The core pipeline runs in three analysis stages, followed by the coaching step covered in section 03. A video of a lift goes in. Structured, rep-level coaching comes out. In between, a lot happens.

Stage 1: Pose Detection & Kinematics

The ExerciseFormTracker processes each frame of the video, extracting joint positions, computing angles, estimating bar speed, and flagging events like sticking points and torso rotation. Critically, it also estimates forces. When the user has tracked their load, every force estimate is anchored to that weight; when no weight is logged, the system falls back to a 20 kg bare-bar assumption. The tracker outputs a per-frame JSON structure that becomes the ground truth for everything downstream.
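To make the load fallback concrete, here is a minimal sketch of what a per-frame record could look like. The field names, the `build_frame_record` helper, and the force model F = m(g + a) are assumptions made for illustration; the post does not describe the real JSON schema or how ExerciseFormTracker estimates force.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional

BARE_BAR_KG = 20.0    # fallback load named in the post
GRAVITY_M_S2 = 9.81


@dataclass
class FrameRecord:
    """Hypothetical per-frame record; field names are illustrative only."""
    frame_index: int
    joint_angles_deg: Dict[str, float]
    bar_speed_m_s: float
    load_kg: float
    load_is_assumed: bool
    est_bar_force_n: float


def build_frame_record(
    frame_index: int,
    joint_angles_deg: Dict[str, float],
    bar_speed_m_s: float,
    bar_accel_m_s2: float,
    logged_load_kg: Optional[float] = None,
) -> FrameRecord:
    # Use the logged load when there is one, otherwise assume a bare bar.
    load_is_assumed = logged_load_kg is None
    load_kg = BARE_BAR_KG if logged_load_kg is None else logged_load_kg
    # Assumed force model: F = m * (g + a), vertical bar acceleration only.
    force_n = load_kg * (GRAVITY_M_S2 + bar_accel_m_s2)
    return FrameRecord(
        frame_index=frame_index,
        joint_angles_deg=joint_angles_deg,
        bar_speed_m_s=bar_speed_m_s,
        load_kg=load_kg,
        load_is_assumed=load_is_assumed,
        est_bar_force_n=force_n,
    )


record = build_frame_record(0, {"knee": 92.5, "hip": 78.0}, 0.41, 0.6)
print(json.dumps(asdict(record), indent=2))
```

Carrying a `load_is_assumed` flag alongside the number lets downstream stages tell an anchored force estimate from a bare-bar guess.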

Stage 2: Form Analysis — The Heart of Forge

The ExerciseFormAnalyzer is where the platform earns its keep. It ingests the frame-level data and runs a multi-layer analysis stack: rep detection, per-rep quality scoring, injury-risk flagging, VBT metrics, and fatigue modelling. Each of those layers was built from scratch to be honest — not to make users feel good, but to tell them what’s actually happening.

Rep detection starts with a Savitzky-Golay-smoothed joint position signal — typically hip height for squats and deadlifts, wrist height for bench. Peak and trough detection runs with distance and prominence constraints. We built a user-guided correction loop: if the lifter logged that they performed five reps and the algorithm finds four, it retries detection with progressively looser thresholds rather than silently under-counting. The user’s rep count is always the authoritative floor.
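The correction loop can be sketched in a few lines. This is an illustration rather than the production code: a centred moving average stands in for the Savitzky-Golay filter so the example needs no dependencies, and the default thresholds, the 0.8 loosening factor, and the retry limit are all assumed values. Treating the "authoritative floor" as `max(detected, logged)` is also an interpretation of the post.

```python
import math
from typing import Dict, List, Optional


def smooth(signal: List[float], window: int = 5) -> List[float]:
    """Centred moving average (a simple stand-in for Savitzky-Golay)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def trough_prominence(signal: List[float], i: int) -> float:
    """Depth of trough i below the lower of its two surrounding high points."""
    v = signal[i]
    left_max = right_max = v
    j = i - 1
    while j >= 0 and signal[j] >= v:
        left_max = max(left_max, signal[j])
        j -= 1
    j = i + 1
    while j < len(signal) and signal[j] >= v:
        right_max = max(right_max, signal[j])
        j += 1
    return min(left_max, right_max) - v


def find_troughs(signal: List[float], min_distance: int,
                 min_prominence: float) -> List[int]:
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]
        and trough_prominence(signal, i) >= min_prominence
    ]
    kept: List[int] = []
    for i in sorted(candidates, key=lambda k: signal[k]):  # deepest first
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)


def detect_reps(hip_height: List[float], logged_reps: Optional[int] = None,
                min_distance: int = 10, min_prominence: float = 0.2,
                loosen: float = 0.8, max_retries: int = 4) -> Dict:
    smoothed = smooth(hip_height)
    troughs = find_troughs(smoothed, min_distance, min_prominence)
    retries = 0
    # Retry with looser thresholds while we find fewer reps than were logged.
    while (logged_reps is not None and len(troughs) < logged_reps
           and retries < max_retries):
        retries += 1
        min_prominence *= loosen
        min_distance = max(1, int(min_distance * loosen))
        troughs = find_troughs(smoothed, min_distance, min_prominence)
    # The logged count is a floor: never report fewer reps than the lifter did.
    rep_count = max(len(troughs), logged_reps or 0)
    return {"rep_count": rep_count, "troughs": troughs, "retries": retries}


# Synthetic set: five squats, the fourth noticeably shallower than the rest.
depths = [0.5, 0.5, 0.5, 0.2, 0.5]
hip = [1.0 - d * math.sin(math.pi * k / 20) for d in depths for k in range(20)]
hip += [1.0] * 5  # standing at the end of the set
print(detect_reps(hip))
print(detect_reps(hip, logged_reps=5))
```

With the default thresholds the shallow fourth rep is missed. Passing the logged count of five makes the loop relax the thresholds until that trough is picked up.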

Quality scoring uses weighted rubrics that are exercise-specific, not generic. A squat quality score combines six components: depth (25%), knee valgus (20%), back angle (20%), symmetry (15%), eccentric tempo (10%), and heel rise (10%). A detected butt-wink automatically costs ten points regardless of other scores, because posterior pelvic tilt at depth is a load-bearing spine issue, not a style preference. The bench press scorer substitutes elbow angle and flare for the squat’s depth and valgus dimensions. Each exercise has its own biomechanical model.
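As a worked illustration of the squat rubric, the sketch below applies the six weights and the flat butt-wink penalty. It assumes each component has already been scored on a 0 to 100 scale and that the final score is clamped to that range; the function and key names are hypothetical, and how each component score is derived is outside what the post describes.

```python
from typing import Dict

# Weights as stated in the post; they sum to 1.0.
SQUAT_WEIGHTS = {
    "depth": 0.25,
    "knee_valgus": 0.20,
    "back_angle": 0.20,
    "symmetry": 0.15,
    "eccentric_tempo": 0.10,
    "heel_rise": 0.10,
}
BUTT_WINK_PENALTY = 10.0


def squat_quality_score(components: Dict[str, float],
                        butt_wink: bool = False) -> float:
    """Weighted sum of 0-100 component scores, minus the butt-wink penalty."""
    missing = set(SQUAT_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    score = sum(SQUAT_WEIGHTS[name] * components[name] for name in SQUAT_WEIGHTS)
    if butt_wink:
        score -= BUTT_WINK_PENALTY
    return max(0.0, min(100.0, score))  # clamping is an assumption


rep = {
    "depth": 80, "knee_valgus": 60, "back_angle": 90,
    "symmetry": 100, "eccentric_tempo": 70, "heel_rise": 100,
}
print(squat_quality_score(rep))                  # weighted sum only
print(squat_quality_score(rep, butt_wink=True))  # same rep, 10 points lower
```

For this rep the weighted sum is 20 + 12 + 18 + 15 + 7 + 10 = 82, and a detected butt-wink brings it to 72.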

Injury-risk flags run on every rep, independently of quality scoring. Butt-wink is detected from pelvic tilt signal crossing −8°. Knee valgus is measured in pixels of inward deviation and severity-graded: warning above 40 px, critical above 70 px. Heel rise is detected from heel marker displacement. Left/right asymmetry is computed from the mean absolute difference between symmetric joint angles across the rep window. These flags aren’t just logged — they’re fed directly into the coaching prompt with rep numbers and measurements attached.
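The thresholds above translate into very little code. The sketch below is illustrative only: the function names and the flag dictionary shape are hypothetical, "crossing −8°" is read as the tilt signal dropping below −8°, and heel rise is left out because the post gives no threshold for it.

```python
from typing import Dict, List, Optional, Sequence

BUTT_WINK_TILT_DEG = -8.0   # pelvic tilt threshold from the post
VALGUS_WARNING_PX = 40.0
VALGUS_CRITICAL_PX = 70.0


def grade_knee_valgus(max_inward_px: float) -> Optional[str]:
    """Severity grade for the largest inward knee deviation in a rep."""
    if max_inward_px > VALGUS_CRITICAL_PX:
        return "critical"
    if max_inward_px > VALGUS_WARNING_PX:
        return "warning"
    return None


def has_butt_wink(pelvic_tilt_deg: Sequence[float]) -> bool:
    """True if the tilt signal drops below the threshold anywhere in the rep."""
    return any(t < BUTT_WINK_TILT_DEG for t in pelvic_tilt_deg)


def asymmetry_deg(left: Sequence[float], right: Sequence[float]) -> float:
    """Mean absolute left/right joint-angle difference over the rep window."""
    pairs = list(zip(left, right))
    return sum(abs(l - r) for l, r in pairs) / len(pairs)


def flag_rep(rep: int, pelvic_tilt_deg: Sequence[float], max_inward_px: float,
             left_knee_deg: Sequence[float],
             right_knee_deg: Sequence[float]) -> List[Dict]:
    """Collect flags with the rep number and measurement attached."""
    flags: List[Dict] = []
    if has_butt_wink(pelvic_tilt_deg):
        flags.append({"rep": rep, "flag": "butt_wink",
                      "min_tilt_deg": min(pelvic_tilt_deg)})
    severity = grade_knee_valgus(max_inward_px)
    if severity is not None:
        flags.append({"rep": rep, "flag": "knee_valgus",
                      "severity": severity, "deviation_px": max_inward_px})
    # Asymmetry is reported as a measurement; the post gives no cut-off for it.
    flags.append({"rep": rep, "flag": "asymmetry",
                  "mean_abs_diff_deg": asymmetry_deg(left_knee_deg, right_knee_deg)})
    return flags


print(flag_rep(3, [-3.0, -9.0], 72.0, [90, 100], [92, 97]))
```

Keeping the rep number and the raw measurement on every flag is what allows the coaching prompt to cite them directly.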

Stage 3: Fatigue Modelling & VBT

Velocity-based training insight came later in the build, and it changed how we thought about the whole product. Bar speed is the most honest signal in strength training — it reflects effort, fatigue, and proximity to failure in ways that rep counts alone never can. Forge computes mean concentric velocity, peak concentric velocity, and eccentric speed for every rep. It then models velocity loss across the set and projects an estimated failure rep using linear regression on the velocity trend.
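A minimal sketch of the velocity-loss and failure-projection arithmetic follows. It assumes velocity loss is measured from the first rep to the last rep of the set, and that the failure rep is where an ordinary least-squares line through the per-rep mean concentric velocities crosses a failure velocity. The function names are illustrative, not Forge's actual code.

```python
from typing import List, Optional, Sequence, Tuple


def velocity_loss(velocities: Sequence[float]) -> float:
    """Fractional drop from the first rep to the last rep of the set."""
    return (velocities[0] - velocities[-1]) / velocities[0]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_failure_rep(velocities: Sequence[float],
                          failure_velocity: float) -> Optional[float]:
    """Rep number at which the fitted trend reaches the failure velocity."""
    if len(velocities) < 2:
        return None  # not enough points to fit a trend
    reps: List[int] = list(range(1, len(velocities) + 1))
    slope, intercept = fit_line(reps, velocities)
    if slope >= 0:
        return None  # velocity is not falling, so no failure is projected
    return (failure_velocity - intercept) / slope


set_velocities = [0.50, 0.46, 0.42, 0.38, 0.34]  # m/s, one value per rep
print(f"velocity loss: {velocity_loss(set_velocities):.0%}")
print(f"projected failure rep: {projected_failure_rep(set_velocities, 0.20):.1f}")
```

In this example the trend loses 0.04 m/s per rep, so a 0.20 m/s failure velocity is reached at about rep 8.5, meaning the ninth rep would be projected to fail.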

The failure velocity thresholds are exercise-specific: 0.20 m/s for squat, 0.18 m/s for bench press, 0.15 m/s for deadlift. When velocity loss crosses 20% for the set, the coaching system is required — hard-coded in the prompt rules — to recommend a specific load reduction for the next set. Sticking point detection uses trough-finding on the concentric velocity curve, with position expressed as a percentage of concentric ROM. Over multiple sessions this becomes a diagnostic tool: a sticking point consistently at 60–70% through the press suggests a different weakness than one appearing at 30%.
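The thresholds and the sticking-point measure can be sketched as follows. The constants come from the paragraph above; the function names are hypothetical, "crosses 20%" is read as strictly greater than 20%, and the sticking point is taken to be the deepest interior trough of the concentric velocity curve, located by bar height as a share of concentric range of motion.

```python
from typing import Optional, Sequence

# Exercise-specific failure velocities (m/s) from the post.
FAILURE_VELOCITY_M_S = {"squat": 0.20, "bench_press": 0.18, "deadlift": 0.15}
VELOCITY_LOSS_LIMIT = 0.20  # above this, a load reduction must be recommended


def needs_load_reduction(velocity_loss: float) -> bool:
    return velocity_loss > VELOCITY_LOSS_LIMIT


def sticking_point_pct(bar_heights: Sequence[float],
                       velocities: Sequence[float]) -> Optional[float]:
    """Position of the slowest interior trough, as % of concentric ROM.

    Both sequences cover the concentric phase only, sampled at the same frames.
    """
    troughs = [
        i for i in range(1, len(velocities) - 1)
        if velocities[i] < velocities[i - 1] and velocities[i] <= velocities[i + 1]
    ]
    rom = bar_heights[-1] - bar_heights[0]
    if not troughs or rom <= 0:
        return None
    slowest = min(troughs, key=lambda i: velocities[i])
    return 100.0 * (bar_heights[slowest] - bar_heights[0]) / rom


heights = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]        # metres above the bottom position
speeds = [0.20, 0.50, 0.30, 0.25, 0.45, 0.10]   # m/s at the same frames
print(sticking_point_pct(heights, speeds))
print(needs_load_reduction(0.32))
```

Here the bar slows most at 0.3 m of a 0.5 m press, which places the sticking point at roughly 60% of concentric ROM.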


03 — Coaching: Why Claude, and How We Constrain It

The final stage — generating the actual coaching feedback — was the one we agonised over longest. Every piece of AI-generated fitness advice we had seen shared the same failure mode: it was generic. “Make sure to keep your back straight.” “Focus on your breathing.” Advice that could apply to anyone, because it was anchored to no one.

Our solution was to build a prompt architecture that makes generic advice structurally impossible. The system prompt passed to the model contains seven mandatory rules, and the most important is the first: every sentence must reference a specific number, rep, or metric from the data. It can’t say “your velocity was dropping” — it must say “your concentric velocity fell from 0.41 m/s on rep one to 0.22 m/s by rep five, a 46% loss.” It can’t say “your knees were caving” — it must cite the rep number and the pixel deviation measurement.
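The post enforces this rule through the prompt itself. As a complementary illustration, a simple check on the model's output could catch sentences that slip through without a number. The helper below is hypothetical and deliberately crude: it only looks for digits, so it would miss a figure written out as a word such as "rep one".

```python
import re
from typing import List


def sentences_without_numbers(feedback: str) -> List[str]:
    """Return the sentences in the coaching text that contain no digit."""
    # Split on whitespace that follows sentence-ending punctuation, so that
    # decimals such as "0.41" are not treated as sentence boundaries.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", feedback.strip())]
    return [s for s in sentences if s and not re.search(r"\d", s)]


feedback = (
    "Your concentric velocity fell from 0.41 m/s on rep 1 to 0.22 m/s "
    "by rep 5, a 46% loss. Keep your back straight."
)
print(sentences_without_numbers(feedback))
```

Any sentence returned by the check is a candidate for regeneration or removal before the feedback reaches the athlete.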

The prompt is also personalised in three directions. Anthropometrics — if we know the user has a high femur-to-torso ratio, the system tells the model explicitly that forward lean in their squat is structural, not a technique flaw. A coach who doesn’t know that would flag the wrong thing. Experience level changes the vocabulary: beginners get one simple cue in plain language, advanced athletes get full biomechanical terminology. Session history allows the system to track recurring issues — if knee valgus has shown up in three of the last five sessions, the coaching opens with “Again today…” and escalates urgency accordingly.
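A rough sketch of how those three inputs could be turned into prompt context is shown below. Everything here is illustrative: the function names, the wording of the generated lines, and the representation of session history as one list of flag names per session are assumptions, and the "three of the last five" rule is treated as a configurable default.

```python
from collections import Counter
from typing import List


def recurring_issues(session_flags: List[List[str]], window: int = 5,
                     min_hits: int = 3) -> List[str]:
    """Flags that appear in at least min_hits of the last `window` sessions."""
    recent = session_flags[-window:]
    counts = Counter(flag for flags in recent for flag in set(flags))
    return sorted(flag for flag, n in counts.items() if n >= min_hits)


def personalisation_lines(experience_level: str, long_femurs: bool,
                          session_flags: List[List[str]]) -> List[str]:
    """Build the extra context lines appended to the coaching prompt."""
    lines: List[str] = []
    if long_femurs:
        lines.append("Athlete has a high femur-to-torso ratio: treat forward "
                     "lean in the squat as structural, not a technique flaw.")
    if experience_level == "beginner":
        lines.append("Give one simple cue in plain language.")
    elif experience_level == "advanced":
        lines.append("Use full biomechanical terminology.")
    for issue in recurring_issues(session_flags):
        lines.append(f"Recurring issue: {issue}. Open with 'Again today' "
                     "and escalate urgency.")
    return lines


history = [
    ["knee_valgus"], [], ["knee_valgus", "heel_rise"],
    ["heel_rise"], ["knee_valgus"],
]
for line in personalisation_lines("beginner", True, history):
    print(line)
```

In this history knee valgus appears in three of five sessions and is escalated, while heel rise appears in only two and is not.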

The trend questions feature — a second API call per session — generates two short coaching questions grounded in the session data. Not “how did you feel?” but “your velocity loss jumped from 12% last week to 31% today on the same weight — what changed in your sleep or food in the last 48 hours?”


04 — Why Forge

forge /fôrj/ · verb · To shape a metal object by heating it in a fire and beating or hammering it. To create by means of concentrated effort.

We went through a lot of names. Most of them were nouns — things you could hold or look at. “Velocity.” “Form.” “Kinetic.” They described features, not intent.

Forge is a verb first. It describes what the platform does to you — what every hard session is supposed to do. Metal doesn’t become useful by being stored. It becomes useful by being put under heat and worked. Repeatedly.

There’s also something honest about the forge as a metaphor for our engineering process. This system wasn’t designed in a whiteboard session and built cleanly from spec. It was hammered out. The Savitzky-Golay filter window that gives us clean velocity curves. The −8° pelvic tilt threshold for butt-wink that started at −5° and kept triggering on healthy movement patterns. The decision to trust the user’s rep count as the authoritative floor after the detection algorithm kept under-counting sets with uneven tempo. Every one of those decisions was forged, not designed.

And then there’s the obvious thing: every good set is a forge. The load, the effort, the time under tension — that’s the heat. What comes out on the other side is harder than what went in. We wanted a name that understood that. That took it seriously. That didn’t promise transformation with a stock photo and a tagline, but described, with one syllable, what was actually happening.


05 — What’s Next

The analysis quality is only as good as the pose detection underneath it.

We’re laying the groundwork for a research data programme. As the platform accumulates sessions, it builds something genuinely rare: a large-scale, rep-level dataset of real lifting with matched biomechanical measurements, injury flags, and velocity curves — collected outside a lab, on real athletes, under real fatigue. We plan to license anonymised versions of this data to institutional researchers, with the goals of validating and improving our own metrics and surfacing new ones. Better sticking-point models, more accurate failure-velocity thresholds, exercise-specific asymmetry norms — the kind of ground truth that currently doesn’t exist at scale. The research feeds back into the product, and every athlete using Forge contributes to a body of knowledge that makes the coaching smarter for everyone.

We’re also working on a pose-estimation model fine-tuned specifically on barbell movements. Generic human pose estimation gets confused by loaded positions that deviate from “natural” posture: a person bracing a heavy deadlift looks different from a walking human, and the model needs to know that.

Multi-angle support is the other major near-term priority. Right now the system assumes a sagittal-plane camera position. A second camera at the frontal plane would unlock more reliable knee valgus measurement, hip shift detection, and bar path tracking in three dimensions.

On the coaching side, we want to build longitudinal trend analysis across full training blocks — not just session-to-session comparisons, but the ability to ask: over the last eight weeks of this programme, where has quality consistently dropped, and is it correlating with any loading pattern? That’s the question a good strength coach asks every month. It should be answerable with data.

The name will still fit when we get there. The work of forging is never really finished — it just produces something better than what went in.


Forge Engineering Team · Platform & Product Blog · June 2025

Tags: VBT · Computer Vision · Anthropic API · Biomechanics

