Fielding Independent Hitting
Does applying DIPS theory to hitting work like it does for pitching?
Last week, we discussed how hitting for power is among the more predictable traits for batters. Along with strikeouts and walks, home runs complete the defense-independent “three true outcome” triangle, and as a result, the trifecta of hitter predictability. Meanwhile, base hits have to wait a good bit longer to “stabilize,” so we can’t be as confident in a hitter’s ability to reliably get singles, doubles, and triples at the rate they’ve shown this early in the season.
Some of this may sound familiar if you’re privy to modern pitching analytics. Voros McCracken—whose real name remains a mystery—spearheaded the sabermetric breakthrough known as DIPS theory (Defense Independent Pitching Statistics). This theory posits that pitchers have much less control over balls in play than initially suspected, and that removing such data from the equation can clarify a pitcher’s underlying talent level in smaller samples. FIP (Fielding Independent Pitching) is the most common iteration of DIPS nowadays. Its formula is:
FIP = (13×HR + 3×(BB+HBP) - 2×K)/IP + cFIP
Besides cFIP (which is just a constant that puts FIP on an ERA scale), every component of FIP is an outcome that’s essentially independent of what the defense behind a pitcher does. FanGraphs’ WAR (fWAR) uses FIP as its key component, and as a result is more predictive and reliable than Baseball Reference’s WAR (bWAR), which uses runs allowed. If you want an even more stable version of this idea, you can try xFIP (which normalizes HR to a standard rate), SIERA (which includes general batted ball data), or kwERA (which removes the unstable HR & HBP in favor of just K & BB).
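For anyone who wants to see the FIP formula in action, here is a minimal sketch in Python. The function name, the sample stat line, and the constant of 3.10 are all made up for illustration; the real cFIP is recalculated every season.

```python
def fip(hr, bb, hbp, k, ip, c_fip):
    """Fielding Independent Pitching, following the formula in the text.

    ip should be true decimal innings (e.g. 180 1/3 innings -> 180.333),
    not the box-score shorthand 180.1.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + c_fip


# Hypothetical season line and an illustrative constant (both assumptions).
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180.0, c_fip=3.10)
print(f"FIP: {example_fip:.2f}")
```

Notice that nothing in the inputs depends on what happened to a ball in play, which is the whole point of DIPS.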
My question then is: Can we apply the same DIPS concepts to hitters? (Should we?) Let’s find out.
High stabilization potential
As alluded to earlier, HRs and HBPs are not all that reliable for pitchers, with HRs taking ~1320 batters faced to stabilize and HBPs ~640. Given that Ks (~70) and BBs (~170) stabilize much quicker, I naturally favor kwERA or xFIP over FIP early in a season.
However, for batters, HRs and HBPs are much more reliable, at ~170 PA and ~240 PA, respectively. The former makes intuitive sense considering batters have much more control over the type of batted ball than pitchers do. Whereas pitchers will face a slew of batters with varying tendencies, batters just have their own set of tendencies. Add in the fact that home runs are defense-independent and you have the most reliable hit type for hitters. HBPs being more reliable for hitters than pitchers might make less sense on the surface, but it comes from the same idea: Pitchers face various batting stances over time, whereas batters just have their one stance, which determines how close they put their bodies to the danger of getting plunked.
So, if we were to make a “Fielding Independent Hitting” stat, it could pretty much mirror FIP in its components, with the added benefit of all of its parts stabilizing relatively quickly.
The one quirk is that hitting stats don’t function on an ERA scale, with lower numbers being better. Instead, they function on triple slash line scales, with higher numbers being better. Enter wOBA (weighted On-Base Average), the premier hitting rate stat. The formulation for wOBA is powered by linear weights, which is the concept of applying run expectancy averages to every relevant plate appearance outcome, stripping away context in order to more accurately assess talent level.
Let’s make a fielding-independent wOBA, then.
FIwOBA
Since wOBA is itself a linear combination of weighted outcomes, it’s only right we fit FIwOBA with a linear regression. Only this time, we’ll regress wOBA itself on our desired variables (HR, BB, HBP, K), each expressed as a rate. Basically, given ONLY those four outcomes, how well and to what extent do they explain a hitter’s overall production?
Doing so on all 360 qualified hitters from 2021-2025 yields the following coefficients:
HR%: 1.51
BB%: .36
HBP%: .39
K%: -.21
Those numbers may not look familiar, but what if we scaled them all up, maybe by some random number like 8.6?
HR%: 1.51 × 8.6 = 13.0 ≈ 13
BB%: .36 × 8.6 = 3.1 ≈ 3
HBP%: .39 × 8.6 = 3.4 ≈ 3
K%: -.21 × 8.6 = -1.8 ≈ -2
Oh look, they’re FIP coefficients.
Mind blown! As it turns out, the approximate run values of the three (four) true outcomes for pitchers carry over to hitters in nearly the same proportions. Both FIP and FIwOBA ignore anything fielders could interfere with (besides the occasional home run robbery), and in turn draw the same conclusions about the importance of each outcome. This makes total sense of course, but discovering this after running the regression gave me a unique feeling of vindication most stats nerds can probably relate to. What Tango, McCracken, and Dreslough found nearly thirty years ago for pitchers, we’re still finding today for hitters.
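The scaling trick is easy to check in a few lines of Python. One way to land on a factor near 8.6 (this is my reading, not necessarily how the number was picked) is to ask what multiplier turns the HR coefficient into exactly 13, then apply it to everything else. The variable names here are just for illustration.

```python
# Regression coefficients and FIP weights as quoted in the text.
fiwoba_coefs = {"HR%": 1.51, "BB%": 0.36, "HBP%": 0.39, "K%": -0.21}
fip_weights = {"HR%": 13, "BB%": 3, "HBP%": 3, "K%": -2}

# Multiplier that maps the HR coefficient onto FIP's HR weight.
scale = fip_weights["HR%"] / fiwoba_coefs["HR%"]

# Scale every coefficient and round to the nearest whole number.
scaled = {name: round(value * scale) for name, value in fiwoba_coefs.items()}

print(f"scale factor: {scale:.2f}")
print(scaled)
```

Rounded to whole numbers, the scaled coefficients line up with the FIP weights one for one.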
How well do these outcomes explain wOBA, though? The R-squared value for this regression was .6798, meaning about 68% of the variance in wOBA can be explained by HR, BB, HBP, and K. That’s… not great, but it’s not bad! We’re completely ignoring base hits, yet we can still explain over two-thirds of a hitter’s overall production anyway. For reference, the R-squared between ERA and FIP for all qualified pitchers during the same time period is about 70%. Pretty close!
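If you want to try this kind of regression yourself, here is a rough sketch using only the Python standard library. To be loud about it: the data below is SYNTHETIC. It is generated from the coefficients quoted above plus a made-up intercept and made-up noise, so it only demonstrates the mechanics of fitting and computing R-squared, not the actual 2021-2025 result. The helper names (`solve`, `ols`) are my own; in practice a library routine such as NumPy’s `numpy.linalg.lstsq` would do the same job.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares with an intercept.

    Returns (beta, r_squared), where beta[0] is the intercept and the
    remaining entries follow the column order of rows.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# SYNTHETIC hitters: rates drawn from plausible-looking ranges (assumptions).
random.seed(0)
TRUE_COEFS = (1.51, 0.36, 0.39, -0.21)  # HR%, BB%, HBP%, K% from the text
INTERCEPT = 0.26                        # made up for this sketch
rows, woba = [], []
for _ in range(360):
    rates = (
        random.uniform(0.01, 0.07),  # HR%
        random.uniform(0.04, 0.16),  # BB%
        random.uniform(0.00, 0.03),  # HBP%
        random.uniform(0.10, 0.33),  # K%
    )
    noise = random.gauss(0, 0.015)   # stand-in for everything on balls in play
    rows.append(rates)
    woba.append(INTERCEPT + sum(c * r for c, r in zip(TRUE_COEFS, rates)) + noise)

beta, r_squared = ols(rows, woba)
print("intercept, HR%, BB%, HBP%, K%:", [round(b, 2) for b in beta])
print("R-squared:", round(r_squared, 3))
```

The noise term plays the role of balls in play here: the bigger it gets, the less of wOBA the four outcomes can explain, and the lower R-squared falls.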
Now, would I recommend relying on FIwOBA as much as FIP, since they explain their target metrics similarly well? Probably not. The generalized nature of the batted balls pitchers face allows more trust to be placed in FIP, since deviations from it are more likely to be random in small samples. On the other hand, hitters’ profiles are individualized, so the direction in which FIwOBA is wrong will be much more obvious to us. The direction in which FIP is wrong will not be, so it’s safer.
In any case, let’s see how hitters rank in FIwOBA so far this season:

Aaron Judge is indeed the best hitter in the league, and early-season FIwOBA agrees. The stat naturally favors sluggers like Schwarber and Moniak, and doesn’t like Matt Olson as much since it’s oblivious to the fact that he’s been a doubles machine so far. If we sort by w-FIw (the difference between wOBA and FIwOBA), the biggest “ball in play” overachievers so far have been Riley Greene (.094), Troy Johnston (.085), and Otto Lopez (.080), whereas the biggest BIP underachievers have been Cedric Mullins (-.091), Alec Bohm (-.072), and Caleb Durbin (-.068). Differences in the 2021-2025 sample were not nearly that drastic.
FIwOBA is not a better indicator of how productive a hitter has been this season; that’s wOBA’s job. Where FIwOBA is designed to outclass wOBA is in stability. Since FIwOBA is built only from highly reliable PA outcomes, I would not expect to see nearly as much year-over-year or within-year variation in it as in wOBA, which necessarily carries the noisier base hits along with it. I have not done the math on this, but I am nearly certain it is true.
Sorting that list by w-FIw is particularly illuminating for another reason. Let me just list off the BABIPs (Batting Average on Balls In Play) of the ten highest overachievers in order: .327, .291, .395, .287, .394, .354, .364, .367, .281, .436. League average BABIP is .288.
Now, the ten worst underachievers’ BABIPs: .170, .197, .193, .210, .171, .192, .204, .208, .198, .231.
Stark. The correlation between w-FIw and BABIP sits at an intense r = .7. This indicates that a huge chunk of what FIwOBA is missing lies within BABIP. Regressing wOBA on FIwOBA and BABIP applies the finishing touch:
wOBA ≈ 1.14×FIwOBA + 0.6×BABIP - .221; R-squared = .9805
About 98% of the variance in wOBA can be explained by FIwOBA (primarily) and BABIP (secondarily). That is huge. The math implies the remaining 2% is accounted for by the type of base hit (single/double/triple) and intentional walks.
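To get a feel for what that regression says, here is a small Python sketch that plugs numbers into it. The function name and the sample hitter (a .320 FIwOBA) are hypothetical; only the three constants come from the formula above.

```python
def estimated_woba(fiwoba, babip):
    """wOBA estimate from the two-variable fit quoted in the text."""
    return 1.14 * fiwoba + 0.6 * babip - 0.221


# Hypothetical hitter with a .320 FIwOBA at a league-average .288 BABIP.
baseline = estimated_woba(0.320, 0.288)

# Same hitter running 50 points hotter on balls in play.
hot_streak = estimated_woba(0.320, 0.338)

print(f"baseline: {baseline:.3f}")
print(f"with +.050 BABIP: {hot_streak:.3f}")
```

Because the BABIP coefficient is 0.6, every 50 points of BABIP moves the estimate by about 30 points of wOBA, with the true outcomes held fixed.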
What to do about BABIP
When FIwOBA is wrong, it’s largely because of BABIP. FIwOBA and BABIP correlate negatively, whereas wOBA and BABIP correlate positively. The problem with BABIP is that it is quite slow to stabilize (~820 BIP). That is thankfully faster than it is for pitchers (~2000 BIP), but still substantially slow. Yes, hitters can control it, but not consistently in just 150-200 PA. And yet, what FIwOBA misses is so obviously found within it. What to do?
Now, you may be content with FIwOBA as is, sprinkling in BABIP to taste. The regression formula from a couple paragraphs ago will describe wOBA better than anything else I’ll present here, but will be more unstable and less predictive. If you’re interested in trying to capture some of BABIP while retaining predictiveness, read on.
While researching which metrics best predict BABIP, I stumbled across this helpful article from Pitcher List that digs deep into various hitting stats and how they correlate. The key insight I took from that piece was that the angle at which the ball is struck is more important to BABIP than how well it’s struck. Line drives are best for boosting BABIP, but are much less reliable (~600 BIP) than ground balls and fly balls (~80 BIP). Darn. For the purposes of keeping our stat predictive, LD% unfortunately does not belong. Whether the ball is pulled or pushed (“spray angle”) also impacts BABIP a lot, and those tendencies thankfully stabilize expeditiously.
That left me with wanting to include one variable for where hitters send balls on the y-axis (up or down) and one for where they send them on the x-axis (pull or push). Through some testing (and with confirmation from that article), FB% appeared more effective as a proxy for the y-axis component than GB%, and including both would be redundant, so I chose FB%. Similarly, I found Oppo% (push frequency) to fit slightly better than Pull% in that it produced a more defined direction in the regression.
The last part of this process involved separating FB% into outfield FB% (OFFB%) and infield FB% (IFFB%)—incorporating some z-axis. I decided to do this because the average BABIP of an IFFB is so much worse than that of an OFFB, and so delineating between them seemed appropriate. This is potentially controversial though as IFFB% is moderately less reliable in small samples than FB%. Theoretically, this separation slightly dampens predictiveness while slightly improving descriptiveness. At least IFFB% stabilizes before season’s end, unlike BABIP.
Including IFFB%, OFFB%, and Oppo% in the regression yielded the following coefficients, which have been multiplied by 5 for ease of viewing:
HR%: 10.0
BB%: 1.7
HBP%: 3.0
K%: -1.1
IFFB%: -1.0
OFFB%: -0.5
Oppo%: 0.6
Relationships between the original four outcomes remain relatively consistent, though the benefit of walks is dampened by the new inclusions. IFFBs were found to be basically as detrimental as Ks, which tracks with their near-zero BABIP and their inclusion as effectively strikeouts in ifFIP, the version of FIP used in fWAR.
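Since the two coefficient lists are shown on different scales, a quick way to compare them is to express every weight relative to the home run weight. This is a small Python sketch of that arithmetic using only the numbers quoted above; the variable and function names are mine.

```python
# Four-variable FIwOBA coefficients (unscaled, from earlier in the text).
fiwoba = {"HR%": 1.51, "BB%": 0.36, "HBP%": 0.39, "K%": -0.21}

# Seven-variable coefficients as displayed, i.e. already multiplied by 5.
bb_displayed = {
    "HR%": 10.0, "BB%": 1.7, "HBP%": 3.0, "K%": -1.1,
    "IFFB%": -1.0, "OFFB%": -0.5, "Oppo%": 0.6,
}

# Undo the "times 5" display scaling to recover the regression coefficients.
bb_raw = {name: value / 5 for name, value in bb_displayed.items()}


def relative_to_hr(coefs):
    """Express each coefficient as a fraction of the HR coefficient."""
    return {name: value / coefs["HR%"] for name, value in coefs.items()}


rel_fiwoba = relative_to_hr(fiwoba)
rel_bb = relative_to_hr(bb_raw)

for name in fiwoba:
    print(f"{name}: {rel_fiwoba[name]:+.2f} -> {rel_bb[name]:+.2f}")
print(f"IFFB%: {rel_bb['IFFB%']:+.2f}")
```

Put this way, a walk goes from roughly a quarter of a home run to closer to a sixth, while an infield fly sits right next to a strikeout.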
I’m calling this new metric bbFIwOBA, which is potentially the busiest acronym in sabermetrics history (take that, GBkwERA!). Whereas FIwOBA explained 68% of the variance in wOBA, bbFIwOBA explains 80%. That’s a huge improvement! (Using a single FB% term instead of the IFFB%/OFFB% split would have explained 79%, so the split itself adds very little.) And the best part is that predictiveness likely doesn’t take much of a hit in the transition to the batted ball version, though if I later find that it does I will pivot back to using just FB%.
Here’s how hitters rank in bbFIwOBA so far this season:

In terms of the difference between wOBA and bbFIwOBA (w-bbFIw), there’s still some correlation with BABIP due to early-season instability, but it’s subdued. In the five-year sample, wOBA and BABIP were positively correlated (r = .37), FIwOBA and BABIP were slightly negatively correlated (r = -.21), and bbFIwOBA and BABIP were virtually uncorrelated (r = .07). Personally, that’s exactly what I want to see. wOBA keeps the balls in play, noise and all; FIwOBA removes them entirely, missing potentially valuable information; and bbFIwOBA re-introduces the parts of them that are more reliable, balancing predictiveness with descriptiveness.
Another way of conveying this point is to correlate the differences with BABIP. In the five-year sample, BABIP explained 89.9% of the variance in w-FIw, but only 47.2% of the variance in w-bbFIw, meaning a hefty portion of BABIP is reliably expressed through FB% and Oppo%. Furthermore, the standard deviation of w-bbFIw (0.0116) is noticeably smaller than that of w-FIw (0.0147), indicating that bbFIwOBA tracks wOBA more closely than FIwOBA does.
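These last two paragraphs mix correlations (r) with shares of variance explained (R-squared), so here is a short Python sketch that puts them on the same footing. For a single predictor, R-squared is simply r squared. Every number comes from the text; the variable names are just for illustration.

```python
import math

# Correlations with BABIP in the five-year sample, as quoted above.
r_with_babip = {"wOBA": 0.37, "FIwOBA": -0.21, "bbFIwOBA": 0.07}

# Share of each metric's variance that lines up with BABIP (r squared).
variance_shared = {name: round(r ** 2, 3) for name, r in r_with_babip.items()}

# Share of each gap's variance explained by BABIP, converted back to |r|.
r2_of_gap = {"w-FIw": 0.899, "w-bbFIw": 0.472}
abs_r_of_gap = {name: round(math.sqrt(r2), 3) for name, r2 in r2_of_gap.items()}

# How much tighter the bbFIwOBA gap is than the FIwOBA gap.
sd_ratio = 0.0116 / 0.0147

print(variance_shared)
print(abs_r_of_gap)
print(f"w-bbFIw spread is {sd_ratio:.0%} of the w-FIw spread")
```

On the variance scale, BABIP shares well under 1% with bbFIwOBA, compared with about 14% for wOBA and about 4% for FIwOBA.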
What about xwOBA?
The elephant in the room right now is xwOBA. If we already have that all-in-one, defense-independent Statcast metric that’s meant to be more predictive and stable than wOBA, why bother with FIwOBA and bbFIwOBA?
The primary answer is that I wanted to try. In the age of Statcast where scores of metrics are powered by advanced physics and technology, gleaning valuable information from data that humans can feasibly track means something to me.
Another answer is that xwOBA isn’t crafted solely with prediction in mind. The exact formula is not public, but from what we do know, the exit velocity and launch angle of every batted ball are used to determine the BIP portion of it. While accurate, this leaves it susceptible to unsustainable batted balls. bbFIwOBA and especially FIwOBA are specifically tailored to ignore such outcomes for the sake of predictiveness.
We see this play out when we run correlations with xwOBA in the five-year sample. Among those 360 qualified hitters, xwOBA explained 65.6% of the variance in wOBA (hang on, didn’t FIwOBA explain 68%?). Also, for what it’s worth, xwOBA explained 68.6% of the variance in FIwOBA. The entree, though, is its relationship with bbFIwOBA, for which it explained 81.2% of the variance.
So… bbFIwOBA is better than xwOBA at explaining wOBA, AND it’s better than wOBA at staying true to xwOBA. It’s basically a less sophisticated SIERA.
Conclusion
I will definitely be returning to this concept in the near future, but I think I should stop for now. There has already been so much to analyze, and amazingly there’s much more still. Some avenues that have crossed my mind include replacing BBs and Ks with even more stable swing decision metrics, actually analyzing year-over-year and within-year predictiveness of FIwOBA and bbFIwOBA (which I simply didn’t have time for), and introducing sprint speed into the equation (though that’s a Statcast exclusive). I will explore at least some of these soon.
Overall though, I did not expect this defense-independent approach to work as well as it did for hitters, so I am pleasantly surprised. I hope amidst the sea of goofy acronyms that this was a fun and worthwhile read!
