| Title | 應用機器學習方法分析中華職棒投手之期望失壘表現 = Analysis of Pitchers' "Expected-Base-Loss" in the Chinese Professional Baseball League Using Machine Learning Methods |
|---|---|
| Authors | 黃海潮; 陳郁蕙; 王瓊霞; 黃致豪 |
| Journal | 臺灣體育運動管理學報 |
| Volume/Issue | Vol. 24, No. 2, December 2024 [ROC year 113.12] |
| Pages | pp. 251-288 |
| Classification no. | 528.955 |
| Keywords | 投手表現; 棒球; 中華職棒; 運動經濟學; Pitching performance; Baseball; CPBL; Sports economics |
| Language | Chinese |
| DOI | 10.6547/tassm.202412_24(2).0003 |
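| Chinese abstract (translated) | Purpose: Professional sports constantly look for data that measure player ability. Earned run average (ERA) has long been used to measure how well a pitcher suppresses the opposing offense, but traditional indicators of this kind do not reflect a pitcher's overall performance. This study therefore uses data that record the complete pitching process to compute a new advanced metric, "expected-base-loss," which measures a pitcher's ability more completely. Using machine learning algorithms, it analyzes the outcome of every pitch in each pitcher-batter matchup of the 2021 Chinese Professional Baseball League (CPBL) season to assess pitch quality and to find the best model for predicting a pitcher's expected-base-loss. The study also compares expected-base-loss with existing indicators (e.g., ERA, FIP, WHIP) and discusses their similarities and differences. In addition, it examines expected-base-loss together with player salaries to explore whether CPBL pitchers' salaries correspond to their on-field performance. Methods: The study uses data from the baseball recording program for the 32nd CPBL season, collected by National Taiwan University of Sport and covering 296 games. Linear regression, decision tree regression, and random forest regression models are used to build predictive models of pitch quality, and the best predictive model is selected from among them. The best model is then used to compute a pitch quality indicator for each pitcher (RMSE and MAE), and correlation analysis is used to compare it with existing traditional or advanced statistics and with salaries, giving clubs a reference for setting contracts. Results: Three common machine learning models were used (linear regression, regression tree, and random forest regression), and their RMSE and MAE were compared. Judged by either RMSE or MAE, random forest regression performed best, followed by the regression tree, with linear regression last. This shows that a relatively complex model such as random forest does predict better; however, given the interpretability and intuitiveness of linear regression, more complex models may be hard to apply in practice unless they offer a clearer advantage. Conclusion: The estimated expected-base-loss is clearly positively correlated with the traditionally used ERA and WHIP and with the somewhat more advanced FIP, and negatively correlated with ERA+. This shows that expected-base-loss can likewise measure a pitcher's on-field performance and supports comparison with other pitchers. The paper also indicates that younger players' salaries may be undervalued, and that clubs should take the rookie bonus into account in the future. Originality: 1. The pitch quality indicator obtained through machine learning algorithms better standardizes each pitcher's performance across different situations. 2. The algorithms reveal the key factors that most affect pitching performance in a game. 3. The algorithm results are used to discuss the relationship between pitcher ability and salary. |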
| English abstract | Purpose: Traditionally, the performance of a professional baseball pitcher is evaluated on the basis of earned run average (ERA); however, this index cannot capture the full ability of a pitcher. Therefore, we developed a new index called the “expected-base-loss” and compared its evaluative potential with that of some existing indices, such as ERA, ERA+, walks plus hits per inning pitched (WHIP), and fielding independent pitching (FIP), as well as with salary. These results can guide club managers in the Chinese Professional Baseball League (CPBL) in offering better contracts to baseball players. Methods: A total of 296 regular games from the 32nd season of the CPBL were analyzed to build a model of “expected-base-loss” that incorporated pitch speed, pitch type, pitch location, pitcher and batter handedness, and the current count. We compared the prediction performance of, and the variables involved in, the linear, decision tree, and random forest regression models. Finally, the random forest regression model was chosen to predict the “expected-base-loss.” Results: When a pitcher reaches a three-ball count, the “expected-base-loss” becomes a significant factor. Pitch location and pitch speed are significant factors in this situation as well; therefore, a pitcher should avoid throwing another ball. “Pitching command” is the most critical factor influencing pitching performance. The “expected-base-loss” predictions for pitchers in the CPBL revealed that foreign players performed better than local players, which explains the higher demand for foreign players. “Expected-base-loss” records all the pitching factors influencing each pitch; therefore, it can complement the other pitching indices well. Conclusion: We discovered that “expected-base-loss” was negatively correlated with the salaries of local pitchers. We also noted that young pitchers were underpaid relative to their performance. We suggest that teams in the CPBL cultivate young and new players to benefit from the “rookie bonus.” Originality/value: 1. The pitching performance in every moment of a game could be standardized using machine learning algorithms. 2. Critical variables influencing pitching performance were identified. 3. Algorithms were used to elucidate the relationship between pitchers’ ability and salaries. |
The Chinese and English abstracts in this system are taken from the content of each article as published.
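
The abstracts say that the three regression models were compared by RMSE and MAE. A minimal sketch of how such a comparison can be scored is given here, using only the Python standard library. The per-pitch values, the model labels, and the helper names `rmse` and `mae` are all hypothetical and chosen for illustration; none of the numbers come from the study, and this is not the authors' code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error; large misses are penalized more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error; every miss counts in proportion to its size."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical bases lost on five pitches (invented values, not study data).
actual = [0.0, 1.0, 0.0, 2.0, 0.0]

# Hypothetical predictions standing in for the three model families.
predictions = {
    "linear": [0.5, 0.5, 0.5, 0.5, 0.5],
    "tree": [0.0, 1.0, 1.0, 1.0, 0.0],
    "forest": [0.0, 1.0, 0.0, 1.0, 0.0],
}

# Score each model as an (RMSE, MAE) pair, then rank from lowest error to highest.
scores = {name: (rmse(actual, pred), mae(actual, pred)) for name, pred in predictions.items()}
ranking = sorted(scores, key=lambda name: scores[name])

for name in ranking:
    print(f"{name}: RMSE={scores[name][0]:.3f}, MAE={scores[name][1]:.3f}")
```

Lower values are better for both metrics. RMSE and MAE can disagree about the ordering when one model makes a few large errors, which is one reason to report both.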