Determining Rookie Years in Lahman Database
I'm using the MySQL version of the Lahman Baseball Databa开发者_JS百科se and I'm having trouble trying to determine the year a player lost their rookie standing. The rules for an MLB player losing rookie standing are:
A player shall be considered a rookie unless, during a previous season or seasons, he has (a) exceeded 130 at-bats or 50 innings pitched in the Major Leagues; or (b) accumulated more than 45 days on the active roster of a Major League club or clubs during the period of 25-player limit (excluding time in the military service and time on the disabled list).
Is there a query that can be run to do this for Batters and Pitchers, or is this something that would be programmatically done?
Using the Lahman Database you can figure out Rookies by At Bats (>130) and Innings Pitched (>50), however there isn't anything for service time during the 25 man roster (non-Sept) limit.
You would need retrosheets {http://www.retrosheet.org/game.htm} data to do that.
The queries below would give you ALL of the rookies by At Bats and Innings Pitched, however the service time rookies would be the exception. There's only a few of those as teams don't tend to keep rookies on the MLB roster and not play them. The lose development time (not playing) and accelerate their service time to lose out on controlled years. So if you're happy with that, these tables will do.
You can use this as a Xref table with batters or pitchers to highlight their rookie year. Or you could add an extra column to batters and pitchers with the RookieYr distinction (advise against it as if you want to add new seasons to your Lahman DB - less customizing needed).
/************************************ Create MLB Rookie Xref Table **********************************************
-- Sort Out Batters who accumulate 130 AB
-- Sort Out Pitchers who accumulate 50 IP
-- Define Rookie Year, Drop off years previous and years after
-- Can be updated Annually using "player ID not in (select distinct playerID from Xref_RookieYr)
-- Using the Sean Lahman Database
-- Authored By Paul DeVos {www.linkedin.com/in/devosp/}
*****************************************************************************************************************/
/****** Query uses T-SQL, Query ran in MS SQL 2012 - you may need to tweek for other platorms or versions. ******/
--Step 1 - Run this for hitter accumulated ABs and when Rookie Year (130 Career At Bats)
Select
concat(m.nameFirst, ' ', m.nameLast) as Name,
b.PlayerID,
b.yearID,
m.debut,
sum(b.ab) over (partition by b.playerID order by b.playerID, b.yearID) as CumulativeAB,
null as CumulativeIP, -- Place Holder for Rookie Pitchers Insert
case when sum(b.ab) over (partition by b.playerID order by b.playerID, b.yearID) >= 130 then b.yearID end as RookieYR
into #temp_rookie_year
from
[master] m
inner join Batting b
on m.playerID=b.playerID
-- Selects Position Players
where b.playerID not in (select distinct f.playerID from Fielding f where f.pos = 'P')
--Step 2 - Run this to get accumulated IP and Rookie Year (50 Career IP)
Insert into #temp_rookie_year
(
Name, PlayerID, YearID, Debut, CumulativeAB, CumulativeIP, RookieYR
)
Select
concat(m.nameFirst, ' ', m.nameLast) as Name,
p.PlayerID,
p.yearID,
m.debut,
null as CumulativeAB,
sum(p.IPouts) over (partition by p.playerID order by p.playerID, p.yearID) as CumulativeIP,
case when sum(p.IPouts) over (partition by p.playerID order by p.playerID, p.yearID) >= 150 then p.yearID end as RookieYR
from [master] m
inner join pitching p
on m.playerID=p.playerID
--Chooses Pitchers
where p.playerID in (select distinct f.playerID from Fielding f where f.pos = 'P')
--Step 3 Run this - sorts out the rookie year into Rookie Xref Table
select Name, PlayerID, min(RookieYr) as RookieYear
into #Xref_RookieYr
from #temp_rookie_year
--where name = 'Hank Aaron'
group by Name, PlayerID
order by RookieYear desc
--Step 4 - run IF you want to remove players who never lost rookie status (cup of cofee players, etc - anyone under 130 AB or 50 IP)
select * from #Xref_RookieYr
order by playerID
Delete from #Xref_RookieYr where RookieYear is null
select * from #Xref_RookieYr
order by playerID
/*****************************************************************************************************************
You can change drop the "#" in front of the table (and name it whatever you want) when you want a permanent table.
If you leave it, it'll drop off when you close the program. e.g. Xref_Rookie_2013
*****************************************************************************************************************/
This can be done in SQL. How it is done will be based upon what is the most optimal way of doing it. Most likely it could be done with one query like so (pseudo-code):
SELECT Master.*
FROM Master
LEFT JOIN Batting ON Master.player_id = Batting.player_id
LEFT JOIN Pitching ON Master.player_id = Pitching.player_id
WHERE Batting.AB > 130 OR Pitching.IPOuts > (50 x 3)
OR Master.DaysActive > 45
That last part of the WHERE statement is a bit iffy because I don't find anything like that in the data from your database provider. I see active games but that isn't the same thing. The Appearances table might get you close but that is about all you can do.
Here is the data I based my pseudo-code off of:
http://baseball1.com/files/database/readme58.txt
I did find another guy who was doing something similar to what you are doing (including calculating who is a rookie). Here is his site (with code):
http://baseballsimulator.com/blog/category/database/
精彩评论