***This article was originally published on April 30, 2019***

For this project, we wanted to analyze the NBA players who have won the Most Valuable Player award, or the MVP award. Generally, this award is given to the best basketball player for a particular season.

Members of the media vote at the end of the regular season to decide who wins the award. We wanted to get a better idea of which criteria and factors make a player an MVP, and what the voters consider to be "most valuable."

Data Collection

For our data source, we chose Sports Reference, specifically Basketball-Reference.com. The site has a wide range of NBA and basketball statistics, including team records, player stats, award winners, game results, and much more.

They had a page about the history of NBA MVP winners, and it was the perfect source to pull from to gather data and analyze.

The page featured player details such as age and voting numbers. It also showed accompanying stats such as points, rebounds, assists, blocks, steals, shooting percentage, and more.

We first had to crawl for the data. We used a similar crawler tool from a class lecture (INFO-I369 Performance Analytics class taught at Indiana University Bloomington), which also used Basketball-Reference.com.

With it, we were able to download the HTML page, which allowed us to go through the table and parse the data. The crawler downloads the page from the Basketball-Reference URL and saves it as an HTML file called "nba-mvp.html."

#Downloading page data and HTML

from urllib.request import Request, urlopen

#NBA MVP award winners
##download  https://www.basketball-reference.com/awards/mvp.html                                                                                                           
url = 'https://www.basketball-reference.com/awards/mvp.html'
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(req).read()

##save file to disk (urlopen().read() returns bytes, so write in binary mode)
with open('nba-mvp.html','wb') as f:
    f.write(html)

Along with the crawler, we had to find a way to download the URLs of all the NBA player pages from the list of MVP winners. We wanted to do this to be able to use player data and info to compare statistics.

To do this, we needed to go through the table and download the URLs from the column featuring the players' names. Each name is accompanied by a URL to the player's individual profile page on the Basketball-Reference website.

To start off, we had to read from the HTML file we downloaded earlier via the crawler. We then created an empty list to store the URLs of the players. We then went through the HTML and found all elements with "td" for the table.

From there, the next piece of code in the for loop is an if statement that looks only for the players with "NBA" in the table, since the table also features "ABA" MVP winners (a different basketball league at the time that eventually merged with the NBA).

After this, we were able to go through the player column and add its elements to the list. We then noticed that the list contained duplicates, since some players have won multiple MVP awards, so the next step was to remove them.
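As a minimal sketch of the duplicate-removal step, the snippet below keeps the first occurrence of each player while preserving table order. The list of player IDs is a small hypothetical sample built from IDs that appear in the download log, and `dedupe_keep_order` is an illustrative helper name, not part of the original script.

```python
# Hypothetical sample: player IDs in table order, with repeat winners.
player_ids = [
    "hardeja01", "westbru01", "curryst01", "curryst01",
    "duranke01", "jamesle01", "jamesle01",
]


def dedupe_keep_order(items):
    """Return items with duplicates removed, keeping first occurrences."""
    # dict preserves insertion order (Python 3.7+), so its keys act as an ordered set
    return list(dict.fromkeys(items))


unique_ids = dedupe_keep_order(player_ids)
print(unique_ids)
```

Keeping the order stable makes the later download log easier to compare against the MVP table.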

The code we used was similar to a lecture example that downloaded pages by building each URL from its structure inside a for loop, and which also used Basketball-Reference. However, the structure of player URLs made it a bit more difficult.

Each player URL combines the first letter of the player's last name with a separate player ID string (for example, "j/jamesle01"). Because of this, we ran "re.findall" once for each part and combined the results into a dictionary. This let us match each ID with its letter to rebuild the URL structure and download the player pages.
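The URL-building idea can be sketched with a single regular expression that captures the letter and the ID together, which keeps each pair aligned. This is an alternative illustration rather than the script used in the project; the sample links, the `pattern`, and the `build_player_urls` helper are assumptions made for the example.

```python
import re

# Sample anchor tags in the shape found in the player column
# (IDs taken from the download log; the markup itself is an assumption).
links = (
    '<a href="/players/h/hardeja01.html">James Harden</a>'
    '<a href="/players/j/johnsma02.html">Magic Johnson</a>'
)

# One pattern with two groups: the directory letter and the player ID.
pattern = re.compile(r'"/players/([a-z])/([a-z0-9]+)\.html"')


def build_player_urls(html_text, base="https://www.basketball-reference.com"):
    """Map each player ID to its full profile URL."""
    urls = {}
    for letter, player_id in pattern.findall(html_text):
        urls[player_id] = f"{base}/players/{letter}/{player_id}.html"
    return urls


player_page_urls = build_player_urls(links)
for player_id, url in player_page_urls.items():
    print(player_id, url)
```

Because the dictionary is keyed by player ID, repeat winners collapse to a single entry automatically.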

#Script to download pages of the NBA players and store into a folder

from urllib.request import Request, urlopen
import time
import random
import re
from bs4 import BeautifulSoup

#from file
filein = 'nba-mvp.html'
soup = BeautifulSoup(open(filein), 'lxml')

#empty list for urls
player_urls = []

#Searching through table
entries = soup.find_all('tr', attrs={'class' : ''})

for entry in entries:
    #table extracting from contained td elements
    columns = entry.find_all('td')
    #Searching in NBA MVP winner table
    if len (columns)>3 and columns[0].get_text() == 'NBA':
        #Searching through column of player names contained with links
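        player = columns[1]
        for p in player:
            #add players to list (each child of the cell is the player's <a> link)
            player_urls.append(str(p))

#Remove duplicates from list (some players won award more than once)
seen = set()
players_no_duplicate = []
for item in player_urls:
    if item not in seen:
        seen.add(item)
        players_no_duplicate.append(item)
new_players = players_no_duplicate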
str1 = ''.join(str(e) for e in new_players)

#HTML structure of player pages
mvp_players = re.findall('"/players/[a-z]/(.*?).html', str1)
mvp_players2 = re.findall('"/players/(.*?)/', str1)

#Combining lists into dictionary to make for loop work in order to write files
mvp_dict = dict(zip(mvp_players, mvp_players2))

#For loop saves HTML files of player's pages
for player, letter in mvp_dict.items():
    #Rest between requests
    tmp = random.random()*5.0
    print ('Sleep for ', tmp, ' seconds')
    time.sleep(tmp)

    #URL structure for players
    url = 'https://www.basketball-reference.com/players/'+ str(letter) + '/' + str(player) +'.html'
    print ('Download from :', url)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urlopen(req).read()

    fileout = 'NBA-MVPs/'+ (player) +'.html'
    print ('Save to : ', fileout, '\n')

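    #save file to disk (the NBA-MVPs folder must already exist)
    #urlopen().read() returns bytes, so write in binary mode
    with open(fileout,'wb') as f:
        f.write(html)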

Sleep for  3.4936360349325963  seconds
Download from : https://www.basketball-reference.com/players/a/abdulka01.html
Save to :  NBA-MVPs/abdulka01.html 

Sleep for  3.666410242661151  seconds
Download from : https://www.basketball-reference.com/players/c/cousybo01.html
Save to :  NBA-MVPs/cousybo01.html 

Sleep for  1.3745451959709787  seconds
Download from : https://www.basketball-reference.com/players/w/waltobi01.html
Save to :  NBA-MVPs/waltobi01.html 

Sleep for  3.995848551646914  seconds
Download from : https://www.basketball-reference.com/players/e/ervinju01.html
Save to :  NBA-MVPs/ervinju01.html 

Sleep for  3.8237503092385996  seconds
Download from : https://www.basketball-reference.com/players/r/reedwi01.html
Save to :  NBA-MVPs/reedwi01.html 

Sleep for  1.3029917173148626  seconds
Download from : https://www.basketball-reference.com/players/n/nowitdi01.html
Save to :  NBA-MVPs/nowitdi01.html 

Sleep for  3.283549197513305  seconds
Download from : https://www.basketball-reference.com/players/c/cowenda01.html
Save to :  NBA-MVPs/cowenda01.html 

Sleep for  0.7194799968133597  seconds
Download from : https://www.basketball-reference.com/players/i/iversal01.html
Save to :  NBA-MVPs/iversal01.html 

Sleep for  1.953568381269653  seconds
Download from : https://www.basketball-reference.com/players/r/rosede01.html
Save to :  NBA-MVPs/rosede01.html 

Sleep for  2.216694743687364  seconds
Download from : https://www.basketball-reference.com/players/b/birdla01.html
Save to :  NBA-MVPs/birdla01.html 

Sleep for  0.2951086570759498  seconds
Download from : https://www.basketball-reference.com/players/m/mcadobo01.html
Save to :  NBA-MVPs/mcadobo01.html 

Sleep for  2.0548438859024802  seconds
Download from : https://www.basketball-reference.com/players/c/curryst01.html
Save to :  NBA-MVPs/curryst01.html 

Sleep for  1.5588188715655682  seconds
Download from : https://www.basketball-reference.com/players/p/pettibo01.html
Save to :  NBA-MVPs/pettibo01.html 

Sleep for  4.47021686710008  seconds
Download from : https://www.basketball-reference.com/players/n/nashst01.html
Save to :  NBA-MVPs/nashst01.html 

Sleep for  0.3204173311522224  seconds
Download from : https://www.basketball-reference.com/players/d/duncati01.html
Save to :  NBA-MVPs/duncati01.html 

Sleep for  2.3266782946652045  seconds
Download from : https://www.basketball-reference.com/players/u/unselwe01.html
Save to :  NBA-MVPs/unselwe01.html 

Sleep for  3.984902952381843  seconds
Download from : https://www.basketball-reference.com/players/r/roberos01.html
Save to :  NBA-MVPs/roberos01.html 

Sleep for  0.4027238605708261  seconds
Download from : https://www.basketball-reference.com/players/r/russebi01.html
Save to :  NBA-MVPs/russebi01.html 

Sleep for  2.9228123790731297  seconds
Download from : https://www.basketball-reference.com/players/d/duranke01.html
Save to :  NBA-MVPs/duranke01.html 

Sleep for  4.130703470454606  seconds
Download from : https://www.basketball-reference.com/players/j/johnsma02.html
Save to :  NBA-MVPs/johnsma02.html 

Sleep for  0.32157602146052067  seconds
Download from : https://www.basketball-reference.com/players/m/malonmo01.html
Save to :  NBA-MVPs/malonmo01.html 

Sleep for  3.9674057026172194  seconds
Download from : https://www.basketball-reference.com/players/o/olajuha01.html
Save to :  NBA-MVPs/olajuha01.html 

Sleep for  2.342974653582497  seconds
Download from : https://www.basketball-reference.com/players/w/westbru01.html
Save to :  NBA-MVPs/westbru01.html 

Sleep for  0.11401640754318487  seconds
Download from : https://www.basketball-reference.com/players/j/jamesle01.html
Save to :  NBA-MVPs/jamesle01.html 

Sleep for  2.3864208976883443  seconds
Download from : https://www.basketball-reference.com/players/g/garneke01.html
Save to :  NBA-MVPs/garneke01.html 

Sleep for  1.4518620893393896  seconds
Download from : https://www.basketball-reference.com/players/h/hardeja01.html
Save to :  NBA-MVPs/hardeja01.html 

Sleep for  2.665840647859949  seconds
Download from : https://www.basketball-reference.com/players/m/malonka01.html
Save to :  NBA-MVPs/malonka01.html 

Sleep for  1.3516385494845196  seconds
Download from : https://www.basketball-reference.com/players/b/barklch01.html
Save to :  NBA-MVPs/barklch01.html 

Sleep for  2.632243605961218  seconds
Download from : https://www.basketball-reference.com/players/c/chambwi01.html
Save to :  NBA-MVPs/chambwi01.html 

Sleep for  1.3766649178931374  seconds
Download from : https://www.basketball-reference.com/players/r/robinda01.html
Save to :  NBA-MVPs/robinda01.html 

Sleep for  2.5778407697494288  seconds
Download from : https://www.basketball-reference.com/players/j/jordami01.html
Save to :  NBA-MVPs/jordami01.html 

Sleep for  2.2448373568042723  seconds
Download from : https://www.basketball-reference.com/players/b/bryanko01.html
Save to :  NBA-MVPs/bryanko01.html 

Sleep for  3.6874784692124765  seconds
Download from : https://www.basketball-reference.com/players/o/onealsh01.html
Save to :  NBA-MVPs/onealsh01.html 

The next step was parsing through the data. We used a similar example from class in going through the table, along with its columns and rows.

Using the beginning of the code from the example right above, we were able to go through the table and parse data. Along with this, we were able to recreate the table to analyze and download the data of the MVP players' statistics.



from bs4 import BeautifulSoup

#from file
filein = 'nba-mvp.html'
soup = BeautifulSoup(open(filein), 'lxml')

entries = soup.find_all('tr', attrs={'class' : ''})

for entry in entries:
    #print entry
    #table extracting from contained th and td elements
    seasons = entry.find_all('th')
    columns = entry.find_all('td')
    if len (columns)>3 and columns[0].get_text() == 'NBA':

        #Year MVP won
        year = seasons[0].get_text()
        #Team player was on
        team = columns[4].get_text()

        #NBA player's name
        player = columns[1].get_text()
        #Player age
        age = columns[3].get_text()
        #Games played
        games = columns[5].get_text()
        #Minutes per game
        minutes = columns[6].get_text()
        #Points per game
        points = columns[7].get_text()
        #Rebounds per game
        rebounds = columns[8].get_text()
        #assists per game
        assists = columns[9].get_text()
        #steals per game
        steals = columns[10].get_text()
        #blocks per game
        blocks = columns[11].get_text()
        #field goal percentage
        field_goal = columns[12].get_text()
        #3-point percentage
        three_point = columns[13].get_text()
        #Free throw percentage
        free_throw = columns[14].get_text()
        #win shares
        win_shares = columns[15].get_text()
        #win shares/48
        ws48 = columns[16].get_text()

        #table of winners and data
        tt = ''+year+'|:|'+team+'|:|'+player+'|:|'+age+'|:|'+games+'|:|'+minutes+'|:|'+points+'|:|'+rebounds+'|:|'+assists+'|:|'+steals+'|:|'+blocks+'|:|'+field_goal+'|:|'+three_point+'|:|'+free_throw+'|:|'+win_shares+'|:|'+ws48
        print (tt)

2017-18|:|HOU|:|James Harden|:|28|:|72|:|35.4|:|30.4|:|5.4|:|8.8|:|1.8|:|0.7|:|.449|:|.367|:|.858|:|15.4|:|.289
2016-17|:|OKC|:|Russell Westbrook|:|28|:|81|:|34.6|:|31.6|:|10.7|:|10.4|:|1.6|:|0.4|:|.425|:|.343|:|.845|:|13.1|:|.224
2015-16|:|GSW|:|Stephen Curry|:|27|:|79|:|34.2|:|30.1|:|5.4|:|6.7|:|2.1|:|0.2|:|.504|:|.454|:|.908|:|17.9|:|.318
2014-15|:|GSW|:|Stephen Curry|:|26|:|80|:|32.7|:|23.8|:|4.3|:|7.7|:|2.0|:|0.2|:|.487|:|.443|:|.914|:|15.7|:|.288
2013-14|:|OKC|:|Kevin Durant|:|25|:|81|:|38.5|:|32.0|:|7.4|:|5.5|:|1.3|:|0.7|:|.503|:|.391|:|.873|:|19.2|:|.295
2012-13|:|MIA|:|LeBron James|:|28|:|76|:|37.9|:|26.8|:|8.0|:|7.3|:|1.7|:|0.9|:|.565|:|.406|:|.753|:|19.3|:|.322
2011-12|:|MIA|:|LeBron James|:|27|:|62|:|37.5|:|27.1|:|7.9|:|6.2|:|1.9|:|0.8|:|.531|:|.362|:|.771|:|14.5|:|.298
2010-11|:|CHI|:|Derrick Rose|:|22|:|81|:|37.4|:|25.0|:|4.1|:|7.7|:|1.0|:|0.6|:|.445|:|.332|:|.858|:|13.1|:|.208
2009-10|:|CLE|:|LeBron James|:|25|:|76|:|39.0|:|29.7|:|7.3|:|8.6|:|1.6|:|1.0|:|.503|:|.333|:|.767|:|18.5|:|.299
2008-09|:|CLE|:|LeBron James|:|24|:|81|:|37.7|:|28.4|:|7.6|:|7.2|:|1.7|:|1.1|:|.489|:|.344|:|.780|:|20.3|:|.318
2007-08|:|LAL|:|Kobe Bryant|:|29|:|82|:|38.9|:|28.3|:|6.3|:|5.4|:|1.8|:|0.5|:|.459|:|.361|:|.840|:|13.8|:|.208
2006-07|:|DAL|:|Dirk Nowitzki|:|28|:|78|:|36.2|:|24.6|:|8.9|:|3.4|:|0.7|:|0.8|:|.502|:|.416|:|.904|:|16.3|:|.278
2005-06|:|PHO|:|Steve Nash|:|31|:|79|:|35.4|:|18.8|:|4.2|:|10.5|:|0.8|:|0.2|:|.512|:|.439|:|.921|:|12.4|:|.212
2004-05|:|PHO|:|Steve Nash|:|30|:|75|:|34.3|:|15.5|:|3.3|:|11.5|:|1.0|:|0.1|:|.502|:|.431|:|.887|:|10.9|:|.203
2003-04|:|MIN|:|Kevin Garnett|:|27|:|82|:|39.4|:|24.2|:|13.9|:|5.0|:|1.5|:|2.2|:|.499|:|.256|:|.791|:|18.3|:|.272
2002-03|:|SAS|:|Tim Duncan|:|26|:|81|:|39.3|:|23.3|:|12.9|:|3.9|:|0.7|:|2.9|:|.513|:|.273|:|.710|:|16.5|:|.248
2001-02|:|SAS|:|Tim Duncan|:|25|:|82|:|40.6|:|25.5|:|12.7|:|3.7|:|0.7|:|2.5|:|.508|:|.100|:|.799|:|17.8|:|.257
2000-01|:|PHI|:|Allen Iverson|:|25|:|71|:|42.0|:|31.1|:|3.8|:|4.6|:|2.5|:|0.3|:|.420|:|.320|:|.814|:|11.8|:|.190
1999-00|:|LAL|:|Shaquille O'Neal|:|27|:|79|:|40.0|:|29.7|:|13.6|:|3.8|:|0.5|:|3.0|:|.574|:|.000|:|.524|:|18.6|:|.283
1998-99|:|UTA|:|Karl Malone|:|35|:|49|:|37.4|:|23.8|:|9.4|:|4.1|:|1.3|:|0.6|:|.493|:|.000|:|.788|:|9.6|:|.252
1997-98|:|CHI|:|Michael Jordan|:|34|:|82|:|38.8|:|28.7|:|5.8|:|3.5|:|1.7|:|0.5|:|.465|:|.238|:|.784|:|15.8|:|.238
1996-97|:|UTA|:|Karl Malone|:|33|:|82|:|36.6|:|27.4|:|9.9|:|4.5|:|1.4|:|0.6|:|.550|:|.000|:|.755|:|16.7|:|.268
1995-96|:|CHI|:|Michael Jordan|:|32|:|82|:|37.7|:|30.4|:|6.6|:|4.3|:|2.2|:|0.5|:|.495|:|.427|:|.834|:|20.4|:|.317
1994-95|:|SAS|:|David Robinson|:|29|:|81|:|38.0|:|27.6|:|10.8|:|2.9|:|1.7|:|3.2|:|.530|:|.300|:|.774|:|17.5|:|.273
1993-94|:|HOU|:|Hakeem Olajuwon|:|31|:|80|:|41.0|:|27.3|:|11.9|:|3.6|:|1.6|:|3.7|:|.528|:|.421|:|.716|:|14.3|:|.210
1992-93|:|PHO|:|Charles Barkley|:|29|:|76|:|37.6|:|25.6|:|12.2|:|5.1|:|1.6|:|1.0|:|.520|:|.305|:|.765|:|14.4|:|.242
1991-92|:|CHI|:|Michael Jordan|:|28|:|80|:|38.8|:|30.1|:|6.4|:|6.1|:|2.3|:|0.9|:|.519|:|.270|:|.832|:|17.7|:|.274
1990-91|:|CHI|:|Michael Jordan|:|27|:|82|:|37.0|:|31.5|:|6.0|:|5.5|:|2.7|:|1.0|:|.539|:|.312|:|.851|:|20.3|:|.321
1989-90|:|LAL|:|Magic Johnson|:|30|:|79|:|37.2|:|22.3|:|6.6|:|11.5|:|1.7|:|0.4|:|.480|:|.384|:|.890|:|16.5|:|.270
1988-89|:|LAL|:|Magic Johnson|:|29|:|77|:|37.5|:|22.5|:|7.9|:|12.8|:|1.8|:|0.3|:|.509|:|.314|:|.911|:|16.1|:|.267
1987-88|:|CHI|:|Michael Jordan|:|24|:|82|:|40.4|:|35.0|:|5.5|:|5.9|:|3.2|:|1.6|:|.535|:|.132|:|.841|:|21.2|:|.308
1986-87|:|LAL|:|Magic Johnson|:|27|:|80|:|36.3|:|23.9|:|6.3|:|12.2|:|1.7|:|0.5|:|.522|:|.205|:|.848|:|15.9|:|.263
1985-86|:|BOS|:|Larry Bird|:|29|:|82|:|38.0|:|25.8|:|9.8|:|6.8|:|2.0|:|0.6|:|.496|:|.423|:|.896|:|15.8|:|.244
1984-85|:|BOS|:|Larry Bird|:|28|:|80|:|39.5|:|28.7|:|10.5|:|6.6|:|1.6|:|1.2|:|.522|:|.427|:|.882|:|15.7|:|.238
1983-84|:|BOS|:|Larry Bird|:|27|:|79|:|38.3|:|24.2|:|10.1|:|6.6|:|1.8|:|0.9|:|.492|:|.247|:|.888|:|13.6|:|.215
1982-83|:|PHI|:|Moses Malone|:|27|:|78|:|37.5|:|24.5|:|15.3|:|1.3|:|1.1|:|2.0|:|.501|:|.000|:|.761|:|15.1|:|.248
1981-82|:|HOU|:|Moses Malone|:|26|:|81|:|42.0|:|31.1|:|14.7|:|1.8|:|0.9|:|1.5|:|.519|:|.000|:|.762|:|15.4|:|.218
1980-81|:|PHI|:|Julius Erving|:|30|:|82|:|35.0|:|24.6|:|8.0|:|4.4|:|2.1|:|1.8|:|.521|:|.222|:|.787|:|13.8|:|.231
1979-80|:|LAL|:|Kareem Abdul-Jabbar|:|32|:|82|:|38.3|:|24.8|:|10.8|:|4.5|:|1.0|:|3.4|:|.604|:|.000|:|.765|:|14.8|:|.227
1978-79|:|HOU|:|Moses Malone|:|23|:|82|:|41.3|:|24.8|:|17.6|:|1.8|:|1.0|:|1.5|:|.540|:||:|.739|:|14.1|:|.200
1977-78|:|POR|:|Bill Walton|:|25|:|58|:|33.3|:|18.9|:|13.2|:|5.0|:|1.0|:|2.5|:|.522|:||:|.720|:|8.4|:|.209
1976-77|:|LAL|:|Kareem Abdul-Jabbar|:|29|:|82|:|36.8|:|26.2|:|13.3|:|3.9|:|1.2|:|3.2|:|.579|:||:|.701|:|17.8|:|.283
1975-76|:|LAL|:|Kareem Abdul-Jabbar|:|28|:|82|:|41.2|:|27.7|:|16.9|:|5.0|:|1.5|:|4.1|:|.529|:||:|.703|:|17.0|:|.242
1974-75|:|BUF|:|Bob McAdoo|:|23|:|82|:|43.2|:|34.5|:|14.1|:|2.2|:|1.1|:|2.1|:|.512|:||:|.805|:|17.8|:|.242
1973-74|:|MIL|:|Kareem Abdul-Jabbar|:|26|:|81|:|43.8|:|27.0|:|14.5|:|4.8|:|1.4|:|3.5|:|.539|:||:|.702|:|18.4|:|.250
1972-73|:|BOS|:|Dave Cowens|:|24|:|82|:|41.8|:|20.5|:|16.2|:|4.1|:||:||:|.452|:||:|.779|:|12.0|:|.168
1971-72|:|MIL|:|Kareem Abdul-Jabbar|:|24|:|81|:|44.2|:|34.8|:|16.6|:|4.6|:||:||:|.574|:||:|.689|:|25.4|:|.340
1970-71|:|MIL|:|Kareem Abdul-Jabbar|:|23|:|82|:|40.1|:|31.7|:|16.0|:|3.3|:||:||:|.577|:||:|.690|:|22.3|:|.326
1969-70|:|NYK|:|Willis Reed|:|27|:|81|:|38.1|:|21.7|:|13.9|:|2.0|:||:||:|.507|:||:|.756|:|14.6|:|.227
1968-69|:|BAL|:|Wes Unseld|:|22|:|82|:|36.2|:|13.8|:|18.2|:|2.6|:||:||:|.476|:||:|.605|:|10.8|:|.175
1967-68|:|PHI|:|Wilt Chamberlain|:|31|:|82|:|46.8|:|24.3|:|23.8|:|8.6|:||:||:|.595|:||:|.380|:|20.4|:|.255
1966-67|:|PHI|:|Wilt Chamberlain|:|30|:|81|:|45.5|:|24.1|:|24.2|:|7.8|:||:||:|.683|:||:|.441|:|21.9|:|.285
1965-66|:|PHI|:|Wilt Chamberlain|:|29|:|79|:|47.3|:|33.5|:|24.6|:|5.2|:||:||:|.540|:||:|.513|:|21.4|:|.275
1964-65|:|BOS|:|Bill Russell|:|30|:|78|:|44.4|:|14.1|:|24.1|:|5.3|:||:||:|.438|:||:|.573|:|16.9|:|.234
1963-64|:|CIN|:|Oscar Robertson|:|25|:|79|:|45.1|:|31.4|:|9.9|:|11.0|:||:||:|.483|:||:|.853|:|20.6|:|.278
1962-63|:|BOS|:|Bill Russell|:|28|:|78|:|44.9|:|16.8|:|23.6|:|4.5|:||:||:|.432|:||:|.555|:|13.5|:|.185
1961-62|:|BOS|:|Bill Russell|:|27|:|76|:|45.2|:|18.9|:|23.6|:|4.5|:||:||:|.457|:||:|.595|:|15.5|:|.217
1960-61|:|BOS|:|Bill Russell|:|26|:|78|:|44.3|:|16.9|:|23.9|:|3.4|:||:||:|.426|:||:|.550|:|13.0|:|.181
1959-60|:|PHW|:|Wilt Chamberlain|:|23|:|72|:|46.4|:|37.6|:|27.0|:|2.3|:||:||:|.461|:||:|.582|:|17.0|:|.245
1958-59|:|STL|:|Bob Pettit|:|26|:|72|:|39.9|:|29.2|:|16.4|:|3.1|:||:||:|.438|:||:|.759|:|14.8|:|.246
1957-58|:|BOS|:|Bill Russell|:|23|:|69|:|38.3|:|16.6|:|22.7|:|2.9|:||:||:|.442|:||:|.519|:|11.3|:|.206
1956-57|:|BOS|:|Bob Cousy|:|28|:|64|:|36.9|:|20.6|:|4.8|:|7.5|:||:||:|.378|:||:|.821|:|8.8|:|.178
1955-56|:|STL|:|Bob Pettit|:|23|:|72|:|38.8|:|25.7|:|16.2|:|2.6|:||:||:|.429|:||:|.736|:|13.8|:|.236
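Each printed record can be turned back into named fields for later analysis. The sketch below is one possible way to do that, assuming the "|:|" separator and field order used in the parsing code; `FIELDS`, `TEXT_FIELDS`, and `parse_record` are illustrative names introduced here, and empty fields (stats that were not recorded in earlier eras) are mapped to `None`.

```python
# Field order follows the order used when the record string was built.
FIELDS = [
    "year", "team", "player", "age", "games", "minutes", "points",
    "rebounds", "assists", "steals", "blocks", "field_goal",
    "three_point", "free_throw", "win_shares", "ws48",
]
TEXT_FIELDS = {"year", "team", "player"}


def parse_record(line, sep="|:|"):
    """Split one output line into a dict, converting numeric fields to float."""
    values = line.strip().split(sep)
    record = {}
    for name, value in zip(FIELDS, values):
        if name in TEXT_FIELDS:
            record[name] = value
        else:
            # An empty string means the stat was not recorded for that season
            record[name] = float(value) if value else None
    return record


sample = ("1972-73|:|BOS|:|Dave Cowens|:|24|:|82|:|41.8|:|20.5|:|16.2|:|4.1"
          "|:||:||:|.452|:||:|.779|:|12.0|:|.168")
cowens = parse_record(sample)
print(cowens["player"], cowens["games"], cowens["steals"])
```

Handling the blanks explicitly avoids a `ValueError` from `float('')` on seasons before steals, blocks, and three-pointers were tracked.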

Data Analysis and Visualizations

After parsing and collecting data, we were now able to analyze and visualize it. We started off by creating two scatterplots to compare and analyze data about the MVP players.

We wanted to see whether any two factors correlate with regard to the players and their performance, and whether they have an impact on winning the award.

To create the first scatter plot, we imported matplotlib. From there, we created a function that reads data from a file and returns it. The data files were produced from the statistics parsed in the previous code.

After this, we were able to obtain numbers such as alpha, beta, and correlation coefficient. We were then able to input the variables into the code used from the lecture for the scatterplot to produce the visualizations for analysis.
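For readers unfamiliar with these quantities, here is a minimal sketch of how alpha (the intercept), beta (the slope), and the correlation coefficient of a least-squares line can be computed by hand. It is not the project's code, which relies on `scipy.stats.linregress`; the function name `simple_linear_regression` and the toy data are assumptions for illustration.

```python
import math


def simple_linear_regression(xs, ys):
    """Return (alpha, beta, r): intercept, slope, and Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return alpha, beta, r


# Toy data lying exactly on the line y = 1 + 2x
alpha, beta, r = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print('alpha :', alpha, 'beta :', beta, 'correlation coefficient :', r)
```

A correlation coefficient near 0 means the fitted line explains very little of the variation, while values near 1 or -1 indicate a strong linear trend.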

import matplotlib.pyplot as plt
%matplotlib inline 
import random
import numpy as np
from scipy import stats

def read_data (filein):
    year = []
    stat = []
    world_record = 0.0
    with open(filein) as f:
        for line in f:
            if not line.startswith("#"):
                if int(line.strip().split(' ')[0]) and int(line.strip().split(' ')[0]):
                    data = line.strip()
                    data_col = data.split(' ')
                    year.append(int(data_col[0]))
                    stat.append(float(data_col[1]))
            else:
                world_record = float(line.strip()[1:])
    return year, stat

Scatter Plot 1: Player's Age vs Games Played

#Games played in by player that season
filein = 'Data/nba_gp.dat'
year_gp, stat_gp = read_data (filein)

#Ages of NBA players during MVP season
filein = 'Data/nba_age.dat'
year_age, stat_age = read_data (filein)

#Games played
print ('Results for Games Played:')
slope_m, intercept_m, r_value_m, p_value_m, std_err_m = stats.linregress(year_gp, stat_gp)
print ('alpha : ', intercept_m)
print ('beta : ', slope_m)
print ('correlation coefficient : ', r_value_m)

#Age of player
print ('Results for Player Ages:')
slope_w, intercept_w, r_value_w, p_value_w, std_err_w = stats.linregress(year_age, stat_age)
print ('alpha : ', intercept_w)
print ('beta : ', slope_w)
print ('correlation coefficient : ', r_value_w)

Results for Games Played:
alpha : 43.869223630312334
beta : 0.017137096774193554
correlation coefficient : 0.05025295352158669
Results for Player Ages:
alpha : -35.58696556579622
beta : 0.031634024577572965
correlation coefficient : 0.19713802712711728

This first scatter plot looks at the games played by the MVP winners alongside how old they were. Based on the scatterplot, some of the older MVP winners played fewer games.

However, there were some younger players who missed as many games or more. Another factor that can affect games played is the length of the season, as a couple of NBA seasons had lockouts that led to shorter seasons and fewer games played.

Both correlation coefficients are small (about 0.05 for games played and 0.20 for age, each fitted against the season year), so the trends are too weak to suggest a strong relationship, let alone a cause.
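The fits above regress each statistic on the season year. To compare age and games played directly, one option is to compute the Pearson correlation between the two series. The sketch below does this for the ten most recent seasons in the parsed table (2008-09 through 2017-18); the `pearson_r` helper is an illustrative name, and a ten-season sample is far too small to draw firm conclusions from.

```python
import math

# Ages and games played for the 2017-18 through 2008-09 MVP seasons,
# copied from the parsed table output.
ages = [28, 28, 27, 26, 25, 28, 27, 22, 25, 24]
games = [72, 81, 79, 80, 81, 76, 62, 81, 76, 81]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r_age_games = pearson_r(ages, games)
print('correlation between age and games played :', r_age_games)
```

Running the same calculation on the full "stat_age" and "stat_gp" lists would give the direct age-versus-games figure for every MVP season.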

plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})
plt.rcParams['xtick.major.pad'] = 8
plt.rcParams['ytick.major.pad'] = 8

#Games played
plt.plot(year_gp,stat_gp, marker='o', color ='blue', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2050- 1950 / 10000.0))
best_fit_y = intercept_m + slope_m * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='blue', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

plt.plot(year_age,stat_age, marker='s', color ='red', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2052- 1950 / 10000.0))
best_fit_y = intercept_w + slope_w * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='red', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

plt.ylabel('Age / Games Played')


Scatter Plot 2: Points Per Game vs Field Goal Percentage

#Points per game by NBA player in MVP season
filein = 'Data/nba_pts.dat'
year_pts, stat_pts = read_data (filein)

# Field goal percentage of NBA player during MVP season
filein = 'Data/nba_fg.dat'
year_fg, stat_fg = read_data (filein)

#Points
print ('Results for Points Per Game:')
slope_m, intercept_m, r_value_m, p_value_m, std_err_m = stats.linregress(year_pts, stat_pts)
print ('alpha : ', intercept_m)
print ('beta : ', slope_m)
print ('correlation coefficient : ', r_value_m)

#Field goals
print ('Results for Field Goal Percentage:')
slope_w, intercept_w, r_value_w, p_value_w, std_err_w = stats.linregress(year_fg, stat_fg)
print ('alpha : ', intercept_w)
print ('beta : ', slope_w)
print ('correlation coefficient : ', r_value_w)

Results for Points Per Game:
alpha : -101.50359383000512
beta : 0.06416090629800307
correlation coefficient : 0.22211254723148693
Results for Field Goal Percentage:
alpha : 20.190929019457297
beta : 0.015192972350230389
correlation coefficient : 0.0538293563809513

This second scatter plot looks at how many points a player averaged during his MVP season alongside the field goal percentage he had that season.

Field goal percentage shows how accurate a player is as a scorer. Seeing how field goal percentage relates to points per game can give us a better idea of how NBA players are picked as MVPs, as these stats play a role in winning the award.

It also shows how efficient a player is relative to how many points he scores per game. One player could score many points per game with a poor field goal percentage, and another could do the opposite.

The fitted lines show that the average MVP winner scores in the mid-to-high 20s in points per game while hovering around a 50% field goal percentage. This gives us an idea of what averages it can take to win MVP in the future.

plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})
plt.rcParams['xtick.major.pad'] = 8
plt.rcParams['ytick.major.pad'] = 8

plt.plot(year_pts,stat_pts, marker='o', color ='blue', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2050- 1950 / 10000.0))
best_fit_y = intercept_m + slope_m * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='blue', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

#Field goal
plt.plot(year_fg,stat_fg, marker='s', color ='red', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2052- 1950 / 10000.0))
best_fit_y = intercept_w + slope_w * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='red', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

#(Points per game / field goal percentage)
plt.ylabel('PPG / FG%')



For this part, we reused the earlier code that reads the data files, so that the same data can feed the code that creates the distribution histograms.

Using the age and games played stats stored in the "stat_age" and "stat_gp" variables lets us look at distributions and draw inferences about the NBA MVP player stats.

Along with this, we added a function that measures the probability distribution, along with its average value and variance. We also wrote code that produces the histograms and distribution visualizations.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline  

##Measure probability distribution. computes also average
## value and standard deviation
def measure_probability_distribution (outcomes):
    average_value = 0.0
    variance = 0.0
    pdf = {}
    norm = 0.0
    ##count number of observations
    for x in outcomes:
        if x not in pdf:
            pdf[x] = 0.0
        pdf[x] += 1.0
        norm += 1.0
        average_value += x
        variance += x*x
    average_value /= norm
    variance /= norm
    variance = variance - average_value * average_value
    ##normalize pdf
    for x in pdf:
        pdf[x] /= norm
    return pdf, average_value, variance
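As a quick sanity check on this kind of calculation, the mean and variance derived from an empirical distribution can be compared against Python's `statistics` module. The sketch below uses a compact `Counter`-based variant rather than the function above; `empirical_pmf` is an illustrative name, and the sample is the ages from the ten most recent seasons in the parsed table.

```python
from collections import Counter
import statistics


def empirical_pmf(outcomes):
    """Map each observed value to its relative frequency."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {value: count / total for value, count in counts.items()}


# Ages for the 2017-18 through 2008-09 MVP seasons, from the parsed table
ages = [28, 28, 27, 26, 25, 28, 27, 22, 25, 24]

pmf = empirical_pmf(ages)
# Mean and variance computed from the distribution: E[x] and E[x^2] - E[x]^2
mean_from_pmf = sum(value * p for value, p in pmf.items())
var_from_pmf = sum(value * value * p for value, p in pmf.items()) - mean_from_pmf ** 2

print('mean :', mean_from_pmf, 'vs', statistics.fmean(ages))
print('variance :', var_from_pmf, 'vs', statistics.pvariance(ages))
```

The comparison uses `pvariance` (population variance) because the formula divides by the number of observations rather than by one less than that number.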

Distribution 1: NBA Players' Ages

Our interpretation of this is that most NBA players who have won the MVP award have done so around their age 27-28 season. With this, we can expect star players around this age to be the most likely to win the award.

The average age is around 27 years old. This makes sense as NBA players are believed to be in their prime around this age.

from scipy.stats import poisson

pdf, av, var = measure_probability_distribution (stat_age)

##visualize histogram

plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})
plt.rcParams['xtick.major.pad'] = 8
plt.rcParams['ytick.major.pad'] = 8

title = '$\\langle x \\rangle = ' + '% .2f' % av + ' \\quad \\sigma^2 = ' + '% .2f' % var + '$'
plt.title(title, fontsize = 20)

plt.xlabel('NBA players ages')
plt.ylabel('probability distribution')

##construct two lists for  visualization
x = []
Px = []
for q in pdf:
    x.append(q)
    Px.append(pdf[q])

plt.bar(x, Px, color = 'red', align='center', alpha=0.5)

plt.plot(x, poisson.pmf(x, av), linestyle='-', linewidth=0.0,color='k', label='poisson pmf')
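The plotting code overlays a Poisson probability mass function whose mean is the measured average. As a rough sketch of what that curve represents, the formula P(k) = exp(-mu) * mu^k / k! can be written with the standard library alone. The `poisson_pmf` helper is an illustrative name, and `mu = 27.0` is an assumption based on the average age of about 27 mentioned in the text.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu."""
    # Work in log space so large k does not overflow the factorial term
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


mu = 27.0  # assumed value, roughly the average MVP age reported in the text
for age in range(22, 36):
    print(age, round(poisson_pmf(age, mu), 4))
```

A Poisson distribution has a variance equal to its mean, so comparing the measured variance in the plot title with the mean gives a quick indication of how well this model fits the data.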


Distribution 2: Games Played 

Our interpretation of this distribution graph is that players are expected to play the majority of the season if they want to be in contention for the NBA MVP award.

There are some outliers (likely due to lockout seasons, when the NBA regular season had only 50 or 66 games), but the majority of the MVP winners have played more than 70 games.

The average is around 78 games played out of the 82 in a full season, so most MVP winners have played nearly every game. With this, playing the majority, if not all, of the season is an important factor in winning.
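One way to account for the lockout seasons is to look at the share of the schedule a player appeared in rather than the raw game count. The sketch below uses games played from the parsed table and the 50-game and 66-game lockout schedules mentioned above; the `share_of_season` helper and the choice of three sample seasons are assumptions for illustration.

```python
# (season, games played by the MVP, games in that regular season)
seasons = [
    ("1998-99", 49, 50),  # lockout season, Karl Malone
    ("2011-12", 62, 66),  # lockout season, LeBron James
    ("2017-18", 72, 82),  # full season, James Harden
]


def share_of_season(played, scheduled):
    """Fraction of the regular season the player appeared in."""
    return played / scheduled


shares = {season: share_of_season(played, scheduled)
          for season, played, scheduled in seasons}

for season, share in shares.items():
    print(season, round(share, 3))
```

Measured this way, the two lockout-season winners look less like outliers, since both appeared in well over 90 percent of their teams' games.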

from scipy.stats import poisson

pdf, av, var = measure_probability_distribution (stat_gp)

##visualize histogram

plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})
plt.rcParams['xtick.major.pad'] = 8
plt.rcParams['ytick.major.pad'] = 8

title = '$\\langle x \\rangle = ' + '% .2f' % av + ' \\quad \\sigma^2 = ' + '% .2f' % var + '$'
plt.title(title, fontsize = 20)

plt.xlabel('Games Played')
plt.ylabel('probability distribution')

##construct two lists for  visualization
x = []
Px = []
for q in pdf:
    x.append(q)
    Px.append(pdf[q])

plt.bar(x, Px, color = 'red', align='center', alpha=0.5)

plt.plot(x, poisson.pmf(x, av), linestyle='-', linewidth=0.0,color='k', label='poisson pmf')



We can conclude from the analysis and visualizations that certain factors play a role in NBA players winning the MVP award, and in what voters consider when picking the winner.

Many of the players average over 20 points per game at around 50 percent shooting. Along with this, the players are healthy and durable, as they play the majority of the regular season. Many of the winners played every game or missed only a few.

Along with this, expect most players to be around 27 or 28 years old, or what is considered their "prime years," when they win the award.

Another factor that we can explore in future research is how many games a player's team won in the season he won the award. We can find the distribution of team wins and the average number of wins for an MVP's team.

We can also use more of the player stats we downloaded from the players' pages. We could compare other attributes such as weight, height, and more.

Overall, we can expect that future NBA MVP winners are players in their prime years averaging over 20 points a game at around 50 percent shooting, who also end up playing the majority or all of the season's games.

In addition to this, it's likely the player's team had a significant number of victories as well. These criteria can help predict who wins this year's NBA MVP award, which likely comes down to the Milwaukee Bucks' Giannis Antetokounmpo and the Houston Rockets' James Harden.


Giannis ended up winning the MVP award over Harden. Let's see how their stats compare and how it correlates with our results.
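
Rk | Player | Season | Age | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | eFG% | FT | FTA | FT% | ORB | DRB | TRB
1 | Giannis Antetokounmpo | 2018-19 | 24 | 72 | 72 | 32.8 | 10.0 | 17.3 | .578 | 0.7 | 2.8 | .256 | 9.3 | 14.5 | .641 | .599 | 6.9 | 9.5 | .729 | 2.2 | 10.3 | 12.
2 | James Harden | 2018-19 | 29 | 78 | 78 | 36.8 | 10.8 | 24.5 | .442 | 4.8 | 13.2 | .368 | 6.0 | 11.3 | .528 | .541 | 9.7 | 11.0 | .879 | 0.
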
Giannis and Harden are both outliers in average age. Giannis is 24 while Harden is 29, which is outside the 27-28 average. However, you can argue Harden is in his prime currently, while Giannis may have yet to reach his prime (which is scary).

Giannis shot nearly 58% from the field while averaging about 28 points per game, compared to Harden's 44% shooting on 36 points per game. Harden averaged many more points per game, but Giannis was way more efficient.

This falls in line with most of the MVP winners averaging over 20 points per game with 50% or better shooting, like Giannis. Harden did exceed the points part, but was below-average in shooting percentage compared to the MVP winners.

Harden, however, did beat out Giannis in games played. Harden fell right at the average of 78, while Giannis was slightly below it at 72. What helped Giannis is that he averaged more rebounds, came close to Harden in assists, and was more efficient.

We'll see how these stats and trends compare for future NBA MVP award winners and voting. 
