-->

For this project, we wanted to analyze the NBA players who have won the Most Valuable Player award, or the MVP award. Generally, this award is given to the best basketball player for a particular season.

Members of the media vote at the end of the regular season to decide who wins the award. We wanted a better idea of which criteria and factors make a player an MVP, and of what the voters consider "most valuable."

### Data Collection

For our data source, we chose Sports Reference. Specifically, we chose Basketball-Reference.com. They have all the statistics regarding the NBA and basketball, from team records, player stats, award winners, game results, and much more.

They had a page about the history of NBA MVP winners, and it was the perfect source to pull from to gather data and analyze.

The page featured info such as player info, age, and voting numbers. It also showed accompanying stats such as points, rebounds, assists, blocks, steals, shooting percentage, and more.

We first had to crawl for the data. We used a similar crawler tool from a class lecture (INFO-I369 Performance Analytics class taught at Indiana University Bloomington), which also used Basketball-Reference.com.

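#Downloading page data and HTML

from urllib.request import Request, urlopen

#NBA MVP award winners
#(the URL and request lines were missing from the original listing; this address is an assumption)
url = 'https://www.basketball-reference.com/awards/mvp.html'
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(req).read()

##save file to disk
with open('nba-mvp.html', 'w', encoding='utf-8') as f:
    f.write(html.decode('utf-8'))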


Along with the crawler, we had to find a way to download the URLs of all the NBA player pages from the list of MVP winners. We wanted to do this to be able to use player data and info to compare statistics.

To do this, we needed to go through the table and collect the URLs from the column of players' names. Each name is accompanied by a URL to the player's individual profile page on the Basketball-Reference website.

To start off, we read the HTML file we downloaded earlier with the crawler and created an empty list to store the players' URLs. We then went through the HTML, found the table rows ("tr" elements), and pulled the "td" cells out of each row.

From there, the next piece of code in the for loop is an if statement that looks only for the players with "NBA" in the table, since the table also features "ABA" MVP winners (a different basketball league at the time that eventually merged with the NBA).

After this, we could go through the player column and add its elements to the list. We then noticed duplicates in the list, since some players have won multiple MVP awards, so the next step was to remove them.
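The de-duplication step can be sketched on its own. This is a minimal illustration on hand-written link strings rather than the scraped data; `dedupe_keep_order` is a hypothetical helper name, and it relies on `dict` preserving insertion order (Python 3.7+).

```python
def dedupe_keep_order(items):
    """Return items with repeats removed, keeping first-seen order."""
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


# Sample links: repeat winners show up once per award in the table
links = [
    "/players/c/curryst01.html",
    "/players/c/curryst01.html",
    "/players/d/duranke01.html",
    "/players/j/jamesle01.html",
    "/players/j/jamesle01.html",
]
unique_links = dedupe_keep_order(links)
print(unique_links)
```

Keeping the original order is not required for downloading, but it makes the result easier to check against the table by eye.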

The code we used was similar to a lecture example that downloaded pages by building each URL from its structure inside a for loop, and which also used Basketball-Reference. However, the structure of the player URLs made this a bit more difficult.

Each player URL contains a one-letter directory followed by a player ID string (for example, "j" and "jamesle01"). We therefore ran "re.findall" once for each part and combined the two result lists into a dictionary. With this, we can match each player ID with its letter to rebuild the URL and download the player pages.
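As a rough sketch of the URL-building idea, the letter and the player ID can also be captured together with a single pattern, which keeps each pair aligned without zipping two lists. The sample HTML is hand-written (not the real page markup), and `PLAYER_HREF` and `player_page_urls` are hypothetical names introduced for illustration.

```python
import re

BASE_URL = "https://www.basketball-reference.com"

# Captures the letter directory and the player ID from an href such as
# "/players/j/jamesle01.html"
PLAYER_HREF = re.compile(r'href="/players/([a-z])/([a-z0-9]+)\.html"')


def player_page_urls(html_text):
    """Map each player ID found in html_text to its full profile URL."""
    urls = {}
    for letter, player_id in PLAYER_HREF.findall(html_text):
        # Repeated IDs simply overwrite the same key, so duplicates vanish
        urls[player_id] = f"{BASE_URL}/players/{letter}/{player_id}.html"
    return urls


# Hand-written sample cells, not the real page markup
sample = (
    '<td><a href="/players/j/jamesle01.html">LeBron James</a></td>'
    '<td><a href="/players/n/nashst01.html">Steve Nash</a></td>'
    '<td><a href="/players/j/jamesle01.html">LeBron James</a></td>'
)
urls = player_page_urls(sample)
print(urls)
```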

#Script to download pages of the NBA players and store into a folder

from urllib.request import Request, urlopen
import time
import random
import re
from bs4 import BeautifulSoup

#from file
filein = 'nba-mvp.html'
soup = BeautifulSoup(open(filein), 'lxml')

#empty list for urls
player_urls = []

#Searching through table
entries = soup.find_all('tr', attrs={'class' : ''})

for entry in entries:

    #table extracting from contained td elements
    columns = entry.find_all('td')

    #Searching in NBA MVP winner table
    if len(columns) > 3 and columns[0].get_text() == 'NBA':

        #Searching through column of player names contained with links
        player = columns[1]
        for p in player:
            #store the link markup as text so it can be compared and joined
            player_urls.append(str(p))

#Remove duplicates from list (some players won award more than once)
seen = set()
players_no_duplicate = []
for item in player_urls:
    if item not in seen:
        #remember the link so later repeats are skipped
        seen.add(item)
        players_no_duplicate.append(item)

new_players = players_no_duplicate
str1 = ''.join(str(e) for e in new_players)

#HTML structure of player pages: player ID, then the one-letter directory
mvp_players = re.findall(r'"/players/[a-z]/(.*?)\.html', str1)
mvp_players2 = re.findall(r'"/players/(.*?)/', str1)

#Combining lists into dictionary to make for loop work in order to write files
mvp_dict = dict(zip(mvp_players, mvp_players2))

#For loop saves HTML files of player's pages (the NBA-MVPs folder must already exist)
for player, letter in mvp_dict.items():

    #Rest between requests
    tmp = random.random()*5.0
    print ('Sleep for ', tmp, ' seconds')
    time.sleep(tmp)

    #URL structure for players
    url = 'https://www.basketball-reference.com/players/'+ str(letter) + '/' + str(player) +'.html'

    #Request the page (these two lines were missing from the original listing)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urlopen(req).read()

    fileout = 'NBA-MVPs/'+ (player) +'.html'
    print ('Save to : ', fileout, '\n')

    #save file to disk
    f = open(fileout,'w')
    f.write(html.decode('utf-8'))
    f.close()


Sleep for  3.4936360349325963  seconds
Save to :  NBA-MVPs/abdulka01.html

Sleep for  3.666410242661151  seconds
Save to :  NBA-MVPs/cousybo01.html

Sleep for  1.3745451959709787  seconds
Save to :  NBA-MVPs/waltobi01.html

Sleep for  3.995848551646914  seconds
Save to :  NBA-MVPs/ervinju01.html

Sleep for  3.8237503092385996  seconds
Save to :  NBA-MVPs/reedwi01.html

Sleep for  1.3029917173148626  seconds
Save to :  NBA-MVPs/nowitdi01.html

Sleep for  3.283549197513305  seconds
Save to :  NBA-MVPs/cowenda01.html

Sleep for  0.7194799968133597  seconds
Save to :  NBA-MVPs/iversal01.html

Sleep for  1.953568381269653  seconds
Save to :  NBA-MVPs/rosede01.html

Sleep for  2.216694743687364  seconds
Save to :  NBA-MVPs/birdla01.html

Sleep for  0.2951086570759498  seconds

Sleep for  2.0548438859024802  seconds
Save to :  NBA-MVPs/curryst01.html

Sleep for  1.5588188715655682  seconds
Save to :  NBA-MVPs/pettibo01.html

Sleep for  4.47021686710008  seconds
Save to :  NBA-MVPs/nashst01.html

Sleep for  0.3204173311522224  seconds
Save to :  NBA-MVPs/duncati01.html

Sleep for  2.3266782946652045  seconds
Save to :  NBA-MVPs/unselwe01.html

Sleep for  3.984902952381843  seconds
Save to :  NBA-MVPs/roberos01.html

Sleep for  0.4027238605708261  seconds
Save to :  NBA-MVPs/russebi01.html

Sleep for  2.9228123790731297  seconds
Save to :  NBA-MVPs/duranke01.html

Sleep for  4.130703470454606  seconds
Save to :  NBA-MVPs/johnsma02.html

Sleep for  0.32157602146052067  seconds
Save to :  NBA-MVPs/malonmo01.html

Sleep for  3.9674057026172194  seconds
Save to :  NBA-MVPs/olajuha01.html

Sleep for  2.342974653582497  seconds
Save to :  NBA-MVPs/westbru01.html

Sleep for  0.11401640754318487  seconds
Save to :  NBA-MVPs/jamesle01.html

Sleep for  2.3864208976883443  seconds
Save to :  NBA-MVPs/garneke01.html

Sleep for  1.4518620893393896  seconds
Save to :  NBA-MVPs/hardeja01.html

Sleep for  2.665840647859949  seconds
Save to :  NBA-MVPs/malonka01.html

Sleep for  1.3516385494845196  seconds
Save to :  NBA-MVPs/barklch01.html

Sleep for  2.632243605961218  seconds
Save to :  NBA-MVPs/chambwi01.html

Sleep for  1.3766649178931374  seconds
Save to :  NBA-MVPs/robinda01.html

Sleep for  2.5778407697494288  seconds
Save to :  NBA-MVPs/jordami01.html

Sleep for  2.2448373568042723  seconds
Save to :  NBA-MVPs/bryanko01.html

Sleep for  3.6874784692124765  seconds
Save to :  NBA-MVPs/onealsh01.html


The next step was parsing the data. We followed a similar example from class for going through the table's rows and columns.

Starting from the same opening lines as the script above, we went through the table, parsed each field, and recreated the table so we could analyze and save the MVP players' statistics.

#TASK A

#TABLE PARSING DATA FOR NBA MVP AWARD WINNERS

from bs4 import BeautifulSoup

#from file
filein = 'nba-mvp.html'
soup = BeautifulSoup(open(filein), 'lxml')

entries = soup.find_all('tr', attrs={'class' : ''})

for entry in entries:
    #table extracting from contained th and td elements
    seasons = entry.find_all('th')
    columns = entry.find_all('td')

    if len(columns) > 3 and columns[0].get_text() == 'NBA':

        #Year MVP won
        year = seasons[0].get_text()

        #Team player was on
        team = columns[4].get_text()

        #NBA player's name
        player = columns[1].get_text()

        #Player age
        age = columns[3].get_text()

        #Games played
        games = columns[5].get_text()

        #Minutes per game
        minutes = columns[6].get_text()

        #Points per game
        points = columns[7].get_text()

        #Rebounds per game
        rebounds = columns[8].get_text()

        #Assists per game
        assists = columns[9].get_text()

        #Steals per game
        steals = columns[10].get_text()

        #Blocks per game
        blocks = columns[11].get_text()

        #Field goal percentage
        field_goal = columns[12].get_text()

        #3-point percentage
        three_point = columns[13].get_text()

        #Free throw percentage
        free_throw = columns[14].get_text()

        #Win shares
        win_shares = columns[15].get_text()

        #Win shares/48
        ws48 = columns[16].get_text()

        #table of winners and data, one '|:|'-separated row per season
        fields = [year, team, player, age, games, minutes, points, rebounds, assists,
                  steals, blocks, field_goal, three_point, free_throw, win_shares, ws48]
        tt = '|:|'.join(fields)
        print (tt)

2017-18|:|HOU|:|James Harden|:|28|:|72|:|35.4|:|30.4|:|5.4|:|8.8|:|1.8|:|0.7|:|.449|:|.367|:|.858|:|15.4|:|.289
2016-17|:|OKC|:|Russell Westbrook|:|28|:|81|:|34.6|:|31.6|:|10.7|:|10.4|:|1.6|:|0.4|:|.425|:|.343|:|.845|:|13.1|:|.224
2015-16|:|GSW|:|Stephen Curry|:|27|:|79|:|34.2|:|30.1|:|5.4|:|6.7|:|2.1|:|0.2|:|.504|:|.454|:|.908|:|17.9|:|.318
2014-15|:|GSW|:|Stephen Curry|:|26|:|80|:|32.7|:|23.8|:|4.3|:|7.7|:|2.0|:|0.2|:|.487|:|.443|:|.914|:|15.7|:|.288
2013-14|:|OKC|:|Kevin Durant|:|25|:|81|:|38.5|:|32.0|:|7.4|:|5.5|:|1.3|:|0.7|:|.503|:|.391|:|.873|:|19.2|:|.295
2012-13|:|MIA|:|LeBron James|:|28|:|76|:|37.9|:|26.8|:|8.0|:|7.3|:|1.7|:|0.9|:|.565|:|.406|:|.753|:|19.3|:|.322
2011-12|:|MIA|:|LeBron James|:|27|:|62|:|37.5|:|27.1|:|7.9|:|6.2|:|1.9|:|0.8|:|.531|:|.362|:|.771|:|14.5|:|.298
2010-11|:|CHI|:|Derrick Rose|:|22|:|81|:|37.4|:|25.0|:|4.1|:|7.7|:|1.0|:|0.6|:|.445|:|.332|:|.858|:|13.1|:|.208
2009-10|:|CLE|:|LeBron James|:|25|:|76|:|39.0|:|29.7|:|7.3|:|8.6|:|1.6|:|1.0|:|.503|:|.333|:|.767|:|18.5|:|.299
2008-09|:|CLE|:|LeBron James|:|24|:|81|:|37.7|:|28.4|:|7.6|:|7.2|:|1.7|:|1.1|:|.489|:|.344|:|.780|:|20.3|:|.318
2007-08|:|LAL|:|Kobe Bryant|:|29|:|82|:|38.9|:|28.3|:|6.3|:|5.4|:|1.8|:|0.5|:|.459|:|.361|:|.840|:|13.8|:|.208
2006-07|:|DAL|:|Dirk Nowitzki|:|28|:|78|:|36.2|:|24.6|:|8.9|:|3.4|:|0.7|:|0.8|:|.502|:|.416|:|.904|:|16.3|:|.278
2005-06|:|PHO|:|Steve Nash|:|31|:|79|:|35.4|:|18.8|:|4.2|:|10.5|:|0.8|:|0.2|:|.512|:|.439|:|.921|:|12.4|:|.212
2004-05|:|PHO|:|Steve Nash|:|30|:|75|:|34.3|:|15.5|:|3.3|:|11.5|:|1.0|:|0.1|:|.502|:|.431|:|.887|:|10.9|:|.203
2003-04|:|MIN|:|Kevin Garnett|:|27|:|82|:|39.4|:|24.2|:|13.9|:|5.0|:|1.5|:|2.2|:|.499|:|.256|:|.791|:|18.3|:|.272
2002-03|:|SAS|:|Tim Duncan|:|26|:|81|:|39.3|:|23.3|:|12.9|:|3.9|:|0.7|:|2.9|:|.513|:|.273|:|.710|:|16.5|:|.248
2001-02|:|SAS|:|Tim Duncan|:|25|:|82|:|40.6|:|25.5|:|12.7|:|3.7|:|0.7|:|2.5|:|.508|:|.100|:|.799|:|17.8|:|.257
2000-01|:|PHI|:|Allen Iverson|:|25|:|71|:|42.0|:|31.1|:|3.8|:|4.6|:|2.5|:|0.3|:|.420|:|.320|:|.814|:|11.8|:|.190
1999-00|:|LAL|:|Shaquille O'Neal|:|27|:|79|:|40.0|:|29.7|:|13.6|:|3.8|:|0.5|:|3.0|:|.574|:|.000|:|.524|:|18.6|:|.283
1998-99|:|UTA|:|Karl Malone|:|35|:|49|:|37.4|:|23.8|:|9.4|:|4.1|:|1.3|:|0.6|:|.493|:|.000|:|.788|:|9.6|:|.252
1997-98|:|CHI|:|Michael Jordan|:|34|:|82|:|38.8|:|28.7|:|5.8|:|3.5|:|1.7|:|0.5|:|.465|:|.238|:|.784|:|15.8|:|.238
1996-97|:|UTA|:|Karl Malone|:|33|:|82|:|36.6|:|27.4|:|9.9|:|4.5|:|1.4|:|0.6|:|.550|:|.000|:|.755|:|16.7|:|.268
1995-96|:|CHI|:|Michael Jordan|:|32|:|82|:|37.7|:|30.4|:|6.6|:|4.3|:|2.2|:|0.5|:|.495|:|.427|:|.834|:|20.4|:|.317
1994-95|:|SAS|:|David Robinson|:|29|:|81|:|38.0|:|27.6|:|10.8|:|2.9|:|1.7|:|3.2|:|.530|:|.300|:|.774|:|17.5|:|.273
1993-94|:|HOU|:|Hakeem Olajuwon|:|31|:|80|:|41.0|:|27.3|:|11.9|:|3.6|:|1.6|:|3.7|:|.528|:|.421|:|.716|:|14.3|:|.210
1992-93|:|PHO|:|Charles Barkley|:|29|:|76|:|37.6|:|25.6|:|12.2|:|5.1|:|1.6|:|1.0|:|.520|:|.305|:|.765|:|14.4|:|.242
1991-92|:|CHI|:|Michael Jordan|:|28|:|80|:|38.8|:|30.1|:|6.4|:|6.1|:|2.3|:|0.9|:|.519|:|.270|:|.832|:|17.7|:|.274
1990-91|:|CHI|:|Michael Jordan|:|27|:|82|:|37.0|:|31.5|:|6.0|:|5.5|:|2.7|:|1.0|:|.539|:|.312|:|.851|:|20.3|:|.321
1989-90|:|LAL|:|Magic Johnson|:|30|:|79|:|37.2|:|22.3|:|6.6|:|11.5|:|1.7|:|0.4|:|.480|:|.384|:|.890|:|16.5|:|.270
1988-89|:|LAL|:|Magic Johnson|:|29|:|77|:|37.5|:|22.5|:|7.9|:|12.8|:|1.8|:|0.3|:|.509|:|.314|:|.911|:|16.1|:|.267
1987-88|:|CHI|:|Michael Jordan|:|24|:|82|:|40.4|:|35.0|:|5.5|:|5.9|:|3.2|:|1.6|:|.535|:|.132|:|.841|:|21.2|:|.308
1986-87|:|LAL|:|Magic Johnson|:|27|:|80|:|36.3|:|23.9|:|6.3|:|12.2|:|1.7|:|0.5|:|.522|:|.205|:|.848|:|15.9|:|.263
1985-86|:|BOS|:|Larry Bird|:|29|:|82|:|38.0|:|25.8|:|9.8|:|6.8|:|2.0|:|0.6|:|.496|:|.423|:|.896|:|15.8|:|.244
1984-85|:|BOS|:|Larry Bird|:|28|:|80|:|39.5|:|28.7|:|10.5|:|6.6|:|1.6|:|1.2|:|.522|:|.427|:|.882|:|15.7|:|.238
1983-84|:|BOS|:|Larry Bird|:|27|:|79|:|38.3|:|24.2|:|10.1|:|6.6|:|1.8|:|0.9|:|.492|:|.247|:|.888|:|13.6|:|.215
1982-83|:|PHI|:|Moses Malone|:|27|:|78|:|37.5|:|24.5|:|15.3|:|1.3|:|1.1|:|2.0|:|.501|:|.000|:|.761|:|15.1|:|.248
1981-82|:|HOU|:|Moses Malone|:|26|:|81|:|42.0|:|31.1|:|14.7|:|1.8|:|0.9|:|1.5|:|.519|:|.000|:|.762|:|15.4|:|.218
1980-81|:|PHI|:|Julius Erving|:|30|:|82|:|35.0|:|24.6|:|8.0|:|4.4|:|2.1|:|1.8|:|.521|:|.222|:|.787|:|13.8|:|.231
1979-80|:|LAL|:|Kareem Abdul-Jabbar|:|32|:|82|:|38.3|:|24.8|:|10.8|:|4.5|:|1.0|:|3.4|:|.604|:|.000|:|.765|:|14.8|:|.227
1978-79|:|HOU|:|Moses Malone|:|23|:|82|:|41.3|:|24.8|:|17.6|:|1.8|:|1.0|:|1.5|:|.540|:||:|.739|:|14.1|:|.200
1977-78|:|POR|:|Bill Walton|:|25|:|58|:|33.3|:|18.9|:|13.2|:|5.0|:|1.0|:|2.5|:|.522|:||:|.720|:|8.4|:|.209
1976-77|:|LAL|:|Kareem Abdul-Jabbar|:|29|:|82|:|36.8|:|26.2|:|13.3|:|3.9|:|1.2|:|3.2|:|.579|:||:|.701|:|17.8|:|.283
1975-76|:|LAL|:|Kareem Abdul-Jabbar|:|28|:|82|:|41.2|:|27.7|:|16.9|:|5.0|:|1.5|:|4.1|:|.529|:||:|.703|:|17.0|:|.242
1973-74|:|MIL|:|Kareem Abdul-Jabbar|:|26|:|81|:|43.8|:|27.0|:|14.5|:|4.8|:|1.4|:|3.5|:|.539|:||:|.702|:|18.4|:|.250
1972-73|:|BOS|:|Dave Cowens|:|24|:|82|:|41.8|:|20.5|:|16.2|:|4.1|:||:||:|.452|:||:|.779|:|12.0|:|.168
1971-72|:|MIL|:|Kareem Abdul-Jabbar|:|24|:|81|:|44.2|:|34.8|:|16.6|:|4.6|:||:||:|.574|:||:|.689|:|25.4|:|.340
1970-71|:|MIL|:|Kareem Abdul-Jabbar|:|23|:|82|:|40.1|:|31.7|:|16.0|:|3.3|:||:||:|.577|:||:|.690|:|22.3|:|.326
1969-70|:|NYK|:|Willis Reed|:|27|:|81|:|38.1|:|21.7|:|13.9|:|2.0|:||:||:|.507|:||:|.756|:|14.6|:|.227
1968-69|:|BAL|:|Wes Unseld|:|22|:|82|:|36.2|:|13.8|:|18.2|:|2.6|:||:||:|.476|:||:|.605|:|10.8|:|.175
1967-68|:|PHI|:|Wilt Chamberlain|:|31|:|82|:|46.8|:|24.3|:|23.8|:|8.6|:||:||:|.595|:||:|.380|:|20.4|:|.255
1966-67|:|PHI|:|Wilt Chamberlain|:|30|:|81|:|45.5|:|24.1|:|24.2|:|7.8|:||:||:|.683|:||:|.441|:|21.9|:|.285
1965-66|:|PHI|:|Wilt Chamberlain|:|29|:|79|:|47.3|:|33.5|:|24.6|:|5.2|:||:||:|.540|:||:|.513|:|21.4|:|.275
1964-65|:|BOS|:|Bill Russell|:|30|:|78|:|44.4|:|14.1|:|24.1|:|5.3|:||:||:|.438|:||:|.573|:|16.9|:|.234
1963-64|:|CIN|:|Oscar Robertson|:|25|:|79|:|45.1|:|31.4|:|9.9|:|11.0|:||:||:|.483|:||:|.853|:|20.6|:|.278
1962-63|:|BOS|:|Bill Russell|:|28|:|78|:|44.9|:|16.8|:|23.6|:|4.5|:||:||:|.432|:||:|.555|:|13.5|:|.185
1961-62|:|BOS|:|Bill Russell|:|27|:|76|:|45.2|:|18.9|:|23.6|:|4.5|:||:||:|.457|:||:|.595|:|15.5|:|.217
1960-61|:|BOS|:|Bill Russell|:|26|:|78|:|44.3|:|16.9|:|23.9|:|3.4|:||:||:|.426|:||:|.550|:|13.0|:|.181
1959-60|:|PHW|:|Wilt Chamberlain|:|23|:|72|:|46.4|:|37.6|:|27.0|:|2.3|:||:||:|.461|:||:|.582|:|17.0|:|.245
1958-59|:|STL|:|Bob Pettit|:|26|:|72|:|39.9|:|29.2|:|16.4|:|3.1|:||:||:|.438|:||:|.759|:|14.8|:|.246
1957-58|:|BOS|:|Bill Russell|:|23|:|69|:|38.3|:|16.6|:|22.7|:|2.9|:||:||:|.442|:||:|.519|:|11.3|:|.206
1956-57|:|BOS|:|Bob Cousy|:|28|:|64|:|36.9|:|20.6|:|4.8|:|7.5|:||:||:|.378|:||:|.821|:|8.8|:|.178
1955-56|:|STL|:|Bob Pettit|:|23|:|72|:|38.8|:|25.7|:|16.2|:|2.6|:||:||:|.429|:||:|.736|:|13.8|:|.236
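Rows in this `|:|`-separated form can be read back into typed records for analysis. The following is a minimal sketch, assuming the sixteen fields appear in the order printed above; `FIELDS`, `TEXT_FIELDS`, and `parse_row` are hypothetical names. Blank stats (such as steals and blocks in the earliest seasons) are mapped to `None` rather than guessed.

```python
FIELDS = [
    "year", "team", "player", "age", "games", "minutes", "points",
    "rebounds", "assists", "steals", "blocks", "field_goal",
    "three_point", "free_throw", "win_shares", "ws48",
]
TEXT_FIELDS = {"year", "team", "player"}


def parse_row(line, sep="|:|"):
    """Turn one delimited output line into a dict; blank stats become None."""
    values = line.strip().split(sep)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = {}
    for name, value in zip(FIELDS, values):
        if name in TEXT_FIELDS:
            row[name] = value
        else:
            row[name] = float(value) if value else None
    return row


# Two rows copied from the output above
harden = parse_row(
    "2017-18|:|HOU|:|James Harden|:|28|:|72|:|35.4|:|30.4|:|5.4|:|8.8"
    "|:|1.8|:|0.7|:|.449|:|.367|:|.858|:|15.4|:|.289"
)
cowens = parse_row(
    "1972-73|:|BOS|:|Dave Cowens|:|24|:|82|:|41.8|:|20.5|:|16.2|:|4.1"
    "|:||:||:|.452|:||:|.779|:|12.0|:|.168"
)
print(harden["player"], harden["points"])
print(cowens["player"], cowens["steals"])
```

Keeping missing values as `None` means later averages have to skip them explicitly, which avoids treating a blank as zero.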


### Data Analysis and Visualizations

After parsing and collecting the data, we were able to analyze and visualize it. We started by creating two scatterplots to compare data about the MVP players.

We wanted to see whether any two factors in the players' performance correlate, and whether that has an impact on winning the award.

To create the first scatter plot, we imported matplotlib. From there, we created a function that reads data from a file and returns it. The data files hold the values we parsed in the previous code.

After this, we were able to obtain the alpha (intercept), beta (slope), and correlation coefficient of a linear fit. We then passed these variables to the scatterplot code from the lecture to produce the visualizations for analysis.
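For readers who want to see what those three numbers are, here is a small pure-Python sketch of an ordinary least-squares fit of the line y = alpha + beta * x, together with the Pearson correlation coefficient. It is an illustration of the standard formulas, not the code used for the post (which calls `scipy.stats.linregress`); `linear_fit` is a hypothetical name and the sample points are made up.

```python
import math


def linear_fit(xs, ys):
    """Return (alpha, beta, r) for the least-squares line y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # slope
    alpha = mean_y - beta * mean_x      # intercept
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    return alpha, beta, r


# Made-up points lying exactly on y = 1 + 2x
alpha, beta, r = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(alpha, beta, r)
```

A correlation coefficient near 1 or -1 means the points sit close to a line, while a value near 0 means the line explains very little.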

#TASK B

import matplotlib.pyplot as plt
%matplotlib inline
import random
import numpy as np
from scipy import stats

#Read a two-column data file (year, stat) and return both columns as lists.
#The function's def line was missing from the original listing; the name read_data is an assumption.
def read_data(filein):
    year = []
    stat = []

    with open(filein) as f:
        for line in f:
            #skip comment lines
            if not line.startswith("#"):
                data_col = line.strip().split(' ')
                year.append(int(data_col[0]))
                stat.append(float(data_col[1]))

    return year, stat


### Scatter Plot 1: Player's Age vs Games Played

#Games played in by player that season
filein = 'Data/nba_gp.dat'
year_gp, stat_gp = read_data(filein)

#Ages of NBA players during MVP season
filein = 'Data/nba_age.dat'
year_age, stat_age = read_data(filein)

#Games played
print ('Results for Games Played:')
slope_m, intercept_m, r_value_m, p_value_m, std_err_m = stats.linregress(year_gp, stat_gp)
print ('alpha : ', intercept_m)
print ('beta : ', slope_m)
print ('correlation coefficient : ', r_value_m)

#Age of player
print ('Results for Player Ages:')
slope_w, intercept_w, r_value_w, p_value_w, std_err_w = stats.linregress(year_age, stat_age)
print ('alpha : ', intercept_w)
print ('beta : ', slope_w)
print ('correlation coefficient : ', r_value_w)

Results for Games Played:
alpha :  43.869223630312334
beta :  0.017137096774193554
correlation coefficient :  0.05025295352158669
Results for Player Ages:
alpha :  -35.58696556579622
beta :  0.031634024577572965
correlation coefficient :  0.19713802712711728


This first scatter plot shows games played and player age for each MVP season, both plotted against the year, each with its own best-fit line. Some of the older MVP winners played fewer games, but some younger players missed as many or more.

Another factor that affects games played is the length of the season: a couple of NBA seasons were shortened by lockouts, which meant fewer games played.

Both linear trends are weak, with correlation coefficients of about 0.05 for games played and 0.20 for age, so neither is strong enough to suggest a cause.
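Because each data file stores one stat against the season year, comparing age with games played directly would first require pairing the two series by year. A minimal sketch of that pairing step follows; `pair_by_year` is a hypothetical helper, and keying each season by the year it ended is an assumed convention for the files.

```python
def pair_by_year(years_a, stats_a, years_b, stats_b):
    """Return (year, a, b) tuples for the years present in both series."""
    lookup_b = dict(zip(years_b, stats_b))
    return [
        (year, a, lookup_b[year])
        for year, a in zip(years_a, stats_a)
        if year in lookup_b
    ]


# Illustrative values taken from the parsed table: ages for three seasons
# and games played for four seasons
year_age, stat_age = [2016, 2017, 2018], [27.0, 28.0, 28.0]
year_gp, stat_gp = [2015, 2016, 2017, 2018], [80.0, 79.0, 81.0, 72.0]

pairs = pair_by_year(year_age, stat_age, year_gp, stat_gp)
print(pairs)
```

The paired ages and games could then be passed to a regression as x and y, instead of regressing each stat against the year.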

plt.figure(figsize=(10,5))
plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})

#Games played
plt.plot(year_gp,stat_gp, marker='o', color ='blue', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2050- 1950 / 10000.0))
best_fit_y = intercept_m + slope_m * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='blue', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

#Age
plt.plot(year_age,stat_age, marker='s', color ='red', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2052- 1950 / 10000.0))
best_fit_y = intercept_w + slope_w * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='red', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

plt.ylabel('Age / Games Played')
plt.xlabel('year')

plt.show()

### Scatter Plot 2: Points Per Game vs Field Goal Percentage

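#Points per game by NBA player in MVP season
#(read with the same file-reading function as the first plot; the name read_data is an assumption)
filein = 'Data/nba_pts.dat'
year_pts, stat_pts = read_data(filein)

#Field goal percentage of NBA player during MVP season
filein = 'Data/nba_fg.dat'
year_fg, stat_fg = read_data(filein)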

#Points
print ('Results for Points Per Game:')
slope_m, intercept_m, r_value_m, p_value_m, std_err_m = stats.linregress(year_pts, stat_pts)
print ('alpha : ', intercept_m)
print ('beta : ', slope_m)
print ('correlation coefficient : ', r_value_m)

#Field goals
print ('Results for Field Goal Percentage:')
slope_w, intercept_w, r_value_w, p_value_w, std_err_w = stats.linregress(year_fg, stat_fg)
print ('alpha : ', intercept_w)
print ('beta : ', slope_w)
print ('correlation coefficient : ', r_value_w)

Results for Points Per Game:
alpha :  -101.50359383000512
beta :  0.06416090629800307
correlation coefficient :  0.22211254723148693
Results for Field Goal Percentage:
alpha :  20.190929019457297
beta :  0.015192972350230389
correlation coefficient :  0.0538293563809513


This second scatter plot looks at how many points a player averaged during his MVP season alongside his field goal percentage that season.

Field goal percentage shows how accurate a player is as a scorer. Seeing it next to points per game gives us a better idea of how MVPs are picked, as both stats play a role in winning the award.

It also shows how efficient a player is relative to how many points he scores per game. One player could score many points per game with a poor field goal percentage, and another the reverse.

The best-fit lines show that the average MVP winner scores in the mid-to-high 20s in points per game while hovering around a 50% field goal percentage. This gives us an idea of the averages it may take to win MVP in the future.

plt.figure(figsize=(10,5))
plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})

#Points
plt.plot(year_pts,stat_pts, marker='o', color ='blue', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2050- 1950 / 10000.0))
best_fit_y = intercept_m + slope_m * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='blue', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

#Field goal
plt.plot(year_fg,stat_fg, marker='s', color ='red', markersize=5, linewidth=0)
best_fit_x = np.arange(1950, (2052- 1950 / 10000.0))
best_fit_y = intercept_w + slope_w * best_fit_x
plt.plot(best_fit_x, best_fit_y, color ='red', markersize=0, linewidth=3, linestyle='-', alpha = 0.5)

#(Points per game / field goal percentage)
plt.ylabel(r'PPG / FG\%')
plt.xlabel('year')

plt.show()

### Distributions

For this part, we reused the data-file code from before so that the same values could feed the code that creates the distribution histograms.

Using the age and games played stats, held in the "stat_age" and "stat_gp" variables, lets us see distributions and draw inferences about the NBA MVP player stats.

We also added a function that measures the probability distribution, along with code that produces the histograms and distribution visualizations.
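The idea behind measuring the distribution can be sketched compactly with `collections.Counter`: count how often each value occurs, divide by the number of observations, and compute the mean and population variance alongside. This is an alternative illustration with a hypothetical name (`empirical_distribution`) and made-up ages, not the function used for the plots.

```python
from collections import Counter


def empirical_distribution(outcomes):
    """Return (pdf, mean, variance); pdf maps each value to its share."""
    n = len(outcomes)
    counts = Counter(outcomes)
    pdf = {value: count / n for value, count in counts.items()}
    mean = sum(outcomes) / n
    # Population variance: average squared distance from the mean
    variance = sum((x - mean) ** 2 for x in outcomes) / n
    return pdf, mean, variance


# Made-up ages, not the scraped data
pdf, mean, variance = empirical_distribution([27, 27, 28, 30])
print(pdf, mean, variance)
```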

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#####################################
##Measure probability distribution. computes also average
## value and standard deviation
#####
def measure_probability_distribution (outcomes):

    average_value = 0.0
    variance = 0.0

    pdf = {}
    norm = 0.0

    ##count number of observations
    for x in outcomes:
        if x not in pdf:
            pdf[x] = 0.0
        pdf[x] += 1.0
        norm += 1.0

        average_value += x
        variance += x*x

    average_value /= norm
    variance /= norm
    variance = variance - average_value * average_value

    ##normalize pdf
    for x in pdf:
        pdf[x] /= norm

    return pdf, average_value, variance
#####################################

### Distribution 1: NBA Players' Ages

Our interpretation is that most NBA players who have won the MVP award did so around their age 27-28 season. With this, we can expect star players around this age to be the most likely to win the award.

The average age is around 27 years old. This makes sense as NBA players are believed to be in their prime around this age.

from scipy.stats import poisson

pdf, av, var = measure_probability_distribution (stat_age)

##visualize histogram

plt.figure(figsize=(10,10))
plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})

title = '$\\langle x \\rangle = ' + '% .2f' % av + ' \\quad \\sigma^2 = ' + '% .2f' % var + '$'
plt.title(title, fontsize = 20)

plt.xlabel('NBA players ages')
plt.ylabel('probability distribution')

##construct two lists for  visualization
x = []
Px = []
for q in pdf:
    x.append(q)
    Px.append(pdf[q])

plt.bar(x, Px, color = 'red', align='center', alpha=0.5)

plt.plot(x, poisson.pmf(x, av), linestyle='-', linewidth=0.0,color='k', label='poisson pmf')

plt.show()
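The plotting code above compares the histogram with `poisson.pmf`. As a rough sketch of what that call computes, the Poisson probability of observing k when the mean is λ is λ^k e^(-λ) / k!. The helper name `poisson_pmf` is hypothetical, and the mean of 27 is simply the approximate average age quoted in the text.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam (k a non-negative int)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 27.0  # approximate average MVP age from the text
probabilities = [poisson_pmf(k, lam) for k in range(0, 101)]
total = sum(probabilities)

# The most likely values of a Poisson variable sit next to its mean
most_likely = max(range(0, 101), key=lambda k: probabilities[k])
print(total, most_likely)
```

A Poisson distribution has a variance equal to its mean, so if the measured variance of the ages is much smaller than the average, the bars will look narrower than the Poisson curve.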

### Distribution 2: Games Played

Our interpretation of this distribution is that players are expected to play the majority of the season if they want to be in contention for the NBA MVP award.

There are some outliers (likely due to lockout seasons, when the NBA regular season had only 50 or 66 games), but the majority of the MVP winners played more than 70 games.

The average is around 78 games played, out of 82 in a full season. With this, playing most or all of the season is an important factor in winning.
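One way to take the shortened seasons into account is to look at the share of the schedule a player appeared in, rather than the raw count. The sketch below assumes every season other than the two lockout years has 82 games, which does not hold for the earliest seasons in the table; `SEASON_LENGTH` and `share_of_games` are hypothetical names.

```python
# Lockout-shortened regular seasons; all other seasons are ASSUMED to have
# 82 games here, which is not true of the earliest seasons in the table
SEASON_LENGTH = {"1998-99": 50, "2011-12": 66}
DEFAULT_LENGTH = 82


def share_of_games(season, games_played):
    """Fraction of the regular season a player appeared in."""
    total = SEASON_LENGTH.get(season, DEFAULT_LENGTH)
    return games_played / total


malone = share_of_games("1998-99", 49)  # Karl Malone, 49 games
lebron = share_of_games("2011-12", 62)  # LeBron James, 62 games
harden = share_of_games("2017-18", 72)  # James Harden, 72 games
print(malone, lebron, harden)
```

On this measure the two lockout-season winners stop looking like outliers, since both played most of the games that were available.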

from scipy.stats import poisson

pdf, av, var = measure_probability_distribution (stat_gp)

##visualize histogram

plt.figure(figsize=(10,10))
plt.rc('text', usetex=True)
plt.rc('font', size=24, **{'family':'DejaVu Sans','sans-serif':['Helvetica']})

title = '$\\langle x \\rangle = ' + '% .2f' % av + ' \\quad \\sigma^2 = ' + '% .2f' % var + '$'
plt.title(title, fontsize = 20)

plt.xlabel('Games Played')
plt.ylabel('probability distribution')

##construct two lists for  visualization
x = []
Px = []
for q in pdf:
    x.append(q)
    Px.append(pdf[q])

plt.bar(x, Px, color = 'red', align='center', alpha=0.5)

plt.plot(x, poisson.pmf(x, av), linestyle='-', linewidth=0.0,color='k', label='poisson pmf')

plt.show()

### Conclusion

We can conclude from the analysis and visualizations that certain factors play a role in NBA players winning the MVP award and in how voters pick the winner.

Many of the players average over 20 points per game on around 50 percent shooting. The winners are also healthy and durable, playing the majority of the regular season. Many did not miss a game: 18 of the 62 MVP seasons in our table show all 82 games played.

Along with this, expect most players to be around 27 or 28 years old, or what is considered their "prime years," when they win the award.

Another factor for future research is how many games a player's team won in the season he won the award. We could find the distribution of team wins and the average number of wins for an MVP's team.

We can also use more of the player stats we downloaded when parsing their URLs and pages. We could compare other stats such as weight, height, and more.

Overall, we can expect future NBA MVP winners to be players in their prime years averaging over 20 points a game on around 50 percent shooting, who also play the majority or all of the season's games.

In addition to this, it's likely the player's team had a significant number of victories as well. These criteria can help shape who wins this year's NBA MVP award, which likely comes down to the Milwaukee Bucks' Giannis Antetokounmpo and the Houston Rockets' James Harden.

UPDATE 2020

Giannis ended up winning the MVP award over Harden. Let's see how their stats compare and how it correlates with our results.
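
| Rk | Player | Season | Age | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | eFG% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Giannis Antetokounmpo | 2018-19 | 24 | 72 | 72 | 32.8 | 10.0 | 17.3 | .578 | 0.7 | 2.8 | .256 | 9.3 | 14.5 | .641 | .599 | 6.9 | 9.5 | .729 | 2.2 | 10.3 | 12.5 | 5.9 | 1.3 | 1.5 | 3.7 | 3.2 | 27.7 |
| 2 | James Harden | 2018-19 | 29 | 78 | 78 | 36.8 | 10.8 | 24.5 | .442 | 4.8 | 13.2 | .368 | 6.0 | 11.3 | .528 | .541 | 9.7 | 11.0 | .879 | 0.8 | 5.8 | 6.6 | 7.5 | 2.0 | 0.7 | 5.0 | 3.1 | 36.1 |
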
Giannis and Harden both fall outside the average age. Giannis is 24 while Harden is 29, which is outside the 27-28 average. However, you can argue Harden is in his prime currently, while Giannis may have yet to reach his prime (which is scary).

Giannis shot nearly 58% from the field while averaging about 28 points per game, compared to Harden's 44% shooting on 36 points per game. Harden averaged many more points per game, but Giannis was far more efficient.

This falls in line with most of the MVP winners averaging over 20 points per game with 50% or better shooting, like Giannis. Harden exceeded the points mark but shot below the typical percentage for MVP winners.

Harden, however, beat out Giannis in games played. Harden fell right at the average with 78 games, while Giannis was slightly below it with 72. What helped Giannis is that he averaged more rebounds (12.5 to 6.6), came close in assists (5.9 to 7.5), and was more efficient.
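The comparison above can be written as a simple checklist against the profile from our conclusions. This is only a sketch: the thresholds (age 27-28, over 20 points per game, 50% or better shooting, more than 70 games) are taken from the text, while `Season` and `profile_checks` are hypothetical names, and passing or failing a check is not a prediction of the vote.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    age: int
    games: int
    points: float
    fg_pct: float


def profile_checks(s):
    """Compare one season with the typical-MVP profile described in the post."""
    return {
        "prime_age": 27 <= s.age <= 28,
        "scoring": s.points > 20.0,
        "shooting": s.fg_pct >= 0.50,
        "durability": s.games > 70,
    }


# 2018-19 numbers from the comparison table
giannis = Season("Giannis Antetokounmpo", 24, 72, 27.7, 0.578)
harden = Season("James Harden", 29, 78, 36.1, 0.442)

for season in (giannis, harden):
    print(season.player, profile_checks(season))
```

Both players miss the age check, and the shooting check is the one that separates them.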

We'll see how these stats and trends compare for future NBA MVP award winners and voting.