Week 5: Advertising and Promotion


The Los Angeles Dodgers are a professional baseball team that plays in Major League Baseball. The team owns a 56,000-seat stadium and is interested in increasing fan attendance at home games. At the moment, the team management would like to know whether bobblehead promotions increase attendance. This is a case study based on Miller (2014, Chapter 2).

include_graphics(c("los_angeles-dodgers-stadium.jpg",
                 "Los-Angeles-Dodgers-Promo.jpg",
                 "adrian_bobble.jpg"))
Figure 1: 56,000-seat Dodgers stadium (left), shirts and caps (middle), bobblehead (right)

The 2012 season data in the events table of the SQLite database data/dodgers.sqlite contain, for each of the 81 home games, the

Prerequisites

We will use R, RStudio, and R Markdown for the next three weeks to fit statistical models to various data sets and analyze them. Read Wickham and Grolemund (2017), which is available online.

All materials for the next three weeks will be available on Google Drive.

March 1: Exploratory data analysis

  1. Connect to data/dodgers.sqlite. Read table events into a variable in R.

    • Read Baumer, Kaplan, and Horton (2017, Chapters 1, 4, 5, 15) (Second edition online) for getting data from and writing them to various SQL databases.

    • Because we do not want to hassle with user permissions, we will use SQLite for practice. I recommend PostgreSQL for real projects.

    • Open the RStudio terminal and connect to the database dodgers.sqlite with sqlite3. Explore it (there is only one table, events, at this time) with the commands

      • .help
      • .databases
      • .tables
      • .schema <table_name>
      • .headers on
      • .mode column
      • SELECT ...
      • .quit
    • Databases are well suited to storing and retrieving large data sets, especially when the tables are indexed on the variables/columns along which we search and match extensively.

    • R (likewise, Python) allows one to read from and write to databases seamlessly. For fast analysis, keep the data in a database, index the tables for fast retrieval, and use R or Python to fit models to the data.
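The indexing advice above can be tried out without touching the course database. The following is only a minimal sketch: it uses an in-memory SQLite database, and the table `toy_events`, its columns, and its values are hypothetical and are not the real schema of dodgers.sqlite.

```r
library(DBI)
library(RSQLite)

# In-memory database, so the sketch does not touch data/dodgers.sqlite.
con_demo <- dbConnect(SQLite(), ":memory:")

# Hypothetical toy table; column names and values are made up for illustration.
toy_events <- data.frame(
  month       = c("APR", "APR", "MAY", "JUN"),
  day_of_week = c("Tuesday", "Friday", "Tuesday", "Sunday"),
  attend      = c(30000, 45000, 52000, 41000)
)
dbWriteTable(con_demo, "toy_events", toy_events)

# Index the column that we expect to filter on most often.
dbExecute(con_demo, "CREATE INDEX idx_toy_day ON toy_events (day_of_week)")

# Parameterised query: SQLite does the filtering, R receives only matching rows.
tuesdays <- dbGetQuery(
  con_demo,
  "SELECT month, attend FROM toy_events WHERE day_of_week = ?",
  params = list("Tuesday")
)
print(tuesdays)

dbDisconnect(con_demo)
```

On a table this small the index makes no measurable difference; it pays off when the table is large and the indexed column appears often in WHERE or JOIN clauses.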

# Ctrl-shift-i
#library(RPostgreSQL)
library(RSQLite)  ## if the package is not on the computer, install it once using Tools > Install packages...
library(dplyr)    ## provides tbl(), collect(), mutate(), and %>%; tbl() on a connection also needs dbplyr installed
con <- dbConnect(SQLite(), "../data/dodgers.sqlite") # read Modern Data Science with R for different ways to connect to a database.

## dbListTables(con)

# Pull the whole events table into R, then put the factor levels in calendar order.
events <- tbl(con, "events") %>% 
  collect() %>% 
  mutate(day_of_week = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")),
         month = factor(month, levels = c("APR","MAY","JUN","JUL","AUG","SEP","OCT")))

# events %>% distinct(month)
# events$day_of_week %>% class()
# events$day_of_week %>% levels()
# events
  1. How many games were played on each day of the week and in each month of the season?

  2. Check the order of the levels of the day_of_week and month factors. If necessary, put them in their logical order.

  3. How many times were bobblehead promotions run on each day of the week?

  4. How did attendance vary across the days of the week? Draw boxplots. On which day of the week was attendance the highest on average?

  5. Is there an association between attendance and

    • whether the game is played in daylight or at night?
    • whether skies are clear or cloudy?
  6. Is there an association between attendance and temperature?

    • If yes, is there a positive or negative association?
    • Does the association differ between clear and cloudy days, or between day and night games?
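The kinds of summaries these questions ask for can be sketched on simulated data before turning to the real table. Everything below is hypothetical: the tibble `toy` is simulated, and the column names `attend`, `temp`, `skies`, and `bobblehead` are assumptions that may not match the columns of events.

```r
library(dplyr)

set.seed(1)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Simulated toy data: 4 games on each day of the week, promotions on Tuesdays only.
toy <- tibble(
  day_of_week = factor(rep(days, times = 4), levels = days),
  skies       = rep(c("Clear", "Cloudy"), length.out = 28),
  temp        = round(runif(28, 55, 95)),
  bobblehead  = ifelse(day_of_week == "Tuesday", "YES", "NO"),
  attend      = round(35000 + 8000 * (bobblehead == "YES") + rnorm(28, sd = 3000))
)

# Number of games per day of the week.
games_per_day <- toy %>% count(day_of_week)

# Number of bobblehead promotions per day of the week.
promos_per_day <- toy %>% filter(bobblehead == "YES") %>% count(day_of_week)

# Mean attendance per day of the week, highest first.
mean_by_day <- toy %>%
  group_by(day_of_week) %>%
  summarise(mean_attend = mean(attend)) %>%
  arrange(desc(mean_attend))

# Boxplots of attendance by day of the week (base graphics).
boxplot(attend ~ day_of_week, data = toy, las = 2,
        xlab = "", ylab = "Attendance")

# Linear association between temperature and attendance, overall and by sky condition.
r_overall  <- cor(toy$temp, toy$attend)
r_by_skies <- toy %>% group_by(skies) %>% summarise(r = cor(temp, attend))

print(games_per_day)
print(promos_per_day)
print(mean_by_day)
print(r_by_skies)
```

With the real data, `toy` would be replaced by `events` and the column names adjusted to whatever `.schema events` reports. A correlation coefficient only captures linear association, so a scatter plot of attendance against temperature is a useful companion.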

Next time: A linear regression model

Regress attendance on month, day of the week, and bobblehead promotion.

  1. Is there any evidence for a relationship between attendance and other variables? Why or why not?

  2. Does the bobblehead promotion have a statistically significant effect on the attendance?

  3. Do the month and day-of-week variables help to explain attendance?

  4. How many additional fans is a bobblehead promotion alone expected to draw to a home game? Give a 90% confidence interval.

  5. How well does the model fit the data? Why? Comment on the residual standard error and \(R^2\). Plot observed attendance against predicted attendance.

  6. Predict the attendance at a typical home game on a Wednesday in June if a bobblehead promotion is offered. Give a 90% prediction interval.
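The mechanics of the regression questions above can be sketched with base R on simulated data. This is an illustration under stated assumptions, not the analysis of the Dodgers data: the data frame `sim`, the response name `attend`, the "NO"/"YES" coding of `bobblehead`, and the built-in promotion effect of 10,000 fans are all made up.

```r
set.seed(2012)
days   <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
months <- c("APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT")
n <- 81

# Simulated season: every 8th game has a (hypothetical) bobblehead promotion.
sim <- data.frame(
  month       = factor(rep(months, each = 12, length.out = n), levels = months),
  day_of_week = factor(rep(days, length.out = n), levels = days),
  bobblehead  = factor(ifelse(seq_len(n) %% 8 == 0, "YES", "NO"),
                       levels = c("NO", "YES"))
)
sim$attend <- 38000 + 10000 * (sim$bobblehead == "YES") + rnorm(n, sd = 6000)

# Regress attendance on month, day of the week, and promotion.
fit <- lm(attend ~ month + day_of_week + bobblehead, data = sim)
print(summary(fit))  # coefficients, residual standard error, R-squared, F-test

# 90% confidence interval for the promotion effect.
ci <- confint(fit, "bobbleheadYES", level = 0.90)
print(ci)

# 90% prediction interval for a Wednesday game in June with a promotion.
newgame <- data.frame(
  month       = factor("JUN", levels = months),
  day_of_week = factor("Wednesday", levels = days),
  bobblehead  = factor("YES", levels = c("NO", "YES"))
)
pred <- predict(fit, newdata = newgame, interval = "prediction", level = 0.90)
print(pred)

# Observed against fitted attendance; points near the 45-degree line indicate a good fit.
plot(fitted(fit), sim$attend,
     xlab = "Predicted attendance", ylab = "Observed attendance")
abline(0, 1, lty = 2)
```

The confidence interval from `confint()` concerns the mean effect of the promotion, while the prediction interval from `predict()` also accounts for game-to-game noise and is therefore wider.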

Project (will be graded)

Include all variables and conduct a full regression analysis of the problem. Submit your R Markdown and HTML files to the course homepage on Moodle.

Bibliography

Baumer, B. S., D. T. Kaplan, and N. J. Horton. 2017. Modern Data Science with R. Chapman & Hall/CRC Texts in Statistical Science. CRC Press. https://books.google.com.tr/books?id=NrddDgAAQBAJ.
Miller, T. W. 2014. Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science. FT Press Analytics. Pearson Education. https://books.google.com.tr/books?id=PU6nBAAAQBAJ.
Wickham, H., and G. Grolemund. 2017. R for Data Science. O’Reilly Media. https://books.google.com.tr/books?id=aZRYrgEACAAJ.