The Cyclistic Bike Share Case Study is part of the Google Data Analytics course that I completed. In this case study, I worked as a junior data analyst on the marketing analytics team of Cyclistic, a fictional bike-share company based in Chicago. I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
The director of marketing believes that the company’s future success depends on maximizing the number of annual memberships. My team therefore wants to understand how casual riders and annual members use Cyclistic bikes differently.
Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. Cyclistic was launched in 2016 and over the years has grown to a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.
Cyclistic’s marketing strategy has relied on building general awareness and appealing to broad customer segments by offering flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
How do annual members and casual riders use Cyclistic bikes differently?
Maximizing the number of annual memberships is key to future growth.
Key stakeholders:
I used Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. The data is public, but data-privacy rules prohibit using riders’ personally identifiable information. This means that I cannot connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or whether they have purchased multiple single passes.
I chose the most recent 12 months of Cyclistic trip data (October 2022 to September 2023) and downloaded each file from divvy-data as a ZIP archive. I then unzipped the data and saved it as 12 CSV files, one per month. I created a folder on my desktop with two subfolders, one for CSV files and another for XLS files, and applied consistent file-naming conventions to all of them. Next, I moved the downloaded files to the appropriate subfolder, so that I keep a copy of the original data.
This is structured data, organized in rows and columns. Each record represents one trip, and each trip has a unique field that identifies it: ride_id.
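The claim that ride_id identifies each trip uniquely can be checked directly. The following is a minimal sketch on a toy table (the `trips` object and its three rows are illustrative stand-ins, not the real data); the same two expressions could be run against the full trip table once it is loaded.

```r
library(dplyr)

# Toy stand-in for the trip table; the real data has one row per ride.
trips <- tibble(
  ride_id = c("A50255C1E17942AB", "DB692A70BD2DD4E3", "3C02727AAF60F873"),
  member_casual = c("member", "casual", "member")
)

# TRUE when every ride_id occurs exactly once.
ids_unique <- n_distinct(trips$ride_id) == nrow(trips)

# Any ride_id that occurs more than once (empty when the ids are unique).
duplicated_ids <- trips %>%
  count(ride_id) %>%
  filter(n > 1)
```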
I used the ROCCC criteria (reliable, original, comprehensive, current, cited) to assess the bias and credibility of the data source.
Reliable and original: This is public data that contains accurate, complete and unbiased information on Cyclistic’s historical bike trips.
Comprehensive and current: This data source contains all the data needed to understand the different ways members and casual riders use Cyclistic bikes. The data covers the most recent 12 months.
Cited: This source is publicly available data provided by Cyclistic and the City of Chicago.
To answer the key business questions, I carried out the data analysis process using R and Tableau. Initially, I wanted to use Microsoft Excel or BigQuery to clean, manipulate and analyze the data, but because of memory and processor limitations I decided to clean, verify and transform the data in R.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
## Warning: package 'janitor' was built under R version 4.3.2
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.2
library(lubridate)
library(readr)
library(knitr)
library(hms)
## Warning: package 'hms' was built under R version 4.3.2
##
## Attaching package: 'hms'
##
## The following object is masked from 'package:lubridate':
##
## hms
oct22_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202210-divvy-tripdata.csv")
## Rows: 558685 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nov22_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202211-divvy-tripdata.csv")
## Rows: 337735 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dec22_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202212-divvy-tripdata.csv")
## Rows: 181806 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jan23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
feb23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mar23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
apr23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
may23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jun23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jul23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202307-divvy-tripdata.csv")
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
aug23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202308-divvy-tripdata.csv")
## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sep23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202309-divvy-tripdata.csv")
## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tripdata_df <- bind_rows(oct22_df, nov22_df, dec22_df, jan23_df, feb23_df, mar23_df, apr23_df, may23_df, jun23_df, jul23_df, aug23_df, sep23_df)
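The twelve read_csv() calls and the bind_rows() step could also be written as a loop over the files in the folder. The sketch below is one possible way to do that, not the method used in this study; the folder name `divvy-tripdata-demo` and the two tiny files in it are hypothetical stand-ins created in a temporary directory so the example runs anywhere.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical demo folder with two tiny files standing in for the monthly CSVs.
data_dir <- file.path(tempdir(), "divvy-tripdata-demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(tibble(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(data_dir, "202210-divvy-tripdata.csv"))
write_csv(tibble(ride_id = "b1", member_casual = "member"),
          file.path(data_dir, "202211-divvy-tripdata.csv"))

# Collect every monthly file, sorted by name (and therefore by year-month).
csv_files <- sort(list.files(data_dir, pattern = "-divvy-tripdata\\.csv$",
                             full.names = TRUE))

# Read each file and stack the results into one data frame.
demo_df <- csv_files %>%
  map(~ read_csv(.x, show_col_types = FALSE)) %>%
  bind_rows()
```

Pointing `data_dir` at the real folder would produce the same combined table as the explicit calls, and `show_col_types = FALSE` suppresses the repeated column-specification messages.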
head(tripdata_df)
## # A tibble: 6 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 A50255C1E17942AB classic_bike 2022-10-14 17:13:30 2022-10-14 17:19:39
## 2 DB692A70BD2DD4E3 electric_bike 2022-10-01 16:29:26 2022-10-01 16:49:06
## 3 3C02727AAF60F873 electric_bike 2022-10-19 18:55:40 2022-10-19 19:03:30
## 4 47E653FDC2D99236 electric_bike 2022-10-31 07:52:36 2022-10-31 07:58:49
## 5 8B5407BE535159BF classic_bike 2022-10-13 18:41:03 2022-10-13 19:26:18
## 6 A177C92E9F021B99 electric_bike 2022-10-13 15:53:27 2022-10-13 15:59:17
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
colnames(tripdata_df)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
str(tripdata_df)
## spc_tbl_ [5,674,399 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:5674399] "A50255C1E17942AB" "DB692A70BD2DD4E3" "3C02727AAF60F873" "47E653FDC2D99236" ...
## $ rideable_type : chr [1:5674399] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:5674399], format: "2022-10-14 17:13:30" "2022-10-01 16:29:26" ...
## $ ended_at : POSIXct[1:5674399], format: "2022-10-14 17:19:39" "2022-10-01 16:49:06" ...
## $ start_station_name: chr [1:5674399] "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
## $ start_station_id : chr [1:5674399] "13290" "13288" "655" "KA1504000133" ...
## $ end_station_name : chr [1:5674399] "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
## $ end_station_id : chr [1:5674399] "KA1504000079" "13089" "TA1307000140" "620" ...
## $ start_lat : num [1:5674399] 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num [1:5674399] -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:5674399] 41.9 41.9 42 41.9 41.9 ...
## $ end_lng : num [1:5674399] -87.6 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:5674399] "member" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
skim_without_charts(tripdata_df)
Name | tripdata_df |
Number of rows | 5674399 |
Number of columns | 13 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 4 |
POSIXct | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
ride_id | 0 | 1.00 | 16 | 16 | 0 | 5674399 | 0 |
rideable_type | 0 | 1.00 | 11 | 13 | 0 | 3 | 0 |
start_station_name | 873186 | 0.85 | 3 | 64 | 0 | 1577 | 0 |
start_station_id | 873318 | 0.85 | 3 | 36 | 0 | 1485 | 0 |
end_station_name | 926160 | 0.84 | 3 | 64 | 0 | 1586 | 0 |
end_station_id | 926301 | 0.84 | 3 | 35 | 0 | 1491 | 0 |
member_casual | 0 | 1.00 | 6 | 6 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
start_lat | 0 | 1 | 41.90 | 0.05 | 41.63 | 41.88 | 41.90 | 41.93 | 42.07 |
start_lng | 0 | 1 | -87.65 | 0.03 | -87.94 | -87.66 | -87.64 | -87.63 | -87.46 |
end_lat | 6642 | 1 | 41.90 | 0.07 | 0.00 | 41.88 | 41.90 | 41.93 | 42.18 |
end_lng | 6642 | 1 | -87.65 | 0.13 | -88.16 | -87.66 | -87.64 | -87.63 | 0.00 |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
started_at | 0 | 1 | 2022-10-01 00:00:15 | 2023-09-30 23:59:57 | 2023-06-04 06:55:17 | 4783277 |
ended_at | 0 | 1 | 2022-10-01 00:01:05 | 2023-10-10 04:56:16 | 2023-06-04 07:26:26 | 4794256 |
summary(tripdata_df)
## ride_id rideable_type started_at
## Length:5674399 Length:5674399 Min. :2022-10-01 00:00:15.00
## Class :character Class :character 1st Qu.:2023-02-23 13:11:26.50
## Mode :character Mode :character Median :2023-06-04 06:55:17.00
## Mean :2023-05-06 18:58:12.32
## 3rd Qu.:2023-08-01 18:28:38.50
## Max. :2023-09-30 23:59:57.00
##
## ended_at start_station_name start_station_id
## Min. :2022-10-01 00:01:05.00 Length:5674399 Length:5674399
## 1st Qu.:2023-02-23 13:25:44.50 Class :character Class :character
## Median :2023-06-04 07:26:26.00 Mode :character Mode :character
## Mean :2023-05-06 19:16:37.35
## 3rd Qu.:2023-08-01 18:45:51.00
## Max. :2023-10-10 04:56:16.00
##
## end_station_name end_station_id start_lat start_lng
## Length:5674399 Length:5674399 Min. :41.63 Min. :-87.94
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.07 Max. :-87.46
##
## end_lat end_lng member_casual
## Min. : 0.00 Min. :-88.16 Length:5674399
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character
## Median :41.90 Median :-87.64 Mode :character
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.18 Max. : 0.00
## NA's :6642 NA's :6642
cyclistic_df <- tripdata_df
+ ride_length - the total length of each trip, calculated by subtracting the column "started_at" from the column "ended_at", expressed in minutes and converted to numeric;
cyclistic_df$ride_length <- difftime(tripdata_df$ended_at, tripdata_df$started_at, units = "mins")
cyclistic_df <- cyclistic_df %>%
mutate(ride_length = as.numeric(ride_length))
is.numeric(cyclistic_df$ride_length)
## [1] TRUE
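As a quick check of the ride_length calculation, the sketch below repeats it for the first two rides shown in the head() output. The timestamps are copied from that output; the UTC time zone is an assumption made only so the example is reproducible.

```r
# First two rides from the head() output; the time zone is assumed to be UTC.
started_at <- as.POSIXct(c("2022-10-14 17:13:30", "2022-10-01 16:29:26"), tz = "UTC")
ended_at   <- as.POSIXct(c("2022-10-14 17:19:39", "2022-10-01 16:49:06"), tz = "UTC")

# Same steps as above: difference in minutes, then plain numeric.
ride_length <- as.numeric(difftime(ended_at, started_at, units = "mins"))
round(ride_length, 2)
```

The rounded values should agree with the first two ride_length entries printed later by str(cyclistic_df).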
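+ date and day_of_week - the date uses the default yyyy-mm-dd format; the day name is looked up from an English list, so it does not depend on the system locale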
cyclistic_df$date <- as.Date(cyclistic_df$started_at)
cyclistic_df$day_of_week_num <- wday(cyclistic_df$date, week_start = 1)
days_en <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
cyclistic_df$day_of_week <- days_en[cyclistic_df$day_of_week_num]
cyclistic_df$day_of_week_num <- NULL
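Because day_of_week is stored as plain text, summary tables list the days alphabetically (Friday, Monday, ...). A possible refinement, shown here as a sketch on three sample dates rather than as part of the original pipeline, is to store the names as an ordered factor so that tables and plots follow the Monday-to-Sunday order.

```r
library(lubridate)

days_en <- c("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

# Three sample dates taken from the head() output.
dates <- as.Date(c("2022-10-14", "2022-10-01", "2022-10-31"))

# week_start = 1 makes Monday 1 and Sunday 7, so the index lines up with days_en.
day_idx <- wday(dates, week_start = 1)

# An ordered factor keeps the Monday..Sunday order when sorting or counting.
day_of_week <- factor(days_en[day_idx], levels = days_en, ordered = TRUE)
sort(day_of_week)
```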
+ month
cyclistic_df$month <- format(as.Date(cyclistic_df$date), "%m")
+ day
cyclistic_df$day <- format(as.Date(cyclistic_df$date), "%d")
+ year
cyclistic_df$year <- format(as.Date(cyclistic_df$date), "%Y")
+ time - formatted as HH:MM:SS
# as_hms() keeps only the time-of-day part of started_at
cyclistic_df$time <- as_hms(cyclistic_df$started_at)
+ hour
cyclistic_df$hour <- hour(cyclistic_df$time)
+ season
cyclistic_df <- cyclistic_df %>%
  mutate(season = case_when(
    month %in% c("03", "04", "05") ~ "Spring",
    month %in% c("06", "07", "08") ~ "Summer",
    month %in% c("09", "10", "11") ~ "Fall",
    month %in% c("12", "01", "02") ~ "Winter"
  ))
+ time_of_day
cyclistic_df <- cyclistic_df %>%
  mutate(time_of_day = case_when(
    hour %in% 5:11 ~ "Morning",
    hour %in% 12:17 ~ "Afternoon",
    hour %in% 18:21 ~ "Evening",
    hour %in% c(0:4, 22:23) ~ "Night"
  ))
Cleaning process.
cyclistic_df <- distinct(cyclistic_df)
+ removing rows with NA values (any row with a missing value in any column, which here are mostly rows without a station name)
cyclistic_df <- na.omit(cyclistic_df)
+ removing unnecessary columns (ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng)
cyclistic_df <- cyclistic_df %>%
select(-c(ride_id, start_station_id,end_station_id,start_lat,start_lng,end_lat,end_lng))
+ removing all rows where ride_length is less than or equal to 0
cyclistic_df <- cyclistic_df[!(cyclistic_df$ride_length <=0),]
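Since na.omit() drops a row when any column is missing, it can help to see which columns cause the loss before cleaning. The sketch below uses a four-row toy table (the `toy` object and its values are illustrative) and tidyr's drop_na() restricted to the station-name columns; it shows one possible, more targeted variant of the cleaning steps rather than the exact procedure used above.

```r
library(dplyr)
library(tidyr)

# Toy table: one row lacks a start station, one lacks an end station,
# and one has a ride length of zero.
toy <- tibble(
  start_station_name = c("Noble St & Milwaukee Ave", NA,
                         "Rush St & Cedar St", "Rush St & Cedar St"),
  end_station_name   = c("Larrabee St & Division St", "Damen Ave & Cullerton St",
                         NA, "Rush St & Cedar St"),
  ride_length        = c(6.15, 19.67, 7.83, 0)
)

# Missing values per column, to see which columns drive the row loss.
na_per_column <- colSums(is.na(toy))

# Drop rows without station names, then rows with non-positive ride lengths.
cleaned <- toy %>%
  drop_na(start_station_name, end_station_name) %>%
  filter(ride_length > 0)
```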
head(cyclistic_df)
## # A tibble: 6 × 16
## rideable_type started_at ended_at start_station_name
## <chr> <dttm> <dttm> <chr>
## 1 classic_bike 2022-10-14 17:13:30 2022-10-14 17:19:39 Noble St & Milwaukee Ave
## 2 electric_bike 2022-10-01 16:29:26 2022-10-01 16:49:06 Damen Ave & Charleston …
## 3 electric_bike 2022-10-19 18:55:40 2022-10-19 19:03:30 Hoyne Ave & Balmoral Ave
## 4 electric_bike 2022-10-31 07:52:36 2022-10-31 07:58:49 Rush St & Cedar St
## 5 classic_bike 2022-10-13 18:41:03 2022-10-13 19:26:18 900 W Harrison St
## 6 electric_bike 2022-10-13 15:53:27 2022-10-13 15:59:17 900 W Harrison St
## # ℹ 12 more variables: end_station_name <chr>, member_casual <chr>,
## # ride_length <dbl>, date <date>, day_of_week <chr>, month <chr>, day <chr>,
## # year <chr>, time <time>, hour <int>, season <chr>, time_of_day <chr>
colnames(cyclistic_df)
## [1] "rideable_type" "started_at" "ended_at"
## [4] "start_station_name" "end_station_name" "member_casual"
## [7] "ride_length" "date" "day_of_week"
## [10] "month" "day" "year"
## [13] "time" "hour" "season"
## [16] "time_of_day"
str(cyclistic_df)
## tibble [4,290,979 × 16] (S3: tbl_df/tbl/data.frame)
## $ rideable_type : chr [1:4290979] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:4290979], format: "2022-10-14 17:13:30" "2022-10-01 16:29:26" ...
## $ ended_at : POSIXct[1:4290979], format: "2022-10-14 17:19:39" "2022-10-01 16:49:06" ...
## $ start_station_name: chr [1:4290979] "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
## $ end_station_name : chr [1:4290979] "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
## $ member_casual : chr [1:4290979] "member" "casual" "member" "member" ...
## $ ride_length : num [1:4290979] 6.15 19.67 7.83 6.22 45.25 ...
## $ date : Date[1:4290979], format: "2022-10-14" "2022-10-01" ...
## $ day_of_week : chr [1:4290979] "Friday" "Saturday" "Wednesday" "Monday" ...
## $ month : chr [1:4290979] "10" "10" "10" "10" ...
## $ day : chr [1:4290979] "14" "01" "19" "31" ...
## $ year : chr [1:4290979] "2022" "2022" "2022" "2022" ...
## $ time : 'hms' num [1:4290979] 17:13:30 16:29:26 18:55:40 07:52:36 ...
## ..- attr(*, "units")= chr "secs"
## $ hour : int [1:4290979] 17 16 18 7 18 15 15 17 9 12 ...
## $ season : chr [1:4290979] "Fall" "Fall" "Fall" "Fall" ...
## $ time_of_day : chr [1:4290979] "Afternoon" "Afternoon" "Evening" "Morning" ...
## - attr(*, "na.action")= 'omit' Named int [1:1382948] 2848 2849 2850 2851 2852 2854 2856 4359 4360 4361 ...
## ..- attr(*, "names")= chr [1:1382948] "2848" "2849" "2850" "2851" ...
skim_without_charts(cyclistic_df)
Name | cyclistic_df |
Number of rows | 4290979 |
Number of columns | 16 |
_______________________ | |
Column type frequency: | |
character | 10 |
Date | 1 |
difftime | 1 |
numeric | 2 |
POSIXct | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
rideable_type | 0 | 1 | 11 | 13 | 0 | 3 | 0 |
start_station_name | 0 | 1 | 3 | 64 | 0 | 1519 | 0 |
end_station_name | 0 | 1 | 3 | 64 | 0 | 1541 | 0 |
member_casual | 0 | 1 | 6 | 6 | 0 | 2 | 0 |
day_of_week | 0 | 1 | 6 | 9 | 0 | 7 | 0 |
month | 0 | 1 | 2 | 2 | 0 | 12 | 0 |
day | 0 | 1 | 2 | 2 | 0 | 31 | 0 |
year | 0 | 1 | 4 | 4 | 0 | 2 | 0 |
season | 0 | 1 | 4 | 6 | 0 | 4 | 0 |
time_of_day | 0 | 1 | 5 | 9 | 0 | 4 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
date | 0 | 1 | 2022-10-01 | 2023-09-30 | 2023-06-03 | 365 |
Variable type: difftime
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
time | 0 | 1 | 0 secs | 86399 secs | 15:25:20 | 85759 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
ride_length | 0 | 1 | 15.97 | 35.51 | 0.02 | 5.63 | 9.83 | 17.55 | 12136.3 |
hour | 0 | 1 | 14.08 | 4.88 | 0.00 | 11.00 | 15.00 | 18.00 | 23.0 |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
started_at | 0 | 1 | 2022-10-01 00:00:15 | 2023-09-30 23:59:57 | 2023-06-03 16:23:05 | 3753750 |
ended_at | 0 | 1 | 2022-10-01 00:02:52 | 2023-10-01 18:32:53 | 2023-06-03 16:46:01 | 3763883 |
summary(cyclistic_df)
## rideable_type started_at
## Length:4290979 Min. :2022-10-01 00:00:15.00
## Class :character 1st Qu.:2023-02-23 15:26:44.50
## Mode :character Median :2023-06-03 16:23:05.00
## Mean :2023-05-06 20:46:11.23
## 3rd Qu.:2023-08-01 20:44:07.50
## Max. :2023-09-30 23:59:57.00
## ended_at start_station_name end_station_name
## Min. :2022-10-01 00:02:52.00 Length:4290979 Length:4290979
## 1st Qu.:2023-02-23 15:36:42.50 Class :character Class :character
## Median :2023-06-03 16:46:01.00 Mode :character Mode :character
## Mean :2023-05-06 21:02:09.30
## 3rd Qu.:2023-08-01 21:02:45.50
## Max. :2023-10-01 18:32:53.00
## member_casual ride_length date day_of_week
## Length:4290979 Min. : 0.017 Min. :2022-10-01 Length:4290979
## Class :character 1st Qu.: 5.633 1st Qu.:2023-02-23 Class :character
## Mode :character Median : 9.833 Median :2023-06-03 Mode :character
## Mean : 15.968 Mean :2023-05-06
## 3rd Qu.: 17.550 3rd Qu.:2023-08-01
## Max. :12136.300 Max. :2023-09-30
## month day year time
## Length:4290979 Length:4290979 Length:4290979 Length:4290979
## Class :character Class :character Class :character Class1:hms
## Mode :character Mode :character Mode :character Class2:difftime
## Mode :numeric
##
##
## hour season time_of_day
## Min. : 0.00 Length:4290979 Length:4290979
## 1st Qu.:11.00 Class :character Class :character
## Median :15.00 Mode :character Mode :character
## Mean :14.08
## 3rd Qu.:18.00
## Max. :23.00
In this phase, I performed calculations and analyzed the data to uncover trends and differences in how casual riders and members use the bikes.
nrow(cyclistic_df)
## [1] 4290979
cyclistic_df %>%
group_by(member_casual) %>%
count(member_casual)
## # A tibble: 2 × 2
## # Groups: member_casual [2]
## member_casual n
## <chr> <int>
## 1 casual 1548865
## 2 member 2742114
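Raw counts are easier to compare as shares of all rides. The sketch below rebuilds the two-row table from the counts printed above (the `rider_counts` name is illustrative) and adds a percentage column; on the full data the same mutate() could follow count(member_casual) directly.

```r
library(dplyr)

# Counts copied from the output above.
rider_counts <- tibble(
  member_casual = c("casual", "member"),
  n = c(1548865, 2742114)
)

# Share of all rides per rider type, in percent with one decimal.
rider_share <- rider_counts %>%
  mutate(pct = round(100 * n / sum(n), 1))
```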
cyclistic_df %>%
group_by(member_casual) %>%
summarise(average_ride_length = mean(ride_length), median_length = median(ride_length),
max_ride_length = max(ride_length), min_ride_length = min(ride_length))
## # A tibble: 2 × 5
## member_casual average_ride_length median_length max_ride_length
## <chr> <dbl> <dbl> <dbl>
## 1 casual 22.8 12.8 12136.
## 2 member 12.1 8.62 1498.
## # ℹ 1 more variable: min_ride_length <dbl>
cyclistic_df %>%
group_by(rideable_type) %>%
count(rideable_type)
## # A tibble: 3 × 2
## # Groups: rideable_type [3]
## rideable_type n
## <chr> <int>
## 1 classic_bike 2573959
## 2 docked_bike 96186
## 3 electric_bike 1620834
cyclistic_df %>%
group_by(member_casual, rideable_type) %>%
count(rideable_type)
## # A tibble: 5 × 3
## # Groups: member_casual, rideable_type [5]
## member_casual rideable_type n
## <chr> <chr> <int>
## 1 casual classic_bike 834698
## 2 casual docked_bike 96186
## 3 casual electric_bike 617981
## 4 member classic_bike 1739261
## 5 member electric_bike 1002853
cyclistic_df %>%
group_by(day_of_week) %>%
count(day_of_week)
## # A tibble: 7 × 2
## # Groups: day_of_week [7]
## day_of_week n
## <chr> <int>
## 1 Friday 622829
## 2 Monday 555925
## 3 Saturday 677915
## 4 Sunday 558920
## 5 Thursday 643295
## 6 Tuesday 608448
## 7 Wednesday 623647
cyclistic_df %>%
group_by(member_casual) %>%
count(day_of_week)
## # A tibble: 14 × 3
## # Groups: member_casual [2]
## member_casual day_of_week n
## <chr> <chr> <int>
## 1 casual Friday 230947
## 2 casual Monday 176484
## 3 casual Saturday 323663
## 4 casual Sunday 256547
## 5 casual Thursday 201629
## 6 casual Tuesday 176835
## 7 casual Wednesday 182760
## 8 member Friday 391882
## 9 member Monday 379441
## 10 member Saturday 354252
## 11 member Sunday 302373
## 12 member Thursday 441666
## 13 member Tuesday 431613
## 14 member Wednesday 440887
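One way to summarize the table above is the share of rides that fall on a weekend for each rider type. The following sketch re-enters the printed counts by hand (the `day_counts` object is illustrative) and is only meant to show the calculation.

```r
library(dplyr)

# Counts copied from the output above, in the same (alphabetical) day order.
day_counts <- tibble(
  member_casual = rep(c("casual", "member"), each = 7),
  day_of_week = rep(c("Friday", "Monday", "Saturday", "Sunday",
                      "Thursday", "Tuesday", "Wednesday"), times = 2),
  n = c(230947, 176484, 323663, 256547, 201629, 176835, 182760,
        391882, 379441, 354252, 302373, 441666, 431613, 440887)
)

# Fraction of each group's rides that start on Saturday or Sunday.
weekend_share <- day_counts %>%
  mutate(is_weekend = day_of_week %in% c("Saturday", "Sunday")) %>%
  group_by(member_casual) %>%
  summarise(weekend_share = sum(n[is_weekend]) / sum(n))
```

With these counts, casual riders take a noticeably larger share of their rides on weekends than members do.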
cyclistic_df %>%
group_by(time_of_day) %>%
count(time_of_day)
## # A tibble: 4 × 2
## # Groups: time_of_day [4]
## time_of_day n
## <chr> <int>
## 1 Afternoon 1909465
## 2 Evening 935698
## 3 Morning 1142757
## 4 Night 303059
cyclistic_df %>%
group_by(member_casual) %>%
count(time_of_day)
## # A tibble: 8 × 3
## # Groups: member_casual [2]
## member_casual time_of_day n
## <chr> <chr> <int>
## 1 casual Afternoon 730677
## 2 casual Evening 344873
## 3 casual Morning 327329
## 4 casual Night 145986
## 5 member Afternoon 1178788
## 6 member Evening 590825
## 7 member Morning 815428
## 8 member Night 157073
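The per-group counts and the overall totals for each time of day can be held in a single wide table. The sketch below assumes the counts printed above, re-entered by hand in an illustrative `tod_counts` object, and uses tidyr's pivot_wider() to place casual and member counts side by side.

```r
library(dplyr)
library(tidyr)

# Counts copied from the output above.
tod_counts <- tibble(
  member_casual = rep(c("casual", "member"), each = 4),
  time_of_day = rep(c("Afternoon", "Evening", "Morning", "Night"), times = 2),
  n = c(730677, 344873, 327329, 145986,
        1178788, 590825, 815428, 157073)
)

# One row per time of day, one column per rider type, plus the overall total.
tod_wide <- tod_counts %>%
  pivot_wider(names_from = member_casual, values_from = n) %>%
  mutate(total = casual + member)
```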
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Morning") %>%
count(time_of_day)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual time_of_day n
## <chr> <chr> <int>
## 1 casual Morning 327329
## 2 member Morning 815428
cyclistic_df %>%
filter(time_of_day == "Morning") %>%
count(time_of_day)
## # A tibble: 1 × 2
## time_of_day n
## <chr> <int>
## 1 Morning 1142757
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Afternoon") %>%
count(time_of_day)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual time_of_day n
## <chr> <chr> <int>
## 1 casual Afternoon 730677
## 2 member Afternoon 1178788
cyclistic_df %>%
filter(time_of_day == "Afternoon") %>%
count(time_of_day)
## # A tibble: 1 × 2
## time_of_day n
## <chr> <int>
## 1 Afternoon 1909465
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Evening") %>%
count(time_of_day)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual time_of_day n
## <chr> <chr> <int>
## 1 casual Evening 344873
## 2 member Evening 590825
cyclistic_df %>%
filter(time_of_day == "Evening") %>%
count(time_of_day)
## # A tibble: 1 × 2
## time_of_day n
## <chr> <int>
## 1 Evening 935698
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Night") %>%
count(time_of_day)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual time_of_day n
## <chr> <chr> <int>
## 1 casual Night 145986
## 2 member Night 157073
cyclistic_df %>%
filter(time_of_day == "Night") %>%
count(time_of_day)
## # A tibble: 1 × 2
## time_of_day n
## <chr> <int>
## 1 Night 303059
* Hour
cyclistic_df %>%
count(hour) %>%
print(n=24)
## # A tibble: 24 × 2
## hour n
## <int> <int>
## 1 0 49562
## 2 1 30897
## 3 2 17323
## 4 3 9737
## 5 4 9637
## 6 5 34089
## 7 6 105262
## 8 7 191937
## 9 8 240884
## 10 9 175533
## 11 10 177928
## 12 11 217124
## 13 12 250837
## 14 13 253258
## 15 14 259014
## 16 15 306870
## 17 16 390746
## 18 17 448740
## 19 18 359978
## 20 19 256332
## 21 20 178712
## 22 21 140676
## 23 22 111298
## 24 23 74605
cyclistic_df %>%
group_by(member_casual) %>%
count(hour) %>%
print(n=48)
## # A tibble: 48 × 3
## # Groups: member_casual [2]
## member_casual hour n
## <chr> <int> <int>
## 1 casual 0 25891
## 2 casual 1 16711
## 3 casual 2 9571
## 4 casual 3 4909
## 5 casual 4 3675
## 6 casual 5 8092
## 7 casual 6 22403
## 8 casual 7 38680
## 9 casual 8 52619
## 10 casual 9 52650
## 11 casual 10 67124
## 12 casual 11 85761
## 13 casual 12 101149
## 14 casual 13 105444
## 15 casual 14 110026
## 16 casual 15 121802
## 17 casual 16 139888
## 18 casual 17 152368
## 19 casual 18 128314
## 20 casual 19 93939
## 21 casual 20 67223
## 22 casual 21 55397
## 23 casual 22 49429
## 24 casual 23 35800
## 25 member 0 23671
## 26 member 1 14186
## 27 member 2 7752
## 28 member 3 4828
## 29 member 4 5962
## 30 member 5 25997
## 31 member 6 82859
## 32 member 7 153257
## 33 member 8 188265
## 34 member 9 122883
## 35 member 10 110804
## 36 member 11 131363
## 37 member 12 149688
## 38 member 13 147814
## 39 member 14 148988
## 40 member 15 185068
## 41 member 16 250858
## 42 member 17 296372
## 43 member 18 231664
## 44 member 19 162393
## 45 member 20 111489
## 46 member 21 85279
## 47 member 22 61869
## 48 member 23 38805
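To read the hourly table more quickly, one option is to pull out the busiest hour per rider type and compare the morning and evening peaks. This sketch uses a subset of the counts printed above (hours 8, 12, 17 and 23); the `hourly` tibble and the `am_to_pm` ratio are illustrative names, not part of the original analysis.

```r
library(dplyr)

# Hourly counts copied from the table above (subset of hours to keep it short)
hourly <- tibble(
  member_casual = rep(c("casual", "member"), each = 4),
  hour          = rep(c(8L, 12L, 17L, 23L), times = 2),
  rides         = c(52619, 101149, 152368, 35800,
                    188265, 149688, 296372, 38805)
)

# Busiest hour for each rider type
peak_hours <- hourly %>%
  group_by(member_casual) %>%
  slice_max(rides, n = 1) %>%
  ungroup()

# How large the 8 am count is relative to the 5 pm count
commute_ratio <- hourly %>%
  group_by(member_casual) %>%
  summarise(am_to_pm = rides[hour == 8] / rides[hour == 17])

peak_hours
commute_ratio
```

Both groups peak at 17:00 in this subset, but the 8 am count is a much larger fraction of the 5 pm count for members than for casual riders, which is consistent with a commuting pattern among members.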
cyclistic_df %>%
count(month)
## # A tibble: 12 × 2
## month n
## <chr> <int>
## 1 01 148280
## 2 02 149552
## 3 03 200433
## 4 04 324173
## 5 05 463187
## 6 06 534719
## 7 07 573876
## 8 08 584821
## 9 09 506555
## 10 10 414238
## 11 11 255752
## 12 12 135393
cyclistic_df %>%
group_by(member_casual) %>%
count(month) %>%
print(n = 24)
## # A tibble: 24 × 3
## # Groups: member_casual [2]
## member_casual month n
## <chr> <chr> <int>
## 1 casual 01 29618
## 2 casual 02 32774
## 3 casual 03 46786
## 4 casual 04 110526
## 5 casual 05 177025
## 6 casual 06 219778
## 7 casual 07 245254
## 8 casual 08 233819
## 9 casual 09 196938
## 10 casual 10 151312
## 11 casual 11 73533
## 12 casual 12 31502
## 13 member 01 118662
## 14 member 02 116778
## 15 member 03 153647
## 16 member 04 213647
## 17 member 05 286162
## 18 member 06 314941
## 19 member 07 328622
## 20 member 08 351002
## 21 member 09 309617
## 22 member 10 262926
## 23 member 11 182219
## 24 member 12 103891
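Raw counts make it hard to see how the mix of riders changes over the year, so a share per month may be easier to compare. A small sketch, using three months copied from the table above; the `monthly` tibble and the `share` column are names introduced here for illustration.

```r
library(dplyr)

# Counts for January, July and December, copied from the table above
monthly <- tibble(
  month         = rep(c("01", "07", "12"), times = 2),
  member_casual = rep(c("casual", "member"), each = 3),
  rides         = c(29618, 245254, 31502, 118662, 328622, 103891)
)

# Share of each month's rides taken by casual riders
casual_share <- monthly %>%
  group_by(month) %>%
  mutate(share = rides / sum(rides)) %>%
  ungroup() %>%
  filter(member_casual == "casual") %>%
  select(month, share)

casual_share
```

In these three months the casual share is roughly 20% in January, 43% in July and 23% in December, so casual riding is far more seasonal than member riding.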
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Spring") %>%
count(season)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual season n
## <chr> <chr> <int>
## 1 casual Spring 334337
## 2 member Spring 653456
cyclistic_df %>%
filter(season == "Spring") %>%
count(season)
## # A tibble: 1 × 2
## season n
## <chr> <int>
## 1 Spring 987793
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Summer") %>%
count(season)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual season n
## <chr> <chr> <int>
## 1 casual Summer 698851
## 2 member Summer 994565
cyclistic_df %>%
filter(season == "Summer") %>%
count(season)
## # A tibble: 1 × 2
## season n
## <chr> <int>
## 1 Summer 1693416
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Fall") %>%
count(season)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual season n
## <chr> <chr> <int>
## 1 casual Fall 421783
## 2 member Fall 754762
cyclistic_df %>%
filter(season == "Fall") %>%
count(season)
## # A tibble: 1 × 2
## season n
## <chr> <int>
## 1 Fall 1176545
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Winter") %>%
count(season)
## # A tibble: 2 × 3
## # Groups: member_casual [2]
## member_casual season n
## <chr> <chr> <int>
## 1 casual Winter 93894
## 2 member Winter 339331
cyclistic_df %>%
filter(season == "Winter") %>%
count(season)
## # A tibble: 1 × 2
## season n
## <chr> <int>
## 1 Winter 433225
cyclistic_df %>%
group_by(season, member_casual) %>%
count(season)
## # A tibble: 8 × 3
## # Groups: season, member_casual [8]
## season member_casual n
## <chr> <chr> <int>
## 1 Fall casual 421783
## 2 Fall member 754762
## 3 Spring casual 334337
## 4 Spring member 653456
## 5 Summer casual 698851
## 6 Summer member 994565
## 7 Winter casual 93894
## 8 Winter member 339331
cyclistic_df %>%
group_by(season) %>%
count(season)
## # A tibble: 4 × 2
## # Groups: season [4]
## season n
## <chr> <int>
## 1 Fall 1176545
## 2 Spring 987793
## 3 Summer 1693416
## 4 Winter 433225
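As a consistency check, the seasonal totals can be rebuilt from the monthly totals. The month-to-season mapping below is an assumption (the `season` column is created elsewhere in the analysis), but it reproduces the four totals printed above exactly.

```r
library(dplyr)

# Monthly totals copied from the count(month) output
monthly_totals <- tibble(
  month = sprintf("%02d", 1:12),
  n = c(148280, 149552, 200433, 324173, 463187, 534719,
        573876, 584821, 506555, 414238, 255752, 135393)
)

# Assumed mapping: Mar-May spring, Jun-Aug summer, Sep-Nov fall, Dec-Feb winter
season_totals <- monthly_totals %>%
  mutate(season = case_when(
    month %in% c("03", "04", "05") ~ "Spring",
    month %in% c("06", "07", "08") ~ "Summer",
    month %in% c("09", "10", "11") ~ "Fall",
    TRUE                           ~ "Winter"
  )) %>%
  count(season, wt = n, name = "rides")

season_totals
```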
cyclistic_df %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 16.0
cyclistic_df %>%
group_by(member_casual) %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 22.8
## 2 member 12.1
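The overall average of 16.0 is not the midpoint of the two group averages, because members take many more rides than casual riders. A short worked check, using the seasonal counts and the rounded group means printed above (the variable names are illustrative):

```r
# Ride counts per rider type: sum of the four seasonal counts above
rides <- c(casual = 334337 + 698851 + 421783 + 93894,
           member = 653456 + 994565 + 754762 + 339331)

# Group means as printed (rounded to one decimal place)
avg_length <- c(casual = 22.8, member = 12.1)

# Ride-weighted mean of the two group averages
overall <- weighted.mean(avg_length, w = rides)
round(overall, 1)
```

The weighted mean rounds to 16.0, matching the overall figure, whereas the unweighted midpoint of 22.8 and 12.1 would be about 17.5.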
cyclistic_df %>%
group_by(member_casual, rideable_type) %>%
summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 3
## # Groups: member_casual [2]
## member_casual rideable_type average_ride_length
## <chr> <chr> <dbl>
## 1 casual classic_bike 25.4
## 2 casual docked_bike 51.7
## 3 casual electric_bike 14.8
## 4 member classic_bike 13.0
## 5 member electric_bike 10.5
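The `summarise()` message about grouped output appears whenever the data is grouped by more than one column. If the remaining grouping is not needed, passing `.groups = "drop"` should silence the message and return an ungrouped tibble. A minimal sketch on a hypothetical `toy_df` with made-up values:

```r
library(dplyr)

# Hypothetical stand-in for cyclistic_df (illustrative values only)
toy_df <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "electric_bike", "classic_bike", "classic_bike"),
  ride_length   = c(30, 12, 10, 14)
)

# .groups = "drop" removes all grouping from the result
avg_by_type <- toy_df %>%
  group_by(member_casual, rideable_type) %>%
  summarise(average_ride_length = mean(ride_length), .groups = "drop")

avg_by_type
```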
cyclistic_df %>%
group_by(rideable_type) %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 3 × 2
## rideable_type average_ride_length
## <chr> <dbl>
## 1 classic_bike 17.0
## 2 docked_bike 51.7
## 3 electric_bike 12.1
cyclistic_df %>%
group_by(hour, member_casual) %>%
summarise(average_ride_length = mean(ride_length)) %>%
print(n=48)
## `summarise()` has grouped output by 'hour'. You can override using the
## `.groups` argument.
## # A tibble: 48 × 3
## # Groups: hour [24]
## hour member_casual average_ride_length
## <int> <chr> <dbl>
## 1 0 casual 20.6
## 2 0 member 11.5
## 3 1 casual 20.5
## 4 1 member 11.8
## 5 2 casual 20.8
## 6 2 member 12.1
## 7 3 casual 19.6
## 8 3 member 12.6
## 9 4 casual 17.3
## 10 4 member 11.8
## 11 5 casual 14.6
## 12 5 member 9.96
## 13 6 casual 15.7
## 14 6 member 10.4
## 15 7 casual 14.7
## 16 7 member 10.8
## 17 8 casual 16.7
## 18 8 member 11.0
## 19 9 casual 23.0
## 20 9 member 11.2
## 21 10 casual 26.6
## 22 10 member 12.0
## 23 11 casual 27.3
## 24 11 member 12.2
## 25 12 casual 26.7
## 26 12 member 12.1
## 27 13 casual 26.3
## 28 13 member 12.1
## 29 14 casual 26.3
## 30 14 member 12.5
## 31 15 casual 24.8
## 32 15 member 12.4
## 33 16 casual 22.7
## 34 16 member 12.7
## 35 17 casual 21.5
## 36 17 member 12.9
## 37 18 casual 21.4
## 38 18 member 12.9
## 39 19 casual 21.3
## 40 19 member 12.6
## 41 20 casual 20.6
## 42 20 member 12.3
## 43 21 casual 19.9
## 44 21 member 12.0
## 45 22 casual 20.2
## 46 22 member 12.1
## 47 23 casual 20.1
## 48 23 member 12.0
cyclistic_df %>%
group_by(hour) %>%
summarise(average_ride_length = mean(ride_length)) %>%
print(n=24)
## # A tibble: 24 × 2
## hour average_ride_length
## <int> <dbl>
## 1 0 16.3
## 2 1 16.5
## 3 2 16.9
## 4 3 16.2
## 5 4 13.9
## 6 5 11.1
## 7 6 11.6
## 8 7 11.6
## 9 8 12.2
## 10 9 14.7
## 11 10 17.5
## 12 11 18.2
## 13 12 18.0
## 14 13 18.1
## 15 14 18.4
## 16 15 17.3
## 17 16 16.3
## 18 17 15.8
## 19 18 15.9
## 20 19 15.8
## 21 20 15.5
## 22 21 15.1
## 23 22 15.7
## 24 23 15.9
cyclistic_df %>%
group_by(member_casual, day_of_week) %>%
summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 3
## # Groups: member_casual [2]
## member_casual day_of_week average_ride_length
## <chr> <chr> <dbl>
## 1 casual Friday 22.2
## 2 casual Monday 22.4
## 3 casual Saturday 25.8
## 4 casual Sunday 26.2
## 5 casual Thursday 20.0
## 6 casual Tuesday 20.2
## 7 casual Wednesday 19.3
## 8 member Friday 12.1
## 9 member Monday 11.5
## 10 member Saturday 13.6
## 11 member Sunday 13.6
## 12 member Thursday 11.6
## 13 member Tuesday 11.6
## 14 member Wednesday 11.6
cyclistic_df %>%
group_by(day_of_week) %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 7 × 2
## day_of_week average_ride_length
## <chr> <dbl>
## 1 Friday 15.8
## 2 Monday 15.0
## 3 Saturday 19.4
## 4 Sunday 19.4
## 5 Thursday 14.2
## 6 Tuesday 14.1
## 7 Wednesday 13.8
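Because `day_of_week` is stored as a character column, the days print in alphabetical order rather than calendar order. One way to fix the ordering is to convert the column to a factor with explicit levels. The sketch below assumes a Monday-first week and uses the averages printed above; `day_levels` and `daily_avg` are names introduced for illustration.

```r
library(dplyr)

# Assumed calendar order (Monday-first)
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Averages copied from the table above
daily_avg <- tibble(
  day_of_week = c("Friday", "Monday", "Saturday", "Sunday",
                  "Thursday", "Tuesday", "Wednesday"),
  average_ride_length = c(15.8, 15.0, 19.4, 19.4, 14.2, 14.1, 13.8)
)

# arrange() sorts a factor by its level order, not alphabetically
daily_avg_ordered <- daily_avg %>%
  mutate(day_of_week = factor(day_of_week, levels = day_levels)) %>%
  arrange(day_of_week)

daily_avg_ordered
```

With the days in calendar order, the weekend increase in average ride length is easier to spot.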
Time of day
Overall
cyclistic_df %>%
group_by(time_of_day, member_casual) %>%
summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'time_of_day'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 3
## # Groups: time_of_day [4]
## time_of_day member_casual average_ride_length
## <chr> <chr> <dbl>
## 1 Afternoon casual 24.4
## 2 Afternoon member 12.5
## 3 Evening casual 21.0
## 4 Evening member 12.6
## 5 Morning casual 22.2
## 6 Morning member 11.2
## 7 Night casual 20.2
## 8 Night member 12.0
cyclistic_df %>%
group_by(time_of_day) %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 4 × 2
## time_of_day average_ride_length
## <chr> <dbl>
## 1 Afternoon 17.1
## 2 Evening 15.7
## 3 Morning 14.4
## 4 Night 15.9
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Morning") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 22.2
## 2 member 11.2
cyclistic_df %>%
filter(time_of_day == "Morning") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 14.4
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Afternoon") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 24.4
## 2 member 12.5
cyclistic_df %>%
filter(time_of_day == "Afternoon") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 17.1
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Evening") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 21.0
## 2 member 12.6
cyclistic_df %>%
filter(time_of_day == "Evening") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 15.7
cyclistic_df %>%
group_by(member_casual) %>%
filter(time_of_day == "Night") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 20.2
## 2 member 12.0
cyclistic_df %>%
filter(time_of_day == "Night") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 15.9
cyclistic_df %>%
group_by(month, member_casual) %>%
summarise(average_ride_length = mean(ride_length)) %>%
print(n = 24) # print all 24 rows instead of the default first 10
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
## # A tibble: 24 × 3
## # Groups: month [12]
## month member_casual average_ride_length
## <chr> <chr> <dbl>
## 1 01 casual 14.9
## 2 01 member 10.0
## 3 02 casual 17.7
## 4 02 member 10.4
## 5 03 casual 16.7
## 6 03 member 10.2
## 7 04 casual 22.6
## 8 04 member 11.6
## 9 05 casual 24.5
## 10 05 member 12.7
## 11 06 casual 24.1
## 12 06 member 12.9
## 13 07 casual 25.2
## 14 07 member 13.4
## 15 08 casual 24.4
## 16 08 member 13.3
## 17 09 casual 23.5
## 18 09 member 12.7
## 19 10 casual 20.5
## 20 10 member 11.7
## 21 11 casual 17.2
## 22 11 member 10.8
## 23 12 casual 14.8
## 24 12 member 10.2
cyclistic_df %>%
group_by(month) %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 12 × 2
## month average_ride_length
## <chr> <dbl>
## 1 01 11.0
## 2 02 12.0
## 3 03 11.7
## 4 04 15.3
## 5 05 17.2
## 6 06 17.5
## 7 07 18.4
## 8 08 17.7
## 9 09 16.9
## 10 10 14.9
## 11 11 12.7
## 12 12 11.3
Season
Overall
cyclistic_df %>%
group_by(season, member_casual) %>%
summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'season'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 3
## # Groups: season [4]
## season member_casual average_ride_length
## <chr> <chr> <dbl>
## 1 Fall casual 21.3
## 2 Fall member 11.9
## 3 Spring casual 22.8
## 4 Spring member 11.7
## 5 Summer casual 24.6
## 6 Summer member 13.2
## 7 Winter casual 15.8
## 8 Winter member 10.2
cyclistic_df %>%
group_by(season) %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 4 × 2
## season average_ride_length
## <chr> <dbl>
## 1 Fall 15.3
## 2 Spring 15.5
## 3 Summer 17.9
## 4 Winter 11.4
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Spring") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 22.8
## 2 member 11.7
cyclistic_df %>%
filter(season == "Spring") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 15.5
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Summer") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 24.6
## 2 member 13.2
cyclistic_df %>%
filter(season == "Summer") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 17.9
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Fall") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 21.3
## 2 member 11.9
cyclistic_df %>%
filter(season == "Fall") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 15.3
cyclistic_df %>%
group_by(member_casual) %>%
filter(season == "Winter") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 15.8
## 2 member 10.2
cyclistic_df %>%
filter(season == "Winter") %>%
summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
## average_ride_length
## <dbl>
## 1 11.4
Based on my analysis, I provided three data-driven recommendations to the stakeholders (Cyclistic executives and the marketing team). The main business goal is to convert casual riders into annual members.
Create special weekend promotions or limited-time offers that encourage casual riders to try membership benefits.
Promote the value of classic bikes through membership benefits.