Introduction

The Cyclistic Bike Share Case Study is part of the Google Data Analytics course that I completed. In this case study, I worked as a junior data analyst on the marketing analytics team of a fictional company, Cyclistic, based in Chicago. I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Scenario

The director of marketing believes that the company’s future success depends on maximizing the number of annual memberships. My team wants to understand how casual riders and annual members use Cyclistic bikes differently.

About the company

Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. Cyclistic was launched in 2016 and over the years has grown to a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.

Cyclistic’s marketing strategy relied on building general awareness and appealing to broad customer segments by offering flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Ask

Business question:

How do annual members and casual riders use Cyclistic bikes differently?

Business task

Maximize the number of annual memberships, which the director of marketing sees as the key to future growth.

Stakeholders

Key stakeholders:

  • Lily Moreno - the director of marketing who is responsible for the development of campaigns and initiatives to promote the bike-share program;
  • Cyclistic marketing analytics team - a team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy;
  • Cyclistic executive team - the notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Prepare

Data source

I used Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. This is public data, but data-privacy rules prohibit using riders’ personally identifiable information. This means that I can’t connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or whether they have purchased multiple single passes.

Data organization

I chose the most recent year of Cyclistic trip data (October 2022 to September 2023) and downloaded each file from divvy-data as a ZIP archive. I then unzipped the archives and saved the data as 12 CSV files, one for each month. I created a folder on my desktop with two subfolders, one for CSV files and another for XLS files, and used consistent file-naming conventions for all of them. Next, I moved the downloaded files to the appropriate subfolder. This way, I keep a copy of the original data.

This is structured data, organized in rows and columns. Each record represents one trip, and each trip has a unique field that identifies it: ride_id.

ROCCC

I used the ROCCC criteria (Reliable, Original, Comprehensive, Current, Cited) to assess the bias and credibility of the data source.

  • Reliable and original: This is public data that contains accurate, complete, and unbiased information on Cyclistic’s historical bike trips.

  • Comprehensive and current: This data source contains all the data needed to understand the different ways members and casual riders use Cyclistic bikes. The data covers the most recent 12 months (October 2022 to September 2023).

  • Cited: This source is publicly available data provided by Cyclistic and the City of Chicago.

Process

To answer the key business question, I carried out the data analysis using R and Tableau. Initially, I wanted to use Microsoft Excel or BigQuery to clean, manipulate, and analyze the data, but because of memory and processor limitations, I decided to clean, verify, and transform the data in R.

Data cleaning and manipulation

R

  1. Install and load necessary libraries and packages.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
## Warning: package 'janitor' was built under R version 4.3.2
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.2
library(lubridate)
library(readr)
library(knitr)
library(hms)
## Warning: package 'hms' was built under R version 4.3.2
## 
## Attaching package: 'hms'
## 
## The following object is masked from 'package:lubridate':
## 
##     hms
  1. Load the CSV file for each month into R.
oct22_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202210-divvy-tripdata.csv")
## Rows: 558685 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nov22_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202211-divvy-tripdata.csv")
## Rows: 337735 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dec22_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202212-divvy-tripdata.csv")
## Rows: 181806 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jan23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
feb23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mar23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
apr23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
may23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jun23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jul23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202307-divvy-tripdata.csv")
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
aug23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202308-divvy-tripdata.csv")
## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sep23_df <- read_csv("C:/Users/Asia/Documents/Project_cyclist/divvy-tripdata/202309-divvy-tripdata.csv")
## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. Combine the 12 monthly data frames into one data frame called “tripdata_df”.
tripdata_df <- bind_rows(oct22_df, nov22_df, dec22_df, jan23_df, feb23_df, mar23_df, apr23_df, may23_df, jun23_df, jul23_df, aug23_df, sep23_df)
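The twelve read_csv() calls and the bind_rows() step could also be written as one loop over the file names. The following is only a minimal sketch: the temporary folder and the two tiny stand-in files are hypothetical and exist only so the example runs on its own. With the real data, data_dir would point to the folder that holds the monthly CSV files.

```r
library(readr)
library(dplyr)

# Hypothetical folder: a temporary directory stands in for the real data folder.
data_dir <- file.path(tempdir(), "divvy-tripdata-demo")
dir.create(data_dir, showWarnings = FALSE)

# Two tiny stand-in files that mimic the monthly file-naming pattern.
write_csv(
  tibble(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
  file.path(data_dir, "202210-divvy-tripdata.csv")
)
write_csv(
  tibble(ride_id = "B1", member_casual = "member"),
  file.path(data_dir, "202211-divvy-tripdata.csv")
)

# List every monthly file (sorted by name), read each one, and stack the results.
csv_files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
demo_df <- bind_rows(lapply(csv_files, read_csv, show_col_types = FALSE))

nrow(demo_df)
```

Because the file names start with the year and month, the alphabetical order returned by list.files() is also the chronological order.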
  1. Check the properties of the newly created data frame.
head(tripdata_df)
## # A tibble: 6 × 13
##   ride_id          rideable_type started_at          ended_at           
##   <chr>            <chr>         <dttm>              <dttm>             
## 1 A50255C1E17942AB classic_bike  2022-10-14 17:13:30 2022-10-14 17:19:39
## 2 DB692A70BD2DD4E3 electric_bike 2022-10-01 16:29:26 2022-10-01 16:49:06
## 3 3C02727AAF60F873 electric_bike 2022-10-19 18:55:40 2022-10-19 19:03:30
## 4 47E653FDC2D99236 electric_bike 2022-10-31 07:52:36 2022-10-31 07:58:49
## 5 8B5407BE535159BF classic_bike  2022-10-13 18:41:03 2022-10-13 19:26:18
## 6 A177C92E9F021B99 electric_bike 2022-10-13 15:53:27 2022-10-13 15:59:17
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
colnames(tripdata_df)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(tripdata_df)
## spc_tbl_ [5,674,399 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5674399] "A50255C1E17942AB" "DB692A70BD2DD4E3" "3C02727AAF60F873" "47E653FDC2D99236" ...
##  $ rideable_type     : chr [1:5674399] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:5674399], format: "2022-10-14 17:13:30" "2022-10-01 16:29:26" ...
##  $ ended_at          : POSIXct[1:5674399], format: "2022-10-14 17:19:39" "2022-10-01 16:49:06" ...
##  $ start_station_name: chr [1:5674399] "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
##  $ start_station_id  : chr [1:5674399] "13290" "13288" "655" "KA1504000133" ...
##  $ end_station_name  : chr [1:5674399] "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
##  $ end_station_id    : chr [1:5674399] "KA1504000079" "13089" "TA1307000140" "620" ...
##  $ start_lat         : num [1:5674399] 41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num [1:5674399] -87.7 -87.7 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num [1:5674399] 41.9 41.9 42 41.9 41.9 ...
##  $ end_lng           : num [1:5674399] -87.6 -87.7 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr [1:5674399] "member" "casual" "member" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
skim_without_charts(tripdata_df)
Data summary
Name tripdata_df
Number of rows 5674399
Number of columns 13
_______________________
Column type frequency:
character 7
numeric 4
POSIXct 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ride_id 0 1.00 16 16 0 5674399 0
rideable_type 0 1.00 11 13 0 3 0
start_station_name 873186 0.85 3 64 0 1577 0
start_station_id 873318 0.85 3 36 0 1485 0
end_station_name 926160 0.84 3 64 0 1586 0
end_station_id 926301 0.84 3 35 0 1491 0
member_casual 0 1.00 6 6 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
start_lat 0 1 41.90 0.05 41.63 41.88 41.90 41.93 42.07
start_lng 0 1 -87.65 0.03 -87.94 -87.66 -87.64 -87.63 -87.46
end_lat 6642 1 41.90 0.07 0.00 41.88 41.90 41.93 42.18
end_lng 6642 1 -87.65 0.13 -88.16 -87.66 -87.64 -87.63 0.00

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
started_at 0 1 2022-10-01 00:00:15 2023-09-30 23:59:57 2023-06-04 06:55:17 4783277
ended_at 0 1 2022-10-01 00:01:05 2023-10-10 04:56:16 2023-06-04 07:26:26 4794256
summary(tripdata_df)
##    ride_id          rideable_type        started_at                    
##  Length:5674399     Length:5674399     Min.   :2022-10-01 00:00:15.00  
##  Class :character   Class :character   1st Qu.:2023-02-23 13:11:26.50  
##  Mode  :character   Mode  :character   Median :2023-06-04 06:55:17.00  
##                                        Mean   :2023-05-06 18:58:12.32  
##                                        3rd Qu.:2023-08-01 18:28:38.50  
##                                        Max.   :2023-09-30 23:59:57.00  
##                                                                        
##     ended_at                      start_station_name start_station_id  
##  Min.   :2022-10-01 00:01:05.00   Length:5674399     Length:5674399    
##  1st Qu.:2023-02-23 13:25:44.50   Class :character   Class :character  
##  Median :2023-06-04 07:26:26.00   Mode  :character   Mode  :character  
##  Mean   :2023-05-06 19:16:37.35                                        
##  3rd Qu.:2023-08-01 18:45:51.00                                        
##  Max.   :2023-10-10 04:56:16.00                                        
##                                                                        
##  end_station_name   end_station_id       start_lat       start_lng     
##  Length:5674399     Length:5674399     Min.   :41.63   Min.   :-87.94  
##  Class :character   Class :character   1st Qu.:41.88   1st Qu.:-87.66  
##  Mode  :character   Mode  :character   Median :41.90   Median :-87.64  
##                                        Mean   :41.90   Mean   :-87.65  
##                                        3rd Qu.:41.93   3rd Qu.:-87.63  
##                                        Max.   :42.07   Max.   :-87.46  
##                                                                        
##     end_lat         end_lng       member_casual     
##  Min.   : 0.00   Min.   :-88.16   Length:5674399    
##  1st Qu.:41.88   1st Qu.:-87.66   Class :character  
##  Median :41.90   Median :-87.64   Mode  :character  
##  Mean   :41.90   Mean   :-87.65                     
##  3rd Qu.:41.93   3rd Qu.:-87.63                     
##  Max.   :42.18   Max.   :  0.00                     
##  NA's   :6642    NA's   :6642
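The summaries above report missing station names and missing end coordinates. A quick per-column tally of missing values can be sketched in base R as follows. The four-row data frame is a hypothetical stand-in for tripdata_df, and its values are made up for illustration.

```r
# Hypothetical four-row stand-in for tripdata_df (values are made up).
demo_df <- data.frame(
  start_station_name = c("Station X", NA, "Station Y", NA),
  end_station_name   = c("Station Z", "Station Z", NA, NA),
  end_lat            = c(41.9, 41.9, NA, 41.9)
)

# Number and share of missing values in each column.
na_count <- colSums(is.na(demo_df))
na_share <- round(colMeans(is.na(demo_df)), 2)

data.frame(column = names(na_count), n_missing = na_count, share = na_share, row.names = NULL)
```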
  1. Create a new data frame “cyclistic_df” and add new columns.
cyclistic_df <- tripdata_df
+ ride_length - the total length of each trip, calculated by subtracting the "started_at" column from the "ended_at" column and converted to a numeric value in minutes;
cyclistic_df$ride_length <- difftime(tripdata_df$ended_at, tripdata_df$started_at, units = "mins")
cyclistic_df <- cyclistic_df %>% 
  mutate(ride_length = as.numeric(ride_length))
is.numeric(cyclistic_df$ride_length) 
## [1] TRUE
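+ date and day_of_week - the date is stored in the default yyyy-mm-dd format, and the day name is derived from it with Monday as the first day of the week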
cyclistic_df$date <- as.Date(cyclistic_df$started_at) 
cyclistic_df$day_of_week_num <- wday(cyclistic_df$date, week_start = 1)
days_en <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
cyclistic_df$day_of_week <- days_en[cyclistic_df$day_of_week_num]
cyclistic_df$day_of_week_num <- NULL
+ month
cyclistic_df$month <- format(as.Date(cyclistic_df$date), "%m")
+ day
cyclistic_df$day <- format(as.Date(cyclistic_df$date), "%d") 
+ year
cyclistic_df$year <- format(as.Date(cyclistic_df$date), "%Y") 
+ time - formatted as HH:MM:SS
# as_hms() keeps only the time-of-day part of the started_at timestamp
cyclistic_df$time <- as_hms(cyclistic_df$started_at)
+ hour
cyclistic_df$hour <- hour(cyclistic_df$time)
+ season
cyclistic_df <- cyclistic_df %>%
  mutate(season = case_when(
    month %in% c("03", "04", "05") ~ "Spring",
    month %in% c("06", "07", "08") ~ "Summer",
    month %in% c("09", "10", "11") ~ "Fall",
    month %in% c("12", "01", "02") ~ "Winter"
  ))
+ time_of_day
# hour is an integer column, so it is compared with integer ranges
cyclistic_df <- cyclistic_df %>%
  mutate(time_of_day = case_when(
    hour %in% c(22, 23, 0:4) ~ "Night",
    hour %in% 5:11           ~ "Morning",
    hour %in% 12:17          ~ "Afternoon",
    hour %in% 18:21          ~ "Evening"
  ))
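As a quick sanity check of the derived columns, the same steps can be run on three trips copied from the head() output of tripdata_df. This is only a sketch on a hypothetical three-row data frame (demo_df), not a replacement for the code above. The lookup vector days_en keeps the day names in English regardless of the system locale.

```r
library(dplyr)
library(lubridate)

# Three trips copied from the head() output of tripdata_df.
demo_df <- tibble(
  started_at = ymd_hms(c("2022-10-14 17:13:30", "2022-10-01 16:29:26", "2022-10-31 07:52:36")),
  ended_at   = ymd_hms(c("2022-10-14 17:19:39", "2022-10-01 16:49:06", "2022-10-31 07:58:49"))
)

# English day names, indexed with Monday = 1.
days_en <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

demo_df <- demo_df %>%
  mutate(
    # Trip duration in minutes as a plain number.
    ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
    # Calendar date of the trip start.
    date        = as.Date(started_at),
    # Day name looked up by weekday number (week starts on Monday).
    day_of_week = days_en[wday(date, week_start = 1)],
    # Hour of the day (0-23) in which the trip started.
    hour        = hour(started_at)
  )

demo_df %>% select(ride_length, day_of_week, hour)
```

The resulting values can be compared with the ride_length, day_of_week, and hour columns shown in the str() output of the final data frame.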
  1. Cleaning process.

    • removing duplicate rows
cyclistic_df <- distinct(cyclistic_df) 
+ removing rows with NA values in any column (mostly trips with a missing start or end station name)
cyclistic_df <- na.omit(cyclistic_df)
+ removing unnecessary columns (ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng)
cyclistic_df <- cyclistic_df %>%  
  select(-c(ride_id, start_station_id,end_station_id,start_lat,start_lng,end_lat,end_lng)) 
+ removing all rows where ride_length is less than or equal to 0
cyclistic_df <- cyclistic_df[!(cyclistic_df$ride_length <=0),]
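To see how many rows each cleaning step removes, the row count can be recorded after every step. The sketch below uses a hypothetical five-row data frame with made-up station names and ride lengths. It contains one duplicate row, one row with a missing station name, and one row with a negative ride length.

```r
library(dplyr)

# Hypothetical data: values are made up to trigger each cleaning rule once.
demo_df <- tibble(
  ride_id            = c("A1", "A1", "B2", "C3", "D4"),
  start_station_name = c("Station X", "Station X", NA, "Station Y", "Station Z"),
  ride_length        = c(6.15, 6.15, 12.0, -1.5, 20.0)
)

step_1 <- distinct(demo_df)                # drops the repeated A1 row
step_2 <- na.omit(step_1)                  # drops B2 (missing station name)
step_3 <- filter(step_2, ride_length > 0)  # drops C3 (non-positive ride length)

# Rows remaining after each step.
c(raw = nrow(demo_df), deduplicated = nrow(step_1),
  complete = nrow(step_2), positive = nrow(step_3))
```

Here filter() is used in place of the bracket subsetting above; for a column without NA values, both keep the same rows.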
  1. Check the properties of the completed “cyclistic_df” data frame.
head(cyclistic_df)
## # A tibble: 6 × 16
##   rideable_type started_at          ended_at            start_station_name      
##   <chr>         <dttm>              <dttm>              <chr>                   
## 1 classic_bike  2022-10-14 17:13:30 2022-10-14 17:19:39 Noble St & Milwaukee Ave
## 2 electric_bike 2022-10-01 16:29:26 2022-10-01 16:49:06 Damen Ave & Charleston …
## 3 electric_bike 2022-10-19 18:55:40 2022-10-19 19:03:30 Hoyne Ave & Balmoral Ave
## 4 electric_bike 2022-10-31 07:52:36 2022-10-31 07:58:49 Rush St & Cedar St      
## 5 classic_bike  2022-10-13 18:41:03 2022-10-13 19:26:18 900 W Harrison St       
## 6 electric_bike 2022-10-13 15:53:27 2022-10-13 15:59:17 900 W Harrison St       
## # ℹ 12 more variables: end_station_name <chr>, member_casual <chr>,
## #   ride_length <dbl>, date <date>, day_of_week <chr>, month <chr>, day <chr>,
## #   year <chr>, time <time>, hour <int>, season <chr>, time_of_day <chr>
colnames(cyclistic_df)
##  [1] "rideable_type"      "started_at"         "ended_at"          
##  [4] "start_station_name" "end_station_name"   "member_casual"     
##  [7] "ride_length"        "date"               "day_of_week"       
## [10] "month"              "day"                "year"              
## [13] "time"               "hour"               "season"            
## [16] "time_of_day"
str(cyclistic_df)
## tibble [4,290,979 × 16] (S3: tbl_df/tbl/data.frame)
##  $ rideable_type     : chr [1:4290979] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:4290979], format: "2022-10-14 17:13:30" "2022-10-01 16:29:26" ...
##  $ ended_at          : POSIXct[1:4290979], format: "2022-10-14 17:19:39" "2022-10-01 16:49:06" ...
##  $ start_station_name: chr [1:4290979] "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
##  $ end_station_name  : chr [1:4290979] "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
##  $ member_casual     : chr [1:4290979] "member" "casual" "member" "member" ...
##  $ ride_length       : num [1:4290979] 6.15 19.67 7.83 6.22 45.25 ...
##  $ date              : Date[1:4290979], format: "2022-10-14" "2022-10-01" ...
##  $ day_of_week       : chr [1:4290979] "Friday" "Saturday" "Wednesday" "Monday" ...
##  $ month             : chr [1:4290979] "10" "10" "10" "10" ...
##  $ day               : chr [1:4290979] "14" "01" "19" "31" ...
##  $ year              : chr [1:4290979] "2022" "2022" "2022" "2022" ...
##  $ time              : 'hms' num [1:4290979] 17:13:30 16:29:26 18:55:40 07:52:36 ...
##   ..- attr(*, "units")= chr "secs"
##  $ hour              : int [1:4290979] 17 16 18 7 18 15 15 17 9 12 ...
##  $ season            : chr [1:4290979] "Fall" "Fall" "Fall" "Fall" ...
##  $ time_of_day       : chr [1:4290979] "Afternoon" "Afternoon" "Evening" "Morning" ...
##  - attr(*, "na.action")= 'omit' Named int [1:1382948] 2848 2849 2850 2851 2852 2854 2856 4359 4360 4361 ...
##   ..- attr(*, "names")= chr [1:1382948] "2848" "2849" "2850" "2851" ...
skim_without_charts(cyclistic_df)
Data summary
Name cyclistic_df
Number of rows 4290979
Number of columns 16
_______________________
Column type frequency:
character 10
Date 1
difftime 1
numeric 2
POSIXct 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
rideable_type 0 1 11 13 0 3 0
start_station_name 0 1 3 64 0 1519 0
end_station_name 0 1 3 64 0 1541 0
member_casual 0 1 6 6 0 2 0
day_of_week 0 1 6 9 0 7 0
month 0 1 2 2 0 12 0
day 0 1 2 2 0 31 0
year 0 1 4 4 0 2 0
season 0 1 4 6 0 4 0
time_of_day 0 1 5 9 0 4 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2022-10-01 2023-09-30 2023-06-03 365

Variable type: difftime

skim_variable n_missing complete_rate min max median n_unique
time 0 1 0 secs 86399 secs 15:25:20 85759

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
ride_length 0 1 15.97 35.51 0.02 5.63 9.83 17.55 12136.3
hour 0 1 14.08 4.88 0.00 11.00 15.00 18.00 23.0

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
started_at 0 1 2022-10-01 00:00:15 2023-09-30 23:59:57 2023-06-03 16:23:05 3753750
ended_at 0 1 2022-10-01 00:02:52 2023-10-01 18:32:53 2023-06-03 16:46:01 3763883
summary(cyclistic_df)
##  rideable_type        started_at                    
##  Length:4290979     Min.   :2022-10-01 00:00:15.00  
##  Class :character   1st Qu.:2023-02-23 15:26:44.50  
##  Mode  :character   Median :2023-06-03 16:23:05.00  
##                     Mean   :2023-05-06 20:46:11.23  
##                     3rd Qu.:2023-08-01 20:44:07.50  
##                     Max.   :2023-09-30 23:59:57.00  
##     ended_at                      start_station_name end_station_name  
##  Min.   :2022-10-01 00:02:52.00   Length:4290979     Length:4290979    
##  1st Qu.:2023-02-23 15:36:42.50   Class :character   Class :character  
##  Median :2023-06-03 16:46:01.00   Mode  :character   Mode  :character  
##  Mean   :2023-05-06 21:02:09.30                                        
##  3rd Qu.:2023-08-01 21:02:45.50                                        
##  Max.   :2023-10-01 18:32:53.00                                        
##  member_casual       ride_length             date            day_of_week       
##  Length:4290979     Min.   :    0.017   Min.   :2022-10-01   Length:4290979    
##  Class :character   1st Qu.:    5.633   1st Qu.:2023-02-23   Class :character  
##  Mode  :character   Median :    9.833   Median :2023-06-03   Mode  :character  
##                     Mean   :   15.968   Mean   :2023-05-06                     
##                     3rd Qu.:   17.550   3rd Qu.:2023-08-01                     
##                     Max.   :12136.300   Max.   :2023-09-30                     
##     month               day                year               time         
##  Length:4290979     Length:4290979     Length:4290979     Length:4290979   
##  Class :character   Class :character   Class :character   Class1:hms       
##  Mode  :character   Mode  :character   Mode  :character   Class2:difftime  
##                                                           Mode  :numeric   
##                                                                            
##                                                                            
##       hour          season          time_of_day       
##  Min.   : 0.00   Length:4290979     Length:4290979    
##  1st Qu.:11.00   Class :character   Class :character  
##  Median :15.00   Mode  :character   Mode  :character  
##  Mean   :14.08                                        
##  3rd Qu.:18.00                                        
##  Max.   :23.00

Analyze

In this phase, I performed calculations and analyzed the data to uncover trends and differences in how casual riders and members use the bikes.

Calculations by member type

  • Total rides
nrow(cyclistic_df)
## [1] 4290979
  • Member type
cyclistic_df %>% 
  group_by(member_casual) %>% 
  count(member_casual)
## # A tibble: 2 × 2
## # Groups:   member_casual [2]
##   member_casual       n
##   <chr>           <int>
## 1 casual        1548865
## 2 member        2742114
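The two counts are easier to compare as shares of all rides. A minimal sketch in base R, using the counts copied from the output above:

```r
# Ride counts copied from the count() output above.
rides <- c(casual = 1548865, member = 2742114)

# Share of all rides made by each rider type, in percent.
ride_share <- round(100 * rides / sum(rides), 1)
ride_share
```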
  • Average ride length
cyclistic_df %>%
  group_by(member_casual) %>% 
  summarise(average_ride_length = mean(ride_length), median_length = median(ride_length), 
            max_ride_length = max(ride_length), min_ride_length = min(ride_length))
## # A tibble: 2 × 5
##   member_casual average_ride_length median_length max_ride_length
##   <chr>                       <dbl>         <dbl>           <dbl>
## 1 casual                       22.8         12.8           12136.
## 2 member                       12.1          8.62           1498.
## # ℹ 1 more variable: min_ride_length <dbl>
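The printout above hides `min_ride_length` ("1 more variable") because the tibble is wider than the console. A minimal sketch of one way to show every column, using a small stand-in tibble: `toy_df` and its four `ride_length` values are made up for illustration and are not Cyclistic data.

```r
library(dplyr)

# Hypothetical stand-in for cyclistic_df; the values are illustrative only.
toy_df <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  ride_length   = c(10, 30, 5, 15)
)

ride_summary <- toy_df %>%
  group_by(member_casual) %>%
  summarise(
    average_ride_length = mean(ride_length),
    median_length       = median(ride_length),
    max_ride_length     = max(ride_length),
    min_ride_length     = min(ride_length)
  )

# width = Inf asks the tibble print method to show all columns,
# so min_ride_length is no longer truncated.
print(ride_summary, width = Inf)
```

The same `print(width = Inf)` call could be appended to the original pipeline on `cyclistic_df`.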
  • Type of bike
cyclistic_df %>%
  group_by(rideable_type) %>% 
  count(rideable_type)
## # A tibble: 3 × 2
## # Groups:   rideable_type [3]
##   rideable_type       n
##   <chr>           <int>
## 1 classic_bike  2573959
## 2 docked_bike     96186
## 3 electric_bike 1620834
cyclistic_df %>% 
  group_by(member_casual, rideable_type) %>% 
  count(rideable_type)
## # A tibble: 5 × 3
## # Groups:   member_casual, rideable_type [5]
##   member_casual rideable_type       n
##   <chr>         <chr>           <int>
## 1 casual        classic_bike   834698
## 2 casual        docked_bike     96186
## 3 casual        electric_bike  617981
## 4 member        classic_bike  1739261
## 5 member        electric_bike 1002853
  • Day of the week
cyclistic_df %>%
  group_by(day_of_week) %>% 
  count(day_of_week)
## # A tibble: 7 × 2
## # Groups:   day_of_week [7]
##   day_of_week      n
##   <chr>        <int>
## 1 Friday      622829
## 2 Monday      555925
## 3 Saturday    677915
## 4 Sunday      558920
## 5 Thursday    643295
## 6 Tuesday     608448
## 7 Wednesday   623647
cyclistic_df %>% 
  group_by(member_casual) %>% 
  count(day_of_week)
## # A tibble: 14 × 3
## # Groups:   member_casual [2]
##    member_casual day_of_week      n
##    <chr>         <chr>        <int>
##  1 casual        Friday      230947
##  2 casual        Monday      176484
##  3 casual        Saturday    323663
##  4 casual        Sunday      256547
##  5 casual        Thursday    201629
##  6 casual        Tuesday     176835
##  7 casual        Wednesday   182760
##  8 member        Friday      391882
##  9 member        Monday      379441
## 10 member        Saturday    354252
## 11 member        Sunday      302373
## 12 member        Thursday    441666
## 13 member        Tuesday     431613
## 14 member        Wednesday   440887
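Because `day_of_week` is stored as a character column, the tables above are sorted alphabetically (Friday first) rather than in calendar order. The sketch below shows one possible way to get calendar order with a factor. It reuses the casual-rider counts printed above. The Monday-first ordering and the names `casual_by_day` and `weekday_levels` are assumptions made for this example.

```r
library(dplyr)

# Casual-rider counts copied from the output above.
casual_by_day <- tibble(
  day_of_week = c("Friday", "Monday", "Saturday", "Sunday",
                  "Thursday", "Tuesday", "Wednesday"),
  n = c(230947, 176484, 323663, 256547, 201629, 176835, 182760)
)

# Assumed calendar order (Monday first); reorder if weeks should start on Sunday.
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")

# Converting to a factor makes arrange() sort by level order, not alphabetically.
casual_by_day_ordered <- casual_by_day %>%
  mutate(day_of_week = factor(day_of_week, levels = weekday_levels)) %>%
  arrange(day_of_week)

print(casual_by_day_ordered)
```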
  • Time of day
cyclistic_df %>%
  group_by(time_of_day) %>% 
  count(time_of_day)
## # A tibble: 4 × 2
## # Groups:   time_of_day [4]
##   time_of_day       n
##   <chr>         <int>
## 1 Afternoon   1909465
## 2 Evening      935698
## 3 Morning     1142757
## 4 Night        303059
cyclistic_df %>% 
  group_by(member_casual) %>% 
  count(time_of_day)
## # A tibble: 8 × 3
## # Groups:   member_casual [2]
##   member_casual time_of_day       n
##   <chr>         <chr>         <int>
## 1 casual        Afternoon    730677
## 2 casual        Evening      344873
## 3 casual        Morning      327329
## 4 casual        Night        145986
## 5 member        Afternoon   1178788
## 6 member        Evening      590825
## 7 member        Morning      815428
## 8 member        Night        157073
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Morning") %>% 
  count(time_of_day)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual time_of_day      n
##   <chr>         <chr>        <int>
## 1 casual        Morning     327329
## 2 member        Morning     815428
cyclistic_df %>% 
  filter(time_of_day == "Morning") %>% 
  count(time_of_day)
## # A tibble: 1 × 2
##   time_of_day       n
##   <chr>         <int>
## 1 Morning     1142757
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Afternoon") %>% 
  count(time_of_day)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual time_of_day       n
##   <chr>         <chr>         <int>
## 1 casual        Afternoon    730677
## 2 member        Afternoon   1178788
cyclistic_df %>% 
  filter(time_of_day == "Afternoon") %>% 
  count(time_of_day)
## # A tibble: 1 × 2
##   time_of_day       n
##   <chr>         <int>
## 1 Afternoon   1909465
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Evening") %>% 
  count(time_of_day)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual time_of_day      n
##   <chr>         <chr>        <int>
## 1 casual        Evening     344873
## 2 member        Evening     590825
cyclistic_df %>% 
  filter(time_of_day == "Evening") %>% 
  count(time_of_day)
## # A tibble: 1 × 2
##   time_of_day      n
##   <chr>        <int>
## 1 Evening     935698
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Night") %>% 
  count(time_of_day)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual time_of_day      n
##   <chr>         <chr>        <int>
## 1 casual        Night       145986
## 2 member        Night       157073
cyclistic_df %>% 
  filter(time_of_day == "Night") %>% 
  count(time_of_day)
## # A tibble: 1 × 2
##   time_of_day      n
##   <chr>        <int>
## 1 Night       303059
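The per-period filters above repeat rows that already appear in the combined table. As a rough sketch, the same counts can be turned into within-group shares in a single pipeline, which makes the two rider types easier to compare. The counts are copied from the output above. The names `rides_by_time` and `share_pct` are introduced here for illustration only.

```r
library(dplyr)

# Counts copied from the member_casual x time_of_day output above.
rides_by_time <- tibble(
  member_casual = rep(c("casual", "member"), each = 4),
  time_of_day   = rep(c("Afternoon", "Evening", "Morning", "Night"), times = 2),
  n = c(730677, 344873, 327329, 145986,
        1178788, 590825, 815428, 157073)
)

# Share of each rider type's rides that falls in each time of day.
time_shares <- rides_by_time %>%
  group_by(member_casual) %>%
  mutate(share_pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

print(time_shares)
```

With the full data, `count(cyclistic_df, member_casual, time_of_day)` should give the same starting table without the separate `filter()` calls.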

  • Hour

cyclistic_df %>% 
  count(hour) %>%
  print(n=24)
## # A tibble: 24 × 2
##     hour      n
##    <int>  <int>
##  1     0  49562
##  2     1  30897
##  3     2  17323
##  4     3   9737
##  5     4   9637
##  6     5  34089
##  7     6 105262
##  8     7 191937
##  9     8 240884
## 10     9 175533
## 11    10 177928
## 12    11 217124
## 13    12 250837
## 14    13 253258
## 15    14 259014
## 16    15 306870
## 17    16 390746
## 18    17 448740
## 19    18 359978
## 20    19 256332
## 21    20 178712
## 22    21 140676
## 23    22 111298
## 24    23  74605
cyclistic_df %>% 
  group_by(member_casual) %>% 
  count(hour) %>%
  print(n=48)
## # A tibble: 48 × 3
## # Groups:   member_casual [2]
##    member_casual  hour      n
##    <chr>         <int>  <int>
##  1 casual            0  25891
##  2 casual            1  16711
##  3 casual            2   9571
##  4 casual            3   4909
##  5 casual            4   3675
##  6 casual            5   8092
##  7 casual            6  22403
##  8 casual            7  38680
##  9 casual            8  52619
## 10 casual            9  52650
## 11 casual           10  67124
## 12 casual           11  85761
## 13 casual           12 101149
## 14 casual           13 105444
## 15 casual           14 110026
## 16 casual           15 121802
## 17 casual           16 139888
## 18 casual           17 152368
## 19 casual           18 128314
## 20 casual           19  93939
## 21 casual           20  67223
## 22 casual           21  55397
## 23 casual           22  49429
## 24 casual           23  35800
## 25 member            0  23671
## 26 member            1  14186
## 27 member            2   7752
## 28 member            3   4828
## 29 member            4   5962
## 30 member            5  25997
## 31 member            6  82859
## 32 member            7 153257
## 33 member            8 188265
## 34 member            9 122883
## 35 member           10 110804
## 36 member           11 131363
## 37 member           12 149688
## 38 member           13 147814
## 39 member           14 148988
## 40 member           15 185068
## 41 member           16 250858
## 42 member           17 296372
## 43 member           18 231664
## 44 member           19 162393
## 45 member           20 111489
## 46 member           21  85279
## 47 member           22  61869
## 48 member           23  38805
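Scanning 48 rows for the busiest hours is error-prone. A minimal sketch, assuming the hourly counts above are re-entered by hand, that picks the two busiest hours per rider type with `slice_max()`. The names `casual_n`, `member_n` and `rides_by_hour` are hypothetical helpers for this example.

```r
library(dplyr)

# Hourly counts (hours 0 to 23) copied from the output above.
casual_n <- c(25891, 16711, 9571, 4909, 3675, 8092, 22403, 38680,
              52619, 52650, 67124, 85761, 101149, 105444, 110026, 121802,
              139888, 152368, 128314, 93939, 67223, 55397, 49429, 35800)
member_n <- c(23671, 14186, 7752, 4828, 5962, 25997, 82859, 153257,
              188265, 122883, 110804, 131363, 149688, 147814, 148988, 185068,
              250858, 296372, 231664, 162393, 111489, 85279, 61869, 38805)

rides_by_hour <- tibble(
  member_casual = rep(c("casual", "member"), each = 24),
  hour          = rep(0:23, times = 2),
  n             = c(casual_n, member_n)
)

# Keep the two hours with the most rides within each rider type.
peak_hours <- rides_by_hour %>%
  group_by(member_casual) %>%
  slice_max(n, n = 2) %>%
  ungroup()

print(peak_hours)
```

For both groups the two busiest hours are 17 and 16. Members also show a separate morning bump at hour 8 (188,265 rides) that casual riders lack.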

  • Month
cyclistic_df %>%
  count(month) 
## # A tibble: 12 × 2
##    month      n
##    <chr>  <int>
##  1 01    148280
##  2 02    149552
##  3 03    200433
##  4 04    324173
##  5 05    463187
##  6 06    534719
##  7 07    573876
##  8 08    584821
##  9 09    506555
## 10 10    414238
## 11 11    255752
## 12 12    135393
cyclistic_df %>%
  group_by(member_casual) %>% 
  count(month) %>% 
  print(n = 24)
## # A tibble: 24 × 3
## # Groups:   member_casual [2]
##    member_casual month      n
##    <chr>         <chr>  <int>
##  1 casual        01     29618
##  2 casual        02     32774
##  3 casual        03     46786
##  4 casual        04    110526
##  5 casual        05    177025
##  6 casual        06    219778
##  7 casual        07    245254
##  8 casual        08    233819
##  9 casual        09    196938
## 10 casual        10    151312
## 11 casual        11     73533
## 12 casual        12     31502
## 13 member        01    118662
## 14 member        02    116778
## 15 member        03    153647
## 16 member        04    213647
## 17 member        05    286162
## 18 member        06    314941
## 19 member        07    328622
## 20 member        08    351002
## 21 member        09    309617
## 22 member        10    262926
## 23 member        11    182219
## 24 member        12    103891

  • Season
cyclistic_df %>%
  group_by(member_casual) %>% 
  filter(season == "Spring") %>% 
  count(season)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual season      n
##   <chr>         <chr>   <int>
## 1 casual        Spring 334337
## 2 member        Spring 653456
cyclistic_df %>%
  filter(season == "Spring") %>% 
  count(season)
## # A tibble: 1 × 2
##   season      n
##   <chr>   <int>
## 1 Spring 987793
cyclistic_df %>%
  group_by(member_casual) %>% 
  filter(season == "Summer") %>% 
  count(season)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual season      n
##   <chr>         <chr>   <int>
## 1 casual        Summer 698851
## 2 member        Summer 994565
cyclistic_df %>%
  filter(season == "Summer") %>% 
  count(season)
## # A tibble: 1 × 2
##   season       n
##   <chr>    <int>
## 1 Summer 1693416
cyclistic_df %>%
  group_by(member_casual) %>% 
  filter(season == "Fall") %>% 
  count(season)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual season      n
##   <chr>         <chr>   <int>
## 1 casual        Fall   421783
## 2 member        Fall   754762
cyclistic_df %>%
  filter(season == "Fall") %>% 
  count(season)
## # A tibble: 1 × 2
##   season       n
##   <chr>    <int>
## 1 Fall   1176545
cyclistic_df %>%
  group_by(member_casual) %>% 
  filter(season == "Winter") %>% 
  count(season)
## # A tibble: 2 × 3
## # Groups:   member_casual [2]
##   member_casual season      n
##   <chr>         <chr>   <int>
## 1 casual        Winter  93894
## 2 member        Winter 339331
cyclistic_df %>%
  filter(season == "Winter") %>% 
  count(season)
## # A tibble: 1 × 2
##   season      n
##   <chr>   <int>
## 1 Winter 433225
cyclistic_df %>%
  group_by(season, member_casual) %>% 
  count(season)
## # A tibble: 8 × 3
## # Groups:   season, member_casual [8]
##   season member_casual      n
##   <chr>  <chr>          <int>
## 1 Fall   casual        421783
## 2 Fall   member        754762
## 3 Spring casual        334337
## 4 Spring member        653456
## 5 Summer casual        698851
## 6 Summer member        994565
## 7 Winter casual         93894
## 8 Winter member        339331
cyclistic_df %>%
  group_by(season) %>% 
  count(season)
## # A tibble: 4 × 2
## # Groups:   season [4]
##   season       n
##   <chr>    <int>
## 1 Fall   1176545
## 2 Spring  987793
## 3 Summer 1693416
## 4 Winter  433225
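The season totals can be cross-checked against the monthly counts. The sketch below assumes the `season` column follows meteorological seasons (March to May is spring, and so on). That mapping is an assumption of this example, although it reproduces the four totals printed above. `rides_by_month` and `season_totals` are names introduced here.

```r
library(dplyr)

# Monthly ride counts copied from the Month output above.
rides_by_month <- tibble(
  month = sprintf("%02d", 1:12),
  n = c(148280, 149552, 200433, 324173, 463187, 534719,
        573876, 584821, 506555, 414238, 255752, 135393)
)

# Assumed month-to-season mapping (meteorological seasons).
season_totals <- rides_by_month %>%
  mutate(season = case_when(
    month %in% c("03", "04", "05") ~ "Spring",
    month %in% c("06", "07", "08") ~ "Summer",
    month %in% c("09", "10", "11") ~ "Fall",
    TRUE                           ~ "Winter"
  )) %>%
  group_by(season) %>%
  summarise(n = sum(n))

print(season_totals)
```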

Calculations by average ride length

cyclistic_df %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                16.0
  • Member type
cyclistic_df %>%
  group_by(member_casual) %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       22.8
## 2 member                       12.1
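The overall average of 16.0 minutes is not the midpoint of the two group averages, because members account for far more rides. As a quick check, the sketch below weights each rounded group average by its ride count, using only figures printed above. Small differences from the exact value are expected because the inputs are rounded.

```r
# Group averages (minutes) and ride counts copied from the outputs above.
group_means  <- c(casual = 22.8, member = 12.1)
group_counts <- c(casual = 1548865, member = 2742114)

# Weighted mean: each group's average counts in proportion to its rides.
overall_mean <- weighted.mean(group_means, w = group_counts)

# Unweighted mean of the two averages, shown only for contrast.
naive_mean <- mean(group_means)

round(overall_mean, 1)
round(naive_mean, 2)
```

The weighted result rounds to 16.0, matching the overall figure, while the unweighted mean of the two averages (17.45) overstates it.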
  • Type of bike
cyclistic_df %>% 
  group_by(member_casual, rideable_type) %>% 
  summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 3
## # Groups:   member_casual [2]
##   member_casual rideable_type average_ride_length
##   <chr>         <chr>                       <dbl>
## 1 casual        classic_bike                 25.4
## 2 casual        docked_bike                  51.7
## 3 casual        electric_bike                14.8
## 4 member        classic_bike                 13.0
## 5 member        electric_bike                10.5
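The `summarise()` message above appears whenever a tibble grouped by more than one column is summarised without a `.groups` argument. A minimal sketch of silencing it by dropping the grouping explicitly: `toy_df` is a made-up stand-in with illustrative values, not the Cyclistic data.

```r
library(dplyr)

# Hypothetical stand-in for cyclistic_df; the values are illustrative only.
toy_df <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "electric_bike", "classic_bike", "classic_bike"),
  ride_length   = c(20, 10, 12, 8)
)

# .groups = "drop" returns an ungrouped tibble and suppresses the message.
avg_by_type <- toy_df %>%
  group_by(member_casual, rideable_type) %>%
  summarise(average_ride_length = mean(ride_length), .groups = "drop")

print(avg_by_type)
```

Dropping the groups also avoids surprises when the result is piped into later steps that would otherwise run per group.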
cyclistic_df %>% 
  group_by(rideable_type) %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 3 × 2
##   rideable_type average_ride_length
##   <chr>                       <dbl>
## 1 classic_bike                 17.0
## 2 docked_bike                  51.7
## 3 electric_bike                12.1
  • Hour
cyclistic_df %>%
  group_by(hour, member_casual) %>% 
  summarise(average_ride_length = mean(ride_length)) %>% 
  print(n=48) 
## `summarise()` has grouped output by 'hour'. You can override using the
## `.groups` argument.
## # A tibble: 48 × 3
## # Groups:   hour [24]
##     hour member_casual average_ride_length
##    <int> <chr>                       <dbl>
##  1     0 casual                      20.6 
##  2     0 member                      11.5 
##  3     1 casual                      20.5 
##  4     1 member                      11.8 
##  5     2 casual                      20.8 
##  6     2 member                      12.1 
##  7     3 casual                      19.6 
##  8     3 member                      12.6 
##  9     4 casual                      17.3 
## 10     4 member                      11.8 
## 11     5 casual                      14.6 
## 12     5 member                       9.96
## 13     6 casual                      15.7 
## 14     6 member                      10.4 
## 15     7 casual                      14.7 
## 16     7 member                      10.8 
## 17     8 casual                      16.7 
## 18     8 member                      11.0 
## 19     9 casual                      23.0 
## 20     9 member                      11.2 
## 21    10 casual                      26.6 
## 22    10 member                      12.0 
## 23    11 casual                      27.3 
## 24    11 member                      12.2 
## 25    12 casual                      26.7 
## 26    12 member                      12.1 
## 27    13 casual                      26.3 
## 28    13 member                      12.1 
## 29    14 casual                      26.3 
## 30    14 member                      12.5 
## 31    15 casual                      24.8 
## 32    15 member                      12.4 
## 33    16 casual                      22.7 
## 34    16 member                      12.7 
## 35    17 casual                      21.5 
## 36    17 member                      12.9 
## 37    18 casual                      21.4 
## 38    18 member                      12.9 
## 39    19 casual                      21.3 
## 40    19 member                      12.6 
## 41    20 casual                      20.6 
## 42    20 member                      12.3 
## 43    21 casual                      19.9 
## 44    21 member                      12.0 
## 45    22 casual                      20.2 
## 46    22 member                      12.1 
## 47    23 casual                      20.1 
## 48    23 member                      12.0
cyclistic_df %>% 
  group_by(hour) %>% 
  summarise(average_ride_length = mean(ride_length)) %>% 
  print(n=24) 
## # A tibble: 24 × 2
##     hour average_ride_length
##    <int>               <dbl>
##  1     0                16.3
##  2     1                16.5
##  3     2                16.9
##  4     3                16.2
##  5     4                13.9
##  6     5                11.1
##  7     6                11.6
##  8     7                11.6
##  9     8                12.2
## 10     9                14.7
## 11    10                17.5
## 12    11                18.2
## 13    12                18.0
## 14    13                18.1
## 15    14                18.4
## 16    15                17.3
## 17    16                16.3
## 18    17                15.8
## 19    18                15.9
## 20    19                15.8
## 21    20                15.5
## 22    21                15.1
## 23    22                15.7
## 24    23                15.9
  • Day of the week
cyclistic_df %>% 
  group_by(member_casual, day_of_week) %>% 
  summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 3
## # Groups:   member_casual [2]
##    member_casual day_of_week average_ride_length
##    <chr>         <chr>                     <dbl>
##  1 casual        Friday                     22.2
##  2 casual        Monday                     22.4
##  3 casual        Saturday                   25.8
##  4 casual        Sunday                     26.2
##  5 casual        Thursday                   20.0
##  6 casual        Tuesday                    20.2
##  7 casual        Wednesday                  19.3
##  8 member        Friday                     12.1
##  9 member        Monday                     11.5
## 10 member        Saturday                   13.6
## 11 member        Sunday                     13.6
## 12 member        Thursday                   11.6
## 13 member        Tuesday                    11.6
## 14 member        Wednesday                  11.6
cyclistic_df %>% 
  group_by(day_of_week) %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 7 × 2
##   day_of_week average_ride_length
##   <chr>                     <dbl>
## 1 Friday                     15.8
## 2 Monday                     15.0
## 3 Saturday                   19.4
## 4 Sunday                     19.4
## 5 Thursday                   14.2
## 6 Tuesday                    14.1
## 7 Wednesday                  13.8
  • Time of day

  • Overall

cyclistic_df %>% 
  group_by(time_of_day, member_casual) %>% 
  summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'time_of_day'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 3
## # Groups:   time_of_day [4]
##   time_of_day member_casual average_ride_length
##   <chr>       <chr>                       <dbl>
## 1 Afternoon   casual                       24.4
## 2 Afternoon   member                       12.5
## 3 Evening     casual                       21.0
## 4 Evening     member                       12.6
## 5 Morning     casual                       22.2
## 6 Morning     member                       11.2
## 7 Night       casual                       20.2
## 8 Night       member                       12.0
cyclistic_df %>% 
  group_by(time_of_day) %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 4 × 2
##   time_of_day average_ride_length
##   <chr>                     <dbl>
## 1 Afternoon                  17.1
## 2 Evening                    15.7
## 3 Morning                    14.4
## 4 Night                      15.9
  • Morning
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Morning") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       22.2
## 2 member                       11.2
cyclistic_df %>% 
  filter(time_of_day == "Morning") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                14.4
  • Afternoon
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Afternoon") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       24.4
## 2 member                       12.5
cyclistic_df %>% 
  filter(time_of_day == "Afternoon") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                17.1
  • Evening
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Evening") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       21.0
## 2 member                       12.6
cyclistic_df %>% 
  filter(time_of_day == "Evening") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                15.7
  • Night
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(time_of_day == "Night") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       20.2
## 2 member                       12.0
cyclistic_df %>% 
  filter(time_of_day == "Night") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                15.9
  • Month
cyclistic_df %>% 
  group_by(month, member_casual) %>% 
  summarise(average_ride_length = mean(ride_length)) %>% 
  print(n=24)  # print all 24 rows instead of the default first 10
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
## # A tibble: 24 × 3
## # Groups:   month [12]
##    month member_casual average_ride_length
##    <chr> <chr>                       <dbl>
##  1 01    casual                       14.9
##  2 01    member                       10.0
##  3 02    casual                       17.7
##  4 02    member                       10.4
##  5 03    casual                       16.7
##  6 03    member                       10.2
##  7 04    casual                       22.6
##  8 04    member                       11.6
##  9 05    casual                       24.5
## 10 05    member                       12.7
## 11 06    casual                       24.1
## 12 06    member                       12.9
## 13 07    casual                       25.2
## 14 07    member                       13.4
## 15 08    casual                       24.4
## 16 08    member                       13.3
## 17 09    casual                       23.5
## 18 09    member                       12.7
## 19 10    casual                       20.5
## 20 10    member                       11.7
## 21 11    casual                       17.2
## 22 11    member                       10.8
## 23 12    casual                       14.8
## 24 12    member                       10.2
cyclistic_df %>% 
  group_by(month) %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 12 × 2
##    month average_ride_length
##    <chr>               <dbl>
##  1 01                   11.0
##  2 02                   12.0
##  3 03                   11.7
##  4 04                   15.3
##  5 05                   17.2
##  6 06                   17.5
##  7 07                   18.4
##  8 08                   17.7
##  9 09                   16.9
## 10 10                   14.9
## 11 11                   12.7
## 12 12                   11.3
  • Season

  • Overall

cyclistic_df %>% 
  group_by(season, member_casual) %>% 
  summarise(average_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'season'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 3
## # Groups:   season [4]
##   season member_casual average_ride_length
##   <chr>  <chr>                       <dbl>
## 1 Fall   casual                       21.3
## 2 Fall   member                       11.9
## 3 Spring casual                       22.8
## 4 Spring member                       11.7
## 5 Summer casual                       24.6
## 6 Summer member                       13.2
## 7 Winter casual                       15.8
## 8 Winter member                       10.2
cyclistic_df %>% 
  group_by(season) %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 4 × 2
##   season average_ride_length
##   <chr>                <dbl>
## 1 Fall                  15.3
## 2 Spring                15.5
## 3 Summer                17.9
## 4 Winter                11.4
  • Spring
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(season == "Spring") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       22.8
## 2 member                       11.7
cyclistic_df %>% 
  filter(season == "Spring") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                15.5
  • Summer
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(season == "Summer") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       24.6
## 2 member                       13.2
cyclistic_df %>% 
  filter(season == "Summer") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                17.9
  • Fall
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(season == "Fall") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       21.3
## 2 member                       11.9
cyclistic_df %>% 
  filter(season == "Fall") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                15.3
  • Winter
cyclistic_df %>% 
  group_by(member_casual) %>% 
  filter(season == "Winter") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>                       <dbl>
## 1 casual                       15.8
## 2 member                       10.2
cyclistic_df %>% 
  filter(season == "Winter") %>% 
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 1 × 1
##   average_ride_length
##                 <dbl>
## 1                11.4

Share

In this phase, I created visualizations in Tableau to communicate my findings quickly and clearly.

—– Rides by User Type —–

Members took almost twice as many rides as casual riders: 2,680,491 rides over the year (63.79%) versus 1,521,765 (36.21%).
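The percentages follow directly from the two ride counts reported for the Tableau view. A short sketch of the arithmetic, where `rides`, `share_pct` and `ratio` are names chosen for this example:

```r
# Ride counts reported above for the Tableau visualization.
rides <- c(member = 2680491, casual = 1521765)

# Each group's share of all rides, in percent.
share_pct <- round(100 * rides / sum(rides), 2)

# How many member rides there are per casual ride.
ratio <- round(rides[["member"]] / rides[["casual"]], 2)

share_pct
ratio
```

The shares come out at 63.79% and 36.21%, and the ratio of about 1.76 is what "almost twice" refers to.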

—– Rides by Bike Type —–

The classic bike was the most popular bike type, with 2,533,979 rides by both casual riders and members. The electric bike came second, with 1,572,909 rides. Docked bikes were used only by casual riders, 95,368 times over the year.

—– Rides by Weekday —–

Saturday was the busiest day of the week, with 663,675 rides. Casual riders rode most on Sunday (251,948 rides). Wednesday was the most active day for members (431,532 rides), only slightly ahead of Thursday (431,524).

—– Average Ride Length by Weekday —–

Both members and casual riders took their longest rides on the weekend (Saturday and Sunday). On Sunday, the average ride lasted 13.90 minutes for members and 26.68 minutes for casual riders.

—– Rides by Hour —–

The busiest hour of the day was 17:00 (5 PM). Both members and casual riders were most active in the afternoon. Ridership rose steadily from 6:00 to 8:00 a.m., then climbed again from 14:00 to a peak at 17:00. The early-morning hours, such as 3:00 and 4:00 a.m., were the least active.

—– Rides by Month —–

The summer months were the busiest of the year. Across all users, August narrowly beat July. Members rode most in August, while casual riders rode most in July.

Act

Based on my analysis, I provided three data-driven recommendations to the stakeholders (Cyclistic executives and the marketing team). The main business goal is to convert casual riders into annual members.

Key Findings

  • Members ride significantly more often than casual riders.
    • Annual members completed 2,680,491 rides (63.79%), nearly double the number of casual riders (1,521,765 rides or 36.21%).
  • Classic bikes are the most popular for all users.
    • Classic bikes were the top choice with 2,533,979 rides, used by both members and casuals;
    • Electric bikes followed with 1,572,909 rides;
    • Docked bikes were used exclusively by casual riders (95,368 rides), indicating a potential barrier or difference in accessibility.
  • Casual riders prefer weekends, while members ride during the week.
    • Casual riders are most active on Sundays, aligning with leisure use;
    • Members ride most often on Wednesdays and Thursdays, suggesting a commuting or routine pattern.
  • Casual riders take longer trips than members.
    • On Sundays, casuals averaged 26.68 minutes per ride, while members averaged only 13.90 minutes;
    • Casual users tend to take longer, more leisurely trips - possibly indicating a tourism or recreational usage pattern.
  • Peak riding time for both groups is 5 PM.
    • The busiest time of day for all users is 17:00 (5 PM), especially on weekdays.
  • Summer months drive the highest usage.
    • August had the most rides overall, with members peaking in August and casuals in July.

Recommendations

  1. Create special weekend promotions or limited-time offers that encourage casual riders to try membership benefits.

  2. Promote the value of classic bikes through membership benefits.

    • Offer “classic-only” membership discounts or introductory plans for casual users who mostly use classic bikes.

  3. Use peak hour engagement to trigger timely membership offers:

    • Trigger in-app push notifications or emails around 5 PM;
    • Offer “commuter plan” trials for casuals who ride during peak hours.