For my final project in the Spring 2025 Data Visualization and Design class at the CUNY Graduate Center, I chose to work with the Citibike ridership dataset from 2024. I began this project with the desire to create a tool where anyone can explore this massive dataset and answer their own questions. This post combines some graphics from that tool with some more run-of-the-mill visualizations in an effort to make sense of this mountain of data.

The Data
Citibike distributes ridership data via an Amazon S3 bucket. For 2024, the data is packaged as monthly ZIP files, each containing a seemingly arbitrary number of CSV files with the actual ride data. There is not much of a data dictionary for this dataset, but the columns and data types are easy to infer. Each row in this dataset contains information about a unique ride, including:
- Ride start and end date and time
- The identification numbers and names for the start and end stations
- GPS coordinates for the start and end stations
- Bike type (classic or electric)
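As a rough illustration of what reading one of these archives can look like, here is a minimal sketch (not the project's actual code) that concatenates every CSV inside a monthly ZIP with Pandas. The function name `read_monthly_zip` and the station ID column names are assumptions for illustration, so check them against the header row of the files you download.

```python
import io
import zipfile

import pandas as pd


def read_monthly_zip(zip_source):
    """Concatenate every CSV member of one monthly ZIP archive.

    `zip_source` can be a path or a file-like object. The station ID
    column names below are assumed, not taken from a data dictionary.
    """
    frames = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            # Skip folders, metadata entries, and anything that is not a CSV.
            if name.startswith("__MACOSX") or not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as handle:
                frames.append(
                    pd.read_csv(
                        handle,
                        # Read IDs as text so values like "5000.10" keep their trailing zero.
                        dtype={"start_station_id": str, "end_station_id": str},
                    )
                )
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Reading the station IDs as strings is a deliberate choice here: if they are parsed as floats, IDs that differ only by a trailing zero would collapse into one.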
Data Processing
To handle the amount of data I leveraged Python 3.12 along with several data-focused libraries such as Pandas and GeoPandas, plus the standard-library multiprocessing module. The code for this project is available on GitHub. Data processing took place over several steps, beginning with extracting all stations along with their GPS locations and the first month in which a ride was observed in 2024 (around 80 stations were installed over the year). Ride data was then extracted and cleaned to minimize data redundancy. In the end the processed data (almost 30GB!) was written in JSON format so that a web front-end could be constructed to allow exploration of the data.
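The station-extraction step might look something like the sketch below. It is an illustration under assumptions rather than the code from the repository: the column names (`start_station_id`, `started_at`, and so on) and the helper name `extract_stations` are hypothetical, and it treats the earliest ride that touches a station, as either origin or destination, as that station's first observed month.

```python
import pandas as pd


def extract_stations(rides: pd.DataFrame) -> pd.DataFrame:
    """Return one row per station with coordinates and first observed month."""
    cols = ["station_id", "name", "lat", "lng", "seen_at"]

    # Stack the start side and end side of every ride into one long table.
    starts = rides[
        ["start_station_id", "start_station_name", "start_lat", "start_lng", "started_at"]
    ].copy()
    ends = rides[
        ["end_station_id", "end_station_name", "end_lat", "end_lng", "ended_at"]
    ].copy()
    starts.columns = cols
    ends.columns = cols

    seen = pd.concat([starts, ends], ignore_index=True).dropna(subset=["station_id"])
    seen["seen_at"] = pd.to_datetime(seen["seen_at"])

    # After sorting by time, the first row per station is its earliest sighting.
    seen = seen.sort_values("seen_at")
    stations = seen.groupby("station_id", as_index=False).first()
    stations["first_month"] = stations["seen_at"].dt.month
    return stations.drop(columns="seen_at")
```

Storing each station once and referring to it by ID from the ride records is one simple way to cut down on the redundancy of repeating names and coordinates on every row.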
Data Exploration

I will be honest, the visualization of data via bars, pies, and charts is really of secondary interest to me in this project. The vast majority of my work with this data is on the exploration side, via a front-end that is available here. When loaded, all stations will have markers placed on the map. These station markers can be selected to view rides that were recorded as either starting or ending at that station. Various statistics are presented to provide more context, including an hourly histogram of activity, the net bike flux for the active month, and breakdown of inbound/outbound rides. The entire year of data is available in this tool and the Options window provides controls to select which month and what kinds of rides are displayed.
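For readers curious how the per-station statistics could be computed, here is a small sketch. It is not the front-end's actual logic: the function name `station_summary` and the column names are assumptions, and I am assuming that net bike flux means inbound rides minus outbound rides, so a positive number means the station gained bikes.

```python
import pandas as pd


def station_summary(rides: pd.DataFrame, station_id: str) -> dict:
    """Summarize inbound/outbound activity for one station."""
    outbound = rides[rides["start_station_id"] == station_id]
    inbound = rides[rides["end_station_id"] == station_id]

    # Hour of day for departures (start time) and arrivals (end time).
    hours_out = pd.to_datetime(outbound["started_at"]).dt.hour
    hours_in = pd.to_datetime(inbound["ended_at"]).dt.hour

    # Count activity per hour and make sure all 24 hours are present.
    hourly = (
        pd.concat([hours_out, hours_in])
        .value_counts()
        .reindex(range(24), fill_value=0)
    )

    return {
        "inbound": len(inbound),
        "outbound": len(outbound),
        "net_flux": len(inbound) - len(outbound),  # assumed definition
        "hourly": hourly,
    }
```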
Visualizations
I wanted to investigate a series of questions that I felt could be addressed via this data. Tableau has limitations in both row count and dataset size that required some very creative aggregations to make 45 million rides available to visualize with the software, so I do apologize for any loss of detail involved.
“Who is riding Citibike, and when do they ride?”
In 2024 there were around 45 million Citibike rides. These bikes are active 24 hours a day, criss-crossing the city in an endless tapestry of wheels on pavement. Below is a heatmap of ride activity that can be filtered to highlight the different temporal patterns of casual riders compared to those of Citibike members. Casual riders are most active on the weekend and somewhat during the dinner hours throughout the week. Members, on the other hand, are more likely to be hitting the streets during commute hours for workers (7-9 AM and 5-7 PM).
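The kind of aggregation behind a heatmap like this can be sketched in a few lines of Pandas. This is an illustration and not the aggregation I actually fed to Tableau: the helper name `activity_heatmap` and the column names `started_at` and `member_casual` (with values `"member"` and `"casual"`) are assumptions.

```python
import pandas as pd


def activity_heatmap(rides: pd.DataFrame, rider_type=None) -> pd.DataFrame:
    """Count rides by weekday (rows, 0 = Monday) and start hour (columns)."""
    started = pd.to_datetime(rides["started_at"])
    frame = pd.DataFrame(
        {
            "weekday": started.dt.dayofweek,
            "hour": started.dt.hour,
            "member_casual": rides["member_casual"],
        }
    )
    # Optionally keep only one rider type, e.g. "member" or "casual".
    if rider_type is not None:
        frame = frame[frame["member_casual"] == rider_type]

    counts = frame.groupby(["weekday", "hour"]).size().unstack(fill_value=0)
    # Guarantee a full 7 x 24 grid even if some cells saw no rides.
    return counts.reindex(index=range(7), columns=range(24), fill_value=0)
```

Collapsing 45 million rows into a 7 by 24 grid per rider type is also the sort of reduction that keeps a dataset within a visualization tool's row limits.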
“Who rides electric bikes and who rides classic bikes?”
This visual highlights the type of bikes that are chosen by Casual Riders as well as Citibike Members. Casual riders are most likely to be tourists, while members are likely to be commuters, exercise bikers, or micro-mobility focused individuals. While both groups ride mostly electric bikes, a much higher percentage of Citibike Members choose classic bikes instead.
“What influences the choice between Electric and Classic bike?”
I wanted to investigate some hunches about factors that may influence a rider's choice between riding a classic bike or an electric one. Citibike is present in all New York City boroughs except Staten Island, though only Manhattan has Citibike stations available throughout the entire borough. Between that and the population density of Manhattan, it is pretty obvious that Manhattan will have quite a bit more Citibike activity than the other boroughs. More importantly, it has the greatest density of stations, so navigating the borough by bike is very convenient compared to the outer boroughs.

“How far is the rider going?”
Thinking over the trends mentioned above, I set about investigating the relationship between ride distance and choice between electric and classic bike. I find this visual to be the most interesting, but also the most frustrating. It takes quite a while to load, and I cannot recommend playing with the bike type filter. It’s your life though, so feel welcome to click things.
For rides longer than about 3km there is a clear preference for electric bikes. This is highlighted in the next chart, which shows the bike types chosen for trips that go between boroughs vs trips that stay within one borough.
Finally, connecting these two, we see below that the average length of a ride is significantly longer for trips between boroughs than for trips that are confined within one borough.
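Since the dataset only records start and end coordinates, one way to approximate ride distance is the straight-line (great-circle) distance between the two stations. The sketch below uses the haversine formula for that and then computes the share of electric bikes per distance bin. Treat it as an assumption-heavy illustration: the real route is always longer than the straight line, and the names `haversine_km`, `electric_share_by_distance`, the coordinate columns, and the `"electric_bike"` label are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def electric_share_by_distance(rides: pd.DataFrame, bin_km: float = 1.0) -> pd.Series:
    """Fraction of rides on electric bikes, grouped by distance bin (lower edge in km)."""
    dist = haversine_km(
        rides["start_lat"], rides["start_lng"], rides["end_lat"], rides["end_lng"]
    )
    bins = (dist // bin_km) * bin_km
    is_electric = rides["rideable_type"] == "electric_bike"
    # The mean of a boolean column is the share of True values in each bin.
    return is_electric.groupby(bins).mean()
```

Note that round trips that start and end at the same station come out as zero distance under this approach, so they would need separate handling in a real analysis.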
Conclusions
This project has presented some significant challenges as well as some very interesting insights into the behaviors of Citibike riders across the city! As one would expect, your average Citibiker appears to be a New York City resident who is using bikes to commute to and from work during the Monday to Friday grind. They may have some additional rides each week outside of that activity, maybe to meet friends for dinner or to go to an activity or store. The main research question that I was drawn to was “What factors influence a rider's choice between an electric or classic bike?”, and the answer seems to be quite complex. The factor with the most obvious impact is simply ride length, but this involves confounding factors like going between boroughs. The bridges connecting boroughs are infamously steep, and I can say from personal experience that I am usually willing to part with a few bucks to not arrive at work drenched in sweat from crossing the Queensboro Bridge!
Please visit the data exploration page and have a look at activity around the system for yourself! Though it is a small part of this post, it represents the majority of my work on this project.
