
Background
We all know that we should be getting more steps in our daily lives. Medical guidance commonly suggests aiming for 10,000 steps per day, and many studies support this recommendation. Walking is a low-impact form of exercise with many dose-dependent health benefits. In particular, increased walking has been repeatedly linked to a lower risk of adverse cardiovascular events, lower cholesterol, and higher overall life satisfaction[1].
For many years, my partner (Sarah) and I lived in the suburban expanse of Norman, Oklahoma, a college town just south of the vast sprawl of Oklahoma City. On April 10, 2022, we moved to New York City. New Yorkers are well known for walking, and a quick Google search suggests that they walk up to three times as much as the average American.
For this project, I wanted to analyze and visualize my own walking habits over several years. The period covers both our time in Oklahoma and the time since our move to New York City, so I could compare how my daily walking changed over time and with location.
Data Sources
Let’s be honest… we are a phone-addicted culture! For this project that is quite a good thing, because I have carried an iPhone with me essentially everywhere I have gone for the better part of a decade. Following Apple’s instructions, I exported the entire history of data from the Health app. I then used the Apple Health Parser Python utility from alxdrcirilo on GitHub to quickly parse this enormous amount of data into a usable comma-separated values (CSV) format.
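This project relied on the Apple Health Parser utility for the parsing step. To give a rough idea of what that step involves, here is a minimal sketch that uses only Python's standard library. It is not that utility or its API. It assumes the Health export contains an export.xml file made up of Record elements with type, startDate, endDate and value attributes. That layout reflects my understanding of the format and should be checked against a real export. The helper names iter_records and export_to_csv are hypothetical, and the sample records are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# HealthKit type identifier for step counts, as it is assumed to appear in export.xml.
STEP_TYPE = "HKQuantityTypeIdentifierStepCount"


def iter_records(source, record_type=STEP_TYPE):
    """Yield (startDate, endDate, value) for every <Record> of record_type.

    iterparse streams the XML, so a very large export.xml is processed
    element by element instead of being loaded into memory all at once.
    Assumption: records carry type/startDate/endDate/value attributes.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == record_type:
            yield elem.get("startDate"), elem.get("endDate"), float(elem.get("value"))
        elem.clear()  # release children/attributes of elements we are done with


def export_to_csv(source, dest, record_type=STEP_TYPE):
    """Write matching records to dest as CSV and return how many were written."""
    writer = csv.writer(dest)
    writer.writerow(["startDate", "endDate", "value"])
    written = 0
    for row in iter_records(source, record_type):
        writer.writerow(row)
        written += 1
    return written


# Made-up two-record sample in the assumed shape of export.xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData locale="en_US">
 <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
  startDate="2022-04-10 08:15:32 -0400" endDate="2022-04-10 08:16:01 -0400" value="41"/>
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
  startDate="2022-04-10 08:20:00 -0400" endDate="2022-04-10 08:20:00 -0400" value="72"/>
</HealthData>"""

if __name__ == "__main__":
    out = io.StringIO()
    print(export_to_csv(io.BytesIO(SAMPLE), out), "step record(s) written")
    print(out.getvalue())
```

Passing a different type identifier, such as HKQuantityTypeIdentifierHeartRate, would pull out other record types from the same file. This is one possible route to the heart rate extension mentioned in the conclusion.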
Later in the project, I became interested in exploring how my walking relates to temperature and season. I used the NOAA Climate Data Online platform to retrieve daily records of average temperature, maximum temperature, and precipitation for all dates from 1/1/2021 to 3/23/2025. For dates from 1/1/2021 to 4/9/2022, I used the USW00013967 (Oklahoma City Will Rogers Airport) station, and for dates from 4/10/2022 onward I used the USW00014732 (LaGuardia Airport) station. Temperature can vary slightly across a city, but I believe these two stations provide “accurate enough” data for this project.
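The station switches on the move date. One way to build a single daily weather table is to pull both stations and keep, for each date, only the row from the station that applies. The sketch below assumes a combined NOAA CSV with STATION and DATE (YYYY-MM-DD) columns. Those column names, the helper names station_for and select_rows, and the sample temperatures are illustrative assumptions, not necessarily what NOAA delivers or what I actually did.

```python
import csv
import io
from datetime import date

MOVE_DATE = date(2022, 4, 10)  # first day in New York City
OKC_STATION = "USW00013967"    # Oklahoma City Will Rogers Airport
NYC_STATION = "USW00014732"    # LaGuardia Airport


def station_for(day):
    """Return the NOAA station ID used for a given calendar day."""
    return OKC_STATION if day < MOVE_DATE else NYC_STATION


def select_rows(rows):
    """Keep only rows whose STATION matches the station for their DATE.

    Assumption: the CSV has STATION and DATE (YYYY-MM-DD) columns.
    """
    return [
        row for row in rows
        if row["STATION"] == station_for(date.fromisoformat(row["DATE"]))
    ]


# Made-up rows: both stations reported on the two days around the move.
SAMPLE = """STATION,DATE,TAVG
USW00013967,2022-04-09,61
USW00014732,2022-04-09,50
USW00013967,2022-04-10,66
USW00014732,2022-04-10,55
"""

if __name__ == "__main__":
    for row in select_rows(csv.DictReader(io.StringIO(SAMPLE))):
        print(row["DATE"], row["STATION"], row["TAVG"])
```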
Data Cleanup
Early in this project, I learned that Apple HealthKit stores steps in variable-duration “walks”. Essentially, if the phone remains still for more than a few seconds, HealthKit starts a new “walk” the next time the phone starts moving. This produces highly variable time windows, which can be as short as a few seconds or as long as half an hour. A few rows of raw data are shown below.

To deal with this, I rolled up all the records for each day to arrive at a total step count for each calendar day. In doing so, I lost much of the sub-day detail in the data. I hope to revisit this project in the future and add visualizations of how my walking changes at the hourly level, but for now I have chosen to focus on daily trends.
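The daily rollup amounts to a group-by-date sum. Here is a minimal Python sketch, which is not necessarily how the rollup was actually done for this project. It assumes the parsed CSV has startDate and value columns, with timestamps like "2022-04-10 08:15:32 -0400". The column names, the timestamp format, and the helper name daily_totals are assumptions, and the sample rows are made up. One design choice has to be made here: a “walk” that crosses midnight is credited entirely to the day it started.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed timestamp format, e.g. "2022-04-10 08:15:32 -0400".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def daily_totals(rows):
    """Sum step records into one total per calendar day.

    Assumes each row has "startDate" and "value" fields. Each record is
    credited to the local date of its start time, so a walk that
    crosses midnight counts toward the day it began.
    """
    totals = defaultdict(float)
    for row in rows:
        start = datetime.strptime(row["startDate"], TIME_FORMAT)
        totals[start.date()] += float(row["value"])
    return dict(sorted(totals.items()))


# Made-up records, including one that spans midnight.
SAMPLE = """startDate,endDate,value
2022-04-09 23:58:10 -0500,2022-04-10 00:03:40 -0500,300
2022-04-10 08:15:32 -0400,2022-04-10 08:16:01 -0400,41
2022-04-10 18:02:00 -0400,2022-04-10 18:30:00 -0400,2900
"""

if __name__ == "__main__":
    for day, steps in daily_totals(csv.DictReader(io.StringIO(SAMPLE))).items():
        print(day, int(steps))
```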
The NOAA data came in a very simple CSV structure. Each row contained the date, average temperature, maximum temperature, and precipitation in inches. The data covered every day of the desired time range with no missing days, so it required no cleanup. I merged the NOAA data with the daily step totals in the Tableau data source window, using the date as the key between the two sources. A few rows of the final data source structure are shown below.
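I did this merge in Tableau. A rough Python equivalent of a date-keyed join might look like the following. The NOAA column names DATE, TAVG, TMAX and PRCP are assumptions about the export, and load_weather and join_on_date are hypothetical helper names. The sample values are made up. The join type is also an assumption: this is an inner join, which keeps only days present in both sources.

```python
import csv
import io
from datetime import date


def load_weather(f):
    """Read a NOAA-style daily CSV into {date: {"tavg", "tmax", "prcp"}}.

    Assumption: columns are named DATE (YYYY-MM-DD), TAVG, TMAX and PRCP.
    """
    weather = {}
    for row in csv.DictReader(f):
        weather[date.fromisoformat(row["DATE"])] = {
            "tavg": float(row["TAVG"]),
            "tmax": float(row["TMAX"]),
            "prcp": float(row["PRCP"]),
        }
    return weather


def join_on_date(steps_by_day, weather):
    """Inner join on the date key: keep only days present in both sources."""
    return [
        {"date": day, "steps": steps, **weather[day]}
        for day, steps in sorted(steps_by_day.items())
        if day in weather
    ]


# Made-up sample values for illustration only.
WEATHER_CSV = """DATE,TAVG,TMAX,PRCP
2022-04-10,55,63,0.00
2022-04-11,58,70,0.12
"""
STEPS = {date(2022, 4, 10): 14250.0, date(2022, 4, 12): 9800.0}

if __name__ == "__main__":
    for row in join_on_date(STEPS, load_weather(io.StringIO(WEATHER_CSV))):
        print(row)
```

An inner join silently drops any day with no step records at all. A left join that starts from the weather table and fills missing steps with 0 would keep those days visible.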

General Trends
As an overview, the following chart presents the average daily number of steps for each year from 2021 through 2025 (2025 covers only January 1 through March 23). From this data alone, it is clear that moving from a suburban life with very limited non-car transit options to a major city with multiple types of transit had a dramatic impact on how many steps I took.
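The yearly averages in the chart above can be sketched in a few lines of Python, assuming the daily totals are stored as a {date: steps} mapping. The helper name yearly_average and the sample values are hypothetical. The denominator is the number of days with data in each year, which is what keeps a partial year such as 2025 comparable.

```python
from collections import defaultdict
from datetime import date


def yearly_average(steps_by_day):
    """Average daily steps per calendar year.

    The denominator is the number of days with data in that year, so a
    partial year is averaged over its own days only. Days missing from
    the data entirely (no step records) are not counted as zero.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, steps in steps_by_day.items():
        sums[day.year] += steps
        counts[day.year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


if __name__ == "__main__":
    # Made-up sample values.
    sample = {date(2021, 6, 1): 4000, date(2021, 6, 2): 6000, date(2025, 3, 1): 15000}
    print(yearly_average(sample))
```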
The following calendar shows daily steps from 1/1/2021 through 3/23/2025. Each day is color-coded into one of six buckets based on its step count, and each bucket spans 5,000 steps. Grey days represent 0 to 5,000 steps, and the darkest green represents more than 25,000 steps. The displayed year can be changed with the < and > buttons at the top right of the chart.
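The six color buckets amount to integer division by the bucket width, capped at the last bucket. In the sketch below, the boundary handling is an assumption: a count exactly on a boundary, such as 5,000, falls into the higher bucket, and everything from 25,000 up shares the darkest green. The helper name bucket and the label strings are hypothetical.

```python
BUCKET_SIZE = 5000
# Six buckets: grey for 0-5,000 steps, then five increasingly dark greens.
BUCKET_LABELS = [
    "0-5,000 (grey)",
    "5,000-10,000",
    "10,000-15,000",
    "15,000-20,000",
    "20,000-25,000",
    "25,000+ (darkest green)",
]


def bucket(steps):
    """Return the bucket index (0-5) for a daily step count.

    Assumption: a count exactly on a boundary goes to the higher bucket,
    and all counts from 25,000 up share the last bucket.
    """
    return min(int(steps // BUCKET_SIZE), len(BUCKET_LABELS) - 1)


if __name__ == "__main__":
    for steps in (3200, 9999, 10000, 18750, 31000):
        print(steps, BUCKET_LABELS[bucket(steps)])
```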
Even at this high level, it is easy to identify the day my partner and I moved to New York City (April 10, 2022). Before this date, the overwhelming majority of days are grey or the lightest shade of green (under 10,000 steps).
Charting Days With at Least 10,000 Steps
The benefits of walking are dose-dependent: one article reports an 8-11% lower risk of premature death for every additional 2,000 steps per day[2]. Even so, the most common guidance from health organizations is to aim for 10,000 steps per day. As a modification of the previous calendar, I color-coded each day simply by whether or not I reached 10,000 steps. This presentation makes the change beginning in April 2022 even more apparent!
Viewed another way, the number of days per year with at least 10,000 steps trended upward across the data set (keep in mind that 2025 includes data only through March 23). This suggests that I am walking more with each passing year, and hopefully gaining more benefits from all this walking.
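Both the yes/no calendar and the per-year count come from a single threshold test. I built these views in Tableau, so the Python below is only an illustration. It assumes daily totals in a {date: steps} mapping, and the helper name goal_days_per_year and the sample values are hypothetical. Alongside the raw count, the sketch also returns the share of each year's days that met the goal. That share is one way to compare a partial year like 2025 with full years.

```python
from collections import Counter
from datetime import date

GOAL = 10_000


def goal_days_per_year(steps_by_day, goal=GOAL):
    """Return {year: (days at or above goal, share of that year's days)}."""
    hits = Counter()
    total = Counter()
    for day, steps in steps_by_day.items():
        total[day.year] += 1
        if steps >= goal:  # the same yes/no flag that colors the calendar
            hits[day.year] += 1
    return {year: (hits[year], hits[year] / total[year]) for year in sorted(total)}


if __name__ == "__main__":
    # Made-up sample values.
    sample = {date(2021, 1, 1): 9999, date(2021, 1, 2): 10000, date(2025, 3, 1): 12000}
    print(goal_days_per_year(sample))
```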
Does Temperature Impact My Walking?
Finally, I combined the Apple HealthKit data with the NOAA daily temperature data to analyze the relationship between ambient temperature and the number of steps I take. The data is highly variable and the relationship is non-linear, but I appear to walk noticeably less on days with temperatures below 40°F or above 80°F.
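Because the relationship is noisy and non-linear, binned averages are easier to read than a single straight-line fit. A straight line would flatten a pattern in which walking drops off at both temperature extremes. Here is a minimal sketch of that binning. It assumes each joined day has a "tavg" field (average temperature in °F) and a "steps" field. Using average rather than maximum temperature is an assumption, and the helper name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_steps_by_temp_band(days, width=10):
    """Group days into temperature bands and average the steps in each.

    days: iterable of dicts with "tavg" (degrees F, assumed field) and
    "steps". Returns {band lower bound: mean steps}, where band 30
    covers 30 F up to but not including 40 F.
    """
    bands = defaultdict(list)
    for d in days:
        lower = int(d["tavg"] // width * width)
        bands[lower].append(d["steps"])
    return {lower: mean(values) for lower, values in sorted(bands.items())}


if __name__ == "__main__":
    # Made-up sample values.
    sample = [
        {"tavg": 35, "steps": 4000},
        {"tavg": 38, "steps": 6000},
        {"tavg": 65, "steps": 14000},
        {"tavg": 84, "steps": 7000},
    ]
    print(mean_steps_by_temp_band(sample))
```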
Conclusion
This project involved a large amount of data: in total, there were 69,336 raw rows of step data from Apple HealthKit. I very much wish I had been able to retain the hourly data, but my attempts to parse the timestamps at the hour level did not go well. I plan to come back to this when I have spare time and add in the hourly detail.
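For that future revisit, the hourly rollup could follow the same pattern as the daily one, keyed by date and hour instead of date alone. This is a hedged sketch, not something used in this project. It assumes startDate strings in the form "2022-04-10 08:15:32 -0400" and a value field. The format string, field names, helper name, and sample rows are all assumptions. Each record is credited entirely to the local hour in which it started, so a walk that crosses an hour boundary is not split.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp format, e.g. "2022-04-10 08:15:32 -0400".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def hourly_totals(rows):
    """Sum step records into {(local date, local hour): steps}.

    Assumes each row has "startDate" and "value" fields. Each record is
    credited entirely to the hour in which it started.
    """
    totals = defaultdict(float)
    for row in rows:
        start = datetime.strptime(row["startDate"], TIME_FORMAT)
        totals[(start.date(), start.hour)] += float(row["value"])
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Made-up sample rows.
    sample = [
        {"startDate": "2022-04-10 08:15:32 -0400", "value": "41"},
        {"startDate": "2022-04-10 08:45:00 -0400", "value": "59"},
        {"startDate": "2022-04-10 09:01:00 -0400", "value": "10"},
    ]
    print(hourly_totals(sample))
```

Splitting each record's steps proportionally across the hours it spans would be more precise, at the cost of extra bookkeeping.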
Because the data comes from a device I have to carry with me for steps to be recorded, there is very little risk that any of the step totals are exaggerated. If anything, some steps would have gone uncounted on short trips (to the bodega, for example) or during movements around my house or office when I didn’t pick up my phone.
I have worked hard to avoid bias in this project. I cannot claim to be bias-free, but I have not identified any bias in the analysis or visualizations.
Expanding on this work should be fairly easy thanks to the way HealthKit stores data. It would likely be straightforward to add heart rate or blood oxygen saturation data, since HealthKit records these as well. I wanted to avoid scope creep on this project and chose not to include that data, but it would almost certainly yield some interesting information!
Bibliography
[1] Wattanapisit, Apichai, and Sanhapan Thanamee. “Evidence behind 10,000 steps walking.” J Health Res 31.3 (2017). https://www.thaiscience.info/Journals/Article/JHRE/10985252.pdf
[2] University of Kansas Medical Center, news archive. https://www.kumc.edu/about/news/news-archive/jama-study-ten-thousand-steps.html