Blog
Background
In late May/early June of 2023, I attended the United States Conference on Teaching Statistics (USCOTS) in State College, Pennsylvania, along with several peers and professors from Cal Poly. USCOTS is a conference that consists of a myriad of workshops, each presented by professors from universities across the United States (and a few from abroad!). There were also various presentations and opportunities for discussion with colleagues interested in similar areas of statistics and statistics education. The theme of USCOTS 2023 was communicating with and about data.
Day One
On day one, I helped facilitate a workshop run by three Cal Poly professors (Dr. Glanz, Dr. Theobold, and Dr. Robinson) by live-testing questions from participants and troubleshooting participants’ issues as they arose.

The workshop covered Quarto, a versatile tool for creating reproducible documents that integrate code and text. As a matter of fact, my website is built entirely in Quarto. We focused on using Quarto as a classroom resource: not only as a way for professors to create course materials such as presentations or websites, but also as a way for students to create reproducible documents for class assignments.
Day Two
On day two, I attended two workshops, each of which focused on improving student learning outcomes using nontraditional methodologies.
Improving students’ communication about data using online statistical games
This workshop introduced a handful of online games designed to teach statistical concepts while keeping students engaged in the learning process. We played two of them:
- A crop-growing game that teaches students about random variation, blocking, and linear modelling.
- A racing game that teaches students about outliers, paired data, and confounding.
If you’re interested in exploring any of these games, please check out their resources here: https://www.stat2games.sites.grinnell.edu
Communicating progress in a statistics course through non-traditional grading
This workshop discussed non-traditional, qualitative grading schemes that de-emphasize numeric scores on particular assignments or exams and instead focus on giving comprehensive feedback and allowing students to resubmit assignments multiple times. I think this approach is particularly interesting in the era of generative AI, as traditional grading methods may be more susceptible to abuse by students using services such as ChatGPT or Google Bard.
Day Three / Four
Days Three and Four consisted of a mixture of a few types of events:
Keynote Presentations
In keynote presentations at USCOTS 2023, one speaker presented to the entirety of the conference attendees. These presentations focused heavily on the conference theme, providing a diverse set of perspectives on how statisticians and data scientists can enhance their communication skills with and about data. My favorite of the keynotes was Larry Lesser’s presentation on ‘edutainment’, a teaching paradigm that treats students’ entertainment, well-being, and engagement as paramount.
Breakout Sessions
Breakout sessions were short, fast-paced dialogues between presenters and attendees. During a given time slot, there were about 10 different breakout sessions one could attend based on their interests. Here are a few highlight sessions I attended:
- Data Science as a general education course
  - In this session hosted by Hunter Glanz, we discussed data science as a terminal general education course (one not meant to lead into another course) and how to introduce data science topics to a general audience. Given that data science is a complex and nuanced subject often taught after prerequisite courses, it is not obvious which topics deserve coverage in a terminal course that may be taught to anyone, STEM or non-STEM. Dr. Glanz will be releasing a book by the name ‘Data For All’ (which I will contribute to during my graduate studies!) that is meant to do exactly this, so be on the lookout!
- What makes a compelling visualization?
  - This session hosted by Beth Chance and Emily Robinson discussed how to create informative and communicative visualizations, touching on both what to do and what not to do. It sparked a great dialogue that gave me further insight into how to make compelling visuals at both an introductory and advanced level.
Poster Talks
During poster talks, dozens of participants presented their novel research on countless topics, ranging from phone apps meant to teach statistics to research on medieval-era data. In these talks, I was able to converse one-on-one with the various participants and ask (too many) questions about what they had done. My personal favorite of these posters taught students to compare traditional statistical modelling methodologies to some contemporary algorithmic techniques.
Conclusion
This was the first conference I have ever attended. Although I would say I experienced a bit of information overload, I would not trade the experience for anything. I met many wonderful individuals and was welcomed with open arms to the statistics education community. I especially appreciated that although 99% (forgive my lackadaisical estimation) of attendees either had or were pursuing their PhDs, I was treated as an equal. Thank you to the organizers, presenters, sponsors, and attendees for providing me with a truly unforgettable experience.
Background
ISI-BUDS is a 6-week program in biostatistics and data science hosted by UC Irvine. It is part of a larger set of 10 programs across the United States sponsored by both the National Institute of Allergy and Infectious Diseases and the National Heart, Lung, and Blood Institute. I attended this program in summer of 2023, along with 14 other undergraduate students from all over the country.
Educational Content
The first three weeks of the program were dedicated to preparing us for the eventual project work through a combination of lectures and labs on a wide array of concepts. Much of the content was familiar to me, but I learned some new highly valuable skills that I wasn’t anticipating. A non-exhaustive list of the content we covered is as follows:
- Data Collection
  - Interplay between science and statistics
  - Study designs
- Coding
  - R & Tidyverse
  - Reproducibility
  - Data Manipulation
  - Data Visualization
- Monte Carlo
  - Integration
  - Methods for Inference
  - Importance Sampling
  - R & Tidyverse
- Theory
  - Probability Theory
  - Statistical distributions
- Modeling
  - Generalized Linear Models
  - Mixed Effects Models
- Covariates
  - Adjustments
  - Interpretations
  - Mathematical Implications
Alzheimer’s Disease Project
The last three weeks of the program consisted of applying the skills covered in the prior weeks to a research project in the area of biostatistics. Each student was assigned to one of four projects based on their interest in each particular research area. I was placed on a project with three other students that investigated the research attitudes of populations of particular interest to Alzheimer’s Disease research.
Throughout this project, we learned a new paradigm for doing research that challenged our preconceived ideas of how research is done. This paradigm focuses much more on the work that takes place prior to modeling, placing a much larger emphasis on formulating the model specification in accordance with the research question of interest. With the reproducibility crisis in mind, our mentors made sure we carefully documented which decisions were made before, and which after, seeing the data and fitting models, so that our analyses were clearly specified.
Our primary mentor for our research project was Dan Gillen, the head of the Statistics department at UC Irvine. We also worked with Josh Grill, Ira Lott, and Eric Doran for expertise in the Alzheimer’s Disease and Down Syndrome communities. I cannot begin to stress how impactful Dan Gillen’s mentorship was on my statistical thinking over the course of the program. Although I have a solid foundation in the principles of statistics, Dr. Gillen challenged me to think about things in a new light and provided invaluable guidance throughout our research project. Also, a special thank-you to Thuy Lu, our PhD student advisor, who assisted us at every turn and helped streamline the learning and research process.

Our research project culminated in a presentation and research paper. The presentation allowed us to communicate our findings and practice our statistical communication skills with active members of the statistics community, which is a unique audience I don’t often have the chance to engage with. All of the presentations given by the other groups were insightful and meaningful, ranging from research on colorectal cancer to managing HIV for women in India.

Outside the classroom
In addition to the rigorous and rewarding academic program we were participating in, there was ample time to socialize and connect with the other undergraduate students attending the program. ISI-BUDS provided each student with a free ARC (Anteater Recreation Center) membership that granted access to all facilities. Over the course of the program, we played plenty of sports, including basketball, volleyball, tennis, and spikeball, and went swimming.
The coordinators of ISI-BUDS were also generous enough to plan weekend expeditions for us, including kayaking and whale watching! These events brought our cohort closer together and bolstered our collaboration with one another throughout the program.

Conclusion
Overall, ISI-BUDS was a fantastic program that further developed my technical skills while also introducing me to a community of like-minded statisticians that I hope to remain in contact with moving forward. Thank you to the directors of the program, the lecturers and mentors that taught many valuable lessons, and the other undergraduate students that made the experience unforgettable.

Fall
I began my journey at UC Irvine in September of 2024, ready to formalize my understanding of statistics and expand the boundaries of my knowledge. During this time, I worked with my advisor to submit an application for the NSF GRFP, proposing to work on an ongoing survival analysis project with applications to Alzheimer’s Disease. Ultimately, we were not funded, but writing the grant proposal was essential practice, and it helped me review the literature in this research area.

During the Fall term, I completed the first two classes of two parallel class sequences: one focusing on mathematical theory and the other emphasizing modeling and application. The theory class reviewed probability, common probability distributions, univariate and bivariate variable transformations, moment-generating functions, weak convergence, the law of total expectation/variance, order statistics, and Poisson processes. The applied class, on the other hand, detailed the basic linear model through matrix representation, decomposition of the variance, individual t-tests, full and partial F-tests, interpretations of parameters, and model selection.
I had my first opportunity to work with students as a teaching assistant (TA) for an introductory statistics course, my “bread and butter” as a regular tutor. It was rewarding giving students an encouraging first exposure to statistics in the classroom!
By the end of the term, I had overcome great setbacks (that first theory midterm didn’t go so well!), but I knew that the most challenging times were yet to come, especially with the end-of-year qualifying exams looming in June.
Winter
In January, I returned to Irvine after the holiday season with a renewed drive to understand the upcoming material, notably because my advisor was teaching the theory class. It was make-or-break time.

The theory class in Winter expanded upon ideas from Fall, introducing more types of convergence and related theorems, asymptotic distributions, point estimation via Method of Moments Estimators (MMEs) and Maximum Likelihood Estimators (MLEs), Uniformly Minimum-Variance Unbiased Estimators (UMVUEs), and hypothesis testing. The applied class generalized concepts from the basic linear models class, introducing the theory and application of Generalized Linear Models (GLMs) and some survival analysis models. Additionally, I enrolled in a computational statistics class covering matrix decomposition methods, importance sampling, Markov Chain Monte Carlo (MCMC), and hidden Markov models.
Seeing the mathematical foundations and their applications to complex statistical problems helped me better understand the motivation and limitations of statistical methods. As my studies in the history of statistics continue, my appreciation for the development of new methodologies over time and their underlying assumptions grows.
Spring
In April, five friends from my cohort and I began drafting our 3-month study plan to tackle the qualifying exams. We decided that, each and every week up until the exam, we would collaboratively complete one exam from a previous year. This divide-and-conquer strategy allowed us to cover more problems than any of us would have been able to complete individually, making our study time more efficient. In addition to studying regularly for the upcoming exam, I was enrolled in three more classes meant to round out my knowledge base of statistics.

The first of my three classes was on infectious diseases, exploring infectious disease modeling through SIR models, Markov chains, Ordinary Differential Equation (ODE) models, branching processes, and importance sampling. The theory class in the Spring term returned to linear models, where we worked heavily with proofs related to estimability, Best Linear Unbiased Estimators (BLUEs), Gauss-Markov theory, Iteratively Reweighted Least Squares (IRLS), quadratic forms, and asymptotic distributions. The applied class extended and formalized ideas from generalized linear models to correlated data through the use of Generalized Linear Mixed Models (GLMMs) and Generalized Estimating Equations (GEEs), with a heavy reliance upon estimating equation theory.
I’m not sure if it was the difficulty of the classes, my focus on the impending qualifying exam, or that it was my first exposure to many of these ideas, but I found content from this term harder to grasp than that from previous terms. At any rate, I look forward to revisiting the concepts to learn them better.
Qualifying Exams
The qualifying exams for the UC Irvine statistics department can be broken down into two parts: the theory exam and the data analysis. Given my applied background from my time at Cal Poly, I believed that I would perform better on the data analysis and worse on the theory. After mentally preparing for almost a year and actively studying for almost 3 months, I felt reasonably confident going into the exam, hoping but not expecting to pass both.

Midway through the theory exam on test day, I was panicked but not drowning. I was drawing a blank on a very important question that I expected to be able to answer, forcing me to work on a question I wasn’t sure I knew how to answer. In the last 20 minutes of the three-hour time slot, I eked out several breakthroughs on a few questions, earning myself a smooth pass on the theory exam (though I didn’t find out my results until weeks later). After turning in our theory exams, we were given a project specification for the data analysis and a week to complete it.
The data analysis specification tasked us with answering four questions of varying scientific and technical complexity about message interventions on wearable devices to promote increased activity levels in an experimental context. My analysis employed a few different tools, including a Poisson GLM, GEEs, cross-validated LASSO regression, and a linear mixed-effects (LME) model. Although my initial submission for this analysis addressed most of the scientific questions posed, it fell short in some key places. After receiving feedback on these shortcomings, I revised my analysis and resubmitted, eventually earning a pass on this portion of the exam as well.
These exams allowed me to assess my understanding of the first year of statistical content and employ the methodologies I learned about in a real experimental context to gain real insight. They marked an important inflection point in my time as a PhD student, as I begin to transition away from the traditional student role into that of a researcher exploring new ideas. The first year went by in a blink; just four more to go!