Quantitative Data Analysis with Bayesian Statistics
February 1 to May 2, 2024
Lecturers:
- Assistant Professor, Raul Pardo
- Full Professor, Andrzej Wąsowski
Time:
Thursdays from 8:15 to 12:00
Course description:
This course introduces the basics of Bayesian statistics, Bayesian data analysis, Bayesian learning, and the programming tools that enable automation of these methods. The course emphasizes programmable statistical methods over pen-and-paper analysis. During the course, students master data cleaning and formatting, visualization, analysis, and hypothesis testing (Bayesian decision making). Examples and analyses are implemented using the PyMC framework in Python.
The course is suitable for PhD students who collect, analyze, and publish quantitative data in their work (from surveys, experiments, or other sources), and who are willing to use programming during the analysis. PhD students conclude the course by applying Bayesian data analysis to their own research data. This requires studying any additional methods needed to model their particular data analysis problem. The lecturers supervise the PhD students in this project.
PhD students will have the opportunity to use their course work as part of a research publication. The course is not suitable for PhD students in statistics itself, as it is aimed at researchers who use statistics.
We will cover 14 chapters from the textbook. Every week we will summarize a chapter in a lecture, followed by an exercise session on analyzing relevant data cases using PyMC. We expect students to spend about 6-7 hours a week outside class on reading and completing the homework.
- 01/02 Bayesian Modeling and Inference: McElreath Ch. 2 (Ch. 1 is optional). Make sure that you have a functioning Python 3 installation on your computer. Clone/fork the course git repository before the first class; further instructions are in the repository. [teacher: AW]
- 08/02 Regression: McElreath Ch. 3-4 [AW]
- 15/02 Multivariate regression, confounds, categorical variables: McElreath Ch. 5 [AW]
- 22/02 Multicollinearity, post-treatment bias, and collider bias: McElreath Ch. 6 [AW]
- 29/02 (Over/under)-fitting and interactions: McElreath Ch. 7-8 [AW]
- 07/03 Interactions between regressors: McElreath Ch. 8 [AW]
- 14/03 Markov Chain Monte Carlo: McElreath Ch. 9 [RP]
- 21/03 Generalized Linear Model (GLM): McElreath Ch. 10 [RP]
- 04/04 Guest/Application talks (no new chapter)
- 11/04 Binomial, Poisson and multinomial regression: McElreath Ch. 11 [RP]
- 18/04 Over-dispersion, zero-inflated outcomes, ordered categorical models: McElreath Ch. 12 [RP]
- 25/04 Multilevel models: McElreath Ch. 13 [RP]
- 02/05 Models with covariance: McElreath Ch. 14 (Sections 14.1 & 14.2) + exam project publication [RP]
Prerequisites:
Basic skills in Python programming and the basics of probability theory. To participate at PhD level, students must present and integrate relevant data analysis problems from their own research.
Exam [pass/fail]
To obtain PhD credit, all of the following conditions must be met:
- Active participation (physical presence, participating in discussions, etc.)
- Active participation in the supervision meetings with the lecturers on analyzing the relevant research data set
- Studying any additional analysis methods applicable to the relevant research data analysis project
- Handing in a report by September 1 at the latest: 5 pages, corresponding to the data analysis section of a paper
- Delivering a presentation to the student's research group, or to the lecturers, on the student's relevant research data analysis project
Credits: 7.5 ECTS
Number of hours the student is expected to spend on the course:
- approx. 150 hours on weekly lectures
- approx. 70 hours on final project
How to sign up:
To register, send a short email to wasowski@itu.dk with:
1) your name,
2) your affiliation,
3) your PhD supervisor's name,
4) the title of your PhD project, and
5) a very brief description of the data set you would like to analyze in the final project of the course (2-3 sentences suffice).