Title: Data Analysis: Statistical Modeling and Computation in Applications

Author for citation: Stefanie Jegelka, Caroline Uhler, and Karene Chu

License for content: Unknown

Publication date: 2023

This MIT-created course is offered on the edX platform. The 16-week course is designed to help learners combine the "foundational and practical skills" of data science "with domain knowledge to ask and answer questions using real data." The course is free to take and is part of the university's MicroMasters program. A verified certificate of completion is available for $300 USD.

The edX course description:

"Data science requires multi-disciplinary skills ranging from mathematics, statistics, machine learning, problem solving to programming, visualization, and communication skills. In this course, learners will combine these foundational and practical skills with domain knowledge to ask and answer questions using real data.

This course will start with a review of common statistical and computational tools such as hypothesis testing, regression, and gradient descent methods. Then, learners will study common models and methods to analyze specific types of data in four different domain areas:

  • Epigenetic Codes and Data Visualization
  • Criminal Networks and Network Analysis
  • Prices, Economics and Time Series
  • Environmental Data and Spatial Statistics

Learners will be guided to analyze a real data set from each of these areas of focus, and present their findings in written reports. They will also discuss relevant and practical issues with peers.

What you'll learn:

  • Model, form hypotheses, perform statistical analysis on real data
  • Use dimension reduction techniques such as principal component analysis to visualize high-dimensional data and apply this to genomics data
  • Analyze networks (e.g. social networks) and use centrality measures to describe the importance of nodes, and apply this to criminal networks
  • Model time series using moving average, autoregressive and other stationary models for forecasting with financial data
  • Use Gaussian processes to model environmental data and make predictions
  • Communicate analysis results effectively"

About the authors

The course is taught by:

  • Stefanie Jegelka, X-Consortium Career Development Associate Professor at MIT EECS. "Her research is in algorithmic machine learning, and spans modeling, optimization algorithms, theory and applications. In particular, she has been working on exploiting mathematical structure for discrete and combinatorial machine learning problems, for robustness and for scaling machine learning algorithms. Her research is supported by a Sloan Research Fellowship, an NSF CAREER Award, a DARPA Young Faculty Award, an NSF BIGDATA, an Adobe Research award, an STL award and other awards by NSF and DARPA. Previously, she was also supported by a Google Research Award and an MIT RSC award."
  • Caroline Uhler, Henry L. & Grace Doherty Associate Professor at MIT. "Caroline Uhler joined the MIT faculty in 2015 as an assistant professor in EECS and IDSS. She holds an MSc in Mathematics, a BSc in Biology, and an MEd in High School Mathematics Education from the University of Zurich. She obtained her PhD in Statistics from UC Berkeley in 2011. Before joining MIT, she spent short postdoctoral positions at the Institute for Mathematics and its Applications at the University of Minnesota and at ETH Zurich, and 3 years as an assistant professor at IST Austria. Her research focuses on mathematical statistics and computational biology, in particular on graphical models, causal inference and algebraic statistics, and on applications to learning gene regulatory networks and the development of geometric models for the organization of chromosomes."
  • Karene Chu, Digital Learning Scientist and Research Scientist at MIT. "Karene Chu received her Ph.D. in mathematics from the University of Toronto in 2012. Since then she has been a postdoctoral fellow first at the University of Toronto/Fields Institute, and then at MIT, with research focus on knot theory. She has taught single and multi-variable calculus, and linear algebra at the University of Toronto where she received a teaching award."

General layout and contents of the course

No pre-enrollment syllabus is available for this course, so its section-by-section layout is unknown.

The course

The course can be found on the edX site, under the Data Analysis & Statistics category. A session started January 24 and ends on May 19, 2023. The free audit track expires May 16.