Theory of Neural Networks

CSE 60963 | Fall 2024

Time
MWF 11:30am–12:20pm
Location
TBD
Instructor
David Chiang

Introduction to the theory of neural networks: expressivity (what functions a neural network can and cannot compute) and trainability (what functions a neural network can and cannot learn). Neural network architectures covered will include feed-forward, recurrent, convolutional and attention (transformer) neural networks.

This offering of the course will be focused on expressivity of neural networks for sequence data (language models), studying how they relate to theoretical models of computation like automata, Turing machines, logics, and Boolean circuits.

Prerequisites

  • Theory: Students must be familiar with finite automata, Turing machines, and first-order logic, and comfortable reading and writing proofs. This requirement is satisfied by Theory of Computing (CSE 30151) or equivalent.
  • Neural networks: Students should minimally understand feed-forward neural networks and how they are trained by gradient descent (backpropagation). They should ideally be familiar with recurrent neural networks and transformers. This requirement is satisfied by one of the following or permission of the instructor:
    • Neural Networks (CSE 40868/60868)
    • Natural Language Processing (CSE 40657/60657)

Topics

This schedule is still subject to change.

Week Topics
1 Introduction
2 Perceptrons and the XOR problem
3 Feedforward NNs and universal approximation
4 (Tentative) Neural tangent kernel
5 (Tentative) Overparameterization and double descent
6 Recurrent NNs (RNNs) and finite automata
7 RNNs with intermediate steps and Turing machines (and beyond)
8 LSTMs and counter machines
9 Circuit complexity and descriptive complexity
10 State-space models
11 Transformers and the PARITY problem
12 Transformers with intermediate steps and Turing machines
13 Unique-hard attention transformers and counter-free automata
14 Average-hard and soft attention transformers
15 Conclusion

Requirements

Tentatively, the requirements will be as follows. There will be no final exam.

  • Problem sets (50%)
  • Programming assignments (30%)
  • Final project (20%)