Introduction to Computational Linguistics
Announcements
- The first tutorial will be held jointly with CSC401 on Fri, Sep 6. Times and locations: 10-11: BA 1180; 11-12: BA 1190; 12-13: ES B142.
- First class on September 4!
- PREREQUISITES: Please read the prerequisite requirements carefully and check to make sure that you have met all of them.
Table of Contents
Open Table of Contents
- Course Information
- When and Where
- Assignments
- Piazza, Quercus and Other Important Links
- Schedule & Class Materials
- Instructions
- Tentative Syllabus
- Course Policies
Course Information
- Instructor: Frank Niu. Prof. Gerald Penn oversees the essays for graduate students.
- Office Hours: MWF 11-12, before/after class, or by appointment.
- Email: csc485-2024-09@cs. (add the toronto.edu suffix).
- TAs: Jinman Zhao, Zixin Zhao, Jinyue Feng, Yushi Guan, Bindu Dash and Devan Srinivasan.
- See more in Course Information.
For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with that assignment. Emails sent from University of Toronto addresses with appropriate subject headings are least likely to be redirected to junk email folders.
When and Where
Course | Section | Time | Room |
---|---|---|---|
CSC 485 | LEC 0101 | MWF 10-11 | MWF: ES B142 |
CSC 485/2501 | LEC 0201 | MWF 12-13 | M: MP 137; WF: ES B142 |
Assignments
Assignment | Due | Lead TA |
---|---|---|
Assignment 1 | Thursday, 3rd October | Zixin Zhao |
Assignment 2 | Thursday, 7th November | Yushi Guan |
Assignment 3 | Wednesday, 4th December | Jinyue Feng |
Assignment 1: Dependency Parsing
This assignment is designed to familiarize you with a classic task while applying cutting-edge methods. You will build neural dependency parsers using several algorithms, become familiar with parsing, and learn the fundamentals of working with neural models in PyTorch.
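To give a concrete (and deliberately simplified) sense of what a neural dependency parser involves, here is a minimal PyTorch sketch, not the assignment's starter code: a small network scores the candidate transitions (SHIFT, LEFT-ARC, RIGHT-ARC) of a transition-based parser given a feature vector describing the current parser configuration. The class name and feature setup are invented for illustration.

```python
# A minimal sketch (not the assignment's starter code) of the core idea behind a
# transition-based neural dependency parser: an MLP scores the candidate
# transitions for the current stack/buffer configuration. The feature vector
# here is random; a real parser would extract word/POS/arc features (or
# contextual embeddings) from the configuration.
import torch
import torch.nn as nn

class TransitionScorer(nn.Module):
    """Maps a configuration feature vector to scores over parser transitions."""

    def __init__(self, n_features: int, hidden: int = 64, n_transitions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_transitions),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Unnormalised scores; train with nn.CrossEntropyLoss against oracle transitions.
        return self.net(feats)

# Toy usage: score a batch of two configurations, each described by 10 features.
scorer = TransitionScorer(n_features=10)
feats = torch.randn(2, 10)
scores = scorer(feats)
predicted = scores.argmax(dim=-1)  # e.g. 0 = SHIFT, 1 = LEFT-ARC, 2 = RIGHT-ARC (a convention chosen here)
print(predicted)
```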
Assignment 2: Word Sense Disambiguation and Language Models Interpretation
In the first part of the assignment, we will explore various Word Sense Disambiguation (WSD) algorithms, ranging from the Lesk algorithm to BERT. You will gain familiarity with semantics and hands-on experience with large-scale transformer models. Then, in the second part of this assignment, you will explore the current advancements in understanding how language models work and what occurs within them.
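For a flavour of the classic end of that spectrum, here is a hedged sketch of the simplified Lesk algorithm using NLTK's WordNet interface; the assignment's actual data, interfaces, and scoring will differ, and the example word and context are chosen arbitrarily.

```python
# A minimal sketch of the simplified Lesk algorithm: choose the WordNet sense
# whose gloss shares the most words with the target word's context.
# Requires NLTK with the WordNet data installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_words):
    """Return the synset of `word` whose definition overlaps most with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = simplified_lesk("bank", "I deposited my pay at the bank near the river".split())
print(sense, "-", sense.definition() if sense else "no sense found")
```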
Assignment 3: Building Grammar Rules: Symbolic Machine Translation
In this assignment, you will learn how to write phrase structure grammars for several linguistic phenomena in two languages: English and Chinese. You can use the two grammars to create an interlingual machine translation system by parsing in one language and generating in the other. You will use this case study to analyse an interesting linguistic phenomenon: the quantifier scoping difference between the two languages.
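As a taste of what writing grammar rules feels like, below is a tiny, hedged phrase structure grammar sketch written with NLTK's CFG tools rather than the grammar development system used in the assignment; the rules and lexicon are invented purely for illustration (and the example sentence happens to be scope-ambiguous).

```python
# A toy phrase structure grammar and chart parser using NLTK (illustration only;
# the assignment uses its own grammar development framework and a richer rule set).
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'every' | 'a'
N  -> 'student' | 'language'
V  -> 'studies'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("every student studies a language".split()):
    tree.pretty_print()  # draws the parse tree as ASCII art
```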
Essays (CSC2501)
Essay | Due Date | Paper(s) |
---|---|---|
1 | Sep 16 | A. Turing. [Computing Machinery and Intelligence](/teaching/csc485-f24/essay_1.pdf). |
2 | Sep 30 | Kurtz et al. (2019) Improving Semantic Dependency Parsing with Syntactic Features. |
3 | Oct 16 | Two papers: - Raganato et al. (2017) Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. - Lu and Nguyen (2018) Similar but not the Same: Word Sense Disambiguation Improves Event Detection via Neural Representation Matching. |
4 | Nov 15 | Tommi Buder-Gröndahl (2024) What Does Parameter-free Probing Really Uncover? |
5 | Nov 25 | Lillian Lee (2022). Fast Context-Free Grammar Parsing Requires Fast Boolean Matrix Multiplication. |
Piazza, Quercus and Other Important Links
Please monitor the Piazza discussion board; assignment clarifications and hints will be posted there.
We will use Quercus for non-public materials and the quizzes. You should be automatically enrolled if you’re registered for the course.
Schedule & Class Materials
The schedule may change based on our progress and class feedback.
- Textbooks for the course:
- [J&M] Jurafsky and Martin, Speech and Language Processing.
- [BK&L] Bird, Klein and Loper, Natural Language Processing with Python.
Both books are available online for free, so the physical copy is optional.
Week 1. Introduction
L1: Introduction to Computational Linguistics
T1: PyTorch Review
- Date: Fri, Sep 6.
- Material: Google Colab.
Week 2. Syntax, Grammar and Parsing.
L2: Introduction to Transformer Language Models
- Date: Mon, Sep 9.
- Readings:
- J&M: Chapter 9.
- Vaswani et al. (2017): Attention Is All You Need.
- Devlin et al. (2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- Zain ul Abideen. Blog post: Attention Is All You Need: The Core Idea of the Transformer.
- (optional) Harvard NLP. Blog post: The Annotated Transformer.
- (optional) Jay Alammar. Blog post: The Illustrated Transformer.
- Slides: Lecture 2: Transformers.
L3: Introduction to Syntax and Parsing
- Date: Wed, Sep 11.
- Readings:
- Slides: Lecture 3: Syntax and Parsing.
L4: Dependency Grammar & Parsing - Part 1
- Date: Fri, Sep 13.
- Readings:
- Slides: Lecture 4: Dependency Parsing (Part 1).
Week 3. The Turing Test
L5: Dependency Grammar & Parsing - Part 2
- Date: Mon, Sep 16.
- Readings:
- J&M 19.3-5.
- Slides: Lecture 5: Dependency Parsing (Part 2).
L6: The Turing Test and Linguistic Levels
- Date: Wed, Sep 18.
- Readings:
- J&M 15.0.
- A. Turing. Computing Machinery and Intelligence.
- (For fun!) Human or Not.
- (More fun) Can You Tell Which Short Story ChatGPT Wrote? NY Times Opinion Podcast.
- Slides: Lecture 6: Turing Test and Linguistic Levels.
T2: Transition-based Parser and Gap Degree
- Date: Fri, Sep 20.
- Slides: A1 Tutorial 2.
Week 4. Lexical Semantics
L7: Lexical Semantics - Part 1
- Date: Mon, Sep 23.
- Readings:
- Slides: Lecture 7: Lexical Semantics.
T3: Graph-based Parser
- Date: Wed, Sep 25.
- Slides: A1 Tutorial 3.
L8: Lexical Semantics - Part 2
- Date: Fri, Sep 27.
- Readings: J&M G.3-4.
- Slides: Gap Degree & Survey.
- Slides: Lecture 8: Word Sense Disambiguation.
Week 5.
L9: Word Sense Disambiguation
- Date: Mon, Sep 30.
- Readings: J&M G.3-4.
- Slides: Lecture 9: Word Sense Disambiguation (cont.).
L10: Vector Semantics - Part 1
- Date: Wed, Oct 2.
- Readings: J&M Chapter 6.
- Slides: Lecture 10: Vector Semantics (Part 1).
L11: Vector Semantics - Part 2
- Date: Fri, Oct 4.
- Readings: J&M Chapter 6.
- Slides: Lecture 11: Vector Semantics (Part 2).
- Resources:
Week 6.
L12: Large Language Models
- Date: Mon, Oct 7.
- Readings: J&M Chapter 10.
- Slides: Lecture 12: Large Language Models.
- Resources:
- How to run GPT2 XL.
- AI can’t cross this line and we don’t know why: A good YouTube explanation of the scaling laws.
L13: Interpreting LLMs
- Date: Wed, Oct 9.
- Readings:
- J&M Chapter 10.
- Clark et al. (2019) What Does BERT Look at? An Analysis of BERT’s Attention.
- Buder-Gröndahl (2024) What does Parameter-free Probing Really Uncover?
- Slides: Lecture 13: Interpreting LLMs.
- Resources:
- Neel Nanda’s Interpretability Reading List.
- How might LLMs store facts: 3Blue1Brown’s summary of recent research on the MLP layer.
T1: A2 Tutorial 1
- Date: Mon, Oct 9.
- Slides: CSC485/2501 A2
Week 7.
L14: Interpreting LLMs (Part 2)
- Date: Wed, Oct 16.
- Readings:
- Understanding Transformer Layers:
- Tenney et al. (2019) BERT Rediscovers the Classical NLP Pipeline.
- Jawahar et al. (2019) What Does BERT Learn about the Structure of Language?
- Niu et al. (2022) Does BERT Rediscover a Classical NLP Pipeline?
- Residual Stream, Causal Tracing, Knowledge Neuron Thesis:
- Elhage et al. (2021) A Mathematical Framework for Transformer Circuits.
- Meng et al. (2022) Locating and Editing Factual Associations in GPT.
- Dai et al. (2022) Knowledge Neurons in Pretrained Transformers.
- Their limitations: Niu et al. (2024) & Li et al. (2024).
- OpenAI (2023): Language models can explain neurons in language models … and Huang et al. (2023): No They Can’t.
- Slides: Lecture 14: Interpreting LLMs. I’ve combined the entire LLM interpretation segment into a single slide deck for simplicity.
- Quiz 7: TransformerLens: A Crash Course.
T2: A2 Tutorial 2
- Date: Fri, Oct 18.
- Slides: Tutorial 2: TransformerLens and Task Vectors.
- Code & Material: task_vector.py.
Week 8.
L15: Interpreting LLMs (Part 3)
- Date: Mon, Oct 21.
- Slides: Lecture 15: Interpreting LLMs.
- Attention Heads Readings:
- Clark et al. (2019) What Does BERT Look at? An Analysis of BERT’s Attention.
- Buder-Gröndahl (2024) What Does Parameter-free Probing Really Uncover?.
- Quiz 8: Knowledge Neuron Suppression.
L16: Interpreting LLMs & More
- Date: Wed, Oct 23.
- Slides: Lecture 16: Interpreting LLMs & More LLM Stuff.
- Induction Heads, Circuits and ICL Readings:
- Olsson et al. (2022) In-context Learning and Induction Heads.
- Jin et al. (2024) Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models.
- Wang et al. (2023) Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small.
- Lindner et al. (2023) Tracr: Compiled Transformers as a Laboratory for Interpretability.
T3: A2 Tutorial 3
- Date: Fri, Oct 25.
Week 9. 🏖️ Reading Week 🏖️
No class, work on your assignment and READ!
Week 10.
L17: Syntax and Interpretation
- Date: Mon, Nov 4.
- Slides: Lecture 17: Syntax and Interpretation.
L18: Parsing with Features
- Date: Wed, Nov 6.
- Slides: Lecture 18: Parsing with Features.
- Resources:
T4: A2 Tutorial 4
- Date: Fri, Nov 8.
Week 11.
L19: Parsing with Features
- Date: Mon, Nov 11.
- Slides: Lecture 19: Parsing with Features - Part 2.
L20: Chart Parsing
- Date: Wed, Nov 13.
- Slides: Lecture 20: Chart Parsing.
A3 T1: TRALE Basics
- Date: Fri, Nov 15.
- Slides: A3 Tutorial 1: TRALE Basics.
- A3 T1 Example Grammars.
Week 12.
L21: Chart Parsing (Part 2)
- Date: Mon, Nov 18.
- Slides: Lecture 21: Chart Parsing.
L22: Statistics & Parsing - PP Attachment Disambiguation
- Date: Wed, Nov 20.
- Slides: Lecture 22: Statistics & Parsing - PP Attachment Disambiguation.
A3 T2: Subcategorization & Gap
- Date: Fri, Nov 22.
- Slides: A3 Tutorial 2.
Week 13.
L23: Statistics & Parsing - Statistical Parsing
- Date: Mon, Nov 25.
- Slides: Lecture 23: Statistics & Parsing - Statistical Parsing.
L24: Statistics & Parsing - Unsupervised Parsing
- Date: Wed, Nov 27.
- Slides: Lecture 24: Statistics & Parsing - Unsupervised Parsing.
A3 T3: Subcategorization & Gap
- Date: Fri, Nov 29.
- Slides: A3 Tutorial 2.
Week 14.
L25: Question Answering
- Date: Mon, Dec 2.
- Slides: Lecture 25: Question Answering - Good Ol’ QA.
L26: Question Answering & Prompt Engineering
- Date: Tue, Dec 3.
- Slides: Lecture 26: Question Answering & Prompt Engineering.
Instructions
Tentative Syllabus
- Introduction to Transformers
- Dependency Grammar & Parsing
- The Turing Test
- Syntax and Interpretation
- Lexical semantics, vector semantics and Word Sense Disambiguation
- Large Language Models
- Interpretation of Large Language Models
- Chart Parsing
- Parsing with Features
- Statistical parsing
- Unsupervised Parsing
- Question Answering
Roughly speaking, we will cover one topic per week; some will take more lectures and some fewer. Please let us know which sub-topics you are interested in via the Course Content Survey!
Course Policies
Prerequisites
- Mandatory: CSC209 and STA237/247/255/257. CSC311 and CSC324/384 are strongly recommended.
- Engineering students may substitute APS105, APS106, ESC180 or CSC180 for the CSC209 requirement (although experience with the Unix operating system is strongly recommended), and/or ECE302, STA286, CHE223, CME263, MIE231/236, MSE238 or ECE286 for the statistics requirement.
The University’s automatic registration system checks for prerequisites: even if you have managed to register for the class, you will be dropped from it unless either you had satisfied the prerequisite before you registered, or you received a prerequisite waiver.
Evaluation Policies
There will be three assignments, several quizzes, and no final exam. CSC2501 students will also write five essays based on assigned readings.
Component | CSC485 | CSC2501 |
---|---|---|
Assignment 1 | 30% | 25% |
Assignment 2 | 30% | 25% |
Assignment 3 | 30% | 25% |
Quizzes | 10% | 10% |
Course Content Survey | Bonus 1% | Bonus 1% |
Essays 1-5 | N/A | 3% each |
No late assignments will be accepted except in case of documented medical or other emergencies.
Remark Requests
Requests for remarking an assignment must be made within seven (7) days of the return of the marked assignment via the remark request online form (link TBA).
Requests for remarking will be reviewed by the head TA or the instructor, who will, if deemed necessary, consult the grading TA who originally marked the assignment. If the grading TA determines that the original grade was too high, the student’s grade may be lowered; if the original grade was too low, the grade will be adjusted accordingly. Once all remarks are completed, they will be released back to the students. The decision on a remark is final, and no further requests for a remark will be considered. Any other departmental and university policies on remarking apply as well.
Policy on Collaboration, Online Resources, AI Writing Assistance and Plagiarism
Collaboration on and discussion of quiz content is encouraged. No collaboration on homework assignments or essays is permitted; the work you submit must be your own. No student is permitted to discuss or share homework assignments with any other student from this or previous years.
Posting solutions, materials, or handouts from assignments, quizzes, or essays to any public forum (including but not limited to Reddit, GitHub, and GitLab) is strictly prohibited. The use of any unauthorized online materials is also forbidden. Submitting any code or writing that is not your own constitutes an academic offence.
The use of AI writing assistance (ChatGPT, Copilot, etc.) is allowed only for refining the language of your writing. You may use generative models to paraphrase or polish your original content, not to suggest new content. Submitting any code generated by AI assistants is strictly prohibited. The only exception is when you are specifically instructed to evaluate the behaviour and performance of these models, in which case the models’ output must be clearly distinguished from the rest of the report.
Failure to observe this policy is an academic offence, carrying a penalty ranging from a zero on the homework to suspension from the university. See Academic integrity at the University of Toronto.