Introduction to Computational Linguistics
Announcements
- The first tutorial will be held jointly with CSC401 on Fri, Sep 6. Times and locations: 10-11: BA 1180; 11-12: BA 1190; 12-13: ES B142.
- First class on September 4!
- PREREQUISITES: Please read the prerequisite requirements carefully and check to make sure that you have met all of them.
Table of Contents
Open Table of Contents
- Course Information
- When and Where
- Assignments
- Piazza, Quercus and Other Important Links
- Schedule & Class Materials
- Instructions
- Tentative Syllabus
- Course Policies
Course Information
- Instructor: Frank Niu. Prof. Gerald Penn oversees the essays for graduate students.
- Office Hours: MWF 11-12, before/after class, or by appointment.
- Email: csc485-2024-09@cs. (add the toronto.edu suffix).
- TAs: Jinman Zhao, Zixin Zhao, Jinyue Feng, Yushi Guan, Bindu Dash and Devan Srinivasan.
- See more in Course Information.
For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with that assignment. Emails sent from University of Toronto addresses with appropriate subject headings are least likely to be redirected to junk email folders.
When and Where
Course | Section | Time | Room |
---|---|---|---|
CSC 485 | LEC 0101 | MWF 10-11 | MWF: ES B142 |
CSC 485/2501 | LEC 0201 | MWF 12-13 | M: MP 137; WF: ES B142 |
Assignments
Assignment | Due | Lead TA |
---|---|---|
Assignment 1 | Thursday, 3rd October | Zixin Zhao |
Assignment 2 | Thursday, 7th November | Yushi Guan |
Assignment 3 | Wednesday, 4th December | Jinyue Feng |
Assignment 1: Dependency Parsing
This assignment is designed to familiarize you with a classic task while applying cutting-edge methods. You will build neural dependency parsers using several algorithms, become familiar with parsing, and learn the fundamentals of working with neural models in PyTorch.
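To give a concrete (and deliberately simplified) sense of what a neural dependency parser involves, here is a minimal PyTorch sketch, not the assignment's starter code: a small network scores the candidate transitions (SHIFT, LEFT-ARC, RIGHT-ARC) of a transition-based parser given a feature vector describing the current parser configuration. The class name and feature setup are invented for illustration.

```python
# A minimal sketch (not the assignment's starter code) of the core idea behind a
# transition-based neural dependency parser: an MLP scores the candidate
# transitions for the current stack/buffer configuration. The feature vector
# here is random; a real parser would extract word/POS/arc features (or
# contextual embeddings) from the configuration.
import torch
import torch.nn as nn

class TransitionScorer(nn.Module):
    """Maps a configuration feature vector to scores over parser transitions."""

    def __init__(self, n_features: int, hidden: int = 64, n_transitions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_transitions),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Unnormalised scores; train with nn.CrossEntropyLoss against oracle transitions.
        return self.net(feats)

# Toy usage: score a batch of two configurations, each described by 10 features.
scorer = TransitionScorer(n_features=10)
feats = torch.randn(2, 10)
scores = scorer(feats)
predicted = scores.argmax(dim=-1)  # e.g. 0 = SHIFT, 1 = LEFT-ARC, 2 = RIGHT-ARC (a convention chosen here)
print(predicted)
```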
Assignment 2: Word Sense Disambiguation and Language Models Interpretation
In the first part of the assignment, we will explore various Word Sense Disambiguation (WSD) algorithms, ranging from the Lesk algorithm to BERT. You will gain familiarity with semantics and hands-on experience with large-scale transformer models. Then, in the second part of this assignment, you will explore the current advancements in understanding how language models work and what occurs within them.
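For a flavour of the classic end of that spectrum, here is a hedged sketch of the simplified Lesk algorithm using NLTK's WordNet interface; the assignment's actual data, interfaces, and scoring will differ, and the example word and context are chosen arbitrarily.

```python
# A minimal sketch of the simplified Lesk algorithm: choose the WordNet sense
# whose gloss shares the most words with the target word's context.
# Requires NLTK with the WordNet data installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_words):
    """Return the synset of `word` whose definition overlaps most with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = simplified_lesk("bank", "I deposited my pay at the bank near the river".split())
print(sense, "-", sense.definition() if sense else "no sense found")
```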
Assignment 3: Building Grammar Rules: Symbolic Machine Translation
In this assignment, you will learn how to write phrase structure grammars for several linguistic phenomena in two languages: English and Chinese. You can use the two grammars to create an interlingual machine translation system by parsing in one language and generating in the other. You will use this case study to analyse an interesting linguistic phenomenon: the quantifier scoping difference between the two languages.
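As a taste of what writing grammar rules feels like, below is a tiny, hedged phrase structure grammar sketch written with NLTK's CFG tools rather than the grammar development system used in the assignment; the rules and lexicon are invented purely for illustration (and the example sentence happens to be scope-ambiguous).

```python
# A toy phrase structure grammar and chart parser using NLTK (illustration only;
# the assignment uses its own grammar development framework and a richer rule set).
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'every' | 'a'
N  -> 'student' | 'language'
V  -> 'studies'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("every student studies a language".split()):
    tree.pretty_print()  # draws the parse tree as ASCII art
```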
Essays (CSC2501)
Essay | Due Date | Paper(s) |
---|---|---|
1 | Sep 16 | A. Turing. [Computing Machinery and Intelligence](/teaching/csc485-f24/essay_1.pdf). |
2 | Sep 30 | Kurtz et al. (2019) Improving Semantic Dependency Parsing with Syntactic Features. |
3 | Oct 16 | Two papers: - Raganato et al. (2017) Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. - Lu and Nguyen (2018) Similar but not the Same: Word Sense Disambiguation Improves Event Detection via Neural Representation Matching. |
4 | Nov 15 | Tommi Buder-Gröndahl (2024) What Does Parameter-free Probing Really Uncover? |
5 | Nov 25 | Lillian Lee (2022). Fast Context-Free Grammar Parsing Requires Fast Boolean Matrix Multiplication. |
Piazza, Quercus and Other Important Links
Please monitor the Piazza discussion board; assignment clarifications and hints will be posted there.
We will use Quercus for non-public materials and the quizzes. You should be automatically enrolled if you’re registered for the course.
Schedule & Class Materials
The schedule may change based on our progress and class feedback.
- Textbooks for the course:
- [J&M] Jurafsky and Martin, Speech and Language Processing.
- [BK&L] Bird, Klein and Loper, Natural Language Processing with Python.
Both books are available online for free, so the physical copy is optional.
Week 1. Introduction
L1: Introduction to Computational Linguistics
T1: PyTorch Review
- Date: Fri, Sep 6.
- Material: Google Colab.
Week 2. Syntax, Grammar and Parsing.
L2: Introduction to Transformer Language Models
- Date: Mon, Sep 9.
- Readings:
- J&M: Chapter 9.
- Vaswani et al. (2017): Attention Is All You Need.
- Devlin et al. (2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- Zain ul Abideen. Blog post: Attention Is All You Need: The Core Idea of the Transformer.
- (optional) Harvard NLP. Blog post: The Annotated Transformer.
- (optional) Jay Alammar. Blog post: The Illustrated Transformer.
- Slides: Lecture 2: Transformers.
L3: Introduction to Syntax and Parsing
- Date: Wed, Sep 11.
- Readings:
- Slides: Lecture 3: Syntax and Parsing.
L4: Dependency Grammar & Parsing - Part 1
- Date: Fri, Sep 13.
- Readings:
- Slides: Lecture 4: Dependency Parsing (Part 1).
Week 3. The Turing Test
L5: Dependency Grammar & Parsing - Part 2
- Date: Mon, Sep 16.
- Readings:
- J&M 19.3-5.
- Slides: Lecture 5: Dependency Parsing (Part 2).
L6: The Turing Test and Linguistic Levels
- Date: Wed, Sep 18.
- Readings:
- J&M 15.0.
- A. Turing. Computing Machinery and Intelligence.
- (For fun!) Human or Not.
- (More fun) Can You Tell Which Short Story ChatGPT Wrote? NY Times Opinion Podcast.
- Slides: Lecture 6: Turing Test and Linguistic Levels.
T2: Transition-based Parser and Gap Degree
- Date: Fri, Sep 20.
- Slides: A1 Tutorial 2.
Week 4. Lexical Semantics
L7: Lexical Semantics - Part 1
- Date: Mon, Sep 23.
- Readings:
- Slides: Lecture 7: Lexical Semantics.
T3: Graph-based Parser
- Date: Wed, Sep 25.
- Slides: A1 Tutorial 3.
L8: Lexical Semantics - Part 2
- Date: Fri, Sep 27.
- Readings: J&M G.3-4.
- Slides: Gap Degree & Survey.
- Slides: Lecture 8: Word Sense Disambiguation.
Week 5.
L9: Word Sense Disambiguation
- Date: Mon, Sep 30.
- Readings: J&M G.3-4.
- Slides: Lecture 9: Word Sense Disambiguation (cont.).
L10: Vector Semantics - Part 1
- Date: Wed, Oct 2.
- Readings: J&M Chapter 6.
- Slides: Lecture 10: Vector Semantics (Part 1).
L11: Vector Semantics - Part 2
- Date: Fri, Oct 4.
- Readings: J&M Chapter 6.
- Slides: Lecture 11: Vector Semantics (Part 2).
- Resources:
Week 6.
L12: Large Language Models
- Date: Mon, Oct 7.
- Readings: J&M Chapter 10.
- Slides: Lecture 12: Large Language Models.
- Resources:
- How to run GPT2 XL.
- AI can’t cross this line and we don’t know why: A good YouTube explanation of the scaling laws.
L13: Interpreting LLMs
- Date: Wed, Oct 9.
- Readings:
- J&M Chapter 10.
- Clark et al. (2019) What Does BERT Look at? An Analysis of BERT’s Attention.
- Buder-Gröndahl (2024) What does Parameter-free Probing Really Uncover?
- Slides: Lecture 13: Interpreting LLMs.
- Resources:
- Neel Nanda’s Interpretability Reading List.
- How might LLMs store facts: 3Blue1Brown’s summary of recent research on the MLP layer.
T1: A2 Tutorial 1
- Date: Mon, Oct 9.
- Slides: CSC485/2501 A2
Week 7.
L14: Interpreting LLMs (Part 2)
- Date: Wed, Oct 16.
- Readings:
- Understanding Transformer Layers:
- Tenney et al. (2019) BERT Rediscovers the Classical NLP Pipeline.
- Jawahar et al. (2019) What Does BERT Learn about the Structure of Language?
- Niu et al. (2022) Does BERT Rediscover a Classical NLP Pipeline?
- Residual Stream, Causal Tracing, Knowledge Neuron Thesis:
- Elhage et al. (2021) A Mathematical Framework for Transformer Circuits.
- Meng et al. (2022) Locating and Editing Factual Associations in GPT.
- Dai et al. (2022) Knowledge Neurons in Pretrained Transformers.
- Their limitations: Niu et al. (2024) & Li et al. (2024).
- OpenAI (2023): Language models can explain neurons in language models … and Huang et al. (2023): No They Can’t.
- Slides: Lecture 14: Interpreting LLMs. I’ve combined the entire LLM interpretation segment into a single slide deck for simplicity.
- Quiz 7: TransformerLens: A Crash Course.
T2: A2 Tutorial 2
- Date: Fri, Oct 18.
- Slides: Tutorial 2: TransformerLens and Task Vectors.
- Code & Material: task_vector.py.
Week 8.
L15: Interpreting LLMs (Part 3)
- Date: Mon, Oct 21.
- Slides: Lecture 15: Interpreting LLMs.
- Attention Heads Readings:
- Clark et al. (2019) What Does BERT Look at? An Analysis of BERT’s Attention.
- Buder-Gröndahl (2024) What Does Parameter-free Probing Really Uncover?.
- Quiz 8: Knowledge Neuron Suppression.
L16: Interpreting LLMs & More
- Date: Wed, Oct 23.
- Slides: Lecture 16: Interpreting LLMs & More LLM Stuff.
- Induction Heads, Circuits and ICL Readings:
- Olsson et al. (2022) In-context Learning and Induction Heads.
- Jin et al. (2024) Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models.
- Wang et al. (2023) Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small.
- Lindner et al. (2023) Tracr: Compiled Transformers as a Laboratory for Interpretability.
T3: A2 Tutorial 3
- Date: Fri, Oct 25.
Week 9. 🏖️ Reading Week 🏖️
No class, work on your assignment and READ!
Week 10.
L17: Syntax and Interpretation
- Date: Mon, Nov 4.
- Slides: Lecture 17: Syntax and Interpretation.
L18: Parsing with Features
- Date: Wed, Nov 6.
- Slides: Lecture 18: Parsing with Features.
- Resources:
T4: A2 Tutorial 4
- Date: Fri, Nov 8.
Week 11.
L19: Parsing with Features
- Date: Mon, Nov 11.
- Slides: Lecture 19: Parsing with Features - Part 2.
L20: Chart Parsing
- Date: Wed, Nov 13.
- Slides: Lecture 20: Chart Parsing.
A3 T1: TRALE Basics
- Date: Fri, Nov 15.
- Slides: A3 Tutorial 1: TRALE Basics.
- A3 T1 Example Grammars.
Week 12.
L21: Chart Parsing (Part 2)
- Date: Mon, Nov 18.
- Slides: Lecture 21: Chart Parsing.
L22: Statistics & Parsing - PP Attachment Disambiguation
- Date: Wed, Nov 20.
- Slides: Lecture 22: Statistics & Parsing - PP Attachment Disambiguation.
A3 T2: Subcategorization & Gap
- Date: Fri, Nov 22.
- Slides: A3 Tutorial 2.
Week 13.
L23: Statistics & Parsing - Statistical Parsing
- Date: Mon, Nov 25.
- Slides: Lecture 23: Statistics & Parsing - Statistical Parsing.
L24: Statistics & Parsing - Unsupervised Parsing
- Date: Wed, Nov 27.
- Slides: Lecture 24: Statistics & Parsing - Unsupervised Parsing.
A3 T3: Subcategorization & Gap
- Date: Fri, Nov 29.
- Slides: A3 Tutorial 2.
Week 14.
L25: Question Answering
- Date: Mon, Dec 2.
- Slides: Lecture 25: Question Answering - Good Ol’ QA.
L26: Question Answering & Prompt Engineering
- Date: Tue, Dec 3.
- Slides: Lecture 26: Question Answering & Prompt Engineering.
Instructions
Tentative Syllabus
- Introduction to Transformers
- Dependency Grammar & Parsing
- The Turing Test
- Syntax and Interpretation
- Lexical semantics, vector semantics and Word Sense Disambiguation
- Large Language Models
- Interpretation of Large Language Models
- Chart Parsing
- Parsing with Features
- Statistical parsing
- Unsupervised Parsing
- Question Answering
Roughly speaking, we will cover one topic per week; some will take more lectures and some fewer. Please let us know which sub-topics you are interested in via the Course Content Survey!
Course Policies
Prerequisites
- Mandatory: CSC209 and STA237/247/255/257. CSC311 and CSC324/384 are strongly recommended.
- Engineering students may substitute APS105, APS106, ESC180 or CSC180 for the CSC209 requirement (although experience with the Unix operating system is strongly recommended), and/or ECE302, STA286, CHE223, CME263, MIE231/236, MSE238 or ECE286 for the statistics requirement.
The University’s automatic registration system checks for prerequisites: even if you have managed to register for the class, you will be dropped from it unless either you had satisfied the prerequisite before you registered, or you received a prerequisite waiver.
Evaluation Policies
There will be three assignments, several quizzes, and no final exam. CSC2501 students will also write five essays based on assigned readings.
Component | CSC485 | CSC2501 |
---|---|---|
Assignment 1 | 30% | 25% |
Assignment 2 | 30% | 25% |
Assignment 3 | 30% | 25% |
Quizzes | 10% | 10% |
Course Content Survey | Bonus 1% | Bonus 1% |
Essays 1-5 | N/A | 3% each |
No late assignments will be accepted except in case of documented medical or other emergencies.
Remark Requests
Requests for remarking an assignment must be made within seven (7) days of the return of the marked assignment via the remark request online form (link TBA).
Requests for remarking will be reviewed by the head TA or the instructor, who will, if deemed necessary, consult the grading TA who originally marked the assignment. If the grading TA determines that the original grade was too high, the student’s grade may be lowered; if the original grade was too low, the grade will be adjusted accordingly. Once all remarks are completed, they will be released back to the students. The decision on a remark is final, and no further requests for a remark will be considered. Any other departmental and university policies on remarking apply as well.
Policy on Collaboration, Online Resources, AI Writing Assistance and Plagiarism
Collaboration on and discussion of quiz content is encouraged. No collaboration on homework assignments or essays is permitted; the work you submit must be your own. No student is permitted to discuss or share homework assignments with any other student from this or previous years.
Posting solutions, materials, or handouts from assignments, quizzes, or essays to any public forum (including but not limited to Reddit, GitHub, and GitLab) is strictly prohibited. The use of any unauthorized online materials is also forbidden. Submitting any code or writing that is not your own constitutes an academic offence.
The use of AI writing assistance (ChatGPT, Copilot, etc.) is allowed only for refining the language of your writing. You may use generative models to paraphrase or polish your original content, not to suggest new content. Submitting any code generated by AI assistants is strictly prohibited. The only exception is when you are specifically instructed to evaluate the behaviour and performance of these models, in which case the models’ output must be clearly distinguished from the rest of the report.
Failure to observe this policy is an academic offence, carrying a penalty ranging from a zero on the homework to suspension from the university. See Academic integrity at the University of Toronto.