
Introduction to Computational Linguistics

Announcements


Course Information

ℹ️

For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA responsible for that assignment. To keep your email out of junk folders, send it from a University of Toronto address with an appropriate subject heading.

When and Where

| Course | Section | Time | Room |
| --- | --- | --- | --- |
| CSC 485 | LEC 0101 | MWF 10-11 | MWF: ES B142 |
| CSC 485/2501 | LEC 0201 | MWF 12-13 | M: MP 137; WF: ES B142 |

Assignments

| Assignment | Due | Lead TA |
| --- | --- | --- |
| Assignment 1 | Thursday, 3rd October | Zixin Zhao |
| Assignment 2 | Thursday, 7th November | Yushi Guan |
| Assignment 3 | Wednesday, 4th December | Jinyue Feng |

Assignment 1: Dependency Parsing

This assignment introduces a classic task tackled with modern methods. You will build neural dependency parsers using several algorithms, become familiar with parsing, and learn the fundamentals of working with neural models in PyTorch.
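In a transition-based dependency parser, a neural model predicts one parser action at a time; the transition system itself can be sketched in plain Python. The snippet below is a minimal illustration of the arc-standard system, not the assignment's actual specification (the algorithms and model architecture are defined in the handout):

```python
def arc_standard(words, actions):
    """Replay an arc-standard transition sequence; return (head, dependent) arcs.

    SHIFT moves the next buffer word onto the stack; LEFT-ARC attaches the
    second-from-top stack item to the top item and pops it; RIGHT-ARC attaches
    the top item to the one below it and pops the top.
    """
    stack, buffer, arcs = [], list(range(len(words))), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            dep = stack.pop(-2)          # item below the top becomes a dependent
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":
            dep = stack.pop()            # top of stack becomes a dependent
            arcs.append((stack[-1], dep))
    return arcs

# "she ate fish": the verb "ate" (index 1) heads both "she" and "fish"
words = ["she", "ate", "fish"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
print(arc_standard(words, actions))  # [(1, 0), (1, 2)]
```

A neural parser replaces the hand-written `actions` list with a classifier that scores the legal transitions at each parser state.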

Assignment 2: Word Sense Disambiguation and Language Model Interpretation

In the first part of the assignment, you will explore various Word Sense Disambiguation (WSD) algorithms, ranging from the Lesk algorithm to BERT, gaining familiarity with lexical semantics and hands-on experience with large-scale transformer models. In the second part, you will explore recent advances in understanding how language models work and what happens inside them.
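The Lesk algorithm's core idea fits in a few lines: pick the sense whose dictionary gloss overlaps most with the surrounding context. Here is a toy sketch with hand-written glosses (the assignment will use a real sense inventory such as WordNet; the glosses and sense names below are illustrative assumptions):

```python
def simplified_lesk(context, glosses):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory for "bank" (illustrative glosses, not WordNet's)
glosses = {
    "bank.financial": "an institution that accepts deposits and lends money",
    "bank.river": "sloping land beside a body of water such as a river",
}
print(simplified_lesk("she sat on the bank of the river", glosses))  # bank.river
```

BERT-based WSD replaces the bag-of-words overlap with comparisons between contextual embeddings, but the disambiguation problem being solved is the same.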

Assignment 3: Building Grammar Rules: Symbolic Machine Translation

In this assignment, you will learn to write phrase structure grammars for several linguistic phenomena in two languages: English and Chinese. The two grammars can be combined into an interlingual machine translation system by parsing in one language and generating in the other. You will use this case study to analyse an interesting linguistic phenomenon: the difference in quantifier scoping between the two languages.
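To make "parsing with a phrase structure grammar" concrete, here is a toy CKY recognizer over a hand-written grammar in Chomsky normal form. This is only a sketch of the underlying idea; the assignment itself uses the TRALE system, and the grammar below is an illustrative assumption, not the assignment's:

```python
from itertools import product

# Toy CNF grammar: (left child, right child) -> parent
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {
    "every": "Det", "a": "Det",
    "student": "N", "paper": "N",
    "read": "V",
}

def cky_recognize(words):
    """CKY recognition: chart[i][j] holds the nonterminals spanning words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])          # fill in lexical categories
    for span in range(2, n + 1):                 # combine adjacent spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    if (left, right) in RULES:
                        chart[i][j].add(RULES[(left, right)])
    return "S" in chart[0][n]

print(cky_recognize("every student read a paper".split()))  # True
```

A generator runs the same grammar in the other direction, enumerating strings from a syntax tree, which is what makes parse-then-generate interlingual translation possible.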

Essays (CSC2501)

| Essay | Due Date | Paper(s) |
| --- | --- | --- |
| 1 | Sep 16 | A. Turing. [Computing Machinery and Intelligence](/teaching/csc485-f24/essay_1.pdf). |
| 2 | Sep 30 | Kurtz et al. (2019). Improving Semantic Dependency Parsing with Syntactic Features. |
| 3 | Oct 16 | Two papers: Raganato et al. (2017). Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison; and Lu and Nguyen (2018). Similar but not the Same: Word Sense Disambiguation Improves Event Detection via Neural Representation Matching. |
| 4 | Nov 15 | Tommi Buder-Gröndahl (2024). What Does Parameter-free Probing Really Uncover? |
| 5 | Nov 25 | Lillian Lee (2022). Fast Context-Free Grammar Parsing Requires Fast Boolean Matrix Multiplication. |

Please monitor the Piazza discussion board. There will be assignment clarifications and hints.

We will use Quercus for non-public materials and the quizzes. You should be automatically enrolled if you’re registered for the course.

Schedule & Class Materials

The schedule will change based on progress and class feedback.

Both textbooks are available online for free, so the physical copies are optional.

Week 1. Introduction

L1: Introduction to Computational Linguistics

T1: PyTorch Review

Week 2. Syntax, Grammar and Parsing.

L2: Introduction to Transformer Language Models

L3: Introduction to Syntax and Parsing

L4: Dependency Grammar & Parsing - Part 1

Week 3. The Turing Test

L5: Dependency Grammar & Parsing - Part 2

L6: The Turing Test and Linguistic Levels

T2: Transition-based Parser and Gap Degree

Week 4. Lexical Semantics

L7: Lexical Semantics - Part 1

T3: Graph-based Parser

L8: Lexical Semantics - Part 2

Week 5.

L9: Word Sense Disambiguation

L10: Vector Semantics - Part 1

L11: Vector Semantics - Part 2

Week 6.

L12: Large Language Models

L13: Interpreting LLMs

T1: A2 Tutorial 1

Week 7.

L14: Interpreting LLMs (Part 2)

I’ve combined the entire LLM interpretation segment into a single slide for simplicity.

T2: A2 Tutorial 2

Week 8.

L15: Interpreting LLMs (Part 3)

L16: Interpreting LLMs & More

T3: A2 Tutorial 3

Week 9. 🏖️ Reading Week 🏖️

No class, work on your assignment and READ!

Week 10.

L17: Syntax and Interpretation

L18: Parsing with Features

T4: A2 Tutorial 4

Week 11.

L19: Parsing with Features

L20: Chart Parsing

A3 T1: TRALE Basics

Week 12.

L21: Chart Parsing (Part 2)

L22: Statistics & Parsing - PP Attachment Disambiguation

A3 T2: Subcategorization & Gap

Week 13.

L23: Statistics & Parsing - Statistical Parsing

L24: Statistics & Parsing - Unsupervised Parsing

A3 T3: Subcategorization & Gap

Week 14.

L25: Question Answering

L26: Statistics & Parsing - Unsupervised Parsing




Tentative Syllabus

Roughly speaking, we will cover one topic per week; some will take more lectures and some fewer. Please let us know which sub-topics interest you in the Course Content Survey!


Course Policies

Prerequisites

🚨

The University’s automatic registration system checks prerequisites: even if you manage to register for the class, you will be dropped from it unless you satisfied the prerequisite before registering or received a prerequisite waiver.

Evaluation Policies

There will be three assignments, several quizzes, and no final exam. CSC 2501 students will also write five essays based on assigned readings.

| Component | CSC 485 | CSC 2501 |
| --- | --- | --- |
| Assignment 1 | 30% | 25% |
| Assignment 2 | 30% | 25% |
| Assignment 3 | 30% | 25% |
| Quizzes | 10% | 10% |
| Course Content Survey | Bonus 1% | Bonus 1% |
| Essays 1-5 | | 3% each |
🚨

No late assignments will be accepted except in case of documented medical or other emergencies.

Remark Requests

Requests for remarking an assignment must be made within seven (7) days of the return of the marked assignment via the remark request online form (link TBA).

Requests for remarking will be reviewed by the head TA or the instructor, who will, if deemed necessary, consult with the grading TA who originally marked the assignment. If the grading TA determines that the original grade was too high, the student’s grade may be lowered; if the original grade was too low, it will be adjusted upward accordingly. Once all remarks are completed, they will be released back to the students. The decision on a remark is final, and no further requests for a remark will be considered. All other departmental and university policies on remarking also apply.

Policy on Collaboration, Online Resources, AI Writing Assistance and Plagiarism

Collaboration on and discussion of quiz content is encouraged. No collaboration on homeworks or essays is permitted. The work you submit must be your own. No student is permitted to discuss or share homeworks with any other student from either this or previous years.

Posting solutions, materials, or handouts from assignments, quizzes, or essays to any public forum (including but not limited to Reddit, GitHub, and GitLab) is strictly prohibited. The use of any unauthorized online materials is also forbidden. Submitting any code or writing that is not your own constitutes an academic offence.

The use of AI writing assistance (ChatGPT, Copilot, etc.) is allowed only for refining the language of your writing: you may use generative models to paraphrase or polish your original content, not to suggest new content. Submitting any code generated by an AI assistant is strictly prohibited. The only exception is when you are specifically instructed to evaluate the behaviour and performance of these models, in which case the models’ output must be clearly distinguished from the rest of the report.

Failure to observe this policy is an academic offence, carrying a penalty ranging from a zero on the homework to suspension from the university. See Academic integrity at the University of Toronto.