CS 505 -- Natural Language Processing
Wayne Snyder
Associate Professor of Computer Science
Cell: 617 966 (2^10+41)
Email: waysnyder@gmail.com
www.cs.bu.edu/fac/snyder/cs505/
Prerequisites
CS 131, CS 132, CS 237 or equivalent (discuss with me). It is also expected that you have
experience in Python programming equivalent to CS 111. Note that,
contrary to the Bulletin description, CS 365 is not
required this semester.
Description
Natural language processing (NLP) is a field of
Artificial Intelligence which aims to equip computers with the
ability to process natural (human) language. The course will
explore modern quantitative techniques for the automatic analysis of
natural language data using large corpora and statistical, machine
learning, and deep learning models. Although techniques from
linguistics (e.g., phonetics, morphology, grammars, semantics) are clearly
relevant to this subject, we do not assume any prior background in
linguistics, and we shall focus on topics which do not involve a
significant linguistics component. The emphasis will be on textual
corpora in English, but I hope to present an overview of speech
processing as the final topic in the course.
Course Materials and Handouts
- We will cover about half of the following textbook,
which is available for free online (and also on Amazon): Speech and Language
Processing, Jurafsky and Martin, 3rd edition (draft of January 12th,
2022). I hope to cover the first 10 chapters, and if there is time,
the last two (on speech processing). Some material will be
glossed over or skipped entirely, and I will be specific about what
to read when assigning chapters.
- Assignments, lecture slides, and additional readings/viewings will be provided on the class web
page.
- Videos related to the course will be posted on my YouTube channel,
and linked from the class web page.
- We will use Piazza as a discussion forum, and Gradescope for
assignment submission and reporting grades. We will add you to
these sites the first week of class.
Assignments
- NLP is NOT a spectator sport, and just as you cannot
become proficient at the guitar by listening to someone else play
it, you cannot learn NLP without doing lots of reading, thinking,
and a variety of exercises, both mathematical and programming.
- After a short introductory homework to get you up to speed on
Jupyter notebooks and Gradescope submission, we will do several
one-week assignments, then more involved two-week assignments, and
finally a group project on a topic of your choosing, which you will
present to the class by video. I expect that we will do 5-6
homeworks (plus the project).
- I will drop the lowest homework score at the end of the
term.
Tests
- There will be NO tests. :-)
Late Policy
Homeworks are due at midnight on Sunday in Gradescope. You can submit up to 24 hours late for a 10% penalty. Each of these deadlines has a 6-hour "grace period", so you may submit up to 6am Monday morning for full credit, and up to 6am Tuesday morning with the late penalty.
There will be no extensions to individuals except for "acts of
God" (Covid diagnosis, death in the family, etc. -- your laptop breaking is not an act
of God, nor is a job interview). I let the grading process run as it will, and I make decisions about exceptions to policy ONLY at the end of term. So my apologies in advance: when you ask me for an extension in a moment of crisis, I will unfortunately have to refuse you and remind you that the lowest homework is dropped for *exactly* this reason.
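For anyone who wants to check the deadline arithmetic, here is a minimal Python sketch of the rules above. It is only an illustration, not how Gradescope is actually configured. The function name credit_multiplier and the example date are hypothetical, and it assumes that the 10% penalty scales the earned score by 0.9 and that anything past the late cutoff earns no credit.

```python
from datetime import datetime, timedelta

GRACE = timedelta(hours=6)         # grace period applied to both deadlines
LATE_WINDOW = timedelta(hours=24)  # how long late submissions are accepted
LATE_PENALTY = 0.10                # 10% penalty for a late submission


def credit_multiplier(deadline, submitted):
    """Return the fraction of the earned score that counts (hypothetical helper)."""
    if submitted <= deadline + GRACE:
        return 1.0                 # on time, or within the grace period
    if submitted <= deadline + LATE_WINDOW + GRACE:
        return 1.0 - LATE_PENALTY  # late, but inside the 24-hour window plus grace
    return 0.0                     # assumption: past the cutoff earns no credit


# Hypothetical deadline at midnight; the date is made up for illustration.
deadline = datetime(2022, 1, 31, 0, 0)
print(credit_multiplier(deadline, deadline + timedelta(hours=5)))   # inside grace period
print(credit_multiplier(deadline, deadline + timedelta(hours=20)))  # late with penalty
```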
Grades
- 70% Homeworks (Again, I will drop the lowest homework score)
- 30% Final Project (code, writeup, and video presentation)
These percentages are tentative and may be changed at my discretion at any
time. Class participation, coming to office hours and wanting to pursue the material
beyond the scope of the lecture, Piazza posts with interesting links about the course material, etc. are wonderful and much appreciated, and surely will help your performance in the class.
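As a rough illustration of how the tentative weights above combine, here is a small Python sketch. The function name course_score and the sample scores are made up. It assumes every homework counts equally and is scored out of 100, which the policy above does not actually specify, and the real computation may differ.

```python
HOMEWORK_WEIGHT = 0.70  # tentative weight from the Grades section
PROJECT_WEIGHT = 0.30   # tentative weight from the Grades section


def course_score(homework_scores, project_score):
    """Combine homework and project scores (hypothetical helper, 0-100 scale)."""
    if len(homework_scores) < 2:
        raise ValueError("need at least two homeworks to drop the lowest")
    kept = sorted(homework_scores)[1:]   # drop the single lowest homework
    homework_avg = sum(kept) / len(kept)  # assumption: equal weight per homework
    return HOMEWORK_WEIGHT * homework_avg + PROJECT_WEIGHT * project_score


# Made-up scores: one missed homework (0) is dropped before averaging.
print(course_score([100, 80, 90, 0, 90, 100], 90))
```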
Miscellaneous
- Except for "acts of God" as discussed above, I cannot give individual extensions to homework/project deadlines, as this would be unfair to the rest of the class who
were required to observe the deadline. We all have occasions when other deadlines, crashed laptops, non-critical illnesses, etc.
get in the way of making a deadline. But there is simply no fair way in a class of this size to give individual extensions
to students who send me a frantic email after missing a deadline. Occasionally we extend a deadline for some good reason (e.g., snowstorm) but in that case I make it a general extension for all students.
To account for normal interruptions to your work, we give you a 6-hour grace period and drop the lowest homework.
If you feel that this does not adequately cover your particular
situation, I invite you to post a private message on Piazza at the end of term and to explain why. I can promise you that I will read it and consider it, but of course I will continue to insist on fairness to all students when I make a decision on your petition.
- There will be no incompletes in this class except for reasons of dire
illness near the end of a semester in which all previous work has been
completed satisfactorily.
- You cannot redo any homework, or do extra work after the semester is
over to improve your grade, as this arrangement would then, in fairness, have to
be extended to the rest of the class (an impossible situation).
Collaboration and Academic Conduct
- You are encouraged to
discuss the material with one another in working on the
homeworks, but if you do so, you must list the people with whom you had
discussions.
- However, you must write your own code and debug your own
code. We will provide plenty of help in discussions, office hours,
and Piazza. If you are struggling, please contact me and we will try
to find a solution.
- I have zero tolerance for any kind of academic misconduct,
and be assured that I will instantly report violations of the Academic
Code to the Academic Conduct Committee. I am a past member and chairman of
this committee.
- Gradescope contains a very sophisticated plagiarism
detection algorithm, which will be run on all assignments.