This course introduces students to a set of methods to transform, model, analyze, and reason about text as
data. Over the course of the semester, we'll learn to apply natural language processing methods to problems
that span the areas of social sciences and humanities where the data is in the form of text.
The objective of the course is for students to learn to apply text processing libraries, including
scikit-learn, gensim, spacy, and huggingface, to problems; learn techniques to collect and label
text; perform exploratory data analysis; learn to statistically test hypotheses using textual data; represent
text both in terms of linguistic structure features and low-dimensional distributed representations of words,
sentences, and documents; perform text-driven prediction; and learn about ethical issues surrounding the use
of text as data.
The course is targeted towards undergraduate students from various disciplines such as computer science, law,
and sociology. No formal technical background is assumed, though some programming knowledge in Python is
expected (tutorials on Python will be shared before the course starts).
What is this course not about? This is not a course for learning the intricate details of the
algorithms or the model architectures that power natural language processing methods; see CS 329 if you want
to learn that. Instead, we'll focus on using NLP methods as algorithmic instruments to perform measurements
on text data, and, through practice, learn the underlying challenges in this enterprise.
Instructor: Sandeep Soni (PAIS 588)
Office hours: Wednesday, 11am-12pm (in person); Friday, 11am-12pm (via Zoom); or by appointment
Prerequisites: QTM 151 or CS 170. No technical background in data science is assumed, but students are
expected to know the basics of programming, for example in Python.
Students will work in groups of 3 or 4 on a project with the following components.
Proposal and literature review
Students will propose a research question, motivate why it is worth asking,
provide a sketch of the tools, methods, and timeline for the deliverables, and situate their
proposed work with respect to the gap it will fill in the existing scientific literature on the topic.
(Deliverable: 2 pages; minimum 5 sources)
Midterm report
Students will be asked to submit a midterm report describing the results from initial experiments.
The report should emphasize describing the methodology, establishing a concrete set of experiments
to answer the empirical question in the project, and establishing a validation strategy for the final
experimentation. (Deliverable: 4 pages; minimum 10 sources)
Final report
The most important deliverable of the project is a final report that will include a complete description of
the work. The report will summarize the data and their collection methodology, methods, experimental
details and results, plus a thorough analysis. The report should be of high quality according to the standards
used to judge a conference submission. (Deliverable: 4 pages, not including references)
To create the final report, you must use the template from this repo.
Presentation
Teams will present their work by preparing a poster and presenting it to the class and other Emory students/faculty.
The poster should give an adequate but high-level summary of the project.
(Deliverable: a poster)
Academic Integrity
All students will follow the Emory honor code.
With the exception of the group project, in which collaboration is allowed and encouraged, all submissions (homeworks
and problem sets) must be completed independently.
The use of large language models (e.g., ChatGPT) and other generative AI technologies is discouraged for both writing and source code.
For both writing and source code, cite the appropriate source if you mention or use someone else's work.
All submission deadlines for homeworks and project deliverables will be strictly enforced;
exceptions will be made on a case-by-case basis and only if the student has a valid reason for needing an exception.
Students who violate the Honor Code may be subject to a variety of sanctions and are likely to fail the course.
Students with Disabilities
We will strive to make the class accessible to all students. To this end, if you need disability-related accommodations and
have an accommodation letter from OAS, please inform me.