DS 320 – Natural Language Processing and Classification

SEMESTER UNITS: 6

PREREQUISITE: DS 300

Course Description

In this course, students will learn the essential tools and techniques for effective machine learning model development. They will learn why version control matters for managing code iterations, use ML platform tools for efficient experiment tracking, and build skills in text preprocessing, including noise removal and feature extraction using N-grams and Bag of Words. Students will explain binary classification theory, implement logistic regression using sklearn, and evaluate models with performance metrics such as accuracy, precision, and recall, with a focus on handling class imbalance.

Students will also study feature selection and hyperparameter tuning for optimal model performance, explore a range of advanced classification models, and learn to fine-tune them using techniques such as grid search. By the end of the course, students should be equipped to build, evaluate, and select the best classification models for their projects.

Course Learning Outcomes

List the principles and importance of version control in managing code and project iterations.
Utilize ML platform tools for experiment tracking and management.
Describe text cleaning techniques, including noise removal, tokenization, stemming, and lemmatization.
Apply methods such as N-grams and Bag of Words to extract features from text.
Differentiate binary classification from regression tasks and comprehend their concepts and theory.
Compare logistic regression to linear regression, including theory and practical implementation using sklearn.
Assess the performance of binary classifiers, especially in the presence of class imbalance.
Identify key metrics such as accuracy, precision, recall, and F1-score for evaluating classifier performance.
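
As an illustration of the last two outcomes, the following minimal sketch computes accuracy, precision, recall, and F1-score from scratch and shows why accuracy alone can mislead on imbalanced data. It is not course material: the helper name `binary_metrics` and the label lists are hypothetical, chosen only for this example, and it uses plain Python rather than sklearn.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced dataset: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_negative = [0] * 100
baseline = binary_metrics(y_true, always_negative)

# A classifier that finds 4 of the 5 positives at the cost of 6 false alarms.
detector_pred = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1
detector = binary_metrics(y_true, detector_pred)

print("always negative:", baseline)
print("detector:       ", detector)
```

In this made-up data, the always-negative classifier scores higher on accuracy than the detector even though it never identifies a positive case, which is why precision, recall, and F1-score are reported alongside accuracy when classes are imbalanced.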