Projects

HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust [pdf] [data]
Sunyam Bagga and Andrew Piper
Journal of Open Humanities Data (JOHD 2022)

‘Are you kidding me?’: Detecting Unpalatable Questions on Reddit [pdf] [code] [data] [talk]
Sunyam Bagga, Andrew Piper and Derek Ruths
16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)

Detecting Narrativity Across Long Time Scales [pdf] [data & code]
Andrew Piper, Sunyam Bagga, Laura Monteiro, Andrew Yang, Marie Labrosse and Yu Liu
Computational Humanities Research Conference (CHR 2021)

Measuring the Effects of Bias in Training Data for Literary Classification [pdf] [code] [talk]
Sunyam Bagga and Andrew Piper
4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. International Committee on Computational Linguistics (COLING 2020)

Generalization Classification [book] [code]
Classifies sentences by whether or not they encode a “generalization,” using deep learning models. Used in the chapter “Machine learning as a collaborative process” in Can We Be Wrong? The Problem of Textual Evidence in a Time of Data (Cambridge: Cambridge University Press).

Stylistic Accommodation on Reddit [pdf] [code]
Caitrin Armstrong and Sunyam Bagga

Sentiment & Topic Analysis of Migrant Related Tweets [abstract] [report] [code]
Sunyam Bagga and Alayne Moody
Digital Humanities 2020 Conference, Ottawa, ON, Canada

Best Answer Prediction in Community-based Question-Answering Services [pdf] [code]
Sunyam Bagga, Qianyu Liu and Jin Guo

Opportunistic Self Organizing Migrating Algorithm for real-time Dynamic Traveling Salesman Problem [pdf]
Shubham Dokania, Sunyam Bagga and Rohit Sharma
51st Annual Conference on Information Sciences and Systems (CISS), held at Johns Hopkins University, USA, 2017