DIGITAL HUMANITIES

PYTHON PROJECT


This research project was completed to fulfill the requirements of the graduate seminar CHNSHIS 202: Digital Methods for Chinese Studies. In 2016, the Fairbank Center for Chinese Studies first launched this for-credit graduate seminar, “Digital Methods in Chinese Studies” (CHNSHIS 202), through the Department of East Asian Languages and Civilizations, taught by Fairbank Center research fellow Donald Sturgeon.

 

The course introduces graduate students in Chinese studies to programming skills and digital humanities techniques of direct practical relevance to research in their discipline. It covers effective use of digital resources, programming techniques with an emphasis on data preparation and extraction, textual analysis and topic modeling, and the visualization of complex datasets.

My project is presented on the official website of the Fairbank Center for Chinese Studies at Harvard University as an outstanding digital humanities research case study.

Project Intro:

 

“Scholar-beauty fiction” refers to a genre of episodic novels centered on the love story between a young scholar and an upper-class young woman. The genre gradually developed into a stereotyped narrative model near the end of the Ming dynasty and enjoyed great popularity during the Qing dynasty. These novels usually run to about 24 to 100 chapters, and their themes and content are largely repetitive, so the genre is highly aware of its own tradition. In my final project, my goal is to use digital humanities tools to examine how the genre changed over time and to identify meaningful variations and classifications within it.


I applied a variety of techniques in this project. First, I used similarity measures to compare how vocabulary usage varied across the texts. Second, I used Support Vector Machines (SVMs), a type of machine learning algorithm, to train a statistical model that distinguishes between texts written in different styles based on their content, and I evaluated its reliability against known cases. I then applied a similar SVM-based technique to build a model that distinguishes between novels composed in different time periods based on word usage. This model was trained and evaluated on texts with known authorship to demonstrate the reliability of its results, and then applied to texts of questionable authorship to predict their likely period of origin.
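The page does not reproduce the project code, so what follows is only a minimal, hypothetical sketch of the two techniques named above, written in pure Python so it runs without extra packages. It assumes each text is represented by its single-character frequencies (a common choice for Chinese, which is not written with spaces between words), uses cosine similarity as the similarity measure, and trains a linear SVM with the Pegasos stochastic subgradient method instead of a library implementation. The function names, the parameters `lam` and `epochs`, and the six toy strings labeled "style A" and "style B" are illustrative placeholders, not the project's corpus, features, or results.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=1):
    """Count character n-grams (n=1 gives single characters)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def features(text):
    """Length-normalized character frequencies plus a constant bias feature."""
    counts = char_ngrams(text)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    vec = Counter({g: c / norm for g, c in counts.items()}) if norm else Counter()
    vec["<bias>"] = 1.0  # lets the linear model learn an offset
    return vec


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos SGD.

    samples: list of sparse feature Counters; labels: +1 or -1.
    """
    rng = random.Random(seed)
    w = Counter()  # sparse weight vector
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = samples[i], labels[i]
            margin = y * sum(w[g] * v for g, v in x.items())
            for g in list(w):  # shrink weights: gradient of the L2 penalty
                w[g] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: step toward y * x
                for g, v in x.items():
                    w[g] += eta * y * v
    return w


def predict(w, text):
    """Return +1 or -1 depending on which side of the hyperplane the text falls."""
    score = sum(w[g] * v for g, v in features(text).items())
    return 1 if score >= 0 else -1


def leave_one_out_accuracy(texts, labels):
    """Hold out each labeled text in turn, train on the rest, and test on it."""
    correct = 0
    for i in range(len(texts)):
        train_x = [features(t) for j, t in enumerate(texts) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        w = train_linear_svm(train_x, train_y)
        correct += predict(w, texts[i]) == labels[i]
    return correct / len(texts)


# Hypothetical toy corpus: +1 = "style A", -1 = "style B".
TEXTS = ["之乎者也之乎", "者也之乎者也", "乎者之也乎也",
         "的了吗呢的了", "了吗呢的了的", "呢的吗了呢吗"]
LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    print(cosine_similarity(char_ngrams(TEXTS[0]), char_ngrams(TEXTS[1])))
    model = train_linear_svm([features(t) for t in TEXTS], LABELS)
    print(predict(model, "之也乎"), predict(model, "的呢了"))
    print(leave_one_out_accuracy(TEXTS, LABELS))
```

The leave-one-out loop mirrors the idea of first checking a classifier against texts whose labels are already known before trusting its predictions on disputed texts. A real study would use far longer texts, richer features such as character bigrams, and a well-tested library implementation such as scikit-learn's `LinearSVC`.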

[P.S. This course was offered only to students and researchers specializing in Chinese studies, and all participants were fluent in Chinese. As a result, the texts I selected and the project presentation inevitably include some Chinese characters, mainly because the digital methods I applied in my textual analysis operate directly on the Chinese text. In the presentation below, I have translated all important information into English for readers who do not read Chinese, except for the titles of the Chinese texts, some fixed tokens, and some essential input data.]

See my final presentation slides and research results below:
