Text Mining Using Tidy Data Principles

written by Sean Law and Benjamin Zaitlen on 2018-10-09

Text data is increasingly important in many domains, and tidy data principles and tidy tools can make text mining easier and more effective. In this talk, Julia Silge will demonstrate how to manipulate, summarize, and visualize the characteristics of text using these methods and R packages from the tidy tool ecosystem. These tools work well for many analytical questions and let analysts integrate natural language processing into workflows already in wide use. We will explore how to implement approaches such as sentiment analysis of texts, tf-idf measurement, and text modeling.


Julia Silge is a data scientist at Stack Overflow, with a PhD in astrophysics and an abiding love for Jane Austen. Julia worked in academia and ed tech before moving into data science and discovering R. She enjoys making beautiful charts, programming in R, text mining, and communicating about technical topics with diverse audiences.