Workshop: Introduction to Data Cleaning in OpenRefine

Date: Wednesday, August 7, 2024
Time: 1:30pm - 3:00pm ET
Location: online

Do you ever get annoyed with a big spreadsheet that isn’t quite formatted correctly for your needs? Find yourself repeating simple tasks over and over? OpenRefine might be the answer to simplify and speed up your data cleaning, especially if you are working with text data. This ninety-minute hands-on online workshop will teach you how to install OpenRefine, set up a new project, and use a few of its most useful features. At the end, we’ll demonstrate some advanced features, including integration with Wikidata, as inspiration for future projects. Sample data will be provided, but feel free to bring your own dataset too.

Register here!

Get assistance from a librarian for your project

The library offers one-on-one consultations for anyone in the Northeastern community working with text mining and computational text analysis. Consultations are available online via Zoom or by email. During a consultation we can answer questions and work with you to address topics including the following:

  • Ideas for locating, acquiring, and cleaning appropriate datasets for your topic
  • Recommendations for text analysis tools or methods to fit your project
  • Assistance using any of Northeastern's text mining platforms (Constellate, ProQuest TDM Studio, HathiTrust)
  • Assistance using common text mining and analysis tools (e.g., Python, R, AntConc, Mallet)
  • Training or troubleshooting to support text mining research

Schedule a Consultation

Request a workshop at Northeastern

We offer virtual guest workshops that can be requested by faculty, staff, or student groups interested in learning the basics of some popular tools and methods related to text mining and computational text analysis. The following menu lists some of our most commonly requested and delivered workshops.

Please note that requests for extensive customization (e.g., use of specific datasets or creation of new learning objects) should be made as early as possible, and no later than two weeks before the date of the session, to ensure the best outcome.

  • Intro to Python (1 hr)
    If you're interested in using Python, this interactive workshop will give you a jump start. No prior coding experience required. We will cover basic coding in Python and how to troubleshoot errors and common problems.
  • Intro to R/RStudio (1.5 hr)
    If you're interested in using R / RStudio, this interactive workshop will give you a jump start. No prior coding experience required. We will cover opening RStudio for the first time, basic coding in R, and how to troubleshoot errors and common problems. You will use a sample dataset and write a script to create a graph.
  • Intro to Data Cleaning in OpenRefine (1 hr)
    If you already have some data and you're interested in improving your data preparation workflow, this interactive workshop will get you started with OpenRefine. We will cover opening OpenRefine for the first time, basic concepts behind OpenRefine and data cleaning, and built-in features to address common data cleaning needs. You will use a sample dataset and prepare it for analysis.

Request a Workshop

Recommended Resource: Online Workshops from Constellate

Constellate is the text analytics service from the not-for-profit ITHAKA, the same people who brought you JSTOR and Portico. In addition to providing a platform for text mining and analysis, they frequently offer free online workshops.

See Constellate's upcoming workshops.