Resources
Course Glossary
Please follow this link to access the Google Doc containing our course glossary. Please post definitions (you can quote snippets from our readings) to our glossary.
Datasets
- A curated list of datasets relevant to our course
- For CSV, text, and image files from some of the datasets on the list above, see this directory
Lessons Overview
Command Line
- Getting set up and a warm-up introduction to the command line from Programming Historian
- Introduction to the Command Line
- In-class Command Line exercises
- Command Line Cheatsheet
Using Jupyter Notebooks
- Videos on using Jupyter notebooks:
Python
- Introduction to Python [interactive cloud version]
- Python and Jupyter Notebook Tips
- Anatomy of a Python Script
- Variables
- Data Types
- String Methods
- Conditionals and Comparisons
- Lists and Loops
- More practice with Python: Homework 4 [interactive cloud version]
- Introduction to Python Basics, continued [interactive cloud version]
- Recap of Python Basics
- Manipulate, clean, and sort lists
- Introduction to Pandas [interactive cloud version] - working with tabular data in Python
- What is Pandas?
- Using Pandas to clean tabular data
- Cheatsheet: Operations we can perform on DataFrames
- Using Pandas to analyze data
- Finding out more about our dataset using Pandas
- Make a simple data visualization
- Python Pandas Cheat Sheet
- What is metadata? Exploring a dataset through its metadata [interactive cloud version]
- Hunches, Hypotheses, and Exploratory Data Analysis with Pandas
- Topic modeling
- In-browser topic modeling tool
- Reminder: the browser-based topic modeling tool is a little clunky. To use it on more than one text, you’ll have to format your corpus in a very particular way. Instead of uploading a directory of text files, you’ll have to concatenate all your documents into a single text file in which each line is one document.
- Topic Modeling in Python
- Topic Modeling set-up instructions
- Introduction to Topic Modeling in Python [interactive cloud version]
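The reminder above about the in-browser topic modeling tool asks for a single text file with one document per line. The following is a minimal Python sketch of that concatenation step, not part of the course materials. The function name `corpus_to_single_file` and the assumption that the corpus is a folder of UTF-8 `.txt` files are illustrative; the tool may have further formatting requirements, so check its own instructions.

```python
from pathlib import Path


def corpus_to_single_file(corpus_dir, output_path):
    """Combine every .txt file in corpus_dir into one file, one document per line.

    Hypothetical helper for illustration: assumes UTF-8 plain-text files.
    Returns the number of documents written.
    """
    corpus_dir = Path(corpus_dir)
    output_path = Path(output_path)
    count = 0
    with output_path.open("w", encoding="utf-8") as out:
        # Sort the file names so the line order is predictable.
        for text_file in sorted(corpus_dir.glob("*.txt")):
            # Skip the output file itself if it lives in the corpus folder.
            if text_file.resolve() == output_path.resolve():
                continue
            text = text_file.read_text(encoding="utf-8")
            # Collapse line breaks and repeated spaces so the whole
            # document fits on a single line.
            one_line = " ".join(text.split())
            if one_line:  # ignore files that contain only whitespace
                out.write(one_line + "\n")
                count += 1
    return count
```

For example, `corpus_to_single_file("my_corpus", "combined.txt")` (both names are placeholders) would write each file in `my_corpus` as one line of `combined.txt`. Because line breaks inside a document are replaced with spaces, paragraph boundaries are lost, which is usually fine for topic modeling.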
Text Analysis
- In Python
- Counting word frequencies
- Homework 6: Introduction to Text Analysis
- Introduction to Topic Modeling [interactive version]
- (Make sure you’ve completed the topic modeling set-up instructions)
- Programming Historian Tutorials
- Matthew Lavin, “Analyzing Documents with tf-idf” (2019)
- Zoë Wilkinson Saldaña, “Sentiment Analysis for Exploratory Data Analysis” (2018)
- Digital Research Institute (DRI), “Introduction to Text Analysis with Python and NLTK” (2018)
- Outside of Python
- Using the command line to find keywords in collections of texts
- Voyant Tools
- Introduction to Voyant
- Example of Voyant Tools session with our US Inaugural Addresses dataset
- WordCounter: a simple, browser-based tool for word frequency analysis
HTML, Web-scraping and OpenRefine
- HTML & collecting data from the Web
- What is HTML?
- HTML and Web-scraping
- Collecting data from webpages: Application Programming Interfaces (APIs)
- Using wget to download files
- OpenRefine
- Using OpenRefine to Web-scrape Song Lyrics from Genius
- Advanced tips for Web-scraping
Data Cleaning
- What are regular expressions?
- Using regular expressions to clean data
Collaboration, Git, and GitHub
- Introduction to Git and GitHub
- “Mastering Markdown” (the language used in Markdown files (files ending in .md) and in the prose portions of Jupyter Notebooks)
- Create a GitHub website: “Getting Started With GitHub Pages” (Note that this is different from adding a Jupyter Notebook to a GitHub repository. Adding a Jupyter Notebook or a .md file to a repository is another way of publishing your files.)
Data Visualization
- In Python:
- Introduction to Python (Continued) [interactive version]
- Making a simple data visualization
- Introduction to Python: Pandas [interactive version]
- Making simple bar and pie charts using pandas data
- Hunches, Hypotheses, and Exploratory Data Analysis with Pandas
- Making simple scatter plots (3 examples) [interactive version]
- Making more complex static scatter plots (scroll down to bottom)
- Making an interactive scatter plot using Altair [interactive version]
- Introduction to Topic Modeling, visualizing topic models
- Mapping [interactive version]
- Network Analysis [interactive version]
- Other data visualization resources:
- WTFcsv: a simple, browser-based tool that gives you descriptive statistics on your spreadsheet data
- Voyant Tools
- WordCounter: a simple, browser-based tool for word frequency analysis
- Palladio: an open source, browser-based data visualization tool for exploring and visualizing tabular data
- Matthew Lincoln’s [introduction to Palladio](https://matthewlincoln.net/mapping-knoedler-palladio/) tutorial