Bill McDonald
Professor of Finance

Thomas A. and James J. Bruder Chair in Administrative Leadership


E-Mail:

mcdonald.1@nd.edu

Address:

335 Mendoza College of Business

University of Notre Dame

Notre Dame, IN  46556

Telephone:

(574) 631-5137


Textual Analysis Tools

This page contains tools that are useful for textual analysis in financial applications. The essential method of textual analysis goes by various labels in other disciplines, such as content analysis, natural language processing, information retrieval, or computational linguistics. A growing literature finds significant relations between stock price reactions and the sentiment of information releases, as measured by word classifications such as those provided below.
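As a minimal sketch of how a word classification can measure sentiment, the Python below scores a text by the share of its tokens that fall in one word list. The tokenizer, the uppercase normalization, and the five-word `NEGATIVE_WORDS` set are illustrative assumptions, not the contents or format of the lists on this page; a simple proportion is also only one possible weighting.

```python
import re

# Hypothetical toy list for illustration; in practice, load one of the
# word lists on this page (e.g. the negative list) from its text file.
NEGATIVE_WORDS = {"LOSS", "LOSSES", "IMPAIRMENT", "DECLINE", "ADVERSE"}


def tokenize(text: str) -> list[str]:
    """Split text into uppercase alphabetic tokens (a deliberately simple rule)."""
    return re.findall(r"[A-Z]+", text.upper())


def category_proportion(text: str, word_list: set[str]) -> float:
    """Share of tokens in `text` that belong to `word_list`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    wanted = {w.upper() for w in word_list}
    hits = sum(1 for t in tokens if t in wanted)
    return hits / len(tokens)


sample = "The company reported a net loss and an impairment charge."
print(category_proportion(sample, NEGATIVE_WORDS))  # 0.2 (2 of 10 tokens)
```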

ND Finance Dictionaries

Note:  We thank Cam Harvey and others who suggested some of the modifications we’ve included in v2 of these lists. The word lists are described in Loughran and McDonald (2009).


·         Negative Words-v2

·         Positive Words-v2

·         Uncertainty Words-v2

·         Litigious Words-v2

·         Modal Words Strong-v2

·         Modal Words Weak-v2

·         Download zip folder with all lists

Harvard-IV-4 Psychological Dictionary

TagNeg File with Inflections

·         Harvard IV Negative Word List_Inf.txt
Because of the inherent imprecision of stemming, we have expanded the Harvard list to include relevant inflections.
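To illustrate why an explicit list of inflections can be preferable to stemming, the sketch below matches words exactly against a root list expanded with its inflected forms, and contrasts that with a deliberately crude suffix stripper. The `INFLECTIONS` map and `naive_stem` function are hypothetical examples; the Harvard file on this page already contains its expanded forms, and real stemmers (e.g. Porter) are more careful than this one.

```python
# Hypothetical root -> inflections map, for illustration only.
INFLECTIONS = {
    "ABANDON": ["ABANDONED", "ABANDONING", "ABANDONS"],
    "ABUSE": ["ABUSED", "ABUSES", "ABUSING"],
}


def expand(roots: dict[str, list[str]]) -> frozenset[str]:
    """Return every root together with its listed inflections."""
    words = set()
    for root, forms in roots.items():
        words.add(root)
        words.update(forms)
    return frozenset(words)


def naive_stem(word: str) -> str:
    """Crude suffix stripping, used only to show how stemming can misfire."""
    for suffix in ("ING", "ED", "S"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


EXPANDED = expand(INFLECTIONS)
print("ABANDONING" in EXPANDED)  # True: exact match on a listed inflection
print(naive_stem("NEWS"))        # NEW: stripping changes the meaning
print(naive_stem("ABUSED"))      # ABUS: no longer matches the root ABUSE
```

Exact matching against an expanded list trades a larger list for predictable behavior: a word counts only if it, or an inflection someone chose to include, appears verbatim.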

General Word Lists

·         Master Dictionary
Derived from release 4.0 of the 2of12inf word list and extended to include words that appear in 10-K documents but not in the original 2of12inf list.

·         Stop words

1.        Generic

2.        Names

3.        Dates and numbers

4.        Geographic

5.        Currencies
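One plausible way to combine these general lists, sketched under assumptions: drop stop words from the categories you choose, then keep only tokens found in the Master Dictionary. The miniature `MASTER` and `STOP_WORDS` sets, the category keys, and `clean_tokens` are hypothetical stand-ins; the real files are far larger and their layout is not assumed here.

```python
import re

# Hypothetical miniature stand-ins for the Master Dictionary and stop word lists.
MASTER = {"JOHN", "SMITH", "PAID", "THE", "SETTLEMENT", "IN", "DOLLARS", "JANUARY"}
STOP_WORDS = {
    "generic": {"THE", "AND", "OF", "A", "IN"},
    "names": {"JOHN", "SMITH"},
    "dates_numbers": {"JANUARY", "ONE"},
    "geographic": {"TEXAS"},
    "currencies": {"DOLLARS"},
}


def tokenize(text: str) -> list[str]:
    """Uppercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[A-Z]+", text.upper())


def clean_tokens(text: str, categories: list[str], dictionary: set[str] = MASTER) -> list[str]:
    """Drop stop words from the chosen categories, then keep only dictionary words."""
    stop = set().union(*(STOP_WORDS[c] for c in categories))
    return [t for t in tokenize(text) if t not in stop and t in dictionary]


text = "John Smith paid the settlement in dollars in January (XBRL)."
print(clean_tokens(text, ["generic"]))
print(clean_tokens(text, list(STOP_WORDS)))
```

With only the generic category, names, dates, and currencies survive, and `XBRL` is dropped because it is not in the miniature dictionary; with every category selected, just `PAID` and `SETTLEMENT` remain.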


10-K Dictionary

·         10-K Dictionary

Tabulation of all words appearing in 10-K documents from 1994 to 2008, excluding exhibits and sections tagged as tables that contain more than 25% numbers. For each word, the file gives the word, its total count, and the number of documents containing at least one occurrence of it (CSV format).
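The tabulation described above can be approximated with the sketch below, which counts total occurrences and document frequency per word and skips table-tagged sections that are more than 25% numbers. What counts as a "number", the `(text, tagged_table)` section format, and the CSV column names are assumptions; exhibit removal and 10-K parsing are assumed to happen upstream.

```python
import csv
import io
import re
from collections import Counter


def is_number(token: str) -> bool:
    """Assumption: a 'number' is a token of digits plus common numeric punctuation."""
    return any(ch.isdigit() for ch in token) and all(
        ch.isdigit() or ch in ".,$%()-" for ch in token
    )


def skip_section(text: str, tagged_table: bool, threshold: float = 0.25) -> bool:
    """True for table-tagged sections in which more than `threshold` of tokens are numbers."""
    tokens = text.split()
    if not tagged_table or not tokens:
        return False
    return sum(is_number(t) for t in tokens) / len(tokens) > threshold


def tabulate(documents):
    """Return (word_count, doc_count); each document is a list of (text, tagged_table)."""
    word_count, doc_count = Counter(), Counter()
    for sections in documents:
        seen = set()
        for text, tagged_table in sections:
            if skip_section(text, tagged_table):
                continue
            tokens = re.findall(r"[A-Z]+", text.upper())
            word_count.update(tokens)
            seen.update(tokens)
        doc_count.update(seen)  # each document counts at most once per word
    return word_count, doc_count


def to_csv(word_count, doc_count) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "word_count", "doc_count"])  # hypothetical column names
    for word in sorted(word_count):
        writer.writerow([word, word_count[word], doc_count[word]])
    return buf.getvalue()


docs = [
    [("Net loss increased.", False), ("2008 2007 1,200 950 Revenue", True)],
    [("Loss and loss reserves.", False)],
]
wc, dc = tabulate(docs)
print(to_csv(wc, dc))
```

Here the table section is 80% numeric (four of five tokens), so `REVENUE` never enters the counts, while `LOSS` appears three times across two documents and is written as `LOSS,3,2`.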


© 2009 University of Notre Dame