ABSTRACT

Title of Dissertation: ENVIRONMENTAL ADVOCACY MESSAGES: RELATIONSHIPS BETWEEN THE MESSAGES THAT CONSTITUENTS SEND TO DECISION MAKERS AND ORGANIZATIONAL ENGAGEMENT

Directed By: Dr. Miroslaw J. Skibniewski, Department of Civil and Environmental Engineering

Environmental advocacy organizations aim to help citizens contact their policymakers, to recruit new members, and to increase their contacts' level of engagement with organization issues. They use online petitions and form-letter services for these purposes. These services put citizens in contact with policymakers and encourage citizens to take follow-up actions, such as sending another message, referring a friend, or making a donation. While these services effectively recruit members, they only marginally influence policymakers. To increase influence, organizations now ask petitioners to include personal messages in their communications. This dissertation asks whether text analysis of these personal messages can help advocacy organizations further fulfill their recruitment and engagement goals. It investigates text metrics both for predicting engagement from existing contacts and for services, such as chatbots, that suggest follow-up actions to new contacts. Methods employ rule-based text analysis tools (LIWC, VADER, Flesch Reading Ease, and regular expressions) to pilot the use of pronouns, sentiment, writing complexity, and the identification of personal stories as predictors of engagement. Data include over two million messages and nearly 500,000 personal messages from over 150,000 individuals supporting sustainable policies and projects. Results reveal relationships between messages and two engagement factors: (1) the number of messages that groups of contacts send and (2) payment of membership dues. Results also bolster research that highlights the importance of identifying contacts who can share stories about how environmental issues have affected them. Conclusions encourage advocacy organizations and policymakers to analyze messages to increase engagement and understand constituency support of policies and projects. Future work may integrate text analysis into membership models and advocacy services. Future work may also improve personal story classification and investigate machine learning for identifying potential members.

ENVIRONMENTAL ADVOCACY MESSAGES: RELATIONSHIPS BETWEEN THE MESSAGES THAT CONSTITUENTS SEND TO DECISION MAKERS AND ORGANIZATIONAL ENGAGEMENT

by David F. Choy

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2019

Advisory Committee: Prof. Miroslaw J. Skibniewski, Chairman; Prof. Gregory B. Baecher; Assoc. Prof. Qingbin Cui; Asst. Prof. Michelle (Shelby) Bensi; Prof. Brian Butler, Dean's Representative

Copyright David F. Choy 2019

ACKNOWLEDGEMENTS

This dissertation was written with funding and support from my dissertation advisor, Prof. Miroslaw J. Skibniewski; dissertation committee members Prof. Gregory Baecher, Assoc. Prof. Qingbin Cui, Asst. Prof. Michelle (Shelby) Bensi, Prof. John H. Cable, and Prof. Brian Butler, Dean's representative and Assoc. Dean of the College of Information Studies; and the faculty and staff in the Project Management Center for Excellence in the Department of Civil and Environmental Engineering at the University of Maryland. Development relied on advice from Dr. S. Wojciech Sokolowski, Senior Research Associate, Johns Hopkins University; Gregory Cantori, BAA, CANTORI principal; James Mumm, MA, Campaigns Director, Greenpeace USA and former Director of Research and Development, People's Action; Dr. Clara Man Cheung, Lecturer in Project Management, University of Manchester; Adam Kapp, Product Management Director, Sierra Club; Seth Long, Online Organizer, Sierra Club; Bruce Cohen, IT Super Volunteer and Board Member, Bike Maryland; Walter Davis, Former Executive Director, National Organizers Alliance; and Naveed Ahmed, Internal Audit Manager, World Wildlife Fund. Special thanks to Seth Long for first revealing to me, and for emphasizing, the importance of personal stories in constituent messages for nonprofit organizers. This dissertation would not have been possible without the love, patience, and critical input from my family, especially Rachel Yood, Cory Choy, Lesley Choy, Steven Choy, Andrea Schnieder, and Sarah Ahmed. Thanks to my coworkers at the University of Maryland and King Cow Interactive LLC: thank you Kathy Frankle, Brooks Clarke, and Jesse Kovach. Thank you Vigilante Coffee, College Park: the best place to write in the morning. Thank you Kemp Mill flock: Sweet Penelo Pea, Rita, Luna Bird, Mineola, Clementine, Kumquat, Murray, and Sir Arthur Durham Bun Bun.

TABLE OF CONTENTS

Introduction .... 1
Background and Need .... 2
1.1.1. Green Technology Needs Green Policy: Advocacy Organizations, Policymakers, and Project Managers .... 2
1.1.2. Exploration: Membership .... 9
1.1.3. In the Words of Advocacy Data Analysts and Product Managers .... 11
1.1.4. Advocacy Campaign and Data Flow and Potential Beneficiaries .... 12
Goal, Objectives, and Research Questions .... 13
Objective One Hypotheses: Message Content and Number of Messages Contacts Send .... 15
Objective Two Hypotheses: Exploration of Personal Stories, Sentiment, Writing Level, Popular Words, Groups of Words, and Membership .... 16
Literature & Technology Review .... 18
Petitions, Slacktivism, Creative Campaigns, and Letters in Between .... 18
Custom, Individual, Personal, Testimony .... 21
Lost Voices and Secret Lives: Personal Stories and Personal Pronouns .... 22
Listening to Power Before Listening to Those Speaking Up to It .... 23
VADER and Flesch Ease of Reading Tests .... 24
Methodology .... 27
Approach .... 27
Data: Terms, Collection, and Database Construction .... 30
3.2.1. Original Data: Contact Data in Message Records .... 31
3.2.2. Messages Category Terms: Messages, Custom Messages, Personal Messages, and Personal Stories .... 33
3.2.3. Collection Period .... 36
3.2.4. Database Schema .... 36
3.2.5. Database Construction: Creating Message and Contact Tables .... 39
3.2.6. Example Database Construction .... 42
Objective One Methods: Relationships Between Messages and the Number of Messages that Groups of Contacts Send .... 45
3.3.1. Hypothesis One: Personal Pronouns .... 45
3.3.2. Hypothesis Two: Personal Messages .... 52
3.3.3. Hypothesis Three: Message Length .... 53
3.3.4. Example Objective One Calculations .... 54
Objective Two Methods: Membership Exploration .... 58
Tools .... 58
Results for Objective One: Number of Messages .... 61
Hypothesis One: Relationships Between Pronouns and the Number of Messages that Contacts Send .... 61
4.1.1. Hypothesis One Test Results .... 61
4.1.2. Additional Observations and Calculation Checks for Hypothesis One .... 68
Hypothesis Two: Relationships Between Personal Messages and the Number of Messages that Contacts Send .... 85
Hypothesis Three: Relationships Between Message Length and the Number of Messages that Contacts Send .... 87
Results for Objective Two: Membership Exploration .... 89
Exploration One: Membership as a Measure of Organizational Engagement .... 91
5.1.1. Membership and The Number of Messages Sent .... 91
5.1.2. Membership and LIWC Pronoun Rates .... 95
5.1.3. Membership and Message Length .... 101
Exploration Two: Ungrouped Correlations .... 102
Exploration Three: Personal Stories .... 107
5.3.1. Personal Stories and Family .... 111
5.3.2. Self-Identification Predictors for Personal Stories and Membership .... 115
Exploration Four: Flesch Reading Ease .... 126
Exploration Five: Sentiment .... 133
Exploration Six: Top Words .... 139
Exploration Seven: LIWC Scores and Membership .... 141
5.7.1. Pronoun Exceedance Tests .... 141
5.7.2. Pronoun Exceedance Test Results .... 141
5.7.3. Other Notable LIWC Dimensions: Swear Words, Punctuation, Nonfluencies, Family, and Friends .... 145
Discussion .... 147
Objective One Discussion: Messages per Contact as a Measure of Organizational Engagement .... 147
6.1.1. Pronouns and Messages per Person .... 147
6.1.2. The Pronouns of Environmental Advocacy .... 148
6.1.3. Personal Message Rates and Word Counts .... 149
Objective Two Discussion: Exploring Membership, Personal Stories, Sentiment, and Writing Simplicity .... 151
Limitations and Two Database Gotchas .... 155
Conclusions and Future Work .... 157
Text Analysis for Online Advocacy Organizations .... 157
Text Analysis for Policymakers, Service Providers, and Stakeholder Managers .... 160
Future Work .... 163
7.3.1. Engagement Framework and Investigating Relationships of Messages and Time Use Profiles .... 163
7.3.2. Doing What You Love or Marginalizing "Lost Voices" .... 164
7.3.3. Improving Online Advocacy Services .... 165
Appendix A. States and Territories .... 168
Appendix B. Personal Story Queries .... 169
B.1. Simple MySQL Searches for Personal Stories .... 169
B.2. Regular Expression Searches for Personal Stories .... 170
B.3. Self-Identification with Nouns .... 171
B.4. Activity Self-Identification with Verbs .... 171
B.5. Swear Words .... 172
B.6. Finding Members With Matching Messages .... 173
B.7. Personal Story Search Reference Tables .... 173
B.8. Personal Story Reference Table 1. Basic MySQL Searches for Personal Stories (LIKE and REGEX) .... 174
B.9. Personal Story Reference Table 2. First-Person Singular Self-Identification with Nouns .... 176
B.10. Personal Story Reference Table 3. First-Person Singular Self-Identification with Verbs .... 177
B.11. Personal Story Reference Table 4. Swear Words .... 180
Appendix C. Validation of VADER for Environmental Advocacy Messages Sent to Policymakers .... 181
C.1. Validation Summary and Introduction to Precision, Recall, and F-Score Measures .... 181
C.2. Validation with a Single Human Reviewer .... 183
C.3. Validation with Multiple Human Reviewers .... 188
C.4. Validation Conclusion and Recommendation .... 190
Appendix D. Defining and Validating a Model for Classification of Personal Stories .... 192
Glossary .... 198
References .... 201

LIST OF TABLES

Table 3.1.1 Independent Message Predictor Variables and Dependent Engagement Variables .... 30
Table 3.2.1 Messages Table .... 37
Table 3.2.2 Contacts Table .... 38
Table 3.2.3 Example Source Messages .... 43
Table 3.2.4 Example Source Messages .... 43
Table 3.2.5 Example Combined Messages Table .... 44
Table 3.2.6 Example Derived Contacts Table .... 44
Table 3.3.1 Three Methods to Relate Text Metrics to the Number of Messages that Contacts Send .... 48
Table 3.3.2 Example Message Table with Pronoun Rate and Word Count Fields .... 55
Table 3.3.3 Example Contact Table with Lumped LIWC and Word Count Fields .... 56
Table 3.3.4 Example Contact Groups .... 57
Table 3.5.1 Analysis Tools, Development Environments, and Python Libraries .... 59
Table 3.5.2 Platforms and Node Packages for Validating VADER Sentiment and Prototyping Services .... 60
Table 3.5.3 Study Programming Languages .... 60
Table 4.1.1 Relationships Between Group Average Pronoun Use Rates and The Number of Messages Sent by Contacts .... 63
Table 4.1.2 Pronoun Rate Comparison .... 71
Table 4.1.3 Effects of Limiting Data by Minimum Word Count (WC) on the Correlation Between the Use of First-Person Plural "We" Pronouns and Groups of Contacts Who Have Sent the Same Number of Messages .... 80
Table 4.3.1 Membership Rates for Large, National Environmental Nonprofit Organizations with Online Petition or Letter-Writing Campaigns .... 91
Table 5.3.1 Husband and Wife Personal Story Observation Contingency Table and Calculations .... 113
Table 5.3.2 Husband and Wife Personal Story Expected Values Contingency Table and Calculations .... 113
Table 5.3.3 Membership Rates (m), Sizes (contacts, n), and Chi-Squared Test P-Values for Family Conditions (true, c, and not true, ~c) .... 114
Table 5.3.4 Self-Identification and Membership .... 116
Table 5.3.5 Residence and Membership .... 118
Table 5.3.6 Work, Occupation, and Membership .... 122
Table 5.3.7 Activism Verbs Used in the First-Person .... 123
Table 5.3.8 Outdoor Verbs .... 124
Table 5.3.9 Swear Words and Membership .... 125
Table 5.5.1 Membership Rates and Group Sizes for Contacts Grouped by VADER Sentiment Scores .... 137
Table 5.5.2 Example Messages with Positive and Negative Sentiment .... 138
Table 5.6.1 Popular Words and Membership .... 140

LIST OF FIGURES

Figure 1.1.1 Advocacy Campaign Dataflow Diagram .... 13
Figure 3.2.1 Message Categories .... 35
Figure 4.1.1 Group Average Use of Pronouns (%) vs. Messages Sent .... 64
Figure 4.1.2 Group Average Use of Personal Pronouns (%) vs. Messages Sent .... 64
Figure 4.1.3 Group Average Use of "I" Pronouns (%) vs. Messages Sent .... 65
Figure 4.1.4 Group Average Use of "We" Pronouns (%) vs. Messages Sent .... 65
Figure 4.1.5 Group Average Use of "You" Pronouns (%) vs. Messages Sent .... 66
Figure 4.1.6 Group Average Use of "She/He" Pronouns (%) vs. Messages Sent .... 66
Figure 4.1.7 Group Average Use of "They" Pronouns (%) vs. Messages Sent .... 67
Figure 4.1.8 Group Average Use of Impersonal Pronouns (%) vs. Messages Sent .... 67
Figure 4.1.9 Plot of Table 4.1.2. Comparing the Use of Pronouns in Personal Message Data with the Use of Pronouns in LIWC Twitter and General Data Sets .... 71
Figure 4.1.10 Average Use of Pronouns (%) vs. Messages Sent Expressed by Group Maximums of (a) Contact Minimums, (b) Contact Averages, and (c) Contact Maximums .... 73
Figure 4.1.11 Use of All Pronouns vs. Number of Messages Sent Expressed by Group Averages of (a) Contact Minimums, (b) Contact Averages, and (c) Contact Maximums .... 73
Figure 4.1.12 Group Average Use of "We" Pronouns (%) (left axis) and Group Size (right linear axis) vs. Messages Sent .... 75
Figure 4.1.13 Group Average Use of "We" Pronouns (%) (left axis) and Group Size (right log axis) vs. Messages Sent .... 76
Figure 4.1.14 Group Average Use of All LIWC Pronoun Dimensions (%) (left axis) and Group Size (right log axis) vs. Messages Sent .... 77
Figure 4.1.15 Number of All Messages and Number of Personal Messages (left axis) and Group Size (right axis) vs. Messages Sent .... 78
Figure 4.1.16 Positively Skewed Distribution of Word Count .... 79
Figure 4.1.17 Effects of Limiting Data by Minimum Word Count (WC) on the Correlation Between the Use of First-Person Plural "We" Pronouns and Groups of Contacts Who Have Sent the Same Number of Messages .... 80
Figure 4.1.18 Group Average Use of "We" Pronouns Weighted by Message Length and then by Contact (%) vs. Messages Sent .... 82
Figure 4.1.19 Group Average Use of "We" Pronouns (%) vs. Messages Sent After July 1, 2018 .... 84
Figure 4.1.20 Group Average Use of "We" Pronouns (%) vs. Personal Messages Sent .... 84
Figure 4.2.1 Average Personal Message Rate vs. Messages Sent per Contact .... 86
Figure 4.2.2 Average Personal Messages vs. Messages Sent per Contact .... 86
Figure 4.2.3 Number of All Messages and Number of Personal Messages for Groups of Contacts Who Have Sent the Same Number of Messages from One to Twenty .... 87
Figure 4.3.1 Word Count vs. Number of Messages Sent .... 88
Figure 5.1.1 Membership Rate (%) vs. Number of Messages Sent. Membership rates range from 7%, for the group of contacts who sent one message, to 37%, for the groups of contacts who sent 13, 15, 16, 17, and 21+ messages. Membership rates are percentages of members for groups of contacts .... 93
Figure 5.1.2 Average Number of Messages and Personal Messages Sent Per Contact Organized by Conditions of Membership and Whether a Contact has Sent a Personal Message .... 95
Figure 5.1.3 Membership Rate (%) vs. LIWC Pronoun Dimension Rates (%) .... 97
Figure 5.1.4 Membership Rate vs. Average Word Count .... 102
Figure 5.2.1 Continued. Relationships (R) between Individual Contact Linguistic Score Averages and Engagement (Messages, Personal Messages, and Membership) for Minimum Average Word Counts, avg(pm), of 0, 25, 50, and 75 .... 104
Figure 5.3.1 Family Words and Membership .... 114
Figure 5.3.2 Self-Identification and Membership .... 116
Figure 5.3.3 Self Gender Identification and Membership .... 117
Figure 5.3.4 Residence and Membership .... 119
Figure 5.3.5 Work, Occupation, and Membership .... 122
Figure 5.4.1 Membership Rates for Groups of Contacts with Minimum Flesch Reading Ease Scores .... 127
Figure 5.4.2 Group Size (Number of Contacts) for Groups of Contacts with Minimum Flesch Reading Ease Scores .... 127
Figure 5.4.3 Membership Rates for Groups of Contacts with Average Flesch Reading Ease Scores .... 129
Figure 5.4.4 Group Size (Number of Contacts) for Groups of Contacts with Average Flesch Reading Ease Scores .... 129
Figure 5.4.5 Membership Rates for Groups of Contacts with Maximum Flesch Reading Ease Scores .... 131
Figure 5.4.6 Group Size (Number of Contacts) for Groups of Contacts with Average Flesch Reading Ease Scores .... 131
Figure 5.5.1 Membership Rates for Contact Minimum VADER Scores .... 134
Figure 5.5.2 Membership Rates for Contact Average VADER Scores .... 134
Figure 5.5.3 Membership Rates for Contact Average VADER Scores .... 135
Figure 5.5.4 Membership Rates for VADER Scores (Average, Min, and Max) .... 135
Figure 5.5.5 Group Sizes for Max Compound VADER Score Conditions .... 136
Figure 5.5.6 The Difference Between the Membership Rates for Maximum Compound VADER Score Conditions and Their Alternative Conditions .... 136
Figure 5.7.1 Membership Rates for Exceedance Conditions .... 143
Figure 5.7.2 Membership Rates for Alternative Exceedance Conditions .... 143
Figure 5.7.3 Membership Rates for Minimum LIWC Scores .... 144
Figure 5.7.4 Membership Rates for Alternative Minimum LIWC Scores (Maximums) .... 144

INTRODUCTION

This dissertation investigates relationships between the words that constituents write to decision makers and these constituents' engagement with environmental nonprofit organizations. Findings benefit advocacy organizations, developers of online advocacy services, policymakers, and civil project managers. Findings contribute to research in applications of linguistic analysis to predict behaviors (e.g. McHaney et al.
2018, Pennebaker 2011, Robinson 2013) and research in the value of personal stories (Sandhu 2017, The Congressional Management Foundation 2017, The Social Change Agency 2017a, 2017b, Karpf 2016). Methods employ popular rule-based linguistic tools, including the Natural Language Toolkit (Bird et al. 2019), Linguistic Inquiry and Word Count 2015 (LIWC 2018), Valence Aware Dictionary and sEntiment Reasoner (VADER; Hutto and Gilbert 2014), and Flesch Reading Ease Analysis (Flesch 1948). Data include over two million messages and nearly 500,000 originally authored messages from over 150,000 individuals distributed across the United States. Messages support campaigns to preserve national parks, curb toxic emissions, and expedite U.S. energy independence. Results provide evidence to support advocacy organizations delivering messages, and the policymakers reading them, in employing text analysis tools to predict organizational engagement and understand constituency support of civil and environmental policies and projects.

Background and Need

1.1.1. Green Technology Needs Green Policy: Advocacy Organizations, Policymakers, and Project Managers

Civil and environmental scientists and engineers recognize climate change. They develop ways to provide renewable energy, limit greenhouse gas emissions, and recycle waste. Project managers recognize, however, that research innovations only transition to practice, and scale, through policy and community support. Policy determines the direction and success of studying and protecting our environment. It regulates how communities use natural resources to generate electricity. It protects habitats and national parks. It determines how NASA budgets earth science vs. space exploration. It encourages and incentivizes recycled materials in pavement. It makes residential investment in solar energy feasible for homeowners.

In the U.S., local and national nonprofit organizations advocate for environmental policies in several ways: education and awareness campaigns, petitions, letter-writing campaigns to policymakers and editors, clean-ups, protests, legal action, and investigations. Most importantly, advocacy organizations hold policymakers accountable for their promises to protect the environment and deliver energy independence from fossil fuels, and they expose policymakers who break those promises.

These advocacy organizations use petitions and online letter-writing campaigns, explicitly, to empower residents to connect with their policymakers and advocate for environmental sustainability. Less explicitly, these petitions and letter-writing campaigns also help advocacy organizations recruit participants (Suárez 2009, Cruickshank et al. 2010, Carpenter 2016, Parry et al. 2011, Jacobs 2016), fulfill advocacy organizations' needs to understand the behaviors and demographics of their constituents (e.g. members, volunteers, allies), and empower participants to take first steps up Arnstein's "ladder of citizen participation" (i.e. the "engagement ladder," Arnstein 1969): sign more petitions, send more letters, send more personal letters, share more personal stories that support specific campaigns, enlist their friends, partner and organize with the organizations, and become leaders themselves. Joining an organization as a dues-paying member is also part of this more-or-less explicit intent of organizations' use of petition tools and surveys. Advocacy organizations ask citizens to support a laudable cause before asking for money.
For example, President Barack Obama's campaign ladder consisted of four rungs: (a) liking the presidential campaign on a Facebook page, (b) signing a birthday card, (c) filling out a survey or sharing a personal story, and, at the top, (d) contributing in exchange for campaign swag. This ladder helped the campaign successfully mobilize a large, grassroots base.

In the same way that an online marketing firm or political campaign recognizes an ad click as an action, advocacy organizations recognize, and carefully track, the behaviors of their contacts in contact relationship management (CRM) databases (e.g. Blackbaud Raiser's Edge, Convio, Neon, Salesforce). Commercial companies refer to CRM services as customer relationship management services; advocacy organizations refer to them as constituent relationship management services. Both, however, use CRM services in similar ways. Both collect contact information, event attendance records, donation histories, demographics, addresses, interests, household relationships, and other interactions. In A/B hypothesis tests, both compare levels of response to different email news headlines and social media posts (Karpf 2016) with organizational engagement.

One unique set of data that advocacy organizations collect is the messages that their contacts send to their policymakers. They collect them through the petition, letter-writing, and chatbot services that they provide online. They refer to these tools as advocacy action services, and refer to the messages that constituents author and send through them as advocacy action messages.

Environmental advocacy organizations use the information from online petitions and letter-writing campaigns to learn more about their contacts and interact with them. When collecting signatures on door-to-door canvasses, canvassers can write down notes about their conversations (or door slams), political yard signs, the demographics of the people they meet, the family members of the people they meet, and more. Advocacy organizations can then centrally parse this information into CRM fields. Online campaign managers learn different, seemingly more limited kinds of information than canvassers. Letters to Congress, for example, require citizens to report their address or zip code to be taken seriously by members of Congress. As the message carrier, advocacy organizations can collect these zip codes and feed them into services like WealthEngine (2019) to learn more about the potential of message writers to donate to the organization and join as dues-paying members.

A new data analyst at one large environmental advocacy organization calls the organization's database "well collected, but not well informed" (pers. comm. 2018), meaning the organization has collected information about its contacts, but still has work to do to extract actionable evidence from that information. Organizations are currently in the process of developing lumped engagement scores for their contacts to make their data more meaningful. They are first looking at fields that fit nicely into Boolean, integer, and short text fields: for example, the number of events a person has attended, their contribution history, their zip code, their age, their gender, and the issues that they have expressed interest in on surveys.
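A lumped score of this kind can be reduced to a weighted sum over such easily queried CRM fields. The sketch below is purely hypothetical: the Contact fields and weights are invented for illustration, and a production model would fit its weights to historical engagement data rather than set them by hand.

```python
# Hypothetical sketch of a "lumped" engagement score over simple CRM fields.
# Field names and weights are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Contact:
    events_attended: int   # integer CRM field
    total_donated: float   # contribution history rolled up to one number
    is_member: bool        # Boolean CRM field
    messages_sent: int     # count of advocacy action messages

def engagement_score(c: Contact) -> float:
    # Weighted sum of fields that "fit nicely" into Boolean and integer columns.
    return (2.0 * c.events_attended
            + 0.01 * c.total_donated
            + 5.0 * float(c.is_member)
            + 1.0 * c.messages_sent)

print(engagement_score(Contact(events_attended=3, total_donated=50.0,
                               is_member=True, messages_sent=4)))  # 15.5
```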
While organization analysts are hard at work modeling engagement, campaign organizers are paying attention to research from the Congressional Management Foundation (2017) and the Social Change Agency (2017a, 2017b), which reveals that people with lived experiences of campaign issues are noticed by policymakers, climb the engagement ladder more quickly, and can become campaign organizers themselves. Seth Long, regional online organizer, agrees. He wants to develop a deeper understanding of how personal messages impact "advocacy outcomes, equity values, and movement building: organizing, communications, and legal" at the Sierra Club (pers. comm. 2018). Personal stories, additionally, become testimony in courts, and ethos and pathos in articles.

Simultaneously, while analysts are building engagement models, and while campaign managers are recognizing the importance of personal stories, digital product managers, in advocacy, in congressional offices, and everywhere, are hiring developers to build and add chatbot services to their portfolios of communication tools. Form-based action campaigns on websites still exist, but chatbots reach people in focused, personal ways that websites cannot. They operate inside the communication tools people already use to connect with their friends and associates on a regular basis: SMS, iMessage, Facebook Messenger, etc. They democratize action without overwhelming congressional offices, and have recently become successful in doing so (Putorti 2019).

Considering analysts' goals to understand their organization's members and contacts with the data they have, organizers' recognition of and search for those with lived experiences, and product managers' and service developers' new efforts to develop chatbots, this study tests deriving several predictor metrics from just one of the fields that organizations collect data for but do not currently utilize without manual review. The metrics from this one field could be useful to all three of these types of people in the environmental advocacy world: advocacy organization managers, campaign organizers, and product developers. The metrics may also be useful to policymakers receiving advocacy messages, and to the civil project managers that policymakers share data with, in their search to understand and highlight, whether fairly or not, the opinions of their constituents concerning the environmental impacts of their policy decisions. This one field is the personal message text field where activists write their messages.

Large advertising companies have built their businesses with machine learning and natural language processing (NLP). Google, for example, has a history of processing user emails to support ad targeting. Facebook extracts "entities" from business messages, such as greetings, sentiment, location, and quantities (Facebook 2019). This study asks if analysis of personal messages could also help organizations paint a more comprehensive picture of their organization, explain constituent behaviors, and increase organizational engagement in ways that business-as-usual methods (e.g. demographic profiling, relationship tracking, interaction tracking), alone, cannot.

Treating this personal message field as a Boolean variable (did a contact originally author and attach a personal message to their communication or not), this study confirms that the presence of content in the field, alone, can act as a predictor of engagement without further analysis.
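A minimal sketch of this Boolean treatment follows, assuming a pandas DataFrame of contacts with hypothetical `sent_personal` and `is_member` columns (the study's actual fields are defined in Section 3.2); the six rows are dummy data, and the chi-squared comparison shown is the form of test this study applies with a significance threshold of p < 0.01.

```python
# Membership rate conditioned on a Boolean personal-message field, plus a
# chi-squared test of the 2x2 contingency table of observed counts.
# Column names and rows are hypothetical stand-ins for the study's data.
import pandas as pd
from scipy.stats import chi2_contingency

contacts = pd.DataFrame({
    "sent_personal": [True, True, False, False, True, False],
    "is_member":     [True, False, False, True,  True, False],
})

# Membership rate for contacts who did and did not author a personal message.
print(contacts.groupby("sent_personal")["is_member"].mean())

# Chi-squared test against the expected counts (study threshold: p < 0.01).
observed = pd.crosstab(contacts["sent_personal"], contacts["is_member"])
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```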
Some baseline results from this study confirm what campaign organizers say: contacts who send personal messages are also more likely to send more messages and make financial contributions towards organization membership. Results show that most contacts (97% of study contacts) who write personal messages at rates of 18% or higher are also more likely to send more than one message. Results also show that the membership rate for those sending personal messages is 27%, compared to the overall 13% membership rate for those sending any type of message, personal or otherwise: more than double.

Beyond the presence of personal messages at all, this study applies rule-based linguistic analyses to messages to learn more about their authors. It begins by asking if analysts can use frequencies of pronouns in messages to predict the number of messages a contact will send. To do this, this study begins with the Linguistic Inquiry and Word Count (LIWC) tool, which has been successful both in predicting human behaviors and in deepening the academic understanding of how people write and speak in different situations. LIWC is well established; textbook writers teach students about it (e.g. Krippendorff 2018) and scholars have cited articles describing its development and operation (e.g. Pennebaker et al. 2015) thousands of times.

Of interest to this study, LIWC is often used to analyze the words of people in power. Lenard (2016), Jones (2017), and Pennebaker (2011) apply it to U.S. candidates and political figures. It is no doubt interesting to see how candidates and politicians in power talk to their constituents, their opponents in debates, and their fellow representatives on the floor. This study flips the focus of these analyses to study the people speaking up to their policymakers instead of studying the way policymakers speak (at times, down) to them. From an engineering project management and public representative point of view, understanding and empathizing with customers and constituents is key to serving them.

Beyond LIWC, researchers have studied the words that people have used in reaction to political candidates, environmental policies, energy, and construction projects, new and proposed (e.g. Wang 2012, Ding 2018). These studies are written for candidates, lawmakers, project managers, and project stakeholders that are judging the risk, and the public perception, of enacting policies and making project decisions. These audiences are often, but not necessarily, concerned with the environmental impacts of their projects. In the same way that this study turns its focus away from how policymakers talk to their constituents, and towards the way constituents talk to policymakers, it also deprioritizes how lawmakers might evaluate the risk and public acceptance of a project (for bad or good), and prioritizes how environmental advocacy organizations can improve and support (or not support) projects to keep the earth green. This study also differs from past and forthcoming studies (e.g. Ding 2018, Li et al. 2019) in that it studies messages that are directly written to policymakers rather than public tweets. It tests relationships between messages and data inconvenient to collect by anyone other than advocacy service providers and their advocacy organization clients. Even policymakers, who are the recipients of environmental advocacy messages concerning a particular issue, often
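LIWC itself is proprietary, so the sketch below only approximates one of its pronoun dimensions with a small hand-made word list; the `messages` DataFrame and its columns are hypothetical stand-ins for the message table defined in Section 3.2.

```python
# Illustrative approximation of the Objective One pipeline, not LIWC itself:
# score each message's first-person-singular pronoun rate, roll scores up to
# contacts, then average by the number of messages each contact sent.
import re
import pandas as pd

I_WORDS = {"i", "me", "my", "mine", "myself"}  # abridged stand-in for LIWC's "I" dimension

def i_rate(text: str) -> float:
    """Percent of words that are first-person singular pronouns."""
    words = re.findall(r"[a-z']+", text.lower())
    return 100.0 * sum(w in I_WORDS for w in words) / len(words) if words else 0.0

messages = pd.DataFrame({  # dummy rows; the real table holds millions of messages
    "contact_id": [1, 1, 2, 3, 3, 3],
    "text": ["I live downwind of the plant.", "Please act now.",
             "We deserve clean water.", "My family hikes this trail.",
             "I urge you to vote yes.", "Protect the park."],
})
messages["i_rate"] = messages["text"].map(i_rate)

per_contact = messages.groupby("contact_id").agg(
    n_msgs=("text", "size"), i_rate=("i_rate", "mean"))

# Group average pronoun rate for contacts who sent the same number of messages:
# the quantity plotted against messages sent in Figures 4.1.1 through 4.1.8.
print(per_contact.groupby("n_msgs")["i_rate"].mean())
```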
This dissertation (a) studies the words of constituents instead of the words of policymakers and leaders, (b) focuses first on how environmental advocacy organizations can affect policies and projects before it focuses on how policymakers and project managers can judge public acceptance of their proposals and projects, and (c) relies on data convenient to collect only by advocacy organizations and service developers. The methods and findings from this dissertation are nonetheless significant to policymakers and project managers. Results support offices of policymakers in employing the methods in this dissertation, even if they only have access to the messages sent directly to them. Results also support policymakers in better understanding advocacy organization summaries of messages and in directly analyzing any additional message data that they may receive. Policymakers, unlike advocacy organizations and service providers, are, in fact, uniquely situated to have immediate access to messages sent from multiple audiences. Using the methods in this dissertation, policymakers may gain insight into the strength of different lobbies advocating differing opinions.

1.1.2. Exploration: Membership

Nonprofit contributions have grown more than 10%, on average, every year since 2012 to over $34B in 2018 (Nonprofits Source 2018). Giving Tuesday raised $380M in one day for nonprofits in 2018 and $511M in 2019 (Giving Tuesday 2019). Of all nonprofits, environmental advocacy organizations led the group of organizations with the largest increases in contributions in 2018 (Nonprofits Source 2018). For this study, membership signifies a monthly or annual financial contribution commitment.

During the course of summarizing data and investigating pronouns, this study noticed that the LIWC dictionaries of swear words, negative words, long words, and punctuation could also potentially predict engagement. At the same time, this study began looking at membership, in addition to the number of messages a person sends, as an indicator of organizational engagement, where membership indicates a financial contribution and commitment. These factors, in conjunction with the knowledge of the importance of personal stories, inspired a series of explorations to investigate what relationships additional text analyses can reveal about membership. These explorations (1) developed and tested rule-based regular expressions to search for words and phrases indicative of personal stories, with input from a campaign manager who professionally reads and searches for personal stories, (2) assessed the sentiment of messages with the well-established, rule-based Valence Aware Dictionary and sEntiment Reasoner (VADER) tool, built specifically for short online messages, and (3) assessed the complexity of messages as a function of syllables per word and words per sentence with the popular Flesch Reading Ease model. The Flesch model provides a readability score tied to the education level that a reader might need to comprehend a piece of text. Metrics from these three assessments were then piloted as predictors of membership.

Results show that looking at patterns of words, built out from a foundation of phrases centered around lived experiences, can better indicate organizational membership than looking at the rate that contacts use words from LIWC dimensions alone. They also show that sentiment, and the ease of messages for people to read at different grade levels, can also help identify non-members and members.
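These three exploration metrics can be piloted in a few lines. The Flesch Reading Ease score is 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), with higher scores reading more easily. Below is a minimal sketch assuming the open-source vaderSentiment and textstat packages; the sample message and the story pattern are hypothetical illustrations, not this study's actual queries (those are listed in Appendix B).

```python
# Sketch of the three Exploration metrics on one hypothetical message:
# (1) a regular-expression probe for lived-experience phrasing,
# (2) VADER sentiment, (3) Flesch Reading Ease.
import re
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import textstat

msg = ("I am a nurse and a mother of two. My son's asthma flares up every "
       "summer when the air quality drops. Please vote to curb these emissions.")

# (1) Illustrative personal-story pattern; the study's queries are in Appendix B.
story = re.compile(r"\bI (?:am|was|live|work|grew up)\b"
                   r"|\bmy (?:son|daughter|family|husband|wife)\b", re.IGNORECASE)
print(bool(story.search(msg)))                       # True for this sample

# (2) VADER returns neg/neu/pos proportions and a compound score in [-1, 1].
print(SentimentIntensityAnalyzer().polarity_scores(msg))

# (3) Flesch Reading Ease; higher scores indicate simpler writing.
print(textstat.flesch_reading_ease(msg))
```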
1.1.3. In the Words of Advocacy Data Analysts and Product Managers

An analyst at one environmental nonprofit organization agrees that measures and metrics commonly calculated in text analysis may be piloted as predictor variables for engagement. They are interested, first, in natural language processing (NLP) to summarize message length and personal stories. They point out, for this study, "we currently do not have the capacity or skills in house to do NLP but have a high degree of interest in personal messages and how they relate to engagement." Further, "NLP gives us a view into this data that we don't have and with resulting distribution or segmentation from different NLP analysis, we could run tests on those audiences to see how their engagement differs. If this is effective in future targeting and activist engagement, we would also have a solid evidence for more organizational investment in NLP, modelling tools and skill sets working with [organization] data" (2018).

Parul Sharma, Associate Product Director of an online advocacy system at the Sierra Club called AddUp (2019), points out that AddUp currently recommends action steps to users based upon the user's location and last action, but she wants to know if message content factors can play a role in making recommendations and giving users "a more personalized journey." She wants to know "what types of issues do people want to write personal messages for" and "is there a common theme around types of issues vs. sentiments." She wants to know if content analysis can help predict whether individuals "are at a 'pre-member' stage? Likely will give donations or become volunteers, or in the future, will become event organizers or creators?" If, for example, AddUp could recognize (a) personal stories and (b) writing styles indicative of a future organizer from an individual's first contact with the organization, then AddUp could provide tailored next steps to engage that individual. In the future, from an advancement (i.e. fundraising) point of view, writing characteristics could supplement location and financial data from services like WealthEngine (2019) in suggesting contribution levels to new contacts.

In summary: Advocacy organizations want to know the feasibility of summarizing messages and relating them to other constituent data. They want to know if doing so can aid them in encouraging more messages, more sharing, more personal and localized prompts, and other higher-value actions from their constituents (e.g. attendance, membership, leadership). They want to know if they can spot and amplify personal stories in messages, and then empower the authors of these stories to support their campaigns.

1.1.4. Advocacy Campaign and Data Flow and Potential Beneficiaries

This study commenced by addressing the needs of advocacy organizations and service developers, but results and conclusions show policymakers may benefit from it just as directly (Section 7.2). Civil and environmental project managers with whom policymakers share data will also benefit. Figure 1.1.1 illustrates how these parties work together.

Figure 1.1.1 Advocacy Campaign Dataflow Diagram

Goal, Objectives, and Research Questions

The goal of this study is to explore relationships between the messages that constituents send their policymakers and these constituents' engagement with the advocacy organizations that provide the systems that enable them to send these messages. It focuses on environmental advocacy organizations. It fulfills two main objectives.
Objective One, answering three originally proposed hypotheses, tests relationships between three properties of messages and the number of messages contacts send as a first measure of engagement. Those properties are: (a) pronoun usage, (b) personal message rate, and (c) message length. Objective Two, within a series of explorations, tests relationships between additional text metrics and membership as a second measure of engagement. Those additional metrics are based on (a) regular expression searches for personal stories, (b) reading ease analysis, (c) sentiment analysis, (d) frequently used words, and (e) collections of words.

These objectives should help answer the following research questions: (1) How can managers of advocacy organizations and policy offices analyze and categorize messages? (2) How can they relate message factors to organizational engagement factors? (3) What methods can they use to explore these relationships and spot trends? (4) How can they identify personal stories among personal messages? (5) What baseline text-analysis metrics could be used in CRMs and online tools?

Results (Chapter 4 and Chapter 5) reveal observations to answer these questions; discussion and conclusion chapters (Chapter 6 and Chapter 7) summarize answers. For the first three questions, the discussion of results summarizes relationships and methods. Results and conclusions emphasize the importance of the number of contacts in groups of individuals on the applicability of tests to reveal trends (Section 4.1.2.3, Section 6.3). For the fourth question, Exploration Three (Section 5.3) introduces how this study used regular expressions in an attempt to identify personal stories, and Exploration Seven (Section 5.7) and the discussion chapter (Chapter 6) discuss their capabilities. Appendix B lists the regular expressions. (Regular expressions found some personal messages, but they also found other types of messages.) For the fifth question, methods, results, and conclusions summarize how simple message analyses, including one that simply counts types of messages, establish engagement baselines (Section 3.1, Chapter 5, and Chapter 7). In describing future work to develop an engagement model, conclusions (a) discuss theoretical and machine learning approaches to identifying engagement predictors (Section 7.3.1) and (b) question whether problems identified in organizing campaign canvasses offline could be present online (Section 7.3.2).

Objective One Hypotheses: Message Content and Number of Messages Contacts Send

This study begins by investigating three specific questions of interest to environmental advocacy organizations given only a minimal table of messages with contact identifiers.

1. Literature has shown correlations between an individual's linguistic style and their behaviors, from their ability to succeed in health (Pennebaker 2011) and academic programs (Robinson 2013) to the selection and categorization of decision support simulation models used in mining, public health, water resources, and other applications (McHaney et al. 2018). The first hypothesis predicts that similar relationships exist between the writing styles an activist employs and their engagement with an advocacy organization. To test this hypothesis, this study uses personal pronouns to identify writing styles and the number of actions an activist has taken to indicate organizational engagement.
The study can accept the hypothesis if relationships exist between average LIWC pronoun scores (pronoun rates of use) for groups of contacts who have sent the same number of messages and the number of messages that they have sent.

2. A second hypothesis states that there is a relationship between the number of personal messages contacts write and the total number of messages that they send (with or without personal comment). For online campaign managers, accepting this hypothesis would indicate that target groups that return high rates of personal messages are more likely to send additional messages in the future.

3. A third hypothesis states that there is a relationship between message length and the number of messages that contacts send. It tests the question: do contacts tend to write more or less often if they also write long messages? A negative relationship would liken message words to limited units of person-hours on a project, where contacts sending more messages may not have time to write longer ones.

These hypotheses use the total number of messages sent by contacts as a measure of their organizational engagement. In doing so, they disregard the importance of personal stories to policymakers and organizations (Congressional Management Foundation 2017, Social Change Agency 2017a). They do address, however, the common case, reviewed in the literature review (Chapter Two), where congressional staffers reduce messages to yea and nay summary piles (Miler 2014), losing personal stories but increasing the value of a contact's output as a measure of influence.

Objective Two Hypotheses: Exploration of Personal Stories, Sentiment, Writing Level, Popular Words, Groups of Words, and Membership

After testing the three initial hypotheses, this study investigates relationships between additional text metrics and membership. It tests a general hypothesis: there are differences in membership rates between (a) all contacts who have sent personal messages (27% membership rate) and (b) groups of contacts who have written messages that satisfy text conditions. Text conditions are based on the number of messages contacts send, personal stories in messages, message writing complexity, message sentiment, and the use of popular words and dictionaries of words in messages.

In evaluating membership rates for groups of contacts satisfying conditions, this study considers differences of 5, 10, and 15 percentage points from the average 27% membership rate for those who have sent personal messages as moderate, strong, and very strong differences, respectively. A 10-percentage-point difference above the 27% average membership rate for those who have sent personal messages equals a strong 37% membership rate (27% + 10% = 37%). This, coincidentally, also equates to a 37% relative increase (10% / 27% ≈ 37%). It is also a 185% increase above the 13% membership rate for those who have sent any type of message, with or without a personally authored message (37% - 13% = 24%; 24% / 13% ≈ 185%). Hypotheses for each text condition are significant if the chi-square test p-values for comparing contingency tables of observed and expected values are less than 0.01.

LITERATURE & TECHNOLOGY REVIEW

Petitions, Slacktivism, Creative Campaigns, and Letters in Between

Modern advocacy services such as MoveOn, AddUp, SwingLeft, and Change.org are coded around a timeless "embedded recruitment technology"
that has contributed to the historical successes of organizers prior to the internet, such as French Calvinists in the 1560s and American antislavery leaders in the 1830s (Carpenter 2016): the petition. Environmental advocacy organizations adopted the petition online earlier than other types of advocacy organizations (Suárez 2009), and climate change awareness organizers used petitions and letter-writing campaigns to reach global audiences, including, notably, to support the 2015 Paris Agreement and the 2014 People's Climate March leading up to it (Jacobs 2016, Avaaz 2015). The reach of online advocacy services grew as U.S. home internet use accelerated from 0 to 60 percent between the years 2000 and 2010 (Pew Research 2019). At the time, MoveOn was a visible example of resistance to the Iraq War. MoveOn's founders credit its growth to their "realization" of petitions as organizing tools in 1998 (MoveOn 2019). At a minimal level, like in-person petition canvasses (Parry et al. 2011), online advocacy systems recruit members and benefit the organizations that run them (Bhagat 2005).

While petitions increase organizational engagement, researchers have argued that the gains come at the expense of disengaging policymakers, who become overwhelmed by impersonal messages. For this reason, researchers have embraced the term, coined by a reporter, for the act of conveniently sending online communications to a policymaker: "slacktivism" (Morozov 2009). Another reporter, White (2010), calls the act "clicktivism" in a scathing comparison of online advocacy systems, like MoveOn, with marketing firms that exchange "faith in the power of ideas, or the poetry of deeds, to enact social change" for unread messages. Miler, in 2014, further provides evidence that online messages are one of the least noticed forms of advocacy. She shows that these messages are only counted by issue into yea or nay piles by (often low-paid or unpaid) congressional staff and, many times, never read at all. Actual stories of lived experiences and thoughtful suggestions are lost between pages of faxed form letters. This observation, alone, adds a dark significance to this study's use of the number of messages (personal or form) as a measure of organizational engagement in Objective One. Miler shows congressional offices are much more likely to notice and respond to constituents whom they can "see": donors, lobbyists, and creative activists.

Morozov, White, and Miler have emphasized the effectiveness of well-articulated and moneyed campaigns over "slacktivism." Without money, the resources to be "seen," however, require luck, ingenuity, or earned ethos. Activist Kristin Mink, in an example of luck, was able to give the final push that removed fossil fuel lobbyist Scott Pruitt from the office of Administrator of the EPA after accidentally running into him, and then publicly confronting him, while her husband recorded the encounter on her phone in 2018. The video went viral. In an example of ingenuity, when thousands of daily emails, phone calls, and faxes started flooding one Republican senator's office shortly after the 2016 presidential election, to the point where the office was no longer able to count the messages, "creative" activists sent their messages as hitchhikers inside pizza deliveries to Congress (Schulz 2017). In an example of earned ethos, Dr. Gerry Galloway, research professor at the University of Maryland, serves as an expert to municipalities planning for the effects of sea level rise (pers. comm. 2019).
The developer of Resistbot, one online advocacy system, would not question the importance and laudability of these examples, but would also name them examples of privilege (Putorti 2019). While Resistbot has, in the past, been guilty of overwhelming representative offices with faxes and disengaging policymakers, today it delivers messages over a new electronic system recently created for, and advertised by, Congress, called Communicating with Congress (CWC; U.S. House of Representatives 2017). Responses by congressional offices to a 2015 survey administered by the Congressional Management Foundation (2017) show that electronic "individualized" messages, like those couriered by Resistbot to CWC, now have greater influence on undecided positions than postal letters, editorials, telephone town halls, phone calls, lobbyist visits, and form letters. Only (1) in-person visits by constituents and (2) contacts from constituent representatives are more effective. The new system closes the paper gap, but the messages still need to be read. This literature shows: (1) Petitions and letter-writing campaigns are inherently organizing tools that help organizations recruit and engage members. Organizations that track the number of messages individuals send, therefore, are collecting one measure of organizational engagement. (This study shows that, with text analysis, the words in the messages that organizations collect are as important, or more important, than the count of the number of messages.) (2) While flooding congressional offices with form letters disengages them, Congress has told the Congressional Management Foundation that personal, "individualized" messages sent through the new, more manageable CWC system influence their positions on undecided issues more than most other forms of communication that they receive. These findings show that Congress can now more conveniently run text analysis on the messages that they receive. While this study focuses first on the importance of messages to advocacy organizations, as described in the introduction, policymakers have access to an equally unique set of messages: messages coming from different organizations.

Custom, Individual, Personal, Testimony

The Social Change Agency (2017a), with Sandhu (2017), shows that members of the European Parliament agree with the U.S. Congress: personal stories are important; barrages of impersonal form letters are not. Beyond that, they report, "digital campaigns that truly centre the voices of lived experience have the potential to be groundbreaking. However, there is currently little space for those with lived experience to genuinely speak to power using their own voices" (2017b). The Social Change Agency calls those with lived experiences "lost." Once they are found, and beyond using their stories as legal testimony, the Social Change Agency tells advocacy organizations that successful campaigns are led by those directly affected by them, and suggests ways to put those directly affected by campaigns in organizing and leadership positions. Their findings encourage participatory project management, and encourage existing leaders to see themselves as allies of those with lived experiences. If Arnstein's "ladder of citizen participation" (1969) or President Barack Obama's engagement ladder (see Section 1.1) were constrained to only the types of messages people send to their representatives, unsigned petitions and form letters without reliable contact information would sit on the bottom rung of the ladder.
Messages individualized with contact information would sit above those, then customized form letters above those, then personal messages, which have personally authored words attached to them, above those, then personal stories, which express lived experiences and could be used as examples in legal testimony, at the top. This dissertation uses the terms "messages," "custom messages," "personal messages," and "personal stories" to describe these different types of messages. They are detailed in Section 3.2.

Lost Voices and Secret Lives: Personal Stories and Personal Pronouns

While the Social Change Agency labels individuals with personal stories "lost voices" in the title of their publications (2017a, 2017b), Pennebaker says pronouns, among other words, have "secret lives" in the title of his book (2011). Of interest to this study, personal stories use personal pronouns. Further, Pennebaker has shown that people who are suffering frequently use "I" words. Although this study does not employ experts to manually identify people suffering from a condition, such as asthma from poor air quality, and then test them for their use of "I" words, people using "I" words could be expressing lived experiences, and this study does test the use of "I" words as a predictor of organizational engagement. Pennebaker, alternatively, shows that people who are focusing on a task tend to use low rates of "I" words. He also shows that third-person singular pronouns like "he" or "she" express friends or people held in esteem, while third-person plural "they" pronouns are used by authors to put adversarial parties, or parties that the author is worried about, at a distance from themselves. In developing and applying the text analysis tool Linguistic Inquiry and Word Count (LIWC 2018), Pennebaker has shown relationships between the use of different types of personal pronouns in journal entries and social status. Of interest to nonprofit organizations, and likely to the Social Change Agency, he suggests that future research could show correlations between the use of personal pronouns and individual leadership traits. Robinson (2013) used LIWC to show that language analysis of students' course introductions at the beginning of a semester can predict final semester grades. The test of the first hypothesis in this study employs LIWC to count and categorize pronouns in personal advocacy messages. It tests whether there are relationships between the use of personal pronouns and the number of messages people send as a measure of organizational engagement.

Listening to Power Before Listening to Those Speaking Up to It

While writers have used "we" to refer to "me and you" as well as to a first-person group of people, in this study, none of them penned pluralis majestatis, the royal form of "we," into any of their messages. Researchers, however, love analyzing those today who might have used the term long ago. Lenard (2016) uses LIWC to look at gender differences in how representatives in the 113th U.S. Congress use pronouns in representing their constituents, and Jones (2017) does the same for Hillary Clinton. Lenard shows that male politicians use the pronoun "you" more than female politicians and that no significant gender differences exist in the use of other pronouns. She does show, however, that all politicians frequently use "I" words in formal addresses. Jones (2017) shows how Hillary Clinton, from 1992 to 2013, spoke with an increasingly masculine pronoun vocabulary, with fewer and fewer "I"
words (4.34% to 2.77%) and more and more "we" words (2.50% to 3.44%). Jones also illustrates Pennebaker's (2011) findings on the use of pronouns to describe friends versus adversaries. Jones does this with an extreme example of how President Donald Trump references his family and executives with first-person personal pronouns, such as "my," while he distances himself from "out-group" parties with the article "the" in referencing "the gays," "the women," and "the Hispanics." Pronoun analyses can contradict one another. While Jones and Pennebaker (2011) associate "I" words with the female gender, outside of politics, Mulac et al. (2013) call them masculine words. In another example, Pennebaker faults presidential candidate John Kerry's speechwriters' advice to Kerry to use first-person plural "we" words in greater frequency (Pennebaker 2011). Pennebaker contends the advice led to lower ratings. The speechwriters, defending themselves, might have called the style inclusive, not nosism. Ruijuan (2010), alternatively, credits President Obama's frequent use of "we" words, in conversational patterns along with "you" words, with creating an "intimate dialog" during a presidential victory speech. For this study, a negative relationship between the use of "I" words and engagement and a positive relationship between "we" words and engagement would support the theory that highly engaged political activists speak in the more masculine form that Jones (2017) shows politicians leaning toward. (Jones points out that President Trump is a notable, sole outlier; he uses first-person singular "I" words at very high rates.)

VADER and Flesch Ease of Reading Tests

As described in the introduction, while summarizing results from Objective One, this study noticed examples of swear words, negative words, short messages, and minimal punctuation from contacts who were not active members. These observations, and advice from sociologist Wojciech Sokolowski (pers. comm. 2019), inspired an exploration of rule-based sentiment and writing complexity metrics as predictors of membership. Frame alignment theory in nonprofit research (Snow et al. 1986, Sokolowski 1996) supports the idea that individuals may adopt the language of organizations that they belong to. If an advocacy organization, for example, uses positive language in its communications with its members, engaged members may also use positive language in writing policymakers. Researchers have used sentiment analysis on Twitter data to evaluate project acceptance (Ding 2018) and estimate damage from natural disasters (Li et al. 2019). Valence Aware Dictionary and sEntiment Reasoner (VADER; Hutto and Gilbert 2014) was selected among other sentiment classifiers for its logical, rule-based model. It considers a dictionary of words crowd-validated as positive or negative. It was made specifically to evaluate social media messages, which are similar to many of the online messages in this study. Its lexicon, openly available for browsing on GitHub, contains not only words but also emojis, emoticons, and netspeak. Unlike network-derived models, individual VADER scores are easily explained, and the VADER project authors give clear guidance on how to interpret them. VADER reports positive, negative, and compound scores. The project authors recommend testing messages on their compound scores: messages with compound scores greater than or equal to 0.05 are positive; messages with compound scores less than or equal to -0.05 are negative; other messages are neutral.
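As a minimal illustration of this decision rule, the following Python sketch applies the recommended compound-score thresholds using the NLTK implementation of VADER introduced below. The example messages are hypothetical and are not drawn from study data.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the crowd-validated lexicon
analyzer = SentimentIntensityAnalyzer()

def classify(message: str) -> str:
    """Label a message with the VADER authors' recommended compound thresholds."""
    compound = analyzer.polarity_scores(message)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify("Please protect our parks! :)"))        # positive
print(classify("This pipeline will ruin our water."))  # negative
```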
VADER is accessible in Python via the Natural Language Toolkit (NLTK; Bird et al. 2019). This study pilots the validation of VADER with a random sample of 400 messages and six human reviewers. Appendix C describes the validation process. If VADER is the standard for lexicon-based sentiment analysis of short messages without machine learning, then Flesch (1948) readability tests are the same for assessing the readability of a passage of text. Flesch ease of reading scores are based on the number of syllables per word and the number of words per sentence in a message. As shown in Section 5.4, Flesch scores are tied to education grade levels, from grade-school reading levels to college-graduate reading levels. The Bureau of Labor Statistics (2015) shows that education is the best indicator of citizen volunteer rates among other predictor factors, including age, race, marriage, children, and employment. If writing grade level is an indicator of education, and if membership is an indicator of volunteering, then this study should expect people who write at higher grade levels to have higher membership rates. For policymakers and civil project managers, text analysis of messages sent directly to them may tell them different things than messages observed on Twitter. If relationships between messages and membership exist, and policymakers have messages segmented by communication channel, policymakers may then be able to assess the strength of the individual advocacy groups delivering the messages in addition to public sentiment. For advocacy organizations, relationships could be used to directly address their need to suggest next steps to individuals, with minimal information, after they send a message.

METHODOLOGY

Approach

This study begins by employing basic, exploratory, deductive research methods to achieve Objective One, introduced in Chapter 1 (Section 3.3). It tests three hypotheses to investigate relationships between the number of messages that contacts send and the following message and text metrics:
1. The use of pronouns
2. The number of personally authored messages
3. The length of messages that contacts write

The total number of messages that contacts send is the first measure of engagement to which this study relates text predictors. In conducting these initial tests, tangential, incomplete observations of message frequency, message content, and organizational membership status of message authors inspire notions that membership rates increase with the following message and text metrics:
1. The number of messages contacts send
2. Personal stories in messages
3. Positive message sentiment
4. Message writing complexity (i.e., writing grade level)

These conjectures, along with education and volunteering data from the Bureau of Labor Statistics (2015) and frame alignment theory (Snow et al. 1986, Sokolowski 1996), encourage additional explorations into message predictors of membership (Section 3.4, Chapter 5). This study labels these additional explorations Objective Two explorations. It begins these additional explorations by calculating baseline membership rates (rates of dues-paying members) for contacts grouped by the number of messages and the number of personally authored messages that they send, two message metrics that require no text analysis.
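A minimal sketch of this baseline calculation, and of the chi-square significance test applied in the comparisons that follow, is shown below. The frame and its column names are hypothetical stand-ins for the study's contacts table.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical contacts table: one row per contact who sent a personal message;
# "condition" marks contacts whose messages satisfy some text condition.
contacts = pd.DataFrame({
    "member":    [True, False, True, False, False, True, False, True],
    "condition": [True, True, False, True, False, False, True, False],
})

baseline = contacts["member"].mean()                    # cf. the 27% baseline
rates = contacts.groupby("condition")["member"].mean()  # rate per condition group

# 2x2 table of observed counts; chi2_contingency derives the expected values
observed = pd.crosstab(contacts["condition"], contacts["member"])
chi2, p, dof, expected = chi2_contingency(observed)
print(f"baseline={baseline:.0%}, rates={rates.to_dict()}, p={p:.3f}")
# The study treats a condition's membership difference as significant when p < 0.01
```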
Next, it compares these baseline membership rates to membership rates of groups of contacts using lower and higher rates of pronouns, personal stories, positive sentiment, and complex sentences, with the following tools: Linguistic Inquiry and Word Count (LIWC), expert judgement and regular expressions, Valence Aware Dictionary for sEntiment Reasoning (VADER), and the Flesch ease of reading test. This study uses Pearson's chi-squared test to determine whether membership differences for these groups of contacts are significant. From an applied research perspective, the conditions that create the largest groups of contacts with the highest membership rate differences from their alternative-condition groups are the best predictors of membership and candidates for the future development of a predictive engagement model. In an applied, methodical search for conditions (i.e., patterns), sans any theoretical basis, this study concludes the Objective Two explorations with the calculation of membership rates of (a) 10,000 groups of contacts using or not using the 5,000 most popular words found in messages and (b) groups of contacts using or not using words from each LIWC dimension. The study alternates between deductive and inductive reasoning in exploring message metrics and text metrics indicative of two types of organizational engagement: the number of messages contacts send (Objective One) and organizational membership (Objective Two). Observations made while completing Objective One tests informed the development of Objective Two tests. While the applied needs of environmental advocacy organizations, policymakers, and advocacy service providers motivate this study, the lack of research in predicting organizational engagement from message analysis and text analysis of advocacy messages sent directly to policymakers (vs. publicly on social networks) drives the basic research goals to describe data and test theories found in related studies. The most applied methods that this study employs are the tests of membership rates for groups of contacts using popular words and the tests of membership for groups of contacts using words in LIWC dimension dictionaries. Future applied research can build on this study's findings to develop organizational engagement prediction models. Table 3.1.1 summarizes the variables, data, and methods that this study uses, for the two objectives, to study the relationships between message predictors and the two measures of engagement. Section 3.5 lists the tools that this study uses to accomplish these tasks.

Table 3.1.1 Independent Message Predictor Variables and Dependent Engagement Variables
(Test results in parenthetical section references)

Measure of Engagement (dependent variables)
  Objective One: Number of Messages Contacts Send
  Objective Two (Exploration): Membership Rate, the number of contacts in a group that are members divided by the total number of contacts in that group

Message and Text Predictors of Engagement (independent variables)
  Objective One: Pronoun Use Rates (Section 4.1); Personal Message Rate (Section 4.2); Average Message Length (Section 4.3)
  Objective Two (Exploration): Number of Messages (Section 5.1.1); Use of Pronouns (Section 5.1.2); Message Length (Section 5.1.3); Personal Stories (Section 5.3); Writing Complexity (Section 5.4); Sentiment (Section 5.5); Popular Words (Section 5.6); All LIWC Dimensions (Section 5.7)

Methods Overview
  Objective One: review data, construct databases, group contacts by the number of messages that they have sent, top-code contacts who have sent over 20 messages into a single group, calculate lumped predictor variable values (pronoun use rates, personal message rate, average message length) for each group, and describe correlations and trends to test hypotheses
  Objective Two (Exploration): define membership baselines and significant membership difference scales, group contacts by predictor variable rates, calculate membership rates for these groups, and compare these membership rates to baseline membership rates

Data: Terms, Collection, and Database Construction

This section begins by describing the original data collected by this study and discusses how, without access to contact tables, to use the contact data stored in message records. It then defines the message terms required for understanding subsequent method descriptions in this chapter and the findings in the results and conclusion chapters. Most importantly for this purpose, this section defines distinctions between all messages, custom messages, personal messages, and personal stories. Finally, this section describes the data collection period, the schema for the database fields required by this study to conduct analyses, and guidance on database construction for future studies.

3.2.1. Original Data: Contact Data in Message Records

Original study data include over two million messages and nearly 500,000 originally authored messages from over 150,000 individuals across the U.S. (Appendix A). The messages support sustainable civil and environmental policies and projects. While many messages to public officials are public information, collecting messages requires coordination with both the organizations encouraging messages and the software service providers delivering them. Messages in this study have been collected by these advocacy organizations and service providers through a variety of website services, CRM databases, and chatbot reports. After being prepared for testing hypotheses, data consist of message and contact tables. Message tables contain records of the advocacy messages that contacts send their policymakers. Message records include both the messages themselves and metadata about the messages. Metadata include fields for the systems that messages are sent through, such as CQ (https://cqrollcall.com/), Capwiz (now obsolete), Salesforce (https://www.salesforce.com/), Convio (http://www.convio.com/), Facebook (https://www.facebook.com/), and custom campaign websites built around CiviCRM (https://civicrm.org/). Metadata also include fields for advocacy topics, such as air and water quality, climate change, energy, transportation, water supply, wildlife, birds, and ivory. Contact tables contain records of the people that advocacy campaigns have interacted with. Contact data include the online advocacy systems that contacts have used, their paid membership statuses, event attendance information, and other demographic information. With the message and contact tables, this study can define message and text metrics for each message, lump message metrics into contact message metrics, and lump contact message metrics into contact group message metrics. While this study would ideally have started with two tables, one for messages and one for contacts, most contact data were originally stored in message table fields, and this study extracted them into a contact table via unique contact IDs.
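A minimal sketch of that extraction, under the assumption that contact metadata arrive attached to message records, follows. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical message records with contact metadata attached to each row,
# mirroring how contact data arrived in this study's message tables
messages = pd.DataFrame({
    "contact_id":  [1, 2, 1, 3],
    "is_personal": [True, False, False, True],
    "member":      ["Yes", "Yes", "No", "Yes"],
})

# Derive a contacts table keyed by the unique contact ID
contacts = messages.groupby("contact_id").agg(
    total_messages=("contact_id", "size"),
    personal_messages=("is_personal", "sum"),
    member=("member", lambda s: (s == "Yes").any()),  # member at least once
)
print(contacts)
```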
Future studies may encounter this problem. While it is safe for researchers to assume all contact relationship management systems have tables of contacts, contact reports are not always available or easily generated by advocacy product managers, for privacy, convenience, and system compatibility reasons. Some advocacy service providers, for example, do not retain or report contact tables for older campaigns at all. Advocacy organizations, additionally, can have separate membership management and online advocacy systems with separate contact IDs. While this study had access to partial information from contact tables, it relied on contact metadata stored in message records. Retrieving contact information dispersed in multiple message tables requires more work than retrieving contact information from a single contact table. It requires more join queries and more logic to handle discrepancies. It is also storage inefficient. For example, the birthday and gender values that contacts report do not usually change and should generally be stored in contact tables, not message tables. This study did notice, however, that future work could use some contact data when it is stored in message tables. Multiple timestamped records of a contact's membership status, for example, could help test a temporal engagement hypothesis in the future, e.g., the relationship between advocacy message frequency and membership renewal times. Changes in mailing addresses recorded in an action table, furthermore, could aid future spatiotemporal studies, such as the study of relationships between address changes, locations of coal-fired power plants, and related personal stories in messages. Organizations and service providers continually clean their databases to maintain data accuracy and consistency. For this study, they have previously both checked the data used in this study for duplicate contacts, human entry errors, and non-humans (e.g., spam bots) and protected the input fields that collect these data. Additionally, name fields, phone number fields, email fields, complete address fields, and other personally identifiable information were removed prior to the analysis of messages. Finally, some of the example messages used to illustrate points in this dissertation (not analyzed data) have been modified to preserve the privacy of the message authors.

3.2.2. Message Category Terms: Messages, Custom Messages, Personal Messages, and Personal Stories

This study divides messages into three principal categories: (1) not custom and not personal messages (1,586,252; abbreviated NOTCORP messages), (2) personal messages (491,027; abbreviated PM), and (3) custom messages (122,345; abbreviated CM). See Figure 3.2.1. The first category, not custom and not personal messages, does not contain individually authored text. This study, therefore, cannot perform text analysis on messages in this category to describe individual authors' writing styles or sentiment. Policymakers treat these non-customized, prewritten messages more like petition signatures than individual letters. They devalue these messages (Section 2.1). The second category, personal messages, contains messages originally authored by message senders. Some personal messages, moreover, contain stories of "lived experiences" (Sandhu 2017) concerning the effects of environmental issues on the authors' lives. This study performs text analysis on all personal messages. Finally, the third category, custom messages, contains prewritten messages (i.e.,
form letters) that individuals have edited. Future studies may be able to extract the individually written parts of custom messages for text analysis. While contacts could have customized or attached personal messages to many NOTCORP messages before they were sent, data records do not reveal whether some NOTCORP messages lacked the option for contacts to customize them. In total, the data consist of 2,199,624 messages, personal, customized, or not, sent by 690,631 unique contacts (an average of 3.2 messages per contact). Of these, 194,409 contacts have sent personal messages. A very small number of personal messages (0.015%) in the data have custom messages attached to them. This study categorizes these messages as PM for text analysis, not CM. Beyond these categories of messages (NOTCORP, PM, and CM), this study further categorizes and describes personal messages, both objectively and subjectively, to define potential predictors and measures of engagement. With only the single "message" field in message records, which contains the words that comprise the body text of each personal message, several potential linguistic predictors of engagement and descriptive metrics can be derived. First, counting the number of words in this field yields a word count (WC) value. Then, with the LIWC tool, this study learns the percentage of words used in each message across LIWC dictionary dimensions, including the pronoun dimensions pertinent to Hypothesis One and the family dimension pertinent to Objective Two. LIWC also calculates words per sentence (WPS). WC and WPS are both objective message metadata. While the selection of words in LIWC dictionaries is subjective, the dictionaries have been developed and refined by past research (Pennebaker et al. 2015). Given these dictionaries, the calculation of LIWC rates is objective, reported as percentages of dictionary words in individual messages (from 0% to 100%). These calculations enable messages to be compared to each other, lumped into per-contact rates, and compared to messages in other corpora.

Figure 3.2.1 Message Categories. This study divides 2,199,624 total messages into three top-level categories: not custom and not personal messages (1,586,252), personal messages (491,027), and custom messages (122,345). Personal messages contain subjectively categorized personal stories.

3.2.3. Collection Period

This study collected messages that were sent between July 1, 2017 and October 31, 2018 for one set of data and from 2012 to 2014 for a smaller set. For the messages collected in 2017 and 2018, this study collected personal (PM) and custom (CM) messages between July 1, 2017 and October 10, 2018 (16 months) and other (NOTCORP) messages between July 1, 2018 and October 10, 2018 (four months). This study computed LIWC scores for personal messages and used all messages to compute the number of messages sent per contact as a measure of engagement. The results for testing Hypothesis One describe the effects of limiting the analysis of messages to those sent during the personal messages time period and of excluding NOTCORP messages completely.

3.2.4. Database Schema

Messages Table

After preparing data for analysis, the messages table contains the fields shown in Table 3.2.1, necessary to first test the three hypotheses defined by Objective One and, second, test the relationships defined by the Objective Two explorations.
Table 3.2.1 Messages Table

Field | Description
Message ID | Unique, primary key (integer)
Contact ID | Contact key (integer)
Message | Text
Message category | Personal, custom, or not personal or custom (enumerated values)
VADER Score | Sentiment score from -1 to 1 (float)
Flesch Reading Ease Score | Score (float)
LIWC Rates | Percentages of words (float):
  - Pronouns
  - Personal Pronouns
    - First-Person Singular "I" Pronouns
    - First-Person Plural "We" Pronouns
    - Second-Person "You" Pronouns
    - Third-Person Singular "He/She" Pronouns
    - Third-Person Plural "They" Pronouns
  - Impersonal Pronouns

The Contact ID field identifies unique contacts. It is a necessary field for summarizing message data for individual contacts, which, in turn, is necessary for summarizing groups of contacts who have sent the same number of messages. The message category field is necessary to determine which messages have personal messages attached to them so that this study can determine linguistic properties authentic to their authors. (Note: while message category is expressed here as a single field, in many study calculations, message category was expressed with multiple category-named table fields for query convenience, efficiency, and readability.) This field is also essential to testing Hypothesis Two, the relationship between personal messages and all messages. Although Objective Two methods look at relationships between each LIWC pronoun dimension and engagement, the test of Hypothesis One only required usage percentages for the personal pronouns.

Contacts Table

After preparing data for analysis, the contacts table contains the fields in Table 3.2.2.

Table 3.2.2 Contacts Table

Field | Description
Contact ID | Unique, primary key (integer)
Member | Boolean field describing whether a contact has ever been a member at the time of sending a message
Number of Personal Messages | Number (integer)
Total Number of Messages | Number (integer)
Average LIWC Rates | Average percentages of words (float):
  - Pronouns
  - Personal Pronouns
    - First-Person Singular "I" Pronouns
    - First-Person Plural "We" Pronouns
    - Second-Person "You" Pronouns
    - Third-Person Singular "He/She" Pronouns
    - Third-Person Plural "They" Pronouns
  - Impersonal Pronouns

In addition to the fields for the contacts table listed in Table 3.2.2, the following fields contribute to calculation checks:
- Number of custom messages (integer)
- Number of not personal or custom messages (integer)

The Contact ID field is a unique ID necessary to relate contacts and messages. The Member field is a Boolean field indicating whether the contact has ever sent a message during the study time period while having an active membership status, where a contact's membership status is active for a year after paying membership dues. The two number-of-message fields (personal, total) are summaries of message data described in the messages table (Table 3.2.1). Like the number-of-message fields, the average LIWC rates are summaries of data from the messages table. The number-of-message fields and the average LIWC rate fields speed up analysis, especially for large sets of messages (>100,000), but may be replaced with SQL queries that join message table data to contact data.

Additional Fields

Beyond the essential fields listed above, other fields were available in the messages table that were not used. They could be used in future work as segments, drivers, and measures of engagement: message type (e.g., petition), message issue (e.g.,
energy policy, ivory, plastic straws, whales), contact gender, contact device (mobile or not), advocacy system (e.g., CQ, Convio), and membership level (e.g., student, limited income, standard, big donor). Note that most contacts in this study's data were classified at the standard membership level. (See the introduction to Chapter 5 for more information about membership levels.)

3.2.5. Database Construction: Creating Message and Contact Tables

As described above, this study did not begin with the ideal messages and contacts tables needed to test hypotheses relating message attributes to the number of messages that contacts send as a measure of organizational engagement; original data were composed of message records with contact metadata inefficiently attached to them. Further, data were split across several files, and fields were untyped (that is, the initial data were not composed of structured data formats with types, e.g., integers, dates, and text, as they are in a database) in comma-separated-value (CSV) text files. The first step this study took was to combine files with similar fields in a database. In doing so, it assigned types and keys to database fields. To do this, Excel and the following Python packages were used: sys, os, Pandas, NumPy, datetime, PyMySQL, SQLAlchemy, and SciPy. First, ten CSV files containing personal (PM) and custom message (CM) records and four files containing other (NOTCORP) records were manually converted into standard Excel XLSX format by opening the CSV files in Excel, removing extraneous summary text at the end of each file, and cleaning up the first row of each spreadsheet to make sure each first-row cell was correctly labeled as a field name with respect to its column's rows. During this process, Excel automatically guessed and saved the type of each field. It correctly recognized and typed most dates, numbers, and text. As expected, it did not assign more specific information to fields, like floating-point precision values to numbers or character sizes to text. Next, this study imported data into Python Pandas objects for exploration and conversion to MySQL tables via the SQLAlchemy Python package. (For comparison, creating Pandas objects directly from CSV files with the Pandas read_csv module worked, but did not automatically type data as well as preprocessing the data in Excel.) A Python script iterated through each Excel file and added each message record into one of two MySQL tables, one for personal and custom messages (coded CMPM) and one for other messages (NOTCORP). MySQL was then used to confirm the uniqueness of message IDs. Queries to count unique message and contact IDs revealed that IDs from some sources were case sensitive, so this study set the collation of these keys accordingly. (For an introduction to database collations, see "Character Sets, Collations, Unicode," Chapter 10 of the MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/charset.html.) Additional database instructions then set primary and unique indices on both the message and contact ID fields of these message tables. This study then grouped messages by Contact ID across both the CMPM and NOTCORP message tables with the MySQL GROUP BY command to create a unique contact table with summary contact fields. It used the MySQL functions COUNT, AVG, SUM, and IFNULL to calculate summary metrics, such as the sums of all messages, personal messages, and customized messages, in the contact table.
For each contact, this study set contact gender to the last value of gender reported in the message table. A concatenation function created a field of all personal message text for each contact, which later acted as a calculation check throughout the analysis (e.g., double-checking pronoun use rates among message words for contact outliers using high and low rates of pronouns). After this study created the contact table, it used LIWC to assign word count rate scores (i.e., percentage of pronoun words per message) and summary scores (e.g., total word count, words per sentence) to each individual personal message. To do this, Python scripts tabulated message IDs and corresponding message text from the database into an Excel file for LIWC. LIWC made a copy of this table and appended its linguistic scores to each record. Python was then used in a reverse operation to import the scores into a new database table. JOIN and GROUP BY functions were then used to calculate the average and standard deviation of scores for each unique contact. In retrospect, LIWC scores could have been calculated before contact tables were created to shorten the procedure described thus far, but isolating the linguistic analysis both limited potential memory problems and avoided inputting irrelevant data into the LIWC program (e.g., contact ID, contact metadata, message topic).

3.2.6. Example Database Construction

Meet Lisander Snodgrass, Bowser LeMans, Guybrush Gilbert, and "Whale Lover" Gene. They are four fictitious environmental advocates who have sent messages to their congressional representatives in two different campaigns, reported by two different advocacy systems in Table 3.2.3 and Table 3.2.4. Using their messages, this section provides a simplified example of preparing data for analysis through the construction of a database containing a combined messages table and a derived contacts table (Table 3.2.5, Table 3.2.6). Throughout this example, table rows related to Lisander Snodgrass are highlighted to emphasize database operations and calculations. While the non-fictitious study in this dissertation began with a message table already prepared, containing unique message IDs and unique contact IDs and with personally identifiable information removed, researchers repeating this study with their own data may need to prepare data themselves and make assumptions to distinguish unique contacts. In this example, source Table 3.2.3 contains fields for message ID (ID), message, contact ID (CID), email, and member. Source Table 3.2.4 contains only message, email, and member fields.

Table 3.2.3 Example Source Messages (ID is message ID; CID is Contact ID)

ID | Message | CID | Email | Member
1 | Thank you for your support | 1 | Lisander.Snodgrass22@example.com | Yes
2 | NULL | 2 | Bowser.LeMans@thimbleweed.pl | Yes
3 | NULL | 1 | Lisander.Snodgrass22@example.com | No
4 | We are affected by this issue | 3 | Guybrush.Gilbert@mymonkeytownusa.me | Yes

Table 3.2.4 Example Source Messages

Message | Email Address | Member
Save the whales! | Lisander Snodgrass | No
I love whales! | Whale Lover Gene | No

Future studies may be forced to use contact identifiers like email, phone number, full name, and address to distinguish individuals and eliminate duplicate contact records. In this example, Table 3.2.4 contains no contact ID field, so email is the best contact identifier for these two tables.
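A minimal sketch of assigning contact IDs keyed by email follows. The frame, the second email address, and the loop are hypothetical illustrations, not the study's actual procedure.

```python
import pandas as pd

# Hypothetical source table that, like Table 3.2.4, lacks a contact ID field
source_b = pd.DataFrame({
    "message": ["Save the whales!", "I love whales!"],
    "email":   ["lisander.snodgrass22@example.com", "gene@example.org"],
})

# Email-to-ID map built from the source table that does carry contact IDs
known_ids = {"lisander.snodgrass22@example.com": 1}

next_id = max(known_ids.values()) + 1
for email in source_b["email"].str.lower():
    if email not in known_ids:  # unseen email: assign the next free contact ID
        known_ids[email] = next_id
        next_id += 1

source_b["contact_id"] = source_b["email"].str.lower().map(known_ids)
print(source_b)
```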
Even if unique contact IDs are given, if one other unique identifier is also given, COUNT and DISTINCT commands should be used to check consistency between the contact identifiers. Database operations on these tables yield a combined messages table (Table 3.2.5).

Table 3.2.5 Example Combined Messages Table
This messages table, along with contacts Table 3.2.6, results from combining the message source tables (Table 3.2.3 and Table 3.2.4).

ID | Message | Type | CID | Source Table | Name
1 | Thank you for your support | Personal | 1 | 1 | Lisander Snodgrass
2 | NULL | Not Personal | 2 | 1 | Bowser LeMans
3 | NULL | Not Personal | 1 | 1 | Lisander Snodgrass
4 | We are affected by this issue | Personal | 3 | 1 | Guybrush Gilbert
5 | Save the whales! | Personal | 1 | 2 | Lisander Snodgrass
6 | I love whales! | Personal | 4 | 2 | Whale Lover Gene

Table 3.2.6 Example Derived Contacts Table
This contacts table, along with messages Table 3.2.5, results from combining the message source tables (Table 3.2.3 and Table 3.2.4).

ID | Messages | Not Personal Messages | Personal Messages | Member | Name
1 | 3 | 1 | 2 | 1 | Lisander Snodgrass
2 | 1 | 1 | 0 | 1 | Bowser LeMans
3 | 1 | 0 | 1 | 1 | Guybrush Gilbert
4 | 1 | 0 | 1 | 0 | Whale Lover Gene

In this database construction example, note that there are no customized messages (CM) and that "not personal" messages are similar to the NOTCORP messages found in this study's actual data (Table 3.2.5). Also notice that the resulting contacts table (Table 3.2.6) categorizes Lisander Snodgrass as a member because the original message tables report him as paying membership dues at least once, even though they also report him as a non-member once. (This study categorizes contacts among the actual, non-fictitious data in this same way: labeling a contact as a member means that the contact has been a member at least once during the study period.) After this study constructs message and contact tables similar to the example tables (Table 3.2.5 and Table 3.2.6), it is ready to continue analysis by calculating and storing LIWC, VADER, and Flesch scores as message fields in the database. Once this is complete, contacts and groups of contacts are ready to be queried by predictor variables and the two measures of engagement (the "messages" and "member" fields) in the contacts table. The following sections, describing Objective One, continue this example.

Objective One Methods: Relationships Between Messages and the Number of Messages that Groups of Contacts Send

Objective One focuses on relating three message metrics (pronoun use rates, personal message rates, and average message length) to the number of messages that contacts send. This section describes the methods that this study uses, considerations for lumping text metrics for contacts and groups of contacts, and example calculations. It contains the following subsections:
1. Hypothesis One: Personal Pronouns (Section 3.3.1)
2. Hypothesis Two: Personal Messages (Section 3.3.2)
3. Hypothesis Three: Message Length (Section 3.3.3)
4. Example Objective One Calculations (Section 3.3.4)

3.3.1. Hypothesis One: Personal Pronouns

Procedure

To begin relating pronoun use rates to the number of messages that contacts send, using the database of message and contact tables described above, this study first counts the use of personal pronouns in every personal message. It then counts the total number of words in every personal message. To do this, this study:
1. Creates message table fields for each of the eight LIWC pronoun rate dimensions:
   a. All pronouns
   b. All personal pronouns
   c. First-person singular "I" pronouns (e.g., "I,"
"I've," "me")
   d. First-person plural "we" pronouns
   e. Second-person "you" pronouns
   f. Third-person singular "she/he" pronouns
   g. Third-person plural "they" pronouns
   h. All impersonal pronouns (e.g., "it," "that," "there")
2. Creates a message table field for word count
3. For each message record, and for each of the eight LIWC pronoun dimensions, counts and stores pronoun use rates (pronoun words per message words) and all words in the fields created in steps one and two above

Note: during this process, for exploratory purposes not necessary for answering Hypothesis One, this study creates fields and records scores for all other LIWC dimensions (e.g., rates of swear words, positive-sentiment words, punctuation, etc.). See Pennebaker et al. (2015) for a complete list of LIWC dimensions. This study aims to understand relationships between contacts' use of pronouns and the number of messages that these contacts send. To this end, it:
1. Creates contact table fields for, and calculates values for, lumped pronoun metrics and lumped word count metrics with database join and math statements:
   a. Average contact LIWC rate, weighted by message
   b. Contact LIWC rate for all contact messages
   c. Average message word count, weighted by message
   d. Total word count of all contact messages
2. Groups contacts by the number of messages that they have sent and calculates, for each of these groups, average, minimum, and maximum LIWC rates and word counts
3. Top-codes groups of contacts who have sent more than 20 messages
4. Plots the lumped contact group LIWC rates calculated in step two against the number of messages sent by the contacts in each group
5. Calculates Pearson correlations between the lumped contact group LIWC rates and the number of messages sent by the contacts in each group

Finally, this study compares the resulting relationships and correlations between group average LIWC rates and the number of messages that groups of contacts send.

Lumping Text Metrics: Details and Rationale

This study lumps pronoun and word count text metrics by groups of contacts who have all sent the same number of messages. It first averages text metrics in 491,027 personal messages for each of the 194,409 contacts who have sent at least one personal message. Second, this study groups these contacts by the total number of messages that they have sent. It top-codes contacts who have sent more than 20 messages into a single group. In this second grouping, text metrics are once again averaged, this time by contact. Top-coding contacts mitigates problems of high pronoun rate variability in small groups of contacts when calculating averages, viewing plots, and calculating the correlations described in Section 3.3.1. Top-coding contacts, used to test all three hypotheses in Objective One and to explore membership in Objective Two, consists of grouping contacts who have sent more than 20 messages into a single group and calculating lumped message and text metrics for them. Results reported in Section 4.1.2.3 show the importance of top-coding contacts. Table 3.3.1 compares this method to two other methods. Instead of observing relationships between groups of contacts (Column C), this study could have observed relationships for either the series of personal messages (Column A) or the series of contacts who sent personal messages (Column B).

Table 3.3.1 Three Methods to Relate Text Metrics to the Number of Messages that Contacts Send

Method | A. Message Based | B. Contact Based | C. Group Based
Series Length and Unit | 491,027 messages | 194,409 contacts | 21 groups of contacts
Predictor Fields | Pronoun rate, word count | Contact-lumped pronoun rate | Group-lumped pronoun rate
Engagement Variable | Total number of messages sent by related contacts | Total number of messages sent | Total number of messages sent

The message-based method described in Table 3.3.1 would entail creating a message table field for, and assigning to each message record, the total number of messages sent by each message's author, identified by contact ID. Then, this study could compare personal message text metrics (pronoun rates and word counts) to the total number of messages sent by personal message authors for each of the 491,027 personal messages. This message-based method is problematic for addressing Hypothesis One of Objective One. Objective One is interested in the linguistic styles of individual contacts, and this message-based method does not equally consider the linguistic styles of individual contacts. Instead, it equally considers the linguistic styles of individual messages. Moreover, this message-based method considers very short messages and very long messages to be equally reflective of individual contacts' linguistic styles, while longer messages are in fact better indicators of linguistic styles. The contact-based method described in Table 3.3.1 addresses the problems of the message-based method by computing average text metrics per contact. Using this method, this study can compare lumped text metrics for each of the 194,409 contacts to the total number of messages that they have sent. The quantity of contacts and the potentially high ranges of pronoun rates in this method, however, could yield hard-to-read, densely covered plots of average linguistic scores for each contact vs. the number of messages each contact has sent. (Section 5.2 further explores this case.) The group-based method that this study used, described in Table 3.3.1, addresses the problems with the contact-based method by summarizing text metrics for each group of contacts with average text metrics. Trends are easier to identify in plots of 21 group pronoun rates than in plots of 194,409 individual contact pronoun rates. (Section 5.2, along with membership data, explores trends for ungrouped contacts.)

Short Example: Calculating Average Pronoun Rates for Groups

In a hypothetical sample of two contacts, for the category of all personal pronouns, if the first contact sent five messages, two personal and three petition-style, with the two personal messages having 3% and 5% rates of personal pronouns, that first contact has an average personal pronoun rate of 4%. If the second contact also sent five messages, but all personal, with personal pronoun rates of 4%, 5%, 6%, 7%, and 8%, that contact has an average personal pronoun rate of 6%. Given that these are the only two contacts in this short example, an example figure of group pronoun rates would show a single point at 5% personal pronouns (ordinate) for contacts who have sent five messages (abscissa): ((3% + 5%) / 2 + (4% + 5% + 6% + 7% + 8%) / 5) / 2 = 5%. Averaging message metrics weighted by contact gives each contact equal influence over a contact group average, regardless of the number of messages that they send. An unweighted average across all messages, alternatively, gives contacts who send more personal messages more influence on the average.
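The arithmetic of this short example, both contact-weighted and unweighted, can be sketched in a few lines of Python:

```python
# Personal pronoun rates (%) of the two hypothetical contacts' personal messages
contact_one = [3, 5]            # sent five messages, two of them personal
contact_two = [4, 5, 6, 7, 8]   # sent five messages, all personal

# Contact-weighted: average each contact first, then average the contact means
contact_means = [sum(contact_one) / len(contact_one),    # 4.0
                 sum(contact_two) / len(contact_two)]    # 6.0
weighted_rate = sum(contact_means) / len(contact_means)  # 5.0

# Unweighted: average all seven messages together
all_messages = contact_one + contact_two
unweighted_rate = sum(all_messages) / len(all_messages)  # 5.43

print(weighted_rate, round(unweighted_rate, 1))  # 5.0 5.4
```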
The unweighted average of all messages in this example would be influenced more by the second contact, who sent five personal messages, than by the first contact, who sent two. The unweighted average of 3%, 5%, 4%, 5%, 6%, 7%, and 8% is 5.4%. While this unweighted average can summarize the messages of a group of contacts, it does not directly address the objective of this research to study individual contacts. Contact-weighted averages, conversely, both directly describe contacts and can be compared to values in literature that also do so. These contact-weighted averages also soften the effects of outlying contacts who may send a high ratio of personal messages to other messages with an unusually high or low percentage of words in a particular category.

Hypothesis One Method Development: Spreadsheet Limitations, Top-Coding, and Software

This study initially used Excel spreadsheet pivot tables to pilot the processes of grouping contacts by the total number of messages sent and plotting LIWC group average rates against this metric. This worked, but slowly and, due to software constraints, only for the set of personal messages (vs. all messages) and only for summarizing a partial number of LIWC scores per database sheet. Results from this precursory analysis showed that the personal pronouns "we" and "you" were related to the number of personal messages sent, but did not account for all (NOTCORP) messages sent. This spreadsheet process required creating new pivot tables for each pronoun metric to avoid spreadsheet software and memory limits. (Rendering computation results as spreadsheet views is slower and requires more memory than storing values in database result objects.) In addition to personal pronouns, average LIWC dimensions were plotted against the groups of contacts who have sent the same number of messages. These plots showed a potential negative relationship between perceptual words ("see," "hear," and "feel") and the number of messages sent by contacts. Although no other trends were found, spreadsheet plots revealed high ranges of text metrics for groups of contacts who sent high numbers of messages. The spreadsheets showed that the numbers of contacts in groups who sent high numbers of messages are low, and this inspired the top-coding methods that this study ultimately used. To more efficiently test Hypothesis One, to check the initial spreadsheet observations, and to begin exploring the correlations between the number of messages contacts sent and all LIWC dimensions, this study used Anaconda, a collection of data science Python packages. It used Spyder to create a Python script to rapidly plot LIWC dimensions against groups of contacts who have sent the same number of messages. This script relied on Pandas, a data analysis library, and Matplotlib, a plotting library. Next, this study used Orange3 to review summaries of these data in several ways: Orange3 automated initiating Python scripts and SQL queries. Orange3 filtered results by group size; specifically, before this study ultimately top-coded groups of contacts who sent more than 20 messages, it used Orange3 to filter out groups with small numbers of contacts in them. Orange3 plotted correlations and calculated correlation coefficients for each LIWC dimension and for each group of contacts who have sent the same number of messages to their policymakers.
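Before turning to Orange3's tabulations, the core steps of the Pandas and Matplotlib workflow described above can be sketched as follows. The data frame, its column names, and its values are hypothetical; the grouping, top-coding, and correlation steps mirror the procedure in Section 3.3.1.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Hypothetical per-contact summaries: messages sent and average 'we' pronoun rate
contacts = pd.DataFrame({
    "messages": [1, 1, 2, 3, 5, 8, 25, 40],
    "we_rate":  [3.1, 2.8, 2.5, 2.6, 2.2, 2.0, 1.8, 1.6],
})

# Top-code: contacts who sent more than 20 messages all fall into group 21
contacts["group"] = contacts["messages"].clip(upper=21)
group_rates = contacts.groupby("group")["we_rate"].mean()

r, p = pearsonr(group_rates.index, group_rates.values)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

group_rates.plot(style="o",
                 xlabel="Number of messages sent (21 = top-coded > 20)",
                 ylabel="Group average 'we' pronoun rate (%)")
plt.show()
```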
Orange3 tabulated the correlation coefficients for every group average LIWC dimension and the number of messages sent by each group. Both the graphs created in Spyder with Pandas and Matplotlib and the graphs created in Orange3 confirmed the preliminary results from the spreadsheet analysis, for personal messages only and for the larger sets of NOTCORP messages. The following chapter reports results. This study used the same tools that were used to test Hypothesis One to test Hypothesis Two and Hypothesis Three. Section 3.5 lists all tools and their website addresses.

3.3.2. Hypothesis Two: Personal Messages

The originally proposed procedure for testing Hypothesis Two prescribed:
1. Calculating, for each contact who sent at least one personal message:
   a. The number of personal messages that they sent
   b. The ratio of the number of personal messages that they sent to the number of messages that they sent that did not have a personal message attached
2. Plotting the number of messages sent by each contact against the two potential predictor metrics calculated in step one

This procedure produced plots that hinted at relationships, but revising the procedure to average the number of personal messages sent and the ratio of personal messages sent for groups of contacts who have sent the same number of messages, using the same groups used to test Hypothesis One, revealed a clearer picture. Following the final procedure to test Hypothesis Two, this study:
1. For each contact, calculates:
   a. The number of personal messages sent by the contact
   b. The total number of messages sent by the contact
   c. The rate of personal messages sent by the contact (the ratio of the two previous calculations)
2. For groups of contacts who have sent the same number of messages, calculates:
   a. The average number of personal messages sent for all group contacts
   b. The average rate of personal messages sent for all group contacts
3. Plots the two group metrics calculated in step two against the number of messages sent by contacts in each group
4. Compares the average rate plots with consideration of the number of contacts in each group

3.3.3. Hypothesis Three: Message Length

To test Hypothesis Three, the relationship between the length of each message and the number of messages sent, this study uses the same groups used for testing Hypotheses One and Two. This study:
1. Uses word count as a measure of message length
2. Plots average word counts for groups of contacts against the number of messages sent by contacts in each group (both variables calculated during the test for Hypothesis One)
3. Compares the average word count points in the plot created in step two with consideration of the number of contacts in each group of contacts who have sent the same number of messages

3.3.4. Example Objective One Calculations

Given the example message and contact tables (Table 3.2.5 and Table 3.2.6), derived from the source message tables (Table 3.2.3 and Table 3.2.4) at the end of Section 3.2, the methods to test Hypothesis One for the example data, with no top-coding, for the count of all pronouns (a single example LIWC dimension), yield the example message and contact tables Table 3.3.2 and Table 3.3.3. Table 3.3.2 shows the field for the count of all pronouns for example calculation purposes only. The non-fictitious study database does not contain this field, but it can be back-calculated from the word count field and the LIWC pronoun rate field.
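That back-calculation is a one-line computation; a minimal sketch, using the values from Lisander's first message in Table 3.3.2 below:

```python
# Back-calculate a message's pronoun count from the two stored fields
word_count = 4        # LIWC word count (WC) field
pronoun_rate = 50.0   # LIWC 'all pronouns' rate, percent of words

pronoun_count = round(word_count * pronoun_rate / 100)  # 2 pronouns
```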
Table 3.3.2 Example Message Table with Pronoun Rate and Word Count Fields
The rows for Lisander Snodgrass are highlighted to emphasize the calculation of lumped metric rates.

ID | Message | Type | CID | Source Table | Count of All Pronouns | Word Count (words) | LIWC Pronoun Rate (% pronouns) | Example Name
1 | Thank you for your support | Personal | 1 | 1 | 2 (you, your) | 4 | 50% (2/4) | Lisander Snodgrass
2 | NULL | NOTCORP | 2 | 1 | NULL | NULL | NULL | Bowser LeMans
3 | NULL | NOTCORP | 1 | 1 | NULL | NULL | NULL | Lisander Snodgrass
4 | We are affected by this issue | Personal | 3 | 1 | 1 (we) | 6 | 16.66% (1/6) | Guybrush Gilbert
5 | Save the whales! | Personal | 1 | 2 | 0 | 3 | 0% (0/3) | Lisander Snodgrass
6 | I love whales! | Personal | 4 | 2 | 1 (I) | 3 | 33.33% (1/3) | Whale Lover Gene

Table 3.3.3 Example Contact Table with Lumped LIWC and Word Count Fields
The row for Lisander Snodgrass is highlighted to emphasize the calculation of lumped metric rates.

ID | Messages | Not Personal Messages | Personal Messages | Member | Average Pronoun Rate Weighted by Message | Average Pronoun Rate for All Messages | Average Word Count | Total Word Count for All Messages | Example Name
1 | 3 | 1 | 2 | 1 | 25% = avg(50%, NULL, 0%) | 28.57%* | 3.5 = avg(4, NULL, 3) | 7 = 4 + NULL + 3 | Lisander Snodgrass
2 | 1 | 1 | 0 | 1 | NULL | NULL | NULL | NULL | Bowser LeMans
3 | 1 | 0 | 1 | 1 | 16.66% | 16.66% | 6 | 6 | Guybrush Gilbert
4 | 1 | 0 | 1 | 0 | 33.33% | 33.33% | 3 | 3 | Whale Lover Gene

* Lisander Snodgrass's average pronoun rate for all messages, weighted by message length, equals the sum of all pronouns that Lisander used divided by the sum of all message words that he wrote, calculated for this example as (2 + NULL + 0) / (4 + NULL + 3) = 2/7 = 28.57% or, as calculated in the non-fictitious study database with total word counts and pronoun rates, as (4 words × 50% pronouns/word + NULL + 3 words × 0% pronouns/word) / (4 words + NULL + 3 words) = (4 × 0.5 + 3 × 0.0) / (4 + 3) = 2/7 = 28.57%.

Table 3.3.4 shows the four contacts from Table 3.3.3 lumped into groups of contacts who have sent the same total number of messages. Notice that the null values are ignored in average functions, effectively summarizing the group average pronoun rates and group average word counts for contacts who have sent personal messages (Guybrush Gilbert and Whale Lover Gene).

Table 3.3.4 Example Contact Groups

Group ID | Number of Messages Sent by Each Contact | Group Size (Contacts) | Group Personal Message Rate | Group Average Pronoun Rate for Contacts who have Sent at Least One Personal Message | Group Average Word Count for Contacts who have Sent at Least One Personal Message
1 | 3 | 1 (just Lisander) | 66.66% (2/3) | 25% | 3.5 = avg(4, NULL, 3)
2 | 1 | 3 (Bowser, Guybrush, and Gene) | 66.66% ((0+1+1)/3) | 25% = avg(NULL, 16.66%, 33.33%) | 4.5 = avg(NULL, 6, 3)

In this example, for Hypothesis One, Table 3.3.4 reveals that there is no difference between the number of messages sent by contacts who have sent at least one personal message and the rate at which these contacts use pronouns; the single contact who sent three messages (Lisander) used pronouns at the same rate as the two contacts who sent one personal message each (Guybrush and Gene): 25%. For Hypothesis Two, there is also no difference between the personal message rate of the single contact (Lisander Snodgrass) who sent three messages (2/3 personal) and that of the three other contacts who sent one message each (2/3 personal). Finally, for Hypothesis Three, the group of contacts who sent only one message in this example sent messages that were, on average, one word longer than those of the "group" of one contact who sent three messages (4.5 - 3.5 = 1).
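The lumping from Table 3.3.3 into the groups of Table 3.3.4 can be reproduced with a short Pandas sketch. The frame and column names are hypothetical; NaN plays the role of NULL and is skipped by Pandas averages, just as in the example.

```python
import pandas as pd
import numpy as np

# Contacts from Table 3.3.3; NaN marks contacts with no personal messages
contacts = pd.DataFrame({
    "name":           ["Lisander", "Bowser", "Guybrush", "Gene"],
    "messages":       [3, 1, 1, 1],
    "personal":       [2, 0, 1, 1],
    "pronoun_rate":   [25.0, np.nan, 16.66, 33.33],
    "avg_word_count": [3.5, np.nan, 6.0, 3.0],
})

groups = contacts.groupby("messages").agg(
    group_size=("name", "size"),
    personal_sent=("personal", "mean"),
    avg_pronoun_rate=("pronoun_rate", "mean"),    # NaNs skipped, as in the text
    avg_word_count=("avg_word_count", "mean"),
)
# Personal message rate: personal messages per message sent by each contact
groups["personal_rate"] = groups["personal_sent"] / groups.index
print(groups)  # reproduces the 66.66%, 25%, 3.5, and 4.5 values of Table 3.3.4
```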
Objective Two Methods: Membership Exploration

Objective Two explores relationships between message and text metrics and a second measure of engagement, membership. Methods and results of these explorations are written together in Chapter 5. Objective Two begins by reviewing relationships between membership and the message and text metrics used in testing Objective One hypotheses (Section 5.1.1). Next, for ungrouped contacts, Objective Two calculates correlations between the use of LIWC dimension words and (a) membership, (b) the number of messages that contacts have sent, and (c) the number of personal messages that contacts have sent (Section 5.1.2). Finally, it reports membership rates for contacts grouped by conditions defined by terms used to search for personal stories (Section 5.3), writing complexity defined by the Flesch reading ease test (Section 5.4), sentiment defined by the VADER sentiment classifier (Section 5.5), popular words among all personal messages in this study (Section 5.6), and words in all LIWC dimensions (Section 5.7).

Appendix B, Appendix C, and Appendix D support reproducing procedures and building on these methods. Appendix B describes and lists the MySQL search queries and regular expressions used by this study in attempts to find personal stories in messages. Appendix C reports methods and results of validating VADER for advocacy messages. Appendix D suggests methods for validating the classification of personal stories in messages.

Tools

Table 3.5.1 lists the analysis tools, development environments, and Python libraries that this study uses to develop databases, code methods, analyze data, and visualize data. Table 3.5.2 lists the software platforms and Node.js (https://nodejs.org/en/) packages that this study uses to collect data during the validation of VADER with human reviewers, described in Appendix C. These platforms and packages may also be used in future work to prototype advocacy services. Table 3.5.3 lists the programming languages that this study uses to accomplish these tasks.
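As a brief orientation before the tool listings, the following minimal sketch (assumed usage, not the study's pipeline) shows how two of the listed tools score a single message for the Flesch reading ease test (Section 5.4) and VADER sentiment (Section 5.5); the message text here is invented:

```python
# A minimal usage sketch (assumed) of two listed tools: textstat for the Flesch
# reading ease test and NLTK's bundled VADER sentiment classifier.
import textstat
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER dictionary

message = "I live near the plant. Please protect the air my children breathe."

print(textstat.flesch_reading_ease(message))           # higher score = easier to read
print(SentimentIntensityAnalyzer().polarity_scores(message))
# returns {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
```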
Table 3.5.1 Analysis Tools, Development Environments, and Python Libraries

| Tool | Website Address |
|------|-----------------|
| Analysis Tools and Development Environments | |
| Anaconda | https://www.anaconda.com |
| Atom | https://atom.io/ |
| Excel | https://products.office.com/en-us/excel |
| Google Sheets | https://www.google.com/sheets |
| Linguistic Inquiry and Word Count (LIWC) | http://liwc.net/ |
| MySQL Workbench | https://www.mysql.com/products/workbench/ |
| Orange3 | https://orange.biolab.si/ |
| SPSS | https://www.ibm.com/products/spss-statistics |
| Spyder | https://www.spyder-ide.org/ |
| VS Code | https://code.visualstudio.com/ |
| Python 3 Libraries | |
| Matplotlib | https://matplotlib.org/ |
| NLTK | https://www.nltk.org/ |
| NumPy | https://numpy.org/ |
| Pandas | https://pandas.pydata.org/ |
| PyMySQL | https://github.com/PyMySQL/PyMySQL |
| SciPy | https://www.scipy.org/ |
| Seaborn | https://seaborn.pydata.org/ |
| SQLAlchemy | https://www.sqlalchemy.org/ |
| textstat | https://github.com/shivam5992/textstat |
| Python Core Libraries | |
| sys | https://docs.python.org/3/library/sys.html |
| os | https://docs.python.org/3/library/os.html |
| datetime | https://docs.python.org/3/library/datetime.html |
| difflib | docs.python.org/2/library/difflib.html |

Table 3.5.2 Platforms and Node Packages for Validating VADER Sentiment and Prototyping Services

| Tool | Website Address |
|------|-----------------|
| Platform | |
| Apache | https://www.apache.org/ |
| Debian | https://www.debian.org/ |
| WordPress | https://wordpress.org/ |
| Node Packages | |
| create-react-app | https://www.npmjs.com/package/create-react-app |
| flesch | https://www.npmjs.com/package/flesch |
| flesch-kincaid | https://www.npmjs.com/package/flesch-kincaid |
| gender-detection | https://www.npmjs.com/package/gender-detection |
| react | https://www.npmjs.com/package/react |
| sentence-splitter | https://www.npmjs.com/package/sentence-splitter |
| syllable | https://www.npmjs.com/package/syllable |
| vader-sentiment | https://www.npmjs.com/package/vader-sentiment |
| wordcount | https://www.npmjs.com/package/wordcount |

Table 3.5.3 Study Programming Languages

| Tool | Website Address |
|------|-----------------|
| ECMAScript 6 | https://www.ecma-international.org |
| MySQL 8 | https://www.mysql.com/ |
| PHP 8 | https://www.php.net/ |
| Python 3 | https://www.python.org/ |
| SCSS 1.24 | https://sass-lang.com/ |

RESULTS FOR OBJECTIVE ONE: NUMBER OF MESSAGES

Results from testing Hypothesis One show that relationships exist between the number of messages contacts have sent their policymakers and the average use of all pronouns (negative relationship), the average use of first-person plural "we" pronouns (negative relationship), and the average use of second-person "you" pronouns (positive relationship) (Section 4.1). Hypothesis Two test results show that the average rate of personal messages to general messages for groups of contacts who have sent more than one message is higher than the average rate of personal messages for the group of contacts who only sent a single message (Section 4.2). Hypothesis Three test results show that groups of contacts send messages of nearly the same average length, regardless of how many messages they send (Section 4.3). The following three sections of this chapter report these findings along with intermediary calculation results that inspire Objective Two explorations (Chapter 5). Each section contains plots of predictor variables (LIWC rates, personal message rates, and average message length) along ordinate axes and the number of messages that contacts have sent along abscissa axes.

Hypothesis One: Relationships Between Pronouns and the Number of Messages that Contacts Send

4.1.1. Hypothesis One Test Results

LIWC reports word counts of (1) all pronouns, (2) all personal pronouns, (3) first-person singular "I" pronouns (e.g. "I," "I've,"
"me"), (4) first-person plural "we" pronouns, (5) second-person "you" pronouns, (6) third-person singular "she/he" pronouns, (7) third-person plural "they" pronouns, and (8) all impersonal pronouns (e.g. "it," "that," "there"). Table 4.1.1 and Figure 4.1.1 through Figure 4.1.8 summarize the relationships between the use of each of these categories of pronouns and groups of contacts who have sent the same number of messages (personal or otherwise). Each figure plots the use of pronouns as a percentage of words used in personal messages (LIWC score), averaged per contact, and then averaged per group of contacts who have sent the same number of messages (personal or otherwise). See Section 3.3 for example calculations. Table 4.1.1 labels correlations strong for R² ∈ [0.7, 1], moderate for R² ∈ [0.3, 0.7), and weak for R² ∈ [0.2, 0.3). For the three LIWC dimensions of all pronouns, all personal pronouns, and "she/he" pronouns, the linear-logarithmic relationships listed in Table 4.1.1 and plotted in Figure 4.1.1, Figure 4.1.2, and Figure 4.1.6 compensate for a positive bias in the linear relationship for large groups of contacts that have sent small numbers of messages (one, two, and three messages).

Table 4.1.1 Relationships Between Group Average Pronoun Use Rates and the Number of Messages Sent by Contacts

| # | LIWC Dimension | R² | Pronoun Use Rate (%) | Range Min (%) | Range Max (%) | Summary |
|---|----------------|-----|----------------------|----------------|----------------|---------|
| 1 | All Pronouns | 0.77 | -0.18 ln[Messages] + 14.1 | 13.6 | 14.4 | A strong, negative, linear-log correlation with a small (0.8%) range exists |
| 2 | Personal Pronouns | 0.51 | -0.107 ln[Messages] + 8.9 | 8.5 | 9.1 | A moderate, negative, linear-log correlation with a small (0.6%) range exists |
| 3 | I | 0.03 | 2.3e-3 [Messages] + 1.53 | 1.4 | 1.8 | No obvious correlation exists. The rate of first-person singular "I" pronouns decreases from 1.7% to 1.5% for the bulk of the contacts sending between one and four messages, and then increases slowly to 1.6% for the smaller groups of contacts sending more messages |
| 4 | We | 0.87 | -0.0311 [Messages] + 4.05 | 3.3 | 4.0 | A very strong, negative, linear correlation with a small (0.7%) range exists |
| 5 | You | 0.70 | 0.016 [Messages] + 2.26 | 2.2 | 2.8 | A moderately strong, positive, linear correlation with a small (0.6%) range exists |
| 6 | She/He | 0.29 | -0.0234 ln[Messages] + 0.371 | 0.3 | 0.5 | Over all messages, a weak, negative, linear-log correlation exists. The rate decreases from 0.5% to 0.3% from one to five messages before remaining at 0.3% for all numbers of messages (except at 18 messages, where the rate returns briefly to 0.4%) |
| 7 | They | Not calculated | Close to constant | 0.6 | 0.7 | The rate remains close to constant within a very small 0.6% to 0.7% range |
| 8 | Impersonal Pronouns | 0.55 | -0.0118 [Messages] + 5.53 | 4.9 | 5.3 | A moderate, negative correlation with a very small (0.4%) range exists |

Figure 4.1.1 Group Average Use of Pronouns (%) vs. Messages Sent
Trendline: R² = 0.765 for [Group Average Use of Pronouns] = -0.18 ln[Messages] + 14.1

Figure 4.1.2 Group Average Use of Personal Pronouns (%) vs. Messages Sent
Trendline: R² = 0.51 for [Group Average Use of Personal Pronouns] = -0.107 ln[Messages] + 8.9

Figure 4.1.3 Group Average Use of "I" Pronouns (%) vs. Messages Sent
Trendline: R² = 0.025 for [Group Average Use of "I" Pronouns] = 2.3e-3 [Messages] + 1.53
The rate of first-person singular "I" pronouns decreases for the bulk of the contacts sending between one and four messages. Over all messages, no obvious correlation exists.

Figure 4.1.4 Group Average Use of "We" Pronouns (%) vs.
Messages Sent
Trendline: R² = 0.871 for [Group Average Use of "We" Pronouns] = -0.0311 [Messages] + 4.05
A strong, linear, negative correlation exists.

Figure 4.1.5 Group Average Use of "You" Pronouns (%) vs. Messages Sent
Trendline: R² = 0.691 for [Group Average Use of "You" Pronouns] = 0.016 [Messages] + 2.26
A moderate, linear, positive correlation exists.

Figure 4.1.6 Group Average Use of "She/He" Pronouns (%) vs. Messages Sent
Trendline: R² = 0.288 for [Group Average Use of "She/He" Pronouns] = -0.0234 ln[Messages] + 0.371
The LIWC rate decreases from 0.5 to 0.3 from one to five messages before remaining at 0.3 for the remaining numbers of messages (except at 18 messages, where the rate returns briefly to 0.4).

Figure 4.1.7 Group Average Use of "They" Pronouns (%) vs. Messages Sent
The rate remains constant between 0.6% and 0.7%.

Figure 4.1.8 Group Average Use of Impersonal Pronouns (%) vs. Messages Sent
Trendline: R² = 0.553 for [Group Average Use of Impersonal Pronouns] = -0.0118 [Messages] + 5.53

4.1.2. Additional Observations and Calculation Checks for Hypothesis One

Chapter 6 discusses the results of testing Hypothesis One reported in the previous section (Section 4.1.1). This section reports results of additional observations, calculation checks, and variations of the test for Hypothesis One. It includes the following sections:

1. Use of Pronouns: Contact Averages vs. Message Averages vs. Other Corpuses
This section compares the average pronoun use rates in the study data calculated in two different ways (averaged per message author, and averaged equally over all messages). It then compares these rates to rates in the literature.

2. Minimum, Average, and Maximum Pronoun Rates for Groups and Contacts
This section discusses testing the use of minimum, average, and maximum pronoun rates for (a) groups of contacts and for (b) messages that contacts have sent to describe groups of contacts.

3. Group Sizes
This section shows the importance of top-coding contacts who sent more than 20 messages into a single group.

4. Sensitivity of Word Count on First-Person Plural ("We") Pronouns: Do pronoun relationships hold true or vary for contacts who write longer messages?
This section shows the effect of limiting study message data to increasingly longer messages (from a minimum of 0 words to a minimum of 100 words in steps of 10 words) on relationships between the use of "we" words and the number of messages that contacts have sent.

5. First-Person Plural "We" Group Averages Weighted by Word Count vs. Contact
This section compares weighting the individual messages that contacts write equally against weighting these messages by message length.

6. Limiting "Our" Time Period
This section shows the effects on the relationship between the use of "we" words and the number of messages that contacts have sent in the two cases of (a) limiting study message data to the four months for which NOTCORP data was collected and (b) analyzing only personal messages by removing all NOTCORP messages and custom messages from the study data.

Contact Averages vs. Message Averages and Other Corpuses

The results for Hypothesis One are found by calculating the LIWC group average rates for contacts who have written at least one personal message. (See detailed methods in Section 3.3.1.) Table 4.1.2 and Figure 4.1.9 compare the overall average personal message LIWC pronoun rates weighted by contact (Table 4.1.2 column two) against the overall unweighted average personal message rates (Table 4.1.2 column three).
Both of these rates are calculated irrespective of the groups of contacts who sent the same number of messages that were used to test Hypothesis One, and the overall unweighted average message rates are calculated irrespective of contacts. The two overall rates are similar. Table 4.1.2 and Figure 4.1.9 also compare these overall average rates against average rates supplied in "The Development and Psychometric Properties of LIWC2015" (Pennebaker et al. 2015) for Twitter messages and the grand mean of six text categories: blogs, expressive writing, novels, natural speech, the New York Times, and Twitter. Table 4.1.2 and Figure 4.1.9 report Twitter messages outside the grand mean of all six LIWC text categories to emphasize the comparison between advocacy messages and Twitter messages. Among the six LIWC text categories, Twitter messages most closely resemble personal advocacy messages in length. In fact, nonprofit organizations often encourage constituents to share their advocacy messages on Twitter.

Comparing study text to the other text categories, Table 4.1.2 and Figure 4.1.9 show, first, that contact average pronoun rates are similar to message average pronoun rates. Second, in comparison to general writers of Twitter messages and general writers of text in the six LIWC text categories, environmental advocates use fewer "I" pronouns and more "we" pronouns. They use "she/he" pronouns at rates similar to those found in the Twitter messages, and less than those found in the six LIWC text categories.

Results from the test of Hypothesis One (Section 4.1.1) showed that the use of "we" pronouns, among all LIWC pronoun dimensions, has the strongest correlation with the number of messages that contacts have sent. While the overall use of "we" pronouns per message is low (3.9%), comparing the use of this dimension to its use in the six other LIWC text categories in Table 4.1.2, its rate of use is more than four times each of the values found in blogs (0.91%), expressive writing (0.81%), novels (0.61%), natural speech (0.87%), the New York Times (0.38%), and on Twitter (0.74%). The percentage is closer to that of positive and negative emotional words, which range from 2.1 to 5.5 percent in these same LIWC categories, have much larger LIWC dimension dictionaries (620 words for positive emotions and 720 words for negative emotions), and are frequently used in sentiment analysis studies such as Kouloumpis et al. (2011) for Twitter messages, H. Wang et al. (2012) for presidential candidates' Twitter messages, and Luxon (2019) for environmental policy news coverage.

Table 4.1.2 Pronoun Rate Comparison
This table lists average pronoun rates for study data calculated in two different ways, and for Twitter messages and the six text categories reported by Pennebaker et al. (2015): blogs, expressive writing, novels, natural speech, the New York Times, and Twitter.

| LIWC Category | Study Data: Contacts' Average (%) | Study Data: Average of Messages (%) | Pennebaker et al. (2015): Twitter Messages (%) | Pennebaker et al. (2015): Average of Six LIWC Text Categories (%) | Pronoun Dictionary Size (words) |
|---------------|-----------------------------------|--------------------------------------|--------------------------------------------------|----------------------------------------------------------------------|----------------------------------|
| All | 14.12 | 13.85 | 13.62 | 15.22 | 143 |
| Personal | 8.92 | 8.72 | 9.02 | 9.95 | 93 |
| I | 1.62 | 1.48 | 4.75 | 4.99 | 24 |
| We | 3.92 | 3.83 | 0.74 | 0.72 | 12 |
| You | 2.30 | 2.39 | 2.41 | 1.70 | 30 |
| She/He | 0.42 | 0.36 | 0.64 | 1.88 | 17 |
| They | 0.66 | 0.66 | 0.47 | 0.66 | 11 |
| Impersonal | 5.19 | 5.12 | 4.60 | 5.26 | 59 |

Figure 4.1.9 Plot of Table 4.1.2.
Comparing the Use of Pronouns in Personal Message Data with the Use of Pronouns in LIWC Twitter and General Data Sets

Minimums, Averages, and Maximums for Groups and Contacts

This study calculated additional series of LIWC values, not reported above, while calculating group averages of contact averages. It calculated minimums and maximums for both groups and contacts, yielding a total of nine series (including the average series) for each of the LIWC pronoun dimensions. Group minimums were the least practical metric, as the minimum use of each dimension was, as expected, zero in this case: a single message that does not contain a word in a particular LIWC dimension makes the minimum zero for its author and its author's group. Group maximum rates were also not very useful. They contain regular ratio values for the minimum, average, and maximum rates (e.g. 100%, 50%, 66%) of contacts. For example, a single contact sending a single message with 50% pronouns (e.g. "love you") might define the maximum "you" rate for their group (50%). Finally, group minimums of contact average rates and group maximums of contact average rates were calculated. These two series show minimums and maximums approaching the upper and lower average contact rates; they express the distribution of minimum and maximum contact rates. Figure 4.1.10 and Figure 4.1.11 show, as examples, the group maximum and group average rate series. While less pertinent to answering the research questions than the group averages of the contact averages, the diverging group minimum and maximum lines in Figure 4.1.11 confirm what is expected: that rate ranges are greater for contacts who send more messages.

Figure 4.1.10 Average Use of Pronouns (%) vs. Messages Sent Expressed by Group Maximums of (a) Contact Minimums, (b) Contact Averages, and (c) Contact Maximums
Values are regular numerical ratios (e.g. 50%, 66.66%).

Figure 4.1.11 Use of All Pronouns vs. Number of Messages Sent Expressed by Group Averages of (a) Contact Minimums, (b) Contact Averages, and (c) Contact Maximums
The group averages of contact minimums and maximums express the range of pronoun usage for each group of contacts who have sent the same number of messages.

Group Sizes

Contacts have sent between one and 238 messages. For the group of contacts who have sent exactly one message, 78,334 out of 442,079 (18%) of these contacts wrote a personal message. The group that sent the highest number of messages, in comparison, is not a group at all: it is a single contact who sent 238 messages, nine of which were personal messages. Figure 4.1.12 and Figure 4.1.13, in comparison to the top-coded Figure 4.1.4, show the effect of group size (number of contacts) on the LIWC group average calculations for first-person plural ("we") pronouns. As the number of messages sent by each contact in each group increases along the abscissa, the group size (right axis) decreases. On a semi-log plot, Figure 4.1.13 shows that this decrease is exponential until the group size starts to dip below 1,000 contacts, at which point it starts decreasing more rapidly. As group size decreases, the variability of LIWC group means increases, and overall relationships become visually less apparent. Figure 4.1.14 shows this effect of decreasing group size on all LIWC pronoun attributes. Outliers in Figure 4.1.14 are more common where group size is small. For example, the extreme outlier visible in the upper right of Figure 4.1.14 for the "group"
that sent 147 messages per contact is, like the "group" that sent 238 messages, actually a single contact. This contact has written, in a single personal message, "thank you for your consideration." Compared to other messages, this short message contains an extremely high percentage of "you" words (40%), over 15 times the average "you" rate for all messages shown in Table 4.1.2 (2.39%) and for the group average rates shown in Figure 4.1.5 (ranging from 2.2% to 2.6%). This message uses no other pronouns. The "group of one" sending 140 total messages is another outlier. This contact sent 138 messages, but only two of them were personal, and the two personal messages that he or she sent were comprised of random characters like "asdlkfhalfjd." (While the author of these characters might have been using them to express an emotion or an exclamation, it is impossible to tell for sure.)

To avoid small group problems like these two, this study top-coded the contacts who sent more than 20 messages (19,017 of them; 2.75%) into a single group. This top-coding resulted in a minimum group size of 1,371 contacts for the group of contacts who each sent a total of 20 messages. Figure 4.1.15 shows that top-coding the results in this manner accounts for most contacts (671,614/690,631; 97.25%) and messages (70% of general messages; 71% of personal messages) in the groups of contacts who sent 20 or fewer messages. (Winsorizing or eliminating values would produce similar effects in explaining correlation variations.)

Figure 4.1.12 Group Average Use of "We" Pronouns (%) (left axis) and Group Size (right linear axis) vs. Messages Sent
Compared to Figure 4.1.4, this figure does not top-code contacts who sent more than 20 messages into a single group. It shows that the variability in "we" percentages increases as group size decreases.

Figure 4.1.13 Group Average Use of "We" Pronouns (%) (left axis) and Group Size (right log axis) vs. Messages Sent
Compared to Figure 4.1.12, the log scale for group size emphasizes the relationship between the use of "we" words and the engagement factor, messages per contact, for the bulk of the contacts who sent a low number of messages. The log scale for group size also reveals the high number of "groups" with only a single contact in them for contacts who have sent over 100 messages (bottom right of this figure).

Figure 4.1.14 Group Average Use of All LIWC Pronoun Dimensions (%) (left axis) and Group Size (right log axis) vs. Messages Sent
This plot shows that the variability of all LIWC pronoun attributes increases as group size (contacts) decreases.

Figure 4.1.15 Number of All Messages and Number of Personal Messages (left axis) and Group Size (right axis) vs. Messages Sent

Sensitivity of Word Count on First-Person Plural ("We") Pronouns: Do Pronoun Relationships Hold True or Vary for Contacts who Write Longer Messages?

Although a clear, strong negative relationship exists between the number of messages that contacts send and first-person plural "we" pronouns, R² = 0.87 (Table 4.1.1, Figure 4.1.4), the practical use of "we" pronouns to predict organizational engagement from a single message is limited by (a) the low percentage of "we" pronouns in each message and (b) the typically low word count of personal messages. Figure 4.1.16 shows that the frequency distribution of word count is positively skewed, with an average of 29 words per message and a mode of 11 words per message.
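Before turning to the distribution figure, a minimal sketch (assumed, not the study's code) of the top-coding step described in the Group Sizes discussion above; the tiny contact table here is invented:

```python
# A minimal sketch (assumed) of top-coding: contacts who sent more than 20 messages
# are pooled into a single "21+" group before group averages are taken.
import pandas as pd

contacts = pd.DataFrame({
    "total":   [1, 1, 2, 20, 147, 238],           # messages sent per contact (invented)
    "we_rate": [4.0, 3.9, 3.8, 3.4, 0.0, 2.0],    # hypothetical average "we" rates (%)
})

contacts["group"] = contacts["total"].clip(upper=21)   # 21 stands in for "more than 20"
print(contacts.groupby("group")["we_rate"].mean())     # group average "we" rates
print(contacts.groupby("group").size())                # the 147- and 238-message contacts now pool
```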
Figure 4.1.16 Positively Skewed Distribution of Word Count
Average = 29 Words; Mode = 11 Words

While many studies do review linguistic attributes that appear at low frequencies (like the studies referenced in Section 4.1.2.1), distinguishing between the use of 3% and 4% "we" pronouns in a single personal message, given such low word counts, is unrealistic. These observations inspire a message length (word count) sensitivity analysis of the relationship between "we" pronouns and the number of messages that contacts send. This analysis should answer the question: Is it easier to predict the number of messages that contacts send from the use of "we" pronouns for contacts who write longer messages?

Repeating the Hypothesis One test of the relationship between "we" pronouns and the number of messages sent, while limiting the data to contacts who write messages with a minimum average word count in the range of zero to 100, shows, at first, that the correlation decreases with word count. A weaker, but still strong, correlation (R² = 0.74) exists at a 20-word minimum limit. A moderate correlation (R² = 0.52) exists at a 30-word minimum limit. Results are shown in Table 4.1.3 and Figure 4.1.17. The decreasing correlation may be due to decreasing group size (also tabulated in Table 4.1.3; fewer contacts send longer messages), but the wavering correlation increase in the tail shown in Figure 4.1.17 may be due to word count. Future work could investigate these relationships further.

Table 4.1.3 Effects of Limiting Data by Minimum Word Count (WC) on the Correlation Between the Use of First-Person Plural "We" Pronouns and Groups of Contacts Who Have Sent the Same Number of Messages

| Minimum WC | R² | Group Size (Contacts) |
|------------|-----|------------------------|
| 0 | 0.87 | 690,631 |
| 10 | 0.85 | 248,552 |
| 20 | 0.74 | 162,666 |
| 30 | 0.52 | 123,411 |
| 40 | 0.35 | 100,210 |
| 50 | 0.25 | 84,574 |
| 60 | 0.20 | 73,736 |
| 70 | 0.15 | 65,559 |
| 80 | 0.23 | 59,500 |
| 90 | 0.35 | 54,752 |
| 100 | 0.09 | 50,892 |

Figure 4.1.17 Effects of Limiting Data by Minimum Word Count (WC) on the Correlation Between the Use of First-Person Plural "We" Pronouns and Groups of Contacts Who Have Sent the Same Number of Messages

First-Person Plural "We" Group Averages Weighted by Word Count vs. Contact

Objective One methods describe two procedures to lump pronoun rates for groups in Section 3.3.1. The first procedure calculates average contact LIWC rates, weighting messages equally (step 4a in Section 3.3.1). The results reported in Section 4.1.1 at the beginning of this chapter follow this procedure. The second procedure (step 4b in Section 3.3.1) calculates contact LIWC rates over all contact messages. This is equivalent to weighting messages by message length in the calculation of each contact rate, or to concatenating the personal messages that each individual contact wrote and calculating an overall, single contact LIWC score. When weighting messages equally to calculate average contact pronoun use rates, short messages could lead to contact rates that misrepresent the linguistic styles that their authors express in any long messages they also write. For example, a contact who sends one very short message, "I love you," uses zero words in the LIWC "we" pronoun dimension. If that same contact sends a longer message using "we" words at a rate of 10%, then, following the procedure in step 4a in Section 3.3.1, the test for Hypothesis One would give the contact an overall rate of 5% ((0% + 10%)/2 = 5%). Repeating the Hypothesis One test for "we" words, but with word-count-weighted averages instead of equally weighted contact averages, yields results similar to the original results.
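To make the difference between the two procedures concrete, a minimal sketch (assumed, not the study's code) follows; the 50-word length of the second message is invented for illustration:

```python
# A minimal sketch (assumed) contrasting the two lumping procedures for a single
# contact: step 4a weights each message equally, while step 4b weights each
# message by its length (word count).
rates = [0.0, 10.0]   # per-message "we" rates (%) for the example contact
words = [3, 50]       # word counts; the 50-word length is an invented assumption

equal_weight = sum(rates) / len(rates)  # step 4a: (0% + 10%) / 2 = 5%
length_weight = sum(r * w for r, w in zip(rates, words)) / sum(words)
# step 4b: (3*0% + 50*10%) / 53 words = 9.43%, closer to the longer message's style
print(equal_weight, round(length_weight, 2))  # 5.0 9.43
```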
Results from this "we" pronoun test are shown in Figure 4.1.18.

Figure 4.1.18 Group Average Use of "We" Pronouns Weighted by Message Length and then by Contact (%) vs. Messages Sent
Trendline: R² = 0.895 for [Group Average Use of "We" Pronouns] = -0.016 [Messages] + 3.55
Results are similar to those found by weighting "we" rates equally by message (R² = 0.871), shown in Figure 4.1.4.

Limiting "Our" Time Period

As described in the data section, messages that were not customized and were not personalized (NOTCORP) were only available from July 1, 2018 through October 31, 2018 (four months), while personal messages and customized messages were available from July 1, 2017 through October 10, 2018 (16 months). This difference in periods presented two main options for testing the three initial hypotheses: (1) using all data to calculate the number of messages and the average linguistic rates for each contact, and (2) constraining the data to the limited set of four months. A third, intermediate option is to (3) remove the NOTCORP data. The advantage of the first option is that it considers a longer history for each contact and therefore a larger sample of messages. The disadvantage of the first option is that it increases the variability of the average number of messages sent per contact when contacts are individually more or less active before and after July 1, 2017.

The advantage of the first option, the longer time period and greater number of messages used to describe a contact, can also be considered a disadvantage. Advertising companies like Google, for example, place less value on older data. In fact, Google even allows users to have their usage data purged automatically after three or eighteen months (Google 2019), but no sooner. The second option could therefore be considered advantageous in that only linguistic rates calculated from recent months would be considered. (A future temporal analysis could test the sensitivity of time ranges on relationships, provided new data.) The third option, like the first option, considers the longer 16-month time period, but it eliminates the NOTCORP data, and its limited timeframe, completely. This third option still considers messages without personal messages: the customized messages for which linguistic scores were not computed.

While the results chapter reports findings from selecting the first option as the primary method of conducting the study, results from testing the second and third options were similar. Figure 4.1.19 demonstrates the effect of limiting the study time period on the test of first-person plural "we" pronouns as a predictor of engagement. Even though the number of contacts in each group has been reduced, a strong, negative, linear correlation remains (R² = 0.827). Figure 4.1.20 shows the third analysis option, eliminating the NOTCORP data. It yields a correlation of R² = 0.717. These figures indicate that considering both the petition-style messages and the messages that contacts choose not to customize, alongside the messages that they do customize, strengthens observed correlations.

Figure 4.1.19 Group Average Use of "We" Pronouns (%) vs. Messages Sent After July 1, 2018
Trendline: R² = 0.827 for [Group Average Use of "We" Pronouns] = -0.0358 [Messages] + 3.74

Figure 4.1.20 Group Average Use of "We" Pronouns (%) vs. Personal Messages Sent
Trendline: R² = 0.717 for [Group Average Use of "We"
Pronouns] = -0.0166 [Messages] + 3.95

Hypothesis Two: Relationships Between Personal Messages and the Number of Messages that Contacts Send

Figure 4.2.1 shows results from the methods to test Hypothesis Two for groups of contacts who sent the same number of messages. The average personal message rate increases from 18% to 24% to 26% between the groups of contacts who sent one, two, and three messages. The rate stays at 26% before slowly decreasing to a minimum of 15% for contacts who sent 35 messages. It then increases to rates approaching 50% for small groups of contacts who sent many messages (more than 40). Given the group size discussion in the results for the test of Hypothesis One (Section 4.1.2.3), contacts who have sent two or more messages are more likely to have sent personal messages at a higher rate than contacts who sent a single message. Additionally, the small number of contacts who sent many messages (more than 40) sent personal messages at an increasing rate. Figure 4.2.2 shows a simpler plot of the average number of personal messages sent instead of the rate.

Contacts who write more messages also send personal messages at a higher rate. Inversely, and directly answering the research question, contacts who send personal messages at high rates also send more messages. For the bulk of the contacts (671,614 among 690,631; 97%) sending under 20 messages: the group of contacts who sent only one message sent personal messages at an average rate of 18%, while the groups of contacts sending two to twenty messages sent them at an average rate of 25%.

A final, simpler way to understand the relationship between the use of personal messages and the use of all messages is by reviewing the plot of the total number of each of these categories of messages, as shown in Figure 4.1.15, above, and isolated in Figure 4.2.3, below. The figure shows the ratio of personal messages to all messages increasing as the number of messages that contacts send increases from one message to two messages.

Figure 4.2.1 Average Personal Message Rate vs. Messages Sent per Contact

Figure 4.2.2 Average Personal Messages vs. Messages Sent per Contact

Figure 4.2.3 Number of All Messages and Number of Personal Messages for Groups of Contacts Who Have Sent the Same Number of Messages from One to Twenty

Hypothesis Three: Relationships Between Message Length and the Number of Messages that Contacts Send

Figure 4.3.1 shows that, on average, contacts write messages one word longer (28 to 29 words) when they write more than a single message. The average message length then begins to drop slightly, by about 1/100 of a word per additional message, with a moderate-strength correlation (R² = 0.694). The average message length for the groups sending more than one message, however, is 29 words, and the average message length for all messages is also 29 words. In summary, groups of contacts who send only one message send very slightly shorter messages (by one word). (Interestingly, as shown in the Chapter 5 explorations, contacts who send more messages are also more likely to contribute membership dues to the organization.)

Figure 4.3.1 Word Count vs. Number of Messages Sent
Trendline: R² = 0.694 for [Word Count] = -0.01 [Messages] + 29.3
After two messages, contacts who sent more messages sent slightly shorter messages.

RESULTS FOR OBJECTIVE TWO: MEMBERSHIP EXPLORATION

This chapter both details methods and reports relationships between message and text metrics with a second measure of engagement, membership.
It labels contacts as members if they have paid any amount of membership dues to their organization within a year of any message that they have sent, or if they have been designated as a lifetime member by their organization for a large contribution during or before the study period. It refers to the percentage of members in different groups of contacts as the membership rate for that group.

The first section of this chapter, Section 5.1, reports relationships between membership and the message and text metrics used in testing Objective One hypotheses:
1. Number of messages and personal messages (Section 5.1.1)
2. Use of pronouns (Section 5.1.2)
3. Message length (Section 5.1.3)

Next, Section 5.2 reports correlations between the use of LIWC dimension words and membership for ungrouped contacts. It also reports correlations between the use of LIWC dimension words and the measure of engagement from Objective One, the number of messages that contacts send, as well as the number of personal messages that contacts send, but for individual, ungrouped contacts rather than for groups of contacts. Section 5.3 through Section 5.7, finally, report membership rates for contacts grouped by conditions defined by the following text metrics:
1. Terms used to search for personal stories (Section 5.3)
2. Writing complexity defined by the Flesch reading ease test (Section 5.4)
3. Sentiment defined by the VADER sentiment classifier (Section 5.5)
4. Popular words among all personal messages in this study (Section 5.6)
5. Words in all LIWC dimensions (Section 5.7)

Among the 690,631 total contacts who have sent any type or number of messages, 90,698 are members (a 13% membership rate), 194,409 have authored personal messages (28%), and 52,323 are members who have sent personal messages (7.6%). Compared to the overall 13% membership rate (90,698/690,631), the membership rate for those sending personal messages is 27% (52,323/194,409). Section 5.3 through Section 5.7 compare conditional membership rates to alternative conditions and to these baseline membership rates (13% and 27%).

Contacts labeled as members in this study gave a minimum of $15/year. Most contacts gave suggested amounts, or more. For reference, Table 4.3.1 shows minimum, suggested, and maximum membership dues for large, prominent U.S.-based nonprofit environmental organizations that actively host online advocacy systems to send petitions and letters to policymakers. The average regular annual membership or one-time donation rate for these organizations is $52/person/year. Study data did not come from all of these organizations.

Table 4.3.1 Membership Rates for Large, National Environmental Nonprofit Organizations with Online Petition or Letter-Writing Campaigns
Data come from organization websites, ProPublica (2019), and the Internal Revenue Service (2019) for 501(c)(3) and 501(c)(4) organizations. Organization Form 990 revenue comes from several sources, including, but not exclusively, membership dues. Study data did not come from all of these organizations.
| Advocacy Organization | Monthly Min | Monthly Suggested | Monthly Max | Annual or One-Time Min | Annual or One-Time Suggested | Annual or One-Time Max | Form 990 Revenue ($M) |
|------------------------|-------------|-------------------|-------------|-------------------------|-------------------------------|-------------------------|------------------------|
| Earthjustice | $35 | - | $1,000 | $35 | $30 | $1,000 | $80 |
| Environmental Defense Fund | $15 | $25 | $75 | $35 | $50 | $1,000 | $211 |
| Greenpeace | $15 | $25 | $55 | $30 | $50 | $120 | $17 |
| National Audubon Society | $20 | $50 | $500 | $20 | $50 | $500 | $134 |
| National Wildlife Federation | $8 | $50 | $50 | $30 | $50 | $1,000 | $83 |
| Natural Resources Defense Council | $15 | $20 | $100 | $35 | $50 | $200 | $182 |
| Nature Conservancy | $15 | $100 | $10,000 | $15 | $100 | $10,000 | $1,185 |
| Sierra Club | $15 | $20 | $85 | $25 | $39 | $75 | $141 |
| Wildlife Conservation Society | $10 | $20 | $100 | $25 | $50 | $500 | $279 |
| World Wildlife Fund (WWF) | $10 | $15 | $50 | $25 | $50 | $5,000 | $257 |
| Average | $16 | $36 | $1,202 | $28 | $52 | $1,940 | $257 |

Exploration One: Membership as a Measure of Organizational Engagement

This section reports test results of relating three types of predictor metrics to membership. These three types of predictor metrics are similar to the three types of predictor metrics in the three initial hypotheses in Objective One; they are: the number of messages and the number of personal messages a contact has sent (Section 5.1.1), pronoun use (Section 5.1.2), and message length (Section 5.1.3).

5.1.1. Membership and the Number of Messages Sent

This exploration begins by testing relationships between the number of messages that contacts have sent, the measure of engagement of Objective One, and membership, the measure of engagement of Objective Two. Figure 5.1.1 shows a positive relationship between the number of messages that contacts have sent and membership. Similar to Figure 4.1.13 and Figure 4.1.14 in Section 4.1.2.3, Figure 5.1.1a shows the effects of groups with low numbers of contacts, who have sent high numbers of messages, on the variability of membership rates. As the number of messages that groups send increases, group size rapidly decreases to one or two contacts, and the variability of membership rates increases. For example, the average size of groups of contacts sending 100 or more messages is equal to two. This example explains the points that could form a horizontal line at the 50% membership rate in Figure 5.1.1a. Connected, other points in Figure 5.1.1a would form other horizontal lines at other regular membership rates for small groups of contacts (e.g. 0%, 25%, 33%, 66%, 75%, and 100% membership rates). As addressed in the results for Hypothesis One, Figure 5.1.1b top-codes groups of contacts sending more than 20 messages into a group of 19,017 contacts. (See Section 3.3.1.2 for a description of how this study top-codes contacts and Section 4.1.2.3 for the importance of top-coding contacts who have all sent high numbers of messages, over 20.)

(a) All Groups of Contacts who Sent the Same Number of Messages
Membership rates are percentages of members for groups of contacts.
(b) Groups of Contacts who Sent the Same Number of Messages, Top-Coded for Contacts who Sent More than 20 Messages

Figure 5.1.1 Membership Rate (%) vs. Number of Messages Sent
Membership rates range from 7%, for the group of contacts who sent one message, to 37%, for the groups of contacts who sent 13, 15, 16, 17, and 21+ messages. Membership rates are percentages of members for groups of contacts.

Figure 5.1.2 flips from looking at average membership rates as a function of groups of contacts who sent the same number of messages to looking at the average number of messages sent as a function of membership. The figure shows a total of four groups of two averages.
The first two groups show that members send more messages than non-members. They send, on average, 3.753 more messages (6.445 - 2.692 = 3.753; a 139.4% increase) and 1.478 more personal messages (1.995 - 0.517 = 1.478; a 285.9% increase). The second two groups in Figure 5.1.2 represent contacts who have sent at least one personal message. These groups are interesting to this study because the personal messages in these groups permit this study to perform text analysis on them. These two groups show, in general, similar results to the first two groups: members send messages at higher rates than non-members. Specifically, for contacts who have sent at least one personal message, the average overall message count increases by 3.061 messages (7.938 - 4.877 = 3.061; a 62.76% increase) and the average personal message count increases by 1.277 (3.459 - 2.182 = 1.277; a 58.52% increase). Comparing the first two groups to the second two groups, for each group, contacts who have sent at least one personal message are also more likely to send more messages overall.

Figure 5.1.2 Average Number of Messages and Personal Messages Sent per Contact, Organized by Conditions of Membership and Whether a Contact has Sent a Personal Message

5.1.2. Membership and LIWC Pronoun Rates

Figure 5.1.3 shows that most contacts use no or low rates (≤ 1%) of pronouns from each of the individual "I," "we," "you," "she/he," and "they" pronoun dimensions. These contacts have membership rates equal to or slightly lower than the baseline membership rate for all contacts who write personal messages (27%). The contacts who use pronouns from the individual "I," "we," "you," "she/he," and "they" dimensions at a rate of 1% or lower have respective membership rates of 25%, 23%, 25%, 27%, and 26%. The contacts who use pronouns from these dimensions at rates between 1% and 2% have much higher respective membership rates of 36%, 35%, 36%, 31%, and 35%, but much fewer contacts use pronouns in these dimensions at rates higher than 1%. For the contacts that do use pronouns at rates higher than 1% in the "I," "we," "you," "she/he," and "they" pronoun dimensions, Figure 5.1.3 shows a negative relationship between these pronouns and membership.

Membership rates are all lower for contacts that use no or low rates (≤ 1%) of pronouns from the all pronoun, all personal pronoun, and impersonal pronoun dimensions (19%, 20%, and 22%) compared to the membership rates for the "I," "we," "you," "she/he," and "they" dimensions. Additionally, Figure 5.1.3 does not show clear trends between membership and the use of all pronouns and all personal pronouns for pronoun use rates greater than 1%. These observations illustrate the parent-child category relationships between pronoun dimensions in LIWC (2018) and show that contacts do not use all types of pronouns in all messages.³

³ Given message time stamps, these plots could inspire future tests of relationships between membership, pronoun diversity, and changing perspectives of authors. See Pennebaker (2011) for a discussion of the importance of analyzing changing perspectives in text.

a. All Pronouns
b. All Personal Pronouns

Figure 5.1.3 Membership Rate (%) vs. LIWC Pronoun Dimension Rates (%) for (a) All Pronouns, (b) All Personal Pronouns, (c) "I" Pronouns, (d) "We" Pronouns, (e) "You" Pronouns, (f) "They" Pronouns, (g) "She/He" Pronouns, and (h) Impersonal Pronouns
Contacts are grouped into bins by their individual average message LIWC rates.
Membership rates are percentages of members for groups of contacts.

c. "I" Pronouns
d. "We" Pronouns
Figure 5.1.3 Continued.

e. "You" Pronouns
f. "She/He" Pronouns
Figure 5.1.3 Continued.

g. "They" Pronouns
h. Impersonal Pronouns
Figure 5.1.3 Continued.

5.1.3. Membership and Message Length

Figure 5.1.4 shows that, for average word count bins, membership rates quickly increase from 17% to 30% as average word counts for individual contacts increase from 1 to 40 words. Membership rates then begin to slowly decrease with increasing average word counts and decreasing numbers of contacts. The contacts who sent messages with average word counts less than or equal to 40 account for most of the contacts who sent personal messages (79%; 152,712 out of 194,409 contacts). The contacts who sent messages with an average word count greater than 40 words have an average membership rate of 28%, which is close to the baseline membership rate for all contacts who write personal messages (27%).

The membership rate for the bin of contacts whose average word count equals the mode word count of all personal messages (13 words) is 26%, which is close to the baseline membership rate for all contacts who write personal messages (27%). The membership rate for the bin of contacts whose average word count equals the average word count of all personal messages (29 words) is 30%, which is only slightly higher than the baseline membership rate (27%).

Overall, the membership rates (left axis) and the positively skewed distribution of word count (right axis) show that, for most contacts, average word counts between one and the overall average word count (29) are better predictors of membership rate differences than higher average word counts are. For example, Figure 5.1.4 groups contacts who write messages 25 words shorter than the average message (29 - 25 = 4 words) into bins with a 17% membership rate (17% - 30% = -13%; a strong difference). Alternatively, Figure 5.1.4 groups contacts who write messages 25 words longer than the average message (29 + 25 = 54 words) into bins with a 29% membership rate (29% - 30% = -1%; a slight difference).

Figure 5.1.4 Membership Rate vs.
Average Word Count
Contacts are grouped into bins by their individual average message word counts. Membership rates are percentages of members for groups of contacts.

Exploration Two: Ungrouped Correlations

While the results from Hypothesis One show correlations between pronoun usage and the average number of messages that groups of contacts send, correlations between pronoun usage and the number (not the group average) of messages that individual contacts send approach zero over all contacts. This makes sense: groups of thousands of contacts sending tens of thousands of messages reveal more information than single contacts sending a few messages, and single contacts sending a few messages are the bulk of the data. By limiting the test of individual correlations to more prolific writers (as identified by minimum word counts and minimum numbers of messages sent), correlations begin to appear. In summary, it is easier to distinguish correlations among contacts who write longer and more numerous messages. Figure 5.2.1 shows these correlations for contacts sending a minimum number of personal messages equal to 1, 2, 10, 15, and 20 (rows of plots) with minimum word counts of 0, 25, 50, and 75 (columns of plots). Pronoun dimension correlations are plotted alongside the other LIWC dimensions described in LIWC 2018. Notice that, while practically nonexistent when the data are less restricted, the correlation coefficients between "we" and "you" words are respectively negative and positive, as seen in the group test results of Hypothesis One. As the minimum number of messages per contact increases, and the minimum average word count per contact increases, the trend becomes slightly reversed: for example, the very small group of prolific writers (e.g. the 77 contacts who wrote an average of at least 75 words per message, three times the average, across at least 15 messages, five times the average) tend to use "we" words somewhat more often (R = 0.18), and "you" words slightly less often (R = -0.16), for greater numbers of messages.

(a) Contacts sending at least one personal message (pm ≥ 1)
(b) Contacts sending at least two personal messages (pm ≥ 2)
Figure 5.2.1 Relationships (R) between Individual Contact Linguistic Score Averages and Engagement (Messages, Personal Messages, and Membership) for Minimum Average Word Counts, avg(pm), of 0, 25, 50, and 75.

(c) Contacts sending at least ten personal messages (pm ≥ 10)
(d) Contacts sending at least fifteen personal messages (pm ≥ 15)
Figure 5.2.1 Continued.

(e) Contacts sending at least 20 personal messages (pm ≥ 20)
Figure 5.2.1 Continued.

Exploration Three: Personal Stories

As described in the introduction, research bodies and practitioners encourage nonprofits to look for personal stories among personal messages (Karpf 2016, Congressional Management Foundation 2017, Social Change Agency 2017a, 2017b, Long 2018) to create "groundbreaking" (Social Change Agency 2018b) digital campaigns. These researchers have shown that contacts who share stories about how they have been affected by campaign issues have a greater potential to contribute to nonprofits as participants and organizers.
In fact, an original proposal for this dissertation considered training a naive Bayes classifier to attempt to identify personal stories for advocacy campaigns. While a machine learning classification model could be developed in the future, work on Objective Two begins by using key-phrase searches to identify messages with personal stories. It then tests whether specific phrases are related to membership. This exploration developed phrases from a collection of words already used by an employee of one nonprofit advocacy organization to look for personal stories. This employee noticed that the following phrases help identify personal stories for their team:
1. As a
2. I am a
3. I live
4. In my state/district
5. My family
6. My husband
7. My wife
8. My children

While some letter-writing campaigns may elicit many personal stories of lived experiences, permitting organizations' legal teams to pick and choose testimonials, many campaigns do not, and small sets of identified stories lead to statistically insignificant findings about membership even if those stories may be practically applied. Categorical chi-squared tests comparing small groups of contacts who have sent personal stories to those who have not, on the basis of membership, yield high, insignificant p-values.

To address this problem of small groups matching conditions, inspired by LIWC pronoun dictionaries and by the work of Gordon et al. (2009), who look for stories in longer passages of text, the study casts a net to catch personal stories by expanding the original list of personal story phrases (above) to include subject pronoun variations (first-person, second-person, third-person, singular, plural; e.g. "I" and "we"), verb tenses (past, present, future) and endings (e.g. "-ed" and "-ing"), singular and plural object endings (e.g. child vs. children), and limited consideration of imperfect tense (e.g. was vs. had) and some associated hedge phrases (e.g. have been living, had lived, go to, going to). It tests for the presence of phrases (a) starting a message, (b) anywhere in a message, and (c) at the beginning of sentences and prepositional phrases, qualified by patterns of punctuation and spaces.

Searching for phrases at the beginning of a message automatically finds messages that begin with sentences that begin with the phrases, e.g. "as a scientist...," but does not find messages that contain later sentences that begin with the phrases, e.g. "stop the pipeline. As a scientist...." Searching anywhere finds both types of messages, increasing the number of results. Qualifying messages actually limits the number of search results returned by queries, but in many cases aligns results closer to their queries' intent. For example, an unrestricted search for "as a" returns unintentional results for any message with a word ending in "as" and a following word beginning with "a," like "The department has already accepted the contract...," where "has already" contains the matching phrase. A regular expression to search for this particular unintended result (MySQL expression "[a-z]as a[a-z]") returns 7,588 similar messages. Most of these messages do not tell personal stories like those found by the more complex pattern matching for "as a" at the beginning of sentences and prepositional phrases (MySQL expression "(([:punct:][:space:](As a))|(^As a))[:space:]"). (Note: despite the capital letter "A"
shown in the pattern of this example, expressions in this study were matched against fields with a case-insensitive collation to pick up prepositional phrases, e.g. "after witnessing the dissemination of African Elephants from my family home village over the last 20 years, as an African, I hope that you will support the U.S. program to....")

Reading messages resulting from the initial queries reveals that most regular expressions describing these phrases do, indeed, reveal what an advocacy organization might call "personal stories," but not necessarily "lived experiences" (Sandhu 2017, Social Change Agency 2017a). Many resulting messages independently convey other meanings, such as volition to act and threats, e.g. "I will vote against every conservative politician I can and switch my party affiliation if Bears Ears National Monument is reduced by even one square foot"; family support, e.g. "my husband and I urge you to do this"; specific occupation, e.g. "I am a carpenter and, therefore...support...sustainable logging. The Giant Sequoias are too important..."; and education and income, e.g. "I am an undergrad who with student loans. I try my best every single month to save energy in every way possible...."

Contacts describing battles with asthma near sources of pollution convey personal stories. A search for sentences beginning with "I've had" and "I live" returns results such as,

I've had asthma my whole life. I grew up in LA in the 50s and this issue matters to me. I want a world where my grandchildren can breathe easy,

and

I live along the interstate outside this operation. My family and friends are getting sick with asthma and are being forced to exercise indoors since fracking began. Methane and VOCs need to be regulated in every way possible

These messages have subjectively "more" personal story in them than results from other queries for "my daughter" and "I have":

My child and I have asthma. Do your job. Protect our air!

and

I have asthma and need clean air to breathe.

While this review exposes varying degrees of personal stories and intents in messages found with different expression patterns, this exploration did not rank messages by degree of personal story. This work might be better suited for machine learning in the future. This exploration does, however, evaluate several categories of searches and, additionally, tests LIWC dimensions and the most popular words used in messages. It reports findings for searching for personal stories with references to family, gender, residence, education, activism, volunteering, voting, spending, suffering, and swear words. Appendix B lists the SQL queries and regular expressions used by this study to conduct these searches.

5.3.1. Personal Stories and Family

The study began exploration with a search recommended by a nonprofit organization to test for the presence of "my husband" or "my wife" at the beginning of a message to find personal stories. The search found some personal stories, but, more interestingly, it successfully identified high membership rates. Table 5.3.1 shows that authors who have begun their messages with "my wife" or "my husband" (n = 608, coded in MySQL as LIKE 'my husband%') have almost double the membership rate of those who did not (49% vs. 27%). A chi-squared test shows the relationship is significant, X²(k = 1, n = 194,409) = 154, p < 0.01. Table 5.3.2 shows the calculation of expected values for this statistic. Authors who have used these terms anywhere in their messages (n = 1,219, e.g.
"%my husband%") have a 45% membership rate compared to a 27% membership rate for those that do not (also p<0.01). Limiting this test to the group of contacts who have sent longer-than average messages has little effect on the test results: The number of results decreases to 438 contacts and the membership rate increases by one percent, to 50% ? producing a 23% increase over the 27% membership rate of the alternative results (p < 0.01). Table 5.3.3 shows results for variations for husband and wife queries tested independently along with other family conditions. The chi-squared tests for second- person and third-person husband and wife search conditions, such as searches for ?your wife,? have low enough conditional group sizes that their p-values exceed 0.1; their relationships therefore are not significant. Table 5.3.3 also shows that groups of contacts 111 who begin messages with search terms, despite their lower group sizes, generally have one or two percent higher membership rates compared to contacts who use the search terms anywhere in their messages. Further, the groups of contacts matching alternative conditions have membership rates close to the general membership rate of contacts sending personal messages (27% to two significant figures). Contacts who discuss children have the highest numbers of matching contacts and significant 36% and 37% membership levels compared to the average 27% membership rate. Queries that find these contacts match ?my children? and ?our children? in addition to singular ?child? expressions. Figure 5.3.1. shows these membership rates. In summary, contacts who discuss their family members are more likely to be members, and contacts discuss their children more than their spouses. An expansion study could test membership against references to grandchildren and other types of family members and friends not tested here: sons, daughters, mothers, fathers, etc. 112 Table 5.3.1 Husband and Wife Personal Story Observation Contingency Table and Calculations A ?personal story? condition in this table is defined by the case where a contact has written a message beginning with ?my husband? or ?my wife.? X2 (k=1, n=194,409) = 154, p < 0.01 Did not begin Began a message message with ?my with ?my husband? husband? or ?my Observed or ?my wife? wife? Su m Total Proportion Member 299 52,0 24 52,323 0.269 (52,323/194,409) Not a Member 309 141,777 142,086 0.731 (142,086/194,409) Sum 608 193,801 194,409 1 Membership Rate 49% 27% 27% (100%*299/608) (100%*52,024 (100% /193,801) *52,323 /194,409) Table 5.3.2 Husband and Wife Personal Story Expected Values Contingency Table and Calculations Along with observations from Table 5.3.2, this table shows expected values and sub-calculations to calculate the chi-squared statistic and the chi-squared test p-value. These two tables serve as examples for additional chi-squared tests in Chapter 5. Did not begin Began a message message with ?my with ?my husband? husband? or ?my Expected or ?my wife? wife? 
Table 5.3.3 Membership Rates (m), Group Sizes (contacts, n), and Chi-Squared Test P-Values for Family Conditions (condition true, c, and condition not true, ~c)

condition | term | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
starts with | my family | 586 | 193,823 | 38% | 27% | 12% | 11% | 0.00
contains | my family | 2,258 | 192,151 | 38% | 27% | 11% | 11% | 0.00
phrase starts with | my family | 873 | 193,536 | 37% | 27% | 11% | 11% | 0.00
starts with | our family | 155 | 194,254 | 40% | 27% | 13% | 13% | 0.00
contains | our family | 1,496 | 192,913 | 36% | 27% | 9% | 9% | 0.00
phrase starts with | our family | 219 | 194,190 | 39% | 27% | 12% | 12% | 0.00
starts with | my child | 166 | 194,243 | 38% | 27% | 11% | 11% | 0.00
contains | my child | 2,060 | 192,349 | 37% | 27% | 10% | 10% | 0.00
phrase starts with | my child | 337 | 194,072 | 38% | 27% | 11% | 11% | 0.00
starts with | our child | 578 | 193,831 | 38% | 27% | 11% | 11% | 0.00
contains | our child | 14,005 | 180,404 | 36% | 26% | 10% | 9% | 0.00
phrase starts with | our child | 1,579 | 192,830 | 36% | 27% | 9% | 9% | 0.00
starts with | my husband | 384 | 194,025 | 47% | 27% | 21% | 20% | 0.00
contains | my husband | 811 | 193,598 | 41% | 27% | 15% | 15% | 0.00
phrase starts with | my husband | 528 | 193,881 | 45% | 27% | 18% | 18% | 0.00
starts with | my wife | 224 | 194,185 | 52% | 27% | 25% | 25% | 0.00
contains | my wife | 410 | 193,999 | 51% | 27% | 24% | 24% | 0.00
phrase starts with | my wife | 273 | 194,136 | 52% | 27% | 26% | 25% | 0.00
starts with | my wife or my husband | 608 | 193,801 | 49% | 27% | 22% | 22% | 0.00
contains | my wife or my husband | 1,219 | 193,190 | 45% | 27% | 18% | 18% | 0.00
contains | your wife or your husband | 62 | 194,347 | 24% | 27% | -3% | -3% | 0.63
contains | his wife or her husband | 53 | 194,356 | 26% | 27% | 0% | 0% | 0.93

Figure 5.3.1 Family Words and Membership. 37% membership rate for 20,001 total contacts. Membership rates are percentages of members for groups of contacts.

5.3.2. Self-Identification Predictors for Personal Stories and Membership

As describing one's husband or wife may be indicative of a personal story (and perhaps of money spent on membership; Figure 5.3.1, above), categorizing one's self is by definition personal, and results from searches for terms like "as a," "I am," and "I live" reveal more than just the stories of "lived experiences" (Sandhu 2017; Social Change Agency 2017a) that this exploration set out to look for. Results of these types of searches answer questions that advertising companies and banks traditionally asked consumers in order to judge consumer income and make credit determinations. Without taking a survey, some writers identify their own family membership, gender, occupations, affiliations, and education, writing "I am a mom," "I am a doctor," "I am a college student," "I am a teacher," or "I am a Marylander." Table 5.3.4 shows membership rates for these types of queries.

General Self-Identification

For general self-identification conditions, Table 5.3.4 (below) shows that contacts who identify themselves have above-average membership rates. Contacts who begin phrases with "as a" are the most relevant, with membership levels 13% above the average 27% rate. Figure 5.3.2 highlights that contacts identifying themselves in the first-person singular have 6-7% higher membership rates than those identifying themselves in the first-person plural. See Appendix B, personal story reference Table 1 for the regular expressions used to identify these conditions.
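Each row of these condition tables reports the same statistics (n|c, n|~c, m|c, m|~c, and p), so the battery of searches can be run as a loop. A minimal pandas/SciPy sketch, assuming a hypothetical DataFrame contacts with a boolean is_member column and one precomputed boolean match flag per condition:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def condition_row(contacts: pd.DataFrame, flag: str) -> dict:
    """Compute one table row: group sizes, membership rates, and the
    chi-squared p-value for a single search condition. `flag` names a
    boolean column that is True where a contact matched the condition."""
    c = contacts[contacts[flag]]
    not_c = contacts[~contacts[flag]]
    table = [
        [c["is_member"].sum(), not_c["is_member"].sum()],
        [(~c["is_member"]).sum(), (~not_c["is_member"]).sum()],
    ]
    _, p, _, _ = chi2_contingency(table, correction=False)
    m_c, m_not_c = c["is_member"].mean(), not_c["is_member"].mean()
    return {"n|c": len(c), "n|~c": len(not_c),
            "m|c": round(100 * m_c), "m|~c": round(100 * m_not_c),
            "m|c-m|~c": round(100 * (m_c - m_not_c)), "p": round(p, 2)}

# Tiny synthetic example (one flag, six hypothetical contacts):
contacts = pd.DataFrame({
    "is_member":        [True, False, True, False, False, True],
    "starts_my_family": [True, False, False, False, True, False],
})
print(condition_row(contacts, "starts_my_family"))
```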
Table 5.3.4 Self-Identification and Membership

condition | term | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
starts with | As a | 6,266 | 188,143 | 40% | 26% | 14% | 13% | 0.00
contains | As a | 23,230 | 171,179 | 37% | 26% | 11% | 10% | 0.00
phrase starts with | As a | 6,749 | 187,660 | 40% | 26% | 13% | 13% | 0.00
starts with | I am a | 3,338 | 191,071 | 38% | 27% | 12% | 11% | 0.00
contains | I am a | 5,692 | 188,717 | 38% | 27% | 11% | 11% | 0.00
phrase starts with | I am a | 2,323 | 192,086 | 38% | 27% | 12% | 11% | 0.00
starts with | We are | 3,110 | 191,299 | 35% | 27% | 8% | 8% | 0.00
contains | We are | 13,271 | 181,138 | 34% | 26% | 8% | 7% | 0.00
phrase starts with | We are | 6,754 | 187,655 | 33% | 27% | 7% | 6% | 0.00
starts with | We are a | 591 | 193,818 | 36% | 27% | 9% | 9% | 0.00
contains | We are a | 2,598 | 191,811 | 37% | 27% | 10% | 10% | 0.00
phrase starts with | We are a | 250 | 194,159 | 32% | 27% | 5% | 5% | 0.09

Figure 5.3.2 Self-Identification and Membership. This plot shows data from Table 5.3.4 for "phrase starts with" conditions. 37% membership rate for 15,779 total contacts (15,779 = 6,749 + 3,338 + 5,692, where all 2,323 messages containing "we are a" also contain "we are"). Membership rates are percentages of members for groups of contacts.

Gender Self-Identification

For gender, Figure 5.3.3 shows that small numbers of contacts identify with male terms (56) and female terms (150). The low difference between the observed and expected membership rates for females, along with the low number of contacts, yields a more modestly significant chi-squared test p-value of 0.05, compared to the male p-value of 0.01. Contacts who state their gender have higher-than-average membership rates (45% for males and 34% for females), but not many contacts do so (150 + 56 = 206). See Appendix B, personal story reference Table 2 for the regular expressions used to find contacts who identify themselves as male or female.

Figure 5.3.3 Self Gender Identification and Membership. Membership rates are percentages of members for groups of contacts.

Residence Self-Identification

For self-identification of residency, Table 5.3.5 shows higher-than-average membership rates, up to 41%, for contacts stating in the first person that they live in a place, with a good number of results for phrases that contain "I live" (2,819), start with "I live" (1,784), contain "I live in" (1,465), and start with "I live in" (1,131). "We call home" phrases were not detected often enough to call their membership rate differences significant. The last row of the table tests a more complex condition for several identifications of "living" with the MySQL expression REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (live|living)', which looks for several of the first-person singular patterns described in the introduction of this chapter, successfully increasing the number of matching results while limiting false-positive detection rates. Figure 5.3.4 plots data from Table 5.3.5 for "phrase starts with" conditions and for the complex expression for first-person singular identification of living. See Appendix B, personal story reference Table 3 for regular expressions used to find contacts who identify themselves as living in a place.
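The quoted MySQL expression translates nearly one-for-one into Python's re. A sketch with hypothetical example messages; the added \b word boundary is a small tightening of the quoted pattern, not part of the study's expression:

```python
import re

# A re translation of the study's MySQL REGEXP for first-person singular
# "living" phrases: "I", an optional auxiliary/tense part, an optional
# "going to" part, then a form of "live".
LIVING = re.compile(
    r"\bI( went| went to| am|'m| will| will be| was| have| have been)?"
    r"( go to| going| going to)? (live|living)",
    re.IGNORECASE,  # matched case-insensitively, like the study's collation
)

for msg in ["I have been living near the refinery for 20 years.",
            "We live in Maryland."]:
    print(bool(LIVING.search(msg)), "-", msg)
# -> True for the first message; False for the second (first-person plural)
```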
Table 5.3.5 Residence and Membership

condition | term | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
contains | I live | 2,819 | 191,590 | 39% | 27% | 12% | 12% | 0.00
phrase starts with | I live | 1,784 | 192,625 | 39% | 27% | 12% | 12% | 0.00
contains | I live in | 1,465 | 192,944 | 41% | 27% | 14% | 14% | 0.00
phrase starts with | I live in | 1,131 | 193,278 | 40% | 27% | 13% | 13% | 0.00
contains | We live | 1,445 | 192,964 | 32% | 27% | 5% | 5% | 0.00
phrase starts with | We live | 359 | 194,050 | 36% | 27% | 10% | 10% | 0.00
contains | We live in | 639 | 193,770 | 34% | 27% | 7% | 7% | 0.00
phrase starts with | We live in | 206 | 194,203 | 39% | 27% | 12% | 12% | 0.00
contains | We call home | 100 | 194,309 | 31% | 27% | 4% | 4% | 0.36
phrase starts with | We call home | 2 | 194,407 | 0% | 27% | -27% | -27% | 0.39
complex | First-Person Singular Live/Lived/Living | 3,178 | 191,231 | 39% | 27% | 12% | 12% | 0.00

Figure 5.3.4 Residence and Membership. This plot shows data from Table 5.3.5 for "phrase starts with" conditions. 39% membership rate for 3,737 total contacts. Membership rates are percentages of members for groups of contacts.

Family Role Self-Identification

Contacts identifying themselves as spouses, parents, grandparents, children, siblings, aunts, and uncles, identified with several long MySQL expressions, like REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(grandma|grandmother|grandpa|grandfather)', return good membership results but low conditional group sizes. All chi-squared test p-values for these searches are relatively high compared to other tests in this exploration, with the exception of the test for self-identification as a "son, daughter, child, or kid." That test returns 216 matching contacts with a 44% membership rate (X2 = 34 for k=1 and n=216; p < 0.01). See Appendix B, personal story reference Table 2 for regular expressions used to identify these conditions.

Education Self-Identification

The test for contacts identifying themselves as students, graduates, or teachers, found with the MySQL expression REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(college|student|phd|master\'s|master of|doctor of|graduate|professor|ta|teacher|highschool|elementary school|preschool|pre-school|higher education|research)', yields only 193 results, a membership rate of 28% (only one percent greater than the average), and a chi-squared test p-value of 0.74, indicating an insignificant relationship. Alternatively, contacts who identify themselves as teachers not through declarations such as "I am" and "I'm" but through verbs such as "teach" and its variations return a higher membership rate of 38%, with a similarly low number of results (185 contacts). The chi-squared test p-value for the verb test is significant (X2 = 12 for k=1 and n=185; p < 0.01). See Appendix B, personal story reference Tables 2 and 3 for regular expressions used to identify these conditions.

Working and Occupation Self-Identification

Table 5.3.6 shows results from looking for contacts who explicitly name themselves with specific words, in a similar way to how family roles were identified above. It also shows results from identifying contacts through verb use, in a similar way to how teachers were identified with "teach" verbs, above. Although words ending in "ist" and "tor" generally find contacts working in professional fields, this analysis is in no way comprehensive. The occupation taxonomy from the Bureau of Labor Statistics could greatly improve this exploration in future work (https://www.bls.gov/oes/current/oes_stru.htm).
See Appendix B, personal story reference Tables 2 and 3 for regular expressions used to identify these conditions.

Combined with a taxonomy of occupations, work and occupation searches could aid organizations in immediately organizing letter-writing campaign participants. For example, a chatbot interacting with a person who describes themselves in a message, "I'm a hydrologist at ... and I support expanding the Rainscapes program in Montgomery County," could ask that person whether the bot's controlling organization may send the author's message to representatives together with other scientists' messages. With a positive reply, the bot could then ask if the person would like to join an online group of concerned scientists who have written to support the Rainscapes program, or flag the person for a follow-up call with a legal action team looking for testimony.

Table 5.3.6 Work, Occupation, and Membership

identification | root-term | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
self | *ist, doctor, nurse | 386 | 194,023 | 41% | 27% | 14% | 14% | 0.00
self | *ist | 345 | 194,064 | 41% | 27% | 14% | 14% | 0.00
self | *tor | 85 | 194,324 | 47% | 27% | 20% | 20% | 0.00
self | *or | 749 | 193,660 | 39% | 27% | 12% | 12% | 0.00
self | *er | 2,076 | 192,333 | 39% | 27% | 13% | 12% | 0.00
self | doctor, nurse | 61 | 194,348 | 49% | 27% | 22% | 22% | 0.00
self | lawyer, judge | 7 | 194,402 | 29% | 27% | 2% | 2% | 0.92
self | engineer | 34 | 194,375 | 26% | 27% | 0% | 0% | 0.95
verb | work | 898 | 193,511 | 38% | 27% | 11% | 11% | 0.00
verb | program | 6 | 194,403 | 67% | 27% | 40% | 40% | 0.03
verb | analyz | 3 | 194,406 | 0% | 27% | -27% | -27% | 0.29

Figure 5.3.5 Work, Occupation, and Membership. Membership rates are percentages of members for groups of contacts.

Activism, Volunteering, Voting, and Spending Self-Identification

In addition to the key phrases identifying personal stories listed at the beginning of this exploration section, nonprofit organizations are interested in voters and contributors to causes. Reading through search results for personal stories reveals evidence of past activism in addition to personal stories. Table 5.3.7 shows membership rates for contacts who have used the verbs "volunteer," "join," "protect," "guard," "save," "fight," and "spend" with first-person singular "I" words, tested with expressions like those used for the tests of "teaching" explained above. Results show limited numbers of matching results and modestly significant p-values. Even though contacts discuss "spending" in higher numbers than they discuss "volunteering" and "joining" combined, the latter two metrics reveal much higher, and significant, membership rates (52% and 44%). See Appendix B, personal story reference Table 3 for regular expressions used to identify these conditions.

Table 5.3.7 Activism Verbs Used in the First Person

root term(s) | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
volunteer | 66 | 194,343 | 52% | 27% | 25% | 25% | 0.00
join | 63 | 194,346 | 44% | 27% | 18% | 18% | 0.00
protect, guard, save, saving, fight, fought | 217 | 194,192 | 35.02% | 26.90% | 8.12% | 8.11% | 0.01
spend | 114 | 194,295 | 32% | 27% | 5% | 5% | 0.26

Outdoor Appreciation Self-Identification

Advocacy organizations like the Audubon Society, the Sierra Club, and others give their members access to outdoor programs and events with membership. Table 5.3.8 shows that a limited search for outdoor verbs finds that campers, hikers, and walkers have significantly higher levels of membership than the average, despite moderately low regular expression matching rates. See Appendix B, personal story reference Table 3 for regular expressions used to identify these conditions.
Table 5.3.8 Outdoor Verbs (49% membership rate for 682 total contacts)

root term(s) | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
camp | 54 | 194,355 | 44% | 27% | 18% | 18% | 0.00
hike, hiking | 218 | 194,191 | 54% | 27% | 27% | 27% | 0.00
trek | 1 | 194,408 | 0% | 27% | -27% | -27% | 0.54
climb | 8 | 194,401 | 63% | 27% | 36% | 36% | 0.02
ski | 4 | 194,405 | 25% | 27% | -2% | -2% | 0.93
hunt, fish | 25 | 194,384 | 32% | 27% | 5% | 5% | 0.57
bike, biking, cycl | 6 | 194,403 | 50% | 27% | 23% | 23% | 0.20
hike, hiking, walk | 325 | 194,084 | 48% | 27% | 21% | 21% | 0.00
swim, swam | 19 | 194,390 | 47% | 27% | 20% | 20% | 0.04
ride, riding, rode | 22 | 194,387 | 41% | 27% | 14% | 14% | 0.14

Suffering Self-Identification

Words of suffering surfaced while reading through the personal stories identified by the previous searches. They uncover lived experiences that negatively impact message writers' lives. Stem verbs, including "suffer," "depriv," "die," "dying," "hurt," "curs," "broke," "break," "lost," "lose," "endur," and the phrase "I will go through," all return small numbers of results with p-values greater than 0.01 (mostly insignificant). The test for the presence of base "suffer" verbs has the greatest number of matching results among these tests (138 contacts), with a 34% membership rate (a small 7% above the average). The test for "endur" yields only five contacts, but three of them are members (60% membership rate, p = 0.10). See Appendix B, personal story reference Table 3 for regular expressions used to identify these conditions.

Swear Words

While phrases derived from searches for personal stories find higher membership rates, swear words find lower membership rates. Looking purely for the presence of three four-letter swear words, along with the presence of any swear word reported by the LIWC "swear" dimension, reveals that contacts who swear are less likely to pay for membership. Table 5.3.9 shows the results. The 260 contacts who begin messages with the first swear word have very low membership rates (11%). Most individual swear word conditions yield chi-square test p-values lower than 0.01, and all exhibit membership rates lower than the rate found by the LIWC swear word test (23%). See Appendix B, personal story reference Table 4 for regular expressions used to identify these conditions.

Table 5.3.9 Swear Words and Membership

condition/term | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
Starts with F Swear Word | 260 | 194,149 | 11% | 27% | -16% | -16% | 0.00
Contains F Swear Word | 946 | 193,463 | 15% | 27% | -12% | -12% | 0.00
Starts with D Swear Word | 30 | 194,379 | 20% | 27% | -7% | -7% | 0.39
Contains D Swear Word | 925 | 193,484 | 26% | 27% | -1% | -1% | 0.37
Starts with S Swear Word | 8 | 194,401 | 25% | 27% | -2% | -2% | 0.90
Contains S Swear Word | 728 | 193,681 | 20% | 27% | -7% | -7% | 0.00
Contains a LIWC Swear Word | 8,667 | 185,742 | 23% | 27% | -4% | -4% | 0.00

Appendix B describes all personal story queries and includes a reference of all the database LIKE and REGEXP conditions that this chapter uses.

Exploration Four: Flesch Reading Ease

Exploration Four tests whether contacts who write more words per sentence and more syllables per word, according to the Flesch reading ease score (206.835 - 1.015 * words/sentences - 84.6 * syllables/words), have significantly different membership levels than the 27% average membership rate for contacts who sent personal messages. Tests consider minimum (Figures 5.4.1 and 5.4.2), average (Figures 5.4.3 and 5.4.4), and maximum (Figures 5.4.5 and 5.4.6) Flesch scores per contact. The tests of minimum Flesch reading ease scores highlight the most difficult-to-read messages that each contact has written. The tests of maximum Flesch scores highlight the opposite:
the simplest-to-read messages that contacts have written. The tests of minimum scores show significant differences in membership rates; the tests of maximum scores show no differences.

Figure 5.4.1 (membership rate) and Figure 5.4.2 (conditional group size) show that membership rates increase as minimum Flesch reading ease scores decrease, from a below-average membership rate of 16% (minimum Flesch score > 100; 4th grade and lower reading level; more than 10% below the overall average rate of 27%) to an above-average rate of 37% (minimum Flesch score ≤ 30; college-graduate reading level; more than 10% above the overall average rate of 27%). All categorical chi-squared tests for these minimum Flesch score conditions have p-values of less than 0.001; they are significant.

Figure 5.4.1 Membership Rates for Groups of Contacts with Minimum Flesch Reading Ease Scores. Membership rates are percentages of members for groups of contacts.

Figure 5.4.2 Group Size (Number of Contacts) for Groups of Contacts with Minimum Flesch Reading Ease Scores.

Figure 5.4.3 (membership rate) and Figure 5.4.4 (conditional group size) show that membership rates increase with decreasing average Flesch reading ease scores, from a below-average membership rate of 17.2% (average Flesch score > 100; 4th grade and lower reading level) to a slightly above-average rate of 31% (average Flesch score > 30 and ≤ 50; college reading level). All categorical chi-squared tests for average Flesch score conditions have p-values of less than 0.01, except for the test where the score is greater than 70 and less than or equal to 80 (7th grade reading level); the p-value for that test is 0.01. All tests, therefore, are significant, and contacts who write text at the two lowest reading levels (highest scores) have membership rates that differ from their alternative conditions by -10.36% and -7.70%.

Figure 5.4.3 Membership Rates for Groups of Contacts with Average Flesch Reading Ease Scores. Membership rates are percentages of members for groups of contacts.

Figure 5.4.4 Group Size (Number of Contacts) for Groups of Contacts with Average Flesch Reading Ease Scores.

Figure 5.4.5 (membership rate) and Figure 5.4.6 (conditional group size) show that the membership rates of groups of contacts defined by their maximum Flesch scores (the simplest messages that contacts have written) are almost indistinguishable from each other and very close to the average membership rate for contacts who sent personal messages: 27%. Categorical chi-squared tests for maximum Flesch score conditions of >100; >90 and ≤ 100; >80 and ≤ 90; >70 and ≤ 80; >60 and ≤ 70; >50 and ≤ 60; >30 and ≤ 50; and ≤ 30 have respective p-values of 0.00, 0.10, 0.65, 0.02, 0.16, 0.03, 0.73, and 0.16 and respective membership rate differences from their opposite conditions of 1.22%, 0.47%, 0.10%, -0.52%, -0.34%, -0.63%, 0.10%, and -0.60%. Differences are, therefore, small or insignificant, and most of the time both, for this test.

Figure 5.4.5 Membership Rates for Groups of Contacts with Maximum Flesch Reading Ease Scores. Membership rates are percentages of members for groups of contacts.

Figure 5.4.6 Group Size (Number of Contacts) for Groups of Contacts with Maximum Flesch Reading Ease Scores.

In summary, minimum-score Flesch tests are more revealing than maximum-score Flesch tests. They expose the most difficult-to-read (highest grade level) passages that a single contact has ever written, and those scores relate to membership. Maximum scores, and to a lesser extent average scores, are less revealing. If a contact writes two messages and one is short and sweet (easy to understand) but the other is complex, the complex message can tell an organization more about its author's potential to pay membership dues than the simple message can.
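A sketch of the score and the per-contact aggregation used above. The vowel-group syllable heuristic is a stand-in for whatever counter an implementation chooses, and the message data are hypothetical:

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: 206.835 - 1.015*(words/sentences)
    - 84.6*(syllables/words). Higher scores read more easily."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    # Crude syllable heuristic: count vowel groups, minimum one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

# Per-contact minimum, average, and maximum scores, as tested above.
messages_by_contact = {"contact_1": ["Protect our air!",
                                     "Regulatory capture endangers riparian ecosystems."]}
for contact, msgs in messages_by_contact.items():
    scores = [flesch_reading_ease(m) for m in msgs]
    print(contact, min(scores), sum(scores) / len(scores), max(scores))
```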
Exploration Five: Sentiment

The compilation of all LIWC scores revealed, by accident, relationships between membership and swear words, positive words, and negative words in long messages and for contacts who sent many messages. This exploration checks for relationships between membership and sentiment. It calculates a VADER sentiment score for each message and then calculates lumped average, minimum, and maximum scores for each person. It then calculates membership rates for compound VADER score thresholds ranging from 0 down to -0.95 and from 0 up to 0.95, where scores below -0.05 are considered negative and scores above 0.05 are considered positive (Hutto and Gilbert 2014; https://github.com/cjhutto/vaderSentiment#about-the-scoring). Results are shown for the minimum, average, and maximum lumped scores per contact in Figure 5.5.1, Figure 5.5.2, and Figure 5.5.3. Figure 5.5.4 compares the results in a single plot.

In chi-square tests of two-by-two contingency tables of members and non-members for each VADER condition tested, p-values were all less than 0.01 except when the average compound VADER score was less than or equal to 0.6, 0.65, 0.7, 0.8, and 0.95 and when the minimum VADER score was greater than 0.95. For maximum compound VADER scores, group sizes ranged from 651 to 147,540, as shown in Figure 5.5.5. Figure 5.5.6 shows the difference between the membership rates for maximum compound VADER score conditions and their alternative conditions.

Figure 5.5.1 Membership Rates for Contact Minimum VADER Scores. Membership rates are percentages of members for groups of contacts.

Figure 5.5.2 Membership Rates for Contact Average VADER Scores. Membership rates are percentages of members for groups of contacts.

Figure 5.5.3 Membership Rates for Contact Maximum VADER Scores (Min 12%; Max 38%; 18% for score ≤ -0.05; 33% for score ≥ 0.05). Membership rates are percentages of members for groups of contacts.

Figure 5.5.4 Membership Rates for VADER Scores (Average, Min, and Max). Membership rates are percentages of members for groups of contacts.

Figure 5.5.5 Group Sizes for Max Compound VADER Score Conditions (n | total = 194,409).

Figure 5.5.6 The Difference Between the Membership Rates for Maximum Compound VADER Score Conditions and Their Alternative Conditions. Membership rates are percentages of members for groups of contacts.

Table 5.5.1 compares members and non-members by average compound VADER score. It shows that the membership rate for contacts who write messages with positive sentiment is close to the overall average membership rate. The membership rate for contacts who write with increasingly negative average sentiment (lower compound sentiment scores), however, decreases. Table 5.5.2 shows messages from six contacts, selected at random, with positive and negative average scores (within 0.1 of sentiment ratings of -0.80, -0.50, -0.05, 0.05, 0.50, and 0.80).

Table 5.5.1 Membership Rates and Group Sizes for Contacts Grouped by VADER Sentiment Scores

Average VADER Compound Score | Group Size | Membership Rate
≥ 0.05 | 109,765 | 30%
≥ 0.10 | 104,939 | 30%
≥ 0.50 | 53,700 | 28%
≥ 0.80 | 18,525 | 26%
≤ -0.05 | 61,999 | 22%
≤ -0.10 | 57,830 | 22%
≤ -0.50 | 26,399 | 18%
≤ -0.80 | 74,444 | 17%
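The message-level scores behind these groupings come from the vaderSentiment package. A minimal sketch with hypothetical messages standing in for one contact's history:

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
messages = [
    "Please protect our clean air and water for future generations!",
    "You've ruined our world with your dishonesty and greed.",
]
# Compound scores fall in [-1, 1]; <= -0.05 is negative, >= 0.05 positive.
compounds = [analyzer.polarity_scores(m)["compound"] for m in messages]
print(compounds)
# Lumped per-contact statistics, as plotted in Figures 5.5.1-5.5.3:
print(min(compounds), sum(compounds) / len(compounds), max(compounds))
```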
Table 5.5.2 Example Messages with Positive and Negative Sentiment

Contact A (average compound score 0.889; Member). Message 1/1: "I will be going green and buying clean energy for my family this year. Please don't pollute our environment further than you have. Be smart, and invest in our future, not wall-street."

Contact B (average compound score 0.516; Non-Member). Message 1/2: "Our world should not be sacrificed for higher profits for the fossil fuel industry. Let's put Virginia's people first! Our children's future can't be for sale - for any amount of money!" Message 2/2: "Do the correct thing! Forests are irreplaceable. North American forests are one of those forests. These lands need to be protected for all humanity."

Contact C (average compound score 0.05; Non-Member). Message 1/1: "Close down businesses like Monsanto who are helping to destroy our land."

Contact D (average compound score -0.06; Non-Member). Message 1/1: "Any nonrenewable project here would be fool hardy when we know about the emissions that would be released."

Contact E (average compound score -0.54; Non-Member). Message 1/1: "Fracking is hazardous and dangerous to the water we drink and the air we breathe. Gas is no longer a sustainable option. We must switch to wind power and safe energy sources or we will suffer great these bad choices!"

Contact F (average compound score -0.80; Non-Member; words censored for this table). Message 1/2: "You've f***ed up our world with your dishonesty and greed" Message 2/3: "When will it stop? It is tragic that life, plants, animals, seabirds, and the source of life to millions of global citizens are vanishing. We are done with your greed in enacting this destructive legislation. Please start caring."

Exploration Six: Top Words

A purely exploratory test shows that the 50 most popular words among all messages, scrubbed of stop words (NLTK) and American Standard Code for Information Interchange (ASCII) punctuation characters, have greater-than-average membership rates ranging from 29% to 40%, with an average membership rate of 30%. These rates are comparable to some of the best rates found by the more deliberate searches for terms related to personal stories in Exploration Three, above. The words detect contact groups of between 16,215 and 61,594 contacts satisfying their conditions. With such large group sizes, chi-square test p-values are all lower than 0.01 (significant). Table 5.6.1 (below) shows results from testing the top 50 words against membership rates.

Looking individually at the top 5,000 most-used words, the highest membership rate for searches returning more than 1,000 results and significant chi-squared test p-values (p < 0.01) is 50%, a tie between the words "greenhouse" (the 1,059th most popular word) and "efficiency" (the 868th most popular word). Both are subject-matter words. The alternative membership rates for these conditions (m|~c) are both equal to the average membership rate (27%). Conversely, the lowest significant (p < 0.01, n > 1,000) membership rate is 17%, for contacts who have used the word "impeach" (the 1,692nd most popular word). Interestingly, four-letter swear words and other negative words appear alongside this term. Finally, note that an early miscalculation revealed that the misspelling of "don't" as "dont," without an apostrophe, has a significantly below-average membership rate of 19% (X2 = 28 for k=1 and n=869; p < 0.01). Future work might study misspelled words as a negative predictor of engagement.
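A minimal sketch of the top-words scrub, assuming a hypothetical list of message strings. Note that stripping all ASCII punctuation also removes apostrophes ("don't" becomes "dont"), whereas Table 5.6.1 retains contractions, so the study's scrub likely preserved apostrophes; this sketch does not:

```python
import string
from collections import Counter

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)  # one-time NLTK corpus download
STOP = set(stopwords.words("english"))
PUNCT = str.maketrans("", "", string.punctuation)  # strips ASCII punctuation

def top_words(messages, k=50):
    """Count the k most popular words after removing NLTK stop words
    and ASCII punctuation, as in Exploration Six."""
    counts = Counter()
    for msg in messages:
        for word in msg.lower().translate(PUNCT).split():
            if word not in STOP:
                counts[word] += 1
    return counts.most_common(k)

print(top_words(["Please protect our clean air!", "Protect public lands."], k=3))
```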
Table 5.6.1 Popular Words and Membership

term | n|c | n|~c | m|c | m|~c | m|c-m|~c | m|c-m | p
please | 61,594 | 132,815 | 33% | 24% | 9% | 6% | 0.00
people | 35,540 | 158,869 | 33% | 26% | 7% | 6% | 0.00
need | 42,658 | 151,751 | 33% | 25% | 8% | 6% | 0.00
protect | 47,671 | 146,738 | 35% | 24% | 11% | 8% | 0.00
clean | 34,123 | 160,286 | 36% | 25% | 11% | 9% | 0.00
us | 96,674 | 97,735 | 31% | 23% | 8% | 4% | 0.00
stop | 31,377 | 163,032 | 30% | 26% | 3% | 3% | 0.00
don't | 29,748 | 164,661 | 32% | 26% | 7% | 6% | 0.00
future | 31,986 | 162,423 | 36% | 25% | 10% | 9% | 0.00
environment | 38,524 | 155,885 | 36% | 25% | 11% | 9% | 0.00
energy | 25,979 | 168,430 | 37% | 25% | 12% | 10% | 0.00
planet | 23,449 | 170,960 | 31% | 26% | 5% | 5% | 0.00
water | 26,025 | 168,384 | 34% | 26% | 9% | 7% | 0.00
oil | 23,057 | 171,352 | 37% | 26% | 11% | 10% | 0.00
thank | 24,754 | 169,655 | 37% | 25% | 12% | 10% | 0.00
air | 24,813 | 169,596 | 37% | 25% | 12% | 11% | 0.00
would | 22,253 | 172,156 | 35% | 26% | 9% | 8% | 0.00
trump | 21,345 | 173,064 | 29% | 27% | 2% | 2% | 0.00
right | 25,732 | 168,677 | 34% | 26% | 8% | 7% | 0.00
children | 25,370 | 169,039 | 36% | 26% | 10% | 9% | 0.00
want | 23,091 | 171,318 | 34% | 26% | 8% | 7% | 0.00
must | 19,557 | 174,852 | 36% | 26% | 10% | 9% | 0.00
country | 21,164 | 173,245 | 34% | 26% | 8% | 7% | 0.00
public | 23,580 | 170,829 | 38% | 25% | 13% | 11% | 0.00
lands | 20,700 | 173,709 | 39% | 26% | 13% | 12% | 0.00
health | 24,041 | 170,368 | 39% | 25% | 14% | 12% | 0.00
make | 23,925 | 170,484 | 35% | 26% | 9% | 8% | 0.00
time | 22,309 | 172,100 | 36% | 26% | 10% | 9% | 0.00
money | 17,087 | 177,322 | 32% | 26% | 6% | 5% | 0.00
world | 18,485 | 175,924 | 33% | 26% | 7% | 6% | 0.00
keep | 20,832 | 173,577 | 35% | 26% | 10% | 9% | 0.00
one | 52,777 | 141,632 | 31% | 25% | 6% | 4% | 0.00
earth | 16,042 | 178,367 | 31% | 27% | 4% | 4% | 0.00
national | 17,270 | 177,139 | 40% | 26% | 14% | 13% | 0.00
land | 33,693 | 160,716 | 35% | 25% | 10% | 9% | 0.00
generations | 17,183 | 177,226 | 36% | 26% | 10% | 9% | 0.00
like | 16,511 | 177,898 | 34% | 26% | 7% | 7% | 0.00
drilling | 15,391 | 179,018 | 37% | 26% | 11% | 10% | 0.00
life | 26,682 | 167,727 | 33% | 26% | 7% | 6% | 0.00
take | 21,673 | 172,736 | 33% | 26% | 7% | 6% | 0.00
climate | 14,845 | 179,564 | 39% | 26% | 13% | 12% | 0.00
many | 15,687 | 178,722 | 36% | 26% | 10% | 9% | 0.00
get | 21,532 | 172,877 | 33% | 26% | 7% | 6% | 0.00
know | 17,732 | 176,677 | 35% | 26% | 9% | 8% | 0.00
wildlife | 13,146 | 181,263 | 34% | 26% | 8% | 7% | 0.00
change | 16,311 | 178,098 | 38% | 26% | 12% | 11% | 0.00
thing | 33,600 | 160,809 | 32% | 26% | 6% | 5% | 0.00
think | 16,596 | 177,813 | 34% | 26% | 8% | 7% | 0.00
american | 20,656 | 173,753 | 36% | 26% | 10% | 9% | 0.00
care | 16,215 | 178,194 | 33% | 26% | 6% | 6% | 0.00

Exploration Seven: LIWC Scores and Membership

This exploration reviews relationships between membership and LIWC scores for pronouns and other LIWC dimensions.

5.7.1. Pronoun Exceedance Tests

Exceedance tests identify contacts who have ever written a message with a score exceeding (i.e., above) a threshold. Alternative exceedance tests identify contacts who have never written a message that exceeds the threshold. Contacts, however, can send more than one message, so alternative exceedance tests are not the same as non-exceedance tests, though the two may hint at the same results as thresholds increase, given linguistic consistency between messages written by the same contact. Non-exceedance tests identify contacts who have ever written a message not exceeding (i.e., below) a threshold. For example, a contact who writes two messages with scores of one and three satisfies an exceedance test for a threshold of two; three is greater than two. Because they satisfy the exceedance test, they do not satisfy the alternative exceedance test: they have, at least once, sent a message with a score above two. They do, however, satisfy the non-exceedance test; one is less than two. As the threshold increases, in this case to four, the alternative exceedance and non-exceedance test results match; one and three are both less than four.
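A small sketch of the three test types over the worked example above, assuming per-contact lists of message scores:

```python
def exceedance_tests(scores, threshold):
    """Classify one contact's per-message scores against a threshold,
    following the three definitions above."""
    return {
        "exceedance": any(s > threshold for s in scores),          # ever above
        "alt_exceedance": not any(s > threshold for s in scores),  # never above
        "non_exceedance": any(s < threshold for s in scores),      # ever below
    }

scores = [1, 3]  # the two-message contact from the worked example
print(exceedance_tests(scores, 2))
# -> {'exceedance': True, 'alt_exceedance': False, 'non_exceedance': True}
print(exceedance_tests(scores, 4))
# -> {'exceedance': False, 'alt_exceedance': True, 'non_exceedance': True}
```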
5.7.2. Pronoun Exceedance Test Results

Membership rates shown in Figure 5.7.1 change by small amounts as LIWC pronoun score exceedance thresholds increase from 0% to 10%. Membership rate ranges equal 0%, 1%, 2%, 2%, 1%, 3%, 5%, 3%, and 2% for the respective pronoun, personal pronoun, "I," "we," "you," "she/he," "they," and impersonal pronoun conditions. Membership rates change the most (5%, from 29% to 23%) as "she/he" rates increase. Some membership rates for she/he pronoun conditions, however, are insignificant in comparison to their alternative conditions due to the low use of she/he pronouns. Chi-squared tests comparing observed and expected counts of members satisfying minimum LIWC score exceedance conditions yield p-values less than 0.01 (significant), except for the tests of membership for "she/he" pronoun rates >4%, >5%, and >6%. Tests of membership rates for "she/he" pronoun rates >7%, >8%, >9%, and >10% are all significant but catch low numbers of members (2% to 4% of all message writers). Overall, membership rate differences are small compared to those found in prior analyses.

The alternative exceedance tests shown in Figure 5.7.2 show that the presence of any pronoun (from the overall pronoun LIWC dimension) is more revealing than the use of any particular pronoun (e.g., from the "I" dimension) in two ways: (a) comparing Figure 5.7.1 to Figure 5.7.2, membership rates are higher for groups who have ever exceeded thresholds, and (b) membership rates drop from 28% to 19% for those who have not used any pronouns at all. The non-exceedance test (not shown) cannot show this drop; contacts who send messages with no pronouns (pronoun rate = 0) still send messages. Membership rate ranges, like those shown in the exceedance tests, are all small for component pronoun tests. They approach the average personal-message membership rate of 27% as test conditions identify increasing numbers of contacts who never send messages with scores above increasing thresholds. Word count tests show results similar to the rate tests (Figure 5.7.3 and Figure 5.7.4); they highlight the effects of less frequently used pronouns (e.g., they).

Figure 5.7.1 Membership Rates for Exceedance Conditions. Membership rates change by small amounts. Membership rates are percentages of members for groups of contacts.

Figure 5.7.2 Membership Rates for Alternative Exceedance Conditions. Membership rates are percentages of members for groups of contacts.

Figure 5.7.3 Membership Rates for Minimum LIWC Scores. Membership rates are percentages of members for groups of contacts.

Figure 5.7.4 Membership Rates for Alternative Minimum LIWC Scores (Maximums). Membership rates are percentages of members for groups of contacts.

5.7.3. Other Notable LIWC Dimensions: Swear Words, Punctuation, Nonfluencies, Family, and Friends

Among membership tests for each LIWC dimension, only the test for swear words, as shown in Exploration Three, yields a below-average membership rate (23%). The membership rate for contacts who do not use swear words equals the average membership rate for contacts who write personal messages (27%). The presence of five LIWC dimensions yields membership rates 10% or more above the average rate: contacts who use nonfluencies (written out as "err," "hrm," "eh," etc.), parentheses, dashes, semicolons, and colons have respective membership rates of 37%, 37%, 37%, 37%, and 39%. The tests for the alternative conditions yield slightly below-average and average membership rates (26%, 26%, 25%, 27%, and 27%, respectively).
Contacts who use the more common punctuation (periods, commas, question marks, and exclamation marks) all have above-average membership rates, but only by 2 to 5% (a 3% average increase). The test for quotes yields a 36% membership rate. Members use more punctuation than non-members. LIWC categorizes swear words and nonfluencies as informal speech dimensions. The three other categories in the informal speech group show moderate to high membership rates: netspeak (e.g., btw, lol; 31%, n=6,286), assent words (agree, OK, yes; 34%, n=8,222), and filler words (e.g., Imean, youknow; 35%, n=740).

Finally, supporting results found when looking for personal stories with family phrases, two dimensions among the LIWC social sub-processes have 35% membership rates: family (e.g., husband, daughter; 27,185 matches) and friends (e.g., buddy, neighbor; 11,334 matches). Compared to the combined test for first-person references to specific family members (Figure 5.3.1; 37% membership for 20,001 contacts), the LIWC family test returns more contacts, slightly lower membership rates, and fewer personal stories. In a random sample of ten messages that include LIWC family words, three of the messages are personal stories and seven are not. All of the personal stories contain family words prefixed by the possessive first-person pronoun "my":

1. "The base of the Berryessa Snow Mountains was my home for many years and I want this monument preserved for my children and grandchildren. It is a majestic...."
2. "My family used to swim and fish along the Anacostia River in the 50s. Please...."
3. "My husband and I are both employed by wind energy providers and...."

The regular expression search for first-person pronouns followed by family words is more specific than the identification of LIWC family words, but it would not have missed any of the stories found by the LIWC search in this random sample. Three of the sampled messages that do not contain stories (LIWC false positives) refer to (1) "big brother," (2) "mother earth," and (3) "your children" but do not describe any lived experiences; the regular expression would correctly classify them as not stories.

DISCUSSION

Objective One Discussion: Messages per Contact as a Measure of Organizational Engagement

6.1.1. Pronouns and Messages per Person

Consistent with prior studies (e.g., Pennebaker 2017, pp. 63, 118; Lenard 2016), small differences (<1%) in the use of pronouns yield significant findings. Results show that groups of contacts who send personal messages with lower rates of pronouns overall, lower rates of personal pronouns overall, lower rates of first-person plural "we" pronouns, and moderately greater rates of "you" pronouns also send more messages (Table 4.1.1). The decreasing use of "we" words is the clearest individual pronoun predictor of the increasing numbers of messages that groups of contacts send (R2 = 0.87). This could indicate that contacts with a positive, personal association (Pennebaker 2011) with their state or country (e.g., "our country..." vs. "make America...") send fewer messages. A sentiment test on all messages moderately supports this theory: messages with a high rate of "we" words exhibit more positive sentiment (>5% "we" words; 0.21 VADER compound sentiment) than messages with a low rate of "we" words (<3% "we" words; 0.09 VADER compound sentiment).
While clear relationships exist between the central tendencies of LIWC pronoun rates for groups of contacts, they cannot be used to predict the number of messages that most individual contacts will send based on their first message. The observations and calculation checks made in the report of results for testing Hypothesis One (Section 4.1.2) show this is true for averaging rates in different ways and for different sets of messages. Further, Section 5.2 shows Hypothesis One cannot be accepted for individual, ungrouped contacts because most contacts do not send enough long messages. Like most social media posts, personally authored advocacy messages are short (Figure 4.1.16). Unlike social media posts, many contacts write just one or two through systems controlled by a specific organization. This study looked briefly at these two properties, length and quantity, in calculating the correlation between the use of first-person plural "we" pronouns and the number of messages that contacts send: Table 4.1.3 and Figure 4.1.17 show that the number of contacts sending messages matters more than the length of their messages in establishing correlations. (Future work could test whether this relationship holds for other LIWC dimensions.) In simpler terms, ranges of three to four percent usage of "we" pronouns, 13.6% to 14.4% usage of all pronouns, and 2% to 3% usage of "you" words are small, considering that the average length of a personal message is 29 words and the mode length is 11 words (Figure 4.1.16). Four percent of 11 is zero whole words.

6.1.2. The Pronouns of Environmental Advocacy

Given the relevance of groups of messages compared to individual messages, a comparison between the biggest group of messages in this study (all messages) and the summaries of other corpuses of text provided by the LIWC manual defines a language of environmental advocacy. The other corpuses include tweets, blog posts, essays, news articles, and novels. Environmental activists use "I" words at much lower rates (1.62% compared to 4.99%) and "we" words at much higher rates (3.92% compared to 0.72%) than authors of text in the other corpuses (Figure 4.1.9). They use "she/he" words at rates similar to those found in tweets, which are much lower than the rates found in longer passages of text. Like a parent talking to a child, this low use of "I" words and high use of "we" words indicates environmental activists write to their policymakers from a seemingly higher social status (Pennebaker 2011, pp. 174). Supporting this theory, nine out of ten randomly selected messages that begin with the word "we" use "we" to mean "you" (a policymaker) or "me and you." In the message "we must avoid being the country responsible for unleashing the beast of climate change by monstrous policies that only benefit big oil and agriculture companies," the word "we" refers first to the U.S. and then directly to the message recipient who, presumably, can enact "monstrous policies" or not. The low use of "I" words and low use of "she/he" words among environmental advocates does not indicate anything about social status, but it may help explain the less clear trends for these pronouns identified in the test of Hypothesis One. Finally, the overall high use of "we" words and low use of "I" words indicates that either (a) there are more male contacts (Pennebaker 2011), or (b) female activists use a typically "masculine" political vocabulary of personal pronouns
, one also more typical of the language of modern female politicians, when writing to policymakers (Jones 2017). Interestingly, two out of three contacts are female, supporting the latter of these two theories. Future work could investigate what this means for the two engagement factors, given that males have higher overall membership rates (37%) than females (29%).

6.1.3. Personal Message Rates and Word Counts

Hypothesis Two test results show that both the number of personal messages and the number of all messages decrease for groups of contacts sending increasingly large numbers of messages per contact. The number of messages sent also increases from 1 to 10 (the bulk of the data) as the average rate of personal messages increases from 20% to 25% (left side of Figure 4.2.1). As the number of messages sent continues to increase, the rate decreases to 15% for the group that sent 40 messages and then steadily increases to 50% for groups of contacts who sent 90 messages (left side of Figure 4.2.1). In summary, groups of contacts who send personal messages at rates of 18% (roughly one personal message for every five messages) are less likely to send a second message. Groups of contacts who do send more than one message usually send them at a personal message rate of 25% (one personal message for every four sent). These groups, however, will be smaller than the groups with lower personal message rates (Figure 4.2.3). Advocacy organizations and policymakers evaluating an initial wave of messages from a specific group, therefore, can expect both continued action (a second message) and higher rates of personal messages if the initial wave has a personal message rate of 25% or more. Advocacy campaign managers who value greater numbers of personal messages, both for finding personal stories and for building relationships with the contacts who send them, therefore should not be discouraged by an overall lower number of messages in response to a specific campaign, compared to similar campaigns, if the rate of personal messages returned is high (25% vs. 18%). Results emphasize the importance of asking contacts to write personal messages, if not to help amplify their voices and identify personal stories, then at least to help predict future engagement.

Hypothesis Three test results show that while the number of messages sent roughly increases with slightly decreasing word counts for groups of contacts (29 to 26 words; R2 = 0.69; data top-coded at 30+ messages), contacts sending more than one message send them at the overall average word count of 29 words, compared to 28 words for those who send a single message. This difference is small and insignificant for low numbers of messages. Even for large numbers of messages, it is easier to discern a one-in-five personal message rate from a one-in-four rate, as shown above, than to discern 28 words per message from 29 words per message. This one-word difference could also easily be affected by the language of a specific campaign. Message analysts should not, therefore, use this metric to predict the number of messages a group may send in the future without first testing the metric across large numbers of similar campaigns.

Objective Two Discussion: Exploring Membership, Personal Stories, Sentiment, and Writing Simplicity

The results from the three initial hypotheses inspired an exploration into membership as a measure of organizational engagement.
Results show that the number of messages written, the use of pronouns, the identification of personal stories, sentiment, writing simplicity, the use of swear words, the use of punctuation, the use of popular words, and potentially the use of misspelled words can all help organizations identify membership rates. If the 90,698 contacts categorized as members pay an average of $52/year, campaigns receive $4.7M/year. If all 690,631 contacts paid this amount, campaigns would receive $35,912,812, more than double the budget of Greenpeace, the smallest environmental advocacy organization listed in Table 4.3.1. The membership rate for all contacts is 13%; the membership rate for contacts sending personal messages is 27%. As described in the introduction to Objective One, this study describes 5%, 10%, and 15% membership rate differences from the average 27% rate as moderate, strong, and very strong differences, respectively. Tests show:

1. Membership rates increase with message rates. Relating the two rates of engagement (messages and membership), membership rates more than double, from 16% to 35%, for groups sending one to ten messages, before leveling off. Groups sending 20 or more messages have an average membership rate of 37%.

2. Membership rates increase with average word count. Membership rates increase from 17% to 30% for contacts who sent messages between one and 40 words long, before leveling off. Contacts who have sent messages with an average word count greater than 40 words have an average membership rate of 28%.

3. Membership rates increase with certain words and phrases. Regular expression searches for personal stories with pronouns, verb variations, and LIWC scores return some stories of lived experiences, but they also identify authors in other ways. Membership rates increase with first-person pronouns and with
a. References to wives, e.g., "my wife" (51%; 410 contacts)
b. References to family members overall (37%; 20,001 contacts)
c. Identification with phrases that begin with "As a" (40%; 6,749 contacts), "I am a" (38%; 3,338 contacts), "We are" (33%; 5,692 contacts), and "We are a" (32%; 2,323 contacts)
d. Self-identification with the male gender, e.g., "I am a father..." (45%; 56 contacts)
e. Self-identification with the female gender, e.g., "I am a mother..." (34%; 150 contacts)
f. Self-identification of residence, e.g., "I live..." (39%; 3,737 contacts)
g. Self-identification as a family member, e.g., "I will be a grandma..." (44%; 216 contacts)
h. Self-identification as a teacher with verbs, e.g., "I teach" (38%; 185 contacts), but not with titles (e.g., "I'm a school teacher")
i. Self-identification with "ist" roles, like "scientist" or "biologist" (41%; 386 contacts), and "er" roles, like "carpenter" or "driver" (39%; 2,076 contacts)
j. Volunteering verbs (52% membership rate, but only 66 matches; p < 0.01)
k. Outdoor activity verbs, e.g., "I have hiked" (49%; 682 contacts)

4. Membership rates decrease with the use of swear words. Membership rates did not significantly increase or decrease for most words describing suffering, but they significantly and very strongly decreased for contacts using swear words: as low as 11% for the 260 contacts beginning their messages with a word beginning with the letter "F" and 15% for the 946 contacts using that word anywhere in a message. Membership rates for the group of contacts using any LIWC swear word (swear rate > 0) decreased to 23%.

5. Membership rates increase with writing grade level (i.e., message complexity).
Membership rates steadily increase from 16% (4th grade level) to 37% (college graduate) with decreasing minimum Flesch reading ease scores (a 21% range).

6. Membership rates increase with sentiment. Maximum compound VADER scores describe the most positive message a contact has sent. They are a good indicator of membership (Figure 5.5.3; 12% to 38%). Minimum and average VADER scores are less descriptive. Contacts with negative maximum VADER scores (< -0.05) have an average membership rate of 18%; contacts with positive scores (> 0.05) have a 33% membership rate (a 15% range).

7. Membership rates increase with popular, on-topic words like "efficiency" and "greenhouse" and decrease with negative words. Among the top 5,000 subject words used in messages, the two words (a tie) used by at least 1,000 contacts with the highest levels of membership (50%) are "efficiency" and "greenhouse." The word used by at least 1,000 contacts with the lowest membership level (17%; n=1,692) is "impeach," and it is found among swear words, not tested earlier, with similarly low membership rates.

8. Membership rates increase with the presence of any pronoun compared to no pronouns. Contacts who do not use pronouns at all have low membership rates (19%). For contacts who do use them, increases in individual pronoun rates reflect only small changes in membership rates.

9. Membership rates increase with the use of nonfluencies (37%) and less-used punctuation (38% for colons).

10. The membership rate is low for contacts who misspell "don't" as "dont" (19%).

Results sketch a picture of a stereotypical member: an outdoorsy parent with a job and a spouse who talks about their children. They write for an educated audience and use positive, issue-related words in sentences delineated with punctuation. They do not complain about impeachment or use swear words, but they may informally write nonfluencies into their messages.

For one analyst, identifying personal stories in advocacy messages will help their organization "be better set up to recognize what kinds of personal messages we are getting, and which have the best value for continued/increased engagement." This study used regular expressions inspired by keywords that campaign managers use. It used first-person phrases and looked for references to family, home, suffering, and personal interests. Matches showed that what makes a personal story "personal" and a "story" is subjective; a framework that could categorize and measure story attributes in short advocacy messages would be helpful. In conducting these searches, matches also revealed information about contacts that an advocacy organization or policy office might otherwise collect in a survey. Contacts reveal personal interests, professions, and family information in their stories. Self-written levels of education were found only in small numbers, but writing complexity scores and detected occupations may hint at these levels.

Limitations and Two Database Gotchas

This study found that, compared to the predictors of membership investigated in the exploration (Objective Two), the pronoun predictors for the number of messages a contact sends have limited practical application for the initial problem that inspired this research: rapid response to a new contact with limited information. Most messages are too short to be studied individually with pronouns alone.
Additionally, this study (a) was not segmented by location or topic; (b) did not have access to complete contact demographics, though it could have used state as a proxy indicator; (c) did not have exact location data, so it did not address an originally proposed objective to test engagement and personal stories against proximity to sources of pollution; and (d) did not have access to political affiliation for any contacts. Future work and case studies could address these limitations.

Two database problems found in this study are worth every data analyst's attention: (1) Some raw database IDs for some campaigns were alphanumeric, case-sensitive strings. In creating a contact table, a new auto-increment primary key may be created to avoid this problem; this study used a case-sensitive field collation to address it. (2) Data from different organizations and different campaigns used different character encodings. A few data points had quotes replaced by question marks. After correction, LIWC analysis trends became slightly more definite.

CONCLUSIONS AND FUTURE WORK

Text Analysis for Online Advocacy Organizations

We stand now where two roads diverge. But unlike the roads in Robert Frost's familiar poem, they are not equally fair. The road we have long been traveling is deceptively easy, a smooth superhighway on which we progress with great speed, but at its end lies disaster. The other fork of the road, the one less traveled by, offers our last, our only chance to reach a destination that assures the preservation of the earth. - Rachel Carson, Silent Spring, 1962

My message is that we'll be watching you. This is all wrong. I shouldn't be up here. I should be back in school on the other side of the ocean. Yet you all come to us young people for hope. How dare you. You have stolen my dreams and my childhood with your empty words. Yet I am one of the lucky ones. People are suffering. - Greta Thunberg, United Nations Climate Action Summit, 2019

Carson paints a picture and provides efficacy to her readers to think and make decisions: readers without the internet and policymakers without fax machines. Thunberg is direct and angry, speaking like a hero in the golden age of distraction. VADER sentiment scores rate their respective quotes negative (-0.2) and more negative (-0.4), and Flesch scores rate them readable by 7th-grade students (Flesch score of 77) and lower-grade-level students (Flesch score of 98). Jones (2017), Lenard (2016), and Pennebaker (2011) would contend that Carson's high use of first-person plural inclusive "we" words represents a high social status and a "masculine," authoritative linguistic style that female politicians have recently begun to adopt. They would say Thunberg's high use of pronouns (one or more in almost every sentence), and especially her high use of the word "I," reflects a female speaker, self-focused and aware of the suffering of her generation.

These researchers have shown that the words and public speeches of leaders, candidates, and officials, whom people elect to represent their families and to vote for their children's future, are well studied and a joy to analyze and read about. Philosophers, bloggers, and reporters study how these leaders speak to their constituents and to each other and archive their words as history. What can researchers now learn about the words that activists speak back to power? How will organizations use this knowledge to empathize with, ally with, or manage them?
This dissertation was inspired by the successful development of an online advocacy system created for a small nonprofit organization in Maryland in the early 2000s. It helped the group of faith-based and union-backed organizers win living wage and healthcare legislation by filling state legislators' inboxes with customized form letters, properly addressed via a GIS-based zip code matching system. As petitions do, it also helped the organization recruit members and grow. "Slacktivism" worked! But this form of activism turned from influencing policymakers to disengaging them (Miler 2014, Social Change Agency 2017a, Congressional Management Foundation 2017), and the term "slacktivism" was coined as such by Morozov in 2009. White (2010), around the same time, decried the "ideology of marketing" in activism as "clicktivism." Even so, this study and advocacy organizations listen to Karpf (2017, 2018), resolutely looking for the potential of analyzing and A/B testing everything.

Results from this research show that environmental advocacy organizations should solicit and analyze personal messages from their constituents to both limit slacktivism (that is, limit disengaging policymakers with impersonal messages) and bolster their understanding of their contacts. By soliciting personally written messages, in combination with services like Communicating with Congress (CWC, 2017), advocacy organizations can help keep policymakers from being inundated with form letters. By analyzing personal messages, organizations can exploit and improve on the metrics reviewed in this study.

At minimum, organizations need to continue giving individuals the option of writing a personal message in online advocacy campaigns. If they are not already doing so, by starting they can begin to predict future behavior from their contacts' messages. Results show that the membership rate for those sending a personal message in this study's data is 27%, compared to the overall 13% membership rate for those sending any type of message, personal or otherwise: more than double. Results also show that most groups of contacts who write personal messages at rates higher than 18% (roughly one in five) also send more than one message. Simply asking for and counting personal messages can help organizations establish baseline engagement predictions without any text analysis. Additionally, given that impersonal messages can disengage policymakers and bury the personal messages, organizations should stop sending impersonal letters along with personal ones, or should flag them in a way that systems like CWC can recognize as petitions. Without a system like CWC that mitigates the risk of losing personal messages among others, organizations should hand-deliver signatures in batches or at strategic times to avoid disengaging policymakers.

Once advocacy organizations are collecting personal messages, they should analyze the text to help them further predict the number of messages that groups of constituents will send and future payments for membership. Results from this study show analysts and algorithms can use text in two situations: (a) in analyzing and engaging large groups of individuals, and (b) in responding to a contact immediately after they have sent a message (i.e., the chatbot predicament). Results show that, in analyzing groups of contacts, low pronoun rates overall and low first-person plural "we" pronoun rates indicate a group will be more likely to send more messages.
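As an illustration, pronoun rates of the kind LIWC reports can be approximated for a group of messages with a short script. The sketch below uses small, illustrative pronoun lists rather than the licensed LIWC dictionaries, so its rates will not match LIWC's exactly.

    import re

    # Illustrative pronoun lists only; the licensed LIWC dictionaries are far larger.
    ALL_PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
                    "he", "she", "it", "they", "them", "their"}
    WE_PRONOUNS = {"we", "us", "our", "ours"}

    def pronoun_rates(messages):
        """Return (overall pronoun rate, "we" rate) as percentages of all words."""
        words = [w for m in messages for w in re.findall(r"[a-z']+", m.lower())]
        if not words:
            return 0.0, 0.0
        all_rate = 100 * sum(w in ALL_PRONOUNS for w in words) / len(words)
        we_rate = 100 * sum(w in WE_PRONOUNS for w in words) / len(words)
        return all_rate, we_rate

    group = ["We need clean water in our town.", "I urge you to support this bill."]
    print(pronoun_rates(group))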
For either analyzing groups of contacts or rapidly responding to a single contact online, the results also show that organizations can more readily ask for membership contributions from contacts who have sent increasing numbers of messages and whose word counts approach a threshold. The threshold for this study was 30 words, one word above the average; it may vary between organizations and campaigns. From within the text, to further identify potential members, advocacy organizations should look for messages written for higher reading levels (low reading ease scores) and for positive sentiment, self-references, references to family and friends, punctuation, and informal speech aside from swear words. Organizations and campaigns are unique, so campaign managers should pilot the relevant engagement factors discussed above on their own data in order to reveal other trends. They may begin by identifying and testing popular words. Message reviewers may use regular expressions or future machine-learning models to help them identify personal stories, but these methods should not replace timely, human review of messages. Text metrics are not perfect, and they can be misused.

Text Analysis for Policymakers, Service Providers, and Stakeholder Managers

In the same way that online advocacy organizations learn from electoral campaigns but should not mimic them (Karpf 2017), policymakers should conduct the analyses recommended above for advocacy organizations but adjust them to fit their situations. Given term limits in public offices, policymakers may not start their terms with large histories of constituent engagement. With smaller databases, they will not be able to run the pronoun use tests to anticipate future message frequencies. As policymakers build their CRM databases, they will also find themselves in the unique situation of receiving messages couriered by several advocacy organizations about a single policy or project. In this case, they will be able to use text metrics to spot and judge power differences between organizations. For advocacy organizations, Karpf (2018) emphasizes that data, in general, need to be delivered in ways that decision makers can interact with. This is equally true for policymakers. To support policymakers and advocacy organizations alike, online advocacy service providers (e.g., CWC) need to build text metrics into their reports.

Stakeholder management researchers Kahn et al. (2017) have developed psychological attributes that they recommend civil and environmental project managers look for in managing stakeholders: motivation and concern, expectation and perception, and attitude and behavior. These researchers share best practices for managing supportive, indifferent, and adversarial stakeholders (Kahn et al. 2019). In summarizing Petro-Canada's website, their research praises Petro-Canada's "highly-rated . . . 'win-win' policy" of "innovative and diverse strategy execution measures" for its "fair, ethical and professional approach in its dealings with its secondary stakeholders in all its projects and operations inside and outside Canada." They highlight an example, originally shared by Petro-Canada, of how this fossil fuel company put a local fishing community at ease during exploration of drilling sites offshore of the Caribbean islands of Trinidad and Tobago. The company conducted courses for the fishers on how they could learn new "survival techniques"
and safely continue using their equipment during the offshore exploration. At the courses, the company gave away reflectors and GPS devices. It also installed "fish aggregating devices" to keep fish away from its exploration. The researchers also show how Petro-Canada paid for First Nation social programs, like a daycare facility, before mining their land in Fort McMurray, Canada.

Future studies could investigate whether the psychological attributes described by these researchers can be identified through text analysis of constituent messages. If so, whether promoting environmentally sustainable technologies or not, policymakers could share findings with civil and environmental project managers; together they could judge the power of influence that stakeholders have over their projects, and researchers could test the management frameworks introduced by Kahn et al. For example, stakeholder managers and policymakers working with (or for) companies like Petro-Canada, which implement education programs, hazard mitigation infrastructure, and social services to ensure the safety and public acceptance of their projects, could benefit from analyzing advocacy messages. Opposition letters to offshore drilling and mining written with relatively high rates of pronouns, high writing grade levels, and high numbers of personal stories about "my" or "our" children could help these managers plan increases to their community engagement budgets. If messages come from more than one environmental advocacy organization, text analyses could further aid stakeholder managers in determining which groups have the highest numbers of dues-paying members and could best fund putting the personal stories they are collecting into community forums, legal testimony, and advertisements.

Future Work

7.3.1. Engagement Framework and Investigating Relationships of Messages and Time Use Profiles

Future work should consider past research to develop a social cognitive theory describing what a regular dues-paying member is, what a person with a lived experience is, and what a volunteer is: three roles that describe people whom environmental advocacy organizations seek to engage. To start, it could use an online implementation of Arnstein's "ladder of citizen participation" (1969) as an engagement dimension. Next, it could use Bureau of Labor Statistics (BLS) American Time Use Survey (2019) data to determine a dimension for volunteering. It should then see if relationships between the text analysis metrics found in this study and additional data, like more granular membership contribution data and constituent event participation histories, can help explain the roles defined by the theory. A multivariate model could help predict how well contacts fit into these three roles (a sketch of one such model follows below). Given BLS data, for example, occupations reported by contacts in messages could help rate a contact along dimensions for volunteering and giving without asking contacts questions directly; BLS reports that unemployed people volunteer twice as much (0.44 hrs/day) as employed people (0.21 hrs/day). Flesch scores, if tied to education and income, could help place a contact along a dimension for giving. This theory-based approach of making educated guesses about engagement predictors and then piloting them contrasts with machine-learning approaches and with the approach employed by Exploration Six in this study.
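The following is a minimal sketch of the kind of multivariate model such future work might pilot. The feature set (word count, Flesch score, maximum compound VADER score, pronoun rate), the toy data, and the scikit-learn dependency are illustrative assumptions, not this study's method.

    # Hypothetical multivariate membership model; features and data are placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row: [word count, Flesch score, max compound VADER, pronoun rate %]
    X = np.array([[31, 55.0, 0.6, 12.0],
                  [12, 80.0, -0.2, 0.0],
                  [45, 40.0, 0.8, 15.0],
                  [20, 90.0, 0.0, 5.0]])
    y = np.array([1, 0, 1, 0])  # 1 = ever paid membership dues

    model = LogisticRegression().fit(X, y)
    print(model.predict_proba(X)[:, 1])  # predicted membership probabilities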
Without any social or behavioral a priori observations or theory (or "bias," depending on how relevant observations and theories are to specific situations), Exploration Six exhaustively tests thousands of the most popular words in this study in an attempt to find words that contacts use, at minimum frequencies, that are indicative of high and low membership rates. Model development should use both theoretical and exploratory approaches. Machine learning and unbiased exploration might confirm theories, or they may inspire additional theories and tests. Exploration Six and Exploration Seven in this study, for example, confirmed the relationships between negative words and membership studied in Exploration Three, exposed the relationships of nonfluencies and punctuation with membership, and inspired the review of all informal word dimensions categorized by LIWC.

7.3.2. Doing What You Love or Marginalizing "Lost Voices"

In the development of any engagement model, as described above, advocacy organizations should be wary of focusing on any one measure of engagement at the expense of others. During the November 2019 Virginia state elections, non-partisan Get Out The Vote (GOTV) canvassers working with Virginians Organized for Interfaith Community Engagement (VOICE) were rewarded with more smiles, more residents willing to take publicity photos, fewer slammed doors, and fewer guard dogs when canvassing in more affluent neighborhoods where more people were excited to vote or could be encouraged to register (pers. exper. 2019). Nall (2018) explores these behaviors, investigating situations where on-foot canvassers stay in the neighborhoods where they receive positive responses and where they have smaller social distances from residents (e.g., language). In response to these perceived successes, canvassers could inadvertently marginalize the people who are directly affected by the issues that their organizations are addressing. Canvassers may miss testimony, miss opportunities to ally with people directly affected by issues, and miss opportunities to expand their campaigns with new leaders. VOICE mitigated these pitfalls by pairing canvassers from different backgrounds and targeting districts with low voter turnout. Future work could investigate whether similar problems are present in online advocacy campaigns. Nall states that "online mobilization presents one challenge to our way of describing the canvass." Results from this study show higher rates of membership contributions from contacts who write at higher grade levels with positive sentiments. Future work should investigate whether targeting these contacts, in particular, hides testimony from potential future campaign leaders personally affected by campaign issues.

7.3.3. Improving Online Advocacy Services

An early proposal for this dissertation described testing ways to improve online advocacy services instead of proposing to study the constituent messages passed through them. It focused on testing methods to lower the transaction costs for constituents to take action online and to keep them engaged. The problems identified in the original proposal have not disappeared: citizens need effective ways to regularly engage in the policy decisions that impact them, whether those decisions shape civil and environmental projects or other projects.
Research shows that both social media and online advocacy software services, public and private, have simplified and increased access to policymakers in the last two decades, but the efficacy, sustainability, and timeliness of the interactions they provoke need improvement (Bimber 2001, Boulianne 2009, Karpf 2010, Kenski 2010, Bakker et al. 2011, Kim et al. 2017). Even with the new ease of access to policymakers that online tools give citizens, it is hard for citizens to stay informed on multiple issues and strategically time their actions. Adserà et al. (2003) and Castells (2007) show that many citizens are disenchanted with this process and feel powerless against corrupt governments. Policymakers in corporations and in government have access to advisory boards and cabinets to research different issues and propose issue-specific solutions at key times. Average citizens do not have these teams. They, by default, only have their elected representatives. Without time for their own research and without their own issue-matter experts to advise them, many citizens become disengaged from the policies that affect them and do not follow up with their representatives, trust them (Castells 2007), vote (File 2015, U.S. Elections Project 2016, Pew 2017, U.S. Census Bureau 2018), or even know who their lawmakers are. Lawmakers, in turn, are left out of touch with their constituents' positions and rely on their own heuristics (accurate and representative or not), research (peer reviewed or not), advisors (at least they can have them, official or not), and biases (Broockman and Ryan 2016, Broockman and Skovron 2017, Butler and Broockman 2011, Haynes et al. 2011, 2012, Tversky and Kahneman 1973, 1974, Kahneman 2011). They have also long been susceptible to special interest lobbying and campaign contributions (Snyder 1990, Claessens et al. 2008). Further, spikes of communication on popularized issues leave policy offices unprepared to summarize and respond to public comments and questions. Citizens, similarly, become fatigued with the effort and timeliness necessary to respond to proposed policy revisions.

Researchers and policy campaign managers from public, academic, community, and nonprofit organizations strive to limit this disengagement. They know that political participation and perceptions of democracy reinforce each other (Oni et al. 2017) and that, along with money, continuous and timely contact can persuade policymakers (Miler 2014), even with template-driven letters and petitions as part of a larger lobbying plan (Karpf 2010). Campaign managers, in particular, rely on software services to educate and enlist citizens to engage policymakers, often elected, on the issues that affect those citizens. They are always looking for ways to provoke timely and sustained action, and improvements to the status quo in advocacy services could directly benefit them. While this dissertation, a study of the relationships between constituent messages and organizational engagement, does not directly address these problems, its findings may support the development of new services that do. This effort may continue as a follow-up to the results of this dissertation.

APPENDIX A. STATES AND TERRITORIES

The 2,199,624 messages in this study were sent from campaigns that targeted environmental advocacy issues in either (a) all U.S. states and territories, (b) no state or territory, or (c) an individual state or territory.
The following is a list of each of these location targets and, in parentheses, the number of messages generated from each target's associated campaigns, sorted from the greatest number of messages to the least. Note: no messages came from campaigns targeting Kansas or South Dakota, and only one message came from a campaign targeting North Dakota.

All (1,193,389), None (877,493), MN (21,948), CA (10,006), VA (9,626), PA (8,593), OH (8,445), WA (8,411), NC (7,203), CO (6,259), NY (5,483), MI (3,965), AZ (3,526), FL (3,511), MD (3,433), IL (2,859), OR (2,851), TN (2,470), MA (2,235), UT (1,815), IN (1,712), WY (1,709), WV (1,554), NM (1,490), TX (1,312), MO (898), OK (801), WI (777), NV (770), KY (729), GA (596), LA (472), CT (450), ID (386), PR (367), MS (348), AL (299), DE (219), DC (205), NH (188), MT (187), NE (165), VT (165), ME (87), SC (70), IA (66), NJ (62), HI (11), RI (7), ND (1)

APPENDIX B. PERSONAL STORY QUERIES

Exploration Three explains how attempts to find lived experiences, as defined by Sandhu (2017), in messages began with text searches for "as a," "i am a," "i live," "my family," "my husband," "my wife," and "my children." This appendix lists these searches. All searches are case insensitive. For background, please see the MySQL 8.0 reference manual, especially the documentation on searches and regular expressions: https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regex

B.1. Simple MySQL Searches for Personal Stories

Basic MySQL searches that identify terms anywhere in a message take the form of:

SELECT * FROM table WHERE message LIKE '%Term%'

where the following words replace "Term":

1. As a
2. I am a
3. We are
4. We are a
5. I live
6. I live in
7. We live
8. We live in
9. We call home
10. My family
11. My husband
12. My wife
13. My child

Some of these searches can return unintended results when looking for personal stories. For example, the first search, for "as a," can return a message containing the words "has already." Removing the first percentage sign around the term in the "as a" query helps. In this case, the modified search looks for the term at the beginning of the message. For example:

SELECT * FROM table WHERE message LIKE 'Term%'

The modified searches for terms at the beginning of messages eliminate unintended results like the "has already" result for the "as a" query. They also, however, miss terms that begin sentences and phrases in the middle of messages. While this study still uses and reports results from these modified queries, to find terms that begin sentences and phrases in the middle of messages, this study uses regular expressions. Personal Story Reference Table 1, at the end of this appendix, includes a complete list of these basic MySQL search conditions.

B.2. Regular Expression Searches for Personal Stories

Simple searches returned some unintended results, like the "has already" result. The following regular expressions, which find the simple search terms at the start of messages, sentences, and phrases, eliminate problems like these.

1. (([:punct:][:space:](As a))|(^As a))[:space:]
2. (([:punct:][:space:](I am a))|(^I am a))[:space:]
3. (([:punct:][:space:](We are))|(^We are))[:space:]
4. (([:punct:][:space:](We are a))|(^We are a))[:space:]
5. (([:punct:][:space:](I live))|(^I live))[:space:]
6. (([:punct:][:space:](I live in))|(^I live in))[:space:]
7. (([:punct:][:space:](We live))|(^We live))[:space:]
8. (([:punct:][:space:](We live in))|(^We live in))[:space:]
9. (([:punct:][:space:](We call home))|(^We call home))[:space:]
10. (([:punct:][:space:](My family))|(^My family))[:space:]
11. (([:punct:][:space:](Our family))|(^Our family))[:space:]
12. (([:punct:][:space:](My Child))|(^My Child))[:space:]
13. (([:punct:][:space:](Our Child))|(^Our Child))[:space:]
14. (([:punct:][:space:](My husband))|(^My husband))[:space:]
15. (([:punct:][:space:](My wife))|(^My wife))[:space:]

Personal Story Reference Table 1, at the end of this appendix, includes a complete list of these basic MySQL search conditions. As an example of a complete MySQL search using one of the patterns above, the search for "I am a" at the beginning of a sentence or phrase looks like this:

SELECT * FROM table WHERE message REGEXP '(([:punct:][:space:](I am a))|(^I am a))([:space:])'

B.3. Self-Identification with Nouns

The searches for "as a" and "I am a" return messages written by contacts who label themselves with specific terms. They identify themselves as belonging to groups such as gender categories, family roles (e.g., "father"), organizations ("member"), occupation categories (e.g., "carpenter"), and contacts living in specific locations (e.g., "Marylander"). The following regular expression expands the "I am" search to include variations such as "I'm a," "I have been a," and "I will be the."

REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) [a-z]+'

Suffixes to this pattern narrow results to specific labels that contacts call themselves and also account for a single, optional label modifier ([a-z]+ |). For example, the following regular expressions identify self-descriptions of male and female family roles:

REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(male|boy|man|guy|husband|father|dad|papa|grandpa|grandfather|granddad|son|brother|uncle)([:alpha:]|[:space:])'

REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(female|girl|lady|wife|mother|mom|mama|momma|grandma|grandmother|grandmom|daughter|sister|aunt)([:alpha:]|[:space:])'

Personal Story Reference Table 2, at the end of this appendix, includes a complete list of patterns that identify self-descriptions of family role, gender, some occupations (e.g., "doctor," "carpenter"), and places of living (e.g., "Marylander").

B.4. Activity Self-Identification with Verbs

Self-identification can also be found in verbs. While the searches above expect noun objects to follow them, past-, present-, and future-tense verbs can also identify specific tasks and occupations. This study uses the following expression to search for a generic verb action:

REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been)( go to| going| going to|))'

Notice the lack of a pipe ("|") after the words "have been" in this generic verb action expression, a pipe that the more specific queries do use ("have been|"). The pipe makes the verb modifiers (e.g., " will") optional. Without a specific verb in this generic query, the modifiers are necessary. A more sophisticated program could identify verbs with a dictionary to improve this generic query; it would identify any verb that follows the word "I." Personal Story Reference Table 3, at the end of this appendix, includes a complete list of patterns that identify, with verbs, more specific content related to self-identification, job identification, outdoor activities, suffering, pain, and experience. For example, the following expressions were used to search for people who camp and hike:

REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) camp'

REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (hike|hiking)'
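For prototyping outside MySQL, these POSIX-style patterns can be translated into Python's regex syntax, which lacks the [:punct:] and [:space:] classes. The following is a minimal, hypothetical sketch using the "I am a" phrase pattern; \s stands in for [:space:] and [^\w\s] approximates [:punct:].

    import re

    # Translate "(([:punct:][:space:](I am a))|(^I am a))[:space:]" into Python syntax.
    phrase_starts_i_am_a = re.compile(r"(([^\w\s]\s(I am a))|(^I am a))\s",
                                      re.IGNORECASE)  # MySQL searches were case insensitive

    samples = [
        "I am a nurse and I see the costs of dirty air.",  # matches at message start
        "Please act now. I am a father of two.",           # matches mid-message
        "Alabama is a beautiful state.",                   # no match
    ]
    for message in samples:
        print(bool(phrase_starts_i_am_a.search(message)), "-", message)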
B.5. Swear Words

This study looked for three swear words at the beginning of messages and anywhere in messages, and compared the membership rates of contacts who used those words to those of contacts using any of the LIWC swear words, with the following MySQL query parts (words censored with "**"):

`Message` LIKE 'f**k%'
`Message` LIKE '%f**k%'
`Message` LIKE 'd**n%'
`Message` LIKE '%d**n%'
`Message` LIKE 's**t%'
`Message` LIKE '%s**t%'
`swear` > 0

The patterns are listed in Personal Story Reference Table 4.

B.6. Finding Members With Matching Messages

The following MySQL query, defined in Python, describes how this study counted contacts whose messages match the searches and expressions described above, held in the variable search_condition (a runnable sketch of this query appears after the list below):

command = """
    SELECT COUNT(*) AS 'Contacts'
    FROM (
        SELECT DISTINCT CID FROM messages WHERE """ + search_condition + """
    ) AS a
    LEFT JOIN contacts b ON a.CID = b.CID
    WHERE b.`ever member` = """ + str(membership) + """;
"""

where "messages" is a table of personal messages, "CID" is a unique contact ID, "contacts" is a table of contacts, and "ever member" is a field that contains either one or zero, indicating whether a contact has ever been a member within a year of one of their messages in the study period. This query runs inside the loop

for membership in [0, 1]:

for the calculation of membership rates for conditions and alternative conditions.

B.7. Personal Story Search Reference Tables

The following tables provide a reference of all of the MySQL LIKE and REGEXP condition patterns that Exploration Three uses to search for personal stories.

1. Basic MySQL searches for personal stories
2. First-person singular self-identification with nouns
3. First-person singular self-identification with verbs
4. Swear words
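The following is a runnable sketch of the B.6 query, assuming the mysql-connector-python package and placeholder connection credentials; the table and column names (messages, contacts, CID, `ever member`) follow the schema described above.

    # Sketch of the B.6 membership-rate calculation; credentials are placeholders.
    import mysql.connector

    def membership_rate(cursor, search_condition):
        """Share of contacts matching search_condition who have ever been members.

        search_condition is a trusted, analyst-written SQL fragment, not user input.
        """
        counts = {}
        for membership in [0, 1]:
            cursor.execute("""
                SELECT COUNT(*) AS 'Contacts'
                FROM (SELECT DISTINCT CID FROM messages
                      WHERE """ + search_condition + """) AS a
                LEFT JOIN contacts b ON a.CID = b.CID
                WHERE b.`ever member` = """ + str(membership) + """;
            """)
            counts[membership] = cursor.fetchone()[0]
        matched = counts[0] + counts[1]
        return counts[1] / matched if matched else 0.0

    connection = mysql.connector.connect(host="localhost", user="analyst",
                                         password="...", database="advocacy")
    print(membership_rate(connection.cursor(), "Message LIKE '%I am a%'"))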
B.8. Personal Story Reference Table 1. Basic MySQL Searches for Personal Stories (LIKE and REGEXP)

Note: The word "basic" in the title of this table refers to basic words and phrases developed from those that one nonprofit advocacy organization uses to manually search, ad hoc, for personal stories in advocacy messages. Each entry below gives the condition and term, then the MySQL pattern.

Message contains "As a": LIKE '%As a%'
Message starts with "As a": LIKE 'As a%'
Phrase starts with "As a": REGEXP '(([:punct:][:space:](As a))|(^As a))[:space:]'
Message contains "I am a": LIKE '%I am a%'
Message starts with "I am a": LIKE 'I am a%'
Phrase starts with "I am a": REGEXP '(([:punct:][:space:](I am a))|(^I am a))[:space:]'
Message contains "We are": LIKE '%We are%'
Message starts with "We are": LIKE 'We are%'
Phrase starts with "We are": REGEXP '(([:punct:][:space:](We are))|(^We are))[:space:]'
Message contains "We are a": LIKE '%We are a%'
Message starts with "We are a": LIKE 'We are a%'
Phrase starts with "We are a": REGEXP '(([:punct:][:space:](We are a))|(^We are a))[:space:]'
Message contains "I live": LIKE '%I live%'
Message starts with "I live": LIKE 'I live%'
Phrase starts with "I live": REGEXP '(([:punct:][:space:](I live))|(^I live))[:space:]'
Message contains "I live in": LIKE '%I live in%'
Message starts with "I live in": LIKE 'I live in%'
Phrase starts with "I live in": REGEXP '(([:punct:][:space:](I live in))|(^I live in))[:space:]'
Message contains "We live": LIKE '%We live%'
Message starts with "We live": LIKE 'We live%'
Phrase starts with "We live": REGEXP '(([:punct:][:space:](We live))|(^We live))[:space:]'
Message contains "We live in": LIKE '%We live in%'
Message starts with "We live in": LIKE 'We live in%'
Phrase starts with "We live in": REGEXP '(([:punct:][:space:](We live in))|(^We live in))[:space:]'
Message contains "We call home": LIKE '%We call home%'
Message starts with "We call home": LIKE 'We call home%'
Phrase starts with "We call home": REGEXP '(([:punct:][:space:](We call home))|(^We call home))[:space:]'
Message contains "My family": LIKE '%My family%'
Message starts with "My family": LIKE 'My family%'
Phrase starts with "My family": REGEXP '(([:punct:][:space:](My family))|(^My family))[:space:]'
Message contains "Our family": LIKE '%Our family%'
Message starts with "Our family": LIKE 'Our family%'
Phrase starts with "Our family": REGEXP '(([:punct:][:space:](Our family))|(^Our family))[:space:]'
Message contains "My child" or "my children": LIKE '%My child%'
Message starts with "My child" or "my children": LIKE 'My child%'
Phrase starts with "My child" or "my children": REGEXP '(([:punct:][:space:](My child))|(^My child))(ren|)([:punct:]|[:space:])'
Message contains "Our child" or "our children": LIKE '%Our child%'
Message starts with "Our child" or "our children": LIKE 'Our child%'
Phrase starts with "Our child" or "our children": REGEXP '(([:punct:][:space:](Our child))|(^Our child))(ren|)([:punct:]|[:space:])'
Message contains "My husband": LIKE '%My husband%'
Message starts with "My husband": LIKE 'My husband%'
Phrase starts with "My husband": REGEXP '(([:punct:][:space:](My husband))|(^My husband))[:space:]'
Message contains "My wife": LIKE '%My wife%'
Message starts with "My wife": LIKE 'My wife%'
Phrase starts with "My wife": REGEXP '(([:punct:][:space:](My wife))|(^My wife))[:space:]'

B.9. Personal Story Reference Table 2. First-Person Singular Self-Identification with Nouns

Male: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(male|boy|man|guy|husband|father|dad|papa|grandpa|grandfather|granddad|son|brother|uncle)([:alpha:]|[:space:])'
Female: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(female|girl|lady|wife|mother|mom|mama|momma|grandma|grandmother|grandmom|daughter|sister|aunt)([:alpha:]|[:space:])'
Doctors, nurses, and words ending in "ist": REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)([a-z]+ist|doctor|nurse)'
Words ending in "ist": REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)([a-z]+ist)'
Words ending in "tor": REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)([a-z]+tor)'
Words ending in "or": REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)([a-z]+or)'
Words ending in "er": REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)([a-z]+er)'
Doctors and nurses: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(doctor|nurse)'
Lawyers and judges: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(lawyer|judge)' (Note: additional terms like "attorney" could expand this search)
Engineer: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)engineer'
Husband or wife: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(husband|wife)'
Mother or father: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(mother|father|mom|dad|mama|papa)'
Grandmother or grandfather: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(grandma|grandmother|grandpa|grandfather)' (Note: the word "grandparent"
could expand this search)
Child: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(son|daughter|child|kid)'
Sister or brother: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(sister|brother)'
Uncle or aunt: REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(uncle|aunt)'
Educator (college, student, phd, master's, master of, doctor of, graduate, professor, ta, teacher, high school, elementary school, preschool, pre-school, higher education, research): REGEXP '(I am|I\'m|I was|I have been|I will be) (a|an|the) ([a-z]+ |)(college|student|phd|master\'s|master of|doctor of|graduate|professor|ta|teacher|highschool|elementary school|preschool|pre-school|higher education|research)'

B.10. Personal Story Reference Table 3. First-Person Singular Self-Identification with Verbs

Generic first-person singular actions: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been)( go to| going| going to|))'

Self/Job Identification

Marry: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (married|marry)'
Teach: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (teach|taught)'
Vote: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (vote|voting)'
Work: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) work'
Live: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (live|living)'
Program: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) program'
Analyze: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) analyz'
Volunteer: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) volunteer'
Join: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) join'
Protect, guard, save, fight: REGEXP '(I( went| went to| am|\'m| will| will be| was| once was| used to|\'m used to| have| have been|)( go to| going| going to|)) (protect|guard|save|saving|fight|fought)'
Spend: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) spend'

Outdoor Activities

Camp: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) camp'
Hike: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (hike|hiking)'
Trek: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) trek'
Climb: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) climb'
Ski: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) ski'
Hunt, fish: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (hunt|fish)'
Bike, cycle: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (bike|biking|cycl)'
Hike, walk: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (hike|hiking|walk)'
Swim: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (swim|swam)'
Ride: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (ride|riding|rode)'
Suffering, Pain, and Experience

Suffer: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) suffer'
Deprive: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) depriv'
Die: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (die|dying)'
Hurt: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) hurt'
Curse: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) curs'
Break: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (broke|break)'
Lose: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (lost|lose)'
Endure: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) endur'
Bleed: REGEXP '(I( went| went to| am|\'m| will| will be| was| have| have been|)( go to| going| going to|)) (bleed|bled)'
Go through: REGEXP 'I went through|I go through|I\'m going through|I will go through'

B.11. Personal Story Reference Table 4. Swear Words

Words are censored in this table with asterisks. Each entry gives the condition and swear word, then the MySQL pattern.

Message contains "F**k": `Message` LIKE '%F**k%'
Message starts with "F**k": `Message` LIKE 'F**k%'
Message contains "D**n": `Message` LIKE '%D**n%'
Message starts with "D**n": `Message` LIKE 'D**n%'
Message contains "S**t": `Message` LIKE '%S**t%'
Message starts with "S**t": `Message` LIKE 'S**t%'
Message contains any swear word in the LIWC swear dictionary dimension: `swear` > 0

APPENDIX C. VALIDATION OF VADER FOR ENVIRONMENTAL ADVOCACY MESSAGES SENT TO POLICYMAKERS

C.1. Validation Summary and Introduction to Precision, Recall, and F-Score Measures

The VADER analysis for rating the sentiment of environmental advocacy messages addressed to policymakers was validated by comparing VADER ratings to corresponding human ratings of 400 randomly selected personal messages from the 491,027 in the database. Validation of VADER begins with a single human reviewer. It employs the same 9-point Likert scale that Hutto and Gilbert (2014) use in their validation of VADER for social media: extremely negative, very negative, moderately negative, slightly negative, neutral, slightly positive, moderately positive, very positive, and extremely positive. It also asks reviewers to rate messages as they believe others would rate them, an instruction that reduced variation between reviewer scores for Hutto and Gilbert. While Hutto and Gilbert crowd-sourced reviewers and screened them with an English language test, this study selected an English-speaking reviewer with a college degree.

VADER identifies messages as negative, neutral, or positive. It identifies messages in these categories with a 56% match rate with the human reviewer, where a match rate is the percentage of messages that VADER and the human reviewer rate the same. Precision, recall, and F1 scores explain the ability of a classification model to correctly identify truth (in this case, as judged by a human reviewer) in more detail than an overall match rate. For reference, precision is the number of correct classifications that a machine makes in a category divided by all the classifications that the machine makes in that category.
Recall is the number of correct classifications that the machine makes in a category divided by all items in the category, whether classified by the machine or not (Kent et al. 1955). If the primary goal of an application is to correctly classify a small number of items and avoid incorrect classifications, a high degree of precision is more desirable than a high degree of recall. If the primary goal of an application is to correctly classify as many items as possible, and incorrectly classifying items is not important, a high degree of recall is more important than a high degree of precision. The F1 score is the harmonic mean of recall and precision: F1 = 2/(1/Recall + 1/Precision). The F1 score weights recall and precision equally, irrespective of the importance of one over the other. Precision, recall, and F1 scores are measures typically used to validate machine models. Hutto and Gilbert use them during the development of VADER (2014), and Ding uses them in assessing the effectiveness of customized sentiment analyzers (2018).

In this validation, VADER identifies messages with a moderate 0.56 negative sentiment F1 score, a low 0.13 neutral sentiment F1 score, and a moderately high 0.66 positive sentiment F1 score. It finds negative messages with a high precision of 0.71 but a moderately low recall of 0.47. It finds positive messages with a moderate precision of 0.57 and a moderately high recall of 0.79. It finds neutral messages with low precision and recall rates of 0.13 and 0.14. While VADER poorly identifies neutral messages, the human reviewer rated only 11% of messages as neutral. They rated 49% of messages negative and 41% positive.

C.2. Validation with a Single Human Reviewer

Validation of VADER with a single human reviewer begins by assessing the accuracy of VADER, comparing VADER sentiment ratings and human sentiment ratings in a contingency table (Table 1) for the sample of 400 random messages described above. The table directly reports the human reviewer's responses to the Likert scale as human ratings. For VADER ratings, the table reports a classification of VADER compound sentiment scores (-1 to 1) into negative, neutral, and positive categories as recommended by Hutto and Gilbert (2014) and described in Chapter 2. VADER compound scores less than or equal to -0.05 indicate negative sentiment, VADER compound scores greater than or equal to 0.05 indicate positive sentiment, and other VADER compound scores indicate neutral sentiment.

The match rate for each VADER category (negative, neutral, positive) is equal to the number of VADER ratings in a category that match human ratings, divided by the total number of VADER ratings in that category. For example, the match rate for negative VADER ratings is equal to the count of all negative VADER ratings that match human ratings in the four negative Likert scale categories (extremely negative, very negative, moderately negative, and slightly negative) divided by the total number of negative VADER ratings: (18 + 27 + 23 + 23)/129 = 0.71. This negative VADER match rate shows that 71% of the negative ratings that VADER makes also match negative human ratings.
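The following is a short sketch reproducing the precision, recall, and F1 calculations reported in Tables 1 through 3 below, using the counts from Table 2 (rows are human ratings; columns are VADER ratings, ordered negative, neutral, positive).

    # Reproduce the match rate (precision), recall, and F1 scores from Table 2.
    confusion = [
        [91, 33, 70],   # human negative
        [11,  6, 26],   # human neutral
        [27,  8, 128],  # human positive
    ]
    labels = ["negative", "neutral", "positive"]
    for i, label in enumerate(labels):
        col_total = sum(row[i] for row in confusion)  # all VADER ratings in category
        row_total = sum(confusion[i])                 # all human ratings in category
        precision = confusion[i][i] / col_total
        recall = confusion[i][i] / row_total
        f1 = 2 / (1 / recall + 1 / precision)         # harmonic mean
        print(f"{label}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")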
The 0.71 negative match rate is high compared to the 0.57 positive VADER match rate, and very high compared to the 0.13 neutral VADER match rate.4 These match rates are measures of VADER precision. While VADER matches negative ratings more precisely than positive ratings, the human match rates shown in the last column of Table 1 indicate that VADER identifies positive human-rated messages at a higher rate than it identifies negative human-rated messages. In other words, given just two messages identified by VADER, one negative and one positive, because VADER is more precise in identifying negative messages, the negative message is more likely to be rated negative by the human reviewer than the positive message is likely to be rated positive. Alternatively, given all 400 VADER ratings, VADER identifies more of the positive human-rated messages than it identifies negative human-rated messages. It does so, however, with a greater likelihood of producing false positive-sentiment ratings than false negative-sentiment ratings (versus the human reviewer).

Footnote 4: VADER neutral sentiment ratings match the human reviewer's neutral sentiment ratings at low rates when messages are categorized as neutral within the recommended neutral range of compound VADER scores, between -0.05 and 0.05 (Hutto and Gilbert 2014). Increasing this neutral range increases the neutral match rate. The neutral match rate similarly increases if messages rated by the human reviewer as "slightly positive" and "slightly negative" are considered neutral ratings. Neutral sentiment match rates, because they are categorized within relatively narrow boundaries, are also more susceptible to positive or negative bias by either VADER or the human reviewer than negative and positive sentiment match rates are. For example, as shown in Table 4, the human reviewer rated 19 messages as slightly negative that VADER rated as neutral.

Table 1. VADER and Human Sentiment Ratings for 400 Advocacy Messages
(columns: VADER Negative, VADER Neutral, VADER Positive, Total, Human Match Rate)

Extremely Negative: 18, 1, 5; Total 24; Match 0.75
Very Negative: 27, 9, 7; Total 43; Match 0.63
Moderately Negative: 23, 8, 22; Total 53; Match 0.43
Slightly Negative: 23, 15, 36; Total 74; Match 0.31
Neutral: 11, 6, 26; Total 43; Match 0.14
Slightly Positive: 14, 4, 37; Total 55; Match 0.67
Moderately Positive: 4, 2, 34; Total 40; Match 0.85
Very Positive: 4, 1, 29; Total 34; Match 0.85
Extremely Positive: 5, 1, 28; Total 34; Match 0.82
Total: 129, 47, 224; 400
VADER Match Rate (Precision): 0.71, 0.13, 0.57

Table 2 lumps the scores shown in Table 1 into a three-by-three confusion matrix, categorizing all positive human ratings as positive, all negative human ratings as negative, and the neutral ratings as neutral, in the same way that the VADER match rates are calculated in Table 1. For example, there are 91 messages that VADER and the human reviewer both rated negative (18 + 27 + 23 + 23). The last column of Table 2 contains VADER recall rates. These rates confirm the observations from Table 1 that VADER identifies more positive human-rated messages than negative human-rated messages, but with relatively greater false positive (type one) errors.

Table 2. VADER and Human Sentiment Rating Confusion Matrix for 400 Advocacy Messages
(columns: VADER Negative, VADER Neutral, VADER Positive, Total, Recall)

Human Negative: 91, 33, 70; Total 194; Recall 0.47
Human Neutral: 11, 6, 26; Total 43; Recall 0.14
Human Positive: 27, 8, 128; Total 163; Recall 0.79
Total: 129, 47, 224; 400
Precision: 0.71, 0.13, 0.57

Table 3 summarizes the precision, recall, and F1 scores for negative, neutral, and positive VADER ratings compared to the human ratings.
Table 3. Precision and Recall for VADER Sentiment Ratings

Negative: Precision 0.71, Recall 0.47, F1 0.56
Neutral: Precision 0.13, Recall 0.14, F1 0.13
Positive: Precision 0.57, Recall 0.79, F1 0.66

The overall match rate with an individual human reviewer is 56%.

While Hutto and Gilbert (2014) recommend categorizing sentences into three ordinal categories with VADER compound score thresholds at -0.05 and 0.05, as calculated above, and while Likert scale questions are also ordinal, Table 4 reveals that a degree of matching also exists in a confusion matrix with nine equally spaced bins for VADER ratings, subjectively associated with the nine human rating categories.

Table 4. Precision and Recall for Subjective VADER Sentiment Ratings (Single Reviewer)
(rows: human ratings -4 to 4; columns: subjective VADER ratings -4 to 4, then Total and Recall)

-4: 4, 7, 4, 2, 2, 1, 3, 1, 0; Total 24; Recall 0.17
-3: 3, 10, 6, 6, 11, 2, 1, 0, 4; Total 43; Recall 0.23
-2: 4, 6, 6, 3, 13, 2, 7, 6, 6; Total 53; Recall 0.11
-1: 4, 8, 7, 3, 19, 5, 11, 10, 7; Total 74; Recall 0.04
0: 1, 2, 6, 2, 8, 5, 7, 8, 4; Total 43; Recall 0.19
1: 1, 6, 2, 5, 5, 6, 10, 14, 6; Total 55; Recall 0.11
2: 1, 0, 1, 1, 4, 4, 8, 11, 10; Total 40; Recall 0.20
3: 1, 1, 1, 1, 1, 1, 9, 11, 8; Total 34; Recall 0.32
4: 0, 1, 3, 0, 2, 4, 5, 6, 13; Total 34; Recall 0.38
Total: 19, 41, 36, 23, 65, 30, 61, 67, 58; 400
Precision: 0.21, 0.24, 0.17, 0.13, 0.12, 0.20, 0.13, 0.16, 0.22

The subjective VADER scores in Table 4 (-4 to 4) are determined by the function:

IF(VADER >= 0.7777, 4,
 IF(VADER >= 0.5555, 3,
  IF(VADER >= 0.3333, 2,
   IF(VADER >= 0.1111, 1,
    IF(VADER >= -0.1111, 0,
     IF(VADER > -0.3333, -1,
      IF(VADER > -0.5555, -2,
       IF(VADER > -0.7777, -3, -4))))))))

C.3. Validation with Multiple Human Reviewers

Table 5 shows match rates between VADER and six individual reviewers, x1 through x6, rating the same 400 messages and using the same Likert scale survey described for the single reviewer (x4) above. It also shows the match rate between VADER and the six reviewers' average ratings rounded to the nearest integer (57%).

Table 5. VADER Sentiment Match Rates by Reviewer

x1: 45%, x2: 56%, x3: 51%, x4: 56%, x5: 58%, x6: 55%, round(avg(x)): 57%

The round(avg(x)) variable is the list of average reviewer sentiment scores, from -4 to 4, rounded to the nearest integer. While VADER ratings match the average group ratings at slightly higher rates than the ratings of most individual reviewers, reviewer scores should only be lumped together if their ratings are consistent with one another. This study uses Cronbach's alpha and factor analysis to check whether reviewer scores are consistent with each other. Assuming an integer ratio scale for human reviewers from -4 to 4, corresponding to extremely negative through extremely positive ratings, as assumed in Table 4 for compound VADER scores, a Cronbach's alpha of 0.90 for the six human reviewers indicates that the reviewers are fairly consistent in their ratings, and it is not unreasonable to take their average rating, rounded to the nearest integer, as a better measure of human judgement than one reviewer alone. In the calculation of Cronbach's alpha, the number of reviewers, k, equals six; the sum of the variances of each reviewer's scores equals 24.83; and the variance of the sums of the scores for each message equals 100.14. The sum of the reviewer variances is low compared to the variance of the summed scores. Cronbach's alpha equals (6/(6-1)) x (1 - 24.83/100.14) = 0.90. Factor analysis, furthermore, shows most of the variables have similar factor loadings (x1=0.68, x2=0.87, x3=0.87, x4=0.84, x5=0.88, x6=0.81). Table 6 compares the precision and recall rates from Table 3, for a single reviewer, to those of the group of reviewers.
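The following is a minimal sketch of the Cronbach's alpha calculation above, assuming a (400 messages x 6 reviewers) array of integer ratings from -4 to 4; the random placeholder data will not reproduce the study's 0.90, which came from the real reviewer scores.

    import numpy as np

    def cronbach_alpha(scores):
        """scores: (n_messages, k_reviewers) array of ratings."""
        k = scores.shape[1]                         # number of reviewers
        item_vars = scores.var(axis=0, ddof=1)      # variance of each reviewer's ratings
        total_var = scores.sum(axis=1).var(ddof=1)  # variance of per-message rating sums
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Placeholder data; with the study's values (sum of reviewer variances 24.83,
    # total variance 100.14, k = 6), alpha = (6/5) * (1 - 24.83/100.14) = 0.90.
    scores = np.random.randint(-4, 5, size=(400, 6))
    print(cronbach_alpha(scores))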
Values are similar, and the overall match rate increases to 57%. Finally, compared to Table 4, for a single reviewer, Table 7 shows precision and recall rates for the lumped group scores.

Table 6. Precision and Recall for VADER Sentiment Ratings Against an Individual Reviewer and Against a Group of Reviewers

Individual Negative: Precision 0.71, Recall 0.47, F1 0.56
Group Negative: Precision 0.65, Recall 0.53, F1 0.58
Individual Neutral: Precision 0.13, Recall 0.14, F1 0.13
Group Neutral: Precision 0.17, Recall 0.12, F1 0.14
Individual Positive: Precision 0.57, Recall 0.79, F1 0.66
Group Positive: Precision 0.61, Recall 0.79, F1 0.69

The overall match rate with an individual human reviewer is 56%. The match rate with the group of six human reviewers is 57%.

Table 7. Precision and Recall for Group Subjective VADER Sentiment Ratings
(rows: group human ratings -4 to 4; columns: subjective VADER ratings -4 to 4, then Total and Recall)

-4: 2, 1, 4, 2, 0, 0, 1, 1, 0; Total 11; Recall 0.18
-3: 4, 11, 4, 1, 6, 0, 3, 0, 2; Total 31; Recall 0.35
-2: 4, 13, 4, 6, 12, 2, 2, 4, 6; Total 53; Recall 0.08
-1: 4, 5, 8, 4, 16, 8, 9, 6, 4; Total 64; Recall 0.06
0: 4, 5, 9, 3, 11, 3, 14, 11, 7; Total 67; Recall 0.16
1: 1, 6, 5, 6, 16, 14, 17, 24, 14; Total 103; Recall 0.14
2: 0, 0, 1, 1, 3, 2, 11, 16, 11; Total 45; Recall 0.24
3: 0, 0, 1, 0, 1, 1, 4, 5, 13; Total 25; Recall 0.20
4: 0, 0, 0, 0, 0, 0, 0, 0, 1; Total 1; Recall 1.00
Total: 19, 41, 36, 23, 65, 30, 61, 67, 58; 400
Precision: 0.11, 0.27, 0.11, 0.17, 0.17, 0.47, 0.18, 0.07, 0.02

C.4. Validation Conclusion and Recommendation

In conclusion, the human validation of VADER shows that Section 5.5 of this study reasonably reports that relationships between membership rates and VADER scores are descriptive of relationships between membership rates and sentiment. Although neutral sentiment match rates between humans and VADER are low in this validation, neutral match rates increase with wider neutral ranges, as shown in Table 4 and Table 7.

In comparison to the sentiment language classifiers reviewed and customized by Ding (2018), validated for Twitter messages about public infrastructure projects, VADER performs well for this study. Ding reports a 20% accuracy rate for the Aylien Text API classifier (Aylien 2019), a 50% accuracy rate for the SentiStrength classifier (Thelwall et al. 2012), and a 68% accuracy rate for a customized classifier based on a sentiment lexicon developed by Hu et al. (2014) and Ding's study data. These measures of accuracy are comparable to the 56% and 57% match rates identified in the human validation of VADER sentiment for advocacy messages reported in this appendix.

Given this study's results (Section 5.5, Chapter 6) that sentiment classification can help identify the membership rates of authors of advocacy messages, future work should investigate the ability of other classifiers to identify sentiment in advocacy messages. Also, given Ding's success in customizing a sentiment dictionary for Twitter data, future work should investigate customizing the dictionaries of lexicon-based classifiers like VADER for finding sentiment in advocacy messages. For example, in a review of falsely classified messages used to validate VADER in this study, changing a misspelled word in one message from "thenk" to "thank" in "thank you" would have increased the overall VADER match rate.

APPENDIX D. DEFINING AND VALIDATING A MODEL FOR CLASSIFICATION OF PERSONAL STORIES

As reported in Section 5.3, this dissertation did not develop and validate a model to find personal stories in messages because (a) it did not set out to do so and (b) results from searches for personal stories revealed other, related content in messages that was indicative of high and low membership rates.
This dissertation prioritized reporting these results, to achieve Objective Two, over further developing a model to identify personal stories. Future work could develop a personal story classifier model. Such a model could identify "lived experience" (Sandhu 2017) content in messages as well as related content (e.g., family references) found by this dissertation in the search for lived experiences. It should also consider research from Gordon et al. (2009), who classified personal stories in longer passages of text. This appendix suggests ways to validate such a model in the future.

The validation of a model's classification of messages as personal stories depends on the number of descriptive factors into which the model classifies messages, and on those factors' scales of measurement. This study suggests future work must first better define what a personal story is and what supporting, useful, related factors a model classifying messages should report. In the most basic case, (a) given a random sample of 400 messages, (b) given a single reviewer, and (c) given a model that classifies messages as containing "lived experiences," as defined by Sandhu (2017), or not, a person familiar with Sandhu's work should ideally be consulted to judge whether each of the 400 messages contains a personal story. Then, the study should describe the accuracy of the model with (a) the model's match rate to the reviewer's classifications, (b) precision, (c) recall, and (d) F1 scores. This appendix details these recommendations and considers more complex cases for validating multiple factors with multiple reviewers.

Research (Sandhu 2017) and campaign development guides from the Social Change Agency (2017a, 2017b) show advocacy organizations benefit from enlisting individuals who have lived experiences of campaign issues into organizer and leadership positions. In comparison to online form letters and petitions, which go unseen by policymakers (Miler 2014), leadership and rhetoric from those with lived experiences build trust between advocacy organizations, policymakers, and the public. The Congressional Management Foundation (2017) shows that, more generally, U.S. congressional representatives say that individualized letters from constituents help them take positions on issues. (Chapter 1 and Chapter 2 further describe the state of congressional communication.)

For reference, as described in Section 3.2, this dissertation labels messages originally authored by users of online advocacy systems as personal messages. It labels personal messages that contain descriptions of and references to lived experiences as personal stories. This study searches for personal stories with regular expressions (Objective Two). In doing so, it exposes the subjective nature of the definition of a lived experience. It also finds that messages, whether describing experiences of how campaign issues directly affect authors or simply describing an author's occupation or family status, are related to membership rates. For example, the following messages could all indicate different classifications and degrees of "lived experience":

1. I plan on moving to Flint, Michigan, but am worried about water contamination
2. My uncle died of black lung disease when I was five. Please phase out these coal mines in the next 10 years and provide assistance for those working in the industry to make the occupation transitions
3. As a proud Marylander, I support your proposal to make our city a safe place for climate refugees
4. I worry about climate change every day
5. I drive a car and I support stronger fuel emission standards
6. My wife and I don't want our children playing on the toxic, synthetic turf proposed in the new Downtown Silver Spring update plans

Sandhu (2017) defines lived experiences as "the experience(s) of people on whom a social issue, or combination of issues, has had a direct personal impact." Some of these messages describe past experiences, some describe worry about future experiences, some describe experiences of family members, some express common experiences, and some simply express family associations. Each message may be subjectively classified as a lived experience. Before classification models (deterministic or probabilistic) of personal stories can be validated, therefore, more specific criteria for what a personal story is need to be developed and incorporated into the models. From an applied point of view, supplementing the importance of lived experiences with the exploration results from this study, advocacy organizations and policymakers may benefit from identification of the self-described "direct personal impact" statements that Sandhu describes, as well as from identification of self-described occupations, places of living, family roles, family relationships, and outdoor activities. Both models and human judges may classify these factors on Likert scales, as VADER classifies sentiment, or in Boolean and null categories (present, not present, and undetermined).

The most general model, with the fewest classification factors, is one that classifies a message as describing or not describing lived experiences as defined by Sandhu (2017). It reports a single Boolean classification factor for every message. The next most general model adds an undetermined category to this single classification factor. The next most general model reports this single classification factor on an ordinal scale, and the next reports it on a ratio scale. Beyond this, additional classification factors, such as those suggested above (occupation, place of living, family role, etc.), with different scales of measurement, define more complex models.

To validate the most general model, the one with a single Boolean classification factor based on the definition of a lived experience, with only a single human judge of truth and a sample of 400 random messages, this study suggests building on lessons learned from its validation of VADER with a single reviewer. It suggests:

1. Seeking a college-educated, English-speaking expert well acquainted with Sandhu's definition of and research on lived experiences (2017) to classify messages as meeting or failing to meet Sandhu's definition of a personal story

2. Presenting the reviewer with an online survey that
a. Shows messages one at a time and requires human interaction between messages
b. Asks the reviewer to rate messages as they think other experts might rate them, to increase reviewer consistency, as Hutto and Gilbert (2014) did
c. Shows the reviewer their progress and rewards the reviewer with positive thank-you messages as they complete the survey
d. Shows Sandhu's definition of a lived experience alongside every question
To validate the most general model, the one with a single Boolean classification factor based on the definition of a lived experience, with only a single human judge of truth and a sample of 400 random messages, this study suggests building on lessons learned from its validation of VADER with a single reviewer. It suggests:

1. Seeking a college-educated, English-speaking expert, well acquainted with Sandhu's (2017) definition of and research on lived experiences, to classify messages as meeting or failing to meet Sandhu's definition of a personal story
2. Presenting the reviewer with an online survey that
   a. Shows messages one at a time and requires human interaction between messages
   b. Asks the reviewer to rate messages as they think other experts might rate them, to increase reviewer consistency, as it did for Hutto and Gilbert (2014)
   c. Shows the reviewer their progress and rewards the reviewer with positive thank-you messages as they complete the survey
   d. Shows Sandhu's definition of a lived experience alongside every question
3. Ensuring the reviewer has an environment in which they agree they can focus on the survey; if they say the online format does not work for them, the survey should be printed

In the case that multiple experts are able to review messages, validation design should begin by consulting with at least one expert to construct example vignettes of what a lived experience is and is not, in order to ground reviewer understanding and increase rating consistency. An odd number of reviewers should review messages, or a single expert should be available to break ties. Reviewer consistency should be evaluated with factor analysis or a statistic such as the Greatest Lower Bound (GLB) or the Kuder-Richardson Formula 20 (KR-20). If reviewer consistency is low, validation will require further investigation to understand why, and possibly the elimination of unreliable reviewers.

In the more complex cases, where reviewers are asked to report ordinal and ratio judgments for one or more metrics, this study recommends using Cronbach's alpha and factor analysis to check the consistency of reviewers, as this study does when checking the consistency of reviewers judging message sentiment. In these cases, where reviewers check messages for multiple factors, questions can be grouped by message or by factor. Grouping questions by factor would require the reviewer to read each message multiple times (once per factor), increasing the time and effort required to complete the review. Grouping questions by message, alternatively, would allow reviewers to keep a message in short-term memory and then answer questions about each factor in it. In this second case, the survey could present the factor questions all at once on a single screen, in sets, or individually. This study recommends presenting questions by message and presenting no more than seven factor questions about a message on a single screen at a time. If factor questions could be confused with each other, the survey should present them on the same screen with the distinctions between them highlighted.

After reviewer data has been collected, classifier validation can employ the same match rate, precision, recall, and F1 scores used by this study in validating VADER sentiment ratings to assess model accuracy. In the more complex model situations, these scores should be calculated for each message factor that the model and humans classify.
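As a sketch of that validation arithmetic, the following computes the match rate, precision, recall, and F1 score from paired reviewer and model labels using their standard definitions. The function and the sample labels are illustrative, not code from this study.

```python
def validation_scores(truth, predicted):
    """Match rate, precision, recall, and F1 for Boolean labels.

    truth: reviewer judgments (True = personal story)
    predicted: model classifications, aligned with truth
    """
    tp = sum(t and p for t, p in zip(truth, predicted))          # true positives
    fp = sum((not t) and p for t, p in zip(truth, predicted))    # false positives
    fn = sum(t and (not p) for t, p in zip(truth, predicted))    # false negatives
    matches = sum(t == p for t, p in zip(truth, predicted))

    match_rate = matches / len(truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return match_rate, precision, recall, f1


# Invented labels for four messages, for illustration only.
truth = [True, True, False, False]
predicted = [True, False, False, True]
print(validation_scores(truth, predicted))  # (0.5, 0.5, 0.5, 0.5)
```

In the multi-factor case, the same calculation would simply be repeated for each classification factor.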
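Similarly, for the reviewer-consistency check suggested above, a minimal sketch of KR-20 follows, under the assumption that reviewers are treated as test "items" and messages as cases; dedicated statistical software would normally be used instead.

```python
def kr20(ratings):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) ratings.

    ratings[r][m] is reviewer r's 0/1 judgment of message m. Reviewers
    are treated as test "items" and messages as cases (an assumption of
    this sketch). Requires at least two reviewers and some variance in
    the per-message totals.
    """
    k = len(ratings)                    # number of reviewers ("items")
    n = len(ratings[0])                 # number of messages rated
    totals = [sum(r[m] for r in ratings) for m in range(n)]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    # Sum of p*(1 - p) across reviewers, where p is each reviewer's
    # proportion of positive judgments.
    item_variance = sum((sum(r) / n) * (1 - sum(r) / n) for r in ratings)
    return (k / (k - 1)) * (1 - item_variance / var_total)


# Three reviewers judging four messages (invented data): KR-20 ~ 0.89.
print(kr20([[1, 1, 0, 0],
            [1, 1, 0, 1],
            [1, 1, 0, 0]]))
```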
GLOSSARY

Action Center. A trade name for an online advocacy service. See advocacy service

Advocacy Campaign. An effort, generally centrally managed by an advocacy organization, to support a specific issue. In this study, advocacy campaigns refer to online campaigns in which advocacy organization contacts and market targets are asked to send petitions and personal messages to their policymakers

Advocacy Organization. An organization that educates the public and lobbies policymakers to support projects and policies. Advocacy organizations discussed in this dissertation are all nonprofit, membership-based organizations that collect annual membership dues and contributions to support environmentally sustainable policies and projects. They all use online advocacy services, among other methods, to achieve their goals

Advocacy Service Provider. A software vendor that develops and provides advocacy services to advocacy organizations

Advocacy Service. A software service used by advocacy organizations both to recruit members and to enable contacts to conveniently write their policymakers

Campaign Manager. A staff member or volunteer managing an advocacy campaign. This dissertation often refers to campaign managers as campaign organizers

Campaign Organizer. See campaign manager

Contact. A person with a relationship to an advocacy organization. Note: new contacts do not necessarily have information about them stored in an organizational contact relationship management database

Flesch Ease of Reading Test. A popular test that scores text on how easily it can be read by people with different levels of education. The Flesch score is a function of the syllables, words, and sentences in a text. See Flesch (1948)

Linguistic Inquiry and Word Count (LIWC). A software package that counts the words in a text that match collections of words. See LIWC (2018)

Linguistic Inquiry and Word Count (LIWC) Dimension. A labeled collection of words in the LIWC software package, e.g., pronouns, function words, positive emotions

Linguistic Inquiry and Word Count (LIWC) Score. A LIWC test result that describes a text specimen. LIWC reports the rate of words matching a LIWC dimension (e.g., all pronouns) as a percentage of all words in the text; it reports word count itself as the number of words in the text, not a percentage

Message. Any message sent to a policymaker through an online advocacy system, including form letters, custom messages, and personal messages

Message, Custom. A prewritten advocacy message, edited and customized by a contact using an online advocacy service

Message, Not Custom and Not Personal (NOTCORP). A message that has been neither customized nor individually authored by a contact

Message, Personal. Individually authored text sent to a policymaker. Contacts compose personal messages in blank text-area fields on websites and in messenger-application entry fields

Message, Personal Story. A message that describes a "lived experience" (Sandhu 2017); also used to describe messages found by searches for lived experiences

Policymaker. A primary target of online advocacy campaigns, often a state or national elected official or appointee who can vote on or influence project and policy decisions

Valence Aware Dictionary for sEntiment Reasoning (VADER). A rule-based model that measures sentiment in text, created specifically for short social-media messages. See Hutto and Gilbert (2014) and related code at https://github.com/cjhutto/vaderSentiment#about-the-scoring

REFERENCES

AddUp (2019). AddUp. Retrieved from https://addup.sierraclub.org/. Accessed March 2019.

Adserà, A., Boix, C., & Payne, M. (2003). Are You Being Served? Political Accountability and Quality of Government. The Journal of Law, Economics, and Organization, 19(2), 445-490. https://doi.org/10.1093/jleo/ewg017

Aylien (2019). Text Analysis API. Retrieved from https://aylien.com/text-api/. Accessed December 2019.

Bhagat, V. (2005). Online advocacy: How the Internet is transforming the way nonprofits reach, motivate, and retain supporters. In Hart, T., Greenfield, J. M., and Johnston, M. W. (Eds.), Nonprofit Internet Strategies (pp. 119-134). Hoboken, NJ: John Wiley. ISBN: 0471691887.

Bimber, B. (2001). Information and Political Engagement in America: The Search for Effects of Information Technology at the Individual Level. Political Research Quarterly, 54(1), 53-67. https://doi.org/10.1177/106591290105400103
Bird, S., Klein, E., & Loper, E. (2019). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Retrieved from http://www.nltk.org/book/. Accessed February 2019.

Boulianne, S. (2009). Does Internet Use Affect Engagement? A Meta-Analysis of Research. Political Communication, 26(2), 193-211. https://doi.org/10.1080/10584600902854363

Broockman, D. E., & Ryan, T. J. (2016). Preaching to the Choir: Americans Prefer Communicating to Copartisan Elected Officials. American Journal of Political Science, 60, 1093-1107. https://doi.org/10.1111/ajps.12228

Broockman, D. E., & Skovron, C. (2017). Bias in Perceptions of Public Opinion Among American Political Elites. Forthcoming in American Political Science Review. https://doi.org/10.2139/ssrn.2930362

Bureau of Labor Statistics (2015). Volunteers by selected characteristics, September 2015. Economic News Release. Retrieved from https://www.bls.gov/news.release/volun.t01.htm. Accessed December 2019.

Bureau of Labor Statistics (2019). American Time Use Survey. Retrieved from https://www.bls.gov/tus/charts.htm. Accessed November 2019.

Butler, D. M., & Broockman, D. E. (2011). Do Politicians Racially Discriminate Against Constituents? A Field Experiment on State Legislators. American Journal of Political Science, 55(3), 463-477. https://doi.org/10.1111/j.1540-5907.2011.00515.x

Carpenter, D. (2016). Recruitment by Petition: American Antislavery, French Protestantism, English Suppression. Perspectives on Politics, 14(03), 700-723. https://doi.org/10.1017/s1537592716001134

Claessens, S., Feijen, E., & Laeven, L. (2008). Political connections and preferential access to finance: The role of campaign contributions. Journal of Financial Economics, 88(3), 554-580. https://doi.org/10.1016/j.jfineco.2006.11.003

Congressional Management Foundation, The (2017). Citizen-Centric Advocacy: The Untapped Power of Constituent Engagement. Retrieved from http://www.congressfoundation.org/citizen-centric-advocacy-2017-download. Accessed October 2018.

Cruickshank, P., Smith, C., & Edelmann, N. (2010). Signing an e-petition as a transition from lurking to participation. Presented at the Electronic Government and Electronic Participation Conference, Lausanne, Switzerland. Retrieved from https://www.researchgate.net/publication/261830621_Signing_an_e-petition_as_a_transition_from_lurking_to_participation. Accessed January 2017.

Ding, Q. (2018). Using Social Media to Evaluate Public Acceptance of Infrastructure Projects (Doctoral dissertation). University of Maryland. Retrieved from the Digital Repository at the University of Maryland (DRUM), 2019. https://doi.org/10.13016/M27M0437D

Facebook (2019). Built-in NLP, Facebook for Developers. Retrieved from https://developers.facebook.com/docs/messenger-platform/built-in-nlp#entities. Accessed November 2019.

File, T. (2015). Who Votes? Congressional Elections and the American Electorate: 1978-2014. United States Census, Report P20-577. Retrieved from https://www.census.gov/library/publications/2015/demo/p20-577.html. Accessed February 2018.

Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221-233. https://doi.org/10.1037/h0057532

Giving Tuesday (2019). Giving Tuesday. Retrieved from https://www.givingtuesday.org/about. Accessed December 2019.

Google (2019). Activity Controls. Retrieved from https://myaccount.google.com/activitycontrols. Accessed October 2019.
Gordon, A., & Swanson, R. (2009). Identifying Personal Stories in Millions of Weblog Entries. Presented at the Third International Conference on Weblogs and Social Media, Data Challenge Workshop, San Jose, CA, May 20, 2009. Retrieved from http://people.ict.usc.edu/~gordon/publications/ICWSM09-DCW.PDF. Accessed October 2018.

Haynes, A. S., Derrick, G. E., Chapman, S., Redman, S., Hall, W. D., Gillespie, J., & Sturk, H. (2011). From "our world" to the "real world": Exploring the views and behaviour of policy-influential Australian public health researchers. Social Science & Medicine, 72(7), 1047-1055. https://doi.org/10.1016/j.socscimed.2011.02.004

Haynes, A. S., Derrick, G. E., Redman, S., Hall, W. D., Gillespie, J. A., Chapman, S., & Sturk, H. (2012). Identifying Trustworthy Experts: How Do Policymakers Find and Access Public Health Researchers Worth Consulting or Collaborating With? PLoS ONE, 7(3). https://doi.org/10.1371/journal.pone.0032665

Haynes, A. S., Gillespie, J. A., Derrick, G. E., Hall, W. D., Redman, S., Chapman, S., & Sturk, H. (2011). Galvanizers, Guides, Champions, and Shields: The Many Ways That Policymakers Use Public Health Researchers. Milbank Quarterly, 89(4), 564-598. https://doi.org/10.1111/j.1468-0009.2011.00643.x

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 168-177).

Hutto, C. J., & Gilbert, E. E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Presented at the Eighth International Conference on Weblogs and Social Media (ICWSM-14), Ann Arbor, MI, June 2014.

Internal Revenue Service (2019). SOI Tax Stats: Charities & Other Tax-Exempt Organizations Statistics. Retrieved from https://www.irs.gov/statistics/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data. Accessed May 2018.

Jacobs, M. (2016). High pressure for low emissions: How civil society created the Paris climate agreement. Juncture, 22, 314-323. https://doi.org/10.1111/j.2050-5876.2016.00881.x

Jones, J. J. (2017). Talk "Like a Man": Feminine Style in the Pursuit of Political Power (Doctoral dissertation). Irvine, CA: UC Irvine.

Kahn, A. Z., Skibniewski, M., & Cable, J. H. (2017). Understanding Project Stakeholder Psychology: The Path to Effective Stakeholder Management and Engagement. PM World Journal, VI(IX). Retrieved from https://pmworldlibrary.net/wp-content/uploads/2017/09/pmwj62-Sep2017-Khan-Skibniewski-Cable-understanding-project-stakeholder-pyschology-second-edition2.pdf. Accessed May 2018.

Kahn, A. Z., Skibniewski, M., & Cable, J. H. (2019). The Project Stakeholder Management and Engagement Strategy Spectrum: An Empirical Exploration. PM World Journal, VIII(III). Retrieved from https://pmworldlibrary.net/wp-content/uploads/2019/04/pmwj80-Apr2019-Khan-Skibniewski-Cable-Project-Stakeholder-Management-Strategy-Spectrum.pdf. Accessed November 2019.

Kahneman, D. (2011). Thinking, Fast and Slow. New York: Farrar, Straus and Giroux. ISBN: 0374533555.

Karpf, D. (2006). In Shift From Sponsored Petitions to Crowdfunding, Change.org Changes Everything. Civicist. Retrieved from https://civichall.org/civicist/in-shift-from-sponsored-petitions-to-crowdfunding-change-org-changes-everything/. Accessed March 2018.

Karpf, D. (2010). Online Political Mobilization from the Advocacy Group's Perspective: Looking Beyond Clicktivism. Policy & Internet, 2, 7-41. https://doi.org/10.2202/1944-2866.1098

Karpf, D. (2016). Analytic Activism: Digital Listening and the New Political Strategy. New York: Oxford University Press. ISBN: 9780190266134.
Karpf, D. (2018). Analytic Activism and Its Limitations. Social Media + Society. https://doi.org/10.1177/2056305117750718

Kenski, K. (2010). Connections Between Internet Use and Political Efficacy, Knowledge, and Participation. Journal of Broadcasting & Electronic Media, 50(2), 173-192. https://doi.org/10.1207/s15506878jobem5002_1

Kent, A., Berry, M. M., Luehrs Jr., F. U., & Perry, J. W. (1955). Machine literature searching VIII. Operational criteria for designing information retrieval systems. American Documentation, 6(2), 93. https://doi.org/10.1002/asi.5090060209

Kim, Y., Russo, S., & Amnå, E. (2017). The longitudinal relation between online and offline political participation among youth at two different developmental stages. New Media & Society, 19(6), 889-917. https://doi.org/10.1177/1461444815624181

Krippendorff, K. (2018). Content Analysis: An Introduction to Its Methodology. SAGE Publications. ISBN: 9781506395661.

Li, L., Bensi, M., Cui, C., Baecher, B., & Huang, Y. (2019). "Social media crowd-sourcing for rapid damage assessment following sudden-onset natural hazard events." Unpublished manuscript, University of Maryland, College Park.

LIWC (2018). LIWC | Linguistic Inquiry and Word Count. Retrieved from http://liwc.wpengine.com/. Accessed 2018.

Long, S. (2018). Personal Message Research and Presentation Project Proposal. Unpublished manuscript, Sierra Club, Washington, DC.

McHaney, R., Tako, A., & Robinson, S. (2018). Using LIWC to choose simulation approaches: A feasibility study. Decision Support Systems, 111(2018), 1-12. https://doi.org/10.1016/j.dss.2018.04.002

Miler, K. (2014). Constituency Representation in Congress. New York: Cambridge University Press. ISBN: 1107677009.

Morozov, E. (2009). The Brave New World of Slacktivism. Foreign Policy, May 19, 2009. Retrieved from https://foreignpolicy.com/2009/05/19/the-brave-new-world-of-slacktivism/. Accessed November 2019.

MoveOn (2019). A Short History of MoveOn. Retrieved from https://front.moveon.org/a-short-history/. Accessed November 2019.

Nonprofits Source (2019). The Ultimate List Of Charitable Giving Statistics For 2018. Retrieved from https://nonprofitssource.com/online-giving-statistics/#Online. Accessed November 2019.

Oni, A. A., Oni, S., Mbarika, V., & Ayo, C. K. (2017). Empirical study of user acceptance of online political participation: Integrating Civic Voluntarism Model and Theory of Reasoned Action. Government Information Quarterly, 34(2), 317-328. https://doi.org/10.1016/j.giq.2017.02.003

OpenGov Foundation (2018). From Voicemails to Votes: A human-centered investigation by The OpenGov Foundation into the systems, tools, constraints, and people who drive constituent engagement in Congress. Retrieved from https://v2v.opengovfoundation.org/. Accessed November 2018.

Parry, J. A., Smith, D. A., & Henry, S. (2011). The Impact of Petition Signing on Voter Turnout. Political Behavior, 34(1), 117-136. https://doi.org/10.1007/s11109-011-9161-1

Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. New York: Bloomsbury Press.

Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and properties of LIWC2015. Austin, TX: University of Texas at Austin. Retrieved from https://repositories.lib.utexas.edu/bitstream/handle/2152/31333/LIWC2015_LanguageManual.pdf. Accessed October 2019.
Pew Charitable Trusts (2017). Why Are Millions of Citizens Not Registered to Vote? Retrieved from http://www.pewtrusts.org/en/research-and-analysis/issue-briefs/2017/06/why-are-millions-of-citizens-not-registered-to-vote. Accessed February 2018.

Pew Research (2019). Internet/Broadband Fact Sheet. Retrieved from https://www.pewresearch.org/internet/fact-sheet/internet-broadband/. Accessed November 2019.

ProPublica (2019). Nonprofit Explorer. Retrieved from https://projects.propublica.org/nonprofits/. Accessed November 2019.

Putorti, J. (2019). Yes, Resistbot Is Effective. Retrieved from https://resistbot.news/yes-resistbot-is-effective-8e14e72a5ed9. Accessed November 2019.

Resistbot (2018). Resistbot. Retrieved from https://resist.bot/. Accessed November 2018.

Ruijuan, Y. (2010). The interpersonal metafunction analysis of Barack Obama's victory speech. English Language Teaching, 3(2), 146-151.

Sandhu, B. (2017). The value of lived experience in social change: The need for leadership and organizational development in the social sector. Clore Social Leadership Program. Retrieved from http://thelivedexperience.org/. Accessed October 2019.

Schulz, K. (2017). What Calling Congress Achieves (originally titled "Call and Response" in print). American Chronicles, The New Yorker, March 6, 2017 issue. Retrieved from http://www.newyorker.com/magazine/2017/03/06/what-calling-congress-achieves. Accessed May 25, 2018.

Snow, D., Rochford, E., Worden, S., & Benford, R. (1986). Frame Alignment Processes, Micromobilization, and Movement Participation. American Sociological Review, 51(4), 464-481. https://doi.org/10.2307/2095581

Snyder, J. M. (1990). Campaign Contributions as Investments: The U.S. House of Representatives, 1980-1986. Journal of Political Economy, 98(6), 1195-1227. https://doi.org/10.1086/261731

Social Change Agency, The (2017a). Lost Voices: Digital Campaigning and the Voices of Lived Experience. Retrieved from https://thesocialchangeagency.org/wp-content/uploads/2018/03/full-lost-voices-report.pdf. Accessed October 2018.

Social Change Agency, The (2017b). A toolkit to help interrogate digital campaigning practices. Retrieved from https://thesocialchangeagency.org/wp-content/uploads/2018/03/framework-booklet.pdf. Accessed October 2018.

Sokolowski, S. W. (1996). Show Me the Way to the next Worthy Deed: Towards a Microstructural Theory of Volunteering and Giving. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 7(3), 259-278. www.jstor.org/stable/27927522. Accessed November 2019.

Suárez, D. F. (2009). Nonprofit Advocacy and Civic Engagement on the Internet. Administration & Society, 41(3), 267-289. https://doi.org/10.1177/0095399709332297

Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology, 63(1), 163-173.

Tversky, A., & Kahneman, D. (1973). Availability: A Heuristic for Judging Frequency and Probability. Cognitive Psychology, 5, 207-232. Retrieved from https://msu.edu/~ema/803/Ch11-JDM/2/TverskyKahneman73.pdf. Accessed December 2017.

Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131. Retrieved from http://psiexp.ss.uci.edu/research/teaching/Tversky_Kahneman_1974.pdf. Accessed December 2017.

U.S. Census Bureau (2018). Voting and Registration Data. Retrieved from https://www.census.gov/topics/public-sector/voting/data.html. Accessed February 2018.
U.S. Elections Project (2016). Voter Turnout. Retrieved from http://www.electproject.org/home/voter-turnout/voter-turnout-data. Accessed February 2018.

U.S. House of Representatives (2017). Communicating with Congress (CWC). Retrieved from https://www.house.gov/doing-business-with-the-house/communicating-with-congress-cwc. Accessed May 2019.

WealthEngine (2019). WealthEngine. Retrieved from https://www.wealthengine.com/. Accessed March 2019.

White, M. (2010). Clicktivism is ruining leftist activism. The Guardian, August 12, 2010. Retrieved from https://www.theguardian.com/commentisfree/2010/aug/12/clicktivism-ruining-leftist-activism. Accessed November 2019.