Public Discussions of Recovery: Applying Natural Language Processing to #sober Tweets

Authors: Elyse J. Thulin, PhD, Alex Weigard, PhD, & Anne Fernandez, PhD


Background:

Recovery from problematic substance use is integral to reducing harm, but there is no single definition of recovery agreed upon in either clinical or research settings. Many in the field disagree about which substance-related behaviors constitute “recovery” (e.g., abstinence from all substances, moderation, harm reduction) and whether a specific method is integral to recovery (e.g., following AA versus achieving change on one’s own). While researchers and institutions have made efforts to identify a cohesive definition, a key group is absent from the literature and from stakeholder discussions: individuals who have experienced addiction and recovery. The goal of this study is to understand how individuals describe and communicate their own experiences with sobriety by analyzing posts on a popular, public social media platform (Twitter).
Methods: In accordance with Twitter API guidelines, we collected 1,469,084 tweets containing “#sober” that were posted between 2010 and 2021. We retained English-language posts (n=940,698) and further classified them as original posts (n=558,611) or retweets (n=382,087). The first stage of this analysis documented the hashtags that most commonly co-occur with #sober. The second stage will involve collecting additional tweets based on the top co-occurring hashtags and collating them into a dataset to capture a robust set of tweets for analysis; using natural language processing methods to identify the top terms and phrases in these tweets; and then employing topic modeling to identify which terms cluster together.
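The first-stage steps (separating original posts from retweets, then counting the hashtags that co-occur with #sober) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes tweets are already available as plain-text strings, treats any text beginning with "RT @" as a retweet (a common heuristic, assumed here), lowercases hashtags, and counts each hashtag at most once per tweet. The function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def is_retweet(text: str) -> bool:
    # Assumed heuristic: classic retweets start with "RT @username".
    return text.startswith("RT @")


def split_by_type(tweets):
    """Split tweets into (original posts, retweets)."""
    originals = [t for t in tweets if not is_retweet(t)]
    retweets = [t for t in tweets if is_retweet(t)]
    return originals, retweets


def hashtags(text: str) -> list:
    """Extract hashtags, lowercased so #SoberLife and #soberlife match."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def cooccurring_hashtags(tweets, anchor="#sober") -> Counter:
    """Count hashtags appearing in the same tweet as the anchor hashtag."""
    counts = Counter()
    for text in tweets:
        tags = set(hashtags(text))  # count each hashtag once per tweet
        if anchor in tags:
            counts.update(tags - {anchor})
    return counts
```

For example, with a few made-up tweets, `cooccurring_hashtags(["Day 30 #sober #recovery", "RT @someone: #sober #AA #recovery"])["#recovery"]` evaluates to 2. Whether retweets are counted alongside originals, as in this call, is a design choice the analysis would need to state.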
Results: We present results from the first stage here. The top co-occurring hashtags included “#recovery” (n=105,561), “#addiction” (n=72,308), “#soberlife” (n=34,990), and “#aa” (n=22,761). In addition to references to being in recovery, several phrases referenced a lapse in sobriety, such as “#detox #relapse” (n=6,157) and “#recovery #rehab” (n=5,731). While some tweets referenced specific recovery programs (e.g., “#sober #12steps” (n=5,275)), others referenced treatment more generally (e.g., “#treatment #intervention” (n=6,535)).
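Multi-hashtag phrases such as “#detox #relapse” can be counted in several ways, and the abstract does not say which was used. One plausible approach, assumed here purely for illustration, pairs each hashtag with the next hashtag in the same tweet (ignoring intervening words) and counts those ordered pairs across tweets. The function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def hashtag_pairs(text: str) -> list:
    """Return consecutive hashtag pairs, e.g. '#detox #relapse', in order of appearance."""
    tags = [t.lower() for t in HASHTAG_RE.findall(text)]
    return [f"{a} {b}" for a, b in zip(tags, tags[1:])]


def count_hashtag_pairs(tweets) -> Counter:
    """Count ordered consecutive hashtag pairs across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(hashtag_pairs(text))
    return counts
```

Counting ordered pairs keeps “#detox #relapse” distinct from “#relapse #detox”. Sorting each pair first would merge them, which may be preferable if order carries no meaning.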
Discussion: In the next stage of our work, we will collect additional tweet data, identify top terms and phrases, and perform topic modeling to identify how terms cluster together. By identifying term clusters, we can evaluate how individuals talk about recovery and whether this language maps onto recovery behaviors (e.g., abstinence, moderation, harm reduction) and/or specific methods (e.g., using AA, achieving change on one’s own). Identifying how individuals talk about recovery will begin to address the gap in representation in efforts to develop a meaningful definition of recovery.
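The abstract does not name a specific topic model. Latent Dirichlet allocation (LDA) is a common choice, so the sketch below assumes LDA fit by collapsed Gibbs sampling over tokenized tweets. Each token's topic is resampled in proportion to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta), where n_dk, n_kw, and n_k are document-topic, topic-word, and topic totals and V is the vocabulary size. The function name, hyperparameter values, and toy input are hypothetical, and a study at this scale would more likely use an established library implementation (e.g., gensim or scikit-learn).

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0, top_n=3):
    """Collapsed Gibbs sampling for LDA; returns the top words for each topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n_dk
    topic_word = [Counter() for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                       # n_k
    assignments = []

    # Randomly initialise a topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Most frequent words per topic (skipping words whose count fell to zero).
    return [
        [w for w, c in topic_word[k].most_common() if c > 0][:top_n]
        for k in topics
    ]
```

Calling `lda_gibbs(docs, num_topics=2)` on lists of tokens (for instance, made-up tweets reduced to words such as "sober", "aa", "detox", "relapse") returns one short word list per topic. Interpreting those lists, e.g., as abstinence-, program-, or treatment-related clusters, remains a qualitative step.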