ABSTRACT
The World Wide Web, and online social networks in particular, have increased connectivity between people to the point where information can spread to millions in a matter of minutes. This form of online collective contagion has brought many benefits to society, such as reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example is the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built to classify text relating to suicide on Twitter. The classifiers distinguish between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. They also aim to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation.
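As an illustration of the voting step mentioned above, the following is a minimal sketch of one common reading of a maximum-probability decision rule: each base classifier emits a probability per class, and the ensemble picks the class that receives the single highest probability from any base classifier. The function name, the class labels and the probability values are hypothetical and are not taken from the paper; the Rotation Forest base learners themselves are not reproduced here.

```python
from typing import Dict, List, Tuple


def max_probability_vote(base_outputs: List[Dict[str, float]]) -> Tuple[str, float]:
    """Return the (label, probability) pair with the highest probability
    found across all base classifiers' class-probability estimates."""
    if not base_outputs:
        raise ValueError("need at least one base classifier output")
    best_label, best_prob = "", -1.0
    for probs in base_outputs:
        for label, p in probs.items():
            if p > best_prob:
                best_label, best_prob = label, p
    return best_label, best_prob


# Hypothetical class-probability outputs of three base classifiers for one post.
# Labels and numbers are made up for illustration only.
outputs = [
    {"suicidal_ideation": 0.55, "flippant": 0.30, "memorial": 0.15},
    {"suicidal_ideation": 0.40, "flippant": 0.50, "memorial": 0.10},
    {"suicidal_ideation": 0.82, "flippant": 0.10, "memorial": 0.08},
]
label, prob = max_probability_vote(outputs)
print(label, prob)
```

Under this rule a single confident base classifier can decide the outcome, which differs from majority voting, where each base classifier contributes one equally weighted vote.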