Paper by Daniela Paolotti; Elad Yom-Tov; Natalia Adler; Ciro Cattuto; Kyriaki Kalimeri; Michele Tizzoni; Stefaan Verhulst; and Andrew Young in the Journal of Medical Internet Research: “India is home to 20% of the world’s suicide deaths. In India, and around the world, young people are especially at risk of suicide. While statistics regarding suicide in India are distressingly high, data and cultural issues likely contribute to a widespread underreporting of the problem. Social stigma and only recent de-criminalization of suicide are but two factors hampering official agencies’ collection and reporting of suicide rates.
Results of the first model fit (R²) when predicting the suicide rates from the fraction of queries in each of the 5 topics, as well as the fraction of all suicide methods, show a correlation of about 0.5. The correlation increases significantly with the removal of even 3 outliers, and improves slightly when 5 outliers are removed. In all cases, statistically significant correlation is reached, but the best correlation is obtained for suicide methods (hanging, pesticide, and poison), and only to a lesser extent for depression. Results for the second model fit using both query data and demographic data show that for all categories, if no outliers are removed, demographic data predict suicide rates better than query data. However, when 3 outliers are removed, query data about pesticides or poisons improves the model over using demographic data.
Conclusions: Internet search data has been shown in previous work to serve as a proxy for many health-related behaviors, enabling the measurement of rates of different conditions ranging from influenza to suicide. In this work, we used both search data and demographics to predict suicide rates. In this way, search data serves as a proxy for unmeasured (hidden) factors corresponding to suicide rates. Moreover, our procedure for outlier rejection serves to single out states where the suicide rates have substantially different correlations with both demographic factors and query rates….(More)”.
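To make the outlier-removal step in the results more concrete, here is a minimal sketch of fitting a one-variable linear model, dropping the points with the largest residuals, and refitting. It is only an illustration under stated assumptions: the excerpt does not say how the authors selected outliers, so ranking by absolute residual is a stand-in, and the function names (`r_squared`, `fit_without_outliers`) and the synthetic numbers are hypothetical, not the paper's data or code.

```python
import numpy as np


def r_squared(x, y):
    """Fit y ~ slope * x + intercept by least squares.

    Returns the coefficient of determination and the residuals.
    """
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, residuals


def fit_without_outliers(x, y, n_outliers):
    """Drop the n_outliers points with the largest absolute residual, then refit.

    Assumption: residual-based ranking; the paper's actual rule may differ.
    Returns the refitted R^2 and the sorted indices of the points kept.
    """
    _, residuals = r_squared(x, y)
    order = np.argsort(np.abs(residuals))      # smallest residual first
    keep = np.sort(order[: len(x) - n_outliers])
    r2, _ = r_squared(x[keep], y[keep])
    return r2, keep


# Synthetic stand-in for "query fraction per state" vs. "suicide rate per state".
x = np.linspace(0.0, 1.0, 30)
y = 10.0 * x + np.sin(np.arange(30))           # linear trend plus small wiggle
y[[5, 15, 25]] += 15.0                         # three artificial outlier states

r2_all, _ = r_squared(x, y)
r2_trimmed, kept = fit_without_outliers(x, y, n_outliers=3)
removed = sorted(set(range(len(x))) - set(kept.tolist()))

print(f"R^2 with all points:     {r2_all:.3f}")
print(f"R^2 with 3 points removed: {r2_trimmed:.3f}")
print(f"Indices removed: {removed}")
```

In this toy setup the removed indices are exactly the three artificially shifted points, and the fit improves once they are excluded. As the conclusions note, the excluded states are themselves informative: they are the ones whose rates relate differently to the predictors.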