Reddit as a prediction tool for crypto-assets.

Author: Camou, Luis Antonio Loredo
  1. Introduction

    Cryptocurrencies have, once again, gained mainstream attention. According to Google Trends (2021), on April 18, 2021, the term Bitcoin ranked ninth among trending searches in the US, and Dogecoin ranked first on April 15, 2021, with over 5 million searches in the US. Institutional interest has also risen recently. Companies such as Tesla, Square, and MicroStrategy have acquired Bitcoin for their balance sheets, and everyday applications, like PayPal, now allow their customers to buy and sell cryptocurrencies (Markets Insider, 2021). This has been accompanied by increased social media activity, with mentions of Bitcoin on Twitter reaching an all-time high (Cointelegraph, 2021).

    What are the incentives for someone to post on and read financial message boards? Antweiler and Frank (2004) provide an overview of the existing literature. In particular, the authors focus on the theory of DeMarzo et al. (2003), who introduce the concept of persuasion bias, under which individuals fail to account for possible repetition in the information they receive. This anomaly may arise if two individuals read the same piece of information and then discuss it with each other without revealing their source. Each may believe they are hearing the information for a second time and thus fail to account for the repetition. Given this ability to influence people, it may be profitable to be well connected in a community in order to increase the repeated information others receive. An increased sense of confidence in decisions may, in turn, push people to read message boards.

    In this paper, we focus on Reddit. Reddit is best known as a collection of forums based on interests. A new community, called a subreddit, can be created about any topic if it complies with general rules. In particular, most crypto-projects have a dedicated subreddit where participants are free to join and share news, express opinions, and discuss ideas. This information sharing is done through posts. Each post is composed of a title set by the author and comments made by people who wish to discuss the submission. According to Alexa Internet (2021), as of April 13, 2021, Reddit is the 19th most popular website worldwide and the 7th in the United States.

    Several papers dealing with the impact of sentiment indices on cryptocurrency returns precede ours. Among these, we note the research by Kristoufek (2013), who studies the effect of search trends for Bitcoin on Wikipedia and Google. Also, Naeem et al. (2021) and Anamika et al. (2021) use the Twitter Happiness Sentiment index and survey-based sentiment measures, respectively. Sentiment analysis using publicly available textual information has also been used previously. Using lexical dictionaries, Karalevicius et al. (2018) identify Twitter posts as a predictor of the price of Bitcoin. Kraaijeveld and De Smedt (2020) reach similar conclusions for some, but not all, cryptocurrencies other than Bitcoin, commonly referred to as altcoins. In contrast, Ahn and Kim (2020) conclude, using posts from a Bitcoin forum, that volume and volatility, unlike future returns, are related to emotional factors.

    Research using Reddit has also been done previously. In particular, Prajapati (2020) uses different lexical dictionaries to perform sentiment analysis on Reddit and Google News in order to predict the price of Bitcoin, using data from January 1, 2018, to November 20, 2019. The author concludes that social sentiment captured through Reddit and Google News improves forecasts relative to models based on past prices only.

    Wooley et al. (2019) use 24 Reddit communities and information from July 1, 2016, to July 24, 2018, to predict price directions (up vs. down) over the following three months. They conclude that predictions benefit from sentiment variables relative to a lagged prices-only model.

    To the best of our knowledge, no previous research has studied the forecasting power of general-use forums, such as Reddit, on volatility and returns while also including Granger, Diebold-Mariano, and robustness tests. Further, no other study has performed a detailed feature importance analysis as presented in this paper. Our research focuses not only on Bitcoin, as most research does, but also on altcoins.

    This study finds that although sentiment variables derived from Reddit seem to help reduce the mean squared error of our volatility predictions, these results are not statistically different from those of a HAR-RV model. In contrast, although mean squared error results for returns are mixed, these are consistently different from an in-sample-mean benchmark. Our variables gain relative importance around market-wide and asset-specific events such as market booms and class actions. We use natural language processing and machine learning tools to create sentiment variables and evaluate our results, assessing the impact of including our constructed variables through linear and nonlinear models. Our work falls along the lines of Antweiler and Frank (2004), who conclude that stock messages can be used to help predict volatility and returns for companies in the Dow Jones Industrial Average. Our work also relates to Engle et al. (2011), who show that public information arrival is related to increases in volatility.

    Our work reinforces previous research showing the positive impact of sentiment indices on returns and volatilities of cryptocurrencies. In particular, our conclusions are similar to those of Ahn and Kim (2020), who show that while sentiment does help reduce forecasting error in volatility, the effect on returns is not clear. While our work agrees with Prajapati (2020) that sentiment seems to improve Bitcoin price prediction, we achieve mixed results when expanding to other assets. However, we find evidence that information extracted from Reddit seems to consistently reduce volatility forecasting error for all assets studied.

    This work is of interest to investors, risk managers, regulators, and academics. From an investing and risk management perspective, the collection of new features presents unique opportunities to understand and profit from market behavior. For example, accurate predictions can be used in portfolio rebalancing, options trading, and value-at-risk estimation. From a regulatory perspective, the impact of well-connected individuals may imply the existence of possible market manipulation. This implication is of particular interest in anonymous forums such as Reddit. Finally, from an academic perspective, mainstream media has proven beneficial in explaining risk premia and above-average stock returns, such as in Manela and Moreira (2017).

    The document is organized as follows. Section 2 includes a description of the data and sources. Section 3 motivates our model selection and methodology. Next, section 4 offers an exploratory analysis of the subreddits for each asset using natural language processing tools. Section 5 presents our prediction results and the relative feature importance analysis. Section 6 presents our conclusions.

  2. Data

    There is a wide range of cryptocurrencies and tokens available. As of May 2, 2021, CoinMarketCap (2021a) reports the existence of 9,527 crypto-assets. We have decided to focus on five of them, described in the following paragraphs. The first four cryptocurrencies were selected because they were the top four projects by market capitalization at the start of 2021, representing almost 85% of the total market capitalization (CoinMarketCap, 2021b). In particular, Bitcoin and Ethereum represent 70.68% and 10.79%, respectively. Finally, Dogecoin was selected due to its internet popularity. The assets are presented in order of market capitalization.

    Bitcoin (BTC): A cryptocurrency invented in 2008 by an unknown person or group using the name Satoshi Nakamoto. Bitcoin (BTC) uses peer-to-peer technology to operate in a decentralized manner. Transactions are verified through cryptography and recorded in a public distributed ledger called a blockchain.

    Ethereum (ETH): Ethereum is a blockchain with smart-contract functionality. A smart contract is a computer program intended to execute automatically. Given this, decentralized finance, a movement to offer traditional financial instruments in a decentralized architecture, has made Ethereum the most actively used blockchain (Bloomberg, 2021). The currency used in Ethereum is Ether (ETH).

    Litecoin (LTC): Litecoin is a cryptocurrency based on Bitcoin. Litecoin differs by using a different cryptographic algorithm, more resistant to custom hardware.

    Ripple (XRP): Ripple is a payment solutions company. Ripple makes use of its native cryptocurrency, known as XRP, to allow for prompt payments. This asset is of interest to us because of a class action filed against Ripple in May 2018 over the unregistered sale of its XRP tokens and because, in December 2020, two of its founders were sued by the SEC for selling these tokens.

    Dogecoin (DOGE): Introduced on December 6, 2013, Dogecoin is a cryptocurrency based around the figure of the Doge meme, a Shiba Inu dog. Compared to other cryptocurrencies, it is focused on a fun and welcoming community. In January 2021, in a movement influenced by the GameStop short squeeze, Dogecoin's price increased by 800% (CNBC, 2021). Further, in April 2021, a movement to raise its price pushed its value from $0.06 on April 7 to $0.40 on April 19. The currency has been referenced by, among others, Elon Musk, Mark Cuban, Snoop Dogg, and Gene Simmons.

    Our data can be split into two major categories: financial and Reddit variables. In the following paragraphs, we describe each of these.

    2.1 Financial variables

    To obtain our prices, we use the Binance API. As of May 2, 2021, Binance is the largest crypto exchange by trading volume (CoinMarketCap, 2021c). Through the Binance API, we download all available prices for the given crypto-assets at a 5-minute frequency. The initial observation for each asset varies according to the first listing date in the exchange and is...
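
    As an illustration only, a download of this kind could be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the study: it assumes the public Binance spot candlestick endpoint (`/api/v3/klines`), and the trading pair `BTCUSDT`, the choice of the closing price, and the helper names `parse_klines`, `fetch_klines`, and `download_all` are hypothetical choices made for this example.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Binance spot endpoint for candlestick (kline) data.
BASE_URL = "https://api.binance.com/api/v3/klines"


def parse_klines(rows):
    """Convert raw kline rows into (open_time_utc, close_price) tuples.

    Each raw row is a list whose element 0 is the candle open time in
    milliseconds and whose element 4 is the closing price as a string.
    """
    parsed = []
    for row in rows:
        open_time = datetime.fromtimestamp(row[0] / 1000, tz=timezone.utc)
        parsed.append((open_time, float(row[4])))
    return parsed


def fetch_klines(symbol, interval, start_ms, limit=1000):
    """Request one page of candles starting at start_ms (needs network access)."""
    query = urlencode(
        {"symbol": symbol, "interval": interval, "startTime": start_ms, "limit": limit}
    )
    with urlopen(f"{BASE_URL}?{query}", timeout=30) as response:
        return json.load(response)


def download_all(symbol="BTCUSDT", interval="5m", start_ms=0):
    """Page through the available history; the symbol is an assumed example."""
    prices = []
    while True:
        rows = fetch_klines(symbol, interval, start_ms)
        if not rows:
            break
        prices.extend(parse_klines(rows))
        # Element 6 is the candle close time in ms; resume just after it.
        start_ms = rows[-1][6] + 1
    return prices
```

    Calling `download_all("BTCUSDT")` would page through the history in blocks of up to 1,000 five-minute candles. Starting from `start_ms=0` is intended to let the first page begin at the earliest candle the exchange holds for that pair, which is why the first observation would differ by asset.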
