by Michelle Tang

What is Alternative Data and Why You Should Care

What do you think of when you hear the term “alternative data”? A neat definition would be non-traditional data sources, beyond conventional ones such as corporate filings and analyst predictions, that can provide indications of a firm’s future performance. Or, to borrow from the Alternative Investment Management Association (AIMA, 2020), “alternative data comes from unconventional information, mostly in an unstructured form, is not broadly distributed within the industry and is being used to deliver both investment alpha and operational alpha”.

Think credit card transaction data, mobile device data, web traffic, ESG data, media sentiment, weather data, social media sentiment, IoT sensor data. The list goes on and is constantly expanding with the advances in technology. What is important to realize is that investors are increasingly turning towards using alternative data to gain an edge in generating alpha while corporations are leveraging alternative data to better assist in their strategic decision making.

Market Size and Users of Alternative Data

The global alternative data market is expected to grow from $2.21 billion in 2021 to $3.23 billion in 2022 at a compound annual growth rate (CAGR) of 46.5%. The market is expected to reach $13.91 billion in 2026 at a CAGR of 44% (The Business Research Company). According to the AIMA, the number of alternative data providers today is 20 times larger than it was 30 years ago. They estimate that by 2024, there will be more than 5,000 different alternative datasets.[1]
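
As a quick sanity check, the growth rates above can be recomputed from the start and end values. The sketch below is illustrative only: the function name `cagr` is our own, and the inputs are the rounded figures quoted above, so the results may differ slightly from the published rates.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values, returned as a fraction."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Rounded market-size figures quoted above, in billions of US dollars.
print(f"2021 to 2022: {cagr(2.21, 3.23, 1):.1%}")
print(f"2022 to 2026: {cagr(3.23, 13.91, 4):.1%}")
```

With these rounded inputs the rates come out at roughly 46.2% and 44.1%, in line with the quoted 46.5% and 44%.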

This skyrocketing growth can largely be explained by the growing interest in stock market trading and the quest for more accurate and faster insights to outperform the market. This is particularly the case for hedge funds and investment banks, which are the top users of alternative data. A study by AIMA and Bank of America Merrill Lynch reveals that roughly half of all investment firms now embrace alternative data. This number will only continue to grow as more firms seek an informational edge over the rest of the market.

How Many Alternative Datasets Does the Average Hedge Fund Use?

In 2020, more than 50% of hedge fund managers classified as alternative data market leaders used seven or more alternative datasets globally, while only 8% of the rest of the market did so. Most hedge funds used two or more alternative datasets: 85% of market leaders and 77% of the rest of the market.[2]

Main Use Cases of Alternative Datasets by Hedge Funds

According to a study conducted by Greenwich Associates, nearly 30% of quantitative funds attribute at least 20% of their alpha to alternative data. Furthermore, 42% of all asset managers reported that the alpha edge they achieved by using alternative data sources lasted for at least four years. 

This finding is supported by AIMA, which found that 69% of hedge funds classified as market leaders in alternative data use alternative data sources to improve investment decisions and generate alpha. Scouting for unique data sources and investment strategies that other companies do not have access to is essential for gaining a competitive advantage. At the same time, the emergence of a pandemic has added to the complexity of this issue and heightened the need for timely, reliable, and quality data. To a lesser extent, hedge funds may also use alternative data to help improve risk management and compliance models.

What Are The Most Popular Categories of Alternative Data?

Some of the most popular categories of alternative data used by buy-side investment firms and corporations are listed below.

Web Crawled Data

“From companies’ internal analytics to satellite photos, scraped website data to tracking consumers’ locations, hedge funds and institutional investors are always looking for creative ways to get the edge in investments.”

~ CNBC reporter Eric Chemi

Web crawling is the automated downloading of readily available information online. Crawlers are used to extract data from search engines and e-commerce websites; afterward, unnecessary information is filtered out and only the required details are scraped. Web-crawled data can be obtained in different ways, such as crawling through emails, news, images, and social media.
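
To make the "download, then filter" idea concrete, here is a minimal sketch using only Python's standard library. The page content, the `name` and `price` class names and the `PriceExtractor` helper are all hypothetical; a real crawler would first download the page over HTTP and would need to handle far messier markup.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Keeps only the text of elements marked class="name" or class="price"."""

    def __init__(self):
        super().__init__()
        self._field = None  # field currently being read, if any
        self.rows = []      # one dict per product found

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return  # navigation, footers and whitespace are filtered out
        if self._field == "name":
            self.rows.append({"name": text, "price": None})
        elif self.rows:
            self.rows[-1]["price"] = float(text.lstrip("$"))
        self._field = None


# Hypothetical page content standing in for a downloaded e-commerce page.
PAGE = """
<html><body>
  <nav>Home | Deals | Contact</nav>
  <div class="item"><span class="name">Running shoes</span><span class="price">$89.50</span></div>
  <div class="item"><span class="name">Yoga mat</span><span class="price">$25.00</span></div>
  <footer>Terms of service</footer>
</body></html>
"""


def extract_prices(html: str) -> list:
    """Parse the page and return only the product rows."""
    parser = PriceExtractor()
    parser.feed(html)
    return parser.rows


print(extract_prices(PAGE))
```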

About 2.5 quintillion bytes of data are generated every day on the web. Tapping into this information pool equips hedge fund managers with the valuable insights they need to make smarter and more lucrative investment decisions.

Meanwhile, for organizations, data scraping has become a key tool for business development. According to the McKinsey Global Institute, data-driven organizations are 23 times more likely to acquire customers, 6 times more likely to retain customers, and 19 times more likely to be profitable. Leveraging this data enables enterprises to make more informed decisions, and data-driven businesses tend to outperform their peers with the insights they derive. By tracking consumer interactions and gaining an in-depth understanding of their behaviors, companies can improve their customer experience. This, in turn, will prolong the lifetime value of the client and thereby increase brand loyalty.

You may be wondering how hedge funds make money off trades by web scraping. Consider Twitter. Investors often use web scraping to quickly analyze millions of tweets to help determine which stocks to buy and sell.

In 2015, a 21-word tweet from Hillary Clinton criticizing a drug company’s pricing policy sparked a sharp sell-off for the group. The iShares Nasdaq Biotechnology ETF (IBB) dropped more than 4%, closing with a 4.5% loss and only a small subset of its components trading in positive territory.

The same thing happened when Clinton criticized private prisons in a single tweet, also in 2015, which resulted in the stocks of two top corrections companies dropping by 6% and 4% respectively.

It is therefore not surprising that investment firms are willing to invest considerable sums of money to access this type of data. According to Business Insider, hedge funds spend roughly US$2 billion on web scraping software to gain an edge in the markets.

Web Traffic and App Data

Any company with an online presence will have web traffic data. Web traffic data includes metrics ranging from the number of unique visitors to a website to the total number of visits. This allows users of the data to understand whether a particular website is growing and gaining traction, or struggling to attract new users. More on why web traffic data is an ideal alternative dataset to subscribe to can be read in an article by Similarweb.

In addition to web traffic data, there are engagement metrics such as the time spent on a particular website or page and the number of sessions each visitor initiated. Such metrics enable end users to evaluate the stickiness of the user base and perhaps change the UI/UX or copy of the website if the bounce rate is high.
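
The traffic and engagement metrics described above can be derived from a simple session log. The sketch below is a hedged illustration: the `Session` record, its field names and the sample data are hypothetical, and it assumes the common definition of bounce rate as the share of sessions that viewed only one page.

```python
from collections import namedtuple

# Hypothetical session record; real providers expose their own schemas.
Session = namedtuple("Session", ["visitor_id", "pages_viewed", "seconds_on_site"])


def traffic_summary(sessions):
    """Summarise a list of sessions into basic traffic and engagement metrics."""
    if not sessions:
        raise ValueError("need at least one session")
    visits = len(sessions)
    unique_visitors = len({s.visitor_id for s in sessions})
    # Assumption: a "bounce" is a session that viewed a single page.
    bounces = sum(1 for s in sessions if s.pages_viewed <= 1)
    return {
        "visits": visits,
        "unique_visitors": unique_visitors,
        "sessions_per_visitor": visits / unique_visitors,
        "bounce_rate": bounces / visits,
        "avg_seconds_on_site": sum(s.seconds_on_site for s in sessions) / visits,
    }


# Made-up sample data for illustration.
SESSIONS = [
    Session("v1", 1, 12),
    Session("v1", 4, 240),
    Session("v2", 1, 8),
    Session("v3", 6, 300),
]

print(traffic_summary(SESSIONS))
```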

Beyond web traffic data is app data which often complements a website’s offering. Metrics include both quantitative and qualitative data, from the number of downloads an app has, to how many times per day a user opens it on average. Popular use cases for app data could be for gaming, food delivery and streaming services. Combining both web and app data will paint a full digital picture of any business.

Social Sentiment and Product Reviews

Social media sentiment analysis tells you how people feel about your brand online. Emotions are the number one factor driving purchasing decisions, and with so many consumers sharing their thoughts and feelings on social media, it pays for brands to keep a pulse on how their products make people feel. Rather than a simple count of mentions or comments, sentiment analysis considers emotions and opinions. It involves collecting and analyzing information in the posts people share about the brand on social media.

Product reviews can help firms to decide whether to buy, sell or hold a stock. For example, Thinknum’s media outlet, the Business of Business, noted earlier this year that, before Peloton’s stock tumbled in the wake of a treadmill recall, the number of negative online reviews that included words such as “terrible”, “awful” etc. had shot up threefold – a clear signal to sell. 
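
A signal like the one in the Peloton example can be approximated by counting reviews that contain negative keywords and comparing two periods. This is a simplified sketch, not the method used by Thinknum: the keyword set contains only the two words quoted above, and the function names and sample reviews are hypothetical.

```python
NEGATIVE_WORDS = {"terrible", "awful"}  # the example words quoted above


def negative_count(reviews):
    """Number of reviews containing at least one negative keyword."""
    count = 0
    for text in reviews:
        words = {w.strip(".,!?\"'").lower() for w in text.split()}
        if words & NEGATIVE_WORDS:
            count += 1
    return count


def spike_ratio(previous_reviews, current_reviews):
    """Ratio of negative reviews now versus before; None if no baseline exists."""
    baseline = negative_count(previous_reviews)
    if baseline == 0:
        return None
    return negative_count(current_reviews) / baseline


# Made-up reviews for illustration.
last_month = ["Love it", "Awful belt noise."]
this_month = ["Terrible!", "awful, just awful", "Great product", "This is TERRIBLE"]

print(spike_ratio(last_month, this_month))
```

A ratio of 3 would correspond to the threefold increase described above. A production signal would normally also control for the total review volume in each period.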

Geolocation Data

With the increasing worldwide adoption of smartphones and the Internet of Things, geolocation data has enjoyed considerable growth over the years, and that growth is set to continue. The global low-power geolocation market is projected to grow from $24.88 billion in 2020 to $64.80 billion by 2025, at a CAGR of 21.10%.

Geolocation is a technology that uses data acquired from an internet-connected device (e.g., smartphone, computer, fitness tracker) to identify an individual’s geographical location, expressed as latitude and longitude. Geolocation data can be collected via GPS, Wi-Fi networks, cell towers and Bluetooth.

With geolocation data, one can clearly identify an individual’s physical address, including their exact longitude and latitude at a point in time. Companies can use geolocation data to track an individual’s movements to determine patterns and behaviors. An individual’s location data and the time they spent at each location can be cross-referenced, thereby revealing where they live, where they work, their daily schedules, which stores they frequent, their regional preferences, and even vacation times. Scary, right?
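
To illustrate how latitude and longitude pings translate into "which stores they frequent", the sketch below measures the great-circle distance between each ping and a store using the haversine formula, then lists the devices seen within a given radius. The coordinates, device IDs, the 50-metre radius and the function names are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def devices_near(pings, place, radius_m=50):
    """Set of device IDs with at least one ping within radius_m of place.

    pings is a list of (device_id, lat, lon); place is a (lat, lon) pair.
    """
    return {
        device_id
        for device_id, lat, lon in pings
        if haversine_m(lat, lon, place[0], place[1]) <= radius_m
    }


# Made-up store location and anonymised pings for illustration.
STORE = (22.2800, 114.1588)
PINGS = [
    ("a", 22.2801, 114.1588),  # roughly 11 m from the store
    ("b", 22.3000, 114.1588),  # roughly 2.2 km away
]

print(devices_near(PINGS, STORE))
```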

Given how geolocation information can reveal such intimate personal details about individuals, it is no surprise that the secondary location market is becoming a lucrative business, with the industry expected to have reached $350 million in 2020. For example, companies can purchase this anonymized location data or individual profiles for business purposes, such as for targeted advertising.

Credit Card Data

Credit card data is currently the highest-grossing category in the alternative data space and continues to grow in popularity. This data is considered highly accurate when the transaction panel is large (over 3 million) and covers a consistent user sample.

Credit card forecasts are positively related to actual revenue growth, and because credit card data is provided monthly with a six-day lag, it helps forecast earnings growth more quickly than traditional methods. GSAM notes that companies with greater credit card sales growth tend to exhibit stronger YoY sales growth in the following year. As such, it is not surprising that credit card data is fundamental to financial analysis for buy-side firms and investment banks. Understanding how consumers spend their money is one of the strongest indicators of economic performance.
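
Year-over-year (YoY) growth from a monthly spend series can be computed by comparing each month with the same month a year earlier. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def yoy_growth(monthly_spend):
    """YoY growth for each month that has a value 12 months earlier.

    monthly_spend is a list of monthly totals, oldest first.
    Returns a list of fractions (0.10 means 10% growth).
    """
    return [
        monthly_spend[i] / monthly_spend[i - 12] - 1
        for i in range(12, len(monthly_spend))
    ]


# Made-up panel spend: a flat first year, then two months of the second year.
spend = [100.0] * 12 + [110.0, 120.0]
print([f"{g:.0%}" for g in yoy_growth(spend)])
```

Comparing like-for-like months in this way removes most seasonal effects, which is one reason YoY figures are commonly used with transaction panels.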

Source: https://www.gsam.com/content/gsam/global/en/market-insights/gsam-insights/quantinomics/credit-card-data.html

Email Data

Emails contain a tonne of transactional information that can be extracted to derive useful insights. The data is accurate and is often used for retail revenue tracking and to understand customer purchasing behavior better. However, panels are typically smaller than credit/debit card panels and can be subject to biases depending on the nature of the email receipt collection (often via an opt-in email or rewards app). 

At Measurable AI, we specialize in granular email datasets and now own the biggest email receipt dataset for the emerging markets.

Our data has been backtested against actual reported revenues to ensure its accuracy. Moreover, unlike credit card data, email data has the added advantages of real-time availability and a high level of granularity (including SKU intel, geolocation data, items ordered, discounts applied, etc.).
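
One common way to run this kind of check, though not necessarily the one used here, is to correlate panel-derived sales with the revenues a company later reports. The sketch below computes a Pearson correlation by hand; the function name and the quarterly figures are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up quarterly figures: panel-observed sales vs. reported revenue.
panel_sales = [1.2, 1.5, 1.4, 1.9, 2.3]
reported_revenue = [120.0, 148.0, 145.0, 185.0, 232.0]

print(f"correlation: {pearson(panel_sales, reported_revenue):.3f}")
```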

While transactional email receipt data often exhibits a strong correlation with sales figures, non-transactional email activity data is also valuable: it helps brands improve and better target their marketing campaigns and monitor interest trends amongst their users.

Raw vs. Aggregated Data

Alternative data can either come in aggregated form or as a straight data feed through APIs. 

Data aggregation is a process where raw data is gathered and expressed in summary form for statistical analysis. Aggregation may be done manually or automatically through specialized software. After the data is aggregated and written to a view or report, you can analyze it to gain useful insights about particular resources or resource groups. Aggregated data is structured, less expensive and easier to work with. On the flip side, it has less alpha potential. Aggregated datasets may also suffer from selection bias, which means they aren’t truly representative.

A raw data feed, on the other hand, is considered much more valuable and allows for more ‘data digging’. Raw data is data collected directly from a source. It has not been subjected to processing or any other manipulation, and is often referred to as primary data.
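
The difference between the two forms can be seen in a few lines of code. In the hedged sketch below, the raw records, field names and `aggregate` function are hypothetical; the point is that the aggregated view keeps only per-merchant, per-month summaries and discards the order-level detail.

```python
from collections import defaultdict

# Made-up raw transaction records, one per order.
RAW = [
    {"merchant": "FoodApp", "month": "2022-01", "amount": 12.5},
    {"merchant": "FoodApp", "month": "2022-01", "amount": 7.5},
    {"merchant": "RideApp", "month": "2022-01", "amount": 30.0},
]


def aggregate(records):
    """Summarise raw records into order count, total and average per merchant-month."""
    totals = defaultdict(lambda: {"orders": 0, "total": 0.0})
    for record in records:
        key = (record["merchant"], record["month"])
        totals[key]["orders"] += 1
        totals[key]["total"] += record["amount"]
    return {
        key: {**value, "avg_order": value["total"] / value["orders"]}
        for key, value in totals.items()
    }


print(aggregate(RAW))
```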

Challenges in Using Alternative Data for Hedge Funds 

“Alternative data is the deepest, least utilized alpha source in the world today.”

~ Quandl, NASDAQ platform for alternative data

Given the informational edge that alternative datasets can offer, why aren’t more hedge funds utilizing them? The short answer is that doing so is challenging. An AIMA study reveals that a lack of infrastructure poses the biggest challenge when it comes to using alternative datasets. Amongst the market leaders in the hedge fund industry, backtesting alternative datasets represents the biggest headwind, followed by the ability to source and choose the right quality dataset.

With the universe of alternative datasets expanding so rapidly, many of them do not go back far enough in time for models to reveal patterns or capture signals. Moreover, significant resources, time and talent are needed to backtest each dataset.

While a decade ago there was a lack of alternative datasets to select from, advances in technology over the years have flipped the problem. There are more than 5,000 datasets being presented to buy-side firms these days. You can imagine that having the resources and processes to sift through these datasets and pick the right one is indeed a formidable task.

Even once the correct dataset(s) have been identified, there are more concerns for hedge funds to take into account: data governance, the breadth of data coverage, the robustness of its source, permanency risk (how far into the future they can expect to be using that dataset) and how malleable the data is (i.e. whether it can be mapped to fixed references, such as SEDOLs or CUSIPs). These are all valid questions hedge funds face when presented with new datasets that could be useful for investment research.
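
Checking how well a dataset maps to fixed references often starts with a simple coverage test: normalise the entity names in the dataset, look them up in a reference table and see what is left unmapped. The sketch below is illustrative only; the company names, the placeholder identifiers (which are not real SEDOLs or CUSIPs) and the function names are all hypothetical.

```python
def normalise(name):
    """Lower-case a name and strip commas, full stops and extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def map_to_identifiers(names, lookup):
    """Return (mapped, unmapped, coverage) for names against a reference lookup."""
    mapped, unmapped = {}, []
    for name in names:
        identifier = lookup.get(normalise(name))
        if identifier is None:
            unmapped.append(name)
        else:
            mapped[name] = identifier
    coverage = len(mapped) / len(names) if names else 0.0
    return mapped, unmapped, coverage


# Placeholder reference table; the identifiers are invented for this example.
REFERENCE = {"acme corp": "ID-001", "globex inc": "ID-002"}
DATASET_NAMES = ["ACME Corp.", "Globex, Inc", "Initech"]

mapped, unmapped, coverage = map_to_identifiers(DATASET_NAMES, REFERENCE)
print(mapped, unmapped, f"{coverage:.0%}")
```

Low coverage is an early warning that a dataset will need substantial entity-mapping work before it can be joined to prices or fundamentals.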

“Data scientists [are] becoming the new quantitative hedge funds” 

~ Greenwich Associates

When selecting and digesting alternative datasets, another hurdle arises: finding the right talent to do so. To effectively extract value from alternative datasets, the right people and technology need to be in place, such as natural language processing, machine learning (ML) and broader artificial intelligence (AI) platforms that can be used to transform alternative data into practical insights.

For now, responsibility for selecting alternative datasets falls largely on the investment team. Only 15% of hedge funds have a dedicated Chief Data Officer to provide oversight for alternative data. That said, the industry is changing and there is a trend towards hiring more data scientists and data engineers.

When working with hedge funds at Measurable AI, we have learnt that for hedge funds to source, integrate and extract insights from our alternative dataset, all the different teams within their business need to work together in sync, i.e. investment teams, data scientists, risk professionals, and compliance and legal teams.

Alternative Data Needs to be Utilized More

As remarked by Rayne Gaisford, Head of Data Strategy and Equity Research at Jefferies, the financial services industry is just starting to scratch the surface when it comes to leveraging alternative data.[3] Many firms are still adjusting to using alternative data as a primary source of information instead of as a supplemental afterthought.

We believe the industry still has a long way to go before it reaches maturity. Due to the breadth and uniqueness of alternative data sets, there are a multitude of ways different alternative datasets can be incorporated into the due diligence and investment process by hedge funds and private equity funds to better aid their quest for generating alpha. Corporations are also increasingly adopting new datasets to strengthen their business intelligence, to better understand their client behavioral patterns, streamline their operational efficiencies and improve their marketing campaigns to generate more sales.   

As such, there remains enormous potential for alternative data to become more mainstream as firms widen their budgets to factor in these new sources of data. Until then, we remain steadfastly focused on growing our e-receipt panel across the emerging markets.


About Us

Michelle Tang is a digital growth marketer and manages strategic partnerships for Measurable AI with a passion for data analytics. She enjoys storytelling with insights and trends derived from Measurable AI’s very own e-receipts data.

You can reach her at [email protected].

Measurable AI provides actionable consumer insights based on billions of alternative data points for emerging markets.

*The content is for informational purposes only; you should not construe any such information or other material as investment advice. Prior written consent is needed for any form of republication, modification, repost or distribution of the contents.

