Cracking the Code

Investment Management Tackles Data Science

June 19, 2019


Data science, artificial intelligence (AI) and machine learning are hot topics in investment management, as firms look for new ways to model investing problems and generate differentiated insights. But data science is also relatively new to the industry—and that means growing pains. What does it take to get it right?

Investment managers are increasingly looking beyond simple linear regression to more advanced ways of modeling investing problems. This has led them to data science: a multidisciplinary field that applies scientific methods to extract insights from data. Data science includes AI, in which machines imitate human decision-making; machine learning is one application of AI to solving problems. Big data, of course, is the raw input.

But investment managers must overcome several challenges before they can take a big step forward in data science, which is a complicated and fragmented landscape. Many vendors promise great things, but because the industry is still young and has few standards, investment managers need to vet data sets, technologies, platforms and partners carefully.

An Age-Old Investment Challenge in a New Age

The basic premise of investment management hasn’t changed in decades; investors still try to beat the market by finding unique data and synthesizing it more quickly and more effectively than others. But there’s a lot more data today, and it’s arriving faster and in more diverse formats.

We have better tools and techniques today, too: faster algorithms, better machines, open-source code and cloud-based technologies. But can AI and machine learning tools synthesize financial data better and allow us to extract faster and more differentiated insights? Not necessarily. We need to dig deeper to understand which data and which techniques will advance our investment processes. The key is still to identify the right investment controversies that will drive asset performance.

That’s one of the major focuses at our firm right now in our centralized data-science team: we know the outcome we’re driving toward—solving key investment issues—and we know the right questions to ask. Our fundamental and quantitative analysts are domain experts, so we collaborate with them and they guide our data exploration and modeling research.

Applying Big Data and Data Science

Big data has the potential to change the way investment managers look at companies.

Research teams traditionally burrow through company statements and engage with management teams to assess a firm’s prospects, maybe even talking with customers and suppliers to gain insights from across the value chain. In assessing sales, the emphasis has been on estimates, because companies release financial statements weeks after quarter-end; there’s no real-time public information from that channel.

Today, we can tap a much broader data universe to build a better mosaic around a company. If we’re researching a consumer-products firm, we can crawl websites to gauge its popularity, pricing strategies and inventory levels. How positive are a company’s consumer reviews? How many people are searching for its products? We can zero in on its online business by combing through email receipts or credit card transactions. Cell-phone data can help track foot traffic to its brick-and-mortar locations.

Potential Beyond Solving Investment Problems

Investors can also use machine learning to develop better algorithms that help portfolio managers and traders decide what to trade, when to trade it and where to trade it. By continuously evaluating feedback from trades, these algorithms can be adjusted dynamically to execute transactions at the best prices for clients.
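The feedback loop described above can be illustrated with a simple multi-armed bandit. The sketch below is an illustrative assumption, not the firm's trading algorithm: a hypothetical `VenueSelector` uses an epsilon-greedy rule to route orders to the venue with the lowest average observed slippage, while occasionally exploring other venues so its estimates stay current. Venue names and cost figures are made up.

```python
import random


class VenueSelector:
    """Hypothetical epsilon-greedy router: usually picks the venue with the
    lowest average observed slippage, but explores at random some of the time."""

    def __init__(self, venues, epsilon=0.1, seed=None):
        self.venues = list(venues)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {v: 0 for v in self.venues}
        self.avg_slippage = {v: 0.0 for v in self.venues}

    def choose(self):
        # Try every venue once before relying on the averages.
        untried = [v for v in self.venues if self.counts[v] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon; otherwise exploit the cheapest venue.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.venues)
        return min(self.venues, key=lambda v: self.avg_slippage[v])

    def record_fill(self, venue, slippage_bps):
        # Feedback step: incremental running mean of slippage for this venue.
        self.counts[venue] += 1
        n = self.counts[venue]
        self.avg_slippage[venue] += (slippage_bps - self.avg_slippage[venue]) / n


# Simulated feedback: each venue has a hidden average cost in basis points.
true_cost = {"VENUE_A": 2.0, "VENUE_B": 1.0, "VENUE_C": 3.0}
noise = random.Random(0)
router = VenueSelector(true_cost, epsilon=0.1, seed=1)
for _ in range(2000):
    venue = router.choose()
    router.record_fill(venue, true_cost[venue] + noise.gauss(0, 0.5))

best = min(router.avg_slippage, key=router.avg_slippage.get)
print(best, router.counts)
```

Epsilon-greedy is only one of many bandit rules; a real execution algorithm would also need to account for order size, market impact and venue fees, which this sketch ignores.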

Risk-management applications have potential, too. Imagine an automated risk manager that can systematically crawl through a targeted set of data and information sources around the world, process the findings, and highlight specific strategies and holdings that might be at risk from overnight developments. Those timely insights can then be passed on to the risk-management team to discuss and debate with the investment teams. Data science can also help us understand organizational risks, including monitoring for anti–money laundering compliance and offering insights on the impact of new regulations.
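A toy version of such an overnight scan might look like the following. It assumes each holding can be described by a name plus a hypothetical list of `watch_terms` (brands, products, countries), and that overnight news arrives as plain-text headlines; a production system would need entity recognition and source vetting rather than simple keyword matching.

```python
import re
from dataclasses import dataclass


@dataclass
class Holding:
    ticker: str
    name: str
    strategies: list          # strategies that hold this position
    watch_terms: tuple = ()   # hypothetical extra terms to monitor


def flag_overnight_risks(holdings, headlines):
    """Return {ticker: {"strategies": [...], "headlines": [...]}} for every
    holding whose name or watch terms appear as whole words in a headline."""
    flags = {}
    for h in holdings:
        terms = (h.name,) + tuple(h.watch_terms)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        hits = [text for text in headlines if pattern.search(text)]
        if hits:
            flags[h.ticker] = {"strategies": h.strategies, "headlines": hits}
    return flags


# Hypothetical holdings and headlines, for illustration only.
holdings = [
    Holding("ACME", "Acme Corp", ["US Growth", "Global Core"], ("anvil",)),
    Holding("GLBX", "Globex", ["Global Value"]),
]
overnight = [
    "Regulator opens probe into Acme Corp accounting",
    "Anvil recall widens across Europe",
    "Central bank holds rates steady",
]
flags = flag_overnight_risks(holdings, overnight)
for ticker, info in flags.items():
    print(ticker, info["strategies"], info["headlines"])
```

The output groups flagged headlines by holding and lists the affected strategies, which is the kind of summary a risk team could take into a discussion with the investment teams.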

The Structural Question: Centralized, Distributed or Hybrid?

As a group, most large investment managers have been later to the game than large, systematic hedge funds in pursuing and adopting data science. And frankly, the efforts so far have mostly been hit or miss. As we see it, data science for investment managers at this stage is a game of iteration. A company tries something, invests in it and sees if it works. If it fails, the firm unwinds the initiative and tries again.

Today, the industry seems very much to be in its second iteration. Some firms are opening data-science centers. Others are building centralized data-science teams to pursue impact across the firm, which requires scale. Teams may include dozens of people looking for, ingesting, cleaning and analyzing big data.

Centralization makes it easier to create uniform standards and share insights broadly, but it can be hard to satisfy individual portfolio teams competing for time and bandwidth. And there’s the inevitable worry that portfolio teams will poach the best analysts.

Other firms take a hybrid approach—the model we’re using. A centralized team focuses on firmwide standards, insights and initiatives. And embedded data scientists focus on business unit–specific projects. In our view, the centralized team is a center of excellence for the firm, sharing insights and providing expertise where needed to help different investment teams on their journeys.

A Data-Science Shakeout Period Is Looming

There’s a lot of spending on data science among large investment managers, but that doesn’t guarantee that money is being applied to the right priorities or structures. There’s still a lot of hype relative to substance, which we expect to continue for the next few years.

A shakeout period is likely to follow. Some firms will become less enamored with data science for two key reasons. First, alternative data sets are very hard to work with, and if everyone’s doing the same thing, it can seem harder to be different. Second, some machine learning and AI techniques don’t apply to our investment problems; these techniques will fail, and people won’t be able to explain why. We’ve seen limitations in our own research with machine learning techniques: financial data have an extremely low signal-to-noise ratio.
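To see why a low signal-to-noise ratio is so limiting, consider a stylized single-signal model (an illustrative assumption, not the authors' model) in which a return y depends on a signal x plus noise ε that is uncorrelated with x. Because the two variance components add, the share of return variance the signal can explain is:

```latex
y = \beta x + \varepsilon, \qquad \operatorname{Cov}(x, \varepsilon) = 0

\mathrm{SNR} = \frac{\beta^2 \sigma_x^2}{\sigma_\varepsilon^2}, \qquad
R^2 = \frac{\beta^2 \sigma_x^2}{\beta^2 \sigma_x^2 + \sigma_\varepsilon^2}
    = \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}
```

For example, an SNR of 0.01 gives R² = 0.01/1.01 ≈ 0.0099, a correlation of about 0.1. Even a genuine signal of that strength explains roughly 1% of the variation in returns, leaving ample room for noise to masquerade as signal in a flexible model.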

So, while the theoretical case for using data science in asset management is certainly compelling, plenty of cultural and practical hurdles remain before the industry can take a major leap forward. Let’s take a look at some of the key hurdles and how investment managers can work to clear them.

Harnessing Big Data: A Make-or-Break for Data Science

Models are only as good as the data they’re pointed toward. It may be surprising to some, but data quality generally isn’t great. Many investment managers are working with any number of internal legacy systems; it’s not easy to integrate data spread across the firm, harness it and develop insights.

Data can arrive in different formats and may lack the IDs needed to match records across systems. Duplicates have to be identified and removed. Data-quality issues slow projects down, and many firms (including ours) are finding that they have to tackle this issue before doing any predictive analytics.
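When two systems share no common identifier, a common first step is to build a normalized matching key and join on it. The sketch below assumes company names are the only shared field; the legal-suffix list and normalization rule are illustrative assumptions, and real entity resolution typically adds reference identifiers and fuzzy matching.

```python
import re
from collections import defaultdict

# Illustrative, not exhaustive, list of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                  "company", "ltd", "plc", "llc"}


def normalize_name(name):
    """Crude matching key: lowercase, replace punctuation with spaces,
    and strip trailing legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_records(system_a, system_b):
    """Join two lists of (record_id, company_name) on the normalized key.
    Returns matched id pairs and the keys duplicated within either system."""
    index_a, index_b = defaultdict(list), defaultdict(list)
    for rec_id, name in system_a:
        index_a[normalize_name(name)].append(rec_id)
    for rec_id, name in system_b:
        index_b[normalize_name(name)].append(rec_id)
    matches = [(a, b) for key, a_ids in index_a.items()
               for a in a_ids for b in index_b.get(key, [])]
    duplicates = {key for index in (index_a, index_b)
                  for key, ids in index.items() if len(ids) > 1}
    return matches, duplicates


system_a = [("A1", "Acme Corp."), ("A2", "Globex, Inc."), ("A3", "ACME CORPORATION")]
system_b = [("B7", "Acme"), ("B9", "Initech LLC")]
matches, duplicates = match_records(system_a, system_b)
print(matches, duplicates)
```

Here "Acme Corp." and "ACME CORPORATION" collapse to the same key, so they both match "Acme" in the second system and are also flagged as a likely duplicate within the first.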

Alternative data sets also tend to come in formats that are hard to consume. How do you match a cell-phone location to a store location, and then match that store to a tradable asset? And newer data sets lack the history that data scientists need to normalize them and make them useful: What types of trends should we expect from a data set?
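One simple way to handle the first link in that chain is to assign each location ping to the nearest store within a small radius, using the haversine great-circle distance, and then look up which company operates the store. The store coordinates, the `store_to_ticker` mapping, the placeholder tickers and the 75-meter radius below are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def attribute_ping(ping, stores, store_to_ticker, max_distance_m=75):
    """Map a (lat, lon) ping to the nearest store within max_distance_m,
    then to the ticker of the company operating it; None if no store is close."""
    lat, lon = ping
    best_store, best_dist = None, float("inf")
    for store_id, (s_lat, s_lon) in stores.items():
        d = haversine_m(lat, lon, s_lat, s_lon)
        if d < best_dist:
            best_store, best_dist = store_id, d
    if best_store is None or best_dist > max_distance_m:
        return None
    return store_to_ticker.get(best_store)


stores = {"store_17": (40.7580, -73.9855), "store_42": (40.7484, -73.9857)}
store_to_ticker = {"store_17": "TICKER_A", "store_42": "TICKER_B"}
print(attribute_ping((40.7581, -73.9854), stores, store_to_ticker))
print(attribute_ping((40.7530, -73.9856), stores, store_to_ticker))
```

The first ping lies a few meters from `store_17` and is attributed to its operator; the second falls between the two stores, several hundred meters from each, and is left unattributed rather than guessed.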

Finally, while big data is “big,” it’s rarely comprehensive. For example, we can get millions of data points on cell-phone locations, but we won’t capture all of the customers shopping at all of a retailer’s stores. We need to use statistical techniques to extrapolate insights from the samples we have.
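One of the simplest such techniques is a ratio estimator: measure what fraction of reported revenue a data panel captured in past quarters, then scale the current panel total by that fraction. The figures below are invented, and the method assumes the panel's coverage stays stable over time, which shifts in panel composition can easily violate.

```python
def estimate_coverage(panel_history, reported_history):
    """Ratio-of-sums estimate of the share of true revenue the panel captures."""
    return sum(panel_history) / sum(reported_history)


def extrapolate_revenue(panel_current, coverage):
    """Scale the current panel total up to an estimate of total revenue."""
    return panel_current / coverage


# Hypothetical figures in $ millions: panel-observed vs. reported revenue.
panel_history = [11.8, 12.4, 12.0]
reported_history = [395.0, 410.0, 400.0]

coverage = estimate_coverage(panel_history, reported_history)  # about 3%
estimate = extrapolate_revenue(13.1, coverage)
print(f"coverage {coverage:.2%}, estimated revenue {estimate:.1f}M")
```

Using the ratio of sums rather than an average of quarterly ratios gives larger quarters proportionally more weight; either way, the estimate is only as reliable as the assumption that coverage is stable.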

How can investment managers make progress on these data challenges? We must treat data as an asset vital to success and invest in the required ingredients. At our firm, we’ve hired data architects, strategists, stewards and analysts to improve and simplify data acquisition, cleansing, storage, analysis and publication. Accessible, accurate, cataloged and timely data form the foundation of any successful data-science effort.

Tackling the Organizational Challenges to Data-Science Progress

Beyond big data, the ability to address other factors—both structural and cultural—will determine how successful investment managers will be in making data science an integral part of their research and investment processes.

Lead from the top, and make collaboration the mantra. Not everyone is embracing data science, including many industry leaders. A recent KPMG survey of asset-management CEOs found that more than half of them trust historical data over predictive analytics, and two-thirds trust their own intuition and experience more than modeled insights.* If data science doesn’t have support from the very top of an organization, success is likely beyond reach. Leadership has to include data science as part of a firm’s strategic goals.

Leaders may have to evolve a firm’s culture to better encourage a collective effort. Companies with collaborative research cultures may have a head start, but management still has to clearly demonstrate a commitment to data science—and a top-down mandate to support work happening on the ground level.

Break down traditional business silos. Data science sometimes fails because structure and business or functional silos get in the way. To succeed, firms need to integrate the efforts of a very broad group of stakeholders with diverse skill sets: portfolio managers, quantitative analysts, investment technologists and even operations teams, which are responsible for data.

It’s not a trivial exercise to get everyone on the same page, especially in large businesses where reporting lines, business unit definitions and strategic priorities can work against a seamless integration. Offering incentives to collaborate across traditional business silos is one way to encourage these partnerships.

Demonstrate key wins to generate enthusiasm. From our firm’s perspective, the combination of human and machine is an extension of integrating fundamental and quantitative analyses, which we’ve been doing for some time. But it was still important to demonstrate early key wins to generate momentum.

A few years ago, we used social media to augment traditional polls in forecasting elections. We found that actual outcomes were better predicted by a combination of both perspectives. We’ve also tapped consumer review data to gauge the prospects of new products like the iPhone X and Subaru Ascent. And we’ve researched machine learning models to understand where they can have an impact.
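The article doesn't say how the poll and social-media signals were combined. One standard way to combine two independent, unbiased forecasts is inverse-variance weighting, sketched here with made-up numbers as an assumption rather than the method used in the election study: each forecast is weighted by the inverse of its squared standard error, so the more precise one counts for more.

```python
def inverse_variance_blend(estimates):
    """Combine independent, unbiased estimates given as (value, std_error)
    pairs. Returns (blended_value, blended_std_error)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, (1.0 / total) ** 0.5


# Hypothetical vote-share forecasts (%): a poll and a social-media model.
poll = (52.0, 2.0)
social = (49.0, 3.0)
blended, blended_se = inverse_variance_blend([poll, social])
print(f"blended forecast {blended:.2f}% +/- {blended_se:.2f}")
```

The blended standard error is smaller than either input's, which is the statistical reason a combination of two reasonably independent perspectives can beat either alone; if the two sources' errors are correlated, the gain shrinks and these weights are no longer optimal.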

Demystify the black box syndrome. “Black box” isn’t necessarily an evil term, but the inability to interpret results can hinder the adoption of data-science techniques. Investment teams that don’t fully understand a process are less likely to trust it. That’s why interpretability is important for new tools and algorithms.

Various techniques can explain the signals a model is producing. They’re not necessarily perfect, but they can help data scientists explain most of a model’s signals, such as why it’s picking a certain stock, country or currency. The industry is developing better tools to remove the black-box stigma, but it’s still a hurdle.
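Permutation importance is one widely used, model-agnostic technique of this kind (others include SHAP values and partial-dependence plots). The sketch below, using a hypothetical stand-in model and toy data, shuffles one input feature at a time and measures how much the prediction error rises; the features whose shuffling hurts most are the ones driving the model's signals.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """For each feature column, average increase in MSE after shuffling that
    column, which breaks its link to the target while keeping its distribution."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: feature 0 drives the target strongly, feature 1 barely matters.
X = [(i % 10, (i * 7) % 5) for i in range(50)]
y = [3 * a + 0.1 * b for a, b in X]


def model(row):
    # Hypothetical stand-in for a fitted model.
    return 3 * row[0] + 0.1 * row[1]


importances = permutation_importance(model, X, y)
print(importances)
```

Because it only needs predictions, the same procedure works for any model, though it can mislead when features are strongly correlated with one another.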

From our experience, one way to help clear that hurdle is to emphasize creating models that have solid economic underpinnings and are fundamentally intuitive, so that we’re not solely reliant on what the data say. As we’ve mentioned, data can be noisy and lead to false insights or correlations.

Don’t make every project a nail with data science as the hammer. Experts have to guard against being too eager in deploying data-science tools. There’s a clear rule of thumb: don’t let the technique drive your research approach; let the question drive it. If it seems that machine learning can help—the iPhone study, for instance—by all means, use it.

But for problems with relatively few data points, simple linear regression will do. Using more complex data-science techniques instead would likely produce an overcomplicated model that may perform poorly out of sample. As we move forward, many of us will encounter inappropriate uses of machine learning, and we should point them out.
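A small simulation illustrates the point. With eight noisy observations of a linear relationship, ordinary least squares is compared with a degree-7 polynomial that passes through every training point, a deliberately extreme stand-in for an overcomplicated model. The data, noise values and "true" relationship are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolating_polynomial(xs, ys):
    """Degree n-1 polynomial through every training point (Lagrange form):
    zero in-sample error, i.e. a deliberately overfit model."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def true_f(x):
    return 1.0 + 0.5 * x  # hypothetical true relationship


train_x = list(range(8))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]  # fixed for reproducibility
train_y = [true_f(x) + e for x, e in zip(train_x, noise)]
test_x = [0.5, 2.5, 4.5, 6.5, 7.5, 8.0]  # unseen points, two beyond the range


def out_of_sample_mse(model):
    return sum((model(x) - true_f(x)) ** 2 for x in test_x) / len(test_x)


linear_mse = out_of_sample_mse(fit_line(train_x, train_y))
poly_mse = out_of_sample_mse(fit_interpolating_polynomial(train_x, train_y))
print(f"linear: {linear_mse:.4f}   degree-7 polynomial: {poly_mse:.1f}")
```

The polynomial fits the training data perfectly yet misses badly out of sample, especially just beyond the training range, while the straight line stays close to the true relationship.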

Summing It All Up

There’s a lot of promise in the use of data science in the investment-management industry and some early successes. We’re confident that the human-plus-machine model can work and ultimately be more effective than either on its own. Most firms—except for some quantitative shops—are integrating the two.

At our firm, we consider the domain expertise of industry and quantitative analysts as vital in narrowing down the areas to research, whether by identifying investment controversies or high-potential data sets. Data scientists are sourcing data sets and researching advanced modeling techniques to find new relationships in data. The intersection of these efforts is most likely to result in differentiated insights.

But those of us in the investment-management industry have to overcome these hurdles to survive the coming shakeout period and keep advancing data science. Humans must be more open to machine learning and AI. Even skeptics would likely admit that these techniques are a good starting point for helping us find better stocks or bonds. But there’s a long way to go before the industry truly embraces data science.

One thing is for sure: we certainly can’t ignore it altogether.

*KPMG 2018 Global CEO Outlook

The views expressed herein do not constitute research, investment advice or trade recommendations and do not necessarily represent the views of all AB portfolio-management teams.

About the Authors