How To Front-Run Financial Markets With Your Web Browser

Early access to financial data can mean big profits. Imagine being able to learn a company’s quarterly financial results, or macroeconomic statistics, before everyone else. Foreknowledge of the results means being able to take a position before the market moves in response to everyone else becoming similarly informed. Until now, it took a white-shoe country-club membership or buddies who hang out at the Eccles Building in Washington, DC. What if all it took was a web browser?

At least one financial-sector news organization is now obtaining and publicizing financial reports prior to their official release, using a website content enumeration technique originally pioneered by security testers and attackers. Thanks to the (bad) design decisions made in the development of many CMSes (content management systems) and the processes used by the organizations deploying them, a confluence of factors leads to unintended information exposure through enumeration within a limited time window. Later, the same information exposure simply becomes the intended information release.

Mashable recently published a story about a financial reporting and analytics firm, Selerity, using these simple but too-often effective techniques to retrieve and publish Twitter’s earnings report prior to its official release.

Some relevant definitions:

  • Front-running is the practice of dealing on advance information provided by insiders before their clients or the public have been given that information. While it’s generally illegal, public information that is being further distributed publicly doesn’t fit the criteria set by the Securities Exchange Act, the Investment Company Act, and related rules. Under the CFAA as currently interpreted, it’s a trickier question.
  • Forced browsing is an attack where the aim is to enumerate and access resources that are not referenced by the application but are still accessible (a minimal sketch of the idea follows this list).
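
To make the second definition concrete, here is a minimal Python sketch of forced browsing: it requests a handful of candidate paths that aren’t linked anywhere on the site and reports which ones respond successfully. The target URL and path list are invented for illustration, not taken from any real site.

```python
# Minimal forced-browsing sketch: probe candidate paths that are not
# linked from the application and report which ones actually exist.
# The target URL and path list below are hypothetical.
import urllib.request
import urllib.error

BASE_URL = "https://example.com"            # hypothetical target
CANDIDATE_PATHS = [
    "/admin/",
    "/backup.zip",
    "/reports/q1-earnings.pdf",             # unreferenced, but maybe present
]

def probe(base_url, paths):
    for path in paths:
        url = base_url + path
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                print(f"{resp.status} {url}")    # 200 means it exists and is readable
        except urllib.error.HTTPError as err:
            print(f"{err.code} {url}")           # 404, 403, etc.
        except urllib.error.URLError as err:
            print(f"ERR {url} ({err.reason})")

if __name__ == "__main__":
    probe(BASE_URL, CANDIDATE_PATHS)
```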

Once investors learned the reality of Twitter’s earnings report, its stock dove 18%. The Mashable article reported,

Although the company managed to briefly halt trading, and says that it is investigating the source of its leak, it seems likely that what happened was bad information management on Twitter’s part. On its earnings call with investors, Twitter made sure to put the onus of the blame for the situation on the Nasdaq-owned Shareholder.com, the company it pays to maintain its investor relations page.

Selerity maintains that it didn’t hack into anything; it simply found the information on Twitter’s own public investor relations website.

Predictability is a problem that plagues many systems, in many forms. I discussed some of them in the FuzzDB docs:

Software standardization means that predictable resource locations are the norm. Platforms like IIS, ColdFusion, and Apache Tomcat store files that are known to leak information about system configuration in predictable places. Because of the popularity of a small number of package managers, log, configuration, and password files for popular software platforms are likely to be stored in a small number of places. Lists of platform-categorized web scripts that have been mentioned in a vulnerability database, lists of login page names from popular applications, all known compressed file type extensions, and countless other data elements can be leveraged to turn “brute force” into a highly targeted discovery tool.
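
As a small illustration of how such lists get leveraged, the sketch below combines a few base names typical of common platforms with well-known backup and archive suffixes to produce a targeted candidate list. The hard-coded values are illustrative only; real tooling would read them from wordlists such as the ones FuzzDB ships.

```python
# Illustrative only: build a targeted list of predictable resource names
# by combining common base names with well-known backup/archive suffixes.
# Real tooling would read these from wordlists (e.g., FuzzDB) rather than
# hard-coding them.
from itertools import product

BASE_NAMES = ["web.config", "config.php", "database.yml", "backup"]   # illustrative
SUFFIXES = ["", ".bak", ".old", ".zip", ".tar.gz", "~"]               # illustrative

def candidate_paths(base_names, suffixes):
    for name, suffix in product(base_names, suffixes):
        yield f"/{name}{suffix}"

if __name__ == "__main__":
    for path in candidate_paths(BASE_NAMES, SUFFIXES):
        print(path)
```

Each generated path can then be fed to a probe like the one sketched earlier, turning “brute force” into a short, targeted list of requests.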

It’s just as commonly a problem for the file and other resource names and HTTP request data formats chosen for website resources. In a nutshell, Shareholder.com runs a CMS that loads content in advance but doesn’t publish the link to it until the appropriate time. Since, as the Mashable article explained, the request format was predictable, all it took to see the earnings reports sooner was making the request after the data had been loaded but before the link was published.
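
To see why that window is so easy to exploit, suppose, purely hypothetically, that quarterly reports are published at URLs built from the ticker, year, and quarter. (This template is invented; the real format used by any given investor-relations CMS will differ.) Then the next report’s location can be computed before it is ever linked:

```python
# Hypothetical illustration of a predictable request format for quarterly
# results. The URL template below is invented; the real format used by any
# given investor-relations CMS will differ.
def earnings_url(ticker: str, year: int, quarter: int) -> str:
    template = "https://ir.example.com/{t}/releases/{y}-Q{q}-results.pdf"
    return template.format(t=ticker.lower(), y=year, q=quarter)

if __name__ == "__main__":
    # Anyone can compute the expected location of the next, as-yet-unlinked report.
    print(earnings_url("TWTR", 2015, 1))
    # -> https://ir.example.com/twtr/releases/2015-Q1-results.pdf
```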

I have to admit, I’ve thought about doing this before[0] with a job scheduler, some shell scripts wrapping wget, and monitoring the results for HTTP 200 success responses. Which leads me to conclude that if Selerity is doing this and publishing its results, there must be 100 or 1,000 others doing it and keeping their results to themselves.
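
For what it’s worth, a rough Python equivalent of that scheduler-plus-wget idea might look like the sketch below: poll the expected URL on a schedule and flag the moment it starts answering with HTTP 200. The URL (the same hypothetical one as above) and polling interval are placeholders.

```python
# Rough sketch of the scheduler-plus-wget idea: poll a predictable URL and
# report the moment it starts answering with HTTP 200. The URL and interval
# are placeholders; in practice you'd run something like this from cron.
import time
import urllib.request
import urllib.error

TARGET_URL = "https://ir.example.com/twtr/releases/2015-Q1-results.pdf"  # hypothetical
POLL_SECONDS = 30

def poll_until_published(url, interval):
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    print(f"{url} went live at {time.ctime()}")
                    return resp.read()
        except urllib.error.HTTPError:
            pass          # 404 until the CMS stages the file
        except urllib.error.URLError:
            pass          # transient network error; keep trying
        time.sleep(interval)

if __name__ == "__main__":
    poll_until_published(TARGET_URL, POLL_SECONDS)
```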

How do you defend against this?
Anti-automation controls are relatively easy to bypass, and single-use hashes returned by a script can be gathered with automation and replayed into a subsequent request.
The only solution is to apply the control at the closest point possible: don’t load the data until the embargo time is up. If it’s not there, it can’t be retrieved. (One way to structure that is sketched below.)
But first, threat model your applications. Think about the data you’re seeking to protect, understand who it might have value to and why, and, based on the risk you’re willing to accept and how the tradeoffs align, apply protections like the one described above to the application or business process. Doing this well takes knowledge of how systems are attacked. Without that knowledge, it’s too easy to design ineffective controls.
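
To make the “don’t load it until the embargo is up” control concrete, here is one possible shape for it: content stays in a staging directory the web server cannot reach, and a scheduled publisher only moves it into the document root once the embargo timestamp has passed. The paths, filename, and embargo time below are illustrative placeholders, not any real system’s configuration.

```python
# One possible shape for the embargo control described above: staged files
# live outside the web server's document root and are only moved into it
# once their embargo time has passed. Paths and times are placeholders.
import shutil
from datetime import datetime, timezone
from pathlib import Path

STAGING_DIR = Path("/srv/ir-staging")        # not reachable by the web server
WEB_ROOT = Path("/srv/www/ir/releases")      # publicly served

# filename -> embargo time (UTC); in practice this would come from the CMS
EMBARGOES = {
    "2015-Q1-results.pdf": datetime(2015, 4, 28, 20, 5, tzinfo=timezone.utc),
}

def publish_due_files(now=None):
    """Move staged files into the web root only after their embargo passes."""
    now = now or datetime.now(timezone.utc)
    for name, embargo_at in EMBARGOES.items():
        staged = STAGING_DIR / name
        if now >= embargo_at and staged.exists():
            shutil.move(str(staged), str(WEB_ROOT / name))
            print(f"published {name} at {now.isoformat()}")

if __name__ == "__main__":
    # Run this from a scheduler every minute; before the embargo, the file
    # simply does not exist anywhere the web server can serve it from.
    publish_due_files()
```

The point of this design is that there is nothing to enumerate: until the publisher runs, no amount of URL guessing or polling can return anything but a 404.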

0. Thought about it, but ultimately decided against it due to the interpretation of and open questions about the CFAA in the case of Andrew “Weev” Auernheimer, particularly the courts’ conflation of brute-forcing openly accessible data with unauthorized access. The upside is good, but the risk of being another CFAA test case is not worth it.