“Alternative data” describes forms of data relevant to the investing process that are not “traditional” types of information. While traditional information (such as company disclosures in SEC filings or other formal public statements) has long been the focus of investing professionals, recent years have brought an explosion of interest in alternative data sources such as web scraping, credit card panel data, and satellite imagery. A 2017 report from Deloitte, for example, predicted that alternative data would “transform active investment management” by 2022, arguing that funds that fail to use alternative data may fall behind “competitors that effectively incorporate alternative data into their securities valuation and trading signal processes.” Greenwich Associates, a consulting firm, likewise estimated in 2017 that investment firms’ total spending on alternative data had reached approximately $300 million.
Although there have been no publicly reported securities lawsuits or SEC enforcement actions relating to investment firms’ use of alternative data, investors and their legal advisors have been wary of potential exposure to legal risks from their collection or purchase of alternative data. Perceived risks relating to the use of such data include potential violation of laws relating to computer usage, consumer privacy, and trading while in possession of material non-public information (MNPI). The risk of liability under Section 10(b) of the Exchange Act and SEC Rule 10b-5 is particularly sharp regarding the use of web-scraped data.
Web scraping, perhaps the most common source of alternative data for investment firms today, is the automated extraction of information from web pages. This is accomplished through the use of specialized programs that access a web page and collect specified data, often iterating this process across many pages of a website.
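For readers less familiar with the mechanics, the process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not any particular firm's tooling: the `PAGES` dictionary stands in for a website (a real scraper would fetch pages over HTTP), and the page paths, the `price` class, and the `PriceParser` and `scrape` names are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "website" mapping paths to HTML.
# A real scraper would retrieve each page over the network instead.
PAGES = {
    "/listings?page=1": (
        '<ul><li class="price">$10</li><li class="price">$12</li></ul>'
        '<a rel="next" href="/listings?page=2">next</a>'
    ),
    "/listings?page=2": '<ul><li class="price">$15</li></ul>',
}


class PriceParser(HTMLParser):
    """Collects the text of <li class="price"> items and the 'next' link."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self.next_path = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "price":
            self._in_price = True
        elif tag == "a" and attrs.get("rel") == "next":
            self.next_path = attrs.get("href")

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_price = False


def scrape(start_path):
    """Visit a page, collect the specified data, then follow the next link."""
    collected = []
    visited = set()
    path = start_path
    while path in PAGES and path not in visited:
        visited.add(path)
        parser = PriceParser()
        parser.feed(PAGES[path])
        parser.close()
        collected.extend(parser.prices)
        path = parser.next_path
    return collected


print(scrape("/listings?page=1"))
```

The loop in `scrape` is the "iterating across many pages" step: each page yields both the target data and the address of the next page to visit.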
Scraping is a common technique with a broad range of applications, the vast majority of which have nothing to do with investing. But investors have embraced it wholeheartedly. One firm has estimated that in 2018, one in 20 requests on the entire Internet was made by institutional investors or sell-side researchers scraping websites for information: over 10 billion page visits a day, “equal to the daily users of Google’s search function.”
While most applications of web scraping are routine and non-controversial, some have led to litigation. The classifieds site Craigslist has sued various companies for scraping listings; in another dispute, a provider of “workforce data services” that scraped public LinkedIn profiles won a preliminary injunction against the site, a decision that is currently on appeal in the Ninth Circuit. Major websites today regularly include provisions in their terms of service that expressly prohibit web scraping without the permission of the site owner.
Section 10(b) and SEC v. Dorozhko
Traders who utilize web-scraped data might incur Section 10(b) liability through a theory grounded in the Second Circuit’s 2009 decision in SEC v. Dorozhko. In that case, the defendant allegedly hacked into the internal computer systems of a healthcare company, gathering financial data which informed his trading in the company’s stock. The SEC brought a civil action, alleging Dorozhko’s hacking activity was a “deceptive” device under Section 10(b) and that his trading therefore constituted securities fraud.
After the lower court ruled for Dorozhko, the Second Circuit overturned that decision, holding that Section 10(b) liability is not foreclosed merely because a trader owes no duty to the source of the information. Pointing to the language of the statute, the court held that the use of a fraudulent misrepresentation to obtain MNPI could potentially subject a trader to liability.
The Second Circuit did not decide, however, whether Dorozhko’s specific computer hacking in the case constituted such a fraudulent misrepresentation, remanding for the district court to make the determination. It offered only some cryptic guidance:
In our view, misrepresenting one’s identity in order to gain access to information that is otherwise off limits, and then stealing that information[,] is plainly “deceptive” within the ordinary meaning of the word. It is unclear, however, that exploiting a weakness in an electronic code to gain unauthorized access is “deceptive,” rather than being mere theft. Accordingly, depending on how the hacker gained access, it seems to us entirely possible that computer hacking could be, by definition, a “deceptive device or contrivance” that is prohibited by Section 10(b) and Rule 10b-5.
Unfortunately for those interested in the topic, the question of exactly what sort of hacking conduct could constitute a “deceptive device” was not answered on remand, as Dorozhko disappeared and the SEC ultimately won an unopposed motion for summary judgment.
In the nearly ten years since Dorozhko, no court has taken up the challenge of determining the extent of potential liability for “outsider traders” who fraudulently misrepresent their identities to obtain MNPI. But the language of the Second Circuit’s opinion is concerning for market participants who trade based on information collected through web scraping. Might some scraping practices constitute “misrepresenting one’s identity in order to gain access to information that is otherwise off limits”?
One potentially deceptive tactic occurs when a web scraper, in exchange for access to website information, agrees to terms and conditions which prohibit scraping activities. The scraper is, in effect, affirmatively representing her own identity as a typical site user in order to get the benefit of the contractual agreement. But the website does not receive the benefit of the agreement, as the scraper is in fact not using the site for permitted purposes.
This deception—the scraper affirmatively stating that she is a certain type of user with knowledge that she is in fact another type—seems arguably similar to fraud in the inducement. As Orin Kerr observed in an influential article on cybercrime, cited in the SEC’s brief in Dorozhko and discussed in the court’s opinion, “if a user registers for an e-mail account and later breaches the terms of service, she in effect convinces the computer to grant her access based on the false representation that she will comply with the terms.” Scraping activities which rely on assent to terms and conditions prohibiting scraping appear analogous to this hypothetical, and therefore potentially meet the Second Circuit’s standard for “deception” under Section 10(b).
Another potential example of “affirmative misrepresentation” occurs when a web scraper takes affirmative steps to evade scrutiny by a website operator and maintain access to the target information. Website operators often block particular Internet Protocol (IP) addresses that they observe sending a high volume of requests to the site. In response, web scrapers may cycle through ranges of IP addresses, such that each individual address sends only a limited number of requests. Given the ease with which scrapers can change IP addresses, websites have developed additional and increasingly sophisticated measures for identifying undesired users. Web scrapers may then program their bots to introduce a random lag between requests, or to access information in varying sequences, so that their traffic resembles that of the ordinary users to whom the site wishes to continue granting access.
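To make the mechanics concrete for reviewers who need to recognize these patterns in a scraping process, the following is a small offline simulation and only a sketch. It makes no network requests. The blocking threshold, the pool size, the delay parameters, and all function names are hypothetical, and the addresses come from a range reserved for documentation.

```python
import random
from collections import Counter
from itertools import cycle

# Hypothetical rule: the site blocks any address sending more than this many requests.
BLOCK_THRESHOLD = 100


def blocked_addresses(request_sources, threshold=BLOCK_THRESHOLD):
    """Return the addresses whose request count exceeds the threshold."""
    counts = Counter(request_sources)
    return {addr for addr, n in counts.items() if n > threshold}


def single_address(total):
    """All requests originate from one address."""
    return ["192.0.2.1"] * total


def rotated_addresses(total, pool_size):
    """Requests are spread round-robin across a pool of addresses."""
    pool = [f"192.0.2.{i}" for i in range(1, pool_size + 1)]
    rotation = cycle(pool)
    return [next(rotation) for _ in range(total)]


def jittered_delays(n, base=2.0, spread=1.5, seed=0):
    """Delays (in seconds) with a random lag added, so timing is not uniform."""
    rng = random.Random(seed)
    return [base + rng.uniform(0, spread) for _ in range(n)]


# 1,000 requests from one address trip the volume-based block...
print(blocked_addresses(single_address(1000)))
# ...while the same 1,000 requests spread over 20 addresses (50 each) do not.
print(blocked_addresses(rotated_addresses(1000, 20)))
```

Under these assumptions, the same total volume of requests goes undetected once it is divided among enough addresses, which is why the paragraph above describes operators moving to more sophisticated detection measures.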
The previous examples suggest that web scrapers may in at least some circumstances “affirmatively misrepresent” their identity to website owners, thus satisfying the “deceptive device” prong of a Section 10(b) and Rule 10b-5 analysis. Though the courts have provided minimal guidance regarding the precise definition of affirmative misrepresentations, this seems likely to change as regulators continue to bring enforcement actions using Dorozhko-grounded theories. Investors using web scraping techniques should therefore carefully review the processes they use and assess whether those processes involve elements of deceit or identity masking.