Benton writes, “How big is BloombergGPT? Well, the company says it was trained on a corpus of more than 700 billion tokens (or word fragments). For context, GPT-3, released in 2020, was trained on about 500 billion. (OpenAI has declined to reveal any equivalent number for GPT-4, the successor released last month, citing ‘the competitive landscape.’)
“What’s in all that training data? Of the 700 billion-plus tokens, 363 billion are taken from Bloomberg’s own financial data, the sort of information that powers its terminals — ‘the largest domain-specific dataset yet’ constructed, it says. Another 345 billion tokens come from ‘general purpose datasets’ obtained from elsewhere.
“The company-specific data, named FinPile, consists of ‘a range of English financial documents including news, filings, press releases, web-scraped financial documents, and social media drawn from the Bloomberg archives.’ So if you’ve read a Bloomberg Businessweek story in the past few years, it’s in there. So are SEC filings, Bloomberg TV transcripts, Fed data, and ‘other data relevant to the financial markets.’”
Read more here.
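As a quick sanity check on the figures quoted above, here is a minimal sketch (plain Python, using only the token counts reported in the passage) showing how the two slices add up to the “more than 700 billion” total and what share each contributes:

```python
# Corpus composition reported for BloombergGPT (figures from the quoted passage).
finpile_tokens = 363e9   # Bloomberg's own financial data (FinPile)
general_tokens = 345e9   # general-purpose datasets obtained from elsewhere

total = finpile_tokens + general_tokens
print(f"Total tokens: {total / 1e9:.0f}B")                    # ~708B, i.e. "more than 700 billion"
print(f"FinPile share: {finpile_tokens / total:.1%}")         # ~51.3%
print(f"General-purpose share: {general_tokens / total:.1%}") # ~48.7%
```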