Wall Street Journal chief product and technology officer Louise Story writes about how a reporting tool built in the paper's Washington bureau turned into Talk2020, a feature for its readers.
Story writes, “In the world long before Covid-19, a handful of folks in the D.C. bureau gathered on Monday afternoons for Show & Tell. They were something of a digital braintrust — those who work in data and data visualization — and back in late 2018, data news editor Anthony DeBarros was telling the rest of us about some work he had done to scrape transcripts of President Trump’s speeches into a database.
“That work powered an article DeBarros wrote with a White House reporter in October 2018. He saw the possibility for a full-fledged tool for other reporters to use, too. I admit I cocked my head a bit — how would a database we’d built be any more useful than a traditional search engine?
“Turns out, I was very wrong. Anthony built a prototype tool using a small PostgreSQL database and Django, a web framework with deep news roots. By summer 2019, he was demoing the tool for members of our New York-based R&D team. And that’s where the real alchemy happened.
“Our R&D team was touring news bureaus when Anthony DeBarros showed us his project. We immediately spotted the potential to turn it into a more robust newsroom tool ahead of the elections. We worked with Factiva to expand the database to include public statements from all the presidential candidates, added tweets through a Twitter API, and integrated machine-learning models to identify language patterns and locations, helping our reporters track candidates’ stances on various issues. This helped reporters find information faster and conduct new types of analysis using political rhetoric. Pretty soon the database contained 30 years and tens of thousands of public statements made by a number of political figures.”