AJR co-editor Sean Mussenden is providing live coverage of the Computation + Journalism Symposium 2014 on Friday and Saturday.
Refresh the page to see new posts.
Saturday Oct. 25, 3:30 p.m.
Thanks for Reading the Computation + Journalism Live Blog
Thanks for following along this weekend as we brought you the interesting ideas and projects from the Computation + Journalism Symposium 2014. Hope you enjoyed it as much as we did.
As a reminder, you can read some fantastic, user-friendly summaries of the papers presented at the conference that the authors wrote for AJR, detailing cutting-edge research.
Saturday Oct. 25, 2:45 p.m.
Getting Computer Scientists and Newsrooms to Work Together
Data mining panel pic.twitter.com/CkXBKl9v2d
— Sean Mussenden (@smussenden) October 25, 2014
Everyone at the Computation + Journalism Symposium seems to agree that better journalism will result when serious computer science firepower is embedded in newsrooms on the reporting side.
But it’s not clear that the practice is widespread yet. Dan Keating, a data journalist at the Washington Post, noted that people with computer science backgrounds and serious coding chops exist in the newsroom, but most are tasked with building things for the Internet rather than bringing computational thinking and data models to the hunt for stories.
Another option: newsrooms and academics could collaborate on joint projects. That presents its own challenges, though. Irfan Essa, a computer science professor at Georgia Tech, noted that computer scientists are interested in working on problems with solutions that are “repeatable” — applicable to more than just one scenario. Newsrooms don’t do “repeatable” well, Keating said, as data journalists typically move from story to story, each with its own data set and its own collection of problems.
There’s also a knowledge gap. Journalists have one way of approaching data that differs from how computer scientists approach data, and there’s not enough communication between the two sides to develop a better understanding of how the two sides could work together. That’s why it’s important for the two sides to talk, said Hanna Wallach of Microsoft Research.
I’ve had a lot of these discussions at this conference this weekend, so here’s hoping we see more collaboration in the future.
"Tool" to computer scientists means prototype; "Tool" for journalist means *product*. #cj2014
— Nick Diakopoulos (@ndiakopoulos) October 25, 2014
This super interesting panel on data mining has me thinking how social science sensibilities can augment computing problem-solving #CJ2014
— Will Allen (@williamlallen) October 25, 2014
Saturday Oct. 25, 12:45 p.m.
Automatically Generating Cartoons(!) Using City Complaint Data
This is fun. Kati London, of Microsoft Research, wrote a program to automatically generate cartoons based on data from complaints made to a city through a 311 hotline.
Saturday Oct. 25, 12:30 p.m.
Using Drones to Capture Data and Amazing Video
Does Ben Kreimer have the coolest job in journalism, or what?
Kreimer and a collaborator developed a system that uses a drone with a GPS-enabled camera to take photos, which were then used to build a 3D model of an archeological dig site. They wrote up a piece for AJR on the project, which is worth a read.
Because the FAA strictly limits how journalists can use drones in the U.S., Kreimer has had to go abroad to do much of his work, which meant he spent a lot of time over the last year in India and Africa. In Africa, he used a drone to capture these amazing wildlife videos.
In Kenya, he used a drone to do 3D mapping of a massive garbage dump where hundreds of poor people search for food on a daily basis, and captured this amazing video that quickly and elegantly shows detail and scale of the dump.
Saturday Oct. 25, 11:30 a.m.
Investigative Reporting to Hold Algorithms Accountable
We live in an era when algorithms dictate a large part of how we experience the world. Our interactions with friends on social networks. The prices we see on shopping sites. Information that search engines give us about political candidates.
Each of these algorithms does specific things, some of which have broad consequences for society. But because they’re controlled by private companies, their inner workings are secret. You can’t FOIA Facebook, Twitter or Amazon.
That stinks because, to me, figuring out how these hugely important pieces of technology affect our lives is one of the most important — and largely unanswered — questions of our time.
It’s obviously a rich field of exploration for both journalists and computer and data scientists. Journalists and computer scientists have done important work in this area, attempting to reverse engineer how some of these algorithms work. At a panel on algorithmic accountability here, the presenters discussed some of these efforts.
Julia Angwin, of ProPublica, talked about work she did at the Wall Street Journal that discovered that Staples was showing higher prices to online users in rural areas who lived a long way from an OfficeMax or Office Depot. Another WSJ project found that searching for a presidential candidate on Google affected future search results.
Christo Wilson, of Northeastern University, did research that found that mobile phone users on Travelocity were shown different prices than desktop users and that Home Depot showed higher priced items to desktop users than mobile users.
These were excellent efforts. What struck me is just how much effort they took to complete, which has obvious implications for the ability of news organizations to do this kind of work in the future.
The Staples project took nine months. Wilson’s project took five people four months to complete. That’s a huge effort to answer a handful of questions.
I hope that doesn’t deter people from doing this kind of work. We need more people doing this kind of investigative reporting because, as Kelly McBride of the Poynter Institute pointed out, the operation of these algorithms has important implications for the political discussions that grease the wheels of democracy.
Facebook’s algorithm is designed to show people things they want to talk about. It looks at things we’ve liked or commented on in the past, and shows us more stuff like that.
One problem, McBride noted, is that people often don’t like talking about important things — like racial issues. The result: when people talk about race on Facebook, it gets to fewer people than less controversial topics. And that, McBride said, has important implications for the future of democracy.
For those interested in learning more on this topic, check out this link from University of Maryland computational journalist and assistant professor Nick Diakopoulos:
— Nick Diakopoulos (@ndiakopoulos) October 25, 2014
Saturday Oct. 25, 10:25 a.m.
A Program That Rewrites News Stories to Be Simpler?
As a young journalist, I was told that I should be writing for an audience that reads at an 8th grade level. (I’ve heard others say 5th grade.) I’ve long felt that this requirement was both good and bad.
The good: it forced me to get very good at explaining complex ideas simply and clearly. The bad: it sometimes results in the dumbing-down of news stories. But what if a journalist could write a story once, for a savvy audience, and then use a computer program to automatically write a simpler version?
And what if a reader could select which version they wanted to read? In talking about other research she’s doing, Hille van der Kaa casually mentioned a text-to-text generator that appears to do something like this.
I’m posting this without knowing anything about how well — or if — it works. But the concept is so intriguing that I thought it was worth mentioning.
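For context on that “8th grade level”: the standard yardstick is a readability formula such as Flesch-Kincaid, which needs only word, sentence and syllable counts. Here’s a minimal sketch in Python (the syllable counter is a rough heuristic, not a dictionary lookup):

```python
import re

def syllables(word):
    # Rough heuristic: count runs of consecutive vowels as one syllable each.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syls = sum(syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syls / len(words)) - 15.59

# Short, one-syllable words score very low; dense polysyllabic prose scores high.
print(round(fk_grade("The cat sat on the mat."), 2))
```

A real simplification system would go far beyond a score like this, but the formula shows why “grade level” is mechanically measurable at all.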
Saturday Oct. 25, 10:06 a.m.
Bad News for Journalists Worried That Robots Will Steal Their Jobs
There are certainly journalists out there who are worried that advances in technology will put them out of a job. Right now, in 2014, there are computer programs that automatically write text stories based on data inputs — financial stories, sports stories and stories on other topics.
These are the earnings reports and game recaps that were once written exclusively by humans. I would argue that these programs free up journalists to do more interesting, complex work.
But some might worry that all this does is make it possible for news companies to employ fewer journalists. If you’re in that camp (I’m not), you’re not going to like research by Hille van der Kaa and her colleagues.
She studied the credibility of machine-written stories and found that the public trusts them just as much as stories written by humans. Read more on her research in a piece she wrote for AJR.
Saturday Oct. 25, 9:45 a.m.
Here’s Why Every Newsroom Needs an R&D Lab Full of Hacker Journalists
When I dream about the future, I see hoverboards, roads full of self-driving cars — and teams of nerds in every newsroom building experimental technology to find better stories and tell them more effectively. We may not have hoverboards yet (seriously, why not?), but there are news nerds in R&D labs doing great, experimental work at a select few news organizations, including the New York Times and BBC News.

Basile Simon, a “hacker journalist” from BBC News Labs, talked about two projects he and his colleagues developed. The first, called Juicer, is a “semantic tagging engine.” It scans news content for key concepts and automatically matches it to related content. This makes possible some pretty cool things. For example, imagine a text story about lawmakers’ views on smoking bans.
This engine could find and display — with no human intervention — lawmakers’ votes on smoking bans or other related health bills. The BBC has used this to automatically add related text stories to local election result pages, something that would take a human a very, very long time to complete. Here’s a shot:
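The talk didn’t go into Juicer’s internals, but the core idea of a semantic tagging engine (tag every piece of content with concepts, then surface other content that shares those tags) can be sketched roughly like this, with invented article IDs and tags:

```python
def related(article_id, tagged, min_shared=1):
    """Return other articles ranked by how many concept tags they share."""
    target = tagged[article_id]
    scores = []
    for other_id, tags in tagged.items():
        if other_id == article_id:
            continue
        shared = len(target & tags)  # set intersection = tags in common
        if shared >= min_shared:
            scores.append((other_id, shared))
    return sorted(scores, key=lambda pair: -pair[1])

# Hypothetical tagged content: article ID -> set of extracted concepts.
tagged = {
    "smoking-ban-vote":  {"smoking ban", "public health", "state senate"},
    "vaping-regulation": {"public health", "state senate", "e-cigarettes"},
    "school-budget":     {"education", "state senate"},
}
print(related("smoking-ban-vote", tagged))
```

The hard part Juicer actually solves, extracting good concept tags from raw text in the first place, is glossed over here; this only shows the matching step.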
The second, called Datastringer, is a beat reporter’s dream. Working with BBC journalists in London, they created a system that automatically scans crime data on a regular basis and looks for big, potentially newsworthy increases or drops.
When it finds one — say that it notices that thefts in a given area are dramatically up compared with last year — it emails a reporter and suggests they investigate further. You can read more about these ideas in a piece Simon wrote for AJR on his work.
Saturday Oct. 25, 9:15 a.m.
Cutting Edge Research at Computation + Journalism
AJR invited the people who authored academic papers to present at Computation + Journalism Symposium 2014 to write user-friendly summaries of their research. Our idea: to share their cutting-edge ideas with a wider audience of journalists. You can read the summaries here. Check them out if you’re interested in knowing more about how:
- Journalists are mashing up drones with GPS-equipped cameras to automatically create 3D models of newsworthy structures.
- Bots are programmatically writing news stories that the public views as just as credible as those written by humans, according to one study.
- Computer scientists are creating software programs to help journalists identify and correct false rumors spreading on Twitter.
- Journalism students are using electronic sensors that monitor dust and noise to investigate construction sites.
- Journo-hackers are developing tools that use artificial intelligence to pull story ideas from big, complicated data sets.
- And so much more.
Saturday Oct. 25, 8:55 a.m.
Day 2 of Computation + Journalism Livestream and Day 1 Archive
If you can’t make it to New York today, you can watch the symposium livestream and follow along.
The archived video from yesterday’s sessions is also available.
Friday Oct. 24, 3:55 p.m.
The Challenge of Convincing Students to Do Data Journalism
As a journalism educator who specializes in working with students on data journalism projects, I feel Jonathan Hewett’s pain.
Even in 2014, most of the journalism students I encounter think of journalism through the frame of text stories or broadcast videos. I have to work very hard to recruit students who are interested in data reporting, statistical analysis and data visualization.
Hewett, of City University London, introduced a graduate track in data journalism three years ago. He told the audience here that it can be challenging to attract students interested in data journalism. That’s true, even though skilled practitioners have very strong career prospects in this golden age of data news.
For one, he said, many journalism students seem to be allergic to math and statistics. For another, few students come into the program understanding that “data journalist” is a real job.
His students, like many I interact with, often come into school with too narrow an idea of what it means to be a journalist. He thinks it’s more of a challenge in the U.K., which does not have the same legacy of data journalism as the U.S.
Friday Oct. 24, 3:30 p.m.
Using Sensors to Do Investigative Reporting
Journalism schools looking to do a cutting-edge student project would do well to look at the workshop led by Fergus Pitt at Columbia University this summer.
He taught a team of eight students to use sensors — electronics that collect environmental data — for an investigative journalism project. The students used sensors that monitor noise and dust levels to look at the impact of a building construction site in New York.
We’re starting to see more sensor-driven journalism projects in the wild. Pitt said he expects these projects to become more commonplace as the practice of data journalism spreads and the skills to build sensors gains wider distribution.
That’s a good reason to expose students to this kind of reporting before they leave school. For those who are interested in learning more about the sensor workshop, Pitt wrote up a piece for AJR offering more details.
Friday Oct. 24, 2:25 p.m.
Use These Tools to Stop Tweeting Fake Stuff
We’ve all been there. A salacious, seemingly newsworthy tweet scrolls by, and a split-second decision is required: to retweet or not retweet? No one wants to participate in the spreading of a hoax or a false rumor.
But it’s hard to find out quickly if a tweet is true or false. Takis Metaxas (of Wellesley College) and Paul Resnick (of the University of Michigan), in separate presentations, discussed prototype applications developed by their teams that detect the spread of false information on Twitter. Metaxas and Resnick both wrote user-friendly summaries of their projects for AJR, which are worth a read. Metaxas’ is called TRAILS; Resnick’s is called Rumorlens.
Friday Oct. 24, 2:05 p.m.
The View from Here
A quick look at the symposium room, to give you all a sense of place:
Friday Oct. 24, 2:00 p.m.
Towards an Automatic Fact Checking Site
Fact checking sites like PolitiFact and Factcheck.org perform a valuable public service, letting us know when a politician bends the truth. A team at Duke University has developed a prototype system that automates the process of finding claims to fact check and then checking them.
(The team wrote up a user-friendly summary of their research for AJR).
Automating the process is not an easy task, explained Brett Walenz, a Duke University computer science doctoral student. Consider the following real claim made by the Republican opponent of Sen. Kay Hagan, D-N.C.: Kay Hagan “rubber stamped” the Obama agenda 95 percent of the time. How do you fact check that? In order for a human to check that claim, they need to determine what “Obama agenda” means.
In this case, her opponent was referring to votes on bills publicly supported by the president. But human fact checkers go one step further, “beyond correctness,” according to Walenz. They evaluate the context of the claim, to see whether it’s both correct and fair.
So while it’s true that Hagan voted with Obama 95 percent of the time, Walenz pointed out that she actually voted for Obama less than many Democrats, as this graphic shows:
A good human fact checker will take all of those things into account when deciding whether a claim is true, false or “true, but.” It’s not an easy task for a human fact-checker. Automating the process so that a computer can do it is a real challenge.
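The arithmetic behind a claim like “95 percent” is the easy part; the contested step is deciding which votes count as “the Obama agenda.” Here’s a toy sketch, with invented roll-call data, that makes that definitional choice explicit:

```python
def agreement_rate(senator_votes, president_positions):
    """Share of bills where the senator's vote matched the president's position.
    Only bills the president took a public position on are counted -- that
    definitional choice is exactly what a human fact checker must make explicit."""
    shared = [bill for bill in senator_votes if bill in president_positions]
    if not shared:
        return None
    matches = sum(senator_votes[b] == president_positions[b] for b in shared)
    return matches / len(shared)

# Hypothetical roll-call data: bill -> "yea"/"nay".
senator   = {"HR1": "yea", "HR2": "yea", "HR3": "nay", "HR4": "nay"}
president = {"HR1": "yea", "HR2": "yea", "HR4": "yea"}

print(agreement_rate(senator, president))
```

Note that changing the definition (say, counting every vote rather than only bills the president weighed in on) changes the percentage, which is why “beyond correctness” context matters.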
Friday Oct. 24, 1:30 p.m.
Symposium Live Stream
The conference is starting up again after lunch with some interesting discussions of rumor detection. If you didn’t make it to New York, you can watch a livestream of the proceedings here:
Friday Oct. 24, 12:05 p.m.
A Data Collection Nightmare
The next time someone hears me complain about acquiring data for a journalism project, please remind me to shut up and recall Wall Street Journal reporter Rob Barry’s efforts. For a series of stories on problem stock brokers, Barry had to gather securities violation data directly from states, many of which disputed that they even owned the data (spoiler: they did). Here’s one example of a Freedom of Information Act response that was not factually correct:
And some states had no idea how to get the data to him, so he had to do this:
Ultimately, he received the data in an impressive array of formats — XLS, CSV, XML and more — so he had to hack them together into a usable format.
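Normalizing that kind of format zoo is a familiar data-journalism chore. A minimal sketch using only the Python standard library, with hypothetical field names since Barry’s actual schema isn’t described here, might convert CSV and XML records into one common list:

```python
import csv
import io
import xml.etree.ElementTree as ET

def from_csv(text):
    # Each CSV row becomes a plain dict keyed by the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def from_xml(text):
    # Each <record> element becomes a dict of its child tags.
    return [{child.tag: child.text for child in record}
            for record in ET.fromstring(text)]

# Hypothetical state responses in two different formats.
csv_data = "broker,violation\nSmith,churning\n"
xml_data = ("<records><record><broker>Jones</broker>"
            "<violation>fraud</violation></record></records>")

records = from_csv(csv_data) + from_xml(xml_data)
print(records)
```

Real XLS files would need a third-party reader, and real data needs de-duplication and cleaning, but the “hack them into one shape” step looks much like this.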
For those of you who aren’t used to working with large data sets: 111 million rows is…a lot of data. Ultimately, the effort produced stories like this one:
Friday Oct. 24, 11:15 a.m.
Making Rankings More Transparent
Nick Diakopoulos and his colleagues took a stab at building a user interface to make the logic behind ranking systems more transparent and adjustable by the user. (Diakopoulos wrote an explanation of the transparent ranking system for AJR.) News organizations love to rank things. U.S. News is the powerhouse in college rankings. Money publishes a ranking of the best cities. BuzzFeed ranks everything from Disney Channel movies to the men of “Gilmore Girls”.
The last two are obviously not based on any kind of thought-out system — they’re just the author’s opinion. (Are they right? I have no idea. I didn’t watch “Gilmore Girls” or the Disney Channel.) The college and city rankings are based on criteria established by the editors and draw on a variety of data points. But the algorithm used to produce the rankings from the data should be much easier for users to understand.
The system Diakopoulos described is one I’d love to see broadly applied by news organizations doing rankings. It allows users to see exactly what data is used to determine the rankings (in this case of software languages) and, importantly, adjust the weights of each data set to get a tailored set of rankings.
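As a rough illustration of the concept (not Diakopoulos’ actual implementation), a weight-adjustable ranking can be as simple as normalizing each metric and letting the user set the weights. The language metrics below are invented:

```python
def rank(items, weights):
    """Rank items by a weighted sum of min-max normalized metrics.
    Changing `weights` re-ranks instantly, which is the transparency win:
    users see the inputs AND control how much each one counts."""
    def normalize(metric):
        vals = [item[metric] for item in items.values()]
        lo, hi = min(vals), max(vals)
        return {name: (item[metric] - lo) / (hi - lo) if hi > lo else 0.0
                for name, item in items.items()}
    norm = {m: normalize(m) for m in weights}
    scores = {name: sum(weights[m] * norm[m][name] for m in weights)
              for name in items}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical programming-language metrics.
langs = {
    "Python": {"jobs": 90, "github": 80},
    "Rust":   {"jobs": 30, "github": 95},
}
print(rank(langs, {"jobs": 0.7, "github": 0.3}))
print(rank(langs, {"jobs": 0.1, "github": 0.9}))
```

The same two data points produce opposite rankings under different weights, which is exactly the editorial judgment these interfaces expose rather than hide.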
BuzzFeed, are you listening?
Friday Oct. 24, 10:55 a.m.
Investigating Textbook Shortages with Artificial Intelligence
To investigate a shortage of textbooks in Philadelphia schools, Meredith Broussard built a software tool that uses artificial intelligence to find stories in a complex data set. She wrote a piece for AJR describing the system. The cool thing: the code is open source and available on GitHub. Broussard said she thinks reporters in other cities could use the system to find similar stories.
Friday Oct. 24, 10:30 a.m.
Predicting the Spread of Photos on Facebook
It’s the question every social media editor wants answered: how can I tell if something is going to go viral on Facebook? News organizations like BuzzFeed seem to know. And computer scientists who study social networks have been working hard to answer that question by studying the massive flow of information on Facebook. Keynote speaker Jon Kleinberg, a computer science professor at Cornell University, kicked off the symposium by discussing research on virality on social networks. He asked: Is it possible to predict how far a Facebook post will spread?
Some researchers think the problem is just too hard to answer well, he said, because they think there are too many variables to know whether something will be widely shared.
But Kleinberg, who has collaborated with Facebook’s data science team on research, pointed out that researchers have been able to learn some things about what makes content go viral by studying the spread of photos on Facebook.
First, he noted that half of all photos on Facebook are never reshared. And of those that do get shared, most are shared only a few times. But the ones that spread widely get very big: half of all photo reshares occur in viral events — which he called “cascades” — of more than 500 shares, according to one study.
In another study, researchers asked this question: if we look at the number of shares a photo has at a given point in time, can we predict how far it will eventually spread? It is possible, Kleinberg said, and the following attributes of the shared photos — and the people who share them — are predictive of the spread:
- Photos with a lot of shares soon after posting.
- An early share by someone outside of the network of the original poster.
- Photos with text overlaid (a funny cat meme, for example).
- Photos shared by people with a lot of friends and people who are active on Facebook.
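Kleinberg didn’t present a model to copy, but the general shape of such a predictor, turning those four attributes into signals and thresholding a combined score, might look like this (the cutoffs and field names are made up):

```python
def will_go_viral(post, threshold=2):
    """Toy predictor: count how many known-predictive signals a post shows.
    A real model would learn weights from data instead of hand-picked cutoffs."""
    signals = [
        post["early_shares"] >= 50,         # many shares soon after posting
        post["shared_outside_network"],     # an early share beyond the poster's friends
        post["has_text_overlay"],           # e.g., a captioned meme
        post["sharer_avg_friends"] >= 500,  # well-connected, active sharers
    ]
    return sum(signals) >= threshold

# Hypothetical post: strong early sharing, but stayed inside one network.
post = {"early_shares": 120, "shared_outside_network": True,
        "has_text_overlay": False, "sharer_avg_friends": 200}
print(will_go_viral(post))
```

The point of the research is precisely that these early, observable signals carry real predictive power, even when the content itself is held constant.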
Good point by Tyler Dukes:
Friday Oct. 24, 10:20 a.m.
Symposium Live Stream
If you didn’t make it to New York, you can watch a livestream of the proceedings here: