- Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
- Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
- Thread a ZigBee Killer? — Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee but is neither compatible with it nor open source. The Novell NetWare of Things. Nick Hunn makes the argument that Google (via Nest) is taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work”.
"data journalism" entries
The Knight Foundation announced the seven newest winners of its News Challenge grants this week. For this round, the challenge focused on health data. The projects include a personal monitor that will allow people to do their own chemical analysis of their environments, and an online portal where people can volunteer their personal health information to aid in medical research.
German journalists at the online news outlet Mittendrin, in partnership with Open Data City, have developed an app that allows members of the public to alert a journalist when they witness a newsworthy event, like a police action or spontaneous demonstration. The ‘Call A Journalist’ app will contact a journalist and deliver your GPS information along with your report. The best part is that after your information is relayed, the app will let you know that a journalist is on the way. Now, why didn’t I think of that?
According to the Committee to Protect Journalists, 2013 was the second-worst year on record for imprisoning journalists around the world for doing their work.
Which makes this story from PBS Idea Lab all the more important: How Journalists Can Stay Secure Reporting from Android Devices. There are tips here on how to anonymize data flowing through your phone using Tor, an open network that helps protect against traffic analysis and network surveillance. Also, there is information about video publishing software that facilitates YouTube posting, even if the site is blocked in your country. Very cool.
The Nieman Lab is publishing an ongoing series of Predictions for Journalism in 2014, and, predictably, the idea of harnessing data looms large. Hassan Hodges, director of innovation for the MLive Media Group, says that in this new journalism landscape, content will start to look more like data and data will look more like content. Poderopedia founder Miguel Paz says that news organizations should fire the consultants and hire more nerds. There are 51 contributions so far, and counting. It’s good reading.
A new data working group from BBC News, a data library adds uploading capabilities, and a timeline of data journalism.
BBC News is the latest media company to create a working group tasked with developing “innovative and experimental” journalism projects. The BBC ‘NewsLabs’ team will focus on data journalism and data visualization. The Guardian calls it a ‘back to the future’ move by the BBC’s new managing editor, James Harding.
After Washington Post owner Jeff Bezos announced this week that Amazon may soon be making customer deliveries by drone, USA TODAY wondered whether newspaper delivery boys in Bezos’ jurisdiction should be worried.
The New York Times is replacing Nate Silver’s FiveThirtyEight blog (which Silver took to ESPN back in July) with a brand new site intended to “produce clear analytical reporting and writing on opinion polls, economic indicators, politics, policy, education, and sports.” The venture will be headed by D.C. bureau chief David Leonhardt, who also helmed the search committee and selected himself for the job. Naturally, his colleagues are teasing Leonhardt for “pulling a Dick Cheney.” The new team will also include presidential historian Michael Beschloss, Nate Cohn of The New Republic, and economist Justin Wolfers.
Take it from me: if you are short on time, do not even attempt to play around on the new Spending Stories website. Developed by the folks at Open Knowledge Foundation and Journalism++, Spending Stories is intended to help journalists understand and contextualize spending data by making easy comparisons to other data. For example, using the site, I was able to see that US$15,000 is equal to 3% of private ambulance costs in Yorkshire, England; 0.02% of the cost of the contract awarded to IT company CGI for implementing healthcare.gov; and 90% of government spending per person per year in the UK in 2012. It’s a fun tool!
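For the curious, the percentages quoted above can be turned back into rough totals with a little arithmetic. The sketch below is only an illustration, not how Spending Stories itself works: the function name `implied_total` is made up for this example, the results are in US dollar equivalents, and because the quoted percentages are rounded, the totals are approximate.

```python
def implied_total(amount, percent):
    """Return the total of which `amount` is `percent` percent."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return amount * 100 / percent


# Figures taken from the Spending Stories comparison quoted in the text.
amount_usd = 15_000
comparisons = {
    "Private ambulance costs in Yorkshire": 3,
    "CGI contract for healthcare.gov": 0.02,
    "UK government spending per person (2012)": 90,
}

for label, percent in comparisons.items():
    total = implied_total(amount_usd, percent)
    print(f"{label}: about ${total:,.0f}")
```

Run as-is, this suggests totals of roughly $500,000, $75 million, and $16,700 respectively, which is a handy sanity check on whether a comparison passes the smell test.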
The ProPublica Nerd Blog this week features an article by Hassel Fallas, a data journalist at La Nación in Costa Rica. Fallas was a 2013 Fellow at the International Center for Journalists, where she studied up on Data-Driven Journalism’s Secrets. Spoiler alert: The secret is…don’t keep secrets.
Over at the data-driven journalism blog, A Fundamental Way Data Repositories Must Change includes some fascinating examples of how data has been historically manipulated in Romania and Rwanda, including some examples from the present day.
Google Chrome’s new extension, Knoema, provides access to more than 500 data repositories, along with visualization tools for use with those databases. Knoema’s CTO says the platform can be used solely as a data source, but more importantly, it can be used as a tool for journalists to create embeddable visualisations. Pretty cool.
Data journalism and social media merge, a call-out for ‘crap’ data journalism, and tips for creating a data resume.
I suppose it was only a matter of time before the worlds of data journalism and social media cozied up and got comfortable. The London office of the Trinity Mirror announced that their new initiative, Mysterious Project Y, will focus on creating data journalism that will be compelling to share on the social Web. The site will focus on visualizations: “charts, graphs, facts, and figures” that “people care passionately about.”
You may have heard the statistic floating around lately that it is more difficult to land a job at a new location of the supermarket chain Wegmans than it is to gain admittance to Harvard University. The dragonflyeye blog refutes the numbers, and says that the story is an example of “crap data journalism.” Ouch.
As an aspiring journalist, I worked in the Washington Post newsroom in an entry-level position that used to be known as a “copy boy”. (Later updated to the more inclusive “copy aide.”) I loved taking in the energy of the reporters, especially when they had pulled off a “scoop,” or a story that the other papers didn’t have yet. There was a pall over the newsroom when other papers “scooped” us, and published a story that the Post reporters had been too slow to report.
Finnish data journalist Esa Makinen says that data visualizations are journalism’s “new scoop.” Text stories can be quickly re-published by competitors, Makinen told journalism.co.uk, but data visualizations cannot be copied. Makinen works on the data desk at Finland’s daily paper and website, Helsingin Sanomat, and spoke this week at the Digital Journalism Days conference in Warsaw.
A new tablet-first investigative publication is in the works from a team of data journalists around the world. Acuerdo (an old Spanish word for ‘agreement’) bills itself as “long-form journalism for pissed off readers.” The first edition will be published next month in three languages. If you self-identify as a pissed-off reader, consider making a contribution to Acuerdo’s Kickstarter campaign.
Making corrections to data stories, a Brazilian hackday, and the ‘truth’ about big data and agnostic storytelling.
A few weeks ago in this space, I wrote about efforts to create a corrections policy for data journalists. It turns out the Toronto Star needed this policy sooner rather than later, after a summer intern pitched and created a project that featured a searchable database of banned license plates, which included material from another Star reporter’s article three years prior. The Star published a public editor’s note about the issue. Does the problem of plagiarism become more complicated when it includes previously reported data?
For those who are interested in the field of data journalism but unsure of where to start, The Data Journalism Heist offers a quick introduction. The e-book’s tagline: How to get in, get the data, and get the story out – and make sure nobody gets hurt