- Free Book Sifter — lists all the free books on Amazon, has RSS feeds and newsletters. (via BoingBoing)
- Whom the Gods Would Destroy, They First Give Realtime Analytics — a few key reasons why truly real-time analytics can open the door to a new type of (realtime!) bad decision making. [U]ser demographics could be different day over day. Or very likely, you could see a major difference in user behavior immediately upon releasing a change, only to watch it evaporate as users learn to use new functionality. Given all of these concerns, the conservative and reasonable stance is to only consider tests that last a few days or more.
- Web Book Boilerplate (Github) — uses plain old markdown and generates a well structured HTML version of your written words. Since it’s sitting on top of Pandoc and Grunt, you can easily make your books available for every platform. MIT-style license.
- Raspberry Pi Education Manual (PDF) — from Scratch to Python and HCI all via the Raspberry Pi. Intended to be informative and a series of lessons for teachers and students learning coding with the Raspberry Pi as their first device.
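The realtime-analytics caution in the list above (an effect seen right after a release can evaporate as users learn the new functionality) can be illustrated with a small sketch. Everything here is hypothetical and for illustration only: the function names, the `min_days=3` threshold standing in for "a few days or more", and the daily counts are assumptions, not taken from the linked article.

```python
from typing import List, Optional, Tuple

# One (conversions, visitors) pair per day of the test.
DailyCounts = List[Tuple[int, int]]


def daily_lift(control: DailyCounts, treatment: DailyCounts) -> List[float]:
    """Relative lift of treatment over control, computed separately for each day."""
    lifts = []
    for (c_conv, c_vis), (t_conv, t_vis) in zip(control, treatment):
        c_rate = c_conv / c_vis
        t_rate = t_conv / t_vis
        lifts.append((t_rate - c_rate) / c_rate)
    return lifts


def pooled_lift(control: DailyCounts, treatment: DailyCounts,
                min_days: int = 3) -> Optional[float]:
    """Lift over the whole test, or None if the test is shorter than min_days.

    min_days=3 is an assumed threshold, chosen only to illustrate the
    "wait a few days" stance.
    """
    if min(len(control), len(treatment)) < min_days:
        return None
    c_rate = sum(c for c, _ in control) / sum(v for _, v in control)
    t_rate = sum(t for t, _ in treatment) / sum(v for _, v in treatment)
    return (t_rate - c_rate) / c_rate


# Made-up data: a strong first-day effect that fades as users adapt.
control = [(100, 1000), (100, 1000), (100, 1000), (100, 1000)]
treatment = [(130, 1000), (110, 1000), (102, 1000), (100, 1000)]

per_day = daily_lift(control, treatment)
too_early = pooled_lift(control[:1], treatment[:1])  # refuses to answer after one day
overall = pooled_lift(control, treatment)
```

With this made-up data the first day alone suggests a lift of about 30%, while the four-day pooled figure is closer to 10% and the per-day trend heads toward zero, which is the kind of early reading a realtime dashboard would invite you to act on.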
Free Books, Analytics Goofs, Book Boilerplate, and Learn CS with the Raspberry Pi
Shark is 100X faster than Hive for SQL, and 100X faster than Hadoop for machine-learning
Hadoop’s strength is in batch processing; MapReduce isn’t particularly suited for interactive or ad hoc queries. Real-time SQL queries (on Hadoop data) are usually performed using custom connectors to MPP databases. In practice this means having connectors between separate Hadoop and database clusters. Over the last few months a number of systems that provide fast SQL access within Hadoop clusters have garnered attention. Connectors between Hadoop and fast MPP database clusters are not going away, but there is growing interest in moving many interactive SQL tasks into systems that coexist on the same cluster with Hadoop.
Having a Hadoop cluster support fast/interactive SQL queries dates back a few years to HadoopDB, an open source project out of Yale. The creators of HadoopDB have since started a commercial software company (Hadapt) to build a system that unites Hadoop/MapReduce and SQL. In Hadapt, a (Postgres) database is placed in nodes of a Hadoop cluster, resulting in a system that can use MapReduce, SQL, and search (Solr). Now on version 2.0, Hadapt is a fault-tolerant system that comes with analytic functions (HDK) that one can use via SQL.
Inside Personalized Advertising, Printing Presses Were Good For The Economy, Digital Access, and Ebooks in Libraries
- Web-Scale User Modeling for Targeting (Yahoo! Research, PDF) — research paper that shows how online advertisers build profiles of us and what matters (e.g., ads we buy from are more important than those we simply click on). Our recent surfing patterns are more relevant than historical ones, which is another indication that value of data analytics increases the closer to real-time it happens. (via Greg Linden)
- Information Technology and Economic Change — research showing that cities which adopted the printing press had no prior growth advantage, but subsequently grew far faster than similar cities without printing presses. […] The second factor behind the localisation of spillovers is intriguing given contemporary questions about the impact of information technology. The printing press made it cheaper to transmit ideas over distance, but it also fostered important face-to-face interactions. The printer’s workshop brought scholars, merchants, craftsmen, and mechanics together for the first time in a commercial environment, eroding a pre-existing “town and gown” divide.
- They Just Don’t Get It (Cameron Neylon) — curating access to a digital collection does not scale.
- Should Libraries Get Out of the Ebook Business? — provocative thought: the ebook industry is nascent, a small number of patrons have ereaders, the technical pain of DRM and incompatible formats makes for disproportionate support costs, and there are already plenty of worthy things libraries should be doing. I only wonder how quickly the dynamics change: a minority may have dedicated ereaders but a large number have smartphones and are reading on them already.
Feedback, Open Source Marketing, Programming in the Browser, and Twitter's Open Source Realtime Engine
- Implicit and Explicit Feedback — for preferences and recommendations, implicit signals (what people clicked on and actually listened to) turn out to be strongly correlated with what they would say if you asked. (via Greg Linden)
- Pivoting to Monetize Mobile Hyperlocal Social Gamification by Going Viral — Schuyler Erle’s stellar talk at the open source geospatial tools conference. Video, may cause your sides to ache.
- Twitter Storm (GitHub) — distributed realtime computation system, intended to be for realtime processing what Hadoop is for batch processing. Interesting because you improve most reporting and control systems when you move them closer to real-time. Eclipse-licensed open source.
Hilary Mason on how Bitly applies the Internet's real-time data.
In this interview, Bitly chief scientist and Strata speaker Hilary Mason discusses the application of real-time data and the difference between analytics and data science.
Data and education, real-time data, what publishers can learn from startups.
This week on O'Reilly: We looked at how data can help education, Theo Schlossnagle made the case for real-time business data, and we learned that tech startups can teach publishers a thing or two.
OSCON's co-chairs dig into the OSCON Data program.
OSCON's co-chairs discuss sessions in the OSCON Data conference and the people who might be interested in the associated topics.
Jud Valeski on how Gnip handles the Twitter fire hose.
Gnip CEO Jud Valeski talks about managing Twitter's fire hose and how the Internet's architecture must adapt to real-time needs.
A Pew report breaks down ereader stats, Random House and Politico team up, and lessons from Pottermore
In the latest Publishing News: U.S. readers are buying ereaders at a good clip, the 2012 election will be covered in realtime by a book publisher, and publishers can learn a couple things from JK Rowling.
Poor Economics, Shrinking Web, Orphans Put to Work, Realtime Log Monitoring
- Poor Economics — this is possibly the best thing I will read all year, an insightful (and research-backed) book digging into the economics of poverty. Read the lecture slides online, they’ll give you a very clear taste of what the book’s about. Love that the website is so very complementary to the book, and 100% aligned with the ambition to convince and spread the word. Kindle-purchasable, too. Sample boggle (one of many): children of children born during the Chinese famine are smaller, and children who were in utero during Ramadan earn less as adults.
- The Web Is Shrinking (All Things D) — graph that makes Facebook look massively important and the rest of the web look insignificant. It doesn’t take into account the nature of the interaction (shopping? research? chat?), and depends heavily on the comScore visits metric being a reliable proxy for “use”. I’d expect to see other neutral measures of “use” decreasing (e.g., searches for “school holidays”) if overall web use were decreasing, yet they don’t seem to be. Nonetheless, Facebook has become the new millennium’s AOL: keywords, grandparents, and a zealous devotion to advertising. At least Facebook doesn’t send me #&#^%*ing CDs.
- Orphan Works Project (University of Michigan) — library will digitize orphaned works for researchers. Lovely to see someone breaking the paralysis that orphaned works induce. (via BoingBoing)
- log.io — node.js system for real-time log monitoring in your browser. (via Vasudev Ram)