"realtime" entries

On reading Mike Barlow’s “Real-Time Big Data Analytics: Emerging Architecture”

Barlow's distilled insights regarding the ever-evolving definition of real-time big data analytics

Reading Barlow on a Sunday Afternoon

During a break between offsite meetings that Edd and I were attending the other day, he asked me, “Did you read the Barlow piece?”

“Umm, no,” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.

In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone whom I perceive to be an unofficial boss of my own choosing, I am not inclined to worry about things that I don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.

Four short links: 14 February 2013

Malware Industrial Complex, Indies Needed, TV Analytics, and HTTP Benchmarking

  1. Welcome to the Malware-Industrial Complex (MIT) — brilliant phrase, sound analysis.
  2. Stupid Stupid xBox — The hardcore/soft-tv transition and any lead they feel they have is simply not defensible by licensing other industries’ generic video or music content because those industries will gladly sell and license the same content to all other players. A single custom studio of 150 employees also can not generate enough content to defensibly satisfy 76M+ customers. Only with quality primary software content from thousands of independent developers can you defend the brand and the product. Only by making the user experience simple, quick, and seamless can you defend the brand and the product. Never seen a better put statement of why an ecosystem of indies is essential.
  3. Data Feedback Loops for TV (Salon) — Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
  4. wrk — a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.

Four short links: 21 January 2013

School District Saves With Open Source, Apple ][ Presentation Tool, Tech Talks, and Realtime Dashboard

  1. School District Builds Own Software — By taking a not-for-profit approach and using freely available open-source tools, Saanich officials expect to develop openStudent for under $5 million, with yearly maintenance pegged at less than $1 million. In contrast, the B.C. government says it spent $97 million over the past 10 years on the B.C. enterprise Student Information System — also known as BCeSIS — a provincewide system already slated for replacement.
  2. Giving a Presentation From an Apple ][ — A co-worker used an iPad to give a presentation. I thought: why take a machine as powerful as an early Cray to do something as low-overhead as display slides? Why not use something with much less computing power? From this asoft_presenter was born. The code is a series of C programs that read text files and generate a large Applesoft BASIC program that actually presents the slides. (via Jim Stogdill)
  3. AirBnB TechTalks — impressive collection of interesting talks, part of the AirBnB techtalks series.
  4. Gawker’s Realtime Dashboard — this is not just technically and visually cool, but also food for thought about what they’re choosing to measure and report on in real time (new vs returning split, social engagement, etc.). Does that mean they hope to be able to influence those variables in real time? (via Alex Howard)

Four short links: 17 January 2013

Free Books, Analytics Goofs, Book Boilerplate, and Learn CS with the Raspberry Pi

  1. Free Book Sifter — lists all the free books on Amazon, has RSS feeds and newsletters. (via BoingBoing)
  2. Whom the Gods Would Destroy, They First Give Realtime Analytics — a few key reasons why truly real-time analytics can open the door to a new type of (realtime!) bad decision making. [U]ser demographics could be different day over day. Or very likely, you could see a major difference in user behavior immediately upon releasing a change, only to watch it evaporate as users learn to use new functionality. Given all of these concerns, the conservative and reasonable stance is to only consider tests that last a few days or more.
  3. Web Book Boilerplate (Github) — uses plain old markdown and generates a well structured HTML version of your written words. Since it’s sitting on top of Pandoc and Grunt, you can easily make your books available for every platform. MIT-style license.
  4. Raspberry Pi Education Manual (PDF) — from Scratch to Python and HCI all via the Raspberry Pi. Intended to be informative and a series of lessons for teachers and students learning coding with the Raspberry Pi as their first device.
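
The novelty effect described in item 2 can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than anything from the linked post: the function names, the 10% baseline conversion rate, the 5-point initial lift, and the one-day half-life are all hypothetical. The point is only that a lift which decays as users get used to a change looks much larger on day one than it does averaged over a week.

```python
def treatment_rate(day, base=0.10, novelty_lift=0.05, half_life_days=1.0):
    """Hypothetical conversion rate of the new variant on a given day.

    The lift over `base` starts at `novelty_lift` and halves every
    `half_life_days` as users get used to the change (assumed model).
    """
    return base + novelty_lift * 0.5 ** (day / half_life_days)


def observed_lift(days, base=0.10):
    """Average absolute lift over the control rate across the first `days` days."""
    rates = [treatment_rate(d, base=base) for d in range(days)]
    return sum(rates) / days - base


if __name__ == "__main__":
    # A one-day test sees the full novelty lift; a week-long test averages it down.
    print(f"lift after 1 day:  {observed_lift(1):.4f}")
    print(f"lift after 7 days: {observed_lift(7):.4f}")
```

Under these assumed numbers the week-long average is less than a third of the day-one reading, which is the kind of gap that makes a test lasting a few days or more the safer basis for a decision.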

Shark: Real-time queries and analytics for big data

Shark is 100X faster than Hive for SQL, and 100X faster than Hadoop for machine-learning

Hadoop’s strength is in batch processing; MapReduce isn’t particularly suited for interactive/adhoc queries. Real-time SQL queries (on Hadoop data) are usually performed using custom connectors to MPP databases. In practice this means having connectors between separate Hadoop and database clusters. Over the last few months a number of systems that provide fast SQL access within Hadoop clusters have garnered attention. Connectors between Hadoop and fast MPP database clusters are not going away, but there is growing interest in moving many interactive SQL tasks into systems that coexist on the same cluster with Hadoop.

Having a Hadoop cluster support fast/interactive SQL queries dates back a few years to HadoopDB, an open source project out of Yale. The creators of HadoopDB have since started a commercial software company (Hadapt) to build a system that unites Hadoop/MapReduce and SQL. In Hadapt, a (Postgres) database is placed in nodes of a Hadoop cluster, resulting in a system that can use MapReduce, SQL, and search (Solr). Now on version 2.0, Hadapt is a fault-tolerant system that comes with analytic functions (HDK) that one can use via SQL.

Four short links: 12 March 2012

Inside Personalized Advertising, Printing Presses Were Good For The Economy, Digital Access, and Ebooks in Libraries

  1. Web-Scale User Modeling for Targeting (Yahoo! Research, PDF) — research paper that shows how online advertisers build profiles of us and what matters (e.g., ads we buy from are more important than those we simply click on). Our recent surfing patterns are more relevant than historical ones, which is another indication that the value of data analytics increases the closer to real-time it happens. (via Greg Linden)
  2. Information Technology and Economic Change — research showing that cities which adopted the printing press had no prior growth advantage, but subsequently grew far faster than similar cities without printing presses. […] The second factor behind the localisation of spillovers is intriguing given contemporary questions about the impact of information technology. The printing press made it cheaper to transmit ideas over distance, but it also fostered important face-to-face interactions. The printer’s workshop brought scholars, merchants, craftsmen, and mechanics together for the first time in a commercial environment, eroding a pre-existing “town and gown” divide.
  3. They Just Don’t Get It (Cameron Neylon) — curating access to a digital collection does not scale.
  4. Should Libraries Get Out of the Ebook Business? — provocative thought: the ebook industry is nascent, a small number of patrons have ereaders, the technical pain of DRM and incompatible formats makes for disproportionate support costs, and there are already plenty of worthy things libraries should be doing. I only wonder how quickly the dynamics change: a minority may have dedicated ereaders but a large number have smartphones and are reading on them already.

Four short links: 22 September 2011

Feedback, Open Source Marketing, Programming in the Browser, and Twitter's Open Source Realtime Engine

  1. Implicit and Explicit Feedback — for preferences and recommendations, implicit signals (what people clicked on and actually listened to) turn out to be strongly correlated with what they would say if you asked. (via Greg Linden)
  2. Pivoting to Monetize Mobile Hyperlocal Social Gamification by Going Viral — Schuyler Erle’s stellar talk at the open source geospatial tools conference. Video, may cause your sides to ache.
  3. repl.it — browser-based environment for exploring different programming languages from FORTH to Python and Javascript by way of Brainfuck and LOLCODE.
  4. Twitter Storm (GitHub) — distributed realtime computation system, intended to be for realtime processing what Hadoop is for batch processing. Interesting because you improve most reporting and control systems when you move them closer to real-time. Eclipse-licensed open source.

The application of real-time data

Hilary Mason on how Bitly applies the Internet's real-time data.

In this interview, Bitly chief scientist and Strata speaker Hilary Mason discusses the application of real-time data and the difference between analytics and data science.

Top Stories: July 25-29, 2011

Data and education, real-time data, what publishers can learn from startups.

This week on O'Reilly: We looked at how data can help education, Theo Schlossnagle made the case for real-time business data, and we learned that tech startups can teach publishers a thing or two.

Who are the OSCON data geeks?

OSCON's co-chairs dig into the OSCON Data program.

OSCON's co-chairs discuss sessions in the OSCON Data conference and the people who might be interested in the associated topics.