- Is It The Internet of Things? — we’ve moved from “they ignore you” to “they laugh at you”. Next up, “they fight you”, then finally the earless RFID-enabled location-aware ambient-sensing Network of All wins. (via BERG London)
- The 2012 We Could Have Had — list of famous and interesting works which would have entered the public domain had we not had the 1976 extension of copyright law.
- Web Engineer’s Online Toolbox — a list of online, Web-based tools that Web engineers can use for their work in development, testing, debugging and documentation.
- Indianapolis Museum of Art Dashboard — everyone should have a HUD showing the things they care about. (via Courtney Johnston)
"real time" entries
The simplest and quickest way to mine your data is to deploy efficient algorithms designed to answer key questions at scale.
For many organizations, real-time analytics entails complex event processing (CEP) systems or newer distributed stream processing frameworks like Storm, S4, or Spark Streaming. The latter have become more popular because they can process massive amounts of data and fit nicely with Hadoop and other cluster computing tools. For these distributed frameworks, peak volume is a function of network topology/bandwidth and the throughput of the individual nodes.
Scaling up machine-learning: Find efficient algorithms
Faced with having to crunch through a massive data set, the first thing a machine-learning expert will try to do is devise a more efficient algorithm. Some popular approaches involve sampling, online learning, and caching. Parallelizing an algorithm tends to be lower on the list of things to try. The key reason is that while there are algorithms that are embarrassingly parallel (e.g., naive Bayes), many others are harder to decouple. But as I highlighted in a recent post, efficient tools that run on single servers can tackle large data sets. In the machine-learning context, recent examples of efficient algorithms that scale to large data sets can be found in the products of startup SkyTree.
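To make the "embarrassingly parallel" point concrete, here is a minimal sketch of why naive Bayes decouples so easily: its training step is just counting, so each shard of data can be counted independently and the partial counts merged by addition. The toy data set and the function names `count_shard` and `merge` are assumptions made up for illustration; they do not come from any particular library or from SkyTree's products.

```python
from collections import Counter

def count_shard(rows):
    """Count labels and (label, feature) pairs in one shard of the data."""
    label_counts = Counter()
    pair_counts = Counter()
    for label, features in rows:
        label_counts[label] += 1
        for feature in features:
            pair_counts[(label, feature)] += 1
    return label_counts, pair_counts

def merge(results):
    """Combine per-shard counts; addition is all the coordination needed."""
    total_labels, total_pairs = Counter(), Counter()
    for label_counts, pair_counts in results:
        total_labels.update(label_counts)  # Counter.update adds counts
        total_pairs.update(pair_counts)
    return total_labels, total_pairs

# Hypothetical toy data: (label, list of words)
data = [
    ("spam", ["free", "offer"]),
    ("ham", ["meeting", "today"]),
    ("spam", ["free", "prize"]),
    ("ham", ["lunch", "today"]),
]

# Each shard could be counted on a different machine.
shards = [data[:2], data[2:]]
merged = merge(count_shard(shard) for shard in shards)
whole = count_shard(data)
```

In this sketch, `merged` and `whole` hold the same counts, which is what lets the work be split across nodes without the shards talking to each other. Algorithms whose updates depend on a shared, evolving state do not split this cleanly.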
Barlow's distilled insights regarding the ever evolving definition of real time big data analytics
During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “did you read the Barlow piece?”
“Umm, no.” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.
In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone whom I perceive to be an unofficial boss of my own choosing, I am not really inclined to worry about things that I really don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.
Internet of Zings, Public Domain Alternate Universe, Web Engineers Tools, and Dashboards for All
Civil Drones, Fencing the Public Domain, Quantified Spy, and Data Daemons for Fun and Metrics
- Helping Drones Play Nice With Other Aviation — The U.S. airspace is quickly being filled with simultaneously flying drones. To such an extent, unmanned aircraft could soon become a nightmare for the ATC controllers. The ADS-B will improve Predator B’s crew situational awareness making the drone capable to operate more freely and safely in domestic and international airspace in accordance with civilian air traffic and airspace rules and regulations.
- Reclaiming NZ’s Digitised Heritage — Out of a sample of 100 books: 50% of NZ Heritage Books (published before 1890) have been digitised; 90% of digitised texts are fully accessible; 98% of accessible texts are downloadable; Despite all works being in the public domain, only one did not have any licencing restrictions applied to its use. Most groups who digitise then go on to put restrictions around their use. [T]here are also many instances where arbitrary restrictions are being applied to the detriment of the public good.
- Self-Spy (GitHub) — Log everything you do on the computer, for statistics, future reference and all-around fun!
- statsd (GitHub) — Etsy’s data-gathering daemon, written up in an excellent blog post.
Theo Schlossnagle on the state of real-time data analysis and where it needs to go.
Real-time data analysis has come a long way, but Theo Schlossnagle, principal and CEO of OmniTI, says some technology improvements are actually causing a data analysis devolution.
Open Road gets aggressive with adaptation and real-time marketing.
Being digital isn’t the novelty it once was, so some publishing companies are shifting focus to competitive differentiation within digital. Jane Friedman’s company Open Road Integrated Media believes aggressive marketing is the key to digital success.
WYSIWYG HTML5 UIs, Hacker News, Real Time, and Web 2.0
- Maqetta — open source (modified BSD) WYSIWYG HTML5 user interface editor from the Dojo project. (via Hacker News)
- Hacker News Analysis — interesting to see relationship between number of posts, median score, and quality over time. Most interesting, though, was the relative popularity of different companies. (via Hacker News)
- Real Time All The Time (Emily Bell) — Every news room will have to remake itself around the principle of being reactive in real time. Every page or story that every news organisation distributes will eventually show some way of flagging if the page is active or archived, if the conversation is alive and well or over and done with. Every reporter and editor will develop a real time presence in some form, which makes them available to the social web. When I say “will” I of course don’t mean that literally. I think many of them won’t, but eventually they will be replaced by ones who do. (via Chris Saad)
- Changes in Home Broadband (Pew Internet) — Jeff Atwood linked to this, simply saying “Why Web 1.0 didn’t work and Web 2.0 does, in a single graph.” Ajax and web services and the growing value of data were all important, but nothing’s made the web so awesome as all the people who can now access it. (via Jeff Atwood)
Data markets, real-time technology, and the race for developers
To conclude our Strata Gems series, we take a look at the important drivers for the data world in 2011: data markets, real-time data processing, and developers.
Crowdfunding, Biogrown Blood, MakerBot Spawn, and Real-Time Data
- Reasons for Artists and Fans to Consider Crowdfunding — the number of fans acquiring music outside traditional and/or legal means is, well, the majority. Plenty of examples of bands raising money outside the label system.
- DARPA’s Blood Makers Start Pumping (Wired) — biomanufactured blood. The blood was produced using hematopoietic cells, derived from embryonic cord-blood units. Currently, it takes Arteriocyte scientists three days to turn a single umbilical cord unit into 20 units of RBC-packed blood. The average soldier needs six units during trauma treatment. (via rdiva on Twitter)
- Self-Reproducing Makerbot — a community member popped up, out of the blue, and posted the designs for a MakerBot assembled from 150 pieces that a MakerBot can print, a-la the RepRap (whose design MakerBot is based on). (via Quinn Norton)
- Real Time Real World Statistics — I can’t wait to see what happens when we get real-time AND open data together. (via jessykate on Twitter)