ENTRIES TAGGED "Hadoop"
Four short links: 12 November 2012
Motivated Learning, Better Hadoopery, Poignant Past Product, and Drone Imagery
- Teaching Programming to a Highly Motivated Beginner (CACM) — I don’t think there is any better way to internalize knowledge than first spending hours upon hours growing emotionally distraught over such struggles and only then being helped by a mentor. Me, too. Not struggle for struggle’s sake, but because you have built a strong mental map of the problem into which the solution can lock.
- Corona (GitHub) — Facebook open-sources its improvements to Hadoop’s job tracking, in the name of scalability, latency, cluster utilization, and fairness. (via Chris Aniszczyk)
- One Man’s Trash (Bunnie Huang) — Bunnie finds a Chumby relic in a Shenzhen market stall.
- Dronestagram — posting pictures of drone strike locations to Instagram. (via The New Aesthetic)
Four short links: 25 October 2012
Big Data's Big Picture, Real-Time Queries, Real-Time Queries, Single-Process Real-Time Queries
- Big Data: the Big Picture (Vimeo) — Jim Stogdill’s excellent talk: although Big Data is presented as part of the Gartner Hype Cycle, it’s an epoch of the Information Age which will have significant effects on the structure of corporations and the economy.
- Impala (github) — Cloudera’s open source (Apache) implementation of Google’s F1 (PDF), for real-time queries across clusters. Impala is different from Hive and Pig because it runs queries on its own daemons spread across the cluster. Furthermore, Impala does not use MapReduce, which lets it return results in real time. (via Wired)
- druid (github) — an open source (GPLv2), distributed, column-oriented analytical datastore. It was originally created to resolve the query latency seen when trying to use Hadoop to power an interactive service. See also the announcement of its open-sourcing.
- Supersonic (Google Code) — an ultra-fast, column-oriented query engine library written in C++. It provides a set of data transformation primitives that make heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper-pipelined CPUs. It is designed to work in a single process. Apache-licensed.
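druid and Supersonic both rest on the same idea: store data column by column and process it a batch of values at a time ("vectorised" execution). The sketch below illustrates that layout under stated assumptions; it is not druid's or Supersonic's API. The `ColumnTable` class, the `filtered_sum` function, and the `BATCH_SIZE` value are all hypothetical, and pure Python cannot show the SIMD or cache effects these engines exploit. It only shows that a query touches just the columns it names, and that it evaluates a predicate over a whole batch before aggregating that batch.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical batch size; real engines tune this so a batch fits in CPU cache.
BATCH_SIZE = 4


@dataclass
class ColumnTable:
    """Hypothetical table stored column by column: one list per column name."""
    columns: Dict[str, list]

    def num_rows(self) -> int:
        # All columns have the same length; an empty table has zero rows.
        return len(next(iter(self.columns.values()), []))


def filtered_sum(table: ColumnTable, filter_col: str, threshold: float, sum_col: str) -> float:
    """Compute SUM(sum_col) WHERE filter_col > threshold, one batch at a time.

    Only the two referenced columns are read; any other columns are never
    touched, which is the main I/O advantage of a column-oriented layout.
    """
    keys = table.columns[filter_col]
    values = table.columns[sum_col]
    total = 0.0
    for start in range(0, table.num_rows(), BATCH_SIZE):
        stop = start + BATCH_SIZE
        # Step 1: evaluate the predicate over the whole batch into a selection mask.
        mask = [k > threshold for k in keys[start:stop]]
        # Step 2: aggregate only the selected values in that batch.
        total += sum(v for v, keep in zip(values[start:stop], mask) if keep)
    return total


if __name__ == "__main__":
    table = ColumnTable({
        "latency_ms": [12, 250, 40, 900, 5, 310],
        "bytes": [100, 2000, 300, 4000, 50, 1500],
        "host": ["a", "b", "c", "d", "e", "f"],  # never read by the query below
    })
    print(filtered_sum(table, "latency_ms", 100, "bytes"))  # prints 7500.0
```

Working on batches rather than single rows amortizes per-row overhead. In a compiled engine, the tight loops corresponding to the mask and sum steps are the parts that can map onto SIMD instructions.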
Seven reasons why I like Spark
Spark is becoming a key part of a big data toolkit.
Heavy data and architectural convergence
Data is getting heavier relative to the networks that carry it around the data center.
Imagine a future where large clusters of like machines dynamically adapt between programming paradigms depending on a combination of the resident data and the required processing.
Four short links: 9 July 2012
Personalized Medicine, Reporting on Execution, Software-Defined Radio, and Beyond Hadoop
- Personalized Leukemia Treatment (NY Times) — sequenced the tumor’s DNA, found the misbehaving gene, realized there was an existing experimental treatment to tackle that gene, and it worked. Reminds me of My Daughter’s DNA, which had its origin in the poignant story of Hugh Rienhoff sequencing his daughter’s DNA to diagnose her condition. It’s all about medical professionals now, but that’s no different from the Internet starting with geeks and moving out to the masses.
- Bullseye HD — web app which allows you to make the most of the time you spend with your team, by focusing your attention on the projects and actions that are off-track or not getting enough focus, rather than wasting precious time on status updates. (via Rowan Simpson)
- Per Vices — selling software-defined radio boards (for Linux only at the moment). (via Ars Technica)
- Post-Hadoop (GigaOm) — Google have moved beyond the basic software that Hadoop was copying. Lots of interesting points in this article, including one fundamental reality: MapReduce (and thereby Hadoop) is purpose-built for organized data processing (jobs). It is baked from the core for workflows, not ad hoc exploration.
Strata Week: Data prospecting with Kaggle
Kaggle now accepting data before a contest, HP's Autonomy purchase comes into focus, Cloudera's new Hadoop distribution.
In this week's data news, Kaggle launches Prospect, HP unveils its big data plans, and Cloudera releases CDH4 (the latest version of its Hadoop distribution).
Strata Week: Visualizing a better life
A visualization tool from the OECD, concerns about open data and research, and updates to Hadoop.
In this week's data news, a visualization tool charts your "better life," researchers have concerns about access to data, and updates to Hadoop.
Strata Week: Google unveils its Knowledge Graph
Google shows off its Knowledge, Yahoo stumbles, and a bill cuts some census funding.
In this week's data news, Google updates its search features with a Knowledge Graph, while the U.S. House of Representatives de-funds surveys that helped businesses construct theirs.
The chicken and egg of big data solutions
Are solution vendors waiting for broad Hadoop adoption before jumping in?
So, here we are with all of this disruptive big data technology, but in a lot of large companies we seem to have lost the institutional wherewithal to do anything with it, at least until packaged solutions come along.