- China’s $122B Boom in Shadow Banking is Happening on Phones (Quartz) — Tencent’s recently launched online money market fund (MMF), Licai Tong, drew in 10 billion yuan ($1.7 billion) in just six days in the last week of January.
- The Weight of Rain — lovely talk about the thought processes behind coming up with a truly insightful visualisation.
- Data on Video Streaming Starting to Emerge (GigaOm) — M-Lab, which gathers broadband performance data and distributes that data to the FCC, has uncovered significant slowdowns in throughput on Comcast, Time Warner Cable and AT&T. Such slowdowns could be indicative of deliberate actions taken at interconnection points by ISPs.
ENTRIES TAGGED "visualization"
A new operator from the magrittr package makes it easier to use R for data analysis.
In every data analysis, you have to string together many tools. You need tools for data wrangling, visualisation, and modelling to understand what’s going on in your data. To use these tools effectively, you need to be able to easily flow from one tool to the next, focusing on asking and answering questions of the data, not struggling to jam the output from one function into the format needed for the next. Wouldn’t it be nice if the world worked this way! I spend a lot of my time thinking about this problem, and how to make the process of data analysis as fast, effective, and expressive as possible. Today, I want to show you a new technique that I’m particularly excited about.
R, at its heart, is a functional programming language: you do data analysis in R by composing functions. However, the problem with function composition is that a lot of it makes for hard-to-read code. For example, here’s some R code that wrangles flight delay data from New York City in 2013. What does it do? Read more…
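The readability cost of nested function calls, and the left-to-right alternative that a pipe operator offers, can be sketched in a few lines. The snippet below is a Python analogue under stated assumptions, not the article's R code and not magrittr itself: the `pipe` helper, the function names, and the sample delay figures are all hypothetical.

```python
from functools import reduce

# Hypothetical sample records: (carrier, departure delay in minutes).
flights = [("AA", 12), ("UA", -3), ("AA", 45), ("DL", 0), ("UA", 30)]


def keep_delayed(rows):
    """Keep only flights that left late."""
    return [row for row in rows if row[1] > 0]


def delays_only(rows):
    """Drop the carrier and keep the delay value."""
    return [delay for _, delay in rows]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def pipe(value, *steps):
    """Feed value through each step in order, left to right (illustrative helper)."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Nested composition: has to be read from the inside out.
nested = mean(delays_only(keep_delayed(flights)))

# Pipeline: reads in the order the steps are applied.
piped = pipe(flights, keep_delayed, delays_only, mean)

print(nested, piped)
```

Both expressions compute the same value; the only difference is the order in which a reader encounters the steps.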
D3 doesn’t stand for data-design dictator
Designers and developers making data visualizations on the web are buzzing about d3.js. But why? Read more…
- SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
- madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
- Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
- Disguise Detection — using Raspberry Pi, Arduino, and Python.
3D Fossils, Changing Drone Uses, High Scalability, and Sim Redux
- CT Scanning and 3D Printing for Paleo (Scientific American) — using CT scanners to identify bones still in rock, then using 3D printers to recreate them. (via BoingBoing)
- Growing the Use of Drones in Agriculture (Forbes) — According to Sue Rosenstock, 3D Robotics spokesperson, a third of their customers consist of hobbyists, another third of enterprise users, and a third use their drones as consumer tools. “Over time, we expect that to change as we make more enterprise-focused products, such as mapping applications,” she explains. (via Chris Anderson)
- Serving 1M Load-Balanced Requests/Second (Google Cloud Platform blog) — 7 minutes from empty project to serving 1M requests/second. I remember when 1 request/second was considered insanely busy. (via Forbes)
- Boil Up — behind the scenes for the design and coding of a real-time simulation for a museum’s science exhibit. (via Courtney Johnston)
Disk Over Ethernet, Inside Elite, Polar Charts, and R Videos
- Seagate Kinetic Storage — In the words of Geoff Arnold: The physical interconnect to the disk drive is now Ethernet. The interface is a simple key-value object oriented access scheme, implemented using Google Protocol Buffers. It supports key-based CRUD (create, read, update and delete); it also implements third-party transfers (“transfer the objects with keys X, Y and Z to the drive with IP address 188.8.131.52”). Configuration is based on DHCP, and everything can be authenticated and encrypted. The system supports a variety of key schemas to make it easy for various storage services to shard the data across multiple drives.
- Masters of Their Universe (Guardian) — well-written and fascinating story of the creation of the Elite game (one of whose creators went on to make the Raspberry Pi). The classic action game of the early 1980s – Defender, Pac Man – was set in a perpetual present tense, a sort of arcade Eden in which there were always enemies to zap or gobble, but nothing ever changed apart from the score. By letting the player tool up with better guns, Bell and Braben were introducing a whole new dimension, the dimension of time.
- Micropolar (GitHub) — A tiny polar charts library made with D3.js.
- Introduction to R (YouTube) — 21 short videos from Google.
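The key-based access scheme described in the Seagate Kinetic entry above (CRUD by key, drive-to-drive transfers, and sharding across drives) can be sketched with an in-memory stand-in. This is a toy model under stated assumptions, not the Kinetic API: the `Drive` class, the `drive_for` and `transfer` helpers, and the drive names are hypothetical, and no networking, Protocol Buffers, authentication, or encryption is modelled.

```python
import hashlib


class Drive:
    """Stand-in for one network-attached key-value drive (in memory only)."""

    def __init__(self, address):
        self.address = address  # hypothetical label, not a real network address
        self._objects = {}

    def put(self, key, value):
        """Create or update the object stored under key."""
        self._objects[key] = value

    def get(self, key):
        """Read the object stored under key, or None if it is missing."""
        return self._objects.get(key)

    def delete(self, key):
        """Delete the object stored under key; a missing key is not an error."""
        self._objects.pop(key, None)


def drive_for(key, drives):
    """Pick a drive by hashing the key, so a given key always maps to the same drive."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return drives[int.from_bytes(digest[:8], "big") % len(drives)]


def transfer(keys, source, target):
    """Move the named objects from one drive to another."""
    for key in keys:
        value = source.get(key)
        if value is not None:
            target.put(key, value)
            source.delete(key)


drives = [Drive("drive-a"), Drive("drive-b"), Drive("drive-c")]
home = drive_for("photo:1", drives)
home.put("photo:1", b"raw bytes")
print(home.address, home.get("photo:1"))
```

Hashing the key modulo the number of drives is the simplest possible sharding rule; it reshuffles most keys whenever a drive is added or removed, which is why real storage services tend to use schemes such as consistent hashing instead.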
Visual Arduino Coding, Hardware Iteration, Segmenting Images, and Client-Side Adjustable Data View
- Visually Programming Arduino — good for little minds.
- Rapid Hardware Iteration at Scale (Forbes) — It’s part of the unique way that Xiaomi operates, closely analyzing the user feedback it gets on its smartphones and following the suggestions it likes for the next batch of 100,000 phones. It releases them every Tuesday at noon Beijing time.
- Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images (PLoS One) — We propose an active learning approach for performing hierarchical agglomerative segmentation from superpixels. Our method combines multiple features at all scales of the agglomerative process, works for data with an arbitrary number of dimensions, and scales to very large datasets.
- Kratu — an Open Source client-side analysis framework to create simple yet powerful renditions of data. It allows you to dynamically adjust your view of the data to highlight issues, opportunities and correlations in the data.
Rich Text Editing, Structural Visualisation, DDoS Protection, Realtime DDoS Map
- Sir Trevor — nice rich-text editing. Interesting how Markdown has become the way to store formatted text without storing HTML (and thus exposing the XSS-inducing HTML-escaping stuckfastrophe).
- Slate for Excel — visualising spreadsheet structure. I’d be surprised if it took MSFT or Goog 30 days to acquire them.
- Project Shield — Google project to protect against DDoSes.
- Digital Attack Map — DDoS attacks going on around the world. (via Jim Stogdill)
No Managers, Bezos Pearls, Visualising History, and Scalable Key-Value Store
- No Managers — If we could find a way to replace the function of the managers and focus everyone on actually producing for our Students (customers) then it would actually be possible to be a #NoManager company. In my future posts I’ll explain how we’re doing this at Treehouse.
- The 20 Smartest Things Jeff Bezos Has Ever Said (Motley Fool) — I feel like the 219th smartest thing Jeff Bezos has ever said is still smarter than the smartest thing most business commentators will ever say. (He says, self-referentially) “Invention requires a long-term willingness to be misunderstood.”
- Putting Time in Perspective — nifty representations of relative timescales and history. (via BoingBoing)
- Sophia — BSD-licensed small C library implementing an embeddable key-value database “for a high-load environment”.