ENTRIES TAGGED "Big Data"

Four short links: 5 December 2013

R GUI, Drone Regulations, Bitcoin Stats, and Android/iOS Money Shootout

  1. Deducer — An R Graphical User Interface (GUI) for Everyone.
  2. Integration of Civil Unmanned Aircraft Systems (UAS) in the National Airspace System (NAS) Roadmap (PDF, FAA) — first pass at regulatory framework for drones. (via Anil Dash)
  3. Bitcoin Stats — $21MM traded, $15MM of electricity spent mining. Goodness. (via Steve Klabnik)
  4. iOS vs Android Numbers (Luke Wroblewski) — roundup comparing Android to iOS in recent commerce writeups. More Android handsets, but less revenue per download/impression/etc.
Four short links: 3 December 2013

  1. SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
  2. madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
  3. Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
  4. Disguise Detection — using Raspberry Pi, Arduino, and Python.
Four short links: 11 November 2013

Squid in the Dark, Beautiful Automation, Fan Criticism, and Petabyte Queries

  1. Living Light — 3D printed cephalopods filled with bioluminescent bacteria. PAGING CORY DOCTOROW, YOUR ORGASMATRON HAS ARRIVED. (via Sci Blogs)
  2. Repacking Lego Batteries with a CNC Mill — check out the video. Patrick programmed a CNC machine to drill out the rivets holding the Mindstorms battery pack together. Coding away a repetitive task like this is gorgeous to see at every scale. We don’t have to teach our kids a particular programming language, but they should know how to automate cruft.
  3. My Thoughts on Google+ (YouTube) — when your fans make hatey videos like this one protesting Google putting the pig of Google Plus onto the lipstick that was YouTube, you are Doin’ It Wrong.
  4. Presto: Interacting with Petabytes of Data at Facebook — a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. For details, see the Facebook post about its launch.
Four short links: 5 November 2013

Time Series Database, Cluster Schedulers, Structural Search-and-Replace, and TV Data

  1. Influx DB — open-source, distributed, time series, events, and metrics database with no external dependencies.
  2. Omega (PDF) — flexible, scalable schedulers for large compute clusters. From Google Research.
  3. GraspJS — Search and replace your JavaScript code based on its structure rather than its text.
  4. Amazon Mines Its Data Trove To Bet on TV’s Next Hit (WSJ) — Amazon produced about 20 pages of data detailing, among other things, how much a pilot was viewed, how many users gave it a 5-star rating and how many shared it with friends.
Four short links: 31 October 2013

Flying Robot, State of Cyberspace, H.264, and Principal Component Analysis

  1. Insect-Inspired Collision-Resistant Robot — clever hack to make it stable despite bouncing off things.
  2. The Battle for Power on the Internet (Bruce Schneier) — the state of cyberspace. [M]ost of the time, a new technology benefits the nimble first. [...] In other words, there will be an increasing time period during which nimble distributed powers can make use of new technologies before slow institutional powers can make better use of those technologies.
  3. Cisco’s H.264 Good News (Brendan Eich) — Cisco is paying the license fees for a particular implementation of H.264 to be used in open source software, enabling it to be the basis of web streaming video across all browsers (even the open source ones). It’s not as ideal a solution as it might sound.
  4. Principal Component Analysis for Dummies — This post will give a very broad overview of PCA, describing eigenvectors and eigenvalues (which you need to know about to understand it) and showing how you can reduce the dimensions of data using PCA. As I said it’s a neat tool to use in information theory, and even though the maths is a bit complicated, you only need to get a broad idea of what’s going on to be able to use it effectively.
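
For the curious, here is a rough sketch of the idea behind that last link. It is not taken from the linked post. It centers a small made-up data set, builds the covariance matrix, and uses power iteration (one simple way to find the dominant eigenvector, chosen here to avoid any dependencies) to get the first principal component. The function names and the sample points are invented for illustration, and the fixed starting vector is an assumption that would fail if it happened to be orthogonal to the top component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (column means, unit eigenvector, eigenvalue) for the top component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise; the vector drifts toward the dominant eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the eigenvalue (variance along v).
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return means, v, eigenvalue


def project(rows, means, v):
    """Reduce each row to a single score along the component v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


# Made-up 2-D points lying exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
means, component, variance = first_principal_component(points)
scores = project(points, means, component)
print(component, variance, scores)
```

Because the sample points sit on a single line, one score per point carries all of the variation, which is the dimension reduction the post describes. Real data would keep several components and would normally lean on a linear algebra library rather than hand-rolled loops.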
Four short links: 30 October 2013

Offline Javascript, Android Coding, Stats Fails, and Stream Data

  1. Offline.js — JavaScript library so web app developers can gracefully deal with users going offline.
  2. Android Guides — lots of info on coding for Android.
  3. Statistics Done Wrong — learn from these failure modes. Not medians or means. Modes.
  4. Streaming, Sketching, and Sufficient Statistics (YouTube) — how to process huge data sets as they stream past your CPU (e.g., those produced by sensors). (via Ben Lorica)
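
To make the streaming idea in that last link concrete, here is a minimal sketch that is not drawn from the talk itself. It keeps only three numbers (a count, a running mean, and a running sum of squared deviations) as sufficient statistics, updating them one reading at a time with Welford's online algorithm, so the full data set never has to sit in memory. The `sensor_readings` generator and its values are hypothetical stand-ins for a real feed.

```python
class RunningStats:
    """Count, mean, and variance maintained one value at a time (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old and the new mean, which keeps the update stable.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined with fewer than two readings.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def sensor_readings():
    """Hypothetical stand-in for a sensor feed."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_readings():
    stats.update(reading)  # each reading is seen once, then discarded

print(stats.n, stats.mean, stats.variance)
```

The same pattern extends to other summaries that can be updated incrementally. Quantities that cannot be tracked exactly in constant space, such as distinct counts, are where approximate sketches come in.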
Four short links: 29 October 2013

Digital Citizenship, Berg Cloud, Data Warehouse, and The Spying Iron

  1. Mozilla Web Literacy Standard — things you should be able to do if you’re to be trusted to be on the web unsupervised. (via BoingBoing)
  2. Berg Cloud Platform — hardware (shield), local network, and cloud glue. Caution: magic ahead!
  3. Shark — a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive’s query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones. (via Strata)
  4. The Malware of Things — a technician opening up an iron included in a batch of Chinese imports to find a “spy chip” with what he called “a little microphone”. Its correspondent said the hidden devices were mostly being used to spread viruses, by connecting to any computer within a 200m (656ft) radius which were using unprotected Wi-Fi networks.

Mining the social web, again

If you want to engage with the data that's surrounding you, Mining the Social Web is the best place to start.

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever. While we’re seeing more…
Four short links: 18 October 2013

Publishing Bad Research, Reproducing Research, DIY Police Scanner, and Inventing the Future

  1. Science Not as Self-Correcting As It Thinks (Economist) — REALLY good discussion of the shortcomings in statistical practice by scientists, peer-review failures, and the complexities of experimental procedure and fuzziness of what reproducibility might actually mean.
  2. Reproducibility Initiative Receives Grant to Validate Landmark Cancer Studies — The key experimental findings from each cancer study will be replicated by experts from the Science Exchange network according to best practices for replication established by the Center for Open Science through the Center’s Open Science Framework, and the impact of the replications will be tracked on Mendeley’s research analytics platform. All of the ultimate publications and data will be freely available online, providing the first publicly available complete dataset of replicated biomedical research and representing a major advancement in the study of reproducibility of research.
  3. $20 SDR Police Scanner — using software-defined radio to listen to the police band.
  4. Reimagine the Chemistry Set — $50k prize in contest to design a “chemistry set” type kit that will engage kids as young as 8 and inspire people who are 88. We’re looking for ideas that encourage kids to explore, create, build and question. We’re looking for ideas that honor kids’ curiosity about how things work. Backed by the Moore Foundation and Society for Science and the Public.
Four short links: 16 October 2013

New Math, Business Math, Summarising Text, Clipping Images

  1. Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It (Jennifer Ouellette) — Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus, which he believes is already underway.
  2. Is Google Jumping the Shark? (Seth Godin) — Public companies almost inevitably seek to grow profits faster than expected, which means beyond the organic growth that comes from doing what made them great in the first place. In order to gain that profit, it’s typical to hire people and reward them for measuring and increasing profits, even at the expense of what the company originally set out to do. Eloquent redux.
  3. textteaser — open source text summarisation algorithm.
  4. Clipping Magic — Instantly create masks, cutouts, and clipping paths online.