ENTRIES TAGGED "Big Data"

Four short links: 21 January 2014

Four short links: 21 January 2014

Mature Engineering, Control Theory, Open Access USA, and UK Health Data Too-Open?

  1. On Being a Senior Engineer (Etsy) — Mature engineers know that no matter how complete, elegant, or superior their designs are, it won’t matter if no one wants to work alongside them because they are assholes.
  2. Control Theory (Coursera) — Learn about how to make mobile robots move in effective, safe, predictable, and collaborative ways using modern control theory. (via DIY Drones)
  3. US Moves Towards Open Access (WaPo) — Congress passed a budget that will make about half of taxpayer-funded research available to the public.
  4. NHS Patient Data Available for Companies to Buy (The Guardian) — Once live, organisations such as university research departments – but also insurers and drug companies – will be able to apply to the new Health and Social Care Information Centre (HSCIC) to gain access to the database, called care.data. If an application is approved then firms will have to pay to extract this information, which will be scrubbed of some personal identifiers but not enough to make the information completely anonymous – a process known as “pseudonymisation”. Recipe for disaster as it has been repeatedly shown that it’s easy to identify individuals, given enough scrubbed data. Can’t see why the NHS just doesn’t make it an app in Facebook. “Nat’s Prostate status: it’s complicated.”
Comment |
Four short links: 15 January 2014

Four short links: 15 January 2014

SCADA Security, Graph Clustering, Facebook Flipbook, and Projections Illustrated

  1. Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
  2. mclMarkov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
  3. Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
  4. Map Projections Illustrated with a Face (Flowing Data) — really neat, wish I’d had these when I was getting my head around map projections.
Comment |
Four short links: 8 January 2014

Four short links: 8 January 2014

Cognition as a Service, Levy on NSA, SD-Sized Computer, and Learning Research

  1. Launching the Wolfram Connected Devices Project — Wolfram Alpha is cognition-as-a-service, which they hope to embed in devices. This data-powered Brain-in-the-Cloud play will pit them against Google, but G wants to own the devices and the apps and the eyeballs that watch them … interesting times ahead!
  2. How the USA Almost Killed the Internet (Wired) — “At first we were in an arms race with sophisticated criminals,” says Eric Grosse, Google’s head of security. “Then we found ourselves in an arms race with certain nation-state actors [with a reputation for cyberattacks]. And now we’re in an arms race with the best nation-state actors.”
  3. Intel Edison — SD-card sized, with low-power 22nm 400MHz Intel Quark processor with two cores, integrated Wi-Fi and Bluetooth.
  4. N00b 2 L33t, Now With Graphs (Tom Stafford) — open science research validating many of the findings on learning, tested experimentally via games. In the present study, we analyzed data from a very large sample (N = 854,064) of players of an online game involving rapid perception, decision making, and motor responding. Use of game data allowed us to connect, for the first time, rich details of training history with measures of performance from participants engaged for a sustained amount of time in effortful practice. We showed that lawful relations exist between practice amount and subsequent performance, and between practice spacing and subsequent performance. Our methodology allowed an in situ confirmation of results long established in the experimental literature on skill acquisition. Additionally, we showed that greater initial variation in performance is linked to higher subsequent performance, a result we link to the exploration/exploitation trade-off from the computational framework of reinforcement learning.
Comment |

The Snapchat Leak

4.6 million phone numbers, is one of them yours?

While the site crumbled quickly under the weight of so many people trying to…
Read Full Post | Comment |
Four short links: 26 December 2013

Four short links: 26 December 2013

Inside the Nest Protect, Log Structures, Predictions, and In-Memory Data Cubes

  1. Nest Protect Teardown (Sparkfun) — initial teardown of another piece of domestic industrial Internet.
  2. LogsThe distributed log can be seen as the data structure which models the problem of consensus. Not kidding when he calls it “real-time data’s unifying abstraction”.
  3. Mining the Web to Predict Future Events (PDF) — Mining 22 years of news stories to predict future events. (via Ben Lorica)
  4. Nanocubesa fast datastructure for in-memory data cubes developed at the Information Visualization department at AT&T Labs – Research. Nanocubes can be used to explore datasets with billions of elements at interactive rates in a web browser, and in some cases it uses sufficiently little memory that you can run a nanocube in a modern-day laptop. (via Ben Lorica)
Comment |
Four short links: 16 December 2013

Four short links: 16 December 2013

Data Pipeline, Data Driven Education, Crowdsourced Proofreading, and 3D Printed Shoes

  1. Suro (Github) — Netflix data pipeline service for large volumes of event data. (via Ben Lorica)
  2. NIPS Workshop on Data Driven Education — lots of research papers around machine learning, MOOC data, etc.
  3. Proofist — crowdsourced proofreading game.
  4. 3D-Printed Shoes (YouTube) — LeWeb talk from founder of the company, Continuum Fashion). (via Brady Forrest)
Comment |
Four short links: 10 December 2013

Four short links: 10 December 2013

Flexible Data, Google's Bottery, GPU Assist Deep Learning, and Open Sourcing

  1. ArangoDBopen-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript extensions.
  2. Google’s Seven Robotics Companies (IEEE) — The seven companies are capable of creating technologies needed to build a mobile, dexterous robot. Mr. Rubin said he was pursuing additional acquisitions. Rundown of those seven companies.
  3. Hebel (Github) — GPU-Accelerated Deep Learning Library in Python.
  4. What We Learned Open Sourcing — my eye was caught by the way they offered APIs to closed source code, found and solved performance problems, then open sourced the fixed code.
Comment: 1 |
Four short links: 9 December 2013

Four short links: 9 December 2013

Surveillance Demarcation, NYT Data Scientist, 2D Dart, and Bayesian Database

  1. Reform Government Surveillance — hard not to view this as a demarcation dispute. “Ruthlessly collecting every detail of online behaviour is something we do clandestinely for advertising purposes, it shouldn’t be corrupted because of your obsession over national security!”
  2. Brian Abelson — Data Scientist at the New York Times, blogging what he finds. He tackles questions like what makes a news app “successful” and how might we measure it. Found via this engaging interview at the quease-makingly named Content Strategist.
  3. StageXL — Flash-like 2D package for Dart.
  4. BayesDBlets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries. Open source.
Comment: 1 |
Four short links: 6 December 2013

Four short links: 6 December 2013

AI Book, Science Superstars, Engineering Ethics, and Crowdsourced Science

  1. Society of Mind — Marvin Minsky’s book now Creative-Commons licensed.
  2. Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary BiologyThe concentration of research output is declining at the department level but increasing at the individual level. [...] We speculate that this may be due to changing patterns of collaboration, perhaps caused by the rising burden of knowledge and the falling cost of communication, both of which increase the returns to collaboration. Indeed, we report evidence that the propensity to collaborate is rising over time. (via Sciblogs)
  3. As Engineers, We Must Consider the Ethical Implications of our Work (The Guardian) — applies to coders and designers as well.
  4. Eyewire — a game to crowdsource the mapping of 3D structure of neurons.
Comment: 1 |
Four short links: 5 December 2013

Four short links: 5 December 2013

R GUI, Drone Regulations, Bitcoin Stats, and Android/iOS Money Shootout

  1. DeducerAn R Graphical User Interface (GUI) for Everyone.
  2. Integration of Civil Unmanned Aircraft Systems (UAS) in the National Airspace System (NAS) Roadmap (PDF, FAA) — first pass at regulatory framework for drones. (via Anil Dash)
  3. Bitcoin Stats — $21MM traded, $15MM of electricity spent mining. Goodness. (via Steve Klabnik)
  4. iOS vs Android Numbers (Luke Wroblewski) — roundup comparing Android to iOS in recent commerce writeups. More Android handsets, but less revenue per download/impression/etc.
Comment: 1 |