ENTRIES TAGGED "analytics"

Four short links: 18 April 2014

Four short links: 18 April 2014

Interview Tips, Data of Any Size, Science Writing, and Instrumented Javascript

  1. 16 Interviewing Tips for User Studies — these apply to many situations beyond user interviews, too.
  2. The Backlash Against Big Data contd. (Mike Loukides) — Learn to be a data skeptic. That doesn’t mean becoming skeptical about the value of data; it means asking the hard questions that anyone claiming to be a data scientist should ask. Think carefully about the questions you’re asking, the data you have to work with, and the results that you’re getting. And learn that data is about enabling intelligent discussions, not about turning a crank and having the right answer pop out.
  3. The Science of Science Writing (American Scientist) — also applicable beyond the specific field for which it was written.
  4. earhornEarhorn instruments your JavaScript and shows you a detailed, reversible, line-by-line log of JavaScript execution, sort of like console.log’s crazy uncle.
Comment |
Four short links: 26 February 2014

Four short links: 26 February 2014

Library Box, Data-Driven Racial Profiling, Internet of Washing Machines, and Nokia's IoT R&D

  1. Librarybox 2.0fork of PirateBox for the TP-Link MR 3020, customized for educational, library, and other needs. Wifi hotspot with free and anonymous file sharing. v2 adds mesh networking and more. (via BoingBoing)
  2. Chicago PD’s Using Big Data to Justify Racial Profiling (Cory Doctorow) — The CPD refuses to share the names of the people on its secret watchlist, nor will it disclose the algorithm that put it there. [...] Asserting that you’re doing science but you can’t explain how you’re doing it is a nonsense on its face. Spot on.
  3. Cloudwash (BERG) — very good mockup of how and why your washing machine might be connected to the net and bound to your mobile phone. No face on it, though. They’re losing their touch.
  4. What’s Left of Nokia to Bet on Internet of Things (MIT Technology Review) — With the devices division gone, the Advanced Technologies business will cut licensing deals and perform advanced R&D with partners, with around 600 people around the globe, mainly in Silicon Valley and Finland. Hopefully will not devolve into being a patent troll. [...] “We are now talking about the idea of a programmable world. [...] If you believe in such a vision, as I do, then a lot of our technological assets will help in the future evolution of this world: global connectivity, our expertise in radio connectivity, materials, imaging and sensing technologies.”
Comment |
Four short links: 27 January 2014

Four short links: 27 January 2014

Real Time Exploratory Analytics, Algorithmic Agendas, Disassembly Engine, and Future of Employment

  1. Druid — open source clustered data store (not key-value store) for real-time exploratory analytics on large datasets.
  2. It’s Time to Engineer Some Filter Failure (Jon Udell) — Our filters have become so successful that we fail to notice: We don’t control them, They have agendas, and They distort our connections to people and ideas. That idea that algorithms have agendas is worth emphasising. Reality doesn’t have an agenda, but the deployer of a similarity metric has decided what features to look for, what metric they’re optimising, and what to do with the similarity data. These are all choices with an agenda.
  3. Capstone — open source multi-architecture disassembly engine.
  4. The Future of Employment (PDF) — We note that this prediction implies a truncation in the current trend towards labour market polarization, with growing employment in high and low-wage occupations, accompanied by a hollowing-out of middle-income jobs. Rather than reducing the demand for middle-income occupations, which has been the pattern over the past decades, our model predicts that computerisation will mainly substitute for low-skill and low-wage jobs in the near future. By contrast, high-skill and high-wage occupations are the least susceptible to computer capital. (via The Atlantic)
Comment |
Four short links: 16 December 2013

Four short links: 16 December 2013

Data Pipeline, Data Driven Education, Crowdsourced Proofreading, and 3D Printed Shoes

  1. Suro (Github) — Netflix data pipeline service for large volumes of event data. (via Ben Lorica)
  2. NIPS Workshop on Data Driven Education — lots of research papers around machine learning, MOOC data, etc.
  3. Proofist — crowdsourced proofreading game.
  4. 3D-Printed Shoes (YouTube) — LeWeb talk from founder of the company, Continuum Fashion). (via Brady Forrest)
Comment |
Four short links: 3 December 2013

Four short links: 3 December 2013

  1. SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
  2. madliban open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
  3. Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
  4. Disguise Detection — using Raspberry Pi, Arduino, and Python.
Comment |
Four short links: 7 June 2013

Four short links: 7 June 2013

Open Source BigTable, Robots Lost, Changing the World, Secrecy Binge

  1. Accumulo — NSA’s BigTable implementation, released as an Apache project.
  2. How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting access to the technology that the traders had: Last fall the SEC said it would pay Tradeworx, a high-frequency trading firm, $2.5 million to use its data collection system as the basic platform for a new surveillance operation. Code-named Midas (Market Information Data Analytics System), it scours the market for data from all 13 public exchanges. Midas went live in February. The SEC can now detect anomalous situations in the market, such as a trader spamming an exchange with thousands of fake orders, before they show up on blogs like Nanex and ZeroHedge. If Midas sees something odd, Berman’s team can look at trading data on a deeper level, millisecond by millisecond.
  3. PRISM: Surprised? (Danny O’Brien) — I really don’t agree with the people who think “We don’t have the collective will”, as though there’s some magical way things got done in the past when everyone was in accord and surprised all the time. It’s always hard work to change the world. Endless, dull hard work. Ten years later, when you’ve freed the slaves or beat the Nazis everyone is like “WHY CAN’T IT BE AS EASY TO CHANGE THIS AS THAT WAS, BACK IN THE GOOD OLD DAYS. I GUESS WE’RE ALL JUST SHEEPLE THESE DAYS.”
  4. What We Don’t Know About Spying on Citizens is Scarier Than What We Do Know (Bruce Schneier) — The U.S. government is on a secrecy binge. It overclassifies more information than ever. And we learn, again and again, that our government regularly classifies things not because they need to be secret, but because their release would be embarrassing. Open source BigTable implementation: free. Data gathering operation around it: $20M/year. Irony in having the extent of authoritarian Big Brother government secrecy questioned just as a whistleblower’s military trial is held “off the record”: priceless.
Comment |
Four short links: 1 April 2013

Four short links: 1 April 2013

Machine Learning Demos, iOS Debugging, Industrial Internet, and Deanonymity

  1. MLDemosan open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. (via Mark Alen)
  2. kiln (GitHub) — open source extensible on-device debugging framework for iOS apps.
  3. Industrial Internet — the O’Reilly report on the industrial Internet of things is out. Prasad suggests an illustration: for every car with a rain sensor today, there are more than 10 that don’t have one. Instead of an optical sensor that turns on windshield wipers when it sees water, imagine the human in the car as a sensor — probably somewhat more discerning than the optical sensor in knowing what wiper setting is appropriate. A car could broadcast its wiper setting, along with its location, to the cloud. “Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction.
  4. Unique in the Crowd: The Privacy Bounds of Human Mobility (PDF, Nature) — We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. As Edd observed, “You are a unique snowflake, after all.” (via Alasdair Allan)
Comment |
Four short links: 18 March 2013

Four short links: 18 March 2013

Big Lit Data, 6502 Assembly, Small Startup Analytics, and Javascript Heatmaps

  1. A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
  2. Easy6502get started writing 6502 assembly language. Fun way to get started with low-level coding.
  3. How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
  4. webgl-heatmap (GitHub) — a JavaScript library for high performance heatmap display.
Comment |
Four short links: 14 February 2013

Four short links: 14 February 2013

Malware Industrial Complex, Indies Needed, TV Analytics, and HTTP Benchmarking

  1. Welcome to the Malware-Industrial Complex (MIT) — brilliant phrase, sound analysis.
  2. Stupid Stupid xBoxThe hardcore/soft-tv transition and any lead they feel they have is simply not defensible by licensing other industries’ generic video or music content because those industries will gladly sell and license the same content to all other players. A single custom studio of 150 employees also can not generate enough content to defensibly satisfy 76M+ customers. Only with quality primary software content from thousands of independent developers can you defend the brand and the product. Only by making the user experience simple, quick, and seamless can you defend the brand and the product. Never seen a better put statement of why an ecosystem of indies is essential.
  3. Data Feedback Loops for TV (Salon) — Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
  4. wrka modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
Comment: 1 |
Four short links: 12 February 2013

Four short links: 12 February 2013

Handmade Hardware, Tab Silencer, Surprise and Models, and Sciencey GIFs

  1. Your USB Sticks Are Made With Chopsticks (Bunnie Huang) — behind-the-scenes on how USB sticks are made.
  2. mutetab — find and kill the Chrome tab making all the damn noise! (via Nelson Minar)
  3. Visualization, Modeling, and Surprises (John D Cook) — paraphrases Hadley Wickham: Visualization can surprise you, but it doesn’t scale well. Modelling scales well, but it can’t surprise you.
  4. Head Like an Orange — science animated GIFs, assembled from nature documentaries. (via Ed Yong)
Comment |