- School of Data — free online courses around data science and visualization.
- libshorttext — classify and analyse short texts such as titles, questions, sentences, and short messages. MIT-style open source license; Python and C++ source.
- Letterboxd — a site for movie lovers from Kiwi Foo alums. I love people who build experiences to help people express their love of things.
- RadioBlocks and SimpleMesh — mesh networking for Arduino.
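To give a feel for what short-text classification involves, here is a minimal sketch of a bag-of-words multinomial naive Bayes classifier. It is not libshorttext's API or algorithm. It is a simple baseline swapped in for illustration, and the class name `NaiveBayesShortText` and the toy training labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/digits as tokens
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesShortText:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("How do I reset my password?", "question"),
    ("Why is the build failing?", "question"),
    ("Quarterly sales report released", "title"),
    ("New study on urban traffic", "title"),
]
clf = NaiveBayesShortText().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("How do I fix the build?"))  # question
```

Short texts give very few tokens per document, so smoothing matters: without the `+ 1`, a single unseen-for-this-label word would zero out a class.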
ENTRIES TAGGED "Big Data"
Four short links: 26 February 2013
Data Classes, Short Text Classification, Movie Lovers, and Mesh Networking for Hardware Hackers
Big data is dead, long live big data: Thoughts heading to Strata
The biggest problems will almost always be those for which the size of the data is part of the problem.
A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your Big Data in the Cloud with a Hadoop.”
You don’t have to read much industry news to get…
Four short links: 20 February 2013
Corporate Networks, SimCity Analysis, Monetizing Memes, and Javascript Autocomplete
- The Network of Global Control (PLoS One) — We find that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions. [...] From an empirical point of view, a bow-tie structure with a very small and influential core is a new observation in the study of complex networks. We conjecture that it may be present in other types of networks where “rich-get-richer” mechanisms are at work. (via The New Aesthetic)
- Using SimCity to Diagnose My Home Town’s Traffic Problems — no actual diagnosis performed, but the modeling and observations gave insight. I always feel that static visualizations (infographics) are far less useful than an interactive simulation that can give you an intuitive sense of relationships and behaviour. once I’d built East Didsbury, the strip of shops in Northenden stopped making as much money as they once were, and some were even beginning to close down as my time ran out. Walk along Northenden high street, and you’ll know that feeling.
- How the Harlem Shake Went from Viral Sideshow to Global Meme (The Verge) — interesting because again the musician is savvy enough (and has tools and connections) to monetize popularity without trying to own every transaction involving his idea. Baauer and Mad Decent have generally been happy to let a hundred flowers bloom, permitting over 4,000 videos to use an excerpt of the song but quietly adding each of them to YouTube’s Content ID database, asserting copyright over the fan videos and claiming a healthy chunk of the ad revenue for each of them.
- typeahead.js (GitHub) — Javascript library for fast autocomplete.
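The bow-tie structure described in the PLoS One paper can be made concrete on a directed graph: the core is the largest strongly connected component, IN is the set of nodes that can reach the core, OUT is the set reachable from it, and everything else (tendrils, isolated parts) falls outside. Below is a minimal sketch, assuming edges point from owner to owned company; the function names and the toy company names are hypothetical, and the per-node reachability approach is quadratic, fine for toy graphs but not for the paper's tens of thousands of firms.

```python
from collections import deque


def reachable(graph, start):
    """Set of nodes reachable from start (including start), via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Reverse every edge; the result has a key for every node."""
    rev = {}
    for u, targets in graph.items():
        rev.setdefault(u, set())
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev


def bow_tie(graph):
    """Split a directed graph into (core, IN, OUT, other)."""
    rev = reverse(graph)
    nodes = set(rev)
    unassigned = set(nodes)
    core = set()
    while unassigned:
        seed = next(iter(unassigned))
        # A node's SCC: nodes it can reach AND nodes that can reach it
        scc = reachable(graph, seed) & reachable(rev, seed)
        unassigned -= scc
        if len(scc) > len(core):  # ties between equal-size SCCs broken arbitrarily
            core = scc
    if not core:
        return set(), set(), set(), set()
    seed = next(iter(core))
    out_set = reachable(graph, seed) - core
    in_set = reachable(rev, seed) - core
    other = nodes - core - in_set - out_set
    return core, in_set, out_set, other


ownership = {
    "fund_a": {"bank_b"},
    "bank_b": {"bank_c"},
    "bank_c": {"fund_a", "manufacturer"},
    "pension": {"fund_a", "startup"},
    "manufacturer": {"subsidiary"},
}
core, in_set, out_set, other = bow_tie(ownership)
```

Here the three mutually owning institutions form the core, `pension` sits in IN, the manufacturer and its subsidiary sit in OUT, and `startup` is a tendril that is neither reachable from nor able to reach the core.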
Four short links: 18 February 2013
Social Aggregator, Social Tracking, Data Boom, and Vim Search
- crowy — open source social media aggregator.
- Raytheon makes Social Media Tracking Software (Guardian) — the technology was shared with US government and industry as part of a joint research and development effort, in 2010, to help build a national security system capable of analysing “trillions of entities” from cyberspace.
- Big Data Leads to Jobs for Cleveland — Spun out of the Cleveland Clinic three years ago, Explorys already employs 85 people and the prospects are as bright as its hip new offices in University Circle. Suddenly, economic development specialists are eyeing Big Data, and its potential for Cleveland, with new intensity. From rust belt to Hadoop uber alles.
- YouCompleteMe — a fuzzy search engine for Vim.
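Fuzzy completion of the kind editors offer usually starts from subsequence matching: a candidate matches if every query character appears in it in order, and tighter matches rank higher. The sketch below is a hypothetical illustration of that idea, not YouCompleteMe's ranking algorithm. It matches greedily (leftmost), so its gap score is a cheap heuristic rather than the optimal alignment.

```python
def fuzzy_score(query, candidate):
    """Total gap between matched characters, or None if query is not a subsequence.

    Lower is better. Matching is greedy (leftmost occurrence of each character).
    """
    q = query.lower()
    c = candidate.lower()
    pos = -1
    gaps = 0
    for ch in q:
        nxt = c.find(ch, pos + 1)
        if nxt == -1:
            return None  # character missing after the previous match
        if pos >= 0:
            gaps += nxt - pos - 1  # characters skipped since the last match
        pos = nxt
    return gaps


def fuzzy_filter(query, candidates):
    """Keep matching candidates, ordered by gap score, then length, then name."""
    scored = []
    for cand in candidates:
        s = fuzzy_score(query, cand)
        if s is not None:
            scored.append((s, len(cand), cand))
    scored.sort()
    return [cand for _, _, cand in scored]


print(fuzzy_filter("gcp", ["get_config_path", "grep", "gcp_upload"]))
# ['gcp_upload', 'get_config_path']
```

Breaking ties by length favours shorter identifiers, which is a common but arbitrary choice; real engines also weight word boundaries and camelCase humps.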
Four short links: 14 February 2013
Malware Industrial Complex, Indies Needed, TV Analytics, and HTTP Benchmarking
- Welcome to the Malware-Industrial Complex (MIT) — brilliant phrase, sound analysis.
- Stupid Stupid xBox — The hardcore/soft-tv transition and any lead they feel they have is simply not defensible by licensing other industries’ generic video or music content because those industries will gladly sell and license the same content to all other players. A single custom studio of 150 employees also can not generate enough content to defensibly satisfy 76M+ customers. Only with quality primary software content from thousands of independent developers can you defend the brand and the product. Only by making the user experience simple, quick, and seamless can you defend the brand and the product. I’ve never seen a better-put statement of why an ecosystem of indie developers is essential.
- Data Feedback Loops for TV (Salon) — Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
- wrk — a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
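The "scalable event notification" that wrk relies on means one thread asks the kernel which of many sockets are ready, instead of dedicating a thread to each connection. Python's standard `selectors.DefaultSelector` wraps the same mechanisms (epoll on Linux, kqueue on BSD/macOS, falling back to poll or select elsewhere), so it makes a compact illustration. This is a minimal sketch of the pattern only, not wrk's C implementation; the function name `collect_messages` is hypothetical, and it assumes each small message arrives in a single `recv`.

```python
import selectors
import socket


def collect_messages(messages):
    """Read one message from each of several sockets with a single readiness loop."""
    sel = selectors.DefaultSelector()  # epoll/kqueue where available
    pairs = []
    for i, msg in enumerate(messages):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)  # remember the index
        pairs.append((reader, writer))
        writer.sendall(msg)
    received = {}
    try:
        while len(received) < len(messages):
            # Block until at least one registered socket is readable
            events = sel.select(timeout=1.0)
            if not events:
                raise TimeoutError("no socket became readable")
            for key, _mask in events:
                # Small messages are assumed to arrive in one recv call
                received[key.data] = key.fileobj.recv(4096)
                sel.unregister(key.fileobj)
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
    return [received[i] for i in range(len(messages))]


print(collect_messages([b"a", b"bb", b"ccc"]))
```

A load generator like wrk applies the same loop to thousands of HTTP connections per thread, with one such loop per CPU core.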
Four short links: 12 February 2013
Handmade Hardware, Tab Silencer, Surprise and Models, and Sciencey GIFs
- Your USB Sticks Are Made With Chopsticks (Bunnie Huang) — behind-the-scenes on how USB sticks are made.
- mutetab — find and kill the Chrome tab making all the damn noise! (via Nelson Minar)
- Visualization, Modeling, and Surprises (John D Cook) — paraphrases Hadley Wickham: Visualization can surprise you, but it doesn’t scale well. Modelling scales well, but it can’t surprise you.
- Head Like an Orange — science animated GIFs, assembled from nature documentaries. (via Ed Yong)
Four short links: 4 February 2013
Enlightened Tinkering, In-Browser Tor Proxy, Dark Patterns, and Subjective Data
- Hands on Learning (HuffPo) — Unfortunately, engaged and enlightened tinkering is disappearing from contemporary American childhood. (via BoingBoing)
- FlashProxy (Stanford) — a miniature proxy that runs in a web browser. It checks for clients that need access, then conveys data between them and a Tor relay. [...] If your browser runs JavaScript and has support for WebSockets then while you are viewing this page your browser is a potential proxy available to help censored Internet users.
- Dark Patterns (Slideshare) — User interfaces to trick people. (via Beta Knowledge)
- Bill Gates is Naive: Data Are Not Objective (Math Babe) — the examples of biased models and data at the end should be on the wall of everyone analyzing data. (via Karl Fisch)
Hacking robotic arms, predicting flight arrival times, manufacturing in America, tracking Disney customers (industrial Internet links)
The next wave of manufacturing will be highly automated, and American. Also, a hardware hacking collective rehabilitated a pair of cast-off industrial robots.
Flight Quest (GE, powered by Kaggle) — Last November GE, Alaska Airlines, and Kaggle announced the Flight Quest competition, which invites data scientists to build models that can accurately predict when a commercial airline flight touches down and reaches its gate. Since the leaderboard for the competition was activated on December 18, 2012, entrants have already beaten the…
Four short links: 30 January 2013
Cheap Attack Drones, Truth Filters, Where Musicians Make Money, and Dynamic Pricing From Digitized Analogue Signals
- Chinese Attack UAV (Alibaba) — Small attack UAV is characterized with small size, light weight, convenient carrying, rapid outfield expansion procedure, easy operation and maintenance; the system only needs 2-3 operators to operate, can be carried by surveillance personnel to complete the attack mission. (via BoingBoing)
- TruthTeller Prototype (Washington Post) — speech-to-text, then matches statements against known facts to identify truths and falsehoods. Still a prototype, but I love that, in addition to the Real Time Coupon Specials From Hot Singles Near You mobile advertising lens, there might be a truth lens that technology helps us apply to the world around us.
- Money from Music: Survey Evidence on Musicians’ Revenue and Lessons About Copyright Incentives — 5,000 American musicians surveyed. For most musicians, copyright does not provide much of a direct financial reward for what they are producing currently. The survey findings are instead consistent with a winner-take-all or superstar model in which copyright motivates musicians through the promise of large rewards in the future in the rare event of wide popularity. This conclusion is not unfamiliar, but this article is the first to support it with empirical evidence on musicians’ revenue. (via TechDirt)
- Max Levchin’s DLD13 Keynote — I believe the next big wave of opportunities exists in centralized processing of data gathered from primarily analog systems. [...] There is also a neat symmetry to this analog-to-digital transformation — enabling centralization of unique analog capacities. As soon as the general public is ready for it, many things handled by a human at the edge of consumption will be controlled by the best currently available human at the center of the system, real time sensors bringing the necessary data to them in real time.
Four short links: 25 January 2013
Bio-Writing, Internet Fame, Obama's Tech, and Precog Software
- How to Write a Good Bio (Scott Berkun) — something we all have to do, and rarely do well the first time. Excellent advice.
- Scumbag Steve’s Advice for Annoying Facebook Girl — Some people can’t distinguish the internet from real life. There are people who refuse to believe my name isn’t Steve and that I am not really the scumbag (well not all the time, that is). Just remember who you are. And that you know you’re a decent kid. Blake (the guy whose image was adopted as “Scumbag Steve” by meme-makers) was 21 when he wrote that, and it remains the best advice for anyone dealing with sudden visibility in the public eye.
- The Battle for Obama’s Tech (The Verge) — same old story: the software that got Obama elected won’t be released. Instead it’ll atrophy and have to be rewritten in four years’ time. How do I know this? The morons at the Democratic Party did it with Kerry’s run and again for Obama’s first campaign. It’s a choice the OFA developers warn could not only squander the digital advantage the Democrats now hold, but also severely impact their ability to recruit top tech talent in the future.
- Precog Software (Wired) — researchers assembled a dataset of more than 60,000 crimes, including homicides, then wrote an algorithm to find the people behind the crimes who were more likely to commit murder when paroled or put on probation. Berk claims the software could identify eight future murderers out of 100. The software parses about two dozen variables, including criminal record and geographic location. The type of crime and the age at which it was committed, however, turned out to be two of the most predictive variables. [...] The software aims to replace the judgments parole officers already make based on a parolee’s criminal record and is currently being used in Baltimore and Philadelphia. I look forward to the study comparing human judgement from parole officers against algorithmic judgement.
Radar