ENTRIES TAGGED "scale"

Four short links: 23 April 2014

Mobile UX, Ideation Tools, Causal Consistency, and Intellectual Ventures Patent Fail

  1. Samsung UX (Scribd) — little shop of self-catalogued UX horrors, courtesy of discovery in a lawsuit. Dated (Android G1 as competition) but rewarding to see there are signs of self-awareness in the companies that inflict unusability on the world.
  2. Tools for Ideation and Problem Solving (Dan Lockton) — comprehensive and analytical take on different systems for ideas and solutions.
  3. Don’t Settle for Eventual Consistency (ACM) — proposes “causal consistency”, prototyped in COPS and Eiger from Princeton.
  4. Intellectual Ventures Loses Patent Case (Ars Technica) — The Capital One case ended last Wednesday, when a Virginia federal judge threw out the two IV patents that remained in the case. It’s the first IV patent case seen through to a judgment, and it ended in a total loss for the patent-holding giant: both patents were invalidated, one on multiple grounds.
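As a rough illustration of the "causal consistency" idea in item 3: a replica makes a write visible only after every write it causally depends on is visible. The sketch below is a toy, single-process model. The `Write` record, the `Replica` class, and the explicit dependency sets are assumptions made for illustration, not the COPS or Eiger design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Write:
    """A hypothetical replicated write with explicit causal dependencies."""
    write_id: str
    key: str
    value: str
    deps: frozenset = frozenset()  # ids of writes that must be visible first


@dataclass
class Replica:
    """Toy replica: applies a write only once all its dependencies are applied."""
    store: dict = field(default_factory=dict)
    applied: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def receive(self, write: Write) -> None:
        # Writes may arrive in any order; buffer, then apply whatever is ready.
        self.pending.append(write)
        self._drain()

    def _drain(self) -> None:
        progress = True
        while progress:
            progress = False
            for w in list(self.pending):
                if w.deps <= self.applied:
                    self.store[w.key] = w.value
                    self.applied.add(w.write_id)
                    self.pending.remove(w)
                    progress = True


photo = Write("w1", "photo", "beach.jpg")
comment = Write("w2", "comment", "nice photo!", deps=frozenset({"w1"}))

replica = Replica()
replica.receive(comment)            # arrives first, but depends on w1
visible_early = dict(replica.store)  # still empty: the comment is buffered
replica.receive(photo)              # dependency arrives; both become visible
```

With eventual consistency alone, a reader could see the comment before the photo it refers to; buffering on dependencies rules that ordering out in this toy model.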
Four short links: 31 March 2014

Game Patterns, What Next, GPU vs CPU, and Privacy with Sensors

  1. Game Programming Patterns — a book in progress.
  2. Search for the Next Platform (Fred Wilson) — Mobile is now the last thing. And all of these big tech companies are looking for the next thing to make sure they don’t miss it. And they will pay real money (to you and me) for a call option on the next thing.
  3. Debunking the 100X GPU vs. CPU Myth — in Pete Warden’s words, “in a lot of real applications any speed gains on the computation side are swamped by the time it takes to transfer data to and from the graphics card.”
  4. Privacy in Sensor-Driven Human Data Collection (PDF) — see especially the section “Attacks Against Privacy”. More generally, it is often the case that the data released by researchers is not the source of privacy issues, but the unexpected inferences that can be drawn from it. (via Pete Warden)
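The point in item 3, that transfer time can swamp a faster kernel, comes down to simple arithmetic. The sketch below uses made-up numbers and a hypothetical `effective_speedup` helper; it is not taken from the linked paper or from Pete Warden's post.

```python
def effective_speedup(cpu_seconds: float, kernel_speedup: float,
                      transfer_seconds: float) -> float:
    """End-to-end speedup when only the compute part is accelerated."""
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Hypothetical workload: 10 s on the CPU, a kernel that runs 100x faster
# on the GPU, and 0.9 s spent copying data to and from the card.
speedup = effective_speedup(cpu_seconds=10.0, kernel_speedup=100.0,
                            transfer_seconds=0.9)
print(f"{speedup:.1f}x")
```

Under these assumed numbers the 100x kernel yields roughly a 10x end-to-end gain, because the copy takes nine times longer than the accelerated computation.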
Four short links: 3 January 2014

Mesh Networks, Collaborative LaTeX, Distributed Systems Book, and Reverse-Engineering Netflix Metadata

  1. Commotion — open source mesh networks.
  2. WriteLaTeX — online collaborative LaTeX editor. No, really. This exists. In 2014.
  3. Distributed Systems — free book for download, goal is to bring together the ideas behind many of the more recent distributed systems – systems such as Amazon’s Dynamo, Google’s BigTable and MapReduce, Apache’s Hadoop etc.
  4. How Netflix Reverse-Engineered Hollywood (The Atlantic) — Using large teams of people specially trained to watch movies, Netflix deconstructed Hollywood. They paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.
Four short links: 5 November 2013

Time Series Database, Cluster Schedulers, Structural Search-and-Replace, and TV Data

  1. Influx DB — open-source, distributed, time series, events, and metrics database with no external dependencies.
  2. Omega (PDF) — flexible, scalable schedulers for large compute clusters. From Google Research.
  3. GraspJS — Search and replace your JavaScript code based on its structure rather than its text.
  4. Amazon Mines Its Data Trove To Bet on TV’s Next Hit (WSJ) — Amazon produced about 20 pages of data detailing, among other things, how much a pilot was viewed, how many users gave it a 5-star rating and how many shared it with friends.
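To show what "based on its structure rather than its text" in item 3 can mean, here is a small sketch of the same idea in Python, using the standard library `ast` module rather than GraspJS itself. The `RenameCall` class and the `fetch`/`download` names are hypothetical, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rename calls to a bare function name, leaving string contents alone."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            replacement = ast.Name(id=self.new, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


# A textual replace of "fetch" would also corrupt the string literal.
source = 'log = "fetch(x)"\nresult = fetch(fetch(url))\n'
tree = RenameCall("fetch", "download").visit(ast.parse(source))
rewritten = ast.unparse(ast.fix_missing_locations(tree))
print(rewritten)
```

Because the match is on call nodes in the syntax tree, both nested calls are renamed while the text inside the string literal is untouched.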
Four short links: 27 September 2013

Amen Break, MySQL Scale, Spooky Source, and Graph Analytics Engine

  1. The Amen Break (YouTube) — fascinating 20m history of the amen break, a handful of bars of drum solo from a forgotten 1969 song which became the origin of a huge amount of popular music from rap to jungle and commercials, and the contested materials at the heart of sample-based music. Remix it and weep. (via Beta Knowledge)
  2. The MySQL Ecosystem at Scale (PDF) — nice summary of how MySQL is used by massive users, and where the sweet spots have been found.
  3. Lab41 (Github) — open sourced code from a spook hacklab in Silicon Valley.
  4. Faunus — open sourced Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster. A breadth-first version of the graph traversal language Gremlin operates on graphs stored in the distributed graph database Titan, in any Rexster-fronted graph database, or in HDFS via various text and binary formats.
Four short links: 15 August 2013

Audio Visualization, 3D Printed Toys, Data Center Computing, and Downloading Not Yet Beaten

  1. github realtime activity — audio triggered by github activity, built with choir.io.
  2. Makies Hit Shelves at Selfridges — 3d printing business gaining mainstream distribution. Win!
  3. The Datacenter as Computer — we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board. (via Mike Loukides)
  4. Illegal Downloads Not Erased By Simultaneous Release — Data gathered by TorrentFreak throughout the day reveals that most early downloaders, a massive 16.1%, come from Australia. Down Under the show aired on the pay TV network Foxtel, but it appears that many Aussies prefer to download a copy instead. The same is true for the United States and Canada, with 16% and 9.6% of the total downloads respectively, despite the legal offerings. Unclear whether this represents greater or less downloading than would have happened without simultaneous release.
Four short links: 9 August 2013

DEFCON Doco, Global-Scale Networks, Media Goblin, and TCP/IP Legos

  1. DEFCON Documentary — free download, I’m looking forward to watching it on the flight back to NZ.
  2. Global-Scale Systems — botnets as example of the scale of networks and systems we’ll have to build but don’t have experience in.
  3. MediaGoblin — GNU project to build a decentralized alternative to Flickr, YouTube, SoundCloud, etc.
  4. Teaching TCP/IP Headers with Legos — genius. (via BoingBoing)
Strata Week: The rise of the robot essay graders

Bot graders pass muster, Instagram's small team handles scale, assessing UK open data efforts.

In this week's data news, a look at the performance of automated essay-grading software, scaling Instagram, and an audit of the UK government's open data initiative.

Strata Week: The data behind Yahoo's front page

A new look at Yahoo's traffic, the challenge of scaling Tumblr, and a host of visualization guidelines.

In this week's data news: Yahoo visualizes its front page traffic and demographics, why Tumblr is tougher to scale than Twitter, and a look at what you need to consider as you build visualizations.

Four short links: 9 March 2011

R IDE, Audience Participation, Machine Learning, Surviving Success

  1. R Studio — AGPLv3-licensed IDE for R. It brings your R console, source code, plots, help, history, and workspace browser into one cohesive package. We’ve added some neat productivity features like a searchable endless command history, function/symbol completion, data import dialog with preview, one-click Sweave compile, and more. Source on github. Built as a web-app on Google AppEngine, from Joe Cheng who did Windows Live Writer at Microsoft. (via DeWitt Clinton)
  2. Adventures in Participatory Audience — Nina Simon helped thirteen students produce three projects to encourage participation in museum audiences: Xavier, Stringing Connections, and Dirty Laundry. My favourite was Dirty Laundry, where people shared secrets connected to works of art. Nina’s description of what she learned has some nuggets: friendly faces welcoming people in gets better response than a card with instructions, and I am still flummoxed as to what would make someone admit to an affair or bad parenting in a sterile art gallery, or the devastating one that read, “I avoid the important, difficult conversations with those I love the most.” Audience participation in the real world has lessons on what works for those who would build social software.
  3. Why Generic Machine Learning Fails — Returns for increasing data size come from two sources: (1) the importance of tails and (2) the cost of model innovation. When tails are important, or when model innovation is difficult relative to cost of data capture, then more data is the answer. [...] Machine learning is not undifferentiated heavy lifting, it’s not commoditizable like EC2, and closer to design than coding. The Netflix prize is a good example: the last 10% reduction in RMSE wasn’t due to more powerful generic algorithms, but rather due to some very clever thinking about the structure of the problem; observations like “people who rate a whole slew of movies at one time tend to be rating movies they saw a long time ago” from BellKor.
  4. Anatomy of a Crushing — Maciej Ceglowski describes how pinboard.in survived the flood of Delicious émigrées. It took several rounds of rewrites to get the simple tag cloud script right, and this made me very skittish about touching any other parts of the code over the next few days, even when the fixes were easy and obvious. The part of my brain that knew what to do no longer seemed to be connected directly to my hands.
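For readers unfamiliar with the RMSE figure mentioned in item 3, the sketch below shows how root-mean-square error is computed and what a 10% reduction would mean numerically. The ratings are made up for illustration and the `rmse` helper is hypothetical; nothing here comes from the Netflix Prize data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up star ratings, for illustration only.
actual = [4, 3, 5, 2]
baseline = [3, 3, 4, 3]

error = rmse(baseline, actual)
target = 0.9 * error  # the error a model must reach for a 10% reduction
print(f"baseline RMSE {error:.3f}, 10% better is {target:.3f}")
```

Because errors are squared before averaging, large misses on individual ratings weigh more heavily than many small ones.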