"algorithms" entries

Four short links: 30 April 2015

Managing Complex Data Projects, Graphical Linear Algebra, Consistent Hashing, and NoTCP Manifesto

  1. More Tools for Managing and Reproducing Complex Data Projects (Ben Lorica) — As I survey the landscape, the types of tools remain the same, but interfaces continue to improve, and domain specific languages (DSLs) are starting to appear in the context of data projects. One interesting trend is that popular user interface models are being adapted to different sets of data professionals (e.g. workflow tools for business users).
  2. Graphical Linear Algebra — or “Graphical The-Subject-That-Kicked-Nat’s-Butt” as I read it.
  3. Consistent Hashing: A Guide and Go Implementation — easy-to-follow article (and source).
  4. NoTCP Manifesto — a nice summary of the reasons to build custom protocols over UDP, masquerading as church-nailed heresy. Today’s heresy is just the larval stage of tomorrow’s constricting orthodoxy.
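For a feel of what item 3 is about before clicking through, here is a minimal Python sketch of a consistent-hash ring with virtual nodes. It is not the linked article's Go code: the `HashRing` class, the `replicas` parameter, and the use of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an arbitrary choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each node owns `replicas` points."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._points = []  # sorted list of hash points on the ring
        self._owner = {}   # hash point -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place the node's virtual points on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if point not in self._owner:
                bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        """Take the node's virtual points off the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def get(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise KeyError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that makes this useful: when a node joins, the only keys that change owner are the ones that move to the new node, so most cached or sharded data stays where it was.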
Four short links: 10 April 2015

Graph Algorithm, Touchy Robots, Python Bolt-Ons, and Building Data Products

  1. Exact Maximum Clique for Large or Massive Real Graphs — explanation of how BBMCSP works.
  2. Giving Robots and Prostheses the Human Touch — the team, led by mechanical engineer Veronica J. Santos, is constructing a language of touch that both a computer and a human can understand. The researchers are quantifying this with mechanical touch sensors that interact with objects of various shapes, sizes, and textures. Using an array of instrumentation, Santos’ team is able to translate that interaction into data a computer can understand. The data is used to create a formula or algorithm that gives the computer the ability to identify patterns among the items it has in its library of experiences and something it has never felt before. This research will help the team develop artificial haptic intelligence, which is, essentially, giving robots, as well as prostheses, the “human touch.”
  3. boltons — things in Python that should have been builtins.
  4. Everything We Wish We’d Known About Building Data Products (DJ Patil and Ruslan Belkin) — Data is super messy, and data cleanup will always be literally 80% of the work. In other words, data is the problem. […] “If you’re not thinking about how to keep your data clean from the very beginning, you’re fucked. I guarantee it.” […] “Every single company I’ve worked at and talked to has the same problem without a single exception so far — poor data quality, especially tracking data,” he says. “Either there’s incomplete data, missing tracking data, duplicative tracking data.” To solve this problem, you must invest a ton of time and energy monitoring data quality. You need to monitor and alert as carefully as you monitor site SLAs. You need to treat data quality bugs as more than a first priority. Don’t be afraid to fail a deploy if you detect data quality issues.
Four short links: 2 April 2015

250 Whys, Amazon Dash, Streaming Data, and Lightning Networks

  1. What I Learned from 250 Whys — Let’s Plan for a Future Where We’re All As Stupid as We Are Today.
  2. Thoughts on Amazon Dash (Matt Webb) — In a way, we’re really seeing the future of marketing here. We’ve separated awareness (advertising) and distribution (stores) for so long, but it’s no longer the way. When you get a Buy Now button in a Tweet, you’re seeing ads and distribution merging, and the Button is the physical instantiation of this same trend. […] in the future every product will carry a buy button.
  3. A Collection of Links for Streaming Algorithms and Data Structures — is this not the most self-evident title ever?
  4. Lightning Networks (Rusty Russell) — I finally took a second swing at understanding the Lightning Network paper. The promise of this work is exceptional: instant, reliable transactions across the bitcoin network. But the implementation is complex, and the draft paper reads like a grab bag of ideas; but it truly rewards close reading! It doesn’t involve novel crypto, nor fancy bitcoin scripting tricks. There are several techniques that are used in the paper, so I plan to concentrate on one per post and wrap up at the end. Already posted part II.
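As a taste of the kind of thing a streaming-algorithms collection like item 3 covers, here is a sketch of reservoir sampling (Algorithm R), which keeps a uniform random sample of `k` items from a stream of unknown length in O(k) memory. It was picked here as a representative example, not because the linked collection singles it out; the function name and the `rng` parameter are illustrative assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Only one pass is made and at most k items are held in memory.
    """
    if rng is None:
        rng = random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i (0-based) replaces a slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

If the stream has fewer than `k` items, the whole stream comes back unchanged.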
Four short links: 26 March 2015

GPU Graph Algorithms, Data Sharing, Build Like Google, and Distributed Systems Theory

  1. gunrock — a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. (via Ben Lorica)
  2. How to Share Data with a Statistician — some instruction on the best way to share data to avoid the most common pitfalls and sources of delay in the transition from data collection to data analysis.
  3. Bazel — a build tool, i.e. a tool that will run compilers and tests to assemble your software, similar to Make, Ant, Gradle, Buck, Pants, and Maven. Google’s build tool, to be precise.
  4. You Can’t Have Exactly-Once Delivery — not about the worst post office ever. FLP and the Two Generals Problem are not design complexities, they are impossibility results.
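If exactly-once delivery is off the table, as item 4 argues, a common workaround is at-least-once delivery paired with idempotent processing: the sender retries, and the receiver drops anything it has already handled. The sketch below assumes every message carries a unique ID; `IdempotentConsumer` and its in-memory `set` are hypothetical stand-ins for whatever durable store a real system would use.

```python
class IdempotentConsumer:
    """Hypothetical receiver that ignores redelivered messages."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # IDs already processed (in memory, for illustration)

    def receive(self, message_id, payload):
        """Process a message once; return False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # A crash between the handler call and this line would still cause a
        # repeat after restart. That window is the impossibility showing up.
        self._seen.add(message_id)
        return True


# Usage: the second delivery of "m1" is ignored.
log = []
consumer = IdempotentConsumer(log.append)
consumer.receive("m1", "charge card")
consumer.receive("m1", "charge card")
```

This narrows the problem rather than solving it, which is the point of the linked post.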
Four short links: 12 March 2015

Billion Node Graphs, Asynchronous Systems, Deep Learning Hardware, and Vision Resources

  1. Mining Billion Node Graphs: Patterns and Scalable Algorithms (PDF) — slides from a CMU academic’s talk at C-BIG 2012.
  2. There Is No Now — One of the most important results in the theory of distributed systems is an impossibility result, showing one of the limits of the ability to build systems that work in a world where things can fail. This is generally referred to as the FLP result, named for its authors, Fischer, Lynch, and Paterson. Their work, which won the 2001 Dijkstra Prize for the most influential paper in distributed computing, showed conclusively that some computational problems that are achievable in a “synchronous” model in which hosts have identical or shared clocks are impossible under a weaker, asynchronous system model.
  3. Deep Learning Hardware Guide — One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system.
  4. Awesome Computer Vision — curated list of computer vision resources.
Four short links: 13 February 2015

Web Post-Mortem, Data Flow, Hospital Robots, and Robust Complex Networks

  1. What Happened to Web Intents (Paul Kinlan) — I love post-mortems, and this is a thoughtful one.
  2. Apache NiFi — incubated open source project for data flow.
  3. Tug Hospital Robot (Wired) — It may have an adult voice, but Tug has a childlike air, even though in this hospital you’re supposed to treat it like a wheelchair-bound old lady. It’s just so innocent, so earnest, and at times, a bit helpless. If there’s enough stuff blocking its way in a corridor, for instance, it can’t reroute around the obstruction. This happened to the Tug we were trailing in pediatrics. “Oh, something’s in its way!” a woman in scrubs says with an expression like she herself had ruined the robot’s day. She tries moving the wheeled contraption but it won’t budge. “Uh, oh!” She shoves on it some more and finally gets it to move. “Go, Tug, go!” she exclaims as the robot, true to its programming, continues down the hall.
  4. Improving the Robustness of Complex Networks with Preserving Community Structure (PLOS ONE) — To improve robustness while minimizing the above three costly changes, we first seek to verify that the community structure of networks actually do identify the robustness and vulnerability of networks to some extent. Then, we propose an effective 3-step strategy for robustness improvement, which retains the degree distribution of a network, as well as preserves its community structure.
Four short links: 31 December 2014

Feudal Employment, Untrusted Computing, Nerd Entitlement, and Paxos Explained

  1. Governance for the New Class of Worker (Matt Webb) — there is a new class of worker. They’re not inside the company – not benefiting from job security or healthcare – but their livelihoods in large part dependent on it, the transaction cost of moving to a competitor deliberately kept high. Or the worker is, without seeing any of the upside of success, taking on the risk or bearing the cost of the company’s expansion and operation.
  2. Hidden Code in Your Chipset (Slideshare) — there’s a processor that supervises your processor, and it’s astonishingly fully-featured (to the point of having privileged access to the network and being able to run Java code).
  3. On Nerd Entitlement — Privilege doesn’t mean you don’t suffer. The best part of 2014 was the tech/net feminist consciousness-raising/uprising. That’s probably the wrong label for it, but bullshit is being called that was ignored years ago. I think we’ve collectively found the next thing we fix that future generations will look back on us and wonder why it went unremarked-upon for so long.
  4. Understanding Paxos — a simple introduction, with animations, to one of the key algorithms in distributed systems.
Four short links: 30 December 2014

DevOps Security, Bit Twiddling, Design Debates, and Chinese IP

  1. DevOoops (Slideshare) — many ways in which your devops efforts can undermine your security efforts.
  2. Matters Computational (PDF) — low-level bit-twiddling and algorithms with source code. (via Jarkko Hietaniemi)
  3. Top 5 Game Design Debates I Ignored in 2014 (Daniel Cook) — Stretch your humanity.
  4. From Gongkai to Open Source (Bunnie Huang) — The West has a “broadcast” view of IP and ownership: good ideas and innovation are credited to a clearly specified set of authors or inventors, and society pays them a royalty for their initiative and good works. China has a “network” view of IP and ownership: the far-sight necessary to create good ideas and innovations is attained by standing on the shoulders of others, and as such there is a network of people who trade these ideas as favors among each other. In a system with such a loose attitude toward IP, sharing with the network is necessary as tomorrow it could be your friend standing on your shoulders, and you’ll be looking to them for favors. This is unlike the West, where rule of law enables IP to be amassed over a long period of time, creating impenetrable monopoly positions. It’s good for the guys on top, but tough for the upstarts.
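For a flavour of the low-level bit-twiddling item 2 refers to, here are three classic tricks written out in Python. They are textbook idioms chosen for illustration, not code taken from the linked PDF, and the function names are made up here. All three assume non-negative integers.

```python
def lowest_set_bit(x: int) -> int:
    """Isolate the lowest set bit, e.g. 0b1100 -> 0b0100."""
    return x & -x


def is_power_of_two(x: int) -> bool:
    """A power of two has exactly one bit set, so clearing it leaves zero."""
    return x > 0 and (x & (x - 1)) == 0


def popcount(x: int) -> int:
    """Count set bits by repeatedly clearing the lowest one."""
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count
```

The loop in `popcount` runs once per set bit rather than once per bit position, which is why the trick is popular for sparse words.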
Four short links: 26 December 2014

Science Software, Better Bitmaps, Pushy Internet, and Graphical Perception

  1. How Bad Software Leads to Bad Science — 21% of scientists who write software have never received training in software development.
  2. Roaring Bitmaps — compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression.
  3. Two Eras of the Internet: From Pull to Push (Chris Dixon) — in which the consumer becomes the infinite sink for an unending and constant stream of updates, media, and social mobile local offers to swipe right on brands near you.
  4. Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods (PDF) — research on how well people decode visual cues. In order: Position along a common scale e.g. scatter plot; Position on identical but nonaligned scales e.g. multiple scatter plots; Length e.g. bar chart; Angle & Slope (tie) e.g. pie chart; Area e.g. bubbles; Volume, density, and color saturation (tie) e.g. heatmap; Color hue e.g. newsmap. (via Flowing Data)
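The core idea behind the Roaring bitmaps in item 2 can be sketched in a few lines: split each 32-bit value into a high and a low 16-bit half, group values by the high half, and store each group either as a sorted array (when sparse) or as a bitmap (when dense). The toy below is a heavy simplification and is not the library's API: `TinyRoaring` is a made-up name, and a Python `int` stands in for the fixed-size 65,536-bit bitmap container.

```python
import bisect

ARRAY_LIMIT = 4096  # above this many values a chunk switches to a bitmap


class TinyRoaring:
    """Toy set of 32-bit unsigned ints using array and bitmap containers."""

    def __init__(self):
        # high 16 bits -> sorted list of low halves, or an int used as a bitmap
        self._containers = {}

    def add(self, value: int) -> None:
        if not 0 <= value < 2**32:
            raise ValueError("value must fit in 32 unsigned bits")
        high, low = value >> 16, value & 0xFFFF
        container = self._containers.get(high)
        if container is None:
            self._containers[high] = [low]
        elif isinstance(container, list):
            i = bisect.bisect_left(container, low)
            if i == len(container) or container[i] != low:
                container.insert(i, low)
                if len(container) > ARRAY_LIMIT:
                    # Dense chunk: convert the sorted array to a bitmap.
                    mask = 0
                    for v in container:
                        mask |= 1 << v
                    self._containers[high] = mask
        else:
            self._containers[high] = container | (1 << low)

    def __contains__(self, value: int) -> bool:
        high, low = value >> 16, value & 0xFFFF
        container = self._containers.get(high)
        if container is None:
            return False
        if isinstance(container, list):
            i = bisect.bisect_left(container, low)
            return i < len(container) and container[i] == low
        return bool((container >> low) & 1)
```

Choosing the container per chunk is what lets the format stay compact on sparse data without giving up fast bitwise operations on dense data.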
Four short links: 13 November 2014

Material Design, GitHub Communication, Priority Queues, and DevOps Learnings

  1. Materialize — another web implementation of Material Design.
  2. Communicating at Github — interesting take on making visible and optimising for the conversations and decisions that form culture but are otherwise invisible.
  3. MultiQueues — an approach for parallel access to priority queues.
  4. Devops Learnings — We view DevOps as the missing components of agile – the enabler for getting it out of the door and closing the loop between software engineer and customer.
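The MultiQueues approach in item 3 trades strict ordering for less contention: keep several ordinary priority queues, insert into a random one, and on removal compare the minima of two randomly chosen queues and pop the smaller. The single-threaded sketch below shows only that selection rule. It leaves out the per-queue locking that makes the real structure parallel, samples only among non-empty queues as a simplification, and the class and parameter names are assumptions.

```python
import heapq
import random


class MultiQueue:
    """Sequential sketch of a relaxed priority queue built from several heaps."""

    def __init__(self, num_queues=4, rng=None):
        self._queues = [[] for _ in range(num_queues)]
        self._rng = rng if rng is not None else random.Random()

    def push(self, item) -> None:
        """Insert into one heap chosen uniformly at random."""
        heapq.heappush(self._rng.choice(self._queues), item)

    def pop(self):
        """Pop the smaller of the minima of two randomly chosen heaps.

        The result is small but not guaranteed to be the global minimum.
        """
        nonempty = [q for q in self._queues if q]
        if not nonempty:
            raise IndexError("pop from an empty MultiQueue")
        a = self._rng.choice(nonempty)
        b = self._rng.choice(nonempty)
        best = a if a[0] <= b[0] else b
        return heapq.heappop(best)
```

With `num_queues=1` this degenerates to an exact priority queue; with more queues, items come out in approximately sorted order.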