"text processing" entries

Four short links: 17 November 2015

Four short links: 17 November 2015

Remix Contest, Uber Asymmetry, Language Learning, and Continuous Delivery

  1. GIF It Up — very clever remix campaign to use heritage content—Friday is your last day to enter this year’s contest, so get creating! My favourite.
  2. Uber’s Drivers: Information Asymmetries and Control in Dynamic WorkOur conclusions are two-fold: first, that the information asymmetries produced by Uber’s system are fundamental to its ability to structure indirect control over its workers; and second, that Uber relies heavily on the evolving rhetoric of the algorithm to justify these information asymmetries to drivers, riders, as well as regulators and outlets of public opinion.
  3. ANNABELL — unsupervised language learning using artificial neural networks, install your own four year old. The paper explains how.
  4. Spinnakeran open source, multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence.

Prepare distribution patches with gawk

Exploring the power and sophistication of awk.

I maintain GNU Awk. As part of making releases, I have to create a patch script to convert the file tree of the previous release into the current one. This means writing rm commands to remove any files that have been removed. This is fairly straightforward using tools like find, sort, and comm.

However, for the 4.1.2 release, I also changed the permissions (mode) on some files. I want to create chmod commands to update these files’ permission settings as well. This is a little harder, so I decided to write an awk script that will do this for me.

Let’s take a look at some of the sophistication and control you can achieve using awk, such as recursion, the use of arrays of arrays, and extension functions for using operating system facilities.

This script, comptrees.awk, uses the fts() extension function to do the heavy lifting. This function walks file trees, building up a representation of those trees using gawk‘s arrays of arrays.

Read more…

Four short links: 16 December 2014

Four short links: 16 December 2014

Memory Management, Stream Processing, Robot's Google, and Emotive Words

  1. Effectively Managing Memory at Gmail Scale — how they gathered data, how Javascript memory management works, and what they did to nail down leaks.
  2. tigonan open-source, real-time, low-latency, high-throughput stream processing framework.
  3. Robo Brain — machine knowledge of the real world for robots. (via MIT Technology Review)
  4. The Structure and Interpretation of the Computer Science Curriculum — convincing argument for teaching intro to programming with Scheme, but not using the classic text SICP.

Update: the original fourth link to Depeche Mood led only to a README on GitHub; we’ve replaced it with a new link.

Four short links: 25 July 2014

Four short links: 25 July 2014

Public Private Pain, Signature Parsing, OSCON Highlights, and Robocar Culture

  1. What is Public? (Anil Dash) — the most cogent and articulate (and least hyperventilated dramaware) rundown of just what the problem is, that you’re ever likely to find.
  2. talon — mailgun’s open sourced library for parsing email signatures.
  3. Signals from OSCON — some highlights. Watching Andrew Sorensen livecode synth playing (YouTube clip) is pretty wild.
  4. Two Cultures of Robocars (Brad Templeton) — The conservative view sees this technology as a set of wheels that has a computer. The aggressive school sees this as a computer that has a set of wheels.