"text processing" entries

Prepare distribution patches with gawk

Exploring the power and sophistication of awk.

I maintain GNU Awk. As part of making releases, I have to create a patch script to convert the file tree of the previous release into the current one. This means writing rm commands to remove any files that have been removed. This is fairly straightforward using tools like find, sort, and comm.

However, for the 4.1.2 release, I also changed the permissions (mode) on some files. I want to create chmod commands to update these files’ permission settings as well. This is a little harder, so I decided to write an awk script that will do this for me.

Let’s take a look at some of the sophistication and control you can achieve using awk, such as recursion, the use of arrays of arrays, and extension functions for using operating system facilities.

This script, comptrees.awk, uses the fts() extension function to do the heavy lifting. This function walks file trees, building up a representation of those trees using gawk‘s arrays of arrays.

Read more…

Comment
Four short links: 16 December 2014

Four short links: 16 December 2014

Memory Management, Stream Processing, Robot's Google, and Emotive Words

  1. Effectively Managing Memory at Gmail Scale — how they gathered data, how Javascript memory management works, and what they did to nail down leaks.
  2. tigonan open-source, real-time, low-latency, high-throughput stream processing framework.
  3. Robo Brain — machine knowledge of the real world for robots. (via MIT Technology Review)
  4. The Structure and Interpretation of the Computer Science Curriculum — convincing argument for teaching intro to programming with Scheme, but not using the classic text SICP.

Update: the original fourth link to Depeche Mood led only to a README on GitHub; we’ve replaced it with a new link.

Comments: 5
Four short links: 25 July 2014

Four short links: 25 July 2014

Public Private Pain, Signature Parsing, OSCON Highlights, and Robocar Culture

  1. What is Public? (Anil Dash) — the most cogent and articulate (and least hyperventilated dramaware) rundown of just what the problem is, that you’re ever likely to find.
  2. talon — mailgun’s open sourced library for parsing email signatures.
  3. Signals from OSCON — some highlights. Watching Andrew Sorensen livecode synth playing (YouTube clip) is pretty wild.
  4. Two Cultures of Robocars (Brad Templeton) — The conservative view sees this technology as a set of wheels that has a computer. The aggressive school sees this as a computer that has a set of wheels.
Comment