"computer science" entries

Mark Burgess on a CS narrative, orders of magnitude, and approaching biological scale

The O'Reilly Radar Podcast: "In Search of Certainty," Promise Theory, and scaling the computational net.



In this week’s Radar Podcast episode, Aneel Lakhani, director of marketing at SignalFx, chats with Mark Burgess, professor emeritus of network and system administration, founder and former CTO of CFEngine, and now an independent technologist and researcher. They talk about the new edition of Burgess’ book, In Search of Certainty; Promise Theory and how promises are a kind of service model; and ways of applying promise-oriented thinking to networks.

Here are a few highlights from their chat:

We tend to separate our narrative about computer science from the narrative of physics and biology and these other sciences. Many of the ideas of course, all of the ideas, that computers are based on originate in these other sciences. I felt it was important to weave computer science into that historical narrative and write the kind of book that I loved to read when I was a teenager, a popular science book explaining ideas, and popularizing some of those ideas, and weaving a story around it to hopefully create a wider understanding.

I think one of the things that struck me as I was writing [In Search of Certainty], is it all goes back to scales. This is a very physicist point of view. When you measure the world, when you observe the world, when you characterize it even, you need a sense of something to measure it by. … I started the book explaining how scales affect the way we describe systems in physics. By scale, I mean the order of magnitude. … The descriptions of systems are often qualitatively different with these different scales. … Part of my work over the years has been trying to find out how we could invent the measuring scale for semantics. This is how so-called Promise Theory came about. I think this notion of scale and how we apply it to systems is hugely important.

You’re always trying to find the balance between the forces of destruction and the forces of repair.

Read more…


From search to distributed computing to large-scale information extraction

The O'Reilly Data Show Podcast: Mike Cafarella on the early days of Hadoop/HBase and progress in structured data extraction.


February 2016 marks the 10th anniversary of Hadoop, at a point in time when many IT organizations actively use Hadoop and/or one of the open source big data projects that originated after it and, in some cases, depend on it.

During the latest episode of the O’Reilly Data Show Podcast, I had an extended conversation with Mike Cafarella, assistant professor of computer science at the University of Michigan. Along with Strata + Hadoop World program chair Doug Cutting, Cafarella is the co-founder of both Hadoop and Nutch. In addition, Cafarella was the first contributor to HBase.

We talked about the origins of Nutch, Hadoop (HDFS, MapReduce), HBase, and his decision to pursue an academic career and step away from these projects. Cafarella’s pioneering contributions to open source search and distributed systems fit neatly with his work in information extraction. We discussed a new startup he recently co-founded, ClearCutAnalytics, to commercialize a highly regarded academic project for structured data extraction (full disclosure: I’m an advisor to ClearCutAnalytics). As I noted in a previous post, information extraction (from a variety of data types and sources) is an exciting area that will lead to the discovery of new features (i.e., variables) that may end up improving many existing machine learning systems. Read more…

Four short links: 20 April 2010

CS Epigrams, Star Trek Made Real, Python Filings, and Difficult Games

  1. Epigrams in Programming — all from the remarkable Alan Perlis. By the time I learned that he was responsible for such gems as “Syntactic sugar causes cancer of the semicolon”, “A language that doesn’t affect the way you think about programming, is not worth knowing”, and “Around computers it is difficult to find the correct unit of time to measure progress. Some cathedrals took a century to complete. Can you imagine the grandeur and scope of a program that would take as long?”, he had died and I never had a chance to meet him. “The best book on programming for the layman is ‘Alice in Wonderland’; but that’s because it’s the best book on anything for the layman.” (via Hacker News)
  2. Tricorder for Android — app that shows all the info from the sensors: local magnetic field, RF, acceleration, sound, etc. They really need a designer to make this look more like Star Trek than an Apple ][c program. (via attercop on Delicious)
  3. Will Wall Street Require Python — with Release 33-9117, the SEC is considering substitution of Python or another programming language for legal English as a basis for some of its regulations. Reminds me of Charlie Stross’s “Accelerando”, where companies’ bylaws are written in Python and largely autonomous.
  4. Hatetris — game of Tetris that deliberately gives you the most difficult pieces. I love inversions like this, which present their own algorithmic challenges distinct from the original’s.
Four short links: 5 January 2010

Computational Advertising, Timing Attacks, Climate Visualized, and Context Assembly

  1. Introduction to Computational Advertising — slides to a Stanford class on a new “scientific discipline” whose central challenge is to find the best ad to present to a user engaged in a given context, such as querying a search engine (“sponsored search”), reading a web page (“content match”), watching a movie, and IM-ing. “Scientific discipline” makes me gag. You could devise algorithms, measure performance, and write papers about the best way to put carrots up your bottom or the best way to pick pockets, but those still aren’t complex enough activities to be trumpeted as “new scientific disciplines”. (Although I do look forward to reading Stanford’s CBUM126, “Introduction to Carrot Stuffing” lecture notes online). (via Greg Linden)
  2. Timing Attack in Google KeyCzar Library — if you compare strings in the naive way, attackers can figure out whether the first bytes they gave you are correct based on the time the comparison takes. When they get the first bytes correct, then they can work on the next, and so on. This is a common mode of information leakage, and reminds me of my revelation when I began to edit security books: “this stuff is hard”. New programmers are not taught to think like attackers, and the only trope of secure programming that they’re taught is “avoid buffer overflows”. (via Simon Willison)
  3. Climate Wizard — explore historical temperature data as well as the various climate models and see what their predictions look like across the United States. (via Sciblogs)
  4. Contextual Clothing for Naked Transparency (Jon Udell) — notable for this: The Net can be an engine for context assembly, a wonderful phrase I picked up years ago from Jack Ozzie. We used to think that the challenge of social software was to amass as many users as quickly as possible, but the far harder problem to solve is how to help those people contribute to something positive. YouTube comments show that simply having a lot of users doesn’t make something virtuous.
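
To make the timing attack in item 2 concrete, here is a minimal Python sketch of the naive comparison next to two constant-time alternatives. It is an illustration only, not the KeyCzar code or its actual fix; the function names naive_equal, constant_time_equal, and safe_equal are hypothetical, and the only library call used is the standard library's hmac.compare_digest.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison (the vulnerable pattern).

    Returns at the first mismatching byte, so the running time grows with
    the length of the correct prefix and can leak it to an attacker.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # leaves early: this is the leak
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Hand-rolled comparison that always inspects every byte.

    Differences are OR-ed into an accumulator, so the loop does the same
    work whether the first byte or the last byte is wrong. The length
    check still reveals a length mismatch, which is usually acceptable
    for fixed-size values such as MACs.
    """
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0


def safe_equal(a: bytes, b: bytes) -> bool:
    """Preferred in practice: defer to the standard library helper."""
    return hmac.compare_digest(a, b)
```

In real code, reach for the library helper rather than the hand-rolled loop; the loop is shown only to make visible what "do the same work regardless of where the mismatch is" means.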
Four short links: 30 December 2009

Time Management, CS Education, Installing EtherPad, Infoengravings

  1. How to Run a Meeting Like Google (BusinessWeek) — the temptation is to mock things like “even five minute meetings must have an agenda”, but my sympathy with Marissa Mayer is high. The more I try to cram into a work day, the more I have to be able to justify every part of it. If you can’t tell me why you want to see me for five minutes, then I probably have better things to be doing. There may be false culls (missing something important because the “process” is too high) but I bet these are far outweighed by the missed opportunities if time isn’t so structured.
  2. Computer Science Education Week — December 5-11, 2010, “recognizes that computing: Touches everyone’s daily lives and plays a critical role in society; Drives innovation and economic growth; Provides rewarding job opportunities; Prepares students with the knowledge and skills they need for the 21st century.” Worthy, but there’s no mention of the fact that it’s FUN. The brilliant people in this field love what they do. They’re not brilliant 9-5, then heading home to scan the Jobs Wanted to see whether they could earn more as dump truck drivers in uranium mines in Australia. CS isn’t for everyone, but it won’t be for anyone unless we help them find the bits they find fun.
  3. Installing EtherPad — step-by-step instructions for installing EtherPad, the open-source real-time text editor recently acquired by Google.
  4. Victorian Infographics — animals, time, and space from the Victorians. It’s beautiful, it’s meaningful, it must be infoengravings.
Four short links: 27 November 2009

3D Models from Webcams, a JavaScript Scheme, Emacs in Your Browser, and CS History

  1. ProFORMA — software which builds a 3D model as you rotate an object in front of your webcam. (via Wired)
  2. BiwaScheme — a Scheme interpreter written in JavaScript. (via Hacker News)
  3. YMacs — in-browser Emacs written in JavaScript. Emacs, for those of you who were left in any doubt, is the only editor ever created by software engineers worth a damn (where “worth a damn” == “has possibly already achieved sentience”) with the possible exception of TECO.
  4. Historic Documents in Computer Science — my eye was caught by John Backus’s first FORTRAN manual, Niklaus Wirth’s original Pascal paper, the BCPL reference manual (the C programming language got its name from the C in BCPL), and Eckert and Mauchly’s ENIAC patent. (via Hacker News)
