"Python" entries

Four short links: 28 February 2014

Minecraft+Pi+Python, Science Torrents, Web App Performance Measurement, and Streaming Data

  1. Programming Minecraft Pi with Python — an early draft, but shows promise for kids. (via Raspberry Pi)
  2. Terasaur — BitTorrent for mad-large files, making it easy for datasets to be saved and exchanged.
  2. Bucky — open-source tool to measure the performance of your web app directly from your users’ browsers. Nifty graph.
  4. Zoe Keating’s Streaming Payouts — actual data on a real musician’s distribution and revenues through various channels. Hint: streaming is tragicomically low-paying. (via Andy Baio)
Four short links: 31 January 2014

Mobile Libraries, Python Idioms, Graphics Book, and Declining Returns on Aging Link Bait

  1. Bolts — Facebook’s library of small, low-level utility classes in iOS and Android.
  2. Python Idioms (PDF) — useful cheatsheet.
  3. Michael Abrash’s Graphics Programming Black Book — Markdown source on GitHub. Notable for elegance and instructive for those learning to optimise. Coder soul food.
  4. About Link Bait (Anil Dash) — excellent consideration of Upworthy’s distinctive click-provoking headlines, but my eye was caught by “we often don’t sound like 2012 Upworthy anymore. Because those tricks are starting to dilute click rates,” from Upworthy’s editor-at-large. Attention is a scarce resource, and our brains are very good at filtering.
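The kind of material an idioms cheatsheet like the one above covers can be shown in a few lines. A minimal sketch (the examples are my own, not taken from the PDF) contrasting index bookkeeping with `enumerate`, and key-existence checks with `dict.get`:

```python
# Classic Python idioms: enumerate, comprehensions, and dict.get
colors = ["red", "green", "blue"]

# Non-idiomatic: manual index bookkeeping
labels = []
i = 0
for c in colors:
    labels.append(str(i) + ":" + c)
    i += 1

# Idiomatic: enumerate plus a list comprehension
labels2 = [f"{i}:{c}" for i, c in enumerate(colors)]
assert labels == labels2

# Idiomatic counting with dict.get instead of "if key in counts" checks
counts = {}
for c in "abracadabra":
    counts[c] = counts.get(c, 0) + 1
print(counts["a"])  # 5
```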

IPython: A unified environment for interactive data analysis

It has roots in academic scientific computing, but has features that appeal to many data scientists

As I noted in a recent post on reproducing data projects, notebooks have become popular tools for maintaining, sharing, and replicating long data science workflows. Much of that is due to the popularity of IPython. In development since 2001, IPython grew out of the scientific computing community and has slowly added features that appeal to data scientists.

Roots in academic scientific computing
As IPython creator Fernando Pérez noted in his “historical retrospective”, exploratory analysis in a scientific setting requires a solid interactive environment. After years of development, IPython has become a great tool for interacting with data. IPython also addresses other important pain points for scientists – reproducibility and collaboration – issues that are equally important to data scientists working in industry.

The Lifecycle of a Scientific Idea (schematically)

IPython is more than just Python
With an interactive widget architecture that’s 100% language-agnostic, these days IPython is used by many other programming language communities, including Julia, Haskell, F#, Ruby, Go, and Scala. If you’re a data scientist who likes to mix and match languages, you can create, maintain, and share multi-language data projects in IPython.
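Part of what makes notebooks easy to share across languages is that they are plain JSON on disk. A minimal sketch of the `.ipynb` layout, built by hand with the standard library (field names follow the nbformat v4 schema; this is an illustration, not the `nbformat` library):

```python
import json

# A minimal notebook document: a metadata header plus a list of cells.
# Field names follow the nbformat v4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A shared analysis\n"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(2 + 2)\n"],
        },
    ],
}

# Because it is just JSON, a notebook survives a round trip through text,
# which is what makes it easy to version, diff, and exchange.
text = json.dumps(notebook, indent=1)
round_trip = json.loads(text)
print(len(round_trip["cells"]))  # 2
```

Swapping the kernelspec is all it takes for the same container format to hold Julia or Haskell code instead of Python.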



Code Carabiners: Essential Protection Tools for Safe Programming

Assertions, regression tests, and version control

Programming any non-trivial piece of software feels like rock climbing up the side of a mountain. The larger and more complex the software, the higher the peak.

You can’t make it to the top in one fell swoop, so you need to take careful steps, anchor your harnesses for safety, and set up camp to rest. Each time you start coding on your project, your sole goal is to make some progress up that mountain. You might struggle a bit to get set up at first, but once you get going, progress will be fast as you get the basic cases working. That’s the fun part; you’re in flow and slinging out dozens of lines of code at a time, climbing up that mountain step by steady step. You feel energized.

However, as you keep climbing, it will get harder and harder to write each subsequent line. When you run your program on larger data sets or with real user inputs, errors arise from rare edge cases that you didn’t plan for, and soon enough, that conceptually elegant design in your head gives way to a tangled mess of patches and bug fixes. Your software starts getting brittle and collapsing under its own weight.
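Two of the carabiners named in the subtitle — assertions and regression tests — can be sketched in a few lines. The `safe_mean` function here is a hypothetical example of mine, not from the article:

```python
def safe_mean(values):
    """Mean of a non-empty list of numbers.

    The assertions act like climbing anchors: they don't prevent the
    fall (the bug), but they stop it close to where it happened.
    """
    assert len(values) > 0, "mean of empty input is undefined"
    result = sum(values) / len(values)
    assert min(values) <= result <= max(values), "mean outside input range"
    return result

# A tiny regression test: once a rare edge case bites you,
# pin it down so it can never silently return.
def test_safe_mean():
    assert safe_mean([1, 2, 3]) == 2
    assert safe_mean([5]) == 5

test_safe_mean()
```

The third carabiner, version control, is what lets you rappel back to the last commit where these tests still passed.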


Four short links: 10 December 2013

Flexible Data, Google's Bottery, GPU Assist Deep Learning, and Open Sourcing

  1. ArangoDB — open-source database with a flexible data model for documents, graphs, and key-values. Build high-performance applications using a convenient SQL-like query language or JavaScript extensions.
  2. Google’s Seven Robotics Companies (IEEE) — “The seven companies are capable of creating technologies needed to build a mobile, dexterous robot. Mr. Rubin said he was pursuing additional acquisitions.” Rundown of those seven companies.
  3. Hebel (Github) — GPU-Accelerated Deep Learning Library in Python.
  4. What We Learned Open Sourcing — my eye was caught by the way they offered APIs to closed source code, found and solved performance problems, then open sourced the fixed code.
Four short links: 21 November 2013

Offline Design, Full Text, Parsing Library, and Node Streams

  1. Network Connectivity Optional (Luke Wroblewski) — we need progressive enhancement: assume people are offline, then enhance if they are actually online.
  2. Whoosh — fast, featureful full-text indexing and searching library implemented in pure Python.
  3. Flanker (GitHub) — open source address and MIME parsing library in Python. (via Mailgun Blog)
  4. Stream Adventure (Github) — interactive exercises to help you understand node streams.
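The core data structure behind a full-text search library like Whoosh is an inverted index: a map from each word to the set of documents containing it. A toy pure-Python illustration of the idea (this is not Whoosh’s actual API):

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Intersect posting lists: documents containing every query term."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return sorted(results)

docs = {
    1: "pure Python full text search",
    2: "fast featureful indexing library",
    3: "full text indexing in pure Python",
}
index = build_index(docs)
print(search(index, "pure python"))  # [1, 3]
```

A real library adds tokenization, stemming, scoring, and on-disk storage on top of this skeleton.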

Handling Data at a New Particle Accelerator

Unlocking Scientific Data with Python


Most people working on complex software systems have had That Moment, when you throw up your hands and say “If only we could start from scratch!” Generally, it’s not possible. But every now and then, the chance comes along to build a really exciting project from the ground up.

In 2011, I had the chance to participate in just such a project: the acquisition, archiving and database systems which power a brand-new hypervelocity dust accelerator at the University of Colorado.


Four short links: 15 November 2013

Scan Win, Watson Platform, Metal Printer, and Microcontroller Python

  1. Google Wins Book Scanning Case (Gigaom) — will probably be appealed, though many authors will fear it’s good money after bad, tilting at the fair-use windmill.
  2. IBM Watson To Be A Platform (IBM) — press release indicates you’ll soon be able to develop your own apps that use Watson’s machine learning and text processing.
  3. MiniMetalMaker (IndieGogo) — 3D printer that can print detailed objects from specially blended metal clay and fire.
  4. MicroPython (KickStarter) — Python for Microcontrollers.
Four short links: 12 November 2013

Coding for Unreliability, AirBnB JS Style, Category Theory, and Text Processing

  1. Quantitative Reliability of Programs That Execute on Unreliable Hardware (MIT) — As MIT’s press release put it: “Rely simply steps through the intermediate representation, folding the probability that each instruction will yield the right answer into an estimation of the overall variability of the program’s output.” (via Pete Warden)
  2. Airbnb’s JavaScript Style Guide (GitHub) — A mostly reasonable approach to JavaScript.
  3. Category Theory for Scientists (MIT Courseware) — Scooby snacks for rationalists.
  4. TextBlob — Python open source text processing library with sentiment analysis, PoS tagging, term extraction, and more.
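The “folding the probability that each instruction will yield the right answer” idea from the first link has a simple core: assuming independent failures, the chance a straight-line program is fully correct is the product of the per-instruction reliabilities. A sketch with made-up numbers (not Rely’s actual analysis, which works over an intermediate representation):

```python
from math import prod

# Hypothetical per-instruction success probabilities for a short
# straight-line program running on unreliable hardware.  One "cheap"
# unreliable instruction (0.9997) dominates the overall risk.
per_instruction_reliability = [0.99999, 0.99999, 0.9997, 0.99999]

# Assuming independent failures, the program is correct only if
# every instruction is, so the probabilities multiply.
program_reliability = prod(per_instruction_reliability)
print(round(program_reliability, 5))  # 0.99967
```

This is why a tool like Rely can trade a little accuracy for energy: a handful of relaxed instructions barely moves the overall reliability bound.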

Mining the social web, again

If you want to engage with the data that's surrounding you, Mining the Social Web is the best place to start.

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever.

While we’re seeing more and more cynicism about the value of data, and particularly “big data,” that cynicism isn’t shared by most people who actually work with data. Data has undoubtedly been overhyped and oversold, but the best way to arm yourself against the hype machine is to start working with data yourself, to find out what you can and can’t learn. And there’s no shortage of data around. Everything we do leaves a cloud of data behind it: Twitter, Facebook, Google+ — to say nothing of the thousands of other social sites out there, such as Pinterest, Yelp, Foursquare, you name it. Google is doing a great job of mining your data for value. Why shouldn’t you?

There are few better ways to learn about mining social data than by starting with Twitter; Twitter is really a ready-made laboratory for the new data scientist.  And this book is without a doubt the best and most thorough approach to mining Twitter data out there.
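A first taste of what mining Twitter looks like: count which hashtags appear most often in a batch of tweets. The sketch below uses tweet-shaped dictionaries following the `entities.hashtags` layout of Twitter’s JSON, with invented data and no API calls, so the specifics are illustrative rather than taken from the book:

```python
from collections import Counter

# Tweet-shaped records mimicking the "entities" layout of Twitter's
# JSON responses; the data here is made up for illustration.
tweets = [
    {"text": "...", "entities": {"hashtags": [{"text": "python"}, {"text": "data"}]}},
    {"text": "...", "entities": {"hashtags": [{"text": "python"}]}},
    {"text": "...", "entities": {"hashtags": []}},
]

# Flatten every hashtag into one stream and tally it.
counts = Counter(
    tag["text"].lower()
    for tweet in tweets
    for tag in tweet["entities"]["hashtags"]
)
print(counts.most_common(1))  # [('python', 2)]
```

The same shape of analysis — pull structured entities out of a feed, then count, rank, and compare — carries over to the other social sites the book covers.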
