"Python" entries

Four short links: 10 December 2013

Flexible Data, Google's Bottery, GPU Assist Deep Learning, and Open Sourcing

  1. ArangoDB — open-source database with a flexible data model for documents, graphs, and key-values. Build high-performance applications using a convenient SQL-like query language or JavaScript extensions. (A minimal query sketch follows this list.)
  2. Google’s Seven Robotics Companies (IEEE) — The seven companies are capable of creating technologies needed to build a mobile, dexterous robot. Mr. Rubin said he was pursuing additional acquisitions. Rundown of those seven companies.
  3. Hebel (GitHub) — GPU-Accelerated Deep Learning Library in Python.
  4. What We Learned Open Sourcing — my eye was caught by the way they offered APIs to closed source code, found and solved performance problems, then open sourced the fixed code.
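
For a sense of what querying ArangoDB looks like from Python, here is a minimal sketch that posts an AQL query (the SQL-like language the blurb refers to) to the server's HTTP cursor endpoint. The local server address, default port, and the users collection are assumptions for illustration, not details from the post.

```python
# Hypothetical sketch: run an AQL query against a local ArangoDB server
# via its HTTP cursor API. Server address and the "users" collection are
# illustrative assumptions.
import json
import urllib.request

def run_aql(query, url="http://localhost:8529/_api/cursor"):
    """POST an AQL query and return the first batch of results."""
    payload = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["result"]

if __name__ == "__main__":
    for name in run_aql("FOR u IN users FILTER u.age >= 21 RETURN u.name"):
        print(name)
```
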
Four short links: 21 November 2013

Offline Design, Full Text, Parsing Library, and Node Streams

  1. Network Connectivity Optional (Luke Wroblewski) — we need progressive enhancement: assume people are offline, then enhance if they are actually online.
  2. Whoosh — fast, featureful full-text indexing and searching library implemented in pure Python. (A minimal indexing-and-search sketch follows this list.)
  3. Flanker (GitHub) — open source address and MIME parsing library in Python. (via Mailgun Blog)
  4. Stream Adventure (GitHub) — interactive exercises to help you understand node streams.
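
To give a flavor of the Whoosh item above, here is a minimal indexing-and-search sketch. The index directory name and the sample documents are made up for illustration.

```python
# Minimal Whoosh sketch: build an on-disk index and run a query against it.
import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

schema = Schema(path=ID(stored=True), body=TEXT)
os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", schema)

writer = ix.writer()
writer.add_document(path=u"/a", body=u"Whoosh is a pure-Python search library")
writer.add_document(path=u"/b", body=u"Flanker parses addresses and MIME in Python")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("body", ix.schema).parse(u"search")
    for hit in searcher.search(query):
        print(hit["path"])
```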

Handling Data at a New Particle Accelerator

Unlocking Scientific Data with Python

Most people working on complex software systems have had That Moment, when you throw up your hands and say “If only we could start from scratch!” Generally, it’s not possible. But every now and then, the chance comes along to build a really exciting project from the ground up.

In 2011, I had the chance to participate in just such a project: the acquisition, archiving, and database systems that power a brand-new hypervelocity dust accelerator at the University of Colorado.

Read more…

Four short links: 15 November 2013

Scan Win, Watson Platform, Metal Printer, and Microcontroller Python

  1. Google Wins Book Scanning Case (GigaOM) — will probably be appealed, though many authors will fear it’s good money after bad tilting at the fair use windmill.
  2. IBM Watson To Be A Platform (IBM) — press release indicates you’ll soon be able to develop your own apps that use Watson’s machine learning and text processing.
  3. MiniMetalMaker (Indiegogo) — 3D printer that can print detailed objects from specially blended metal clay, which are then fired.
  4. MicroPython (Kickstarter) — Python for microcontrollers.
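
As a taste of what MicroPython code looks like on the pyboard hardware the Kickstarter describes, here is a hypothetical LED-blink sketch using the board's pyb module; the LED number and delay are arbitrary.

```python
# Hypothetical MicroPython blink sketch for the pyboard's on-board LED.
import pyb

led = pyb.LED(1)      # first on-board LED (numbering is board-specific)
while True:
    led.toggle()      # flip the LED state
    pyb.delay(500)    # wait 500 milliseconds
```
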
Four short links: 12 November 2013

Coding for Unreliability, AirBnB JS Style, Category Theory, and Text Processing

  1. Quantitative Reliability of Programs That Execute on Unreliable Hardware (MIT) — As MIT’s press release put it: Rely simply steps through the intermediate representation, folding the probability that each instruction will yield the right answer into an estimation of the overall variability of the program’s output. (via Pete Warden)
  2. Airbnb’s JavaScript Style Guide (GitHub) — A mostly reasonable approach to JavaScript.
  3. Category Theory for Scientists (MIT Courseware) — Scooby snacks for rationalists.
  4. TextBlob — open source Python text processing library with sentiment analysis, PoS tagging, term extraction, and more. (A quick usage sketch follows this list.)
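
The TextBlob item above is easy to demo; this quick sketch exercises the three features the blurb lists (sentiment, PoS tags, and noun-phrase/term extraction). It assumes the corpora have already been fetched with python -m textblob.download_corpora; the sample sentence is made up.

```python
# Quick TextBlob sketch: sentiment, part-of-speech tags, and noun phrases.
from textblob import TextBlob

blob = TextBlob("TextBlob makes simple text processing in Python pleasantly easy.")

print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
print(blob.tags)          # [('TextBlob', 'NNP'), ('makes', 'VBZ'), ...]
print(blob.noun_phrases)  # WordList of extracted noun phrases
```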

Mining the social web, again

If you want to engage with the data that's surrounding you, Mining the Social Web is the best place to start.

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever.

While we’re seeing more and more cynicism about the value of data, and particularly “big data,” that cynicism isn’t shared by most people who actually work with data. Data has undoubtedly been overhyped and oversold, but the best way to arm yourself against the hype machine is to start working with data yourself, to find out what you can and can’t learn. And there’s no shortage of data around. Everything we do leaves a cloud of data behind it: Twitter, Facebook, Google+ — to say nothing of the thousands of other social sites out there, such as Pinterest, Yelp, Foursquare, you name it. Google is doing a great job of mining your data for value. Why shouldn’t you?

There are few better ways to learn about mining social data than by starting with Twitter; Twitter is really a ready-made laboratory for the new data scientist. And this book is without a doubt the best and most thorough approach to mining Twitter data out there. Read more…

Dead Batteries Included

Recharging the Python standard library

It’s unfortunate that the official About Python page still describes Python’s standard library as having “batteries included.” Sure, some of those old standbys will keep your project going and going, but many of them are leaking acid all over the place. Guido van Rossum, head developer of Python, has said “the stdlib offerings … are not very convenient and may not support popular idioms very well.” Five years ago, I always assumed the Python library contained the “best of breed” for all packages. These days, I tend to think the opposite.

To counteract this minor flaw, I keep a small “personal standard library.” I keep a pip requirements file listing all the packages I use in every project. A simple script automatically installs that file whenever I create a virtualenv for a new project. With the pip download cache enabled, this is a near-painless process.
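
As a rough sketch of the workflow described above (the file locations and the use of subprocess are my assumptions, not the author's actual script), the automation can be as small as this:

```python
# Sketch: create a virtualenv for a new project and install a "personal
# standard library" requirements file into it. Paths are illustrative and
# Unix-flavored.
import os
import subprocess
import sys

REQUIREMENTS = os.path.expanduser("~/personal-stdlib-requirements.txt")

def new_project_env(env_dir="venv"):
    # virtualenv (rather than the 3.3+ venv module) matches the 2013-era
    # workflow the post describes.
    subprocess.check_call(["virtualenv", env_dir])
    # Install the personal standard library; with pip's download cache
    # enabled, repeat installs mostly hit local copies.
    pip = os.path.join(env_dir, "bin", "pip")
    subprocess.check_call([pip, "install", "-r", REQUIREMENTS])

if __name__ == "__main__":
    new_project_env(sys.argv[1] if len(sys.argv) > 1 else "venv")
```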

Read more…

Four short links: 8 October 2013

Video Editing, Game Engine, Python Debugger, and P2P VPN

  1. Lightworks — open source non-linear video editing software, with quite a history.
  2. PuzzleScript — open source puzzle game engine for HTML5.
  3. pudb — full-screen (text-mode) Python debugger. (A one-line usage sketch follows this list.)
  4. Freelan — free, open-source, multi-platform, highly-configurable and peer-to-peer VPN software.
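
Of these, pudb is the easiest to show: drop a single call into the code you want to inspect and the full-screen text-mode debugger takes over the terminal when that line is reached. The function below is just a placeholder.

```python
# pudb in one line: set a breakpoint and step through in a full-screen TUI.
import pudb

def parse_record(line):
    pudb.set_trace()        # execution pauses here inside the pudb UI
    return line.strip().split(",")

parse_record("a,b,c")
```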

Django Is Python’s Most Mature Web Framework

Testing, Python 3, and Dealing with Technical Debt

Nathan Yergler (@nyergler), Principal Engineer at Eventbrite, and I had a chance to talk Django at OSCON 2013. We talk about why Django is the go-to choice for Pythonistas and about the growing technical debt that each programmer has to deal with on Python projects and beyond.

Key highlights include:

  • Django is mature and feature-complete amid the many Python frameworks [Discussed at 0:15]
  • Testing in Django leads to straightforward code that the next programmer can read as well as you can (a minimal example follows this list) [Discussed at 1:02]
  • Dare we discuss Django’s weaknesses like: Is Django too monolithic? [Discussed at 2:43]
  • Django at long last supports Python 3! Check out Django 1.5 [Discussed at 4:06]
  • Dealing with technical debt while programming [Discussed at 5:36]
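
As a generic illustration of the testing point above (a hypothetical example, not code from the interview; the URL and template name are made up), a Django test can stay this readable:

```python
# Hypothetical Django test: short enough that the next programmer can see
# the intent at a glance.
from django.test import TestCase

class HomePageTest(TestCase):
    def test_home_page_returns_200(self):
        response = self.client.get("/")
        self.assertEqual(response.status_code, 200)

    def test_home_page_uses_expected_template(self):
        response = self.client.get("/")
        self.assertTemplateUsed(response, "home.html")
```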

You can view the full interview here:

Read more…

Think about learning Bayes using Python

An interview with Allen Downey, the author of Think Bayes

Allen Downey

When Mike first discussed Allen Downey’s Think Bayes book project with me, I remember nodding a lot. As the data editor, I spend a lot of time thinking about the different people within our Strata audience and how we can provide what I refer to as “bridge resources”. We need to know and understand the environments our users are most comfortable in and provide them with the appropriate bridges to learn a new technique, language, tool, or… even math. I’ve also been very clear that almost everyone will need to improve their math skills should they decide to pursue a career in data science. So when Mike mentioned that Allen’s approach was to teach math not using math… but using Python, I immediately indicated my support for the project. Once the book was written, I contacted Allen about an interview and he graciously took some time away from the start of the semester to answer a few questions about his approach, teaching, and writing.

How did the “Think” series come about? What led you to start the series?

Allen Downey: A lot of it comes from my experience teaching at Olin College. All of our students take a basic programming class in the first semester, and I discovered that I could use their programming skills as a pedagogic wedge. What I mean is if you know how to program, you can use that skill to learn everything else.

I started with Think Stats because statistics is an area that has really suffered from the mathematical approach. At a lot of colleges, students take a mathematical statistics class that really doesn’t prepare them to work with real data. By taking a computational approach I was able to explain things more clearly (at least I think so). And more importantly, the computational approach lets students dive in and work with real data right away.
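
For a flavor of that computational approach (this is my own toy sketch, not an excerpt from Think Bayes or its Pmf class), a Bayesian update for a biased coin needs nothing more than a dictionary and a loop:

```python
# Toy computational Bayes: update a discrete prior over a coin's heads
# probability after observing some flips.
def update(prior, heads, tails):
    posterior = {}
    for p, prob in prior.items():
        likelihood = (p ** heads) * ((1 - p) ** tails)
        posterior[p] = prob * likelihood
    total = sum(posterior.values())
    return {p: prob / total for p, prob in posterior.items()}

# Uniform prior over a handful of candidate biases.
hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = {p: 1.0 / len(hypotheses) for p in hypotheses}

# Observe 7 heads and 3 tails, then inspect the posterior.
posterior = update(prior, heads=7, tails=3)
for p in hypotheses:
    print("P(bias=%.1f) = %.3f" % (p, posterior[p]))
```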

At this point there are four books in the series and I’m working on the fifth. Think Python covers Python programming; it’s the prerequisite for all the other books. But once you’ve got basic Python skills, you can read the others in any order.

Read more…