Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing
- Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
- Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to […] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
- Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
- Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Rare Visualization, Google+ Tech, Scala+Erlang, and In-Database Analytics
- Slopegraphs — a nifty Tufte visualization which conveys rank, value, and delta over time. Includes pointers to how to make them, and guidelines for when and how they work. (via Avi Bryant)
- scalang (github) — a Scala wrapper that makes it easy to interface with Erlang, so you can use two hipster-compliant built-to-scale technologies in the same project. (via Justin Sheehy)
- Madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. (via Mike Loukides)
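To make the slopegraph item above concrete, here is a minimal sketch of the data a slopegraph encodes: each label's rank and value at two points in time, plus the change between them. The function name `slopegraph_rows` and the sample figures are hypothetical and only illustrate the idea; drawing the lines is left to whatever plotting tool you prefer.

```python
def slopegraph_rows(before, after):
    """Return one row per label with rank and value at both times, plus the delta.

    `before` and `after` map label -> value. Rank 1 is the largest value.
    Ties are broken by insertion order, which is an arbitrary choice here.
    """
    def ranks(values):
        ordered = sorted(values, key=values.get, reverse=True)
        return {label: i + 1 for i, label in enumerate(ordered)}

    rank_before, rank_after = ranks(before), ranks(after)
    rows = []
    for label in before:
        if label not in after:
            continue  # a slopegraph line needs both endpoints
        rows.append({
            "label": label,
            "rank_before": rank_before[label],
            "rank_after": rank_after[label],
            "value_before": before[label],
            "value_after": after[label],
            "delta": after[label] - before[label],
        })
    # Order rows the way the left-hand column of the chart would read.
    return sorted(rows, key=lambda r: r["rank_before"])


# Hypothetical sample data, not taken from the linked article.
before = {"A": 30, "B": 50, "C": 40}
after = {"A": 45, "B": 48, "C": 35}
rows = slopegraph_rows(before, after)
for row in rows:
    print(row)
```

Each row corresponds to one line on the chart: the two ranks fix where its endpoints sit, and the sign of the delta tells you whether it slopes up or down.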
Java is as much about the JVM as it is the language.
This overview of JVM-based programming compares the relative strengths of the major languages.
DOM Snitch, Hadoop in Scala, Pregel in Hadoop in Scala, Reflections on the Company
- DOM Snitch — an experimental Chrome extension that enables developers and testers to identify insecure practices commonly found in client-side code. See also the introductory post. (via Hacker News)
- Spark — Hadoop-alike in Scala. Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can outperform Hadoop by 30x. However, you can use Spark’s convenient API for general data processing too. (via Hilary Mason)
- Bagel — an implementation of the Pregel graph processing framework on Spark. (via Olivier Grisel)
- Week 315 (Matt Webb) — read this entire post. It will make you smarter. The company’s decisions aren’t actually the shareholders’ decisions. A company has a culture which is not the simple sum of the opinions of the people in it. A CEO can never be said to perform an action in the way that a human body can be said to perform an action, like picking an apple. A company is a weird, complex thing, and rather than attempt (uselessly) to reduce it to people within it, it makes more sense – to me – to approach it as an alien being and attempt to understand its biology and momentums only with reference to itself. Having done that, we can then use metaphors to attempt to explain its behaviour: we can say that it follows profit, or it takes an innovative step, or that it is middle-aged, or that it treats the environment badly, or that it takes risks. None of these statements is literally true, but they can be useful to have in mind when attempting to negotiate with these bizarre, massive creatures. If anyone wonders why I link heavily to BERG’s work, it’s because they have some incredibly thoughtful and creative people who are focused and productive, and it’s Webb’s laser-like genius that makes it possible. They’re doing a lot of subtle new things and it’s a delight and privilege to watch them grow and reflect.
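For readers who haven't met the Pregel model mentioned in the Bagel item above, the sketch below shows the general shape of vertex-centric processing in supersteps: each vertex reads the messages sent to it in the previous round, updates its own value, and sends messages to its neighbours. This is plain Python written for illustration, not Bagel's or Spark's API; the function name `pregel_max` and the toy graph are assumptions. It also shows why an iterative algorithm benefits from keeping its working data in memory between rounds.

```python
def pregel_max(graph, values, max_supersteps=50):
    """Propagate the largest value in the graph to every reachable vertex.

    graph:  vertex -> list of out-neighbours (every vertex must be a key)
    values: vertex -> initial number
    """
    values = dict(values)  # work on a copy

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])

    for _ in range(max_supersteps):
        outbox = {v: [] for v in graph}
        active = False
        for v, messages in inbox.items():
            # A vertex only does work if a message improves its value.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in graph[v]:
                    outbox[n].append(values[v])
                active = True
        if not active:
            break  # every vertex has voted to halt
        inbox = outbox  # messages are delivered at the next superstep
    return values


# Hypothetical three-vertex graph.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = pregel_max(graph, {"a": 3, "b": 6, "c": 1})
print(result)
```

A real framework distributes the vertices across machines and handles message delivery and fault tolerance; the loop structure is the part this sketch is meant to convey.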
How Facebook Ships, EU Funds, Bacteria Play, and Screens Capture
- How Facebook Ships Code — all engineers go through 4 to 6 week “Boot Camp” training where they learn the Facebook system by fixing bugs and listening to lectures given by more senior/tenured engineers. An estimated 10% of each boot camp’s trainee class don’t make it and are counseled out of the organization. Reminded me of Zappos paying people to leave. (via Hacker News)
- EU Funds Scala — it’s a research project at a university, and just got a big pile of funding from the EU.
- Biotic Games — they make Pong, Pacman, Pinball, etc. from biotech. (via Andy Baio)
- Asleep and Awake (BERG London) — It’s glowing rectangles all the way down: those backlit screens that suck your attention. Matt J described it nicely a few years ago: the iPhone is a beautiful, seductive but jealous mistress that craves your attention, and enslaves you to its jaw-dropping gorgeousness at the expense of the world around you. Reminded me of Jesse Robbins’s great line, “mobile is the opposite of mindful”.
Java's wild ride, multicore drives functional, and a look at how the usual programming suspects stacked up in 2010.
This year brought confusion and chaos in the Java space, continued growth for functional languages due to the attack of multicore, and the usual popularity for all of the dynamic languages we know and love.
Etherpad, Scala, Journalism, and Mazes from Ruby
- ietherpad — continuation of the etherpad startup. Offers pro accounts, and promises an iPad app to come. (via Steve O’Grady on Twitter)
- Scala Collections Quickref — quick reference card for the Scala collections classes. (via Ian Kallen on Twitter)
- Raw Data and the Rise of Little Brother — Turns out, despite the great push for citizen journalism, citizens are not, on average, great at “journalism.” But they are excellent conduits for raw material — those documents, videos, or photos.
- Theseus 1.0 — impressive maze builder in Ruby, with source contributed to the public domain. (via Hacker News)
European Economic Crisis, Scaling Guardian API, Cheerful Pessimism, and Science Mapping
- Lending Merry-Go-Round — these guys have been Australia’s sharpest satire for years, filling the role of the Daily Show. Here they ask some strong questions about the state of Europe’s economies … (via jdub on Twitter)
- What’s Powering the Guardian’s Content API — Scala and Solr/Lucene on EC2 is the short answer. The long answer reveals the details of their setup, including some of their indexing tricks that mean Solr can index all their content in just an hour. (via Simon Willison)
- What I Learned About Engineering from the Panama Canal (Pete Warden) — I consider myself a cheerful pessimist. I’ve been through enough that I know how steep the odds of success are, but I’ve made a choice that even a hopeless fight in a good cause is worthwhile. What a lovely attitude!
- Mapping the Evolution of Scientific Fields (PLoS ONE) — clever use of data. We build an idea network consisting of American Physical Society Physics and Astronomy Classification Scheme (PACS) numbers as nodes representing scientific concepts. Two PACS numbers are linked if there exist publications that reference them simultaneously. We locate scientific fields using a community finding algorithm, and describe the time evolution of these fields over the course of 1985-2006. The communities we identify map to known scientific fields, and their age depends on their size and activity. We expect our approach to quantifying the evolution of ideas to be relevant for making predictions about the future of science and thus help to guide its development.
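The network construction described in the abstract above (classification codes as nodes, linked when a publication references both) can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's code: the function name `cooccurrence_edges` is made up, the code strings are placeholders in the PACS format rather than real records, and the community finding step is left out entirely.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(publications):
    """Count how many publications reference each pair of codes.

    publications: iterable of lists of classification codes, one list per paper.
    Returns a Counter mapping (code_a, code_b) -> number of shared papers,
    with each pair stored in sorted order so (a, b) and (b, a) are one edge.
    """
    edges = Counter()
    for codes in publications:
        # set() drops repeated codes within a paper; sorted() fixes pair order.
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges


# Placeholder records for illustration only.
publications = [
    ["03.65", "05.30", "42.50"],
    ["03.65", "42.50"],
    ["98.80", "04.20"],
]
edges = cooccurrence_edges(publications)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting weighted edge list is the kind of input a community finding algorithm would take; running one per year of publications would give the time evolution the authors describe.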