Pattern — a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, a HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.
Reset (Rowan Simpson) — It was a bit chilling to go back over a whole year’s worth of tweets and discover how many of them were just junk. Visiting the water cooler is fine, but somebody who spends all day there has no right to talk of being full.
Google’s AI Brain — on the subject of Google’s AI ethics committee … Q: Will you eventually release the names? A: Potentially. That’s something also to be discussed. Q: Transparency is important in this too. A: Sure, sure. Such reassuring.
AVA is now Open Source (Laura Bell) — Assessment, Visualization and Analysis of human organisational information security risk. AVA maps the realities of your organisation, its structures and behaviors. This map of people and interconnected entities can then be tested using a unique suite of customisable, on-demand, and scheduled information security awareness tests.
Deep Learning for Torch (Facebook) — Facebook AI Research open sources faster deep learning modules for Torch, a scientific computing framework with wide support for machine learning algorithms.
The Devops Identity Crisis (Baron Schwartz) — I saw one framework-retailing bozo saying that devops was the art of ensuring there were no flaws in software. I didn’t know whether to cry or keep firing until the gun clicked.
Apache Giraph — an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
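To make "iterative graph processing" concrete, here is a minimal single-machine sketch in plain Python of the vertex-centric, superstep style that Giraph is built around. It is not Giraph's API (Giraph jobs are written in Java and run distributed); the function name `connected_components` and the sample edges are assumptions made for illustration. Each vertex repeatedly adopts the smallest label among itself and its neighbours until nothing changes.

```python
def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component.

    Illustrative only: mimics synchronous supersteps on one machine.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every vertex starts with its own id as its label.
    labels = {v: v for v in neighbors}

    changed = True
    while changed:
        changed = False
        # One superstep: each vertex reads only the labels from the
        # previous superstep, then all updates are applied at once.
        new_labels = {}
        for v, nbrs in neighbors.items():
            best = min([labels[v]] + [labels[n] for n in nbrs])
            new_labels[v] = best
            if best != labels[v]:
                changed = True
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Hypothetical friendship edges: two separate groups.
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

The loop stops once a full pass produces no label changes, which is the same "run until no vertex has work left" idea that iterative graph systems rely on.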
Apache Flink — a data processing system and an alternative to Hadoop’s MapReduce component. It comes with its own runtime, rather than building on top of MapReduce. As such, it can work completely independently of the Hadoop ecosystem. However, Flink can also access Hadoop’s distributed file system (HDFS) to read and write data, and Hadoop’s next-generation resource manager (YARN) to provision cluster resources. Since most Flink users are using Hadoop HDFS to store their data, we already ship the required libraries to access HDFS.
Internet of Things: Blackett Review — the British Government’s review of Internet of Things opportunities around government. Government and others can use expert commissioning to encourage participants in demonstrator programmes to develop standards that facilitate interoperable and secure systems. Government as a large purchaser of IoT systems is going to have a big impact if it buys wisely. (via Matt Webb)
rdbms-subsetter — open source tool to generate a random sample of rows from a relational database that preserves referential integrity – so long as constraints are defined, all parent rows will exist for child rows. (via 18F)
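The idea of a random sample that keeps referential integrity can be sketched with the standard-library `sqlite3` module. This is not rdbms-subsetter's implementation or command-line interface; the `customers`/`orders` schema and the helper names `build_demo_db` and `sample_subset` are hypothetical. The sketch samples child rows at random, then pulls in every parent row those children point to.

```python
import random
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with one parent/child relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 21)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 20) + 1, i * 1.5) for i in range(1, 101)],
    )
    return conn


def sample_subset(conn, n_orders, seed=0):
    """Pick n_orders random child rows plus all parents they reference."""
    rng = random.Random(seed)
    order_ids = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY id")]
    chosen = rng.sample(order_ids, n_orders)

    marks = ",".join("?" * len(chosen))
    orders = conn.execute(
        f"SELECT id, customer_id, total FROM orders WHERE id IN ({marks})",
        chosen,
    ).fetchall()

    # Follow the foreign key so no sampled order is left without its parent.
    parent_ids = sorted({o[1] for o in orders})
    marks = ",".join("?" * len(parent_ids))
    customers = conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks})",
        parent_ids,
    ).fetchall()
    return customers, orders


if __name__ == "__main__":
    customers, orders = sample_subset(build_demo_db(), 10)
    print(len(customers), "customers,", len(orders), "orders")
```

A real tool has to discover the foreign keys from the schema and walk them across many tables; the sketch hard-codes a single relation to keep the principle visible.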
UXcheck — a browser extension to help you do a quick UX check against Nielsen’s 10 principles.
The Toxoplasma of Rage — It’s in activists’ interests to destroy their own causes by focusing on the most controversial cases and principles, the ones that muddy the waters and make people oppose them out of spite. And it’s in the media’s interest to help them and egg them on.
Samza: LinkedIn’s Stream-Processing Engine — Samza’s goal is to provide a lightweight framework for continuous data processing. Unlike batch processing systems such as Hadoop, which typically have high-latency responses (sometimes hours), Samza continuously computes results as data arrives, which makes sub-second response times possible.
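The batch-versus-continuous contrast can be shown with a toy page-view counter in plain Python. This is not Samza's API (Samza jobs are written against its own Java interfaces); `batch_counts`, `streaming_counts`, and the event shape are assumptions for illustration. The batch version answers only after it has seen every event, while the streaming version emits an updated result as each event arrives.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: read the whole input, then return one final answer."""
    return Counter(e["page"] for e in events)


def streaming_counts(events):
    """Streaming style: update state per event and emit a result right away."""
    counts = Counter()
    for e in events:
        counts[e["page"]] += 1
        # Downstream consumers see the new count without waiting for the
        # rest of the input.
        yield e["page"], counts[e["page"]]


if __name__ == "__main__":
    # Hypothetical page-view events.
    events = [{"page": "/home"}, {"page": "/jobs"}, {"page": "/home"}]
    for page, count in streaming_counts(events):
        print(page, count)
    print(dict(batch_counts(events)))
```

Both functions reach the same totals; the difference is when a result becomes available, which is where the latency gap between the two models comes from.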
Jane Jacobs on Strangers (Nina Simon) — Many of us live in towns where we rarely have the opportunity for this kind of anonymous, safe, positive social contact. This is a problem. It means we smile less at strangers. We take care of each other less. We fear it opens up a social contract for too much more. There’s an analogous gap in online social media, where it feels like there are all too few social contract-building public Internet spaces.
PANDA — an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments.
Killer App for Wearables (Fortune) — While many corporations are still waiting to see what the “killer app” for wearables is, Disney invented one. The company launched the RFID-enabled MagicBands just over a year ago. Since then, they’ve given out more than 9 million of them. Disney says 75% of MagicBand users engage with the “experience”—a website called MyMagic+—before their visit to the park. Online, they can connect their wristband to a credit card, book fast passes (which let you reserve up to three rides without having to wait in line), and even order food ahead of time. […] Already, Disney says, MagicBands have led to increased spending at the park.
globalnamedata — We have collected birth record data from the United States and the United Kingdom across a number of years for all births in the two countries and are releasing the collected and cleaned up data here. We have also generated a simple gender classifier based on incidence of gender by name.
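A classifier "based on incidence of gender by name" can be as simple as the share of births recorded under each gender for a given name. The sketch below is an assumption about how such a classifier might look, not the project's code; the function names, the 0.9 threshold, and the birth counts are all made up for illustration and are not taken from the released data.

```python
from collections import defaultdict


def build_counts(records):
    """Aggregate (name, gender, births) rows into per-name totals."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, gender, births in records:
        counts[name.lower()][gender] += births
    return counts


def classify(name, counts, threshold=0.9):
    """Return (label, confidence) from the share of births per gender."""
    c = counts.get(name.lower())
    if c is None or c["F"] + c["M"] == 0:
        return ("unknown", 0.0)
    p_female = c["F"] / (c["F"] + c["M"])
    if p_female >= threshold:
        return ("F", p_female)
    if 1 - p_female >= threshold:
        return ("M", 1 - p_female)
    # Names used widely for both genders stay unlabelled.
    return ("ambiguous", max(p_female, 1 - p_female))


if __name__ == "__main__":
    # Made-up counts for illustration only, not real birth records.
    toy_records = [
        ("Mary", "F", 990), ("Mary", "M", 10),
        ("Alex", "M", 70), ("Alex", "F", 30),
        ("John", "M", 500),
    ]
    counts = build_counts(toy_records)
    for name in ("Mary", "Alex", "John", "Zed"):
        print(name, classify(name, counts))
```

The threshold trades coverage for accuracy: raising it leaves more names marked ambiguous but makes the remaining labels more reliable.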
geogig — an open source tool that draws inspiration from Git, but adapts its core concepts to handle distributed versioning of geospatial data.
From Gongkai to Open Source (Bunnie Huang) — The West has a “broadcast” view of IP and ownership: good ideas and innovation are credited to a clearly specified set of authors or inventors, and society pays them a royalty for their initiative and good works. China has a “network” view of IP and ownership: the far-sight necessary to create good ideas and innovations is attained by standing on the shoulders of others, and as such there is a network of people who trade these ideas as favors among each other. In a system with such a loose attitude toward IP, sharing with the network is necessary as tomorrow it could be your friend standing on your shoulders, and you’ll be looking to them for favors. This is unlike the West, where rule of law enables IP to be amassed over a long period of time, creating impenetrable monopoly positions. It’s good for the guys on top, but tough for the upstarts.