ENTRIES TAGGED "Python"
The Changing Internet, Python Data Analysis, Society of Mind, and Gaming Proteins
- 1996 vs 2011 Infographic from Online University (Evolving Newsroom) — “AOL and Yahoo! may be the butt of jokes for young people, but both are stronger than ever in the Internet’s Top 10”. Plus ça change, plus c’est la même chose.
- Pandas — open source Python package for data analysis, fast and powerful. (via Joshua Schachter)
- The Society of Mind — MIT open courseware for the classic Marvin Minsky theory that explains the mind as a collection of simpler processes. The subject treats such aspects of thinking as vision, language, learning, reasoning, memory, consciousness, ideals, emotions, and personality. Ideas incorporate psychology, artificial intelligence, and computer science to resolve theoretical issues such as whole vs. parts, structural vs. functional descriptions, declarative vs. procedural representations, symbolic vs. connectionist models, and logical vs. common-sense theories of learning. (via Maria Popova)
- Gamers Solve Problem in AIDS Research That Puzzled Scientists for Years (Ed Yong) — researchers put a key protein from an HIV-related virus onto the Foldit game. If we knew where the halves joined together, we could create drugs that prevented them from uniting. But until now, scientists have only been able to discern the structure of the two halves together. They have spent more than ten years trying to solve the structure of a single isolated half, without any success. The Foldit players had no such problems. They came up with several answers, one of which was almost close to perfect. In a few days, Khatib had refined their solution to deduce the protein’s final structure, and he has already spotted features that could make attractive targets for new drugs. Foldit is a game where players compete to find the best shape for a protein, and anyone can play it: barely an eighth of players work in science.
STM in Python, Static Web is Back, Cyberwar, and Virtual Language Education
- STM in PyPy — a proposal to add software transactional memory to the all-Python Python interpreter as a way of simplifying concurrent programming. I first learned about STM from Haskell’s Simon Peyton Jones at OSCON. (via Nelson Minar)
- Werner Vogels’ Static Web Site on S3 — nice writeup of the toolchain to publish a web site to static files served from S3.
- China Inadvertently Reveals State-Sponsored Hacking — if UK, US, French, Israeli, or Chinese citizens believe their government doesn’t have malware and penetration teams working on extracting information from foreign governments, they’re dreaming.
- MyChinese360 — virtual foreign language instruction in Mandarin, including “virtual visits” to Chinese landmarks. The ability to get native speakers virtually into the classroom makes the Internet a huge asset for rural schools. (via Lucy Gray)
Tabular Data API, Open Stanford Courses, Wearable TV, and Wearable Sensors
- Tablib — MIT-licensed open source library for manipulating tabular data. Reputed to have a great API. (via Tim McNamara)
- Stanford Education Everywhere — courses in CS, machine learning, math, and engineering that are open for all to take. Over 58,000 have already signed up for the introduction to machine learning taught by Peter Norvig, Google’s Director of Research.
- Wearable LED Television — 160×120 RGBs powered by a 12v battery, built for Burning Man (natch). (via Bridget McKendry)
- Temporary Tattoo Biosensors (Science News) — early work putting flexible sensors into temporary tattoos. (via BoingBoing)
Learning Adventure, Python Data Analysis, Lanyrd Technology, and New Sensor
- Hippocampus Text Adventure — written as an exercise in learning Python, you explore the hippocampus. It’s simple, but I like the idea of educational text adventures. (Well, educational in that you learn about more than the axe-throwing behaviour of the cave-dwelling dwarf.)
- Pandas — BSD-licensed Python data analysis library.
- Building Lanyrd — Simon Willison’s talk (with slides) about the technology under Lanyrd and the challenges in building with and deploying it.
- Electronic Skin Monitors Heart, Brain, and Muscles (Discover Magazine blogs) — this is a freaking awesome proof-of-concept. Interview with the creator of a skin-mounted sensor that attaches like a sticker, is flexible, is inductively powered, and much more. This represents a major step forward in possibilities for personal data-gathering. (via Courtney Johnston)
Graph ORM, Graphic Computation, Web Intents, and Async RPC
- Bulbflow — a Python framework for graph databases: it’s like an ORM for graphs. (via Joshua Schachter)
- Nomograms — the lost art of graphical computing. (via John D Cook)
- Web Intents — adding Android-style Intents to the web. Services register their intention to be able to handle an action on the user’s behalf. Applications request to start an Action of a certain verb (share, edit, view, pick etc) and the system will find the appropriate Services for the user to use based on the user’s preference.
- Finagle (GitHub) — Twitter’s asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language. Finagle provides a rich set of tools that are protocol independent.
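The register-then-request flow described in the Web Intents item above can be sketched in a few lines of Python. This is a toy model only, not the Web Intents API: the `IntentRegistry` class, its method names, and the example services are all hypothetical, and falling back to the first registered service when the user has no preference is an assumption.

```python
from collections import defaultdict


class IntentRegistry:
    """Toy model of the register/request flow (hypothetical, not the real API)."""

    def __init__(self):
        self._services = defaultdict(list)  # verb -> [(service name, handler)]
        self._preferred = {}                # verb -> user's preferred service name

    def register(self, verb, name, handler):
        """A service declares that it can handle a verb such as 'share'."""
        self._services[verb].append((name, handler))

    def set_preference(self, verb, name):
        """Record which service the user prefers for a verb."""
        self._preferred[verb] = name

    def start(self, verb, payload):
        """An application asks the system to start an action for a verb."""
        candidates = self._services.get(verb, [])
        if not candidates:
            raise LookupError(f"no service registered for {verb!r}")
        preferred = self._preferred.get(verb)
        for name, handler in candidates:
            if name == preferred:
                return handler(payload)
        # Assumption: with no stored preference, use the first registered service.
        _, handler = candidates[0]
        return handler(payload)


registry = IntentRegistry()
registry.register("share", "mail", lambda url: f"mail:{url}")
registry.register("share", "microblog", lambda url: f"post:{url}")
registry.set_preference("share", "microblog")
result = registry.start("share", "http://example.com")
```

The point of the design is that the application only names the verb; it never needs to know which services exist.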
Sorting Out 9/11, Tagging Text, Unlocking Scientific Publishing, and Internet Archive's Meatspace Branch
- Sorting Out 9/11 (New Yorker) — the thorniest problem for the 9/11 memorial was the ordering of the names. Computer science to the rescue!
- Tagger — Python library for extracting tags (statistically significant words or phrases) from a piece of text.
- Free Science, One Paper at a Time (Wired) — Jonathan Eisen’s attempt to collect and distribute his father’s scientific papers (which were written while a federal employee, so in the public domain), thwarted by old-fashioned scientific publishing. “But now,” says Jonathan Eisen, “there’s this thing called the Internet. It changes not just how things can be done but how they should be done.”
- Internet Archive Launches Physical Archive — I’m keen to see how this develops, because physical storage has problems that digital does not. I’d love to see the donor agreement require the donor to give the archive full rights to digitize and distribute under open licenses. That’d put the Internet Archive a step in front of traditional archives, museums, libraries, and galleries, whose donor agreements typically let donors place arbitrary specifications on use and reuse (“must be inaccessible for 50 years”, “no commercial use”, “no use that compromises the work”, etc.), all of which are barriers to wholesale digitization and reuse.
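To give a feel for what tag extraction (the Tagger item above) involves, here is a minimal sketch using only the standard library. It ranks words by raw frequency after dropping a tiny stopword list, which is a much cruder stand-in for "statistically significant" and is not Tagger's algorithm or API; `extract_tags`, `STOPWORDS`, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (an assumption; real taggers use larger ones).
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to was with".split()
)


def extract_tags(text, max_tags=3):
    """Return the most frequent non-stopword words in text, most common first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_tags)]


sample = (
    "Python library for tagging text: the library reads text, "
    "scores text, and returns tags."
)
tags = extract_tags(sample, max_tags=2)
```

A real tagger would also weigh words against a background corpus and handle multi-word phrases, which this sketch ignores.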
Healthcare Data, C64 Emulator, Python Machine Learning, and Startup Success Stats
- E-Referral Evaluation Interim Findings — in general good, but note this: The outstanding system issues are an ongoing source of frustration and concern, including [...] automated data uptake from the GP [General Practitioner=family doctor] PMS [Patient Management System], that sometimes has clearly inaccurate or contradictory information. When you connect systems, you realize the limitations of the data in them.
- c64iphone (GitHub) — the source to an iPhone/iPad app from the store, released under GPLv3. It incorporates the Frodo emulator. Sweet Freedom.
- mlpy — machine learning Python library, a high-performance Python package for predictive modeling. It makes extensive use of NumPy to provide fast N-dimensional array manipulation and easy integration of C code. (via Joshua Schachter)
- What is The Truth Behind 9 Out of 10 Startups Fail? (Quora) — some very interesting pointers and statistics, such as: Hall and Woodward (2007) analyze a dataset of all VC-backed firms and show the highly skewed distribution of outcomes. VC revenue averages $5 million per VC-backed company. Founding team averages $9 million per VC-backed company (most from small probability of great success). The economically rational founding team would sell at time of VC funding for $900,000 to avoid the undiversified risk. (via Hacker News)
Javascript Master Class, Stats for Pythonistas, CAM Floor, and HTML Extraction
- Javascript Trie Performance Analysis (John Resig) — if you program in Javascript and you’re not up to John’s skill level (*cough*) then you should read this and follow along. It’s a ride-along in the brain of a master.
- Think Stats — an introduction to statistics for Python programmers. (via Edd Dumbill)
- Bolefloor — they build curvy wooden floors. Instead of straightening naturally curvy wood (which is wasteful), they use CV and CAD/CAM to figure the smallest cuts to slot strips of wood together. It’s gorgeous, green, and geeky. (via BoingBoing)
- Extracting Article Text from HTML Documents — everyone’s doing it, now you know how. It’s the theory behind the lovingly hand-crafted magic of readability. (via Hacker News)
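For the article-extraction item above, one simple heuristic can be sketched with the standard library: keep only paragraphs with enough text, on the assumption that navigation and share links tend to be short. This is not the method from the linked article or from readability, just an illustration of the general idea; `ParagraphCollector`, `extract_article_text`, and the 40-character threshold are all hypothetical choices.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None    # list of text chunks while inside a <p>, else None
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and self._buffer is not None:
            # Join the chunks and collapse runs of whitespace.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None and self._skip_depth == 0:
            self._buffer.append(data)


def extract_article_text(html, min_chars=40):
    """Return paragraphs of at least min_chars characters (an arbitrary cutoff)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if len(p) >= min_chars]


page = (
    "<html><body><p>Home | About</p>"
    "<p>This is the first real paragraph of the article, long enough to keep.</p>"
    "<script>var x = 1;</script><p>Share</p></body></html>"
)
article = extract_article_text(page)
```

Production extractors combine several signals, such as link density and markup structure, rather than relying on length alone.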
Data Manual, Data Processing, Piracy Report, and Fragile Free
- The Open Data Manual — a HOWTO for organisations wanting to open up data. This report discusses legal, social and technical aspects of open data. The manual can be used by anyone but is especially designed for those seeking to open up data. It discusses the why, what and how of open data — why to go open, what open is, and how to ‘open’ data.
- MDP — Modular Toolkit for Data-Processing. Open source Python toolkit embodying a pile of machine learning and signal processing algorithms. (via Joshua Schachter)
- Media Piracy in Emerging Economies — SSRC report. The study finds no systematic links between media piracy and organized crime or terrorism in any of the countries examined. Today, commercial pirates and transnational smugglers face the same dilemma as the legal industry: how to compete with free. (via BoingBoing)
- The Fragility of Free (Ben Brooks) — The fragility of free is a catchy term that describes what happens when the free money runs out. Or—perhaps more accurately—when the investors/founders/venture capitalists run out of cash, or patience, or both. Because at some point Twitter and all other companies have to make the move from ‘charity’ to ‘business’—or, put another way, they have to make the move from spending tons of money to making slightly more money than they spend. It’s at this moment that we begin to see the fragilities of the free system. Things that never had ads, get ads—things that were free, now cost a monthly fee. We have all seen it before with hundreds of services—many of which are no longer around. (via Marco Arment)
Python Unicode, Cognitive Enhancement, Journal Balk, Engineering SaaS
- Unicode in Python, Completely Demystified — a good introduction to Unicode in Python, which helped me with some code. (via Hacker News)
- A Ban on Brain-Boosting Drugs (Chronicle of Higher Education) — Simply calling the use of study drugs “unfair” tells us nothing about why colleges should ban them. If such drugs really do improve academic performance among healthy students (and the evidence is scant), shouldn’t colleges put them in the drinking water instead? After all, it would be unfair to permit wealthy students to use them if less privileged students can’t afford them. As we start to hack our bodies and minds, we’ll face more questions about legitimacy and ethics of those actions. Not, of course, about using coffee and Coca-Cola, ubiquitous performance-enhancing stimulants that are mysteriously absent from bans and prohibitions.
- Copywrongs — Matt Blaze spits the dummy on IEEE and ACM copyright policies. In particular, the IEEE is explicitly preventing authors from distributing copies of the final paper. We write scientific papers first and last because we want them read. When papers were disseminated solely in print form it might have been reasonable to expect authors to donate the copyright in exchange for production and distribution. Today, of course, this model seems, at best, quaintly out of touch with the needs of researchers and academics who no longer desire or tolerate the delay and expense of seeking out printed copies of far-flung documents. We expect to find it on the open web, and not hidden behind a paywall, either.
- On the Engineering of SaaS — An upgrade process, for example, is an entirely different beast. Making it robust and repeatable is far less important than making it quick and reversible. This is because the upgrade only ever happens once: on your install. Also, it only ever has to work right in one, exact variant of the environment: yours. And while typical customers of software can schedule an outage to perform an upgrade, scheduling downtime in SaaS is nearly impossible. So, you must be able to deploy new releases quickly, if not entirely seamlessly — and in the event of failure, roll back just as rapidly.
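As a quick illustration of the ground the Unicode link above covers, here is a minimal sketch of the text/bytes distinction. It assumes Python 3, where `str` holds text and `bytes` holds encoded data; the linked introduction may target a different Python version, and the sample string is only an example.

```python
text = "Plus ça change"           # str: a sequence of Unicode code points
encoded = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

char_count = len(text)            # counts characters
byte_count = len(encoded)         # counts bytes; "ç" takes two bytes in UTF-8

# Decoding with the same codec round-trips cleanly.
round_trip = encoded.decode("utf-8")

# Decoding with the wrong codec does not fail here, but yields mojibake.
garbled = encoded.decode("latin-1")
```

The usual advice follows from this: decode bytes to text at the edges of a program, work with text internally, and encode again only on output.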