ENTRIES TAGGED "machine learning"
Neuromancer Game, Ray Ozzie, Sentiment Analysis, and Open Science Prizes
- Case and Molly, a Game Inspired by Neuromancer (Greg Borenstein) — On reading Neuromancer today, this dynamic feels all too familiar. We constantly navigate the tension between the physical and the digital in a state of continuous partial attention. We try to walk down the street while sending text messages or looking up GPS directions. We mix focused work with a stream of instant message and social media conversations. We dive into the sudden and remote intimacy of seeing a family member’s face appear on FaceTime or Google Hangout. “Case and Molly” uses the mechanics and aesthetics of Neuromancer’s account of cyberspace/meatspace coordination to explore this dynamic.
- Rethinking Ray Ozzie — an inescapable conclusion: Ray Ozzie was right. And Microsoft’s senior leadership did not listen, certainly not at the time, and perhaps not until it was too late. Hear, hear!
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (PDF) — apparently it nails sentiment analysis, and will be “open sourced”. At least, according to this GigaOm piece, which also explains how it works.
- PLoS ASAP Award Finalists Announced — with pointers to interviews with the finalists, who are doing good open access work like disambiguating species names and open source drug discovery.
Translation Glasses, Diagramming, Offline Gmail, and WTF Computation
- Instant Translator Glasses (ZDNet) — character recognition to do instant translating, and a UI that turns any flat surface into a touch-screen via a finger-ring sensor.
- draw.io — diagramming … In The Cloud!
- Airmail — Mac Gmail client with an offline mode that fails to suck.
- The Page-Fault Weird Machine: Lessons in Instruction-less Computation (Usenix) — video, audio, and text of a paper that’ll make your head hurt. We demonstrate a Turing-complete execution environment driven solely by the IA32 architecture’s interrupt handling and memory translation tables, in which the processor is trapped in a series of page faults and double faults, without ever successfully dispatching any instructions. LOLWUT?!
Google Code Analysis, Deep Learning, Front-End Workflow, and SICP in JS
- Steve Yegge on GROK (YouTube) — The Grok Project is an internal Google initiative to simplify the navigation and querying of very large program source repositories. We have designed and implemented a language-neutral, canonical representation for source code and compiler metadata. Our data production pipeline runs compiler clusters over all Google’s code and third-party code, extracting syntactic and semantic information. The data is then indexed and served to a wide variety of clients with specialized needs. The entire ecosystem is evolving into an extensible platform that permits languages, tools, clients and build systems to interoperate in well-defined, standardized protocols.
- Deep Learning for Semantic Analysis — When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels for both positive and negative phrases.
- Fireshell — workflow tools and framework for front-end developers.
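The model behind the sentiment entries above builds a phrase's vector bottom-up over its parse tree: each parent is computed from its two children through a tensor layer, and a classifier can label every node, which is how it gets labels "for all phrases". A pure-Python sketch of that composition step follows. It is an untrained toy under stated assumptions: random weights, 4-dimensional vectors, two sentiment classes instead of the paper's five, and no bias terms. `TinyRNTN` and `Node` are hypothetical names, not the authors' released code.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Node:
    """Binary parse-tree node: a leaf holds a word, an inner node holds two children."""
    word: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def bilinear(x: Vector, s: Matrix) -> float:
    # x^T S x for one tensor slice S
    n = len(x)
    return sum(x[i] * s[i][j] * x[j] for i in range(n) for j in range(n))


class TinyRNTN:
    """Untrained recursive neural tensor network with random weights (illustration only)."""

    def __init__(self, vocab: List[str], dim: int = 4, classes: int = 2, seed: int = 0):
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.embeddings: Dict[str, Vector] = {w: rand_matrix(1, dim)[0] for w in vocab}
        self.W = rand_matrix(dim, 2 * dim)                            # ordinary layer
        self.V = [rand_matrix(2 * dim, 2 * dim) for _ in range(dim)]  # one slice per output unit
        self.Ws = rand_matrix(classes, dim)                           # sentiment classifier

    def compose(self, a: Vector, b: Vector) -> Vector:
        x = a + b  # concatenation [a; b]
        linear = matvec(self.W, x)
        return [math.tanh(bilinear(x, self.V[k]) + linear[k]) for k in range(self.dim)]

    def encode(self, node: Node) -> Vector:
        # Leaves look up a word vector; inner nodes compose their children recursively.
        if node.word is not None:
            return self.embeddings[node.word]
        return self.compose(self.encode(node.left), self.encode(node.right))

    def predict(self, node: Node) -> List[float]:
        # Softmax over class scores; works on any subtree, i.e. any phrase.
        logits = matvec(self.Ws, self.encode(node))
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

Usage: `TinyRNTN(["not", "very", "good"]).predict(Node(left=Node("not"), right=Node(left=Node("very"), right=Node("good"))))` returns two class probabilities for the whole phrase. Training against the treebank's phrase labels, which is where the reported accuracy comes from, is omitted entirely.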
Drones Dismissed, Drones Denied, Passing PRISM, and Data Analysis and Mining
- UAV Offers of Assistance in Colorado Rebuffed by FEMA — we were told by FEMA that anyone flying drones would be arrested. [...] Civil Air Patrol and private aircraft were authorized to fly over the small town tucked into the base of Rockies. Unfortunately due to the high terrain around Lyons and large turn radius of manned aircraft they were flying well out of a useful visual range and didn’t employ cameras or live video feed to support the recovery effort. Meanwhile we were grounded on the Lyons high school football field with two Falcons that could have mapped the entire town in less than 30 minutes with another few hours to process the data providing a near real time map of the entire town.
- Texas Bans Some Private Use of Drones (DIY Drones) — part of a growing move by government to regulate drones.
- IETF PRISM-Proof Plans (Parity News) — Baker starts off by listing out the attack degree including the likes of information / content disclosure, meta-data analysis, traffic analysis, denial of service attacks and protocol exploits. The author then describes the different capabilities of an attacker and the ways in which an attack can be carried out – passive observation, active modification, cryptanalysis, covert channel analysis, lawful interception, Subversion or Coercion of Intermediaries among others.
- Data Mining and Analysis: Fundamental Concepts and Algorithms (PDF) — 650 pages on clustering, sequence mining, SVMs, and more. (via author’s page)
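For a taste of the clustering material in a text like that, here is Lloyd's k-means algorithm, a classic clustering method, in pure Python. It is a minimal sketch, not code from the book; the initial centroids are passed in explicitly to keep the result deterministic, and `kmeans` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance (no sqrt needed for nearest-centroid tests)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points: List[Point], centroids: List[Point], iterations: int = 100):
    centroids = [tuple(c) for c in centroids]
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, clusters
```

With two well-separated pairs of points and one starting centroid near each, the loop converges in two passes to the midpoint of each pair.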
Constant KV Store, Google Me, Learned Bias, and DRM-Stripping Lego Robot
- Sparkey — Spotify’s open-sourced simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.
- The Truth of Fact, The Truth of Feeling (Ted Chiang) — story about what happens when lifelogs become searchable. Now with Remem, finding the exact moment has become easy, and lifelogs that previously lay all but ignored are now being scrutinized as if they were crime scenes, thickly strewn with evidence for use in domestic squabbles. (via BoingBoing)
- Algorithms Magnifying Misbehaviour (The Guardian) — when the training set embodies biases, the machine will exhibit biases too.
- Lego Robot That Strips DRM Off Ebooks (BoingBoing) — so. damn. cool. If it had been controlled by a C64, Cory would have hit every one of my geek erogenous zones with this find.
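To make "constant key/value store" concrete: data arrives in one bulk write and is never updated in place, so reads only need an immutable index into the written data. The sketch below is a hypothetical in-memory illustration (`ConstantStore` is an invented name), not Sparkey's API or file format; it appends length-prefixed records to a log and keeps a hash index from each key to its record's offset.

```python
import struct
from typing import Dict, Iterable, Optional, Tuple


class ConstantStore:
    """Write-once, read-many store: one bulk build, then immutable lookups."""

    _HEADER = struct.Struct("<II")  # key length, value length (little-endian uint32)

    def __init__(self, log: bytes, index: Dict[bytes, int]):
        self._log = log
        self._index = index

    @classmethod
    def build(cls, items: Iterable[Tuple[bytes, bytes]]) -> "ConstantStore":
        log = bytearray()
        index: Dict[bytes, int] = {}
        for key, value in items:
            index[key] = len(log)  # a later record for the same key wins
            log += cls._HEADER.pack(len(key), len(value)) + key + value
        return cls(bytes(log), index)

    def get(self, key: bytes) -> Optional[bytes]:
        offset = self._index.get(key)
        if offset is None:
            return None
        klen, vlen = self._HEADER.unpack_from(self._log, offset)
        start = offset + self._HEADER.size + klen
        return self._log[start:start + vlen]
```

The design trade-off is the one the entry describes: writes are cheap appends done in large batches, and reads are a single hash lookup plus a slice, with no locking because nothing changes after the build.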
The Internet of Americas, Pharma Pricey, Who's Watching, and Data Mining Course
- Bradley Manning and the Two Americas (Quinn Norton) — The first America built the Internet, but the second America moved onto it. And they both think they own the place now. The best explanation you’ll find for wtf is going on.
- Staggering Cost of Inventing New Drugs (Forbes) — $5BB to develop a new drug; and subject to an inverse-Moore’s law: A 2012 article in Nature Reviews Drug Discovery says the number of drugs invented per billion dollars of R&D invested has been cut in half every nine years for half a century.
- Who’s Watching You (Tim Bray) — threat modelling. Everyone should know this.
- Data Mining with Weka — learn data mining with the popular open source Weka platform.
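Taking the Nature Reviews Drug Discovery figure at face value, a halving every nine years compounds quickly. Writing N(t) for drugs invented per billion dollars of R&D, t years into the period (notation introduced here for illustration), half a century works out as:

```latex
N(t) = N_0 \cdot 2^{-t/9}
\qquad\Longrightarrow\qquad
\frac{N(50)}{N_0} = 2^{-50/9} \approx 2^{-5.56} \approx \frac{1}{47}
```

In other words, a billion dollars of R&D buys roughly one forty-seventh as many new drugs as it did fifty years earlier.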
Better Tutorials, Self-Talk, Better AI, and Visualised Mechanics
- pineapple.io — attempt to crowdsource rankings for tutorials for important products, so you’re not picking your way through Google search results littered with tutorials written by incompetent illiterates for past versions of the software.
- BBC Forum — American social psychologist Aleks Krotoski has been looking at how the internet affects the way we talk to ourselves. Podcast (available for next 30 days) from BBC. (via Vaughan Bell)
- Why Can’t My Computer Understand Me (New Yorker) — using anaphora as the basis of an intelligence test, as an example of what AI should be striving for. It’s not just that contemporary A.I. hasn’t solved these kinds of problems yet; it’s that contemporary A.I. has largely forgotten about them. In Levesque’s view, the field of artificial intelligence has fallen into a trap of “serial silver bulletism,” always looking to the next big thing, whether it’s expert systems or Big Data, but never painstakingly analyzing all of the subtle and deep knowledge that ordinary human beings possess. That’s a gargantuan task— “more like scaling a mountain than shoveling a driveway,” as Levesque writes. But it’s what the field needs to do.
- 507 Mechanical Movements — an old basic engineering textbook, animated. Me gusta.
Thread Problems, Better Image Search, Open Standards, and GitHub Maps
- Multithreading is Hard — The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old McDonald’s server farm). (via Erik Michaels-Ober)
- Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
- Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the ux of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
- GitHub Renders GeoJSON — Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.
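For anyone wanting to try the GitHub feature: GeoJSON is plain JSON, so a file can be produced with Python's standard `json` module. This is a minimal sketch; the helper names, coordinates, and properties are made up for illustration. One detail worth knowing is that GeoJSON (RFC 7946) orders each position as longitude first, then latitude, so swapping them misplaces every point.

```python
import json
from typing import Any, Dict, Iterable


def point_feature(lon: float, lat: float, **properties: Any) -> Dict[str, Any]:
    # GeoJSON positions are [longitude, latitude], not [lat, lon]
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    return {"type": "FeatureCollection", "features": list(features)}


def to_geojson(collection: Dict[str, Any]) -> str:
    # Serialize to text suitable for saving as, e.g., map.geojson
    return json.dumps(collection, indent=2)
```

Usage: `to_geojson(feature_collection([point_feature(-0.1276, 51.5072, name="Example")]))` yields a one-point FeatureCollection (illustrative coordinates); committing that text as a `.geojson` file should be enough for GitHub to draw it on the map.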
Deep Learning, Internet of ux Nightmares, Mozilla Science Lab, and Ground-Up Computing
- Weekend Reads on Deep Learning (Alex Dong) — an article and two videos unpacking “deep learning” techniques such as multilayer neural networks.
- The Internet of Actual Things — “I have 10 reliable activations remaining,” your bulb will report via some ridiculous light-bulbs app on your phone. “Now just nine. Remember me when I’m gone.” (via Andy Baio)
- Announcing the Mozilla Science Lab (Kaitlin Thaney) — We also want to find ways of supporting and innovating with the research community – building bridges between projects, running experiments of our own, and building community. We have an initial idea of where to start, but want to start an open dialogue to figure out together how to best do that, and where we can be of most value.
- NAND to Tetris — The site contains all the software tools and project materials necessary to build a general-purpose computer system from the ground up. We also provide a set of lectures designed to support a typical course on the subject. (via Hacker News)
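The course begins by building every logic gate from a single NAND primitive, then stacks adders on top. A Python sketch of that bootstrapping, with gates as plain functions on 0/1 integers, is below; the course itself uses its own hardware description language, so these function names are only loose stand-ins for its Not, And, Or, Xor, HalfAdder, FullAdder, and Add16 chips.

```python
def nand(a: int, b: int) -> int:
    # The only primitive: every other gate below is wired from this one.
    return 0 if (a and b) else 1


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # The classic four-NAND XOR
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple:
    return xor(a, b), and_(a, b)  # (sum, carry)


def full_adder(a: int, b: int, c: int) -> tuple:
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c)
    return s, or_(c1, c2)


def add(x: int, y: int, width: int = 16) -> int:
    # Ripple-carry adder; the final carry out is discarded, so results wrap.
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total
```

Because the overflow carry is dropped, `add(65535, 1)` returns 0 at the default 16-bit width.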
Open Source BigTable, Robots Lost, Changing the World, Secrecy Binge
- Accumulo — NSA’s BigTable implementation, released as an Apache project.
- How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting access to the technology that the traders had: Last fall the SEC said it would pay Tradeworx, a high-frequency trading firm, $2.5 million to use its data collection system as the basic platform for a new surveillance operation. Code-named Midas (Market Information Data Analytics System), it scours the market for data from all 13 public exchanges. Midas went live in February. The SEC can now detect anomalous situations in the market, such as a trader spamming an exchange with thousands of fake orders, before they show up on blogs like Nanex and ZeroHedge. If Midas sees something odd, Berman’s team can look at trading data on a deeper level, millisecond by millisecond.
- PRISM: Surprised? (Danny O’Brien) — I really don’t agree with the people who think “We don’t have the collective will”, as though there’s some magical way things got done in the past when everyone was in accord and surprised all the time. It’s always hard work to change the world. Endless, dull hard work. Ten years later, when you’ve freed the slaves or beat the Nazis everyone is like “WHY CAN’T IT BE AS EASY TO CHANGE THIS AS THAT WAS, BACK IN THE GOOD OLD DAYS. I GUESS WE’RE ALL JUST SHEEPLE THESE DAYS.”
- What We Don’t Know About Spying on Citizens is Scarier Than What We Do Know (Bruce Schneier) — The U.S. government is on a secrecy binge. It overclassifies more information than ever. And we learn, again and again, that our government regularly classifies things not because they need to be secret, but because their release would be embarrassing. Open source BigTable implementation: free. Data gathering operation around it: $20M/year. Irony in having the extent of authoritarian Big Brother government secrecy questioned just as a whistleblower’s military trial is held “off the record”: priceless.
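As for what Accumulo actually stores: it keeps BigTable's sorted data model, where each entry's key is a row, column family, column qualifier, and timestamp, plus the cell-level visibility label Accumulo adds. A toy in-memory sorted table shows why that ordering matters: a range of rows becomes one contiguous scan. The class and method names are hypothetical, and visibility is simplified to a single optional label rather than Accumulo's boolean expressions (such as `A&B`).

```python
import bisect
from typing import Iterator, List, NamedTuple, Set, Tuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str  # simplified: "" means public, otherwise one required label
    timestamp: int


def order(key: Key) -> Tuple[str, str, str, str, int]:
    # Ascending by row/family/qualifier/visibility, newest timestamp first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


class SortedTable:
    def __init__(self) -> None:
        self._order: List[Tuple[str, str, str, str, int]] = []
        self._entries: List[Tuple[Key, str]] = []

    def put(self, key: Key, value: str) -> None:
        # Keep entries sorted on insert so row-range scans are contiguous.
        i = bisect.bisect_right(self._order, order(key))
        self._order.insert(i, order(key))
        self._entries.insert(i, (key, value))

    def scan(self, start_row: str, end_row: str, auths: Set[str]) -> Iterator[Tuple[Key, str]]:
        # Yield entries with start_row <= row < end_row that the caller may see.
        i = bisect.bisect_left(self._order, (start_row,))
        while i < len(self._entries):
            key, value = self._entries[i]
            if key.row >= end_row:
                break
            if not key.visibility or key.visibility in auths:
                yield key, value
            i += 1
```

A scan from row "a" to row "c" walks only the slice between those rows, returns multiple versions of a cell newest first, and silently skips any labelled cell the caller lacks the authorization for.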