"text analysis" entries

Four short links: 27 August 2015

Chrome as APT, Nature's Mimicry, Information Extraction, and Better 3D Printing

  1. The Advanced Persistent Threat You Have: Google Chrome (PDF) — argues that if you can’t detect and classify Google Chrome’s self-updating behavior, you’re not in a position to know when you’re hit by malware that also downloads and executes code from the net that updates executables and system files.
  2. Things Mimicking Other Things — nifty visual catalog/graph of camouflage and imitation in nature.
  3. MITIE — permissively-licensed (Boost) tools for named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.
  4. MultiFab Prints 10 Materials At Once — and uses computer vision to self-calibrate and self-correct, as well as letting users embed objects (e.g., circuit boards) in the print. Developed by CSAIL researchers from low-cost, off-the-shelf components that cost a total of $7,000.
Four short links: 11 February 2015

Crowdsourcing Working, etcd DKVS, Psychology Progress, and Inferring Logfile Rules

  1. Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
  2. etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
  3. You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
  4. Sequence: Automated Analyzer for Reducing 100k Messages to 10s of Patterns — induces patterns from the examples in log files.
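To make the idea of reducing many log messages to a handful of patterns concrete, here is a minimal sketch. It is not how Sequence works internally; it swaps in a much simpler technique (masking variable-looking tokens with placeholders and counting the results). The masking rules, placeholder names, and sample messages are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order. These are illustrative
# assumptions and are not taken from the Sequence project.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "%ip%"),   # dotted-quad addresses
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "%hex%"),           # hex literals
    (re.compile(r"\b\d+\b"), "%int%"),                      # remaining integers
]


def to_pattern(message):
    """Replace variable-looking tokens in one message with placeholders."""
    for regex, placeholder in MASKS:
        message = regex.sub(placeholder, message)
    return message


def induce_patterns(messages):
    """Count how many messages collapse onto each masked pattern."""
    return Counter(to_pattern(m) for m in messages)


if __name__ == "__main__":
    # Made-up sample log lines.
    sample = [
        "Connection from 10.0.0.5 port 50022",
        "Connection from 192.168.1.9 port 41011",
        "Disk usage at 91 percent",
        "Disk usage at 87 percent",
    ]
    for pattern, count in induce_patterns(sample).most_common():
        print(count, pattern)
```

Four messages collapse to two patterns here. A real analyzer needs to handle far more token types (timestamps, usernames, paths) than these three rules cover.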
Four short links: 2 February 2015

Weather Forecasting, Better Topic Modelling, Cyberdefense, and Facebook Warriors

  1. Global Forecast System — National Weather Service open sources its weather forecasting software. Hope you have a supercomputer and all the data to make use of it …
  2. High-reproducibility and high-accuracy method for automated topic classification — Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
  3. Army Open Sources Cyberdefense Code — git push is the new “for immediate release”.
  4. British Army Creates Team of Facebook Warriors (The Guardian) — no matter how much I know the arguments for it, it still feels vile.
Four short links: 17 February 2014

Commandline iMessage, Lovely Data, Software Plagiarism Detection, and 3D GIFs

  1. imsg — use iMessage from the commandline.
  2. Facebook Data Science Team Posts About Love — I tell people, “this is what you look like to SkyNet.”
  3. A System for Detecting Software Plagiarism — the research behind the undergraduate bête noire.
  4. 3D GIFs — this is awesome because brain.
Four short links: 4 October 2013

Neuromancer Game, Ray Ozzie, Sentiment Analysis, and Open Science Prizes

  1. Case and Molly, a Game Inspired by Neuromancer (Greg Borenstein) — On reading Neuromancer today, this dynamic feels all too familiar. We constantly navigate the tension between the physical and the digital in a state of continuous partial attention. We try to walk down the street while sending text messages or looking up GPS directions. We mix focused work with a stream of instant message and social media conversations. We dive into the sudden and remote intimacy of seeing a family member’s face appear on FaceTime or Google Hangout. “Case and Molly” uses the mechanics and aesthetics of Neuromancer’s account of cyberspace/meatspace coordination to explore this dynamic.
  2. Rethinking Ray Ozzie — an inescapable conclusion: Ray Ozzie was right. And Microsoft’s senior leadership did not listen, certainly not at the time, and perhaps not until it was too late. Hear, hear!
  3. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (PDF) — apparently it nails sentiment analysis, and will be “open sourced”. At least, according to this GigaOm piece, which also explains how it works.
  4. PLoS ASAP Award Finalists Announced — with pointers to interviews with the finalists, doing open access good work like disambiguating species names and doing open source drug discovery.
Four short links: 12 July 2013

Name Analysis, Old UIs, Browser Crypto Social Network, and Smart Watch Displays

  1. How Well Does Name Analysis Work? (Pete Warden) — explanation of how those “turn a name into gender/ethnicity/etc” routines work, and how accurate they are. Age has the weakest correlation with names. There are actually some strong patterns by time of birth, with certain names widely recognized as old-fashioned or trendy, but those tend to be swamped by class and ethnicity-based differences in the popularity of names.
  2. Old Interfaces — a lazy-scrolling interface to Andy Baio’s collection of faux UIs from movies. (via Andy Baio)
  3. Pidder — browser-crypto’d social network, address book, messaging, RSS reader, and more.
  4. What I Learned From Researching Almost Every Single Smart Watch That Has Been Rumoured or Announced (Quartz) — interesting roundup of the different display technologies used in each of the smartwatches.
Four short links: 28 May 2013

Geeky Primer, Visible CSS, Remote Working, and Raspberry Pi Sentiment Server

  1. My Little Geek — children’s primer with a geeky bent. A is for Android, B is for Binary, C is for Caffeine …. They have a Kickstarter for two sequels: numbers and shapes.
  2. Visible CSS Rules — Enter a url to see how the css rules interact with that page.
  3. How to Work Remotely — none of this is rocket science, it’s all true and things we had to learn the hard way.
  4. Raspberry Pi Twitter Sentiment Server — step-by-step guide, and github repo for the lazy. (via Jason Bell)
Four short links: 12 April 2013

Wikileaks Code, Account Afterlife, Digital in Museums, and Companies and Conferences

  1. Wikileaks ProjectK Code (Github) — open-sourced map and graph modules behind the Wikileaks code serving Kissinger-era cables. (via Journalism++)
  2. Plan Your Digital Afterlife With Inactive Account Manager — you can choose to have your data deleted — after three, six, nine or 12 months of inactivity. Or you can select trusted contacts to receive data from some or all of the following services: +1s; Blogger; Contacts and Circles; Drive; Gmail; Google+ Profiles, Pages and Streams; Picasa Web Albums; Google Voice and YouTube. Before our systems take any action, we’ll first warn you by sending a text message to your cellphone and email to the secondary address you’ve provided. (via Chris Heathcote)
  3. Leo Caillard: Art Games — Caillard’s images show museum patrons interacting with priceless paintings the way someone might browse through slides in a personal iTunes library on a device like an iPhone or MacBook. Playful and thought-provoking. (via Beta Knowledge)
  4. Lanyrd Pro — helping companies keep track of which events their engineers speak at, so they can avoid duplication and have maximum opportunity to promote it. First paid product from ETecher and Foo Simon Willison’s startup.
Four short links: 26 Feb 2013

Data Classes, Short Text Classification, Movie Lovers, and Mesh Networking for Hardware Hackers

  1. School of Data — free online courses around data science and visualization.
  2. libshorttext — classify and analyse short-text of things like titles, questions, sentences, and short messages. MIT-style open source license, Python and C++ source.
  3. Letterboxd — a site for movie lovers from Kiwi Foo alums. I love people who build experiences to help people express their love of things.
  4. RadioBlocks and SimpleMesh — mesh networking for Arduino.
Four short links: 18 December 2012

Tweet Cred, C64 History, Performance Articles, Return of Manufacturing

  1. Credibility Ranking of Tweets During High Impact Events (PDF) — interesting research. Situational awareness information is information that leads to gain in the knowledge or update about details of the event, like the location, people affected, causes, etc. We found that on average, 30% content about an event, provides situational awareness information about the event, while 14% was spam. (via BoingBoing)
  2. The Commodore 64 — interesting that Chuck Peddle (who designed the 6502) and Bob Yannes (who designed the SID chip) are still alive. This article safely qualifies as Far More Than You Ever Thought You Wanted To Know About The C64 but it is fascinating. The BASIC housed in its ROM (“BASIC 2.0”) was painfully antiquated. It was actually the same BASIC that Tramiel had bought from Microsoft for the original PET back in 1977. Bill Gates, in a rare display of naivete, sold him the software outright for a flat fee of $10,000, figuring Commodore would have to come back soon for another, better version. He obviously didn’t know Jack Tramiel very well. Ironically, Commodore did have on hand a better BASIC 4.0 they had used in some of the later PET models, but Tramiel nixed using it in the Commodore 64 because it would require a more expensive 16 K rather than 8 K of ROM chips to house.
  3. The Performance Calendar — an article each day about speed. (via Steve Souders)
  4. Mr China Comes to America (The Atlantic) — long piece on the return of manufacturing to America, featuring Foo camper Liam Casey.