"text analytics" entries

Four short links: 17 March 2016

Boozy Tweets, Quantum Games, Viva Vectors, and Being Fired

  1. Algorithm Identifies Tweets Sent Under the Influence of Alcohol (MIT TR) — notable for how they determined whether a Tweet was sent from home. They made a list of phrases like “home at last!” and had MTurkers confirm the Tweets were about being home, then used those as training data for an algorithm to identify other Tweets talking about home.
  2. Puzzle Game to Help Program Quantum Computers (New Scientist) — Devitt has turned the problem of programming a quantum computer into a game called meQuanics. His team has developed a prototype to test the game, which you can play now, and today launched a Kickstarter campaign to fund a fully fledged version for iOS and Android phones.
  3. Deep or Shallow, NLP is Breaking Out (ACM) — readable roundup of how NLP changed in the last five years, with a useful list for further reading and watching.
  4. Firing and Being Fired (Zach Holman) — advice for the fired, the firing, and the coworkers. All solid.
Four short links: 28 January 2016

Augmented Intelligence, Social Network Limits, Microsoft Research, and Google's Go

  1. Chimera (Paper a Day) — the authors summarise six main lessons learned while building Chimera: (1) Things break down at large scale; (2) Both learning and hand-crafted rules are critical; (3) Crowdsourcing is critical, but must be closely monitored; (4) Crowdsourcing must be coupled with in-house analysts and developers; (5) Outsourcing does not work at a very large scale; (6) Hybrid human-machine systems are here to stay.
  2. Do Online Social Media Remove Constraints That Limit the Size of Offline Social Networks? (Royal Society) — paper by Robin Dunbar. Answer: The data show that the size and range of online egocentric social networks, indexed as the number of Facebook friends, is similar to that of offline face-to-face networks.
  3. Microsoft Embedding Research — To break down the walls between its research group and the rest of the company, Microsoft reassigned about half of its more than 1,000 research staff in September 2014 to a new group called MSR NExT. Its focus is on projects with greater impact to the company rather than pure research. Meanwhile, the other half of Microsoft Research is getting pushed to find more significant ways it can contribute to the company’s products. The challenge is how to avoid short-term thinking from your research team. For instance, Facebook assigns some staff to focus on long-term research, and Google’s DeepMind group in London conducts pure AI research without immediate commercial considerations.
  4. Google’s Go-Playing AI — The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network,” predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network,” is then used to reduce the depth of the search tree — estimating the winner in each position in place of searching all the way to the end of the game.
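The two roles described in the AlphaGo item can be sketched on a toy game. This is a minimal illustration, not AlphaGo's method: the game is a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and `policy` and `value` are hypothetical hand-written heuristics standing in for the trained networks. The policy stand-in narrows the search to the `top_k` most promising moves; the value stand-in lets the search stop at a fixed depth and estimate the winner from there.

```python
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Moves available in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def policy(stones: int) -> List[int]:
    """Hypothetical stand-in for a policy network: rank moves, best first."""
    # Heuristic: prefer moves that leave the opponent a multiple of 4.
    return sorted(legal_moves(stones), key=lambda m: (stones - m) % 4 != 0)


def value(stones: int) -> float:
    """Hypothetical stand-in for a value network: score for the player to move."""
    return -1.0 if stones % 4 == 0 else 1.0


def search(stones: int, depth: int, top_k: int) -> float:
    """Negamax score in [-1, 1] for the player to move."""
    if stones == 0:
        return -1.0  # the opponent took the last stone, so the mover has lost
    if depth == 0:
        return value(stones)  # cut the search short and estimate the winner
    candidates = policy(stones)[:top_k]  # breadth reduction via the policy
    return max(-search(stones - m, depth - 1, top_k) for m in candidates)


def best_move(stones: int, depth: int = 2, top_k: int = 2) -> int:
    """Pick the candidate move with the best searched score."""
    candidates = policy(stones)[:top_k]
    return max(candidates, key=lambda m: -search(stones - m, depth - 1, top_k))


print(best_move(10))
```

In this toy the value heuristic happens to be exact, which a learned value network would not be; the point is only where each function plugs into the search.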
Four short links: 14 January 2016

DRM at W3C, Tractor DRM, Self-Driving Timeline, and Emoji Analytics

  1. You Can’t Destroy a Village to Save It (EFF) — EFF have a clever compromise for W3C’s proposal for DRM on the Web. [T]he W3C could have its cake and eat it, too. It could adopt a rule that requires members who help make DRM standards to promise not to sue people who report bugs in tools that conform to those standards, nor could they sue people just for making a standards-based tool that connected to theirs. They could make DRM, but only if they made sure that they took steps to stop that DRM from being used to attack the open Web. I hope the W3C take it.
  2. Copyright Law Shouldn’t Keep Me From Fixing a Tractor (Slate) — When a farmer friend of mine wanted to know if there was a way to tweak the copyrighted software of his broken tractor, I knew it was going to be rough. The only way to get around the DMCA’s restriction on software tinkering is to ask the Copyright Office for an exemption at the Section 1201 Rulemaking, an arduous proceeding that takes place just once every three years.
  3. License to Drive — I have difficulty viewing No Drive Day as imminent. We’re maybe 95% there, but that last 5% will be a lengthy slog.
  4. Text, Sentiment & Social Analytics in the Year Ahead: 10 Trends — emoji analytics and machine-written content are the two that caught my eye.
Four short links: 29 December 2015

Security Talks, Multi-Truth Discovery, Math Books, and Geek Cultures

  1. 2015 CCC Videos — collected talks from the 32nd Chaos Communication Congress.
  2. An Integrated Bayesian Approach for Effective Multi-Truth Discovery (PDF) — Integrating data from multiple sources has been increasingly becoming commonplace in both Web and the emerging Internet of Things (IoT) applications to support collective intelligence and collaborative decision-making. Unfortunately, it is not unusual that the information about a single item comes from different sources, which might be noisy, out-of-date, or even erroneous. It is therefore of paramount importance to resolve such conflicts among the data and to find out which piece of information is more reliable.
  3. Direct Links to Free Springer Books — Springer released a lot of math books.
  4. A Psychological Exploration of Engagement in Geek Culture — Seven studies (N = 2354) develop the Geek Culture Engagement Scale (GCES) to quantify geek engagement and assess its relationships to theoretically relevant personality and individual differences variables. These studies present evidence that individuals may engage in geek culture in order to maintain narcissistic self-views (the great fantasy migration hypothesis), to fulfill belongingness needs (the belongingness hypothesis), and to satisfy needs for creative expression (the need for engagement hypothesis). Geek engagement is found to be associated with elevated grandiose narcissism, extraversion, openness to experience, depression, and subjective well-being across multiple samples.
Four short links: 22 December 2015

Machine Poetry, Robo Script Kiddies, Big Data of Love, and Virtual Currency and the Nation State

  1. How Machines Write Poetry — Harmon would love to have writers or other experts judge FIGURE8’s work, too. Her online subjects tended to rate the similes better if they were obvious. “The snow continued like a heavy rain” got high scores, for example, even though Harmon thought this was quite a bad effort on FIGURE8’s part. She preferred “the snow falls like a dead cat,” which got only middling ratings from humans. “They might have been cat lovers,” she says. The FIGURE8 system (PDF) generates figurative language.
  2. The Decisions the Pentagon Wants to Leave to Robots — “You cannot have a human operator operating at human speed fighting back at determined cyber tech,” Work said. “You are going to need have a learning machine that does that.” I for one welcome our new robot script kiddie overlords.
  3. Love in the Age of Big Data — Over decades, John has observed more than 3,000 couples longitudinally, discovering patterns of argument and subtle behaviors that can predict whether a couple would be happily partnered years later or unhappy or divorced. Turns out, “don’t be a jerk” is good advice for marriages, too. (via Cory Doctorow)
  4. National Security Implications of Virtual Currency (PDF) — RAND research report examining the potential for non-state actor deployment.
Four short links: 16 October 2015

Tesla Update, Final Feltron, Mined Medicine, and Dodgy Drone Program

  1. Tesla’s Cars Drive Themselves, Kinda (Wired) — over-the-air software update just made existing cars massively more awesome. Sometimes knowing how they did it doesn’t make it feel any less like magic.
  2. Felton’s Last Report — ten years of quantified self. See Fast Company for more.
  3. Spinal Cord Injury Breakthrough by Software — This wasn’t the result of a new, long-term study, but a meta-analysis of $60 million worth of basic research written off as useless 20 years ago by a team of neuroscientists and statisticians led by the University of California San Francisco and partnering with the software firm Ayasdi, using mathematical and machine learning techniques that hadn’t been invented yet when the trials took place.
  4. The Assassination Complex (The Intercept) — America’s drone program’s weaknesses highlighted in new document dump: Taken together, the secret documents lead to the conclusion that Washington’s 14-year high-value targeting campaign suffers from an overreliance on signals intelligence, an apparently incalculable civilian toll, and — due to a preference for assassination rather than capture — an inability to extract potentially valuable intelligence from terror suspects.
Four short links: 15 September 2015

Bot Bucks, Hadoop Database, Futurism Biases, and Tactile Prosthetics

  1. Ashley Madison’s Fembot Con (Gizmodo) — As documents from company e-mails now reveal, 80% of first purchases on Ashley Madison were a result of a man trying to contact a bot, or reading a message from one.
  2. Terrapin — Pinterest’s low-latency NoSQL replacement for HBase. See engineering blog post.
  3. Why Futurism Has a Cultural Blindspot (Nautilus) — As the psychologist George Loewenstein and colleagues have argued, in a phenomenon they termed “projection bias,” people “tend to exaggerate the degree to which their future tastes will resemble their current tastes.”
  4. Mind-Controlled Prosthetic Arm (Quartz) — The robotic arm is connected by wires that link up to the wearer’s motor cortex — the part of the brain that controls muscle movement — and sensory cortex, which identifies tactile sensations when you touch things. The wires from the motor cortex allow the wearer to control the motion of the robot arm, and pressure sensors in the arm that connect back into the sensory cortex give the wearer the sensation that they are touching something.
Four short links: 31 August 2015

Linux Security Checklist, Devops for Water Bags, Summarising Reviews, and Exoskeleton with BMI

  1. Linux Workstation Security Checklist — This is a set of recommendations used by the Linux Foundation for their systems administrators.
  2. Giant Bags of Mostly Water (PDF) — on securing systems that are used by humans. This is what DevOps is about: running Ops like you’re Developing an app, not letting your devs run your ops.
  3. Mining and Summarising Customer Reviews (Paper a Day) — redux of a 2004 paper on sentiment extraction from reviews.
  4. Brain-Machine-Interface for Exoskeleton — no need to worry about the “think of sex every seven seconds” trope, the new system allows users to move forwards, turn left and right, sit and stand simply by staring at one of five flickering LEDs.
Four short links: 7 April 2015

JavaScript Numeric Methods, Misunderstood Statistics, Web Speed, and Sentiment Analysis

  1. NumericJS — numerical methods in JavaScript.
  2. P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
  3. Breaking the 1000ms Time to Glass Mobile Barrier (YouTube) — See also slides. Stay under 250 ms to feel “fast.” Stay under 1000 ms to keep users’ attention.
  4. Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.
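The p value versus α distinction from the second item can be made concrete with a small calculation. This is a rough sketch under assumed numbers (ten flips of a coin that is fair under the null hypothesis), with a hypothetical helper `binomial_p_value`. The p value is computed from the observed data; α is a threshold fixed before any data are seen.

```python
from math import comb


def binomial_p_value(heads: int, flips: int) -> float:
    """One-sided p value: P(at least `heads` heads in `flips` fair flips)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips


ALPHA = 0.05  # long-run error rate, chosen in advance (Neyman-Pearson)

for heads in (8, 9):
    p = binomial_p_value(heads, 10)  # evidence from this one sample (Fisher)
    decision = "reject" if p <= ALPHA else "do not reject"
    print(f"{heads}/10 heads: p = {p:.4f}, decision at alpha = {ALPHA}: {decision}")
```

The p value changes with every sample while α stays put, so reporting a p value of 0.0107 as if it were the test's error rate mixes up the two quantities.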
Four short links: 11 June 2014

Right to Mine, Summarising Microblogs, C Sucks for Stats, and Scanning Logfiles

  1. UK Copyright Law Permits Researchers to Data Mine — changes mean Copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if a university mines the journals and finds a new possibility for a Thing, which is then verified experimentally, is that Thing the university’s to license commercially for profit?
  2. Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
  3. Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
  4. fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
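The scan-and-trigger idea behind the fail2ban item can be sketched in a few lines. This is not fail2ban's implementation or configuration: the regex, the `offenders` helper, the `max_retry` parameter, and the sample log are all assumptions for illustration, and the time window a real tool would apply is left out. The "action" here is only a print, where a real setup would update firewall rules.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed pattern for sshd-style failed logins; real logs vary by system.
FAILED_LOGIN = re.compile(r"Failed password for .* from (\d{1,3}(?:\.\d{1,3}){3})")


def offenders(lines: Iterable[str], max_retry: int = 3) -> List[str]:
    """Return IPs with at least `max_retry` failed logins, sorted."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= max_retry)


# Made-up sample lines using documentation-reserved IP ranges.
SAMPLE_LOG = [
    "sshd[811]: Failed password for root from 203.0.113.9 port 4242 ssh2",
    "sshd[812]: Failed password for invalid user admin from 203.0.113.9 port 4243 ssh2",
    "sshd[813]: Accepted password for alice from 198.51.100.7 port 5000 ssh2",
    "sshd[814]: Failed password for root from 203.0.113.9 port 4244 ssh2",
    "sshd[815]: Failed password for bob from 198.51.100.7 port 5001 ssh2",
]

for ip in offenders(SAMPLE_LOG):
    print(f"would ban {ip}")
```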