ENTRIES TAGGED "NLP"
What You Do, Wordnik Branches, 5 Whys, and Hardware Hackathon
- You’re Saving Time — can you explain what you do, as well as this? Love the clarity of thought, as well as elegance of expression.
- Related Content, by Wordnik — branching out with a widget for websites that recommends other content on your site related to the current page. I’ve been keen to see what Wordnik do with their text knowledge.
- How to Run a 5 Whys with Humans, Not Robots (Slideshare) — gold Gold GOLD! (via Hacker News)
- Open Computer Project Hackathon — have never heard of a hardware hackathon before, keen to see how it works out. (via Jim Stogdill)
3D Printed Drones, When Pacemakers Attack, N-Gram Updated, and Deanonymizing Datasets
- Home-made 3D-Printed Drones — if only they used computer-vision to sequence DNA, they’d be the perfect storm of O’Reilly memes :-)
- Hacking Pacemakers For Death — IOActive researcher Barnaby Jack has reverse-engineered a pacemaker transmitter to make it possible to deliver deadly electric shocks to pacemakers within 30 feet and rewrite their firmware.
- Google N-Gram Viewer Updated — now with more books, better OCR, parts of speech, and complex queries. e.g., the declining ratio of sex to drugs. Awesome work by Friend of O’Reilly, Jon Orwant.
- Deanonymizing Mobility Traces: Using Social Networks as a Side-Channel — a set of location traces can be deanonymized given an easily obtained social network graph. [...] Our experiments [on standard datasets] show that 80% of users are identified precisely, while only 8% are identified incorrectly, with the remainder mapped to a small set of users. (via Network World)
Search Fail, Recruiter Data, Ed Web, and Enterprise IT Yuks
- WTF — when keyword matching fails.
- The Best Recruiters, Pt II (Elaine Wherry) — almost all these tips are relevant to the cold-call “hey, you don’t know me but …” email messages you’ll have to send at some point in your life. Read, learn, obey.
- Best Websites for Teaching And Learning — as decided by the American Association of School Librarians. Lots of these I didn’t know existed but can see being used in class, e.g. Gamestar Mechanic which walks kids through the process of creating a game, teaching them how to think about games even as they produce one.
- Enterprise IT Adoption Curve — so very very true.
Mobile Money, Quantified Server, Mobile Chatbot, and YouTube's Content Detection
- Mobile Numbers (Luke Wroblewski) — eBay’s mobile shoppers and mobile payers are 3 to 4 times more valuable than Web only [...] Yelp runs ads on the mobile web, and those ads see a higher clickthrough rate than their desktop counterparts.
- Data-Driven Restaurants (Washingtonian) — Did Elizabeth bring your Pinot Gris within three minutes of the time you ordered it? Were your appetizers delivered within seven minutes, entrées within ten, desserts within seven? Were these plates described at the table before they were set in front of you? Were napkins refolded when you went to the restroom? Was non-bottled water referred to as “ice water” (correct) or “water” (incorrect)? (via Daniel Bachhuber)
- Content Detection Fail (Ars Technica) — five other media organizations (mostly television stations, including some from overseas) had claimed the content of his video through YouTube’s Content ID system. That video? A Google+ hangout where he played NASA videos of the Mars landing. Shonky rights verification is a problem, as Google pays ad royalties to those who claim the rights, creating incentives to lie. And as Google doesn’t pay any royalties while material is disputed and the dispute is unresolved, it’s not really in Google’s interest to make this work either. (via Andy Baio)
Caching Pages, Node NLP, Digital Natives are Clueless, and Wal-Mart Loves Your Calendar
- Cache Them If You Can (Steve Souders) — the percentage of resources that are cacheable has increased 4% during the past year. Over that same time the number of requests per page has increased 12% and total transfer size has increased 24%.
- How Millennials Search — Statistically significant findings suggest that millennial generation Web searchers proceed erratically through an information search process, make only a limited attempt to evaluate the quality or validity of information gathered, and may perform some level of ‘backfilling’ or adding sources to a research project before final submission of the work. Never let old people tell you that “digital natives” actually know what they’re doing.
- Walmart Buys A Facebook App for Calendar Access (Ars Technica) — The Social Calendar app and its file of 110 million birthdays and other events, acquired from Newput Corp., will give Walmart the ability to expand its efforts to dig deeper into the lives of customers. Interesting to think that by buying a well-loved app, a company could get access to your Facebook details whether you Like them or not.
- Backup Bouncer — software to test how effective your backup tools are: you copy files to a test area by whatever means you like, then run this tool to see whether permissions, flags, owners, contents, timestamps, etc. are preserved. (via Joshua Schachter)
- reVerb — open source (GPLv3) toolkit for learning triples from text. See the paper for more details.
- Think Stats — an introduction to statistics for Python programmers. (via Edd Dumbill)
- Bolefloor — they build curvy wooden floors. Instead of straightening naturally curvy wood (which is wasteful), they use CV and CAD/CAM to figure the smallest cuts to slot strips of wood together. It’s gorgeous, green, and geeky. (via BoingBoing)
- Extracting Article Text from HTML Documents — everyone’s doing it, now you know how. It’s the theory behind the lovingly hand-crafted magic of readability. (via Hacker News)
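The kind of check described in the Backup Bouncer item above (copy files somewhere, then see which attributes survived) can be sketched in a few lines of Python. This is an illustration only, not how Backup Bouncer itself is implemented: the function names `file_digest` and `compare_copy` are hypothetical, and the sketch covers only permissions, owner, modification time, and contents, leaving out flags and other platform-specific metadata.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copy(original, copy):
    """List the attributes of `copy` that differ from `original`.

    Hypothetical helper: an empty list means everything checked here
    was preserved by whatever tool made the copy.
    """
    a, b = os.stat(original), os.stat(copy)
    differences = []
    # Permission bits only (strip the file-type bits from st_mode).
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        differences.append("permissions")
    # Owning user and group.
    if (a.st_uid, a.st_gid) != (b.st_uid, b.st_gid):
        differences.append("owner")
    # Modification time at nanosecond resolution.
    if a.st_mtime_ns != b.st_mtime_ns:
        differences.append("mtime")
    # File contents, compared by hash.
    if file_digest(original) != file_digest(copy):
        differences.append("contents")
    return differences
```

To try it, copy a file with the tool under test and call `compare_copy(original, copy)`. In the standard library, for instance, `shutil.copy` carries over data and permission bits but not timestamps, while `shutil.copy2` also attempts to preserve the remaining metadata.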
100 Trends, Mobile to Web, Geometry Fun, and C# NLP Tools
- 100 Things to Watch in 2011 — people who consider tech trends without considering social trends are betting on the atom bomb without considering the Summer of Love. (via Fred Wilson)
- Mobile Economics will Trend Towards Web Economics (Fred Wilson) — A central issue with the Internet, no matter what device and presentation layer you use to access it, is that there is an unlimited amount of content available. Evan Williams calls it “a web of infinite information” in this chat with Om Malik. What is valuable is filtering and curation. Restricting access to content doesn’t work. Someone else’s content will get filtered and curated instead of yours. Scarcity is not a viable business model on the Internet.
- Magic Tile — geometric and topological analogues of Rubik’s Cube. Mindblowing fun with math.
- SharpNLP — open source C# NLP tools.
Stemming Demo, Mapping Service, Value of Data, and The Magic of the Valley
- Demo of Stemming Algorithms — type in text and see what it looks like when stemmed with different algorithms provided by NLTK. (via zelandiya on Twitter)
- Crowdmap — hosted Ushahidi. (via dvansickle on Twitter)
- Opinions vs Data — talks about the usability of a new Gmail UI element, but notable for this quote from Jakob Nielsen: In my two examples, the probability of making the right design decision was vastly improved when given the tiniest amount of empirical data. (via mcannonbrookes on Twitter)
- The Next Silicon Valley — long and detailed list of the many forces contributing to Silicon Valley’s success as a tech hub, arguing that the valley’s position is path-dependent and cannot simply be grown ab initio in some aspiring nation’s co-prosperity zone of policy whim. (via imran and timoreilly on Twitter)
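To get a feel for what the stemming demo linked above shows (the same text run through different stemmers, side by side), here is a small standard-library sketch. It does not use NLTK and does not reproduce the Porter or Lancaster algorithms: `s_stem` is loosely modelled on the plural-stripping "S-stemmer" rules as they are commonly described, with a short-word guard added as an assumption, and `strip_suffix_stem` is a deliberately naive suffix stripper. All names are hypothetical.

```python
import re

# Suffixes tried in order by the naive stemmer (longest first).
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def s_stem(word):
    """Conservative plural stripper in the style of the S-stemmer."""
    w = word.lower()
    if len(w) <= 3:  # assumption: leave very short words alone
        return w
    # Only the first matching rule is applied.
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-1]  # "es" becomes "e"
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]
    return w


def strip_suffix_stem(word, min_stem=3):
    """Naive stemmer: drop the first listed suffix that leaves a long enough stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def compare_stemmers(text):
    """Return (token, s_stem result, naive result) for each word in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(t, s_stem(t), strip_suffix_stem(t)) for t in tokens]


if __name__ == "__main__":
    for row in compare_stemmers("Ponies running through stemmed algorithms"):
        print(row)
```

Even these two toy stemmers disagree on most inflected words, which is the point of comparing algorithms side by side: a conservative stemmer leaves more forms untouched, while an aggressive one merges more forms at the cost of producing non-words.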
Rational Smoking, Latency Poor, NLP Cites, Security Podcast
- Smoking and Ill Health: Does Lay Epidemiology Explain the Failure of Smoking Cessation Programs Among Deprived Populations? — Here we pose the question of whether the poorer life chances of those who continue to smoke in effect constitute a rational disincentive to their avoidance or cessation of smoking. (via bengoldacre on Twitter)
- Scaling the New Bar for Latency in Financial Networks — Since the first trade to the market gets the best price, the delivery of a buy or sell order must be as fast as possible. Just a little more than a year ago, firms were concentrating on removing milliseconds from their network; today, a mere 250 nanoseconds make a difference. (via economicsnz on Twitter)
- Cataloging Bibliographic Data with Natural Language and RDF (OKFN) — In the grand tradition of W3C IRC bots, I’ve started some speculative work on a robot that tries to understand natural language descriptions of works and their authors and generates RDF. It is written in Python and uses ORDF, the NLTK and FuXi.
- Eurotrash Security — European infosec podcast. Latest episode features Ivan Ristic on SSL. (via ivanristic on Twitter)