ENTRIES TAGGED "search"
Constant KV Store, Google Me, Learned Bias, and DRM-Stripping Lego Robot
- Sparkey — Spotify’s open-sourced simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.
- The Truth of Fact, The Truth of Feeling (Ted Chiang) — story about what happens when lifelogs become searchable. Now with Remem, finding the exact moment has become easy, and lifelogs that previously lay all but ignored are now being scrutinized as if they were crime scenes, thickly strewn with evidence for use in domestic squabbles. (via BoingBoing)
- Algorithms Magnifying Misbehaviour (The Guardian) — when the training set embodies biases, the machine will exhibit biases too.
- Lego Robot That Strips DRM Off Ebooks (BoingBoing) — so. damn. cool. If it had been controlled by a C64, Cory would have hit every one of my geek erogenous zones with this find.
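Sparkey's niche (load the data in one big batch, then serve lots of reads and never mutate) fits in a few lines. What follows is a hypothetical in-memory toy that assumes a sorted array with binary search as the index. It is not Sparkey's API, file format, or indexing scheme.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[bytes, bytes]


class ConstantStore:
    """Hypothetical toy constant key/value store (not Sparkey's API).

    Built once from a bulk batch, then read-only; a "write" means
    building a whole new store.
    """

    def __init__(self, items: Iterable[Pair]) -> None:
        # Bulk load: for duplicate keys the last value wins; sort once up front.
        merged = dict(items)
        self._keys: List[bytes] = sorted(merged)
        self._values: List[bytes] = [merged[k] for k in self._keys]

    def get(self, key: bytes) -> Optional[bytes]:
        # O(log n) lookup by binary search over the frozen key array.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def __len__(self) -> int:
        return len(self._keys)

    def with_batch(self, batch: Iterable[Pair]) -> "ConstantStore":
        # Infrequent large insert: rebuild from old contents plus the batch,
        # leaving the existing store untouched for readers still using it.
        return ConstantStore(list(zip(self._keys, self._values)) + list(batch))
```

Readers keep using the old store while a new one is built, then switch to the new reference. Reads need no coordination because nothing ever mutates. The cost is that every bulk insert pays for a full rebuild.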
Model-Driven Configuration, 1,000 RSS Readers Bloom, JSON Query Language, and Doug Engelbart's Vision
- ansible — Model-driven configuration management, multi-node deployment/orchestration, and remote task execution system. Uses SSH by default, so no special software has to be installed on the nodes you manage. Ansible can be extended in any language.
- The Golden Age of RSS — One of the things I expected least to see in 2013 was that this year would mark the greatest flourishing of RSS reader applications in the decade since it first came to prominence on the web.
- JSONiq: the JSON Query Language — expressive and highly optimizable language to query and update NoSQL stores. It enables developers to leverage the same productive high-level language across a variety of NoSQL products. Implemented in Zorba, an Apache-licensed virtual machine for JSONiq and XQuery queries.
- Bret Victor on Doug Engelbart — If you attempt to make sense of Engelbart’s design by drawing correspondences to our present-day systems, you will miss the point, because our present-day systems do not embody Engelbart’s intent. Engelbart hated our present-day systems. Poetic, articulate, and bang on the money.
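In Ansible's description, "model-driven" means you declare the state you want and the tool works out what to change. Below is a hypothetical Python sketch of that idea for a package list. It assumes a simple present/absent desired-state map. The names `plan` and `apply` are illustrative and are not Ansible's API. Real modules do far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Plan:
    """Changes needed to move a node from its actual state to the desired one."""
    install: Set[str] = field(default_factory=set)
    remove: Set[str] = field(default_factory=set)

    @property
    def changed(self) -> bool:
        return bool(self.install or self.remove)


def plan(desired: Dict[str, str], installed: Set[str]) -> Plan:
    # desired maps package name -> "present" or "absent" (hypothetical model).
    return Plan(
        install={p for p, s in desired.items() if s == "present" and p not in installed},
        remove={p for p, s in desired.items() if s == "absent" and p in installed},
    )


def apply(p: Plan, installed: Set[str]) -> Set[str]:
    # Return the new state rather than mutating; packages not in the model are left alone.
    return (installed | p.install) - p.remove
```

Run `plan` again against the result and you get `changed == False`. That idempotence is what makes it safe to re-run a declarative configuration.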
Microvideos for Microhelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Adrian Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results, actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
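A Church program describes a generative model and then conditions on observations. Rejection sampling is the simplest, and slowest, way to answer such a conditional query. You run the model many times and keep only the runs consistent with what was observed. Here is a minimal Python sketch of that idea using a toy two-coin model. The function names are hypothetical, and this is not Church syntax.

```python
import random
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def rejection_query(
    model: Callable[[random.Random], Sample],
    condition: Callable[[Sample], bool],
    query: Callable[[Sample], Any],
    n: int,
    seed: int = 0,
) -> List[Any]:
    """Return n query values drawn from the model conditioned on `condition`."""
    rng = random.Random(seed)
    accepted: List[Any] = []
    while len(accepted) < n:
        sample = model(rng)       # run the generative model forward once
        if condition(sample):     # reject runs that contradict the observation
            accepted.append(query(sample))
    return accepted


def two_coins(rng: random.Random) -> Sample:
    # Toy generative model: two independent fair coin flips.
    return {"a": rng.random() < 0.5, "b": rng.random() < 0.5}


if __name__ == "__main__":
    draws = rejection_query(
        two_coins,
        condition=lambda s: s["a"] or s["b"],  # observed: at least one heads
        query=lambda s: s["a"],                 # asked: was the first coin heads?
        n=20_000,
    )
    # Exact answer is (1/2) / (3/4) = 2/3; the estimate should land close to it.
    print(sum(draws) / len(draws))
```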
Thread Problems, Better Image Search, Open Standards, and GitHub Maps
- Multithreading is Hard — The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old MacDonald’s server farm). (via Erik Michaels-Ober)
- Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
- Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the UX of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
- GitHub Renders GeoJSON — Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.
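For the curious, a GeoJSON file is plain JSON with a small fixed vocabulary. The sketch below writes point features using Python's standard `json` module. The place name and coordinates are illustrative assumptions. It also assumes the output is saved with a `.geojson` extension so that GitHub recognizes it. GeoJSON positions put longitude before latitude, and swapping them is a classic way to land your points in the wrong place.

```python
import json
from typing import Any, Dict, List, Tuple


def point_feature(lon: float, lat: float, properties: Dict[str, Any]) -> Dict[str, Any]:
    # GeoJSON positions are [longitude, latitude]: x before y.
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def to_geojson(points: List[Tuple[float, float, str]]) -> str:
    """Serialize (lon, lat, name) tuples into a GeoJSON FeatureCollection string."""
    collection = {
        "type": "FeatureCollection",
        "features": [point_feature(lon, lat, {"name": name}) for lon, lat, name in points],
    }
    return json.dumps(collection, indent=2)


if __name__ == "__main__":
    # Illustrative coordinates (roughly central London); save as e.g. places.geojson.
    print(to_geojson([(-0.1276, 51.5072, "example point")]))
```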
Search API, Cyberwar=Cyberbollocks, 4k Magic, and Geoparsing
- techu Search Server — Techu exposes a RESTful API for realtime indexing and searching with the Sphinx full-text search engine. We leverage Redis, Nginx and the Python Django framework to make searching easy to handle & flexible.
- In Defence of Digital Freedom — a piece by a member of the European Parliament on the risks to our online freedoms caused by framing computer security as cyberwarfare. Digital freedoms and fundamental rights need to be enforced, and not eroded in the face of vulnerabilities, attacks, and repression. In order to do so, essential and difficult questions on the implementation of the rule of law, historically place-bound by jurisdiction rooted in the nation-state, in the context of a globally connected world, need to be addressed. This is a matter for the EU as a global player, and should involve all of society. (via BoingBoing)
- Inside a 4k Demo — what it’s like to write an amazing demo with only 4k of code. (via Nelson Minar)
- CLAVIN — open source (Apache2) Java library for document geotagging and geoparsing that employs context-based geographic entity resolution. (via Pete Warden)
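Techu hands the actual indexing to Sphinx. Still, the core data structure behind "realtime indexing and searching" in most full-text engines is an inverted index: a map from each term to the documents that contain it. The toy sketch below assumes simple regex tokenization, lowercasing, and AND-only queries. It is neither Techu's REST API nor how Sphinx stores its indexes.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


class InvertedIndex:
    """Hypothetical toy realtime index: documents are searchable as soon as they're added."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)
        self._docs: Dict[int, str] = {}

    @staticmethod
    def _tokens(text: str) -> List[str]:
        return TOKEN.findall(text.lower())

    def remove(self, doc_id: int) -> None:
        # Drop the document from every posting list it appears in.
        text = self._docs.pop(doc_id)
        for tok in set(self._tokens(text)):
            self._postings[tok].discard(doc_id)

    def add(self, doc_id: int, text: str) -> None:
        # Re-adding an id replaces the old version instead of leaving stale postings.
        if doc_id in self._docs:
            self.remove(doc_id)
        self._docs[doc_id] = text
        for tok in set(self._tokens(text)):
            self._postings[tok].add(doc_id)

    def search(self, query: str) -> List[int]:
        # AND semantics: ids of documents containing every query token, sorted.
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)
```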
Search Ads Meh, Hacked Website Help, Web Design Sins, and Lazy Correlations
- Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment (PDF) — We find that new and infrequent users are positively influenced by ads but that existing loyal users whose purchasing behavior is not influenced by paid search account for most of the advertising expenses, resulting in average returns that are negative. We discuss substitution to other channels and implications for advertising decisions in large firms. eBay-commissioned research, so salt to taste. (via Guardian)
- Google’s Help for Hacked Webmasters — what it says.
- 14 Lousy Web Design Trends Making a Comeback Thanks to HTML 5 — “mystery meat icons” are a pet bugbear of mine.
- The Human Microbiome 101 (SlideShare) — SciFoo alum Jonathan Eisen’s talk. Informative, but super-notable for “complexity is astonishing, massive risk for false positive associations”. Remember this the next time your Big Data Scientist (aka kid with R) tells you one surprising variable predicts 66% of anything. I wish I had the audio from this talk!
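Eisen's warning about false positive associations is easy to reproduce. Correlate enough pure-noise variables against a pure-noise outcome and one of them will look impressive. The small simulation below assumes Gaussian noise, 20 samples, and 1,000 candidate variables. Those numbers are chosen only for illustration.

```python
import math
import random
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    # Plain Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_spurious_predictor(n_samples: int, n_features: int, seed: int = 0) -> Tuple[int, float]:
    """Correlate pure-noise features with a pure-noise outcome; return the 'best' one."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best_idx, best_r = -1, 0.0
    for i in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson(feature, outcome)
        if abs(r) > abs(best_r):
            best_idx, best_r = i, r
    return best_idx, best_r


if __name__ == "__main__":
    idx, r = best_spurious_predictor(n_samples=20, n_features=1000)
    print(f"feature {idx}: r = {r:.2f}, r^2 = {r * r:.2f}")
```

The "winning" variable explains a sizeable share of variance (r squared) even though it has no relationship to the outcome at all. So a single surprising predictor found by searching many candidates needs a held-out test or a multiple-comparison correction before anyone believes it.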
Drug Interactions from Search History, Web Satire, Visible Peer Review, and Rights-based Copyright
- Pharmacovigilance — Signals from The Crowd (PDF) — in the NY Times’ words: Using automated software tools to examine queries by 6 million Internet users taken from Web search logs in 2010, the researchers looked for searches relating to an antidepressant, paroxetine, and a cholesterol lowering drug, pravastatin. They were able to find evidence that the combination of the two drugs caused high blood sugar. (via New York Times)
- The World Wide Web is Moving to AOL — best satire you’ll read this month.
- Review History for Perceptual elements in Penn & Teller’s “Cups and Balls” magic trick — PeerJ makes peer review history available for the articles it publishes. Not only does this build reputation for peer reviewers who want it, but it is also a wonderful insight into how paranoid science must be to defend against mistakes in data interpretation. (The finished paper is fun, too)
- A New Basis for Copyright — NZ’s most technically literate judge floats an idea for how copyright might be reimagined in a more useful way for the modern age by considering it in terms of human rights. Perhaps there should be consideration of a new copyright model that recognises content user rights against a backdrop of the right to receive and impart information and a truly balanced approach to information and expression that recognises that ideas expressed are building blocks for new ideas. Underpinning this must be a recognition on the part of content owners that the properties of new technologies dictate our responses, our behaviours, our values and our ways of thinking. These should not be seen as a threat but an opportunity. It cannot be a one-way street with traffic heading only in the direction dictated by content owners.
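The search-log approach to pharmacovigilance comes down to comparing rates. Among users who searched for both drugs, how often did they also search for symptoms of high blood sugar, compared with users who searched for only one of them? Below is a toy sketch of that comparison. It assumes each user is represented as a set of query terms. The user data and symptom list are made up for illustration, and this is a simplification, not the researchers' actual method. The study itself drew on logs from about 6 million users.

```python
from typing import Iterable, List, Set

QuerySet = Set[str]


def cohort(users: Iterable[QuerySet], include: Set[str], exclude: Set[str]) -> List[QuerySet]:
    # Users who searched for every term in `include` and none in `exclude`.
    return [q for q in users if include <= q and not (q & exclude)]


def symptom_rate(group: List[QuerySet], symptoms: Set[str]) -> float:
    # Fraction of the group with at least one symptom-related query.
    if not group:
        return float("nan")
    return sum(1 for q in group if q & symptoms) / len(group)


if __name__ == "__main__":
    # Made-up toy users: each set holds the terms one user searched for.
    users = [
        {"paroxetine", "pravastatin", "thirsty"},
        {"paroxetine", "pravastatin", "frequent urination"},
        {"paroxetine", "pravastatin"},
        {"paroxetine", "insomnia"},
        {"paroxetine", "thirsty"},
        {"pravastatin", "muscle pain"},
        {"pravastatin"},
    ]
    symptoms = {"thirsty", "frequent urination"}
    both = symptom_rate(cohort(users, {"paroxetine", "pravastatin"}, set()), symptoms)
    parox_only = symptom_rate(cohort(users, {"paroxetine"}, {"pravastatin"}), symptoms)
    prava_only = symptom_rate(cohort(users, {"pravastatin"}, {"paroxetine"}), symptoms)
    print(both, parox_only, prava_only)
```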
Design Trends, Researching Online Culture, Choosing Connection, and 3D Printing Creativity
- 13 Design Trends for 2013 — many of these crystallized what I’ve seen in websites recently, but I was particularly intrigued by the observation that search’s growing importance to apps is being reflected in larger searchboxes.
- How Twitter Gets In The Way of Research (Buzzfeed) — tl;dr: our culture increasingly plays out online, but scraping and otherwise getting access to the data stream of online culture sees researchers struggling in the face of data volumes and Twitter et al.’s commercial imperatives.
- The Post-Productive Economy (Kevin Kelly) — The farmers in rural China have chosen cell phones and twitter over toilets and running water. To them, this is not a hypothetical choice at all, but a real one. And they have made their decision in massive numbers. Tens of millions, maybe hundreds of millions, if not billions of people in the rest of Asia, Africa and South America have chosen Option B. You can go to almost any African village to see this. And it is not because they are too poor to afford a toilet. As you can see from these farmers’ homes in Yunnan, they definitely could have at least built an outhouse if they found it valuable. (I know they don’t have a toilet because I’ve stayed in many of their homes.) But instead they found the intangible benefits of connection to be greater than the physical comforts of running water.
- Crayon Creatures — We will bring to life the kid’s artwork by modeling a digital sculpture and turning it into a real object using 3D Printing technology.
Search Fail, Recruiter Data, Ed Web, and Enterprise IT Yuks
- WTF — when keyword matching fails.
- The Best Recruiters, Pt II (Elaine Wherry) — almost all these tips are relevant to the cold-call “hey, you don’t know me but …” email messages you’ll have to send at some point in your life. Read, learn, obey.
- Best Websites for Teaching And Learning — as decided by the American Association of School Librarians. Lots of these I didn’t know existed but can see being used in class, e.g. Gamestar Mechanic, which walks kids through the process of creating a game, teaching them how to think about games even as they produce one.
- Enterprise IT Adoption Curve — so very very true.