ENTRIES TAGGED "algorithms"
Maximum MySQL, Digital News, Unbiased Mining, and Congressional Clue
- How Twitter Stores 250M Tweets a Day Using MySQL (High Scalability) — notes from a talk at the MySQL conference on how Twitter built a high-volume MySQL store.
- How The Atlantic Got Profitable With Digital First (Mashable) — Lauf says his team has focused on putting together premium advertising experiences that span print, digital, events and (increasingly) mobile.
- Data Mining Without Prejudice — an attempt to measure fit without pre-favouring one type of curve over another.
- It Is No Longer OK Not To Know How Congress Works (Clay Johnson) — looking for a specific innovation to try and change the way Washington works by the time Congress votes on SOPA is about as foolish as Steve Jobs trying to diet his way out of having pancreatic cancer.
Bitcoin Banks, Journo Ethics, Android and iOS, and Clever Algorithms
- Dan Kaminsky on Bitcoin (Slideshare) — short version: banks are an emergent property as it scales.
- Unethical Ventures (All Things D) — astonishing slam on the new venture fund that Michael Arrington (founder of TechCrunch) will be running while still writing for TechCrunch. This could have been a lot cleaner, of course, by Arrington simply resigning from TechCrunch, becoming a VC and perhaps starting a new blog where his agenda is much clearer, from which he could huff and puff away as he does with much entertaining gusto at real and (mostly) imagined slights. There is certainly precedent for VCs blogging, including Fred Wilson, Brad Feld and Ben Horowitz. And, despite my criticisms about ethics, it is clear that Arrington is a talented writer whose unique voice would be even stronger if it was truly seen as separate from what has become a news organization. But because of his obvious need to be the center of attention — requiring the ermine kingmaker mantle and foisting his patented I’m-here-to-tell-it-like-it-is attitude on us all — that appears to be impossible.
- An iOS Developer Takes on Android — a very easy-to-follow comparison of the two platforms from a developer who worked on both and who is carefully not partisan. I hadn’t realized before what an advantage OpenGL confers to the iOS devices. It’s not just for 3D games any more (he says, catching up with 2008).
- Clever Algorithms — book of 45 nature-inspired algorithms, code in Ruby.
Minecraft Emergent Behaviour, Algorithmic 3D Printing, Automated MapReduce Optimization, and Multi-Device Preview
- Anonymity in Bitcoin — TL;DR: Bitcoin is not inherently anonymous. It may be possible to conduct transactions in such a way so as to obscure your identity, but, in many cases, users and their transactions can be identified. We have performed an analysis of anonymity in the Bitcoin system and published our results in a preprint on arXiv. (via Hacker News)
- 3D Printing + Algorithmic Generation — clever designers use algorithms based on leaf vein generation to create patterns for lamps, which are then 3D-printed. (via Imran Ali)
- Manimal: Relational Optimization for Data-Intensive Programs (PDF) — static code analysis to detect MapReduce program semantics and thereby enable wholly-automatic optimization of MapReduce programs. (via BigData)
- Screenfly — preview your site in different devices’ screen sizes and resolutions. (via Smashing Magazine)
Bogus Analysis x 2, API Classifications, and Expansive Text
- Mathematical Intimidation: Driven by the Data (PDF) — excellent article from Notices of the American Mathematical Society about the flaws in “value-added modelling”, the latest fad whereby data about students’ results in different classes are analysed to identify the effect of each teacher. People recognize that tests are an imperfect measure of educational success, but when sophisticated mathematics is applied, they believe the imperfections go away by some mathematical magic. But this is not magic. What really happens is that the mathematics is used to disguise the problems and intimidate people into ignoring them—a modern, mathematical version of the Emperor’s New Clothes. A critical instance of Hilary Mason’s Clean data > More Data > Fancy Math. (via Audrey Watters)
- Classification of HTTP-based APIs — The classification achieves an explicit differentiation between the various kinds of uses of HTTP and provides a foundation to analyse and describe the system properties induced. (via Brian Mulloy)
- Cancer Clusters (BBC) — straightforward demonstration of how naive analysis of random numbers can yield “patterns”.
- FitText.js — a jQuery plugin for inflating type.
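The point of the Cancer Clusters item above can be tried at home. The following is a minimal sketch, not the BBC's own demonstration: it scatters cases uniformly at random over a square grid and reports the busiest cell. The function names, the grid size, and the case count are all assumptions made up for illustration.

```python
import random
from collections import Counter


def simulate_cases(n_cases, grid_size, seed=None):
    """Drop n_cases uniformly at random onto a grid_size x grid_size map.

    Returns a Counter mapping (row, col) -> number of cases in that cell.
    Every cell has the same underlying risk by construction.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_cases):
        cell = (rng.randrange(grid_size), rng.randrange(grid_size))
        counts[cell] += 1
    return counts


def busiest_cell(counts):
    """Return ((row, col), count) for the cell with the most cases."""
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Hypothetical numbers: 1000 cases over a 10 x 10 grid of districts.
    n_cases, grid_size = 1000, 10
    counts = simulate_cases(n_cases, grid_size, seed=1)
    cell, worst = busiest_cell(counts)
    average = n_cases / grid_size ** 2
    print(f"average cases per cell: {average:.1f}")
    print(f"busiest cell {cell}: {worst} cases")
```

Because every cell is equally risky here, any "hot spot" the script prints is pure chance. A naive reading of the busiest cell against the average is exactly the kind of analysis the article warns about.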
Sentiment analysis gives algorithmic trading an edge
Sorting through thousands of news stories and categorizing information based on mood and tone creates useful data points for financial systems.
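As a rough illustration of turning the tone of a news story into a data point, here is a toy word-list scorer. It is a sketch under loud assumptions: the word lists and the `tone_score` function are hypothetical, and nothing here reflects how any real trading system scores sentiment.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"beats", "gains", "surges", "upgrade", "record"}
NEGATIVE = {"misses", "falls", "lawsuit", "downgrade", "recall"}


def tone_score(headline):
    """Score a headline between -1.0 (all negative words) and 1.0 (all positive).

    The score is (positive hits - negative hits) / total words,
    and 0.0 for a headline with no words at all.
    """
    words = re.findall(r"[a-z']+", headline.lower())
    if not words:
        return 0.0
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return (positive_hits - negative_hits) / len(words)


if __name__ == "__main__":
    for headline in ["Acme beats estimates", "Acme faces lawsuit"]:
        print(f"{tone_score(headline):+.2f}  {headline}")
```

A score like this could be computed per story and aggregated per company and per day. Real systems would need far more than word matching, for example handling negation and context.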
Algorithms go awry on Amazon, the future of Hadoop at Yahoo, and the Supreme Court mulls data mining
In this Strata Week: Algorithm pricing on Amazon pushes the price of a biology book to astronomical levels, Yahoo weighs the future of Hadoop, and the Supreme Court hears arguments about a Vermont law restricting the data mining of prescription records.
Email Game, Faster B Trees, RFID+Projectors, and Airport Express Broken
- The Email Game — game mechanics to get you answering email more efficiently. Can’t wait to hear that conversation with corporate IT. “You want us to install what on the Exchange server?” (via Demo Day Wrapup)
- Stratified B-trees and versioning dictionaries — A classic versioned data structure in storage and computer science is the copy-on-write (CoW) B-tree — it underlies many of today’s file systems and databases, including WAFL, ZFS, Btrfs and more. Unfortunately, it doesn’t inherit the B-tree’s optimality properties; it has poor space utilization, cannot offer fast updates, and relies on random IO to scale. Yet, nothing better has been developed since. We describe the ‘stratified B-tree’, which beats all known semi-external memory versioned B-trees, including the CoW B-tree. In particular, it is the first versioned dictionary to achieve optimal tradeoffs between space, query and update performance. (via Bob Ippolito)
- DisplayCabinet (Ben Bashford) — We embedded a group of inanimate ornamental objects with RFID tags. Totems or avatars that represent either people, products or services. We also added RFID tags to a set of house keys and a wallet. Functional things that you carry with you. This group of objects combine with a set of shelves containing a hidden projector and RFID reader to become DisplayCabinet. (via Chris Heathcote)
- shairport — Aussie pulled the encryption keys from an Airport Express device, so now you can have software pretend to be an Airport Express.
Stephan Spencer on how autonomous intelligence and language processing will transform search.
Stephan Spencer, co-author of "The Art of SEO," says searching the Internet of the future will be like talking to a human being.
A Princeton search algorithm uses language indicators to measure importance.
A search algorithm being developed by Princeton University researchers parses language to determine relevance. Academic application is one possibility, but this type of algorithm could also extend to news recommendations.