Algorithmic Optimisation, 3D Scanners, Corporate Open Source, and Data Dives
- Unhappy Truckers and Other Algorithmic Problems — Even the insides of vans are subjected to a kind of routing algorithm; the next time you get a package, look for a three-letter code, like “RDL.” That means “rear door left,” and it’s there so the driver has to take as few steps as possible to locate the package. (via Sam Minnee)
- Fuel3D: A Sub-$1000 3D Scanner (Kickstarter) — a point-and-shoot 3D imaging system that captures extremely high-resolution mesh and color information of objects. Fuel3D is the world’s first 3D scanner to combine pre-calibrated stereo cameras with photometric imaging to capture and process files in seconds.
- Corporate Open Source Anti-Patterns (YouTube) — Bryan Cantrill’s talk, slides here. (via Daniel Bachhuber)
- Hacking for Humanity (The Economist) — Getting PhDs and data specialists to donate their skills to charities is the idea behind the event’s organizer, DataKind UK, an offshoot of the American nonprofit group.
Distributed Browser-Based Computation, Streaming Regex, Preventing SQL Injections, and SVM for Faster Deep Learning
- WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica)
- sregex (Github) — A non-backtracking regex engine library for large data streams. See also slide notes from a YAPC::NA talk. (via Ivan Ristic)
- Bobby Tables — a guide to preventing SQL injections. (via Andy Lester)
- Deep Learning Using Support Vector Machines (Arxiv) — we are proposing to train all layers of the deep networks by backpropagating gradients through the top level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop’s face expression recognition challenge. (via Oliver Grisel)
The importance of data science tools that let organizations easily combine, deploy, and maintain algorithms
Data science often depends on data pipelines that involve acquiring, transforming, and loading data. (If you’re fortunate, most of the data you need is already in usable form.) Data needs to be assembled and wrangled before it can be visualized and analyzed. Many companies have data engineers (adept at using workflow tools like Azkaban and Oozie) who manage pipelines for data scientists and analysts.
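As a minimal sketch of the acquire, transform, and load steps described above (the CSV snippet, field names, and SQLite table are invented for illustration):

```python
# Toy ETL pipeline: acquire raw data, wrangle it, load it for analysis.
import csv
import io
import sqlite3

def acquire():
    # Stand-in for pulling raw data from an API, log files, etc.
    raw = "city,temp_f\nPortland,68\nAustin,95\n"
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    # Wrangle into the shape analysts need: convert units, drop bad rows.
    return [
        {"city": r["city"], "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1)}
        for r in rows
        if r.get("temp_f")
    ]

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS temps (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO temps VALUES (:city, :temp_c)", rows)

conn = sqlite3.connect(":memory:")
load(transform(acquire()), conn)
print(conn.execute("SELECT city, temp_c FROM temps").fetchall())
# → [('Portland', 20.0), ('Austin', 35.0)]
```

In real pipelines each step is a separate job, which is exactly where the workflow tools below come in.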
A workflow tool for data analysts: Chronos from airbnb
A cron replacement written in Scala, Chronos is flexible, fault-tolerant, and distributed (it’s built on top of Mesos). What’s most interesting is that it makes the creation and maintenance of complex workflows more accessible: at least within airbnb, it’s heavily used by analysts.
Job orchestration and scheduling tools contain features that data scientists would appreciate. They make it easy for users to express dependencies (start a job upon the completion of another job), and retries (particularly in cloud computing settings, jobs can fail for a variety of reasons). Chronos comes with a web UI designed to let business analysts define, execute, and monitor workflows: a zoomable DAG highlights failed jobs and displays stats that can be used to identify bottlenecks. Chronos lets you include asynchronous jobs – a nice feature for data science pipelines that involve long-running calculations. It also lets you easily define repeating jobs over a finite time interval, something that comes in handy for short-lived experiments (e.g. A/B tests or multi-armed bandits).
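To make this concrete, here is a sketch of the JSON job specs Chronos accepts over its REST API. The field names (“schedule”, “parents”, “retries”, “async”) follow Chronos’s documented format; the job names and shell commands are invented:

```python
import json

# A repeating job over a finite interval: R8 means run 8 times (say, a
# week-long A/B test), starting at the given instant, every PT24H (24 hours).
extract_job = {
    "name": "extract_events",
    "command": "run_extract.sh",          # hypothetical script
    "schedule": "R8/2013-07-01T06:00:00Z/PT24H",
    "retries": 2,    # re-run on failure, e.g. a transient cloud error
    "async": False,
}

# A dependent job: it has no schedule of its own and starts only after
# its parent completes successfully.
report_job = {
    "name": "ab_test_report",
    "command": "build_report.sh",         # hypothetical script
    "parents": ["extract_events"],
    "retries": 2,
    "async": False,
}

print(json.dumps(report_job, indent=2))
```

Posting the first spec to Chronos’s scheduled-job endpoint and the second to its dependent-job endpoint is all it takes to wire up the DAG the web UI then visualizes.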
Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks
- mlcomp — a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
- Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
- What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
- Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group, and EEG and eye-fixation measures were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.
Comms 101, RoboTurking, Geek Tourism, and Implementing Papers
- How to Redesign Your App Without Pissing Everybody Off (Anil Dash) — the basic straightforward stuff that gets your users on-side. Anil’s making a career out of being an adult.
- Clockwork Raven (Twitter) — open source project to send data analysis tasks to Mechanical Turkers.
- Updates from the Tour in China (Bunnie Huang) — my dream geek tourism trip: going around Chinese factories and bazaars with MIT geeks.
- How to Implement an Algorithm from a Scientific Paper — I have implemented many complex algorithms from books and scientific publications, and this article sums up what I have learned while searching, reading, coding and debugging. (via Siah)
Invisible Data Economy, Hacked Value, Open Algorithms Textbook, and Mobile Testing
- Beyond Goods and Services: The Unmeasured Rise of the Data-Driven Economy — excellent points about data as neither good nor service, and how data use goes unmeasured by economists and thus doesn’t influence policy. According to statistics from the Bureau of Economic Analysis, real consumption of ‘internet access’ has been falling since the second quarter of 2011. In other words, according to official U.S. government figures, consumer access to the Internet—including mobile—has been a drag on economic growth for the past year and a half. (via Mike Loukides)
- How Crooks Turn Even Crappy Hacked PCs Into Money (Brian Krebs) — show this to your corporate IT overlords, or your parents, to explain why you want them to get rid of the Windows XP machines. (via BoingBoing)
- Open Data Structures — an open content textbook (Java and C++ editions; CC-BY licensed) on data structures. (via Hacker News)
- Mobiforge — test what gets sent back to mobile browsers. This site sends the HTTP headers that a mobile browser would. Cf. yesterday’s Responsivator. (via Ronan Cremin)
News App, Data Wrangler, Responsive Previews, and Accountable Algorithms
- cir.ca — news app for iPhone, which lets you track updates and further news on a given story. (via Andy Baio)
- DataWrangler (Stanford) — an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data. From the Stanford Visualization Group.
- Responsivator — see how websites look at different screen sizes.
- Accountable Algorithms (Ed Felten) — When we talk about making an algorithmic public process open, we mean two separate things. First, we want transparency: the public knows what the algorithm is. Second, we want the execution of the algorithm to be accountable: the public can check to make sure that the algorithm was executed correctly in a particular case. Transparency is addressed by traditional open government principles; but accountability is different.
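Felten’s accountability half can be illustrated with a simple cryptographic commitment (this sketch is an illustration, not taken from his post): the authority publishes a hash of its algorithm and random seed before running, then reveals both afterward so anyone can re-run the process and check the published outcome.

```python
import hashlib
import random

# A transparent algorithm: say, a public lottery ordering applicants.
algorithm_src = "sorted(applicants, key=lambda a: rng.random())"
seed = 424242  # invented seed for illustration

# 1. Before execution: publish only the commitment (hash of code + seed).
commitment = hashlib.sha256(f"{algorithm_src}|{seed}".encode()).hexdigest()

# 2. Execute the committed algorithm.
rng = random.Random(seed)
applicants = ["ann", "bob", "cho", "dee"]
outcome = sorted(applicants, key=lambda a: rng.random())

# 3. After execution: reveal algorithm_src and seed. Anyone can recompute
#    the commitment and re-run the algorithm to verify the outcome.
assert hashlib.sha256(f"{algorithm_src}|{seed}".encode()).hexdigest() == commitment
rng2 = random.Random(seed)
assert sorted(applicants, key=lambda a: rng2.random()) == outcome
print("verified")
```

Transparency alone publishes the code; the commitment is what lets the public check that this particular execution actually used it.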
Drone Conflict, 3D Scanning Booths, Bitcoin Consensus, and Moar Coders
- Beware the Drones (Washington Times) — the temptation to send difficult to detect, unmanned aircraft into foreign airspace with perceived impunity means policymakers will naturally incline towards aggressive use of drones and hyperactive interventionism, leading us to a future that is ultimately plagued by more, not less warfare and conflict. This. Also, what I haven’t seen commented on with the Israeli air force shooting down a (presumably Hezbollah) drone: low cost of drones vs high cost of maintaining an air force to intercept, means this is asymmetric unmanned warfare.
- Scanbooth (github) — a collection of software for running a 3D scanning booth. Greg Borenstein said to me, “we need tools to scan and modify before 3D printing can take off.” (via Jeremy Herrman)
- Bitcoin’s Value is Decentralization (Paul Bohm) — Bitcoin isn’t just a currency but an elegant universal solution to the Byzantine Generals’ Problem, one of the core problems of reaching consensus in Distributed Systems. Until recently it was thought to not be practically solvable at all, much less on a global scale. Irrespective of its currency aspects, many experts believe Bitcoin is brilliant in that it technically made possible what was previously thought impossible. (via Mike Loukides)
- Blue Collar Coder (Anil Dash) — I am proud of, and impressed by, Craigslist’s ability to serve hundreds of millions of users with a few dozen employees. But I want the next Craigslist to optimize for providing dozens of jobs in each of the towns it serves, and I want educators in those cities to prepare young people to step into those jobs. Time for a Massively Multiplayer Online Economy, as opposed to today’s fun economic games of Shave The Have-Nots and Race To The Oligarchy.
Putting high-frequency trading into perspective.
Technology is critical to today’s financial markets. It’s also surprisingly controversial. In most industries, increasing technological involvement is progress, not a problem. And yet, people who believe that computers should drive cars suddenly become Luddites when they talk about computers in trading.
There’s widespread public sentiment that technology in finance just screws the “little guy.” Some of that sentiment is due to concern about a few extremely high-profile errors. A lot of it is rooted in generalized mistrust of the entire financial industry. Part of the problem is that media coverage on the issue is depressingly simplistic. Hyperbolic articles about the “rogue robots of Wall Street” insinuate that high-frequency trading (HFT) is evil without saying much else. Very few of those articles explain that HFT is a catchall term that describes a host of different strategies, some of which are extremely beneficial to the public market.
I spent about six years as a trader, using automated systems to make markets and execute arbitrage strategies. From 2004 to 2011, as our algorithms and technology became more sophisticated, it was increasingly rare for a trader to have to enter a manual order. Even in 2004, “manual” meant instructing an assistant to type the order into a terminal; it was still routed to the exchange by a computer. Automating orders reduced the frequency of human “fat finger” errors. It meant that we could adjust our bids and offers in a stock immediately if the broader market moved, which enabled us to post tighter markets. It allowed us to manage risk more efficiently. More subtly, algorithms also reduced the impact of human biases — especially useful when liquidating a position that had turned out badly. Technology made trading firms like us more profitable, but it also benefited the people on the other sides of those trades. They got tighter spreads and deeper liquidity.
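A toy sketch of that re-quoting behavior, with invented numbers: the market maker centers a two-sided quote on its fair-value estimate, and when the broader market moves, an automated system re-centers both quotes instantly instead of leaving stale prices exposed.

```python
# Keep a two-sided market centered on fair value; all figures are invented.
def make_quotes(fair_value, half_spread=0.02):
    # Post a bid below and an offer above our estimate of fair value.
    bid = round(fair_value - half_spread, 2)
    ask = round(fair_value + half_spread, 2)
    return bid, ask

# Fair value for the stock tracks a broad-market index via a beta estimate.
beta = 1.2
stock_fv = 50.00

print(make_quotes(stock_fv))        # (49.98, 50.02)

# The index ticks up 0.5%; an automated system re-centers immediately,
# where a human trader might still be showing the old prices.
index_move = 0.005
stock_fv *= 1 + beta * index_move   # ≈ 50.30
print(make_quotes(stock_fv))        # (50.28, 50.32)
```

The tighter the half-spread a firm can safely quote, the better the price the counterparty gets, which is the benefit to the other side described above.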
A startup mashes personal and government data with algorithms to provide automated advice.
Given the turmoil in financial markets and uncertainty abroad, good financial advice has never been more valuable. The startup Future Advisor looks to democratize personalized financial advice using the Internet, data, and algorithms.