ENTRIES TAGGED "Big Data"
The Internet of Americas, Pharma Pricey, Who's Watching, and Data Mining Course
- Bradley Manning and the Two Americas (Quinn Norton) — The first America built the Internet, but the second America moved onto it. And they both think they own the place now. The best explanation you’ll find for wtf is going on.
- Staggering Cost of Inventing New Drugs (Forbes) — $5B to develop a new drug, and subject to an inverse-Moore’s law: a 2012 article in Nature Reviews Drug Discovery says the number of drugs invented per billion dollars of R&D invested has been cut in half every nine years for half a century.
- Who’s Watching You (Tim Bray) — threat modelling. Everyone should know this.
- Data Mining with Weka — learn data mining with the popular open source Weka platform.
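To put the inverse-Moore's law from the Forbes item in perspective, here is a back-of-the-envelope calculation in Python. It assumes the halving is perfectly regular (one halving every nine years for exactly fifty years), which the article does not claim; `decline_factor` is a hypothetical helper name.

```python
def decline_factor(years, halving_period):
    """Factor by which a quantity shrinks if it halves every `halving_period` years."""
    return 2 ** (years / halving_period)

# Assumption: a clean halving every 9 years, sustained for 50 years.
factor = decline_factor(years=50, halving_period=9)
print(f"about {factor:.0f}x fewer drugs per billion dollars")
```

Under that assumption, a billion R&D dollars buys roughly one forty-seventh of the drugs it did at the start of the period.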
Approximate Queries, Spreadsheet as Database, China Robot Plans, and Open Source Google App Engine
- blinkdb — The current version of BlinkDB supports a slightly constrained set of SQL-style declarative queries and provides approximate results for standard SQL aggregate queries, specifically queries involving COUNT, AVG, SUM and PERCENTILE and is being extended to support any User-Defined Functions (UDFs). Queries involving these operations can be annotated with either an error bound, or a time constraint, based on which the system selects an appropriate sample to operate on.
- China Plans to Become a Leader in Robotics (Quartz) — The ODCCC too funds high risk research initiatives through the Thousand Talent Project (TTP), a three-year term project with possible extension. The goal of the TTP is to recruit thousands of foreign researchers with strong expertise in hardware and software to help develop innovation in China. There are already more than 100 foreign researchers working in China since 2008, the year TTP started.
- AppScale (GitHub) — open source implementation of Google App Engine.
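The sampling idea in the blinkdb entry can be sketched in a few lines of Python. This is not BlinkDB's code or API: `approx_avg`, the starting sample size, and the normal-approximation confidence interval are all assumptions chosen for illustration. The sketch draws progressively larger random samples until the estimated error on an AVG falls under the requested bound.

```python
import math
import random
import statistics

def approx_avg(values, error_bound, z=1.96, start=100, seed=0):
    """Estimate the mean of `values` from a random sample.

    Doubles the sample size (drawing a fresh sample each time) until the
    half-width of a normal-approximation confidence interval is at most
    `error_bound`, or until the whole dataset has been used.
    Returns (estimate, half_width, sample_size).
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = min(max(start, 2), len(values))  # stdev needs at least 2 points
    while True:
        sample = rng.sample(values, n)
        mean = statistics.fmean(sample)
        if n == len(values):
            return mean, 0.0, n  # every row was read, so no sampling error
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        if half_width <= error_bound:
            return mean, half_width, n
        n = min(2 * n, len(values))

# Hypothetical data: 200,000 synthetic measurements.
data_rng = random.Random(42)
data = [data_rng.gauss(50, 10) for _ in range(200_000)]
estimate, half_width, used = approx_avg(data, error_bound=0.5)
print(f"avg ~ {estimate:.2f} +/- {half_width:.2f} using {used} of {len(data)} rows")
```

A time constraint, the other annotation the entry mentions, would replace the error check with a cap on how many rows may be read.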
Aural Viz, SPOF ID, Information Asymmetry, and Support IA
- choir.io explained (Alex Dong) — Sound is the perfect medium for wearable computers to talk back to us. Sound has a dozen of properties that we can tune to convey different level of emotions and intrusivenesses. Different sound packs would fit into various contexts.
- Identity Single Point of Failure (Tim Bray) — continuing his excellent series on federated identity. There’s this guy here at Google, Eric Sachs, who’s been doing Identity stuff in the white-hot center of the Internet universe for a lot of years. One of his mantras is “If you’re typing a password into something, unless they have 100+ full-time engineers working on security and abuse and fraud, you should be nervous.” I think he’s right.
- What Does It Really Matter If Companies Are Tracking Us Online? (The Atlantic) — Rather, the failures will come in the form of consumers being systematically charged more than they would have been had less information about that particular consumer. Sometimes, that will mean exploiting people who are not of a particular class, say upcharging men for flowers if a computer recognizes that he’s looking for flowers the day after his anniversary. A summary of Ryan Calo’s paper. (via Slashdot)
- Life Inside Brewster’s Magnificent Contraption (Jason Scott) — I’ve been really busy. Checking my upload statistics, here’s what I’ve added to the Internet Archive: Over 169,000 individual objects, totaling 245 terabytes. You should subscribe and keep them in business. I did.
As society becomes increasingly data driven, it's critical to remember big data isn't a magical tool for predicting the future.
Better Crypto, NukeViz, Weed Economics, and Ethics of Prediction
- Applied Practical Cryptography — technical but readable article with lots of delicious lines: “They’re a little magical, in the same sense that ABS brakes were magical in the 1970s”; “Cloud applications share metal with strangers, and thus attackers, who will gladly spend $40 to co-host themselves with a target”; and “The conservative approach is again counterintuitive to developers, to whom hardcoding anything is like simony.”
- Nukemap — interactive visualization of the fallout damage from a nuclear weapon. Now we can all be the scary 1970s “this is what it would look like if [big town] were nuked” documentaries that I remember growing up with. I love interactives for learning the contours of a problem, and making it real and personal in a way that a static visualization cannot. WIN. See also the creator’s writeup.
- Legalising Weed — Chuck, a dealer who switched from selling weed in California to New York and quadrupled his income, told WNYC, “There’s plenty of weed in New York. There’s just an illusion of scarcity, which is part of what I’m capitalizing on. Because this is a black market business, there’s insufficient information for customers.” Invisible economies are frequently inefficient; moving online disrupts them and makes them efficient in the market sense.
- Can Software That Predicts Crime Pass Constitutional Muster? (NPR) — “I think most people are gonna defer to the black box,” he says. “Which means we need to focus on what’s going into that black box, how accurate it is, and what transparency and accountability measures we have [for] it.”
Rules of the Internet, Bigness of the Data, Wifi ADCs, and Google Flirts with Client-Side Encryption
- Ten Rules of the Internet (Anil Dash) — they’re all candidates for becoming “Dash’s Law”. I like this one the most: When a company or industry is facing changes to its business due to technology, it will argue against the need for change based on the moral importance of its work, rather than trying to understand the social underpinnings.
- Data Storage by Vertical (Quartz) — The US alone is home to 898 exabytes (1 EB = 1 billion gigabytes)—nearly a third of the global total. By contrast, Western Europe has 19% and China has 13%. Legally, much of that data itself is property of the consumers or companies who generate it, and licensed to companies that are responsible for it. And in the US—a digital universe of 898 exabytes (1 EB = 1 billion gigabytes)—companies have some kind of liability or responsibility for 77% of all that data.
- x-OSC — a wireless I/O board that provides just about any software with access to 32 high-performance analogue/digital channels via OSC messages over WiFi. There is no user programmable firmware and no software or drivers to install making x-OSC immediately compatible with any WiFi-enabled platform. All internal settings can be adjusted using any web browser.
- Google Experimenting with Encrypting Google Drive (CNet) — If that’s the case, a government agency serving a search warrant or subpoena on Google would be unable to obtain the unencrypted plain text of customer files. But the government might be able to convince a judge to grant a wiretap order, forcing Google to intercept and divulge the user’s login information the next time the user types it in. Advertising depends on the service provider being able to read your data. Either your Drive’s contents aren’t valuable to Google advertising, or it won’t be a host-resistant encryption process.
Tracking Bitcoin, Gaming Deflation, Bloat-Aware Design, and Mapping Entity Relationships
- Quantitative Analysis of the Full Bitcoin Transaction Graph (PDF) — We analyzed all these large transactions by following in detail the way these sums were accumulated and the way they were dispersed, and realized that almost all these large transactions were descendants of a single transaction which was carried out in November 2010. Finally, we noted that the subgraph which contains these large transactions along with their neighborhood has many strange looking structures which could be an attempt to conceal the existence and relationship between these transactions, but such an attempt can be foiled by following the money trail in a succinctly persistent way. (via Alex Dong)
- Majority of Gamers Today Can’t Finish Level 1 of Super Mario Bros — from a Nintendo test. The President of Nintendo said in a talk, “We watched the replay videos of how the gamers performed and saw that many did not understand simple concepts like bottomless pits. Around 70 percent died to the first Goomba. Another 50 percent died twice. Many thought the coins were enemies and tried to avoid them. Also, most of them did not use the run button. There were many other depressing things we noted but I can not remember them at the moment.” (via Beta Knowledge)
- Bloat-Aware Design for Big Data Applications (PDF) — (1) merging and organizing related small data record objects into few large objects (e.g., byte buffers) instead of representing them explicitly as one-object-per-record, and (2) manipulating data by directly accessing buffers (e.g., at the byte chunk level as opposed to the object level). The central goal of this design paradigm is to bound the number of objects in the application, instead of making it grow proportionally with the cardinality of the input data. (via Ben Lorica)
- Poderopedia (GitHub) — originally designed for investigative journalists, the open source software allows you to create and manage entity profile pages that include: short bio or summary, sheet of connections, long newsworthy profiles, maps of connections of an entity, documents related to the entity, sources of all the information and news river with external news about the entity. See the announcement and website.
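The two techniques quoted in the Bloat-Aware Design entry can be illustrated in Python, with the caveat that this is only a sketch under assumptions and not the paper's implementation. The record layout (a 32-bit id plus a 64-bit score), the `PackedRecords` class, and its method names are all hypothetical. Records live in one `bytearray` and are read in place with `struct`, so the number of long-lived objects stays constant as the input grows.

```python
import struct

class PackedRecords:
    """Stores fixed-width (id, score) records in a single byte buffer."""

    # Hypothetical layout: little-endian uint32 id followed by float64 score.
    _FMT = struct.Struct("<Id")

    def __init__(self):
        self._buf = bytearray()

    def __len__(self):
        return len(self._buf) // self._FMT.size

    def append(self, rec_id, score):
        # Technique (1): merge each small record into the shared buffer.
        self._buf += self._FMT.pack(rec_id, score)

    def get(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Technique (2): read the fields straight out of the buffer.
        return self._FMT.unpack_from(self._buf, index * self._FMT.size)

    def total_score(self):
        # Walk the buffer directly; no per-record object is kept alive.
        return sum(score for _, score in self._FMT.iter_unpack(self._buf))

records = PackedRecords()
for i in range(1000):
    records.append(i, i * 0.5)
print(len(records), records.get(10), records.total_score())
```

The trade-off is that field access now goes through offsets instead of attributes, which is the price of bounding the object count.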
Mobile Numbers, SSL Best Practices, Free and Open No More, and PRISM Budget
- Mobile Email Numbers (Luke Wroblewski) — 79% use their smartphone for reading email, a higher percentage than those who used it for making calls, and in Feb ’12 mobile email overtook webmail client use.
- ProperSSL — a series of best practices for establishing SSL connections between clients and servers.
- How We Are Losing the War for the Free and Open Internet (Sue Gardner) — The internet is evolving into a private-sector space that is primarily accountable to corporate shareholders rather than citizens. It’s constantly trying to sell you stuff. It does whatever it wants with your personal information. And as it begins to be regulated or to regulate itself, it often happens in a clumsy and harmful way, hurting the internet’s ability to function for the benefit of the public.
- The Amazingly Low Cost of PRISM — breaks down costs to store and analyse the data gathered from major Internet companies. Total hardware cost per year for 3.75 EB of data storage: €168M
Microvideos for Microhelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Adrian Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results, actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
Ant-Sized Computers, Digital Manufacturing, Dictatorship of Data, and Mobile Shielding
- Ant-Sized Computers (MIT TR) — The KL02 chip, made by Freescale, is shorter on each side than most ants are long and crams in memory, RAM, a processor, and more.
- Some Thoughts on Digital Manufacturing (Nick Pinkston) — Whenever I see someone make a “new” 3D printer that’s just a derivative of the RepRap or MakerBot – I could care less. Only new processes, great interfaces or super-low price points get my attention anymore. FormLabs being a great example of all three – which is why they were a massive hit. If you’re looking for problems: make a cheap laser cutter, CNC mill, or pick-n-place machine. See the Othermill.
- The Dictatorship of Data (MIT TR) — Robert McNamara epitomizes the hyper-rational executive led astray by numbers. (via Wolfgang Blau)
- A Field Test of Mobile Phone Shielding Devices (PDF) — master’s thesis comparing various high-tech fabric-type shielding devices. Alas, tin-foil helmets weren’t investigated. (via Udhay Shankar)