- 16 Andreessen-Horowitz Investment Areas — I’m struck by how they’re connected: there’s a cluster around cloud development, there are two, maybe three, on sensors …
- Pattern — a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.
- Code Review — Fog Creek’s code review checklist.
- Expectations of Brilliance Underlie Gender Distributions Across Academic Disciplines (Science) — Surveys revealed that some fields are believed to require attributes such as brilliance and genius, whereas other fields are believed to require more empathy or hard work. In fields where people thought that raw talent was required, academic departments had lower percentages of women. (via WaPo)
Understanding the value of the blockchain above and beyond bitcoin.
Editor’s note: Lorne Lantz is a program co-chair for our O’Reilly Radar Summit: Bitcoin & the Blockchain on January 27, 2015, in San Francisco. For more on the program and for registration information, visit the Bitcoin & the Blockchain event website.
I remember the first time I heard about bitcoin. It was June 2012, and I was invited to a bitcoin meetup. The whole time I was sitting there, I thought these were a bunch of computer geeks playing around with nerd money.
At the same time, I felt excited about the possibilities. If what the bitcoin believers were saying was true, it could become something very big. When I took a closer look, I realized why it could be so groundbreaking: decentralization.
Unlike other currencies and payment networks, bitcoin is not controlled by a bank, government, or financial institution. Instead, thousands of computers around the world verify transactions and manage a global decentralized ledger. This innovative technology is called the blockchain, and it provides a unique pathway that allows — for the first time — many computers that don’t trust each other to achieve consensus. In bitcoin’s case, they are achieving consensus on updates to the global ledger. Read more…
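To make the idea of a shared ledger a little more concrete, here is a minimal sketch of a hash-linked chain of blocks, in which each block records the hash of the block before it. This is an illustration only and rests on several assumptions: it is not bitcoin's actual block format, it has no proof of work or networking, and the names `block_hash`, `make_block`, and `is_valid_chain` are hypothetical helpers made up for this example.

```python
import hashlib
import json


def block_hash(block):
    """Return the SHA-256 hex digest of a block's contents (excluding its own hash)."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "transactions", "prev_hash")},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(index, transactions, prev_hash):
    """Build a block (hypothetical layout) that points at the previous block's hash."""
    block = {"index": index, "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


def is_valid_chain(chain):
    """Check that every stored hash is correct and every link matches its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False  # block contents were changed after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


# A two-block toy ledger; the all-zero previous hash for the first block is an assumption.
genesis = make_block(0, [], "0" * 64)
second = make_block(1, [{"from": "alice", "to": "bob", "amount": 5}], genesis["hash"])
chain = [genesis, second]
```

Any computer holding a copy of `chain` can run `is_valid_chain` on its own, which is the sense in which participants do not need to trust each other to detect a tampered history. How the network agrees on which valid chain to extend is a separate problem that this sketch does not attempt.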
A look at the stumbling blocks to blockchain scalability and some high-level technical solutions.
Author note: Vitalik Buterin contributed to this article.
Editor’s note: Kieren James-Lubin is a program co-chair for our O’Reilly Radar Summit: Bitcoin & the Blockchain on January 27, 2015, in San Francisco. For more on the program and for registration information, visit the Bitcoin & the Blockchain event website.
“I have no worries that bitcoin can scale, and the simple reason for that is that I know that IPv4 can’t, and yet I use it every day.”
The issue of bitcoin scalability and the phrase “blockchain scalability” are often seen in technical discussions of the bitcoin protocol. Will the requirements of recording every bitcoin transaction in the blockchain compromise its security (because fewer users will keep a copy of the whole blockchain) or its ability to handle a great number of transactions (because new blocks on which transactions can be recorded are only produced at limited intervals)? In this article, we’ll explore several meanings of “blockchain scalability” and some high-level technical solutions to the issue.
The three main stumbling blocks to blockchain scalability are:
- The tendency toward centralization with a growing blockchain: the larger the blockchain grows, the larger the requirements become for storage, bandwidth, and computational power that must be spent by “full nodes” in the network, leading to a risk of much higher centralization if the blockchain becomes large enough that only a few nodes are able to process a block.
- The bitcoin-specific issue that the blockchain has a built-in hard limit of 1 megabyte per block, with a new block produced about every 10 minutes, and removing this limit requires a “hard fork” (i.e., a backward-incompatible change) to the bitcoin protocol.
- The high processing fees currently paid for bitcoin transactions, and the potential for those fees to increase as the network grows. We won’t discuss this too much, but see here for more detail.
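The block size limit in the second point translates into a rough ceiling on throughput and on how fast the blockchain can grow. The back-of-the-envelope sketch below uses the 1 megabyte and 10 minute figures from the text; the average transaction size of 250 bytes is purely an assumption chosen for illustration, and the function names are hypothetical.

```python
BLOCK_SIZE_LIMIT_BYTES = 1_000_000  # 1 megabyte hard limit per block (from the text)
BLOCK_INTERVAL_SECONDS = 10 * 60    # roughly one block every 10 minutes (from the text)
ASSUMED_AVG_TX_BYTES = 250          # ASSUMPTION: illustrative average transaction size


def max_tx_per_second(block_bytes, interval_s, avg_tx_bytes):
    """Upper bound on transactions per second if every block is completely full."""
    tx_per_block = block_bytes // avg_tx_bytes
    return tx_per_block / interval_s


def max_chain_growth_bytes_per_year(block_bytes, interval_s):
    """Upper bound on yearly blockchain growth if every block is completely full."""
    blocks_per_year = (365 * 24 * 60 * 60) // interval_s
    return blocks_per_year * block_bytes


tps = max_tx_per_second(
    BLOCK_SIZE_LIMIT_BYTES, BLOCK_INTERVAL_SECONDS, ASSUMED_AVG_TX_BYTES
)
growth = max_chain_growth_bytes_per_year(BLOCK_SIZE_LIMIT_BYTES, BLOCK_INTERVAL_SECONDS)

print(f"at most {tps:.1f} transactions per second")
print(f"at most {growth / 1e9:.2f} GB of new blocks per year")
```

Under these assumptions the ceiling is a handful of transactions per second. Raising the block size lifts that ceiling but also increases the yearly growth that full nodes must store and relay, which is how the first two stumbling blocks pull against each other.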
We’ll consider the first two issues in detail. Read more…
The O'Reilly Radar Podcast: Tim Gardner on the synthetic biology landscape, lab automation, and the problem of reproducibility.
Editor’s note: this podcast is part of our investigation into synthetic biology and bioengineering. For more on these topics, download a free copy of the new edition of BioCoder, our quarterly publication covering the biological revolution. Free downloads for all past editions are also available.
Tim Gardner, founder of Riffyn, has recently been working with the Synthetic Biology Working Group of the European Commission Scientific Committees to define synthetic biology, assess the risk assessment methodologies, and then describe research areas. I caught up with Gardner for this Radar Podcast episode to talk about the synthetic biology landscape and issues in research and experimentation that he’s addressing at Riffyn.
Defining synthetic biology
Among the areas of investigation discussed at the EU’s Synthetic Biology Working Group was defining synthetic biology. The official definition reads: “SynBio is the application of science, technology and engineering to facilitate and accelerate the design, manufacture and/or modification of genetic materials in living organisms.” Gardner talked about the significance of the definition:
“The operative part there is the ‘design, manufacture, modification of genetic materials in living organisms.’ Biotechnologies that don’t involve genetic manipulation would not be considered synthetic biology, and more or less anything else that is manipulating genetic materials in living organisms is included. That’s important because it gets rid of this semantic debate of, ‘this is synthetic biology, that’s synthetic biology, this isn’t, that’s not,’ that often crops up when you have, say, a protein engineer talking to someone else who is working on gene circuits, and someone will claim the protein engineer is not a synthetic biologist because they’re not working with parts libraries or modularity or whatnot, and the boundaries between the two are almost indistinguishable from a practical standpoint. We’ve wrapped it all together and said, ‘It basically advances in the capabilities of genetic engineering. That’s what synthetic biology is.’”
We need primitives; pipeline synthesis tools; and most importantly, error analysis and verification.
There are many algorithms with implementations that scale to large data sets (this list includes matrix factorization, SVM, logistic regression, LASSO, and many others). In fact, machine learning experts are fond of pointing out: if you can pose your problem as a simple optimization problem then you’re almost done.
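As a small illustration of what “posing a problem as a simple optimization problem” can look like, the sketch below fits a one-feature logistic regression by minimizing the average log loss with plain gradient descent. It is a toy under stated assumptions: the data set, the learning rate, and the step count are made up for the example, and the scalable implementations mentioned above use far more sophisticated solvers.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Minimize average log loss for the model p = sigmoid(w * x + b).

    The learning rate and step count are illustrative assumptions, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data (made up): negative inputs belong to class 0, positive inputs to class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

Once the loss function is written down, the rest is a generic descent loop, which is the sense in which a well-posed optimization problem is “almost done.”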
Of course, in practice, most machine learning projects can’t be reduced to simple optimization problems. Data scientists have to manage and maintain complex data projects, and the analytic problems they need to tackle usually involve specialized machine learning pipelines. Decisions at one stage affect things that happen downstream, so interactions between parts of a pipeline are an area of active research.
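A tiny sketch can show how a decision at one stage changes what happens downstream. Here a pipeline is just a list of functions applied in order, and the only difference between the two runs is how missing values are handled in the first stage. The stage names and the data are hypothetical, and this is not how any particular pipeline framework is implemented.

```python
from functools import reduce


def run_pipeline(stages, data):
    """Apply each stage to the output of the previous one, in order."""
    return reduce(lambda value, stage: stage(value), stages, data)


def drop_missing(rows):
    """Cleaning choice A: discard missing values."""
    return [r for r in rows if r is not None]


def fill_missing_with_zero(rows):
    """Cleaning choice B: replace missing values with 0.0."""
    return [0.0 if r is None else r for r in rows]


def mean(rows):
    """Downstream stage: summarize the cleaned values."""
    return sum(rows) / len(rows)


raw = [4.0, None, 8.0, None]  # made-up measurements with gaps

mean_after_drop = run_pipeline([drop_missing, mean], raw)
mean_after_fill = run_pipeline([fill_missing_with_zero, mean], raw)
```

The downstream `mean` stage is identical in both runs, yet its result differs because of the upstream cleaning choice. In a real pipeline the same effect compounds across featurization, model fitting, and evaluation.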
In his Strata+Hadoop World New York presentation, UC Berkeley Professor Ben Recht described new UC Berkeley AMPLab projects for building and managing large-scale machine learning pipelines. Given AMPLab’s ties to the Spark community, some of the ideas from their projects are starting to appear in Apache Spark. Read more…