"functional programming" entries
Moving beyond traditional tools makes data analysis faster and more powerful
Garrett Grolemund is an O’Reilly author and teaches classes on data analysis for R Studios.
We sat down to discuss why data scientists, statisticians, and programmers alike can use the R language to make data analysis easier and more powerful.
Key points from the full video (below) interview include:
- R is a free, open-source language that has its roots in S-PLUS [Discussed at the 0:27 mark]
- What does it mean for R to be a programming language versus just a data analysis tool? [Discussed at the 1:00 mark]
- R comes with many useful data analysis methods already implemented, so you don’t have to start from scratch. [Discussed at the 4:23 mark]
- R is a mix of functional and object-oriented programming that is optimal for handling data structures that data analysts expect (e.g. vectors) [Discussed at the 6:08 mark]
- A discussion of using R in conjunction with other languages like Python, along with packages that help with this [Discussed at the 7:30 mark]
- Getting started using R isn’t really any harder than using a calculator [Discussed at the 9:28 mark]
You can view the entire interview in the following video.
Steve Vinoski on when to make the leap to functional programming.
Functional programming has a long and distinguished heritage of great work — that was only used by a small group of programmers. In a world dominated by individual computers running single processors, the extra cost of thinking functionally limited its appeal. Lately, as more projects require distributed systems that must always be available, functional programming approaches suddenly look a lot more appealing.
Steve Vinoski, an architect at Basho Technologies, has been working with distributed systems and complex projects for a long time, first as a tentative explorer and then leaping across to Erlang when it seemed right. Seventeen years as a columnist on C, C++, and functional languages have given him a unique viewpoint on how developers and companies are deciding whether and how to take the plunge.
Highlights from our recent interview include:
System administrators try to maintain reliability and other virtues while adopting cost-cutting innovations
I came to LISA, the classic USENIX conference, to find out this year who was using such advanced techniques as cloud computing, continuous integration, non-relational databases, and IPv6. I found lots of evidence of those technologies in action, but also had the bracing experience of getting stuck in a talk with dozens of Solaris fans.
Such is the confluence of old and new at LISA. I also heard of the continued relevance of magnetic tape–its storage costs are orders of magnitude below those of disks–and of new developements on NFS. Think of NFS as a protocol, not a filesystem: it can now connect many different filesystems, including the favorites of modern distributed system users.
LISA, and the USENIX organization that valiantly unveils it each year, are communities at least as resilient as the systems that their adherents spend their lives humming. Familiar speakers return each year. Members crowd a conference room in the evening to pepper the staff with questions about organizational issues. Attendees exchange their t-shirts for tuxes to attend a three-hour reception aboard a boat on the San Diego harbor, which this time was experiencing unseasonably brisk weather. (Full disclosure: I skipped the reception and wrote this article instead.) Let no one claim that computer administrators are anti-social.
Again in the spirit of full disclosure, let me admit that I perform several key operations on a Solaris system. When it goes away (which someday it will), I’ll have to alter some workflows.
The continued resilience of LISA
Conferences, like books, have a hard go of it in the age of instant online information. I wasn’t around in the days when people would attend conferences to exchange magnetic tapes with their free software, but I remember the days when companies would plan their releases to occur on the first day of a conference and would make major announcements there. The tradition of using conferences to propel technical innovation is not dead; for instance, OpenStack was announced at an O’Reilly Open Source convention.
But as pointed out by Thomas Limoncelli, an O’Reilly author (Time Management for System Administrators) and a very popular LISA speaker, the Internet has altered the equation for product announcements in two profound ways. First of all, companies and open source projects can achieve notoriety in other ways without leveraging conferences. Second, and more subtly, the philosophy of “release early, release often” launches new features multiple times a year and reduces the impact of major versions. The conferences need a different justification.
Limoncelli says that LISA has survived by getting known as the place you can get training that you can get nowhere else. “You can learn about a tool from the person who created the tool,” he says. Indeed, at the BOFs it was impressive to hear the creator of a major open source tool reveal his plans for a major overhaul that would permit plugin modules. It was sobering though to hear him complain about a lack of funds to do the job, and discuss with the audience some options for getting financial support.
LISA is not only a conference for the recognized stars of computing, but a place to show off students who can create a complete user administration interface in their spare time, or design a generalized extension of common Unix tools (grep, diff, and so forth) that work on structured blocks of text instead of individual lines.
Another long-time attendee told me that companies don’t expect anyone here to whip out a checkbook in the exhibition hall, but they still come. They have a valuable chance at LISA to talk to people who don’t have direct purchasing authority but possess the technical expertise to explain to their bosses the importance of new products. LISA is also a place where people can delve as deep as the please into technical discussions of products.
I noticed good attendance at vendor-sponsored Bird-of-a-Feather sessions, even those lacking beer. For instance, two Ceph staff signed up for a BOF at 10 in the evening, and were surprised to see over 30 attendees. It was in my mind a perfect BOF. The audience talked more than the speakers, and the speakers asked questions as well as delivering answers.
But many BOFs didn’t fit the casual format I used to know. Often, the leader turned up with a full set of slides and took up a full hour going through a list of new features. There were still audience comments, but no more than at a conference session.
One undeniable highlight of LISA was the keynote by Internet pioneer Vint Cerf. After years in Washington, DC, Cerf took visible pleasure in geeking out with people who could understand the technical implications of the movements he likes to track. His talk ranged from the depth of his wine cellar (which he is gradually outfitting with sensors for quality and security) to interplanetary travel.
The early part of his talk danced over general topics that I think were already adequately understood by his audience, such as the value of DNSSEC. But he often raised useful issues for further consideration, such as who will manage the billions of devices that will be attached to the Internet over the next few years. It can be useful to delegate read access and even write access (to change device state) to a third party when the device owner is unavailable. In trying to imagine a model for sets of device, Cerf suggested the familiar Internet concept of an autonomous system, which obviously has scaled well and allowed us to distinguish routers running different protocols.
The smart grid (for electricity) is another concern of Cerf’s. While he acknowledged known issues of security and privacy, he suggested that the biggest problem will be the classic problem of coordinated distributed systems. In an environment where individual homes come and go off the grid, adding energy to it along with removing energy, it will be hard to predict what people need and produce just the right amount at any time. One strategy involves microgrids: letting neighborhoods manage their own energy needs to avoid letting failures cascade through a large geographic area.
Cerf did not omit to warn us of the current stumbling efforts in the UN to institute more governance for the Internet. He acknowledged that abuse of the Internet is a problem, but said the ITU needs an “excuse to continue” as radio, TV, etc. migrate to the Internet and the ITU’s standards see decreasing relevance.
Cerf also touted the Digital Vellum project for the preservation of data and software. He suggested that we need a legal framework that would require software developers to provide enough information for people to continue getting access to their own documents as old formats and software are replaced. “If we don’t do this,” he warned, “our 22nd-century descendants won’t know much about us.”
Talking about OpenFlow and Software Defined Networking, he found its most exciting opportunity is to let us use content to direct network traffic in addition to, or instead of, addresses.
Another fine keynote was delivered by Matt Blaze on a project he and colleagues conducted to assess the security of the P25 mobile systems used everywhere by security forces, including local police and fire departments, soldiers in the field, FBI and CIA staff conducting surveillance, and executive bodyguards. Ironically, there are so many problems with these communication systems that the talk was disappointing.
I should in no way diminish the intelligence and care invested by these researchers from the University of Pennsylvania. It’s just the history of P25 makes security lapses seem inevitable. Because it was old, and was designed to accommodate devices that were even older, it failed to implement basic technologies such as asymmetric encryption that we now take for granted. Furthermore, most of the users of these devices are more concerned with getting messages to their intended destinations (so that personnel can respond to an emergency) than in preventing potential enemies from gaining access. Putting all this together, instead of saying “What impressive research,” we tend to say, “What else would you expect?”
Attendees certainly had their choice of virtualization and cloud solutions at the conference. A very basic introduction to OpenStack was offered, along with another by developers of CloudStack. Although the latter is older and more settled, it is losing the battle of mindshare. One developer explained that CloudStack has a smaller scope than OpenStack, because CloudStack is focused on high-computing environments. However, he claimed, CloudStack works on really huge deployments where he hasn’t seen other successful solutions. Yet another open source virtuallization platform presented was Google’s Ganeti.
I also attended talks and had chats with developers working on the latest generation of data stores: massive distributed file systems like Hadoop’s HDFS, and high-performance tools such as HBase and Impala, for accessing the data it stores. There seems be accordion effect in data stores: developers start with simple flat or key-value structures. Then they find the need over time–depending on their particular applications–for more hierarchy or delimited data, and either make their data stores more heavyweight or jerry-rig the structure through conventions such as defining fields for certain purposes. Finally we’re back at something mimicking the features of a relational database, and someone rebels and starts another bare-bones project.
One such developer told me hoped his project never turns into a behemoth like CORBA or (lamentably) what WS-* specifications seem to have wrought.
CORBA is universally recognized as dead–perhaps stillborn, because I never heard of major systems deployed in production. In fact, I never knew of an implementation that caught up with the constant new layers of complexity thrown on by the standards committee.
In contrast, WS-* specifications teeter on the edge of acceptability, as a number of organizations swear by it.
I pointed out to my colleague that most modern cloud or PC systems are unlikely to suffer from the weight of CORBA or WS-*, because the latter two systems were created for environments without trust. They were meant to tie together organizations with conflicting goals, and were designed by consortia of large vendors jockeying for market share. For both of these reasons, they have to negotiate all sorts of parameters and add many assurances to every communication.
Recently we’ve seen an increase of interest in functional programming. It occurred to me this week that many aspects of functional programming go nicely with virtualization and the cloud. When you write code with no side effects and no global lack of state, you can recover more easily when instances of your servers disappear. It’s fascinating to see how technologies coming from many different places push each other forward–and sometimes hold each other back.
The blurry line between markup and programming.
When XML exploded onto the scene, it ignited visions of magical communications, simplified document storage, and a whole new wave of application capabilities. Reality has proved calmer, with competition from JSON and other formats tackling a wide variety of problems, while the biggest of the big data problems have such volume that adding markup seems likely to create new problems.
However, at the in-progress Balisage conference, it’s clear that markup remains really good at solving a middle category of problems, where its richer structures can shine without creating headaches of volume or complication. In the past, Balisage often focused on hard problems most people didn’t yet have, but this year’s program tackles challenges that more developers are encountering as their projects grow in complexity.
Alex Payne on Scala's upside and combining object-oriented and functional capabilities.
Alex Payne, co-author of the "Programming Scala," talks about the advantages of using Scala.
The benefits of functional languages and functional language techniques.
O'Reilly editors Mike Loukides and Mike Hendrickson discuss the advantages of functional programming languages and how functional language techniques can be deployed with almost any language.