- Hack — PHP with types, generics, collections, lambdas. From Facebook.
- Solve Hard Things Early — Build great habits around communication and decision-making when everyone still knows each other well.
- Marginally Useful (Paul Ford) — The last two decades have suggested a post-scarcity economy, where infinite copies of attractive digital things have a price approaching $0. Maybe that was merely a passing moment that we will look back upon with wonder once limited coins enforce scarcity—once the owner of a piece of digital art can look upon it with satisfaction and know with total, cryptographic certainty that because he paid for it, it belongs to him and no one else.
- Go Pipelines and Cancellation — Go’s fascinating me, as an example of a language designed for concurrency and syntactic familiarity.
Powering your app with open source and OpenShift
As a software developer, you are no doubt familiar with the process of abstracting away unnecessary detail in code — imagine if that same principle were applied to application hosting. Say hello to Platform as a Service (PaaS), which enables you to host your applications in the cloud without having to worry about the logistics, leaving you to focus on your code. This post will discuss five ways in which PaaS benefits software developers, using the open source OpenShift PaaS by Red Hat as an example.
No More Tedious Config Tasks
Most of us don’t become developers to do system administration, but when you are running your own infrastructure you end up doing exactly that. A PaaS can take that pain away by handling pesky config and important security updates for you. As a bonus, it makes your sys admin happy too by allowing you to provision your own environment for that killer new app idea you want to tinker with, rather than nagging them for root access on a new VM.
On OpenShift, it goes like this: let’s say you decide you want to test an idea for a Java app, using Tomcat and PostgreSQL (yes, we could argue about the merits of those choices, but work with me here). You can spin that up with a one-line terminal command:
rhc app create myawesomeapp tomcat-7 postgresql-9.2 -s
That -s on the end is telling the platform to make the app auto-scaling, which I will elaborate on later; yes, that’s all it takes. RHC (Red Hat Cloud) is just a Ruby Gem wrapping calls to the OpenShift REST API. You could also use the OpenShift web console or an IDE plugin to do this, or call the API directly if that’s how you roll. The key technologies in play here are just plain old Git and SSH — there’s nothing proprietary.
What is Hack and what does it mean for the future of PHP?
Facebook recently released Hack, a new programming language that looks and acts like PHP. Underneath the hood, however, are a ton of features like static typing, generics, native collections, and many more features for which PHP developers have long been asking. Syntax aside, Hack is not PHP. Hack runs only on Facebook’s HipHop virtual machine (HHVM), a competitor to the traditional PHP Zend Engine.
Why did Facebook build Hack?
Much of Facebook’s internal code is first written with PHP. Facebook can onboard new developers quickly with PHP because the language is notoriously easy to learn and use. Granted, much of Facebook’s PHP code is likely converted to a C derivative before being pushed into production. The point is Facebook depends strongly on the PHP language to attract new talent and increase developer efficiency.
Unfortunately, PHP may not perform as well as possible at Facebook’s scale. PHP is a loosely typed language and type-related errors may not be recognized until runtime. This means Facebook must write more tests early to enforce type checking, or spend more time refactoring runtime errors after launch. To solve this problem, Facebook added strict typing and runtime-enforcement of return types to Hack. Strict typing nullifies the need for a lot of type-related unit tests and encourages developers to catch type-related errors sooner in the development process.
Instantaneous Type Checking
To make the development process and error-catching process even easier, Facebook includes a type-checking server with its HHVM engine. This server runs locally and monitors Hack code as it is written. Developers’ code editors and IDEs can use this type-checking server to immediately report syntax or type-related errors during code development.
20 years of efficiently computing Bacon numbers
The Oracle at Delphi spoke just one language, a cryptic one that priests “compiled” into ancient Greek. The Oracle of Bacon—the website that plays the Six Degrees of Kevin Bacon game for you—has, in its 20-year existence, been written in six languages. Read on for the history and the reasons why.
The original version of the Oracle of Bacon, written by Brett Tjaden in 1995, was all C. The current version, my stewardship of it, and my revision control history only go back to 1999, so that’s where I’ll start. In 1999, I rewrote the Oracle… still entirely in C. Expensive shortest-path and spell-check algorithms? Definitely in C. String processing to build the database? Also C! Presentation layer to parse CGI parameters and generate HTML? C here, too!
Why C? The rationale for the algorithmic component was straightforward: the Oracle of Bacon ran on a slow, shared Unix machine that other people were using to get actual work done. Minimizing CPU and memory resource requirements was the polite thing to do. I needed a compiled language that let me optimize time and space extensively. The loops all counted down, not up, because comparing against zero was fractionally faster on SPARC. It had to be C.
But why were the offline string processing and the CGIs in C? Mostly, I think, to reuse code from the other parts of the code base and from previous projects I’d written when C was the only language I knew.
As the site added features, I got tired of writing code to generate HTML in C. I wrote new CGIs, then rewrote existing CGIs, in Perl. Simply put, writing the CGIs in an interpreted language made me more productive. I had hash tables and vectors built into the language and CGI support a simple “use” statement away. I didn’t have to compile on one server and then deploy to another—I could edit the CGIs right there on the web server. Good deployment practices it wasn’t, but it made me more productive as a programmer, and the performance of the CGIs didn’t matter all that much.
PHP's experiencing a renaissance, with improvements and new standards
The programming language many love to hate is experiencing a renaissance. This is not your parents’ PHP. The new PHP is a more mature language with community standards, a growing affinity for interoperable components, and a passionate movement to improve performance. If you have bypassed PHP for alternative languages, or if you are a PHP veteran unaware of recent changes, you owe it to yourself to give PHP a second look.
PHP 5.5 (the latest stable build as of this writing) has made major progress from earlier versions. Recent PHP releases contain powerful new features and helpful developer tools, such as a built-in web server, generators for simpler iteration, and namespaces. With PHP 5.4, traits were introduced (a la Scala or Perl) to allow code reuse in single inheritance languages, as well as closures, which allow you to code PHP in a functional style. Other important features include the built-in FastCGI process manager and
phpdbg debugger, and a new password hashing API that makes it easy to hash and securely manage passwords in PHP.
An excerpt from the third edition of PHP Cookbook
Editor’s Note: The following excerpt is from the third edition of PHP Cookbook by David Sklar and Adam Trachtenberg. For those already familiar with PHP, PHP Cookbook shows you how to overcome specific problems in your everyday work. Programmers coming from other languages will also find recipes helpful which demonstrate how to accomplish a particular task in PHP, such as sending email or parsing JSON, that you may already know how to do in another language. The recipes are self-contained in a simple problem-solution-discussion format with explanations of how and why the code works the way it does.
This cookbook arms PHP developers with important information for key PHP updates, particularly data manipulation, web services, internationalization, database access, security, and testing. This excerpt, from the chapter focused on Web Fundamentals, demonstrates how to set the HTTP Status Code and how to redirect a user to a different web page than the one they requested. Portions of this chapter have been edited and condensed for the purposes of this excerpt.
Using bycrpt for safer password storage
As anyone whose used a web applications knows, the password is still the go-to form of identification. Sure, there have been lots of improvements in the world of identify over the last few years, but there’s still a constant flow of applications and websites that rely on this tried and true method of protection. Unfortunately, because it represents a single point of failure, it’s actually one of the least secure methods for providing your user is why they say they are.
There’s been a recent resurgence in alternate technologies to help protect your application’s users including a wide variety of two-factor solutions and things like federated identify providers. People are understanding more and more that a simple password isn’t enough. We see stories almost daily of some major company or group being hacked because of either bad passwords or bad password storage practices. Unfortunately, there’s only a limited amount of things you can do for the former (like more effective password policies), but there is a way to help with the second. It’s surprising to find out just how many companies and applications have made poor choices when it comes to how they protect their users’ passwords. There are even some that have made the disastrous choice to store them as plain text (it makes me cringe just thinking about it).
Taking a look at the usual suspects: SQLi, XSS & CSRF
As any PHP developer that’s been around for a while will tell you, there’s a certain kind of stigma that comes with the language. They’ll hear it from their peers using other languages that PHP is “sloppy” or that “it’s just a scripting language, not a real one.” There’s one other that seems to follow the language around as well—that it’s insecure. Sure, PHP’s not without its problems—but any language is going to have its share. Ruby’s had several major vulnerabilities in the press lately and Java has definitely had its own list over its extensive lifetime. People put down PHP for not being secure, but they forget that it’s not the language that makes for insecure code, it’s the developer.
PHP, by its nature is “meant to die” at the end of every request, so the developers don’t have to worry about some things that more persistent languages do. There’s still some common dangers, though, that you as a PHP developer should be aware of. The most common ones come from the well known OWASP Top 10 list. Here’s a quick look at how to help prevent just a few:
OSCON 2013 Speaker Series
NOTE: If you are interested in attending OSCON to check out Dave’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% off your registration fee.
Since 2009, I’ve been leading the optimization team at AppNexus, a real-time advertising exchange. On this exchange, advertisers participate in real-time auctions to bid on individual ad impressions. The highest bid wins the auction, and that advertiser gets to show an ad. This allows advertisers to carefully target where they advertise—maximizing the effectiveness of their advertising budget—and lets websites maximize their ad revenue.
We do these auctions often (~50 billion a day) and fast (<100 milliseconds). Not surprisingly, this creates a lot of technical challenges. One of those challenges is how to automatically maximize the value advertisers get for their marketing budgets—systematically driving consumer engagement through ad placements on particular websites, times of day, etc.—and we call this process “optimization.” The volume of data is large, and the algorithms and strategies aren’t trivial.
In order to win clients and build our business to the scale we have today, it was crucial that we build a world-class optimization system. But when I started, we didn’t have a scalable tech stack to process the terabytes of data flowing through our systems every day, and we didn't have the team to do any of the required data modeling.
So, we needed to hire great people fast. However, there aren’t many veterans in the advertising optimization space, and because of that, we couldn’t afford to narrow our search to only experts in Java or R or Matlab. In order to give us the largest talent pool possible to recruit from, we had to choose a tech stack that is both powerful and accessible to people with diverse experience and backgrounds. So we chose Python.
Python is easy to learn. We found that people coding in R, Matlab, Java, PHP, and even those who have never programmed before could quickly learn and get up to speed with Python. This opened us up to hiring a tremendous pool of talent who we could train in Python once they joined AppNexus. To top it off, there’s a great community for hiring engineers and the PyData community is full of programmers who specialize in modeling and automation.
Additionally, Python has great libraries for data modeling. It offers great analytical tools for analysts and quants and when combined, Pandas, IPython, and Matplotlib give you a lot of the functionality of Matlab or R. This made it easy to hire and onboard our quants and analysts who were familiar with those technologies. Even better, analysts and quants can share their analysis through the browser with IPython.
Now that we had all of these wonderful employees, we needed a way to cut down the time to get them ramped up and pushing code to production.
First, we wanted to get our analysts and quants looking at and modeling data as soon as possible. We didn’t want them worrying about writing database connector code, or figuring out how to turn a cursor into a data frame. To tackle this, we built a project called Link.
Imagine you have a MySQL database. You don’t want to hardcode all of your connection information because you want to have a different config for different users, or for different environments. Link allows you to define your “environment” in a JSON config file, and then reference it in code as if it is a Python object.
Now, with only three lines of code you have a database connection and a data frame straight from your mysql database. This same methodology works for Vertica, Netezza, Postgres, Sqlite, etc. New “wrappers” can be added to accommodate new technologies, allowing team members to focus on modeling the data, not how to connect to all these weird data sources.
In : from link import lnk
In : my_db = lnk.dbs.my_db
In : df = my_db.select('select * from my_table').as_dataframe()
Int64Index: 325 entries, 0 to 324
id 325 non-null values
user_id 323 non-null values
app_id 325 non-null values
name 325 non-null values
body 325 non-null values
created 324 non-null values
By having the flexibility to easily connect to new data sources and APIs, our quants were able to adapt to the evolving architectures around us, and stay focused on modeling data and creating algorithms.
Second, we wanted to minimize the amount of work it took to take an algorithm from research/prototype phase to full production scale. Luckily, with everyone working in Python, our quants, analysts, and engineers are using the same language and data processing libraries. There was no need to re-implement an R script in Java to get it out across the platform.
Software grows patient
The biggest change I’ve seen in the last few years of software development isn’t a new language, a new environment, or magical new algorithms. The biggest change is that programmers in many different arenas, working independently, have come to accept waiting.
Part of the joy of computers, a magic that grew and grew as computers and networks got faster and faster, was the confidence that making a change was immediate. Yes, it took time to get a message over a network or for a CPU to execute calls—but things happened.
The event loop has been around for a long time—computers have always had to wait for us slow humans. Transactions have provided a buffer against the possibility of simultaneous changes to the same data, and certainly slowed things down, but the time that buffering took was generally considered a cost, not a feature. Message queues have existed for a long time, but again, seemed bulletproof but expensive.
Over the last few years, these approaches have become more common and better understood. I suspect that there are two main drivers of this change:
As larger scale projects have become more common, the knowledge needed to use these approaches has become more widely available. The average project may be larger scale as well, but these techniques are appearing even in cases where I wouldn’t have thought them necessary. (Of course, I also like playing with Erlang in tiny single-user environments.)
I saw a great talk last week on IndexedDB. It wasn’t the data storage or the asynchronous API that grabbed my attention, but the conversation about promises and ways to make “wait for it” seem like a normal programming idiom. There are a lot of those conversations happening now, about many environments.
Should we make asynchronous seem normal with syntax sugar? Or should we flag it, call attention to it, and make sure programmers remember that their code is waiting?