"API" entries

Mining One Million Tweets About #Syria

Surprising social media stats

by Matthew Russell | September 11, 2013

I’ve been filtering Twitter’s firehose for tweets about “#Syria” for about the past week in order to accumulate a sizable volume of data about an important current event. As of Friday, I noticed that the tally has surpassed one million tweets, so it seemed to be a good time to apply some techniques from Mining the Social Web and explore the data.

While some of the findings from a preliminary analysis confirm common intuition, others are a bit surprising. The remainder of this post explores the tweets with a cursory analysis addressing the “Who?, What?, Where?, and When?” of what’s in the data.

Read more…

The Myth of the Private API

The Fundamental Interconnectedness of Things

by George Reese | September 6, 2013

A little over a week ago, I wrote about how the authentication model for an unpublished Tesla REST API was architecturally flawed because it failed to take basic precautions against the sharing of credentials with third-parties common to most REST-based services these days. Since its publication, the main criticism of the article centered around the fact that the API is neither a published API nor has it been advertised as being meant for third-party consumption.

The adding of value to devices and services with or without the knowledge/permission of their creators is an integral part of the Internet of Things. These days, people expect an API around their devices. They will discover any APIs and add value to the device/service—even if the task requires a little reverse engineering work. A responsible creator of a device or service in today’s world defined by the Internet of Things must therefore do the following things—always:

Give it a public API
Protect any internal communications so they can’t be reverse engineered
Protect any public communications so that they don’t put end users at risk when they leverage third-party devices and services

Read more…

JavaScript Is Way Too Slow – for What?

There's more to mobile web apps than JavaScript

by Simon St. Laurent | @simonstl | +Simon St. Laurent | July 17, 2013

[contextly_sidebar id=”ffb2c80b279bd33d034a9da1444df42f”]

I keep finding that programmers—even web programmers—frequently think “web application” means “JavaScript application.” Programmers are, of course, used to working with programming languages, and often see application environments from the perspective of the programming language in use.

These blinders derail Drew Crawford’s detailed rant on Why mobile web apps are slow. It turns out that “slow for what?” is a key part of the question, as Crawford reveals near the very end:

JavaScript is too slow for mobile app use in 2013 (e.g., for photo editing, etc.).

Do we need to run Photoshop on our mobile phones using JavaScript underneath? I agree that cramming everything into an interpreted language on a memory-, processing-, and bandwidth-constrained device is a stupid idea—but “photo editing” is an extreme case, to put it mildly. Perhaps critics will move on to video editing when we eventually get to the next round.

Despite that flaw, if the rant had been titled “Why mobile JavaScript apps are slow,” it would have done better. His lists of benchmarks, discussions of processor challenges, and recognition that the easy work in JavaScript optimization is already done, are all tremendously useful. There is little reason to expect JavaScript performance to improve in the future at the rate is has achieved for the last decade.

It may also be true that the browser vendors have optimized their performance as far as they can, at least in the relatively stable fields of HTML parsing and processing, and CSS selectors and formatting. Adding querySelector to the DOM was a massive one-time speed upgrade, letting JavaScript developers drop most of their tree walking in favor of a more declarative approach relying on native code in which vendors had already invested heavily.

So how can we optimize mobile web development?

The first answer is pretty simple: don’t try to use JavaScript for tasks (like photo or video editing) that push it past its limits. As far as JavaScript has come, it wasn’t built for that. Native apps are indeed good for more than building browsers.

The second answer is trickier: it means recognizing that web applications include much more than JavaScript. JavaScript began as a scripting language for gluing together objects written in other languages, and it still excels at that. Build your application from the HTML—the easiest stuff to process!—up. (If you want graphics, SVG might be a better path.) Then layer CSS on top of that. Rely on APIs built into the browser whenever you can, using JavaScript to connect with their much faster processing. HTML5 keeps making that a better and better option.

When I was programming our HTML 4 & 5: The Complete Reference, I marveled regularly at how tasks that seemed huge, like loading a new CSS stylesheet, moved rapidly compared to implementing smaller changes using JavaScript. More broadly, Estelle Weyl’s JavaScript: You Don’t Need a Framework For That (slides) resonates both because it shows how to use smaller chunks of JavaScript and because it explores how to use CSS creatively to avoid using JavaScript at all.

Building great applications using JavaScript, especially great mobile web applications, often means using JavaScript itself as sparingly as possible. That’s always been a feature of JavaScript, not a bug.

Scaling People, Process, and Technology with Python

OSCON 2013 Speaker Series

by Dave Himrod | July 15, 2013

NOTE: If you are interested in attending OSCON to check out Dave’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% off your registration fee.

Since 2009, I’ve been leading the optimization team at AppNexus, a real-time advertising exchange. On this exchange, advertisers participate in real-time auctions to bid on individual ad impressions. The highest bid wins the auction, and that advertiser gets to show an ad. This allows advertisers to carefully target where they advertise—maximizing the effectiveness of their advertising budget—and lets websites maximize their ad revenue.

We do these auctions often (~50 billion a day) and fast (<100 milliseconds). Not surprisingly, this creates a lot of technical challenges. One of those challenges is how to automatically maximize the value advertisers get for their marketing budgets—systematically driving consumer engagement through ad placements on particular websites, times of day, etc.—and we call this process “optimization.” The volume of data is large, and the algorithms and strategies aren’t trivial.

In order to win clients and build our business to the scale we have today, it was crucial that we build a world-class optimization system. But when I started, we didn’t have a scalable tech stack to process the terabytes of data flowing through our systems every day, and we didn't have the team to do any of the required data modeling.

People

So, we needed to hire great people fast. However, there aren’t many veterans in the advertising optimization space, and because of that, we couldn’t afford to narrow our search to only experts in Java or R or Matlab. In order to give us the largest talent pool possible to recruit from, we had to choose a tech stack that is both powerful and accessible to people with diverse experience and backgrounds. So we chose Python.

Python is easy to learn. We found that people coding in R, Matlab, Java, PHP, and even those who have never programmed before could quickly learn and get up to speed with Python. This opened us up to hiring a tremendous pool of talent who we could train in Python once they joined AppNexus. To top it off, there’s a great community for hiring engineers and the PyData community is full of programmers who specialize in modeling and automation.

Additionally, Python has great libraries for data modeling. It offers great analytical tools for analysts and quants and when combined, Pandas, IPython, and Matplotlib give you a lot of the functionality of Matlab or R. This made it easy to hire and onboard our quants and analysts who were familiar with those technologies. Even better, analysts and quants can share their analysis through the browser with IPython.

Process

Now that we had all of these wonderful employees, we needed a way to cut down the time to get them ramped up and pushing code to production.

First, we wanted to get our analysts and quants looking at and modeling data as soon as possible. We didn’t want them worrying about writing database connector code, or figuring out how to turn a cursor into a data frame. To tackle this, we built a project called Link.

Imagine you have a MySQL database. You don’t want to hardcode all of your connection information because you want to have a different config for different users, or for different environments. Link allows you to define your “environment” in a JSON config file, and then reference it in code as if it is a Python object.

 { "dbs":{
  "my_db": {
   "wrapper": "MysqlDB",
   "host": "mysql-master.123fakestreet.net",
   "password": "",
   "user": "",
   "database": ""
  }
 }}

Now, with only three lines of code you have a database connection and a data frame straight from your mysql database. This same methodology works for Vertica, Netezza, Postgres, Sqlite, etc. New “wrappers” can be added to accommodate new technologies, allowing team members to focus on modeling the data, not how to connect to all these weird data sources.

In [1]: from link import lnk
 
In [2]: my_db = lnk.dbs.my_db
 
In [3]: df = my_db.select('select * from my_table').as_dataframe()
 

Int64Index: 325 entries, 0 to 324
Data columns:
id    325 non-null values
user_id   323 non-null values
app_id   325 non-null values
name    325 non-null values
body    325 non-null values
created   324 non-null values

By having the flexibility to easily connect to new data sources and APIs, our quants were able to adapt to the evolving architectures around us, and stay focused on modeling data and creating algorithms.

Second, we wanted to minimize the amount of work it took to take an algorithm from research/prototype phase to full production scale. Luckily, with everyone working in Python, our quants, analysts, and engineers are using the same language and data processing libraries. There was no need to re-implement an R script in Java to get it out across the platform.
Read more…

eZ Publish: A CMS Framework with Open Source in Its DNA

Leading eZ Publish advocates look at what lies ahead for CMS programmers and users

by Meghan Blanchette | June 5, 2013

ez-publish-1

There are a variety of options when it comes to content management. We’ve explored Drupal a bit, and in this email interview I talked to some folks who work with eZ Publish. It is an open source (with commercial options) CMS written in PHP. Brandon Chambers and Greg McAvoy-Jensen talk about how the platform acts as a content management framework, how being open source has affected the project, and what we should expect to see coming up for CMS in general.

Brandon Chambers is a Senior Developer at Granite Horizon, an eZ Publish integrator. He has 14 years of web development experience focused on open source technologies such as PHP, MySQL, Python, Java, Android, HTML, JavaScript, AJAX, CSS and XML.

Greg McAvoy-Jensen is a member of the eZ Publish Community Project Board. He also founded and is the CEO of Granite Horizon, and has been developing with eZ Publish since 2002.

Q: What problems does eZ Publish solve for users?

A: eZ Publish grew up not just as a CMS, but as a content management framework. It sports a flexible and object-oriented content model (an important early decision), and provides developers an MVC framework as a platform for building complex web applications and extending the CMS. Like any CMS it makes content publishing accessible for the non-programmer, and provides an easy editorial interface. eZ Publish does a fine job of separating content from presentation and providing reusability and multi-channel delivery. It targets the enterprise more than smaller organizations, so the software quality remains pegged at high standards, and high degrees of flexibility and extensibility continue to be required.

Q: How you feel being open source has affected the project?

A: Fourteen years on, eZ Systems is still firm that open source is in its DNA. This foundational commitment created a culture of sharing, and it attracts developers who prefer to share their code and to collaborate with others outside their organization for the benefit of their customers. Contributions flow in as both extensions and core code pull requests. The commercial open source model, similar to Red Hat’s, means the vendor takes primary responsibility for code maintenance and development, and derives its profit from support subscriptions, while leaving customizations to its network of certified partners. Because the source is open, organizations evaluating the software can have their developers compare the code of, for example, eZ Publish and Drupal, and make their own determinations. This, in turn, keeps the vendor accountable for the code: eZ engineers program knowing full well that the world can see their work.

Q: What distinguishes eZ Publish from other CMS options?

A: While there may be a thousand or so CMS’s around, analysts typically look at something more like 30 that are important today. eZ Publish fits into that group, most recently by inclusion on Gartner’s Magic Quadrant beginning in 2011. Not all open source CMS’s have a vendor behind them who both provides support and has full control over the code, a level of accountability required in enterprise applications. eZ is a great fit for particularly complex implementations, or situations where there is no assurance that future needs will be simple. And despite the complex customizations developers do with eZ Publish, they rarely interfere with upgrades.

eZ’s engineers recently became dissatisfied with the merely vast degree of flexibility they had built into the MVC framework, so they’ve now moved the whole system on top of the Symfony PHP framework. eZ Publish is now a native Symfony application, the only CMS to utilize Symfony’s full stack. This leverages the great speed and excellent libraries Symfony provides, and makes eZ easier to learn by those who are familiar with Symfony. Some CMS’s require many plug-ins just to get a basic feature set going on a site, but eZ Publish has long included granular security, content versioning, multi-language support, multi-channel/multi-site capability, workflows, and the like as part of the kernel.
Read more…

Four short links: 27 May 2013

Search API, Cyberwar=Cyberbollocks, 4k Magic, and Geoparsing

by Nat Torkington | @gnat | +Nat Torkington | May 27, 2013

techu Search Server — Techu exposes a RESTful API for realtime indexing and searching with the Sphinx full-text search engine. We leverage Redis, Nginx and the Python Django framework to make searching easy to handle & flexible.
In Defence of Digital Freedom — a member of the European Parliament’s piece on the risks to our online freedoms caused by framing computer security into cyberwarfare. Digital freedoms and fundamental rights need to be enforced, and not eroded in the face of vulnerabilities, attacks, and repression. In order to do so, essential and difficult questions on the implementation of the rule of law, historically place-bound by jurisdiction rooted in the nation-state, in the context of a globally connected world, need to be addressed. This is a matter for the EU as a global player, and should involve all of society. (via BoingBoing)
Inside a 4k Demo — what it’s like to write an amazing demo with only 4k of code. (via Nelson Minar)
CLAVIN — open source (Apache2) Java library for document geotagging and geoparsing that employs context-based geographic entity resolution. (via Pete Warden)

From JavaScript to Declarative Markup

Evolving enhancements for web developers

by Simon St. Laurent | @simonstl | +Simon St. Laurent | May 22, 2013

Web architecture separates structured content (markup), presentation (style), and behavior (JavaScript). As recently as a decade ago, many developers worked in all three, but the years since Ajax arrived have brought more specialization. The rise of JavaScript in particular has led to approaches that have JavaScript carry the load. In the past few weeks, I’ve been delighted to see work that suggests a different path forward, one that takes greater advantage of the flexibility markup offers.

Last week, at the Artifact Conference, I was delighted to return to the world of web designers. The crowd was full of people who know very well what JavaScript can add to their site and how they want to include it, but who don’t focus on it. JavaScript is just a tool, often even a tool wielded later in the process after the basic framing of the site is complete.

Read more…

Exploring Hypermedia with Mike Amundsen

The Web Can Teach the Enterprise

by Simon St. Laurent | @simonstl | +Simon St. Laurent | May 16, 2013

The Web’s flexibility has helped it to survive and thrive, pushing well beyond the browser-based universe where it first showed its promise. While I’ve spent most of my time working with the HTML/CSS/JavaScript side, the HTTP side of the original HTML+HTTP combination that built the Web is perhaps even more flexible.

I enjoyed talking with Mike Amundsen, Principal API Architect at Layer 7 Technologies, who has spent much of his recent career encouraging enterprise customers to shift toward web architectures. While REST has emerged over the past decade to eclipse SOAP-based “web services”, Amundsen has eagerly promoted the next step beyond the simple CRUD-based model of early REST work: hypermedia.

Our conversation ranged from the history and foundations of REST through the many ways to integrate that work with existing enterprise practice to a glimpse at what the future might hold for frameworks, design, and architecture.

REST as enterprise architectures principles applied to hypermedia (at 1:57)
Transitioning from RPC-based models to hypermedia, by including additional information in response. (at 3:00)
The value of opinionated message formats and eventual integration into opinionated frameworks. (at 5:51)
Shifting from shared understandings of object models to messages. (at 8:50)
“Enough coupling, but not too much” to allow mixing of technologies. (at 11:15)
Human negotiation, HTTP negotiation, and responsive web design (at 14:20)

Read more…

A Matter of Semantics

Structuring client-server communications with hypermedia messages

by Mike Amundsen | May 16, 2013

Messages on the Web carry three levels of information: Structure Semantics, Protocol Semantics, and Application Semantics. No matter the implementation style, all three of these are needed for any successful communication between client and server. This threesome (S-P-A) forms the essentials of communication over distributed networks.

Most of the time, though, these levels are obscured or muddled at implementation time. For example, both Protocol Semantics (how we create valid network requests) and Application Semantics (domain names like users, customers, orders, etc) are often mixed together in conversation ("You POST new users to this URL") and both of these are usually only defined in human-readable documentation and implemented in the source code of the client application itself. In other words, the protocol-level and application-level semantics are tightly coupled. An easy way to discover this is to see if you can take the same message format and implement your API using a protocol other than HTTP (e.g. WebSockets or FTP). I illustrated this "protocol-agnostic" design pattern back in 2010 ("A RESTful Hypermedia API in Three Easy Steps").

But there is a way to keep these separate from each other and view each of these aspects in their own light. In doing so, you’ll strengthen the quality and value of your message design while increasing flexibility and choices.

Structure Semantics

Structure Semantics provides the set of rules regarding how to create a well-formed message. XML has rather simple structure semantics. JSON's rules for well-formedness are a bit more vague but reachable since JSON.parse(…) turns out to be the ultimate arbiter of such things. Determining well-formedness of other, more complex media types (HTML, Atom, HAL, Collection+JSON) is tougher, but do-able even if external validators are not always available.

Building a successful web implementation that does not contain structure semantics is difficult—and that's a good thing.

Read more…