"open source" entries

Four short links: 11 August 2015

Real-time Sports Analytics, UI Regression Testing, AI vs. Charity, and Google's Data Pipeline Model

  1. Denver Broncos Testing In-Game Analytics — their newly hired director of analytics working with the coach. With Tanney nearby, Kubiak can receive a quick report on the statistical probabilities of almost any situation. Say that you have fourth-and-3 from the opponent’s 45-yard-line with four minutes to go. Do the large-sample-size percentages make the risk-reward ratio acceptable enough to go for it? Tanney’s analytics can provide insight to aid Kubiak’s decision-making. (via Flowing Data)
  2. Visual Review (GitHub) — Apache-licensed productive and human-friendly workflow for testing and reviewing your Web application’s layout for any regressions.
  3. Effective Altruism / Global AI (Vox) — fear of AI-run-amok (“existential risks”) contaminating a charity movement.
  4. The Dataflow Model (PDF) — Google Research paper presenting a model aimed at ease of use in building practical, massive-scale data processing pipelines.
Four short links: 4 August 2015

Data-Flow Graphing, Realtime Predictions, Robot Hotel, and Open-Source RE

  1. Data-flow Graphing in Python (Matt Keeter) — not shared because data-flow graphing is a sexy new hot topic that’s gonna set the world on fire (though, I bet that’d make Matt’s day), but because there are entire categories of engineering and operations migraines that are caused by not knowing where your data came from or goes to, when, how, and why. Remember Wirth’s “algorithms + data structures = programs”? Data flows seem like a different slice of “programs.” Perhaps “data flow + typos = programs”?
  2. Machine Learning for Sports and Real-time Predictions (Robohub) — podcast interview for your commute. Real time is gold.
  3. Japan’s Robot Hotel is Serious Business (Engadget) — hotel was architected to suit robots: “For the porter robots, we designed the hotel to include wide paths.” Two paths slope around the hotel lobby: one inches up to the second floor, while another follows a gentle decline to guide first-floor guests (slowly, but with their baggage) all the way to their room. Makes sense: at Solid, I spoke to a chap working on robots for existing hotels, and there’s an entire engineering challenge in navigating an elevator that you wouldn’t believe.
  4. bokken — GUI to help open source reverse engineering for code.
Four short links: 3 August 2015

Engineering Management, Smartphone Holograms, Multi-Protocol Server, and Collaborative CS

  1. A Conversation with Michael Lopp: My job is to get myself out of a job. I’m aggressively pushing things I think I could be really good at and should actually maybe own to someone else who’s gonna get a B at it, but they’re gonna get the opportunity to go do that. […] Delegation is helping someone else to learn. I’m all about the humans. If I don’t have happy, productive, growing engineers, I have exactly no job. That investment in the growth, in the happiness, the engineers being productive, that’s like my primary job.
  2. 3D Hologram Projector for Smartphone (BoingBoing) — is in hardware hack stage now, but OKYOUWIN maybe it’s the future.
  3. serve2d: serve2 allows you to serve multiple protocols on a single socket. Example handlers include proxy, HTTP, TLS (through which HTTPS is handled), ECHO and DISCARD. More can easily be added, as long as the protocol sends some data that can be recognized. The proxy handler allows you to redirect the connection to external services, such as OpenSSH or Nginx, in case you don’t want or can’t use a Go implementation.
  4. GitXiv: In recent years, a highly interesting pattern has emerged: Computer scientists release new research findings on arXiv and just days later, developers release an open-source implementation on GitHub. This pattern is immensely powerful. One could call it collaborative open computer science (COCS). GitXiv is a space to share collaborative open computer science projects. Countless Github and arXiv links are floating around the Web. It’s hard to keep track of these gems. GitXiv attempts to solve this problem by offering a collaboratively curated feed of projects. Each project is conveniently presented as arXiv + Github + Links + Discussion.
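
The serve2d entry above relies on each protocol sending "some data that can be recognized." A minimal Python sketch of that idea follows. It is not serve2's actual API (serve2 is written in Go); the function name `detect_protocol` and the set of recognized protocols are assumptions made for illustration. It inspects the first bytes a client sends and guesses which handler should take the connection.

```python
# Hypothetical helper: guess a protocol from the first bytes a client sends.
# This only illustrates the "recognize the opening bytes" idea; it is not
# how serve2 is implemented.

HTTP_METHODS = (
    b"GET ", b"POST ", b"PUT ", b"HEAD ",
    b"DELETE ", b"OPTIONS ", b"PATCH ",
)


def detect_protocol(first_bytes: bytes) -> str:
    # A TLS handshake record starts with content type 0x16, followed by a
    # protocol version whose major byte is 0x03.
    if len(first_bytes) >= 3 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"
    # SSH peers open with an identification string beginning "SSH-".
    if first_bytes.startswith(b"SSH-"):
        return "ssh"
    # An HTTP/1.x request line starts with a method name and a space.
    if first_bytes.startswith(HTTP_METHODS):
        return "http"
    return "unknown"


if __name__ == "__main__":
    print(detect_protocol(b"GET / HTTP/1.1\r\n"))
    print(detect_protocol(b"SSH-2.0-client\r\n"))
```

A real multiplexer would peek at the socket without consuming the bytes, then hand the connection to the matching handler or proxy it to an external service.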

Big data, interactive access: How Apache Drill makes it easy

True SQL queries? Yes. Parquet and other complex data structures? Yes. Drill 1.1 is full of surprises.


Register for the free webcast “Easy, real-time access to data with Apache Drill,” which will be held Thursday, July 30, 2015, at 10 a.m. PT. This panel discussion will explore the major role SQL-on-Hadoop technologies play in organizations.

Big data techniques are becoming mainstream in an increasing number of businesses, but how do people get self-service, interactive access to their big data? And how do they do this without having to train their SQL-literate employees to be advanced developers?

One solution is to take advantage of the rapidly maturing open source, open community software tool known as Apache Drill. Drill is not the first SQL-on-Hadoop tool. It is, however, a new and very sophisticated highly scalable SQL query engine that has been built from the ground up to be appropriate for use even in production settings. Drill extends query capabilities to a variety of new data sources and formats without the requirement for IT intervention that might be expected from a SQL query engine. In short, Drill allows self-exploration of data by providing flexibility along with performance.

As capabilities in the big data world have progressed, our understanding of what is needed for high-performance, enterprise-grade architectures has also increased. A need for a SQL solution for the Hadoop and NoSQL space was recognized fairly early, and it’s not surprising that to meet an urgent need, some of the first tools approached the problem with SQL-like syntax and made compromises that led to limitations in the data sources and formats they could handle well. Read more…

Four short links: 24 July 2015

Artificial Compound Eye, Google Patent Licensing, Monitoring and Alerting, Computer-Aided Inference

  1. A New Artificial Compound Eye (Robohub) — three hexagonal photodetectors arranged in a triangular shape, underneath a single lens. These photodetectors work together and combine perceived changes in structured light (optic flow) to present a 3D image that shows what is moving in the scene, and in which direction the movement is happening.
  2. Google’s Defensive Patent Initiative (TechCrunch) — good article, despite TechCrunch origin. Two-tiered program: give away groups of patents to startups with $500k-$20M in revenue, and sell patents to startups.
  3. Bosun: an open-source, MIT-licensed monitoring and alerting system by Stack Exchange.
  4. The Rise of Computer-Aided Explanation (Michael Nielsen) — Hod Lipson of Columbia University. Lipson and his collaborators have developed algorithms that, when given a raw data set describing observations of a mechanical system, will actually work backward to infer the “laws of nature” underlying those data. (Paper)
Four short links: 23 July 2015

Open Source, State of DevOps, History of Links, and Vote Rings

  1. The Future of Open Source (Allison Randal) — Inexperienced companies can cause a great deal of harm as they blunder around blindly in a collaborative project, throwing resources in ways that ultimately benefit no one, not even themselves. It is in our best interest as a community to actively engage with companies and teach them how to participate effectively, how to succeed at free software and open source. Their success feeds the success of free software and open source, which feeds the self-reinforcing cycle of accelerating software innovation.
  2. Puppet Labs’ State of DevOps Report (PDF) — Westrum’s model gives us the language to define and measure culture. Perhaps most interesting, Westrum’s model also predicts IT performance. This shows that information flow isn’t just essential to safety, it’s also a critical success factor for rapidly building and evolving resilient systems at scale.
  3. Beyond Conversation — tracing the history of the link from Memex to Web.
  4. Detecting Vote Rings in Product Hunt — worth implementing in every system that processes votes. Who are the jerks in a circle?
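
As a rough illustration of the vote-ring item above, the Python sketch below flags pairs of users who repeatedly upvote each other. This is not the method from the linked post; the function name `reciprocal_pairs`, the `(voter, author)` input shape, and the `min_each_way` threshold are all assumptions chosen to keep the example small.

```python
from collections import Counter


def reciprocal_pairs(votes, min_each_way=2):
    """Return sorted pairs of users who upvoted each other's posts.

    votes: iterable of (voter, author) tuples (hypothetical input format).
    min_each_way: minimum number of votes required in each direction.
    """
    # Count votes per direction, ignoring self-votes.
    counts = Counter((v, a) for v, a in votes if v != a)
    pairs = set()
    for (voter, author), n in counts.items():
        # Flag the pair only if both directions meet the threshold.
        if n >= min_each_way and counts.get((author, voter), 0) >= min_each_way:
            pairs.add(tuple(sorted((voter, author))))
    return sorted(pairs)


if __name__ == "__main__":
    sample = [
        ("ann", "bob"), ("ann", "bob"),
        ("bob", "ann"), ("bob", "ann"),
        ("cat", "ann"), ("ann", "cat"),
    ]
    print(reciprocal_pairs(sample))
```

Pairwise reciprocity is only a first pass. Larger rings, where A votes for B, B for C, and C for A, would need a cycle or strongly-connected-component search over the same vote graph.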

Signals from OSCON 2015

From Pluto flybys to open source in the enterprise to engineering the future, here are key highlights from OSCON 2015.

Experts and advocates from across the open source world assembled in Portland, Ore., this week for OSCON 2015. Below you’ll find a handful of keynotes and interviews from the event that we found particularly notable.

Cracking open the IoT

In an interview at OSCON, Alasdair Allan, director at Babilim Light Industries, talked about the data coming out of the New Horizons Pluto flyby, the future of “personal space programs,” and the significance of Bluetooth LE to the Internet of Things:

Now that all the smartphones have Bluetooth LE — or at least the modern ones, there is a very easy way to produce low-power devices (wearables, embedded sensors) that anyone can access with a smartphone. … It’s a real lever to drive the Internet of Things forward, and you’re seeing a lot of the progress in the Internet of Things, a lot of the innovation, is happening — especially in Kickstarter — around BLE devices.

Read more…

Four short links: 21 July 2015

Web Future, GCE vs Amazon, Scammy eBooks, and Container Clusters

  1. Web Design: The First 100 Years (Maciej Ceglowski) — There’s a William Gibson quote that Tim O’Reilly likes to repeat: “the future is here; it’s just not evenly distributed yet.” O’Reilly takes this to mean that if we surround ourselves with the right people, it can give us a sneak peek at coming attractions. I like to interpret this quote differently, as a call to action. Rather than waiting passively for technology to change the world, let’s see how much we can do with what we already have. Let’s reclaim the Web from technologists who tell us that the future they’ve imagined is inevitable, and that our role in it is as consumers.
  2. Comparing Cassandra Write Performance on Google Compute Engine and AWS: tl;dr – We achieved better Cassandra performance on GCE vs. Amazon, at close to half the cost. Also interesting for how they built the benchmark.
  3. The Scammy Underground World of Kindle eBooks: The biggest issue here isn’t that scammers are raking in cash from low-quality content; it’s that Amazon is allowing this to happen. Publisher brand value is the reliable expectation that buyers have of the book quality. Amazon’s publishing arm is spending the good brand value built by its distribution arm.
  4. Empire: a 12-factor-compatible, Docker-based container cluster built on top of Amazon’s robust EC2 Container Service (ECS), complete with a full-featured command line interface. Open source.

Big data, small cluster

Finding new ways to shrink disk space for storing partitionable data.


Register for the free webcast, “Extending Cassandra with Doradus OLAP for High Performance Analytics,” which will be held July 29 at 9 a.m. PT.

Engineers at Dell were developing customer apps when they found that the query response times their customers were demanding — something on the order of seconds (in other words, the need to scan millions of objects/second) — required a new type of query engine. This led them on a four-year journey to create Doradus, one of Dell Software Group’s first open-source projects.

Doradus is a server framework that runs on top of Cassandra. To build Doradus, the team borrowed from several well-accepted paradigms. They used traditional OLAP techniques to allow data to be arranged into static, multidimensional cubes. They leveraged the vertical orientation and efficient compression of columnar databases. And, from the NoSQL world, they employed sharding. The result: a storage and query engine called Doradus OLAP that stores data up to 1M objects/second/node, providing nearly real-time data warehousing. This architecture also allows for extreme compression of the data, sometimes producing up to a 99% reduction in space usage.
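
To see why a columnar layout can compress so well, consider run-length encoding, shown in the Python sketch below. This is a generic textbook technique picked for illustration and is not taken from the Doradus source; the function names are hypothetical. When one field's values are stored together, long runs of repeated values collapse into `(value, count)` pairs.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeats in a column into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


if __name__ == "__main__":
    # A hypothetical "country" column as it might appear within one shard.
    country = ["US"] * 4 + ["DE"] * 2 + ["US"]
    encoded = run_length_encode(country)
    print(encoded)
    print(run_length_decode(encoded) == country)
```

Row-oriented storage interleaves unrelated fields, so such runs rarely form. Grouping a column's values together, and sharding so that similar records sit side by side, is what makes this kind of encoding pay off.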

This extremely dense storage means that data that once took multiple nodes can now be stored on a single node, allowing for fast queries without the expense of a large cluster. Because Doradus is built on top of Cassandra, the option to scale out is still there. This allows for sharding and replication, and also takes advantage of Cassandra’s failover features. Read more…

Four short links: 20 July 2015

Less Spam, Down on Dropdowns, Questioning Provable Security, and Crafting Packets

  1. Spam Under Half of Email (PDF) — Symantec report: There is good news this month on the email-based front of the threat landscape. According to our metrics, the overall spam rate has dropped to 49.7%. This is the first time this rate has fallen below 50% of email for over a decade. The last time Symantec recorded a similar spam rate was clear back in September of 2003.
  2. Dropdowns Should be the UI of Last Resort (Luke Wroblewski) — Well-designed forms make use of the most appropriate input control for each question they ask. Sometimes that’s a stepper, a radio group, or even a dropdown menu. But because they are hard to navigate, hide options by default, don’t support hierarchies, and only enable selection not editing, dropdowns shouldn’t be the first UI control you reach for. In today’s software designs, they often are. So instead, consider other input controls first and save the dropdown as a last resort.
  3. Another Look at Provable Security: In our time, one of the dominant paradigms in cryptographic research goes by the name “provable security.” This is the notion that the best (or, some would say, the only) way to have confidence in the security of a cryptographic protocol is to have a mathematically rigorous theorem that establishes some sort of guarantee of security (defined in a suitable way) under certain conditions and given certain assumptions. The purpose of this website is to encourage the emergence of a more skeptical and less credulous attitude toward this notion and to contribute to a process of critical analysis of the positive and negative features of the “provable security” paradigm.
  4. Pig (github) — a Linux packet crafting tool. You can use Pig to test your IDS/IPS among other stuffs.