"graphs" entries

Four short links: 25 August 2015

Four short links: 25 August 2015

Microservices Anti-Patterns, Reverse Engineering Course, Graph Language, and Automation Research

  1. Seven Microservices Anti-PatternsOne common mistake people made with SOA was misunderstanding how to achieve the reusability of services. Teams mostly focused on technical cohesion rather than functional regarding reusability. For example, several services functioned as a data access layer (ORM) to expose tables as services; they thought it would be highly reusable. This created an artificial physical layer managed by a horizontal team, which caused delivery dependency. Any service created should be highly autonomous – meaning independent of each other.
  2. CSCI 4974 / 6974 Hardware Reverse Engineering — RPI CS course in reverse engineering.
  3. The Gremlin Graph Traversal Language (Slideshare) — preso on a language for navigating graph data structures, which is part of the Apache TinkerPop (“Open Source Graph Computing”) suite.
  4. Why Are There Still So Many Jobs? The History and Future of Workplace Automation (PDF) — paper about the history of technology and labour. The issue is not that middle-class workers are doomed by automation and technology, but instead that human capital investment must be at the heart of any long-term strategy for producing skills that are complemented by rather than substituted for by technological change. Found via Scott Santens’s comprehensive rebuttal.
Four short links: 10 April 2015

Four short links: 10 April 2015

Graph Algorithm, Touchy Robots, Python Bolt-Ons, and Building Data Products

  1. Exact Maximum Clique for Large or Massive Real Graphs — explanation of how BBMCSP works.
  2. Giving Robots and Prostheses the Human Touchthe team, led by mechanical engineer Veronica J. Santos, is constructing a language of touch that both a computer and a human can understand. The researchers are quantifying this with mechanical touch sensors that interact with objects of various shapes, sizes, and textures. Using an array of instrumentation, Santos’ team is able to translate that interaction into data a computer can understand. The data is used to create a formula or algorithm that gives the computer the ability to identify patterns among the items it has in its library of experiences and something it has never felt before. This research will help the team develop artificial haptic intelligence, which is, essentially, giving robots, as well as prostheses, the “human touch.”
  3. boltons — things in Python that should have been builtins.
  4. Everything We Wish We’d Known About Building Data Products (DJ Patil and RusJan Belkin) — Data is super messy, and data cleanup will always be literally 80% of the work. In other words, data is the problem. […] “If you’re not thinking about how to keep your data clean from the very beginning, you’re fucked. I guarantee it.” […] “Every single company I’ve worked at and talked to has the same problem without a single exception so far — poor data quality, especially tracking data,” he says.“Either there’s incomplete data, missing tracking data, duplicative tracking data.” To solve this problem, you must invest a ton of time and energy monitoring data quality. You need to monitor and alert as carefully as you monitor site SLAs. You need to treat data quality bugs as more than a first priority. Don’t be afraid to fail a deploy if you detect data quality issues.

There are many use cases for graph databases and analytics

Business users are becoming more comfortable with graph analytics.

GraphLab graphThe rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected devices by 2020 — one can imagine applications that depend on data stored in graphs with many more nodes and edges than the ones currently maintained by social media companies.

This means that researchers and companies will need to produce real-time tools and techniques that scale to much larger graphs (measured in terms of nodes & edges). I previously listed tools for tapping into graph data, and I continue to track improvements in accessibility, scalability, and performance. For example, at the just-concluded Spark Summit, it was apparent that GraphX remains a high-priority project within the Spark1 ecosystem.

Read more…

Network Science dashboards

Networks graphs can be used as primary visual objects with conventional charts used to supply detailed views

With Network Science well on its way to being an established academic discipline, we’re beginning to see tools that leverage it. Applications that draw heavily from this discipline make heavy use of visual representations and come with interfaces aimed at business users. For business analysts used to consuming bar and line charts, network visualizations take some getting used. But with enough practice, and for the right set of problems, they are an effective visualization model.

In many domains, networks graphs can be the primary visual objects with conventional charts used to supply detailed views. I recently got a preview of some dashboards built using Financial Network Analytics (FNA). Read more…

Big Data solutions through the combination of tools

Applications get easier to build as packaged combinations of open source tools become available

As a user who tends to mix-and-match many different tools, not having to deal with configuring and assembling a suite of tools is a big win. So I’m really liking the recent trend towards more integrated and packaged solutions. A recent example is the relaunch of Cloudera’s Enterprise Data hub, to include Spark1 and Spark Streaming. Users benefit by gaining automatic access to analytic engines that come with Spark2. Besides simplifying things for data scientists and data engineers, easy access to analytic engines is critical for streamlining the creation of big data applications.

Another recent example is Dendrite3 – an interesting new graph analysis solution from Lab41. It combines Titan (a distributed graph database), GraphLab (for graph analytics), and a front-end that leverages AngularJS, into a Graph exploration and analysis tool for business analysts:

Smiley face

Read more…

Four short links: 5 February 2014

Four short links: 5 February 2014

Graph Drawing, DARPA Open Source, Quantified Vehicle, and IoT Growth

  1. sigma.js — Javascript graph-drawing library (node-edge graphs, not charts).
  2. DARPA Open Catalog — all the open source published by DARPA. Sweet!
  3. Quantified Vehicle Meetup — Boston meetup around intelligent automotive tech including on-board diagnostics, protocols, APIs, analytics, telematics, apps, software and devices.
  4. AT&T See Future In Industrial Internet — partnering with GE, M2M-related customers increased by more than 38% last year. (via Jim Stogdill)

Semi-automatic method for grading a million homework assignments

Organize solutions into clusters and “force multiply” feedback provided by instructors

One of the hardest things about teaching a large class is grading exams and homework assignments. In my teaching days a “large class” was only in the few hundreds (still a challenge for the TAs and instructor). But in the age of MOOCs, classes with a few (hundred) thousand students aren’t unusual.

Researchers at Stanford recently combed through over one million homework submissions from a large MOOC class offered in 2011. Students in the machine-learning course submitted programming code for assignments that consisted of several small programs (the typical submission was about 16 lines of code). While over 120,000 enrolled only about 10,000 students completed all homework assignments (about 25,000 submitted at least one assignment).

The researchers were interested in figuring out ways to ease the burden of grading the large volume of homework submissions. The premise was that by sufficiently organizing the “space of possible solutions”, instructors would provide feedback to a few submissions, and their feedback could then be propagated to the rest.

Read more…

Four short links: 26 June 2013

Four short links: 26 June 2013

Neural Memory Allocation, DoD Synthbio, Sierra Leone Makers, and Complex Humanities Networks

  1. Memory Allocation in Brains (PDF) — The results reviewed here suggest that there are competitive mechanisms that affect memory allocation. For example, new dentate gyrus neurons, amygdala cells with higher excitability, and synapses near previously potentiated synapses seem to have the competitive edge over other cells and synapses and thus affect memory allocation with time scales of weeks, hours, and minutes. Are all memory allocation mechanisms competitive, or are there mechanisms of memory allocation that do not involve competition? Even though it is difficult to resolve this question at the current time, it is important to note that most mechanisms of memory allocation in computers do not involve competition. Does the dissector use a slab allocator? Tip your waiter, try the veal.
  2. Living Foundries (DARPA) — one motivating, widespread and currently intractable problem is that of corrosion/materials degradation. The DoD must operate in all environments, including some of the most corrosively aggressive on Earth, and do so with increasingly complex heterogeneous materials systems. This multifaceted and ubiquitous problem costs the DoD approximately $23 Billion per year. The ability to truly program and engineer biology, would enable the capability to design and engineer systems to rapidly and dynamically prevent, seek out, identify and repair corrosion/materials degradation. (via Motley Fool)
  3. Innovate Salone — finalists from a Sierra Leone maker/innovation contest. Part of David Sengeh‘s excellent work.
  4. Arts, Humanities, and Complex Networks — ebook series, conferences, talks, on network analysis in the humanities. Everything from Protestant letter networks in the reign of Mary, to the repertory of 16th century polyphony, to a data-driven update to Alfred Barr’s diagram of cubism and abstract art (original here).

Maps not lists: network graphs for data exploration

Preview of upcoming Strata session on data exploration

Amy Heineike is Director of Mathematics for Quid Inc, where she has been since its inception, prototyping and launching the company’s technology for analyzing document sets. Below is the teaser for her upcoming talk at Strata Santa Clara.

I recently discovered that my favorite map is online. It used to hang on my housemate’s wall in our little house in London back in 2005. At the time I was working to understand how London was evolving and changing, and how different policy or infrastructure changes (a new tube line, land use policy changes) would impact that.

The map was originally published as a center-page pull out from the Guardian, showing the ethnic groups that dominate different neighborhoods across the city. The legend was as long as the image, and the small print labels necessitated standing up close, peering and reading, tracing your finger to discover the Congolese on the West Green Road, our neighbors the Portuguese on the Stockwell Road, or the Tamils in Chessington in the distant south west.

Read more…

Four short links: 4 July 2012

Four short links: 4 July 2012

Inside Anonymous, Kanban Board, Extending Objective C, and Football Graphs

  1. How Anonymous Works (Wired) — Quinn Norton explains how the decentralized Anonymous operates, and how the transition to political activism happened. Required reading to understand post-state post-structure organisations, and to make sense of this chaotic unpredictable entity.
  2. Kanban For 1 — very nice progress board for tasks, for the lifehackers who want to apply agile software tools to the rest of their life.
  3. libextobj (GitHub) — library of extensions to Objective C to support patterns from other languages. (via Ian Kallen)
  4. Graph Theory to Understood Football (Tech Review) — players are nodes, passes build edges, and you can see strengths and strategies of teams in the resulting graphs.