- Exact Maximum Clique for Large or Massive Real Graphs — explanation of how BBMCSP works.
- Giving Robots and Prostheses the Human Touch — the team, led by mechanical engineer Veronica J. Santos, is constructing a language of touch that both a computer and a human can understand. The researchers are quantifying this with mechanical touch sensors that interact with objects of various shapes, sizes, and textures. Using an array of instrumentation, Santos’ team is able to translate that interaction into data a computer can understand. The data is used to create a formula or algorithm that gives the computer the ability to identify patterns among the items it has in its library of experiences and something it has never felt before. This research will help the team develop artificial haptic intelligence, which is, essentially, giving robots, as well as prostheses, the “human touch.”
- boltons — things in Python that should have been builtins.
- Everything We Wish We’d Known About Building Data Products (DJ Patil and RusJan Belkin) — Data is super messy, and data cleanup will always be literally 80% of the work. In other words, data is the problem. […] “If you’re not thinking about how to keep your data clean from the very beginning, you’re fucked. I guarantee it.” […] “Every single company I’ve worked at and talked to has the same problem without a single exception so far — poor data quality, especially tracking data,” he says.“Either there’s incomplete data, missing tracking data, duplicative tracking data.” To solve this problem, you must invest a ton of time and energy monitoring data quality. You need to monitor and alert as carefully as you monitor site SLAs. You need to treat data quality bugs as more than a first priority. Don’t be afraid to fail a deploy if you detect data quality issues.
Business users are becoming more comfortable with graph analytics.
The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected devices by 2020 — one can imagine applications that depend on data stored in graphs with many more nodes and edges than the ones currently maintained by social media companies.
This means that researchers and companies will need to produce real-time tools and techniques that scale to much larger graphs (measured in terms of nodes & edges). I previously listed tools for tapping into graph data, and I continue to track improvements in accessibility, scalability, and performance. For example, at the just-concluded Spark Summit, it was apparent that GraphX remains a high-priority project within the Spark1 ecosystem.
Networks graphs can be used as primary visual objects with conventional charts used to supply detailed views
With Network Science well on its way to being an established academic discipline, we’re beginning to see tools that leverage it. Applications that draw heavily from this discipline make heavy use of visual representations and come with interfaces aimed at business users. For business analysts used to consuming bar and line charts, network visualizations take some getting used. But with enough practice, and for the right set of problems, they are an effective visualization model.
In many domains, networks graphs can be the primary visual objects with conventional charts used to supply detailed views. I recently got a preview of some dashboards built using Financial Network Analytics (FNA). Read more…
Applications get easier to build as packaged combinations of open source tools become available
As a user who tends to mix-and-match many different tools, not having to deal with configuring and assembling a suite of tools is a big win. So I’m really liking the recent trend towards more integrated and packaged solutions. A recent example is the relaunch of Cloudera’s Enterprise Data hub, to include Spark1 and Spark Streaming. Users benefit by gaining automatic access to analytic engines that come with Spark2. Besides simplifying things for data scientists and data engineers, easy access to analytic engines is critical for streamlining the creation of big data applications.
Another recent example is Dendrite3 – an interesting new graph analysis solution from Lab41. It combines Titan (a distributed graph database), GraphLab (for graph analytics), and a front-end that leverages AngularJS, into a Graph exploration and analysis tool for business analysts:
Organize solutions into clusters and “force multiply” feedback provided by instructors
One of the hardest things about teaching a large class is grading exams and homework assignments. In my teaching days a “large class” was only in the few hundreds (still a challenge for the TAs and instructor). But in the age of MOOCs, classes with a few (hundred) thousand students aren’t unusual.
Researchers at Stanford recently combed through over one million homework submissions from a large MOOC class offered in 2011. Students in the machine-learning course submitted programming code for assignments that consisted of several small programs (the typical submission was about 16 lines of code). While over 120,000 enrolled only about 10,000 students completed all homework assignments (about 25,000 submitted at least one assignment).
The researchers were interested in figuring out ways to ease the burden of grading the large volume of homework submissions. The premise was that by sufficiently organizing the “space of possible solutions”, instructors would provide feedback to a few submissions, and their feedback could then be propagated to the rest.
Neural Memory Allocation, DoD Synthbio, Sierra Leone Makers, and Complex Humanities Networks
- Memory Allocation in Brains (PDF) — The results reviewed here suggest that there are competitive mechanisms that affect memory allocation. For example, new dentate gyrus neurons, amygdala cells with higher excitability, and synapses near previously potentiated synapses seem to have the competitive edge over other cells and synapses and thus affect memory allocation with time scales of weeks, hours, and minutes. Are all memory allocation mechanisms competitive, or are there mechanisms of memory allocation that do not involve competition? Even though it is difficult to resolve this question at the current time, it is important to note that most mechanisms of memory allocation in computers do not involve competition. Does the dissector use a slab allocator? Tip your waiter, try the veal.
- Living Foundries (DARPA) — one motivating, widespread and currently intractable problem is that of corrosion/materials degradation. The DoD must operate in all environments, including some of the most corrosively aggressive on Earth, and do so with increasingly complex heterogeneous materials systems. This multifaceted and ubiquitous problem costs the DoD approximately $23 Billion per year. The ability to truly program and engineer biology, would enable the capability to design and engineer systems to rapidly and dynamically prevent, seek out, identify and repair corrosion/materials degradation. (via Motley Fool)
- Innovate Salone — finalists from a Sierra Leone maker/innovation contest. Part of David Sengeh‘s excellent work.
- Arts, Humanities, and Complex Networks — ebook series, conferences, talks, on network analysis in the humanities. Everything from Protestant letter networks in the reign of Mary, to the repertory of 16th century polyphony, to a data-driven update to Alfred Barr’s diagram of cubism and abstract art (original here).
Preview of upcoming Strata session on data exploration
Amy Heineike is Director of Mathematics for Quid Inc, where she has been since its inception, prototyping and launching the company’s technology for analyzing document sets. Below is the teaser for her upcoming talk at Strata Santa Clara.
I recently discovered that my favorite map is online. It used to hang on my housemate’s wall in our little house in London back in 2005. At the time I was working to understand how London was evolving and changing, and how different policy or infrastructure changes (a new tube line, land use policy changes) would impact that.
The map was originally published as a center-page pull out from the Guardian, showing the ethnic groups that dominate different neighborhoods across the city. The legend was as long as the image, and the small print labels necessitated standing up close, peering and reading, tracing your finger to discover the Congolese on the West Green Road, our neighbors the Portuguese on the Stockwell Road, or the Tamils in Chessington in the distant south west.
Inside Anonymous, Kanban Board, Extending Objective C, and Football Graphs
- How Anonymous Works (Wired) — Quinn Norton explains how the decentralized Anonymous operates, and how the transition to political activism happened. Required reading to understand post-state post-structure organisations, and to make sense of this chaotic unpredictable entity.
- Kanban For 1 — very nice progress board for tasks, for the lifehackers who want to apply agile software tools to the rest of their life.
- libextobj (GitHub) — library of extensions to Objective C to support patterns from other languages. (via Ian Kallen)
- Graph Theory to Understood Football (Tech Review) — players are nodes, passes build edges, and you can see strengths and strategies of teams in the resulting graphs.
On Anonymous, Graph Database, Leap Second, and Debugging Creativity
- In Flawed, Epic Anonymous Book, the Abyss Gazes Back (Wired) — Quinn Norton’s review of a book about Anonymous is an excellent introduction to Anonymous. Anonymous made us, its mediafags, masters of hedging language. The bombastic claims and hyperbolic declarations must be reported from their mouths, not from our publications. And yet still we make mistakes and publish lies and assumptions that slip through. There is some of this in all of journalism, but in a world where nothing is true and everything is permitted, it’s a constant existential slog. It’s why there’s not many of us on this beat.
- Titan (GitHub) — Apache2-licensed distributed graph database optimized for storing and processing large-scale graphs within a multi-machine cluster. Cassandra and HBase backends, implements the Blueprints graph API. (via Hacker News)
- Extra Second This June — we’re getting a leap second this year: there’ll be 2012 June 30, 23h 59m 60s. Calendars are fun.
- On Creativity (Beta Knowledge) — I wanted to create a game where even the developers couldn’t see what was coming. Of course I wasn’t thinking about debugging at this point. The people who did the debugging asked me what was a bug. I could not answer that. — Keita Takahashi, game designer (Katamari Damacy, Noby Noby Boy). Awesome quote.