"Big Data" entries

Four short links: 4 November 2014

3D Shares, Autonomous Golf Carts, Competitive Solar, and Interesting Data Problems

  1. Cooper-Hewitt Shows How to Share 3D Scan Data Right (Makezine) — important as we move to a web of physical models, maps, and designs.
  2. Singapore Tests Autonomous Golfcarts (Robohub) — a reminder that the future may not necessarily look like someone used the clone tool to paint Silicon Valley over the world.
  3. Solar Hits Parity in 10 States, 47 by 2016 (Bloomberg) — The reason solar-power generation will increasingly dominate: it’s a technology, not a fuel. As such, efficiency increases and prices fall as time goes on. The price of Earth’s limited fossil fuels tends to go the other direction.
  4. Facebook’s Top Open Data Problems (Facebook Research) — even if you’re not interested in Facebook’s Very First World Problems, this is full of factoids like Facebook’s social graph store TAO, for example, provides access to tens of petabytes of data, but answers most queries by checking a single page in a single machine. (via Greg Linden)

Four short links: 3 November 2014

LittleBits Cloud, Big Data Futures, Predictable Robots, and New OS

  1. LittleBits Adds Functionality (MakeZine) — That next big idea might come from one of the latest bits in the littleBits catalog, the cloudBit. The piece enables wi-fi control of your circuit in various configurations — from the Internet to the bit, from the bit to the internet, or from bit to bit.
  2. Big Data’s Big Ideas (Ben Lorica) — this is a lot of what’s on the O’Reilly radar at the moment. Excellent short summary, with links.
  3. Rodney Brooks and Robotics (Boston Magazine) — [The robot] Baxter’s LCD eyes will look at the spot where it’s about to reach, making its movements, from a human perspective, more predictable. “If you want a machine to be able to interact with people,” Brooks says, “it better not do things that are surprising to people.”
  4. FUZIX — new open source OS from Alan Cox. Runs on Z80s, mostly runs on 6502s, and in theory if it’s got 8 bits and banked RAM you can probably run Fuzix OS on it. (via Alan Cox)

Big data’s big ideas

From cognitive augmentation to artificial intelligence, here's a look at the major forces shaping the data world.

Looking back at the evolution of our Strata events, and the data space in general, we marvel at the impressive data applications and tools now being employed by companies in many industries. Data is having an impact on business models and profitability. It’s hard to find a non-trivial application that doesn’t use data in a significant manner. Companies who use data and analytics to drive decision-making continue to outperform their peers.

Up until recently, access to big data tools and techniques required significant expertise. But tools have improved and communities have formed to share best practices. We’re particularly excited about solutions that target new data sets and data types. In an era when the requisite data skill sets cut across traditional disciplines, companies have also started to emphasize the importance of processes, culture, and people.

Four short links: 21 October 2014

Data Delusions, OS Robotics, Insecure Crypto, and Free Icons

  1. The Delusions of Big Data (IEEE) — When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.
  2. ROSCON 2014 — slides and videos of talks from Chicago open source robotics conference.
  3. Making Sure Crypto Stays Insecure (PDF) — Daniel J. Bernstein talk: This talk is actually a thought experiment: how could an attacker manipulate the ecosystem for insecurity?
  4. Material Design Icons — Google’s CC-licensed (attribution, sharealike) collection of sweet, straightforward icons.
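The IEEE piece's warning about hypotheses outgrowing the statistical strength of the data is easy to see in a small simulation. This is our own illustration, not something from the article: it correlates one pure-noise target against many pure-noise candidate features and counts how many clear an approximate p < 0.05 cutoff (|r| > 1.96/√n, a large-sample approximation). Every "hit" is a false positive, and roughly one in twenty hypotheses produces one, so the number of spurious findings grows with the number of hypotheses you test. The function names and default parameters are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def count_spurious_hits(n_obs=100, n_hypotheses=2000, seed=42):
    """Test many noise-only 'hypotheses' against a noise-only target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    # Approximate two-sided 5% cutoff for |r| when there is no real relationship.
    threshold = 1.96 / math.sqrt(n_obs)
    hits = 0
    for _ in range(n_hypotheses):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(feature, target)) > threshold:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 2000 pure-noise features look 'significant'")
```

Doubling `n_hypotheses` roughly doubles the false discoveries while the data itself carries no signal at all.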

Fast data fuels real-time streaming applications

A new report describes an imminent shift in real-time applications and the data architecture they require.

The era is here: we’re starting to see computers making decisions that people used to make, through a combination of historical and real-time data. These streams of data come together in applications that answer questions like:

  • What news items or ads is this website visitor likely to be interested in?
  • Is current network traffic part of a Distributed Denial of Service attack?
  • Should our banking site offer a visitor a special deal on a mortgage, based on her credit history?
  • What promotion will entice this gamer to stay on our site longer?
  • Is a particular part of the assembly line overheating, and does it need to be shut down?

Such decisions require the real-time collection of data from the particular user or device, along with others in the environment, and often need to be made on a per-person or per-event basis. For instance, leaderboarding (determining who is the top candidate among a group of users, based on some criteria) requires a database that tracks all the relevant users. Such a database nowadays often resides in memory.
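Leaderboarding is the one concrete mechanism the excerpt names, so here is a minimal in-memory sketch of it. It assumes scores arrive as individual (user, score) events and that only each user's best score counts; the `Leaderboard` class and its methods are hypothetical, not taken from the report.

```python
import heapq


class Leaderboard:
    """Hypothetical in-memory leaderboard: user id -> best score so far."""

    def __init__(self):
        self._scores = {}

    def record(self, user_id, score):
        # Events arrive one at a time; keep only each user's best score.
        current = self._scores.get(user_id)
        if current is None or score > current:
            self._scores[user_id] = score

    def top(self, k):
        # Scan all tracked users and return the k highest (user, score) pairs.
        return heapq.nlargest(k, self._scores.items(), key=lambda item: item[1])


board = Leaderboard()
for user, score in [("ana", 120), ("bo", 95), ("ana", 80), ("cy", 150)]:
    board.record(user, score)
print(board.top(2))  # [('cy', 150), ('ana', 120)]
```

Rescanning every user on each `top()` call is fine for a sketch; a real-time system would more likely maintain an ordered index (for example, a sorted set keyed by score) so the top entries can be read directly as events stream in.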

Big data’s move to the cloud

A new survey shows the market is ready for cloud-based big data services.

One night when our son was two years old, he abruptly decided that he didn’t like taking baths. As my wife recalls, he struggled mightily against the ritual of bathing for several months until, suddenly and mysteriously, he decided that he liked bathing again. We’re happy to report that he has managed to stay relatively clean ever since.

When I speak with CIOs and other IT leaders about moving big data operations into the cloud, I am reminded of our son’s unexplained loathing of the bathtub.

Nearly everyone associated with IT understands that most IT operations — including big data analytics — must eventually move into the cloud. The traditional on-premises approaches are simply too costly, and CIOs are under crushing pressure to shift budgetary resources to value-added, customer-facing activities.

For most companies, the writing is already on the wall. The cloud offers greater agility and elasticity, and quicker product development cycles — and can reduce costs. When you add up the benefits, it seems inevitable that the bulk of IT operations will move into the cloud. Nevertheless, the foot-dragging and excuse-making continues.

Four short links: 25 September 2014

Elevation Data, Soft Robots, Clean Data, and Security Souk

  1. NGA Releases Hi-Res Elevation Data — 30-meter topographic data for the world.
  2. Soft Robotics — a collection of shared resources to support the design, fabrication, modeling, characterization, and control of soft robotic devices. From Harvard.
  3. OpenGov — In many domains, it’s not so much about “big data” yet as it is about “clean data.”
  4. Mitnick’s Zero-Day Exploit Shop — marketplace connecting “corporate and government” buyers and sellers of zero-day exploits. Claims to vet buyers. Another hidden economy becoming public.

Four short links: 15 September 2014

Weird Machines, Libraries May Scan, Causal Effects, and Crappy Dashboards

  1. The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from the 29th Chaos Communication Congress, on tricking the ELF linker/loader into arbitrary computation from the metadata supplied. Yes, there’s a brainfuck compiler that turns code into metadata which is then, through a supernatural mix of pixies, steam engines, and binary, executed. This will make your brain leak. Weird machines are everywhere.
  2. European Libraries May Digitise Books Without Permission — “The right of libraries to communicate, by dedicated terminals, the works they hold in their collections would risk being rendered largely meaningless, or indeed ineffective, if they did not have an ancillary right to digitize the works in question,” the court said. Even if the rights holder offers a library the possibility of licensing his works on appropriate terms, the library can use the exception to publish works on electronic terminals, the court ruled. “Otherwise, the library could not realize its core mission or promote the public interest in promoting research and private study,” it said.
  3. CausalImpact (GitHub) — Google’s R package for estimating the causal effect of a designed intervention on a time series. (via Google Open Source Blog)
  4. Laws of Crappy Dashboards — (caution, NSFW language … “crappy” is my paraphrase) so true. Not talking to users will result in a [crappy] dashboard. You don’t know if the dashboard is going to be useful. But you don’t talk to the users to figure it out. Or you just show it to them for a minute (with someone else’s data), never giving them a chance to figure out what the hell they could do with it if you gave it to them.
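CausalImpact's own method fits a Bayesian structural time-series model, typically using control series that the intervention did not affect, and reports the gap between the observed data and the model's counterfactual prediction. The sketch below swaps in something much simpler to show only the counterfactual idea: ordinary least squares on a single control series over the pre-intervention period, with no uncertainty intervals. It is not CausalImpact's algorithm, and the function names and synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_effect(control, target, intervention_index):
    """Average post-period gap between the observed target and a
    counterfactual predicted from the control series (hypothetical helper)."""
    # Learn how the target tracked the control before the intervention.
    intercept, slope = fit_line(control[:intervention_index], target[:intervention_index])
    # Predict what the target would have been afterwards without the intervention.
    post_control = control[intervention_index:]
    post_target = target[intervention_index:]
    counterfactual = [intercept + slope * x for x in post_control]
    gaps = [y - c for y, c in zip(post_target, counterfactual)]
    return sum(gaps) / len(gaps)


# Synthetic data: target follows 5 + 2 * control, plus a lift of 10 from t = 40 on.
control = [10 + (t % 7) for t in range(60)]
target = [5 + 2 * c + (10 if t >= 40 else 0) for t, c in enumerate(control)]
print(estimate_effect(control, target, intervention_index=40))
```

Because the synthetic data is noise-free, the estimate recovers the injected lift; with real, noisy series you would also want the uncertainty intervals that the Bayesian approach provides.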

Four short links: 1 September 2014

Sibyl, Bitrot, Estimation, and ssh

  1. Sibyl: Google’s System for Large Scale Machine Learning (YouTube) — keynote at DSN2014 acting as an intro to Sibyl. (via KD Nuggets)
  2. Bitrot from 1997 — That’s 205 failures, an actual link rot figure of 91%, not 57%. That leaves only 21 URLs as 200 OK and containing effectively the same content.
  3. What We Do And Don’t Know About Software Effort Estimation — nice rundown of research in the field.
  4. fabric — simple yet powerful ssh library for Python.
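The 91% link rot figure can be checked against the two counts in the Bitrot excerpt, assuming the 205 failures and the 21 surviving URLs together make up the whole sample (the excerpt does not state the total):

$$
\text{link rot} = \frac{205}{205 + 21} = \frac{205}{226} \approx 0.907 \approx 91\%
$$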

Four short links: 27 August 2014

Discourse 1.0, Programmable Matter, Versioned Databases, and What Humans Learned About Machine Learning

  1. Discourse turns 1.0 — community/forum software that doesn’t suck.
  2. Programmable Matter (IEEE Spectrum) — recap of where research is going in this area.
  3. Liquibase — source control for your database. Apache 2.0 licensed.
  4. A Few Useful Things to Know About Machine Learning (PDF) — This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions. My fave: First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design.