"data" entries

Signals from Strata + Hadoop World in Barcelona 2014

From the Internet of Things to data-driven fashion, here are key insights from Strata + Hadoop World in Barcelona 2014.

Experts from across the big data world came together for Strata + Hadoop World in Barcelona 2014. We’ve gathered insights from the event below.

#IoTH: The Internet of Things and Humans

“If we could start over with these capabilities we have now, how would we do it differently?” Tim O’Reilly continues to explore data and the Internet of Things through the lens of human empowerment and the ability to “use technology to give people superpowers.”

Four short links: 14 November 2014

Completing Maps, ChatOps, Career Design, and Data Privacy

  1. Missing Maps Fill In the Blanks (New Scientist) — OpenStreetMap project to crowdmap slums around the world.
  2. Chatops — devops deployment chatter with Hubot.
  3. Alternatives to Tech Career Ladders — Spotify trying to figure out how to keep engineers challenged as they become more senior.
  4. Mozilla’s Data Privacy Principles — well-articulated and useful: without pre-defined principles, it’s so easy to accidentally collect or poorly protect data.

Four short links: 11 November 2014

High-Volume Logs, Regulated Broadband, Oculus Web, and Personal Data Vacuums

  1. Infrastructure for Data Streams — describing the high-volume log data use case for Apache Kafka, and how it plays out in storage and infrastructure.
  2. Obama: Treat Broadband and Mobile as Utility (Ars Technica) — In short, Obama is siding with consumer advocates who have lobbied for months in favor of reclassification while the telecommunications industry lobbied against it.
  3. MozVR — a website, and the tools that made it, designed to be seen through the Oculus Rift.
  4. All Cameras are Police Cameras (James Bridle) — how the slippery slope is ridden: When the Wall was initially constructed, the public were informed that this [automatic license plate recognition] data would only be held, and regularly purged, by Transport for London, who oversee traffic matters in the city. However, within less than five years, the Home Secretary gave the Metropolitan Police full access to this system, which allowed them to take a complete copy of the data produced by the system. This permission to access the data was granted to the Police on the sole condition that they only used it when National Security was under threat. But since the data was now in their possession, the Police reclassified it as “Crime” data and now use it for general policing matters, despite the wording of the original permission. As this data is not considered to be “personal data” within the definition of the law, the Police are under no obligation to destroy it, and may retain their ongoing record of all vehicle movements within the city for as long as they desire.

Four short links: 6 November 2014

JavaScript Testing, Dark Data, Webapp Design, and Design Trumps Data

  1. Karma — kick-ass open source JavaScript test environment.
  2. The Dark Market for Personal Data (NYTimes) — you can buy lists of victims of sexual assault, of impulse buyers, of people with sexually transmitted diseases, etc. The cost of a false positive when those lists are used for marketing is less than the cost of a false positive when banks use the lists to decide whether you’re a credit risk. The lists fall between the cracks in privacy legislation; essentially, the compilation and use of lists of people are unregulated territory.
  3. 7 Principles of Rich Web Applications — “rich web applications” sounds like 2007 wants its ideas back, but the content is modern and useful. Predict behaviour for negative latency.
  4. Collaborative Filtering at LinkedIn (PDF) — This paper presents LinkedIn’s horizontal collaborative filtering infrastructure, known as browsemaps. Great lessons learned, including context and presentation of browsemaps or any recommendation is paramount for a truly relevant user experience. That is, design and presentation represents the largest ROI, with data engineering being a second, and algorithms last. (via Greg Linden)

Four short links: 4 November 2014

3D Shares, Autonomous Golf Carts, Competitive Solar, and Interesting Data Problems

  1. Cooper-Hewitt Shows How to Share 3D Scan Data Right (Makezine) — important as we move to a web of physical models, maps, and designs.
  2. Singapore Tests Autonomous Golfcarts (Robohub) — a reminder that the future may not necessarily look like someone used the clone tool to paint Silicon Valley over the world.
  3. Solar Hits Parity in 10 States, 47 by 2016 (Bloomberg) — The reason solar-power generation will increasingly dominate: it’s a technology, not a fuel. As such, efficiency increases and prices fall as time goes on. The price of Earth’s limited fossil fuels tends to go the other direction.
  4. Facebook’s Top Open Data Problems (Facebook Research) — even if you’re not interested in Facebook’s Very First World Problems, this is full of factoids like “Facebook’s social graph store TAO, for example, provides access to tens of petabytes of data, but answers most queries by checking a single page in a single machine.” (via Greg Linden)

Four short links: 29 October 2014

Tweet Parsing, Focus and Money, Challenging Open Data Beliefs, and Exploring ISP Data

  1. TweetNLP — CMU open source natural language parsing tools for making sense of Tweets.
  2. Interview with Google X Life Science’s Head (Medium) — I will have been here two years this March. In nineteen months we have been able to hire more than a hundred scientists to work on this. We’ve been able to build customized labs and get the equipment to make nanoparticles and decorate them and functionalize them. We’ve been able to strike up collaborations with MIT and Stanford and Duke. We’ve been able to initiate protocols and partnerships with companies like Novartis. We’ve been able to initiate trials like the baseline trial. This would be a good decade somewhere else. The power of focus and money.
  3. Schooloscope Open Data Post-Mortem: The case of Schooloscope and the wider question of public access to school data challenges the belief that sunlight is the best disinfectant, that government transparency would always lead to better government, better results. It challenges the sentiments that see data as value-neutral and its representation as devoid of politics. In fact, access to school data exposes a sharp contrast between the private interest of the family (best education for my child) and the public interest of the government (best education for all citizens).
  4. M-Lab Observatory — explorable data on the data experience (RTT, upload speed, etc) across different ISPs in different geographies over time.

Signals from Strata + Hadoop World New York 2014

From unique data applications to factories of the future, here are key insights from Strata + Hadoop World New York 2014.

Experts from across the data world came together in New York City for Strata + Hadoop World New York 2014. Below we’ve assembled notable keynotes, interviews, and insights from the event.

Unusual data applications and the correct way to say “Hadoop”

Hadoop creator and Cloudera chief architect Doug Cutting discusses surprising data applications — from dating sites to premature babies — and he reveals the proper (but in no way required) pronunciation of “Hadoop.”

What happens when fashion meets data: The O’Reilly Radar Podcast

Liza Kindred on the evolving role of data in fashion and the growing relationship between tech and fashion companies.

Editor’s note: you can subscribe to the O’Reilly Radar Podcast through iTunes, SoundCloud, or directly through our podcast’s RSS feed.

In this podcast episode, I talk with Liza Kindred, founder of Third Wave Fashion and author of the new free report “Fashioning Data: How fashion industry leaders innovate with data and what you can learn from what they know.” Kindred addresses the evolving role data and analytics are playing in the fashion industry, and the emerging connections between technology and fashion companies. “One of the things that fashion is doing better than maybe any other industry,” Kindred says, “is facilitating conversations with users.”

Gathering and analyzing user data creates opportunities for the fashion and tech industries alike. One example of this is the trend toward customization.

Four short links: 15 September 2014

Weird Machines, Libraries May Scan, Causal Effects, and Crappy Dashboards

  1. The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from the 29th Chaos Communication Congress on tricking the ELF linker/loader into performing arbitrary computation from the supplied metadata. Yes, there’s a brainfuck compiler that turns code into metadata which is then, through a supernatural mix of pixies, steam engines, and binary, executed. This will make your brain leak. Weird machines are everywhere.
  2. European Libraries May Digitise Books Without Permission: “The right of libraries to communicate, by dedicated terminals, the works they hold in their collections would risk being rendered largely meaningless, or indeed ineffective, if they did not have an ancillary right to digitize the works in question,” the court said. Even if the rights holder offers a library the possibility of licensing his works on appropriate terms, the library can use the exception to publish works on electronic terminals, the court ruled. “Otherwise, the library could not realize its core mission or promote the public interest in promoting research and private study,” it said.
  3. CausalImpact (GitHub) — Google’s R package for estimating the causal effect of a designed intervention on a time series. (via Google Open Source Blog)
  4. Laws of Crappy Dashboards — (caution, NSFW language … “crappy” is my paraphrase) so true. Not talking to users will result in a [crappy] dashboard. You don’t know if the dashboard is going to be useful. But you don’t talk to the users to figure it out. Or you just show it to them for a minute (with someone else’s data), never giving them a chance to figure out what the hell they could do with it if you gave it to them.

Four short links: 3 September 2014

Distributed Systems Theory, Chinese Manufacturing, Quantified Infant, and Celebrity Data Theft

  1. Distributed Systems Theory for the Distributed Systems Engineer: I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer; what I consider ‘table stakes’ for distributed systems engineers competent enough to design a new system.
  2. Shenzhen Trip Report (Joi Ito) — full of fascinating observations about how the balance of manufacturing strength has shifted in surprising ways. The retail price of the cheapest full featured phone is about $9. Yes. $9. This could not be designed in the US – this could only be designed by engineers with tooling grease under their fingernails who knew the manufacturing equipment inside and out, as well as the state of the art of high-end mobile phones.
  3. Sproutling: The world’s first sensing, learning, predicting baby monitor. A wearable band for your baby, a smart charger and a mobile app work together to not only monitor more effectively but learn and predict your baby’s sleep habits and optimal sleep conditions. (via Wired)
  4. Notes on the Celebrity Data Theft — wonderfully detailed analysis of how photos were lifted, and the underground industry built around them. This was one of the most unsettling aspects of these networks to me – knowing there are people out there who are turning over data on friends in their social networks in exchange for getting a dump of their private data.