"cv" entries

Four short links: 1 September 2015

Four short links: 1 September 2015

People Detection, Ratings Patterns, Inspection Bias, and Cloud Filesystem

  1. End-to-End People Detection in Crowded Scenes — research paper and code. When parsing the title, bind “end-to-end” to “scenes” not “people”.
  2. Statistical Patterns in Movie Ratings (PLOSone) — We find that the distribution of votes presents scale-free behavior over several orders of magnitude, with an exponent very close to 3/2, with exponential cutoff. It is remarkable that this pattern emerges independently of movie attributes such as average rating, age and genre, with the exception of a few genres and of high-budget films.
  3. The Inspection Bias is EverywhereIn 1991, Scott Feld presented the “friendship paradox”: the observation that most people have fewer friends than their friends have. He studied real-life friends, but the same effect appears in online networks: if you choose a random Facebook user, and then choose one of their friends at random, the chance is about 80% that the friend has more friends. The friendship paradox is a form of the inspection paradox. When you choose a random user, every user is equally likely. But when you choose one of their friends, you are more likely to choose someone with a lot of friends. Specifically, someone with x friends is overrepresented by a factor of x.
  4. s3qla file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. (GPLv3)
Comment
Four short links: 27 July 2015

Four short links: 27 July 2015

Google’s Borg, Georgia v. Malamud, SLAM-aware system, and SmartGPA

  1. Large-scale Cluster Management at Google with BorgGoogle’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters, each with up to tens of thousands of machines. […] We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
  2. Georgia Sues Carl Malamud (TechDirt) — for copyright infringement… for publishing an official annotated copy of the state's laws. […] the state points directly to the annotated version as the official laws of the state.
  3. Monocular SLAM Supported Object Recognition (PDF) — a monocular SLAM-aware object recognition system that is able to achieve considerably stronger recognition performance, as compared to classical object recognition systems that function on a frame-by-frame basis. (via Improving Object Recognition for Robots)
  4. SmartGPA: How Smartphones Can Assess and Predict Academic Performance of College Students (PDF) — We show that there are a number of important behavioral factors automatically inferred from smartphones that significantly correlate with term and cumulative GPA, including time series analysis of activity, conversational interaction, mobility, class attendance, studying, and partying.
Comment: 1
Four short links: 6 July 2015

Four short links: 6 July 2015

DeepDream, In-Flight WiFi, Computer Vision in Preservation, and Testing Distributed Systems

  1. DeepDream — the software that’s been giving the Internet acid-free trips.
  2. In-Flight WiFi Business — numbers and context for why some airlines (JetBlue) have fast free in-flight wifi while others (Delta) have pricey slow in-flight wifi. Four years ago ViaSat-1 went into geostationary orbit, putting all other broadband satellites to shame with 140 Gbps of total capacity. This is the Ka-band satellite that JetBlue’s fleet connects to, and while the airline has to share that bandwidth with homes across of North America that subscribe to ViaSat’s Excede residential broadband service, it faces no shortage of capacity. That’s why JetBlue is able to deliver 10-15 Mbps speeds to its passengers.
  3. British Library Digitising Newspapers (The Guardian) — as well as photogrammetry methods used in the Great Parchment Book project, Terras and colleagues are exploring the potential of a host of techniques, including multispectral imaging (MSI). Inks, pencil marks, and paper all reflect, absorb, or emit particular wavelengths of light, ranging from the infrared end of the electromagnetic spectrum, through the visible region and into the UV. By taking photographs using different light sources and filters, it is possible to generate a suite of images. “We get back this stack of about 40 images of the [document] and then we can use image-processing to try to see what is in [some of them] and not others,” Terras explains.
  4. Testing a Distributed System (ACM) — This article discusses general strategies for testing distributed systems as well as specific strategies for testing distributed data storage systems.
Comment
Four short links: 1 July 2015

Four short links: 1 July 2015

Recovering from Debacle, Open IRS Data, Time Series Requirements, and Error Messages

  1. Google Dev Apologies After Photos App Tags Black People as Gorillas (Ars Technica) — this is how you recover from a unequivocally horrendous mistake.
  2. IRS Finally Agrees to Release Non-Profit Records (BoingBoing) — Today, the IRS released a statement saying they’re going to do what we’ve been hoping for, saying they are going to release e-file data and this is a “priority for the IRS.” Only took $217,000 in billable lawyer hours (pro bono, thank goodness) to get there.
  3. Time Series Database Requirements — classic paper, laying out why time-series databases are so damn weird. Their access patterns are so unique because of the way data is over-gathered and pushed ASAP to the store. It’s mostly recent, mostly never useful, and mostly needed in order. (via Thoughts on Time-Series Databases)
  4. Compiler Errors for Humans — it’s so important, and generally underbaked in languages. A decade or more ago, I was appalled by Python’s errors after Perl’s very useful messages. Today, appreciating Go’s generally handy errors. How a system handles the operational failures that will inevitably occur is part and parcel of its UX.
Comment
Four short links: 10 June 2015

Four short links: 10 June 2015

Product Sins, Container Satire, Dong Detection, and Evolving Code Designs

  1. The 11 Deadly Sins of Product Development (O’Reilly Radar) — they’re traps that are easy to fall into.
  2. It’s the Future — satire, but like all good satire it’s built on a rich vein of truth. Genuine guffaw funny, but Caution: Contains Rude Words.
  3. Difficulty of Dong Detection — accessible piece about how automated “inappropriate” detection remains elusive. (via Mind Hacks)
  4. Evolution of Code Design at Facebook — you may not have Facebook-scale scale problems, but if you’re having scale problems then Facebook’s evolution (not just their solutions) will interest you.
Comment
Four short links: 5 June 2015

Four short links: 5 June 2015

IoT and New Hardware Movement, OpenCV 3, FBI vs Crypto, and Transactional Datastore

  1. New Hardware and the Internet of Things (Jon Bruner) — The Internet of Things and the new hardware movement are not the same thing. The new hardware movement is driven by new tools for: Prototyping (inexpensive 3D printers, CNC machine tools, cheap and powerful microcontrollers, high-level programming languages on embedded systems); Fundraising and business development (Highway1, Lab IX); Manufacturing (PCH, Seeed); Marketing (Etsy, Quirky). The IoT is driven by: Ubiquitous connectivity; Cheap hardware (i.e., the new hardware movement); Inexpensive data processing and machine learning.
  2. OpenCV 3.0 Released — I hadn’t realised how much hardware acceleration comes out of the box with OpenCV.
  3. FBI: Companies Should Help us Prevent Encryption (WaPo) — as Mike Loukides says, we are in a Post-Modern age where we don’t trust our computers and they don’t trust us. It’s jarring to hear the organisation that (over-zealously!) investigates computer crime arguing that citizens should not be able to secure their communications. It’s like police arguing against locks.
  4. cockroacha scalable, geo-replicated, transactional datastore. The Wired piece about it drops the factoid that the creators of GIMP worked on Google’s massive BigTable-successor, Colossus. From Photoshop-alike to massive file systems. Love it.
Comment
Four short links: 20 May 2015

Four short links: 20 May 2015

Robots and Shadow Work, Time Lapse Mining, CS Papers, and Software for Reproducibility

  1. Rise of the Robots and Shadow Work (NY Times) — In “Rise of the Robots,” Ford argues that a society based on luxury consumption by a tiny elite is not economically viable. More to the point, it is not biologically viable. Humans, unlike robots, need food, health care and the sense of usefulness often supplied by jobs or other forms of work. Two thought-provoking and related books about the potential futures as a result of technology-driven change.
  2. Time Lapse Mining from Internet Photos (PDF) — First, we cluster 86 million photos into landmarks and popular viewpoints. Then, we sort the photos by date and warp each photo onto a common viewpoint. Finally, we stabilize the appearance of the sequence to compensate for lighting effects and minimize flicker. Our resulting time-lapses show diverse changes in the world’s most popular sites, like glaciers shrinking, skyscrapers being constructed, and waterfalls changing course.
  3. Git Repository of CS PapersThe intention here is to both provide myself with backups and easy access to papers, while also collecting a repository of links so that people can always find the paper they are looking for. Pull the repo and you’ll never be short of airplane/bedtime reading.
  4. Software For Reproducible ScienceThis quality is indeed central to doing science with code. What good is a data analysis pipeline if it crashes when I fiddle with the data? How can I draw conclusions from simulations if I cannot change their parameters? As soon as I need trust in code supporting a scientific finding, I find myself tinkering with its input, and often breaking it. Good scientific code is code that can be reused, that can lead to large-scale experiments validating its underlying assumptions.
Comment
Four short links: 11 May 2015

Four short links: 11 May 2015

Age of Infrastructure, Facial Expressions, Proof Assistants, and Programmer Talent

  1. Welcome to the Age of Infrastructure (Annalee Newitz) — The Internet isn’t that thing in there, inside your little glowing box. It’s in your washing machine, kitchen appliances, pet feeder, your internal organs, your car, your streets, the very walls of your house. You use your wearable to interface with the world out there.
  2. Facial Performance Sensing Head-Mounted Display (YouTube) — glorious use of an Oculus headset, to capture (for reproduction on an avatar) fine-grained facial expressions. From SIGGRAPH 2015.
  3. Mathematical Proof Assistants — human augmentation in mathematics.
  4. The Programmer Talent Myth (LWN) — Jacob Kaplan-Moss on the distribution of programmer talent and the damage that the bimodal myth causes.
Comments: 3
Four short links: 12 March 2015

Four short links: 12 March 2015

Billion Node Graphs, Asynchronous Systems, Deep Learning Hardware, and Vision Resources

  1. Mining Billion Node Graphs: Patterns and Scalable Algorithms (PDF) — slides from a CMU academic’s talk at C-BIG 2012.
  2. There Is No NowOne of the most important results in the theory of distributed systems is an impossibility result, showing one of the limits of the ability to build systems that work in a world where things can fail. This is generally referred to as the FLP result, named for its authors, Fischer, Lynch, and Paterson. Their work, which won the 2001 Dijkstra Prize for the most influential paper in distributed computing, showed conclusively that some computational problems that are achievable in a “synchronous” model in which hosts have identical or shared clocks are impossible under a weaker, asynchronous system model.
  3. Deep Learning Hardware GuideOne of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system.
  4. Awesome Computer Vision — curated list of computer vision resources.
Comment
Four short links: 22 December 2014

Four short links: 22 December 2014

Manufacturers and Consumers, Time Management, Ethical Decisions, and Faux Faces

  1. Manufacturers and Consumers (Matt Webb) — manufacturers never spoke to consumers before. They spoke with distributors and retailers. But now products are connected to the Internet, manufacturers suddenly have a relationship with the consumer. And they literally don’t know what to do.
  2. Calendar Hacks (Etsy) — inspiration for your New Year’s resolution to waste less time.
  3. Making an Ethical Decision — there actually is an [web] app for that.
  4. Masks That Look Human to Computers — an artist creates masks that look like faces to face-recognition algorithms, but not necessarily to us. cf Deep Neural Networks are Easily Fooled.
Comment: 1