Strata Week: Can big data save human language?

Big data and language preservation, growing data privacy concerns, and a comparison of big data to crude oil.

Preserving human language with big data

Inspired by Deb Roy’s 2011 TED Talk, “The Birth of a Word,” Nataly Kelly at the Huffington Post’s TEDWeekends took a look at the potential effect big data could have on language, specifically on preserving endangered and dying languages.

Kelly uses the story of the revitalization of the Wampanoag tribe’s ancestral language from her book Found in Translation. Wampanoag tribe member Jessie Little Doe Baird was able to literally bring the language back from the dead, Kelly notes, because of one key ingredient: data. Kelly writes:

“In the case of the Wampanoag language, this data came in the form of numerous historical documents written in Wampanoag — land deeds, legal contracts, and religious texts — along with English versions. The terminology and grammatical data embedded in these translated documents enabled Ms. Baird and her community to restore the language and use it in everyday life. Now, just imagine what they could have done with 90,000 hours of video.”

Big data may be the key, Kelly says, to helping preserve small languages and the knowledge they contain. “As we consider the broad-reaching implications of Roy’s research, let’s remember that big data projects not only enable machines to communicate in human-like ways,” Kelly writes. “They can enable human beings to continue to use human languages — and to help human knowledge continue to evolve.” You can read Kelly’s piece at Huffington Post TEDWeekends.

In conjunction with the Huffington Post’s TED Weekends, Shirin Samimi-Moore posted three related essays on the TED blog: Deb Roy: The Birth of a Word; Gayatri Devi: How Do I Improve My Memory? Forget More!; and Ben Hecht: Big Data Gets Personal in U.S. Cities.

Data privacy concerns grow in education and on the web

Katrina Schwartz at Mind/Shift took a look this week at the role of data and data collection in education. Highlighting research by Reynol Junco, Schwartz writes: “Using data as formative assessment — providing feedback to students in incremental steps rather than with big tests like mid-terms or finals — can be helpful to both students and teachers, [Junco] says.”

The data can also make education more flexible and help educators customize lessons on an individual level. Schwartz quotes Junco from an episode of Tell Me More on NPR: “We’ve got data well before a student will flunk a first exam or a quiz and so we can make some predictions about the things that they’re doing and how we might intervene before we get to that point.”

Student data collection isn’t without its issues. Schwartz notes that such practices are beginning to draw more scrutiny, as “there are still major concerns about how that data will be used, including issues around student privacy and teacher evaluations.”

In related news, Antone Gonsalves at ReadWrite takes a look at a new study by market research firm Ovum that “indicates that millions of people could start ‘vanishing’ [going 'data dark'] from the web within a few years, causing major disruptions to the Internet economy.” Why? Privacy concerns.

Gonsalves points to signs that consumers are starting to understand and care about their data (68% would block all tracking if given the option, according to the Ovum study) and that people’s attitudes toward guarding their personal data are changing. “As people become more aware of how Internet companies are using their personal data and how it affects their lives,” he notes, “they will start sharing less and demand more control.” You can read Gonsalves’ full report at ReadWrite.

Big data must be extracted and refined, much like crude oil

Serial entrepreneur Arvind Singh likened big data to crude oil this week in a post at Wired. “The current data bonanza harkens back to the early days of the oil boom,” he writes. “Big Data has become a vast, largely untapped resource capable of powering progress, and it requires a certain expertise to extract and refine to maximize its utility.”

Expertise and refinement, Singh says, are critical if businesses are to harness the power of big data. Without the proper tools and analysis, he writes, data can be more disruptive than informative for businesses. Singh notes the big picture:

“IDC projects that, while organizations will increase spending on big data from $3.2 billion in 2010 to $16.9 billion in 2015, more than 85% of Fortune 500 organizations will fail to exploit big data for competitive advantage.”

Singh outlines ways businesses can successfully take advantage of big data’s opportunities. You can read his piece in full at Wired.

Tip us off

News tips and suggestions are always welcome, so please send them along.
