A community data infrastructure to support learning improvement online.

World class repository of education data

Building on DataShop, the world's largest open repository of transactional data, and MOOCdb, a database design and supporting framework created to harness the vast amounts of data being generated by MOOCs, LearnSphere will integrate existing and new educational data infrastructures to offer a world class repository of education data.

Data-driven course design

LearnSphere will enable new opportunities for learning and education researchers, course developers, and instructors to better evaluate causal claims, leading to improved teaching and learning. This data-driven course redesign is possible both through better analytics of relational data and through online platform support of controlled experimentation.

Large distributed data infrastructure

LearnSphere will facilitate a distributed method of data storage and access control. LearnSphere offers a central portal for the sharing, storage, and analysis of public and private datasets. For private datasets, a local storage option allows researchers to share tools and results while maintaining ownership of their data.

Analytics sharing & use

While a standard set of analysis tools allows researchers to quickly start gathering information, users can also leverage user-contributed workflows and tools to perform other methods of analysis on their data. Using a community-based tool repository, researchers will be able to quickly build new models, create derivative works, or improve existing tools and share their work with their team or the whole world.

Current Resources

The LearnLab DataShop is a data repository and web application for learning science researchers. It provides secure data storage as well as an array of analysis and visualization tools available through a web-based interface. DataShop was funded by National Science Foundation grants (SBE-0836012, SBE-0354420) to LearnLab, the Pittsburgh Science of Learning Center.

The MOOCdb project aims to bring together education researchers, computer science researchers, machine learning researchers, technologists, and database and big data experts to advance MOOC data science. The project, founded at MIT, includes a platform-agnostic functional data model for data exhaust from MOOCs; a collaborative, open-source, open-access data visualization framework; a crowdsourced knowledge discovery framework; and a privacy-preserving software framework. The team is currently working to release a number of these tools and frameworks as open source.

DataStage is provided by the Vice Provost Office for Online Learning (VPOL) at Stanford, which facilitates the teaching of online classes. The instruction delivery platforms are instrumented to collect a variety of data on participants' interaction with the study material. Examples include participants manipulating video players as they view portions of a class, solution submissions to problem sets, use of the online forum available for some classes, peer grading activities, and some demographic data. Through DataStage, VPOL makes some of this data available for research on learning processes and for explorations into improving instruction.

DiscourseDB is a data infrastructure project in the space of collaborative and discussion-based learning. It aims to provide a common data model that accommodates diverse sources, including but not limited to chat, threaded discussions, blogs, Twitter, wikis, and text messaging. In the future, the project will make available analytics that support research on questions such as the mediating and moderating effects of role taking, help exchange, and collaborative knowledge construction.

Tigris is a workflow authoring tool that is part of the community software infrastructure being built for the LearnSphere project. The platform will provide a way to create custom analyses and interact with proprietary data formats and repositories such as DataShop, MOOCdb, DiscourseDB, and DataStage.