Sharing, analysis, and collaboration

This project is creating a community software infrastructure, called LearnSphere, that supports sharing, analysis, and collaboration across a wide variety of educational data. LearnSphere supports researchers as they improve their understanding of human learning. It also helps course developers and instructors improve teaching and learning through data-driven course redesign. The goal is to transform learning science and engineering through a large, distributed data infrastructure and to develop the capacity of course developers, instructors, and learning engineers to make use of it.

Central Repository

LearnSphere not only maintains a central store of metadata about what datasets exist, but also has distributed features that give contributors control over access to their own data. It provides a hub that links many communities of educational researchers, a repository where researchers can store their data, and an open analytic method library and workflow-authoring environment in which researchers can build models and run them across datasets.


The research team has extensive experience not only in using educational data mining to make discoveries and improve student outcomes, but also in creating educational data infrastructures. They have developed the DataShop infrastructure, which is currently the largest open repository of educational technology data, with over 550 datasets. A newer data infrastructure, MOOCdb, is being developed to store and analyze Massive Open Online Course (MOOC) data. The Open Learning Initiative has produced data stored in DataShop for many years and is expanding into the MOOC space. Dialogue-based tutoring systems and student affect sensors are producing new kinds of data that are being added to LearnSphere. The researchers are further improving the data collection infrastructure in MOOCs, in particular by adding platform components for massive multi-factor online experiments. The project is also creating new methods for data integration and for discourse data storage and analytics, along with new algorithms for automated discovery, as well as the new learning science discoveries that result from these algorithms.

By integrating these building blocks in LearnSphere, the project will facilitate cross-modality and cross-domain educational data analysis that is not possible today.