When Sefaria was first launched in 2012, the library contained over four million words and almost eight thousand intertextual connections. Combining tech and Torah to create an interconnected text experience was a truly innovative approach. By 2020 we had grown the library to over 200 million words with almost 2.5 million intertextual links.
But the library, as it stood, was missing something. It was great for learners who knew what they were looking for or which book they wanted to study, but for learners with little textual background or who might not know where to look, the Jewish canon was difficult to navigate – even with an interconnected database and a sleek user interface. Sefaria Topics is the first step in our solution to make the Jewish canon even more accessible.
And so began the major work of creating the first Jewish ontology. An ontology is “a set of concepts and categories in a subject area or domain that shows their properties and the relations between them” – or, simply put, a way of categorizing our texts into easy-to-navigate topics and mapping their relationships to each other. The project, which came to be known as Sefaria Topics, involved around 10 people, including engineers, scholars, writers, editors, and user experience specialists.
Visualization of the Ontology of Biblical Personalities
It all started with digitizing an online encyclopedia of Jewish thought, called Aspaklaria. We then included data from other reliable sources like Sefer HaAggadah and WikiData and utilized tags that users had attached to their source sheets. Along the way, we ran into some interesting challenges. One such challenge presented itself in the form of translations.
“In English, there is little distinction between happiness and joy – to the extent that there is a difference between osher and simcha… Another example of such categories is compassion and mercy. And it seems that there are others as well,” said Rabbi Francis Nataf, a scholar working on the project. So while you could easily navigate through these related topics in Hebrew, the separate topics would be indistinguishable from each other in English.
Another obstacle we encountered was merging duplicate and similar topics. Since our data for Sefaria Topics comes from several sources, including user-generated tags, we needed to merge the data and create a common language. One way this played out was through different spellings of the same word, such as Chanukah (Hanukkah, Chanukka, חנוכה) and Shavuot (Shavuos, שבועות).
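To make the spelling problem concrete, here is a minimal Python sketch of one way variant spellings could be folded into a single topic. It is an illustration only, not Sefaria's actual code: the alias table, the slugs (`hanukkah`, `shavuot`), and the function names are all hypothetical, and a real pipeline would need a far larger, curated alias list.

```python
import unicodedata
from typing import Dict, Iterable, Optional

# Hypothetical alias table: canonical topic slug -> known spellings.
# The spellings are the ones mentioned in the text; the slugs are invented.
ALIASES = {
    "hanukkah": ["Chanukah", "Hanukkah", "Chanukka", "חנוכה"],
    "shavuot": ["Shavuot", "Shavuos", "שבועות"],
}


def normalize(tag: str) -> str:
    """Apply Unicode NFKC, trim whitespace, and case-fold a raw tag."""
    return unicodedata.normalize("NFKC", tag).strip().casefold()


# Reverse index: normalized spelling -> canonical slug.
LOOKUP = {
    normalize(spelling): slug
    for slug, spellings in ALIASES.items()
    for spelling in spellings
}


def canonical_topic(tag: str) -> Optional[str]:
    """Return the canonical slug for a tag, or None if the tag is unknown."""
    return LOOKUP.get(normalize(tag))


def merge_tags(tags: Iterable[str]) -> Dict[str, int]:
    """Count tags per topic, folding known spelling variants together.

    Unknown tags are kept under their normalized form rather than dropped.
    """
    counts: Dict[str, int] = {}
    for tag in tags:
        key = canonical_topic(tag) or normalize(tag)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With this sketch, `merge_tags(["Shavuos", "shavuot", "Purim"])` groups the first two tags under one topic and leaves the unrecognized tag as its own entry. A lookup table like this only handles spellings someone has already listed, which is why it covers the easy cases and not the harder question of near-synonymous categories.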
While this issue was simple to anticipate and solve, it was harder to decide whether to merge categories that were extremely similar. Noah Santacruz, the lead engineer on the project, noted, “At first we thought the categories isolation and loneliness were essentially the same thing – coronavirus ended up proving this assumption wrong, with today’s technology connecting us even while physically apart.”
Visualization of the Ontology of Halakha
“I’m extremely proud of what our team has accomplished over the last year and a half and the work we’ve done to make Sefaria even more accessible. We’ll continue to work on and evolve Sefaria Topics, and other features, to not only make learning easier but to create opportunities for more in-depth and compelling research in the future,” says Lev Israel, Sefaria’s Chief Data Officer.
Sefaria’s Jewish ontology currently contains 17,730 topics, 3,308 of which are linked to at least 5 sources, and 20,424 links connecting those topics to each other. Sefaria Topics lays the groundwork for developments to simplify the research involved in deeper learning. While we aren’t quite there yet, the team is hopeful that this development will create the potential for asking even more complex questions of our tradition’s core texts.
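For readers curious how headline figures like these can be derived from a topic graph, here is a small Python sketch. It assumes a hypothetical in-memory layout (a mapping from topic to its source references, plus a list of topic-to-topic links); the function name, field names, and placeholder data are invented for illustration and do not reflect how Sefaria stores its ontology.

```python
from typing import Dict, Iterable, List, Tuple


def summarize_ontology(
    topic_sources: Dict[str, List[str]],
    topic_links: Iterable[Tuple[str, str]],
    min_sources: int = 5,
) -> Dict[str, int]:
    """Return headline counts for a small topic graph.

    topic_sources maps each topic to the source references attached to it.
    topic_links is a collection of (from_topic, to_topic) pairs.
    Duplicate sources and duplicate links are counted once.
    """
    well_sourced = [
        topic
        for topic, sources in topic_sources.items()
        if len(set(sources)) >= min_sources
    ]
    return {
        "topics": len(topic_sources),
        "well_sourced_topics": len(well_sourced),
        "topic_links": len(set(topic_links)),
    }


# Placeholder data only; these are not real Sefaria topics or sources.
example_sources = {
    "topic-a": ["src-1", "src-2", "src-3", "src-4", "src-5"],
    "topic-b": ["src-1"],
}
example_links = [("topic-a", "topic-b"), ("topic-a", "topic-b")]
summary = summarize_ontology(example_sources, example_links)
```

Here `summary` reports two topics, one of which meets the five-source threshold, and a single distinct link between them.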