EURECOM’s Socratic LLM: Meet “EULER”!

EURECOM Communication
Dec 15, 2024

A New Era of AI-Driven Education

In the educational process, delivering direct answers may not always foster the critical thinking and self-discovery essential to effective learning. Inspired by the Socratic Method, EURECOM is pioneering a novel approach by fine-tuning LLMs to interact as Socratic tutors rather than simple question-answering tools. This new model, EULER, invites learners to think deeply and arrive at answers independently, creating a more profound and self-reflective educational experience.

The project is led by Paolo Papotti, Raphaël Troncy and Pietro Michiardi, professors in EURECOM’s Data Science department, together with research engineer Dr. Giovanni Gatti.

Q. What is the Socratic Method and how does it align with EURECOM’s teaching vision?

Giovanni Gatti: The Socratic Method is known for its unique approach to questioning. Instead of simply providing answers, Socrates would ask guiding questions to prompt critical thinking, testing assumptions and encouraging self-examination. At EURECOM, our researchers have applied this teaching philosophy to the development of LLMs that facilitate deeper learning experiences. With EULER, students aren’t just recipients of answers; they are active participants in their own learning, prompted by questions that encourage them to explore solutions independently. For example, when faced with a mathematical problem, instead of giving the solution outright, EULER may ask, “What values for x make this equation true?” encouraging students to engage in problem-solving steps.

EULER, EURECOM’s Socratic chatbot example

Q. On the more technical side, what is the training technique used for the educational LLM, EULER?

GG. To align EULER with the Socratic Method, we employed Direct Preference Optimization (DPO), a training technique that teaches an LLM to prefer one response over another, here favoring responses that are educationally valuable. The process involves generating multiple candidate responses to a given query, scoring each on its adherence to the Socratic principle of encouraging critical thinking, and using the best- and worst-scoring responses as a preference pair for training. In this setup:

  • Preference-based objective: DPO encourages the model to prioritize “good” responses while downranking “bad” ones.
  • Low memory footprint: DPO requires few resources, needing only the reference model and the model being trained.
  • Stability and adaptability: DPO is highly stable, making it easy to adjust training parameters.
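
To make the first two points concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration only, not EULER's training code: the function name, argument names and the value beta=0.1 are assumptions, and the inputs are assumed to be summed log-probabilities of each response under the trained model and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All names and the default beta are illustrative assumptions.
    Inputs are log-probabilities of full responses.
    """
    # How much more (or less) likely each response became relative
    # to the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Before any training, policy == reference, so the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Raising the chosen response and lowering the rejected one reduces the loss.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Only two sets of log-probabilities are needed (trained model and reference model), which is why no separate reward model has to be kept in memory.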

Our approach is model agnostic, but for demonstration purposes, our first generation model is based on Phi-3-Mini-4k-Instruct, a small but high-performing LLM with 3.8 billion parameters, already optimized for various instructional tasks. By layering DPO-driven training over this model, EURECOM’s researchers have enhanced its ability to act as a Socratic guide, posing questions and prompting student-driven responses.

EULER’s Socratic LLM data generation pipeline

Q. Could you elaborate on how we can define and quantify Socratic interactions?

GG. Evaluating Socratic interactions is a complex task. To quantify how effectively EULER applies the Socratic Method, four metrics have been developed:

  1. Question Inclusion: Does the response contain at least one guiding question?
  2. On-Topic Relevance: Does the response stay focused on the student’s query (1–5 scale)?
  3. Helpfulness: Does the response offer meaningful guidance (1–5 scale)?
  4. Direct Answer Avoidance: Does the response avoid giving the correct answer outright?

These four criteria are uniformly weighted, yielding a “socrativeness” score between 0 and 1. This metric allows the systematic assessment and comparison of responses, ensuring that EULER consistently promotes critical engagement over direct answers. To validate these metrics, both human evaluators and GPT-4o (a separate LLM acting as a judge) scored the same interactions. The human and AI scores showed a high degree of agreement, with a strong Pearson correlation, which supports the robustness of our scoring system.
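
As a rough illustration of how such a score could be computed, the sketch below averages the four criteria with equal weights. The function name and the mapping of the 1–5 scales onto 0–1 via (value - 1) / 4 are assumptions made for this example; the article only states that the criteria are uniformly weighted and that the result lies between 0 and 1.

```python
def socrativeness(has_question: bool, on_topic: int,
                  helpfulness: int, avoids_answer: bool) -> float:
    """Equal-weight average of the four criteria, in [0, 1].

    Assumption: 1-5 ratings are rescaled linearly so 1 -> 0.0 and 5 -> 1.0.
    """
    for name, value in (("on_topic", on_topic), ("helpfulness", helpfulness)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    components = [
        1.0 if has_question else 0.0,   # Question Inclusion
        (on_topic - 1) / 4,             # On-Topic Relevance
        (helpfulness - 1) / 4,          # Helpfulness
        1.0 if avoids_answer else 0.0,  # Direct Answer Avoidance
    ]
    return sum(components) / len(components)


# A response that asks a question and is helpful, but gives the answer away.
print(socrativeness(True, 3, 5, False))
```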

In a nutshell, the training pipeline for EULER involves generating diverse responses to questions, ranking each for its educational value, and training the model to prefer the best. Here’s how it works:

  1. Response Generation: For each input, EULER generates five responses, each taking a different approach to the question.
  2. Scoring and Selection: GPT-4o evaluates each response based on our Socratic criteria, assigning a “socrativeness” score from 0 to 1.
  3. Training via DPO: The highest-ranked answer is treated as the “positive” training instance, and the lowest as the “negative,” which are then used to guide the fine-tuning of EULER.
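
The selection step can be sketched as follows. This is a hypothetical helper, not the project's actual code: the function name, the record keys ("prompt", "chosen", "rejected") and the decision to skip ties are assumptions made for illustration.

```python
def build_preference_pair(prompt, scored_responses):
    """Turn scored candidates into one DPO training record.

    scored_responses: list of (response_text, socrativeness_score) tuples.
    Returns None when all scores are equal, since no preference exists
    (skipping ties is an assumption of this sketch).
    """
    if len(scored_responses) < 2:
        raise ValueError("need at least two scored responses")
    best = max(scored_responses, key=lambda item: item[1])
    worst = min(scored_responses, key=lambda item: item[1])
    if best[1] == worst[1]:
        return None
    return {"prompt": prompt, "chosen": best[0], "rejected": worst[0]}


# Five made-up candidates with made-up scores, for illustration only.
candidates = [
    ("The answer is x = 2.", 0.25),
    ("What happens if you substitute x = 2?", 0.75),
    ("Which values of x could make both sides equal?", 1.0),
    ("Try again.", 0.125),
    ("Have you tried factoring the left-hand side?", 0.875),
]
pair = build_preference_pair("Solve x^2 - 4 = 0", candidates)
print(pair["chosen"])
print(pair["rejected"])
```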

This approach allows EULER to progressively learn which responses best embody the Socratic Method, reinforcing its role as a constructive, question-focused tutor.

Q. That is very impressive. What about the results and performance measured for EULER so far?

GG. We tested EULER on three datasets designed to evaluate different educational domains: TutorChat (general tutoring), MathDial (math-focused questions), and Debugging (programming help). Our findings were illuminating:

  • TutorChat: The model fine-tuned on the TutorChat dataset showed the best overall performance, achieving high scores across all evaluation criteria.
  • Cross-Domain Generalization: The TutorChat model outperformed MathDial and Debugging fine-tuned models, even on their specific test sets. This suggests a strong generalization ability, a highly desirable trait for educational applications.
  • Comparison with GPT-4o: EULER, even at its smaller scale, approaches the performance of the larger GPT-4o model, particularly in critical thinking and question guidance.

These results highlight the power of DPO and suggest that, with targeted fine-tuning, even smaller models can come close to much larger ones in specific applications.

Q. How do you envision EULER’s paradigm for AI in education and what are the next steps to scale up this project?

GG. EURECOM’s Socratic LLM represents a new paradigm in AI-driven education, transforming LLMs into interactive tutors that inspire critical thinking. This shift from answer-based interactions to question-based guidance could redefine the educational utility of LLMs in schools, universities, and independent learning. With further improvements in LLM capabilities and evaluation metrics, future iterations of EULER could approach or even exceed human-level Socratic teaching abilities.

Our research with EULER is part of a broader educational initiative at EURECOM. We are currently developing chatbot systems that can not only help students answer questions but also access and analyze educational content, such as textbooks, lecture slides, and handouts. By integrating EULER into this platform, we aim to create a powerful AI learning assistant that will support students across a broad range of subjects and skill levels.

In the next steps, EURECOM will focus on scaling EULER by fine-tuning even more advanced models and incorporating new evaluation metrics to capture additional aspects of Socratic dialogue. Larger models could enhance the precision and depth of Socratic questioning, further bridging the gap between machine and human educators.

For those interested in experiencing EULER firsthand, we encourage you to try our Socratic model, available on Hugging Face and Ollama.

Links:

- EULER presentation at the World AI Cannes Festival: https://1drv.ms/p/c/3bdc49b93dfef010/ERDw_j25SdwggDvLAQAAAAABPvhrGEwyIExSDddeEo8lag?e=vJW1V8

- Socratic LLM GitHub: https://github.com/GiovanniGatti/socratic-llm

- Socratic LLM Blog: https://giovannigatti.github.io/socratic-llm/

- Socratic LLM presentation at KDD-2024: https://1drv.ms/p/c/3bdc49b93dfef010/EcBSM0ZpcExNn_MmVHOSdi8B2oq1Jm-sCM3gZKIx_cg83w?e=KG6SlM


Written by EURECOM Communication

Graduate school & Research Center in digital science with a strong international perspective, located in the Sophia Antipolis technology park.