Abstract:
This Final Master's Thesis is the result of an initiative proposed by the Prodis Foundation, aimed at creating tools that promote inclusion and personal development through technology. In response, an intelligent conversational system capable of simulating personalised interviews with public figures has been designed and implemented, with potential applications in educational, training, and communication contexts. The system combines advanced artificial intelligence technologies, including large language models (LLMs), web information retrieval through APIs, automatic question generation, answer analysis by an evaluator agent, speech synthesis and recognition, and visual representation through an interactive 3D avatar. Its architecture comprises a frontend built with React and Three.js and a modular Python backend based on FastAPI and LangChain, organised into four functional layers: orchestrator, intelligent agents, internal services, and external services.
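To make the layered organisation more concrete, the following is a minimal sketch of how an orchestrator endpoint could coordinate the other layers in FastAPI; the endpoint path, request fields, and agent/service functions are hypothetical placeholders introduced for illustration, not the system's actual code.

```python
# Minimal sketch of the four-layer backend; all names are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InterviewRequest(BaseModel):
    figure_name: str          # public figure to be interviewed
    num_questions: int = 5    # size of the guided interview

# --- Intelligent-agents layer (stubs standing in for LLM-backed agents) ---
def question_agent(profile: str, n: int) -> list[str]:
    """Generate n interview questions from a summarised profile (stub)."""
    return [f"Question {i + 1} about {profile}" for i in range(n)]

def evaluator_agent(answer: str) -> dict:
    """Score an answer for relevance and depth (stub)."""
    return {"answer": answer, "score": 0.8}

# --- Internal/external services layer (stub for web retrieval + summarisation) ---
def fetch_public_profile(name: str) -> str:
    """Retrieve and summarise web information about the figure (stub)."""
    return f"summary of {name}"

# --- Orchestrator layer: coordinates agents and services per request ---
@app.post("/interview")
def create_interview(req: InterviewRequest) -> dict:
    profile = fetch_public_profile(req.figure_name)
    questions = question_agent(profile, req.num_questions)
    return {"figure": req.figure_name, "questions": questions}
```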
The workflow begins with the automatic collection of information about a public figure, which is summarised and stored semantically. From this material, a guided interview is generated, evaluated, and narrated in real time through voice and facial animation. The entire process is recorded and presented in an accessible final report.
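As an illustration of this flow, the sketch below chains the stages (collection, summarisation, question answering, evaluation, reporting) into a single session object; every function and field name is an assumption made for clarity rather than the implementation described in the thesis.

```python
# Illustrative end-to-end flow; hypothetical names only.
from dataclasses import dataclass, field

@dataclass
class InterviewSession:
    figure: str
    summary: str = ""
    transcript: list[dict] = field(default_factory=list)

def run_interview(figure: str, questions: list[str]) -> InterviewSession:
    session = InterviewSession(figure=figure)
    # Web retrieval and summarisation of the public figure's profile.
    session.summary = f"collected and summarised profile of {figure}"
    for q in questions:
        # Answer would be LLM-generated, then narrated via TTS and the 3D avatar.
        answer = f"simulated answer to: {q}"
        # Evaluator agent's assessment of the answer.
        score = {"relevance": 0.8, "depth": 0.7}
        session.transcript.append({"question": q, "answer": answer, "evaluation": score})
    # The session would then be persisted and rendered as the final report.
    return session

report = run_interview("Example Figure", ["What inspired your career?"])
print(report.transcript[0]["evaluation"])
```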
A comparison of different language models has been conducted, evaluating their performance in content generation, contextual evaluation, and narrative quality. The results confirm greater robustness and consistency in the OpenAI and Gemini models compared with the more limited performance of LLaMA, although the latter stands out for its accessibility as an open-source model.
The system establishes a solid foundation for the development of expressive and accessible conversational assistants. Future lines of work include the integration of emotion analysis, multilingual support, and new automatic evaluation mechanisms to broaden its applicability in real-world settings with social impact.