AI4IDF is an alliance of the leading AI institutes in Île-de-France, aiming to strengthen human-centered artificial intelligence research, while leveraging the region's scientific excellence and unique industrial ecosystem.
Île-de-France is home to the world's largest mathematics community and several of France's largest computer science laboratories, as well as a dense industrial fabric in artificial intelligence.
In this extremely rich context, the four main artificial intelligence (AI) institutes - DATAIA, Hi! PARIS, PRAIRIE and SCAI - propose to create an alliance to structure and energize the community, and to offer industrial and international partners a unified view of the region's exceptional strengths.
AI4IDF is a structuring and federating research project devoted to artificial intelligence, a scientific and technological field that has become essential.
AI4IDF aims to deepen knowledge in AI while keeping the human being at the center of its concerns. The Paris Region must play a major role in this sector of the future, thanks to its scientific excellence and its incomparable ecosystem.
The 4 axes
The research program is organized into four major axes, each addressing major current challenges in integrating AI into the human environment in order to improve people's lives.
Axis 1 - Learning and optimization: between algorithmic efficiency and theoretical guarantees
Many of the recent advances in artificial intelligence, especially in computer vision and natural language processing, have been achieved using machine learning algorithms and in particular large-scale optimization.
While the successful application of learning in these domains requires domain expertise (physics of image formation, linguistics) and methods adapted to the corresponding constraints, some scientific questions are common to all fields of AI affected by learning, and tackling them efficiently requires combining new mathematical and computational approaches. These questions constitute the core of the "learning and optimization" axis, which has three main themes.
Axis 1.1 - Deep learning theory
The theoretical foundations of linear models for classical pattern recognition tasks such as classification or regression are well understood. The same is not true of deep neural networks, which are nevertheless the most successful models today in computer vision and natural language processing. This lack of theoretical grounding hinders their deployment in critical applications, notably in medicine and transportation. Fundamental topics of study in this area are the generalization and universal approximation capabilities of deep networks, the robustness of the corresponding optimization algorithms, such as stochastic gradient descent, and the explainability and certification of the results produced by these networks.
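As a point of reference for the optimization algorithms mentioned above, here is a minimal sketch of stochastic gradient descent on a toy one-dimensional linear regression. It is illustrative only: the function name `sgd_linear_regression`, the learning rate, the number of epochs and the toy data are all assumptions made for this example, not part of the AI4IDF program. It shows the well-understood linear setting; nothing here speaks to the open questions about deep networks.

```python
import random


def sgd_linear_regression(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error.

    All defaults are hypothetical values chosen for this toy example.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)        # visit the samples in a fresh random order
        for x, y in samples:
            err = (w * x + b) - y   # residual on a single sample
            w -= lr * err * x       # gradient of 0.5 * err**2 with respect to w
            b -= lr * err           # gradient of 0.5 * err**2 with respect to b
    return w, b


# Noise-free toy data generated from y = 2x + 1 (illustrative values only).
toy_data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_regression(toy_data)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Each update uses one sample rather than the full dataset, which is what makes the method "stochastic" and what keeps the cost per step independent of the dataset size.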
Axis 1.2 - Optimal use of resources
In learning and optimization, as in most disciplines related to computer science, a key issue is the efficient use of the computational power available at any given time, typically to minimize computation time or memory footprint, as for example in distributed learning. Two fundamental questions arise today: (1) how to handle the many problems where large amounts of data are simply not available ("small data"), for instance with simpler but perhaps more robust models such as kernel methods, or with techniques that improve or enrich the training data; and (2) how to minimize the environmental footprint of the optimization algorithms involved, which calls for a finer understanding of their convergence, beyond worst-case analysis, and for considering total energy expended rather than computation time.
Axis 1.3 - Going beyond pattern recognition
The major recent advances have been achieved through supervised learning applied to regression or classification tasks (e.g., training a machine to decide whether or not a photograph contains any of a predefined set of objects, based on a corpus of tens of thousands of manually annotated images). Modern learning problems go beyond this and require new algorithms and innovative analyses: (1) weak supervision: in order to reduce the cost of labeling, many paradigms have emerged (self-supervised learning, "meta-learning", multi-task learning), and their theoretical understanding remains a largely open problem; (2) structured learning: what we seek to predict in this case goes beyond a simple label (e.g., a sequence of words in machine translation), which poses new algorithmic and theoretical questions; (3) reinforcement learning: prediction models interact with their environment (as in games or robotics), bringing learning closer to optimal control. We intend to explore the exchanges between these two disciplines, learning and optimal control (see axis 3).
Axis 2 - NLP and dialogue with humans
The second axis will focus on AI domains related to language, and will notably bring together natural language processing (NLP) and human-machine dialogue (conversational agents, etc.). It will be organized around four major themes that interact strongly with one another.
Axis 2.1 - Large-scale language models
The focus here is on neural models that can have billions of parameters, pre-trained in an unsupervised way on huge datasets and then fine-tuned for a particular task. Models such as BERT (and its variants RoBERTa or, for French, CamemBERT), GPT-3, BART or T5 dominate today's natural language processing and are the first and main examples of what are sometimes called "foundation models". We are already involved in collaborative initiatives around their design and training, such as the BigScience project. In connection with axis 1, we propose to work on improving the architecture of these models, for example at the level of their elementary units in order to improve their generalization capacity and robustness (cf. character-based models such as CharacterBERT or CANINE), or at the level of their algorithmic performance (linear approximations of attention, for example). In this context, the issues of "small data" and environmental footprint mentioned above are clearly relevant, in particular for handling low-resource languages. Finally, we will work on the interpretability of these models, i.e. on better understanding what language models learn, but also when and how they learn it during training. This work will feed into the previous directions. It will also contribute to scientific advances on the presence, impact and management of biases in language models, and more generally on their societal impact.
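To make the phrase "linear approximations of attention" concrete, here is a minimal sketch of one well-known family, kernel feature-map attention. It is an assumption chosen for illustration, not necessarily the approach pursued within AI4IDF. The softmax similarity is replaced by a product of feature maps, phi(q)·phi(k), so that keys and values can be summarized once and reused for every query. The function names, the choice phi(x) = elu(x) + 1 and the toy matrices are hypothetical, and the sketch covers only the non-causal (bidirectional) case.

```python
import math


def feature_map(vec):
    """phi(x) = elu(x) + 1, which keeps every feature strictly positive."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def quadratic_kernel_attention(Q, K, V):
    """Reference version: computes all n * n similarities explicitly."""
    out = []
    for q in Q:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def linear_kernel_attention(Q, K, V):
    """Same output, but keys and values are summarized once (linear in n)."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]   # running sum of phi(k) v^T
    ksum = [0.0] * d                      # running sum of phi(k)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d):
            ksum[i] += fk[i]
            for j in range(dv):
                kv[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = feature_map(q)
        norm = sum(fq[i] * ksum[i] for i in range(d))
        out.append([sum(fq[i] * kv[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out


# Toy inputs (illustrative values only): 3 tokens, dimension 2.
Q = [[0.1, -0.4], [1.2, 0.3], [-0.7, 0.9]]
K = [[0.5, 0.2], [-1.0, 0.8], [0.3, -0.6]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
slow = quadratic_kernel_attention(Q, K, V)
fast = linear_kernel_attention(Q, K, V)
```

The two functions agree up to floating-point rounding because matrix multiplication is associative. The approximation lies in replacing the softmax by the feature map, so neither function reproduces standard softmax attention exactly.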
Axis 2.2 - Information extraction and text mining
We will rely in particular on the large language models discussed under axis 2.1, but will also study approaches based on syntactic analysis. The documents concerned fall into three main categories: medical and biomedical documents, directly related to axis 4.1; specialized documents (legal, financial, technical, scientific, etc.); and heritage documents (archives, etc.), with many applications in digital humanities, notably in collaboration with the DIM MAP (Ancient and Heritage Materials).
Axis 2.3 - Controlled generation
This theme includes two main directions: text transformation (text-controlled generation) on the one hand, and human-machine dialogue on the other. Concerning text transformation, we will focus on three applications: machine translation; automatic simplification, in particular to contribute to the development of FALC (Facile à lire et à comprendre, "easy to read and understand"); and automatic summarization. Regarding human-machine dialogue, we will also focus on three applications: multimodal synthesis (speech and non-verbal behaviors) for robots and other embodied conversational agents, in connection with the robotics axis; persona-based synthesis allowing for a more polite and thus more efficient dialogue style in intelligent tutoring systems; and synthesis guided by reinforcement learning, where accumulated rewards encourage rapport with the user.
Axis 2.4 - Transmodal and multimodal tasks
Cross-modal tasks include text generation from quantitative data (data2text), automatic captioning and subtitling (from an image or video), and automatic speech transcription. The main multimodal issues lie at the intersection of textual data and image or video data, and possibly speech data.
Two cross-cutting issues underlying these four themes will be at the heart of a significant part of our work: (1) the robustness of the models to linguistic variability (social network language, code switching, earlier states of the language), and (2) taking into account the linguistic context and, in multimodal situations for example, the extra-linguistic context. Finally, special attention will be paid to the production of datasets (annotated or raw, textual or multimodal), without which no machine learning is possible.
Axis 3 - Robotics, motion and interaction with humans
Industrial robots are still, for the most part, automatons repeating the same task over and over again, for example in a painting or welding cell in the automotive industry. Autonomous cars, announced at least 20 years ago, are still at the experimental stage, with French manufacturers rightly aiming in the medium term at a level of automation known as "2+", where a driver can entirely delegate driving (speed and direction) to the system under certain conditions, while remaining able to regain control at any time.
These rather slow advances are partly due to the difficulty of making robots and humans cohabit and communicate, for obvious safety reasons, but also to factors related to modeling an often unstructured environment, indoors and outdoors, where natural and artificial agents communicate and interact in a dynamic and sometimes unpredictable way. These difficulties are also a major obstacle to the development of high-impact domains where robots must operate in the presence of (or even in contact with) people: for example, cobotics, assistance to the elderly or sick, or intervention robotics for civil security or defense. For robotics to live up to its promises and achieve the expected socio-economic objectives, major scientific advances are still needed. They will require both "hardware" experimental platforms benefiting from the latest progress in mechatronics and sensors (see the section "Robotics platform" later in this document) and software "intelligence" obtained through fundamental progress in several key areas of AI. Within this DRIM, we propose to focus on four complementary and interconnected research areas in which researchers from the four institutes involved in AI4IDF excel.
Axis 3.1 - Statistical learning and optimal control
Statistical learning and optimal control come together in robotics. In particular, AI4IDF researchers are studying their links in the context of reinforcement learning for locomotion and navigation, but also via the representation of complex dynamic systems by Koopman theory. For robustness and energy consumption, many robotic platforms use underactuated systems, with fewer motors than degrees of freedom (the car is a good example). The control of such non-holonomic systems requires complex mathematical tools whose real-time numerical implementation poses formidable challenges and for which statistical learning is a very promising avenue.
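To illustrate what "fewer motors than degrees of freedom" means for a car-like platform, here is a minimal sketch of the kinematic unicycle model, a standard textbook simplification that is assumed here for illustration and does not come from the AI4IDF program. Two inputs (forward speed `v` and turn rate `omega`) drive three state variables (x, y, heading), and the velocity perpendicular to the heading is always zero, which is the non-holonomic constraint. The function names, time step and input values are hypothetical.

```python
import math


def unicycle_step(state, v, omega, dt):
    """One explicit-Euler step of the kinematic unicycle model."""
    x, y, theta = state
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)


def lateral_velocity(state, v):
    """Velocity component perpendicular to the heading (the no-slip constraint)."""
    _, _, theta = state
    xdot, ydot = v * math.cos(theta), v * math.sin(theta)
    return -xdot * math.sin(theta) + ydot * math.cos(theta)


# Drive forward while turning: two inputs steer three state variables.
state = (0.0, 0.0, 0.0)
trajectory = [state]
for _ in range(100):
    state = unicycle_step(state, v=1.0, omega=0.5, dt=0.01)
    trajectory.append(state)
```

Because the model cannot move sideways directly, reaching a laterally displaced pose requires a maneuver that combines driving and turning, as in parallel parking. This is one reason planning and control for such systems are harder than for fully actuated ones.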
Axis 3.2 - Visual perception
Artificial vision capabilities are essential for any autonomy in a dynamic environment. In the context of robotics, they are intimately linked to the task being performed. In particular, AI4IDF researchers are studying the predictive analysis of video streams for robot control and the coupling between visual information and reinforcement learning for locomotion and manipulation. Multimodal perception also plays an important role in research on social robotics for health (visual, audio, physiological data) and robotics in outdoor environments (image and LIDAR data).
The planning of a robot's actions must combine learning, control, perception and reasoning in a single- or multi-robot context, indoors or outdoors, possibly taking into account the non-holonomy constraints mentioned for axis 3.1. In this context, AI4IDF researchers are studying in particular the fusion of mechanical and visual constraints in manipulation, as well as new algorithmic and theoretical tools, based in particular on statistical learning, for motion planning in the presence of obstacles in a dynamic and only partially known environment.
Axis 3.4 - Robotics in interaction with human beings
The robots of tomorrow will have to fit into society. In this context, AI4IDF researchers are working on the design of human-machine interfaces (HMI), robots and interaction scenarios, and on shared control. They also conduct fundamental and applied research in the field of perception, interaction and social robotics, aiming in particular at automatically measuring, characterizing and modeling motor intentions and behaviors, and at establishing and maintaining personalized human-machine interactions. High-impact applications range from cobotics to assistance and rehabilitation.
Many of the scientific challenges in these areas are shared with those of the "learning and optimization" axis of this proposal, particularly with respect to weakly supervised learning. Nevertheless, many challenges are specific to robotics, such as the reproducibility of results and the fact that the assumption of independent and identically distributed data is generally not valid in an environment where (semi-)autonomous agents operate. These difficulties are partially offset by the availability of accurate models of the corresponding physical processes (sensors, kinematics, dynamics), which allow, for example, the use of realistic synthetic data during learning. As test tasks for our work, we also propose to use manipulation (object grasping, reorientation, assembly), locomotion (quadrupedal or even bipedal robots moving over rough terrain) and multimodal perception (camera, lidar, radar), with applications in domains such as manufacturing robotics and (semi-)autonomous vehicles.
Finally, as mentioned above, one of the major challenges of robotics is to make (semi-)autonomous agents and humans cohabit. There are therefore real opportunities for collaboration with researchers in the humanities and social sciences in this context.
Axis 4 - AI in human life: examples from health, education and creation
Through its spread as a widely used technique, AI has also profoundly changed many other disciplines that have a tangible impact on people's daily lives. We will explore these impacts through three application domains that reach various parts of society.
Axis 4.1 - Example of health
Significant advances in digital technologies in recent years have led to potentially major changes in patient care, although the promise of breakthroughs in digital health, in terms of personalized medicine, including prevention, diagnosis, treatment, and prognosis, has yet to be realized. The availability of patient data from electronic medical records, which may eventually be linked to data from community (out-of-hospital) care, gives researchers access to the entire trajectory and timeline of a patient's health. In addition, biotechnology and preclinical developments, including molecular modeling, organoids, and single cells, as well as the regular discovery of new biomarkers coupled with relevant omics data, have paved the way for personalized, optimized medicine and systems medicine/biology approaches. In parallel, the digitization of biomedical imaging has increased access to patient anatomy and physiological information. Finally, the development of AI-based digital applications and software as medical devices (SaMD) aimed at personalizing patient treatment and medical decision making has raised the question of their clinical evaluation and validation. These massive, multi-source, multi-scale, heterogeneous, dispersed, structured or unstructured health data, including clinical (patient markers, trajectory and timeline), laboratory (hematological, biochemical, therapeutic drug monitoring, ...) and omics data, will have to be used to improve patient care and keep the population healthy. Achieving such objectives requires innovative and reliable statistical and machine learning models and tools for health data, whose strong particularities make the modeling process specific and laborious. The work carried out within this axis will make it possible to propose, develop and transfer these models in order to optimize tomorrow's medicine.
Axis 4.2 - Example of education
"Currently, there is a gap between the school world and the rest of society. Society is evolving and taking a digital turn at a pace that schools are struggling to keep up with. ... if AI becomes more and more a part of our daily lives, will schools be able to afford to ignore it?"
The answer is of course no: AI has a real role to play in schools. The aim is obviously not to replace teachers, but to make their task easier and to provide a richer educational experience for them as well as for their students. For example, AI will make possible:
- an assessment partly integrated into the teaching itself and largely automated, thus freeing the teacher from time-consuming spot checks and allowing him or her to focus time on tasks that are more rewarding for the students;
- a better understanding of the instructional process through the ongoing collection of data on the evolution of students' mastery of knowledge;
- instruction tailored to each student (School of One effect) and individual support for struggling students (smart tutors).
To achieve such improvements, it is essential to conduct work that brings AI specialists and education specialists together in joint projects.
Axis 4.3 - Example of creation
In the field of AI for creation, there is relatively little work today attempting to understand creative intelligence, the notion at the heart of this application example. The study of this new paradigm is essential for two reasons. On the one hand, it aims to understand creativity, the mechanism that so fundamentally distinguishes human beings from other branches of the animal kingdom. On the other hand, its objective is to model cognitive and perceptual phenomena, which remain particularly elusive. The growing interest in these issues is reflected in the increasing use of generative systems by a wide variety of researchers from different backgrounds (from industry to fundamental science). In this respect, music provides a fertile framework for developing our understanding of the creative mechanisms of intelligence.
Indeed, the mechanisms of creativity, especially in musical improvisation and audio synthesis, bring together challenging theoretical questions and cognitive processes that are difficult to model. More precisely, the notion of musical time is an essential component, inseparable from a music that develops on multiple scales. Thus, through the understanding of musical creativity, we encounter most of the current challenges in the field of machine learning: temporality, multimodal information, data sparsity, hierarchical structures, and the lack of a formal goal. These issues are also reflected in a very wide range of research areas. The goal of this axis is therefore to provide new answers to the field of AI through a two-pronged approach: first, making new discoveries by applying the latest advances in artificial intelligence to music data; second, applying these innovative methods to other research areas through partnerships in the field of environmental perception and monitoring, which face the same scientific obstacles.