Overview
Traditionally, building machine learning models for different domains has required collecting and labeling large amounts of domain-specific data to train an individual model for each task. This approach is resource-intensive and adapts poorly to new tasks and domains. The foundation model approach instead trains a single model on massive amounts of diverse, unlabeled data, allowing it to learn general language understanding that can be adapted to a wide range of domains with far less labeled data.
Domain adaptation leverages the pre-trained foundation model's knowledge by fine-tuning it on task-specific data, which speeds up the development and deployment of high-performing models. This paradigm shift toward foundation models represents a significant step forward in natural language processing, enabling more efficient, scalable, and adaptable AI solutions.
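To make the adaptation step concrete, here is a minimal fine-tuning sketch using the Hugging Face Transformers library. The checkpoint name, the toy two-example dataset, and the hyperparameters are illustrative assumptions, not a description of the program's actual pipeline.

```python
# Minimal supervised fine-tuning sketch (illustrative placeholders throughout).
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "bert-base-multilingual-cased"  # placeholder pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A toy labeled dataset standing in for real task-specific data.
train_ds = Dataset.from_dict({
    "text": ["the quarterly report looks strong", "the filing raises serious concerns"],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

train_ds = train_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    learning_rate=2e-5,  # a typical fine-tuning learning rate
)

# Only a few epochs on a small labeled set are needed because the
# pre-trained weights already encode general language understanding.
Trainer(model=model, args=args, train_dataset=train_ds).train()
```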
The Foundation Models program is Phase 2 of the AI for Everyone Program. In Phase 1, we produced open-source tools and datasets, including eight models and four publicly available datasets. One of the models, WangchanBERTa, has already gained significant traction, with 1.2 million downloads. Phase 1 concluded in April 2023, and we launched Phase 2 in May 2023. The Foundation Models program applies state-of-the-art natural language processing techniques to various business domains, including finance, healthcare, retail, and legal. The program aims to develop a small set of core foundation models, along with toolsets for adapting them to domain-specific use cases. We also plan to facilitate the development of proof-of-concept applications.
Subprograms
The program consists of three subprograms:
- Core Model Development: This subprogram develops and improves the multilingual foundation models used throughout the program. Tasks include exploring new architectures, applying new unsupervised learning methods, and leveraging cross-lingual knowledge transfer. (A sketch of the masked-language-modeling objective behind unsupervised pre-training follows this list.)
- Domain Adaptation: This subprogram focuses on adapting the foundation models to specific domains such as finance, healthcare, retail, and legal. Techniques include fine-tuning the models on domain-specific data to improve performance on tasks within a domain, applying low-rank adapters (LoRA) to reduce the cost of domain adaptation, and employing knowledge distillation to shrink a model before applying it to a specific task. (A LoRA sketch also follows this list.)
- Talent Cultivation: This subprogram develops and nurtures a multidisciplinary community of scientists and engineers who combine machine learning expertise with domain-specific knowledge. Through initiatives such as internships, workshops, and hackathons, we aim to foster collaboration and knowledge-sharing among people from academia, industry, and non-profit organizations. This approach will produce more well-rounded researchers and contribute to practical, impactful AI solutions for real-world challenges.
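To illustrate the unsupervised objective mentioned under Core Model Development, here is a minimal masked-language-modeling pre-training sketch with Hugging Face Transformers. The checkpoint and the two-sentence corpus are illustrative placeholders; real pre-training uses a large multilingual corpus.

```python
# Minimal masked-language-modeling (MLM) pre-training sketch
# (placeholder checkpoint and toy corpus).
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "xlm-roberta-base"  # placeholder multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# A toy unlabeled corpus standing in for massive, diverse pre-training data.
corpus = Dataset.from_dict(
    {"text": ["raw unlabeled sentence one", "raw unlabeled sentence two"]}
)
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True,
    remove_columns=["text"],
)

# The collator randomly masks 15% of tokens; the model learns general
# language understanding by predicting them, with no labels required.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="mlm-pretrain", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=corpus, data_collator=collator).train()
```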
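To illustrate the low-rank adapter technique mentioned under Domain Adaptation, here is a minimal LoRA sketch using the PEFT library. The base checkpoint, target modules, and hyperparameters are illustrative assumptions.

```python
# Minimal LoRA sketch with PEFT: freeze the base model and train only
# small low-rank adapter matrices (placeholder model and settings).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",  # placeholder base model
    num_labels=2,
)

config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor for the adapter output
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections to adapt
)

model = get_peft_model(base, config)
# Only the adapter matrices (plus the classification head) are trainable,
# which is what reduces the cost of domain adaptation.
model.print_trainable_parameters()
# The wrapped model can then be fine-tuned exactly as in the earlier sketch.
```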
Facilities
In addition to the IST GPU cluster, which includes 16 nodes with 4 V100 32GB GPUs each and 2 DGX-1 nodes with 8 V100 32GB GPUs each, the Foundation Models team will have access to the following machines:
- 2 nodes with 4 A100 80GB GPUs each
- 2 DGX-1 nodes with 8 V100 32GB GPUs each