REFERENCES

1. Zhang, J.; Li, J.; Zhao, G.; Wang, Q.; Guo, Y.; Yang, C. Mining solid-state electrolytes from metal-organic framework databases through large language models and representation clustering. J. Am. Chem. Soc. 2025, 147, 40496-506.

2. Moses, I. A.; Barone, V.; Peralta, J. E. Accelerating the discovery of battery electrode materials through data mining and deep learning models. J. Power. Sources. 2022, 546, 231977.

3. Fan, Q.; Min, G.; Liu, L.; Zhao, Y.; Yu, X.; Yun, S. Accelerate the design of new superhard carbon allotropes in Pca21 space group: high-throughput screening and machine learning strategies. Diamond. Relat. Mater. 2024, 143, 110928.

4. Tshitoyan, V.; Dagdelen, J.; Weston, L.; et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 2019, 571, 95-8.

5. M. Bran, A.; Cox, S.; Schilter, O.; Baldassari, C.; White, A. D.; Schwaller, P. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 2024, 6, 525-35.

6. Brown, T. B.; Mann, B.; Ryder, N.; et al. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165. Available online: https://doi.org/10.48550/arXiv.2005.14165 (accessed 28 May 2026).

7. Ouyang, L.; Wu, J.; Jiang, X.; et al. Training language models to follow instructions with human feedback. arXiv 2022, arXiv:2203.02155. Available online: https://doi.org/10.48550/arXiv.2203.02155 (accessed 28 May 2026).

8. Schilling-wilhelmi, M.; Ríos-garcía, M.; Shabih, S.; et al. From text to insight: large language models for chemical data extraction. Chem. Soc. Rev. 2025, 54, 1125-50.

9. Wang, X.; Huey, S. L.; Sheng, R.; Mehta, S.; Wang, F. SciDaSynth: interactive structured data extraction from scientific literature with large language model. Campbell. Syst. Rev. 2025, 21, e70073.

10. Chen, H.; Liu, H.; Tew, Y.; Ren, X.; Tang, X.; Wang, X. Distilling knowledge from catalysis literature with long-context large language model agents. ACS. Catal. 2025, 15, 18244-54.

11. Fu, F.; Li, Q.; Wang, F.; et al. Synergizing a knowledge graph and large language model for relay catalysis pathway recommendation. Natl. Sci. Rev. 2025, 12, nwaf271.

12. Bai, X.; He, S.; Li, Y.; et al. Construction of a knowledge graph for framework material enabled by large language models and its application. npj. Comput. Mater. 2025, 11, 51.

13. Ma, Q.; Zhou, Y.; Li, J. Automated retrosynthesis planning of macromolecules using large language models and knowledge graphs. Macromol. Rapid. Commun. 2025, 2500065.

14. Zhang, D.; Chen, Y.; Liu, C.; et al. Accelerating catalyst materials discovery with large artificial intelligence models. Angew. Chem. Int. Ed. 2026, 65, e26150.

15. Zhang, D.; Jia, X.; Wang, Y.; et al. Digital materials ecosystem: from databases to AI agents for autonomous discovery. Chem. Sci. 2026, 17, 5782-804.

16. Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention is all you need. arXiv 2017, arXiv:1706.03762. Available online: https://doi.org/10.48550/arXiv.1706.03762 (accessed 28 May 2026).

17. Bennani, S.; Moslonka, C. A systematic analysis of chunking strategies for reliable question answering. arXiv 2026, arXiv:2601.14123. Available online: https://doi.org/10.48550/arXiv.2601.14123 (accessed 28 May 2026).

18. Allamraju, A.; Chitale, M. P.; Adibhatla, H. S.; Mishra, R.; Shrivastava, M. Breaking it down: domain-aware semantic segmentation for retrieval augmented generation. arXiv 2025, arXiv:2512.00367. Available online: https://doi.org/10.48550/arXiv.2512.00367 (accessed 28 May 2026).

19. Narimissa, E.; Raithel, D. Exploring information retrieval landscapes: an investigation of a novel evaluation techniques and comparative document splitting methods. arXiv 2024, arXiv:2409.08479. Available online: https://doi.org/10.48550/arXiv.2409.08479 (accessed 28 May 2026).

20. Jiang, X.; Wang, W.; Tian, S.; Wang, H.; Lookman, T.; Su, Y. Applications of natural language processing and large language models in materials discovery. npj. Comput. Mater. 2025, 11, 79.

21. Yong, G.; Jeon, K.; Gil, D.; Lee, G. Prompt engineering for zero‐shot and few‐shot defect detection and classification using a visual‐language pretrained model. Comput. Aided. Civil. Infrastruct. Eng. 2023, 38, 1536-54.

22. Wei, J.; Wang, X.; Schuurmans, D.; et al. Chain-of-thought prompting elicits reasoning in large language models. arXiv 2022, arXiv.2201.11903. Available online: https://doi.org/10.48550/arXiv.2201.11903 (accessed 28 May 2026).

23. Gupta, T.; Zaki, M.; Krishnan, N. M. A.; Mausam. MatSciBERT: a materials domain language model for text mining and information extraction. npj Comput. Mater. 2022, 8, 102.

24. Hu, E. J.; Shen, Y.; Wallis, P.; et al. LoRA: low-rank adaptation of large language models. arXiv 2022, arXiv.2106.09685. Available online: https://doi.org/10.48550/arXiv.2106.09685 (accessed 28 May 2026).

25. Li, S.; Wei, S.; Huang, C.; Zhang, Y.; Zhang, G.; Sun, S. Extracting and reconstructing knowledge in materials science literature using large language models. Commun. Mater. 2026, 7, 31.

26. Zhang, D.; Jia, X.; Tran, H. B.; et al. “DIVE” into hydrogen storage materials discovery with AI agents. Chem. Sci. 2026, 17, 3031-42.

27. Gu, J.; Jiang, X.; Shi, Z.; et al. A survey on LLM-as-a-judge. The. Innovation. 2026, 101253.

28. Leaman, R.; Islamaj, R.; Adams, V.; et al. Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. Database 2023, 2023, baad005.