fig4

Figure 4. Statistical analysis of the training dataset. (A) Distribution of structures across the seven crystal systems: cubic is the most common (3,847 structures), followed by tetragonal (2,055 structures), and triclinic is the least common (199 structures). (B) Distribution of the number of atoms in the primitive cell (1 to 160 atoms) across the dataset. (C) Elemental distribution showing the frequency of the 84 distinct elements in the dataset, which span transition metals, main-group elements, and rare-earth elements; oxygen occurs most frequently.