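Crystal System Classification Prediction for Silicate Cathode Materials in Lithium-Ion Batteries
Tags: crystal system classification, machine learning, lithium-ion batteries
dengb@dp.tech
Published 2023-11-09
Recommended image: Basic Image:ubuntu20.04-py3.10
Recommended machine type: c2_m4_cpu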

Feature engineering and selection

Physical and chemical properties of lithium-ion silicate cathode materials are used to classify each material's crystal system as monoclinic, orthorhombic, or triclinic. This case study demonstrates how feature engineering improves the classification results. See the Li-Ion Feature Engineering case study for additional information.


Background: Lithium-ion batteries are commonly used for portable electronics, electric vehicles, and aerospace applications. During discharge, lithium ions move from the negative electrode through an electrolyte to the positive electrode to create a voltage and current. During recharging, the ions migrate back to the negative electrode. The crystal structure (monoclinic, orthorhombic, triclinic) is available for 339 lithium-containing entries in the dataset.

Lithium-ion Chemical Properties and Crystal Structure Data

url = 'http://apmonitor.com/pds/uploads/Main/lithium_ion.txt'

Objective: Predict the crystal structure type (monoclinic, orthorhombic, triclinic) from Lithium-ion physical and chemical compound information.

This tutorial covers the following:

  • Categorical transformation techniques
  • Feature creation
  • Feature selection
[62]
!pip install pandas
!pip install seaborn
!pip install scikit-learn
!pip install matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pandas in /opt/mamba/lib/python3.10/site-packages (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/mamba/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /opt/mamba/lib/python3.10/site-packages (from pandas) (2022.7.1)
Requirement already satisfied: numpy>=1.21.0 in /opt/mamba/lib/python3.10/site-packages (from pandas) (1.24.2)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: seaborn in /opt/mamba/lib/python3.10/site-packages (0.13.0)
Requirement already satisfied: matplotlib!=3.6.1,>=3.3 in /opt/mamba/lib/python3.10/site-packages (from seaborn) (3.8.1)
Requirement already satisfied: pandas>=1.2 in /opt/mamba/lib/python3.10/site-packages (from seaborn) (1.5.3)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in /opt/mamba/lib/python3.10/site-packages (from seaborn) (1.24.2)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (3.1.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (1.4.5)
Requirement already satisfied: cycler>=0.10 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (0.12.1)
Requirement already satisfied: packaging>=20.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (23.0)
Requirement already satisfied: contourpy>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (1.2.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (4.44.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (2.8.2)
Requirement already satisfied: pillow>=8 in /opt/mamba/lib/python3.10/site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (10.1.0)
Requirement already satisfied: pytz>=2020.1 in /opt/mamba/lib/python3.10/site-packages (from pandas>=1.2->seaborn) (2022.7.1)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.3->seaborn) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: scikit-learn in /opt/mamba/lib/python3.10/site-packages (1.3.2)
Requirement already satisfied: numpy<2.0,>=1.17.3 in /opt/mamba/lib/python3.10/site-packages (from scikit-learn) (1.24.2)
Requirement already satisfied: scipy>=1.5.0 in /opt/mamba/lib/python3.10/site-packages (from scikit-learn) (1.10.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/mamba/lib/python3.10/site-packages (from scikit-learn) (3.2.0)
Requirement already satisfied: joblib>=1.1.1 in /opt/mamba/lib/python3.10/site-packages (from scikit-learn) (1.3.2)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: matplotlib in /opt/mamba/lib/python3.10/site-packages (3.8.1)
Requirement already satisfied: packaging>=20.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (23.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (3.1.1)
Requirement already satisfied: numpy<2,>=1.21 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.24.2)
Requirement already satisfied: cycler>=0.10 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: contourpy>=1.0.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (4.44.0)
Requirement already satisfied: pillow>=8 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (10.1.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/mamba/lib/python3.10/site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: six>=1.5 in /opt/mamba/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[63]
try:
    import chemparse
except ImportError:
    !pip install chemparse
    print('May need to restart kernel to use chemparse')
[64]
# Import libraries and data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import FeatureHasher
#from sklearn.metrics import confusion_matrix,plot_confusion_matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.tree import DecisionTreeClassifier

[65]
# Load and display data
url = 'http://apmonitor.com/pds/uploads/Main/lithium_ion.txt'
data = pd.read_csv(url)
data.sample(20)
     Materials Id  Formula        Spacegroup  Formation Energy (eV)  E Above Hull (eV)  Band Gap (eV)  Nsites  Density (gm/cc)  Volume    Has Bandstructure  Crystal System
94   mp-766984     Li2Fe(Si2O5)3  P21         -2.890                 0.095              0.332          48      2.555            621.599   True               monoclinic
122  mp-763385     Li2Co(Si2O5)2  P21/c       -2.858                 0.075              3.254          68      2.429            943.786   False              monoclinic
132  mp-763500     LiCoSiO4       P21/c       -2.341                 0.095              0.892          28      3.840            273.243   True               monoclinic
306  mp-772589     Li2Fe(Si2O5)2  P1          -2.911                 0.064              3.079          34      2.633            431.422   False              triclinic
46   mp-767077     Li5Fe(SiO4)2   C2          -2.677                 0.014              2.466          16      2.616            174.413   True               monoclinic
220  mp-863888     Li2Fe(SiO3)2   Pmn21       -2.730                 0.076              2.624          66      3.023            731.236   True               orthorhombic
186  mp-762581     LiFeSiO4       Pn21a       -2.604                 0.018              2.961          28      2.890            355.979   True               orthorhombic
117  mp-779186     Li3Co2(SiO4)2  P21         -2.431                 0.064              0.022          30      3.032            353.617   True               monoclinic
275  mp-850159     Li2Mn(Si2O5)2  P1          -2.958                 0.054              3.036          34      2.633            430.361   True               triclinic
296  mp-761820     LiFeSi3O8      P1          -2.886                 0.041              3.160          26      2.703            337.873   True               triclinic
61   mp-762613     Li2Fe2Si2O7    P21/c       -2.598                 0.051              3.159          52      3.149            619.645   True               monoclinic
155  mp-761666     Li3Mn(Si2O5)3  Pcmn        -2.968                 0.055              1.154          100     2.760            1165.318  False              orthorhombic
148  mp-761776     LiMn(SiO3)2    Pbca        -2.824                 0.036              0.037          80      3.343            850.626   False              orthorhombic
302  mp-780681     LiFe3(SiO4)2   P1          -2.468                 0.058              0.631          42      3.133            570.329   True               triclinic
141  mp-849238     Li2MnSiO4      Pmnb        -2.695                 0.010              2.882          32      2.970            359.824   True               orthorhombic
204  mp-762703     LiFeSiO4       P21nb       -2.566                 0.055              2.630          28      2.882            356.872   True               orthorhombic
313  mp-868319     Li5Fe5Si7O24   P1          -2.646                 0.072              2.598          41      2.546            583.402   True               triclinic
207  mp-762570     Li3FeSi2O7     Pbnm        -2.691                 0.057              2.512          52      2.890            562.749   True               orthorhombic
291  mp-761459     LiFeSi3O8      P1          -2.896                 0.032              3.342          26      2.760            330.953   False              triclinic
168  mp-775156     LiMnSiO4       Pbca        -2.595                 0.082              1.267          56      2.994            683.102   True               orthorhombic

Observe datatypes

[66]
data.dtypes
Materials Id              object
Formula                   object
Spacegroup                object
Formation Energy (eV)    float64
E Above Hull (eV)        float64
Band Gap (eV)            float64
Nsites                     int64
Density (gm/cc)          float64
Volume                   float64
Has Bandstructure           bool
Crystal System            object
dtype: object
[67]
# Separate into numerical features that don't need preprocessing, and categorical features that need to be transformed
num_feat = data.select_dtypes(include=['int64','float64']).columns
cat_feat = data.select_dtypes(include=['object','bool']).columns
[68]
data[cat_feat].describe()
        Materials Id  Formula   Spacegroup  Has Bandstructure  Crystal System
count   339           339       339         339                339
unique  339           114       44          2                  3
top     mp-849394     LiFeSiO4  P1          True               monoclinic
freq    1             42        72          274                139

Categorical encoding methods

1. One Hot Encoding

Method: Encode each category value into a binary vector, with size = # of distinct values. See https://towardsdatascience.com/understanding-feature-engineering-part-2-categorical-data-f54324193e63

Example: Has Bandstructure column has 2 distinct values, True and False. Create a new column where 1 = True and 0 = False.

Pros: a simple and robust way to turn categorical features into distinct, usable numerical features.

Cons: m unique values produce m new features. This is fine when there are only 2-3 unique values (such as hi/lo, yes/no), but becomes a problem when there are more. It cannot represent new categories that were not in the training data, tends to overfit, and produces sparse data.
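
As a minimal sketch of the idea in plain Python (the helper names `fit_categories` and `one_hot` are hypothetical and not part of the notebook, which uses pandas for this step), each known category gets its own 0/1 position. Mapping an unseen value to an all-zero vector is an assumption made here for illustration; it is one common workaround, not the only one.

```python
def fit_categories(values):
    """Return the sorted list of distinct categories seen in the training values."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one value as a 0/1 vector with one position per known category.

    Assumption for this sketch: a value that was not seen during fitting
    becomes an all-zero vector, so its identity is lost.
    """
    return [1 if value == cat else 0 for cat in categories]


train = ["monoclinic", "triclinic", "orthorhombic", "monoclinic"]
categories = fit_categories(train)

encoded = [one_hot(v, categories) for v in train]
unknown = one_hot("cubic", categories)  # "cubic" never appeared in train

print(categories)
print(encoded)
print(unknown)
```

The vector length grows with the number of categories, which is why this method becomes unwieldy for columns with many distinct values.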

2. Encode to ordinal variables

Method: assign each unique value to a unique number.

Example: Spacegroup = Pc is assigned to 0, Spacegroup = P21/c is assigned to 1, etc.

Pros: simple and quick, 1 column in -> 1 column out

Cons: residual "structure". The assigned numbers are arbitrary, yet many algorithms will treat a Spacegroup encoded as 20 as "greater than" a Spacegroup encoded as 1.
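
The following plain-Python sketch illustrates the method with a hypothetical helper `factorize`, which assigns codes in order of first appearance. It is only an illustration of the idea, not the pandas implementation. Reversing the input shows how arbitrary the resulting numbers are.

```python
def factorize(values):
    """Assign integer codes to values in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused integer
        encoded.append(codes[v])
    return encoded, codes


spacegroups = ["Pc", "P21/c", "Cc", "C2/c", "C2/c", "P1"]
encoded, codes = factorize(spacegroups)
print(encoded)
print(codes)

# The same categories in a different order receive different codes,
# so the numeric ordering carries no physical meaning.
shuffled_encoded, shuffled_codes = factorize(list(reversed(spacegroups)))
print(shuffled_codes)
```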

3. Feature Hashing

Method: Encode each unique category into a non-binary vector

Example: Spacegroup = Pc is assigned to [1,0,0], Spacegroup = P21/c is assigned to [1,2,-1], etc. Specify number of columns (length of vector)

Pros: low dimensionality so really efficient.

Cons: potential collisions (for example the 1st value in example has both Spacegroups sharing a '1'); hashed features aren't interpretable so can't be used in feature importance.
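
A rough sketch of the hashing trick using only the standard library is shown below. The helper `hash_feature` is hypothetical, and MD5 is an assumption chosen because it is deterministic and built in; scikit-learn's FeatureHasher uses a different hash function, so these vectors will not match the notebook's output.

```python
import hashlib


def hash_feature(token, n_features):
    """Map one string token to a signed vector of length n_features.

    Assumption: MD5 stands in for the hash function purely for illustration.
    """
    digest = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
    index = digest % n_features                 # column the token lands in
    sign = 1.0 if (digest >> 63) & 1 == 0 else -1.0  # alternating sign from the top bit
    vec = [0.0] * n_features
    vec[index] = sign
    return vec


spacegroups = ["Pc", "P21/c", "Cc", "C2/c", "P1", "Pbca", "Pnma", "P21"]
n = 3
vectors = {sg: tuple(hash_feature(sg, n)) for sg in spacegroups}
for sg, vec in vectors.items():
    print(sg, vec)

# With a single token per sample, only 2 * n distinct vectors exist
# (n columns, two signs), so 8 categories in n = 3 columns must collide.
print(len(set(vectors.values())), "distinct vectors for", len(spacegroups), "categories")
```

The last line makes the collision cost concrete: the fewer columns chosen, the more categories end up sharing an encoding.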

4. Other methods

Primarily involve prior knowledge about dataset. Encode with own algorithm to include closely related features.

Variation on One Hot Encoding for large numbers of unique values: classify infrequent instances into "rare" category. May lose some granularity and important info, but also allows for new categories that aren't in training data
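
The "rare" category variation could look like the sketch below. The helper `group_rare`, the threshold `min_count`, and the label "rare" are all hypothetical choices made for illustration.

```python
from collections import Counter


def group_rare(values, min_count=2, rare_label="rare"):
    """Replace categories seen fewer than min_count times with rare_label."""
    counts = Counter(values)
    frequent = {v for v, c in counts.items() if c >= min_count}
    mapped = [v if v in frequent else rare_label for v in values]
    return mapped, frequent


train = ["P1", "P1", "P21/c", "P21/c", "Pbca", "C2", "P1"]
mapped, frequent = group_rare(train, min_count=2)
print(mapped)

# At prediction time, anything outside the frequent set, including
# categories never seen in training, also falls into "rare".
new_values = ["P1", "Fm-3m"]
mapped_new = [v if v in frequent else "rare" for v in new_values]
print(mapped_new)
```

After grouping, the reduced set of categories can be one-hot encoded with far fewer columns.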


'Materials Id' column

[69]
data['Materials Id'].describe()
count           339
unique          339
top       mp-849394
freq              1
Name: Materials Id, dtype: object

339 unique values for 339 rows: the column is purely an identifier, carries no predictive information, and can be dropped.
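
The same check can be automated. The sketch below uses a hypothetical helper `identifier_columns` on a few rows copied from the sample shown earlier; it flags string columns in which every value is distinct. Restricting the check to strings is an assumption, made because continuous numeric columns are often all-distinct without being identifiers.

```python
def identifier_columns(rows):
    """Return names of string-valued columns whose values are all distinct."""
    n = len(rows)
    result = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, str) for v in values) and len(set(values)) == n:
            result.append(col)
    return result


rows = [
    {"Materials Id": "mp-766984", "Formula": "Li2Fe(Si2O5)3", "Crystal System": "monoclinic"},
    {"Materials Id": "mp-763500", "Formula": "LiCoSiO4", "Crystal System": "monoclinic"},
    {"Materials Id": "mp-762581", "Formula": "LiFeSiO4", "Crystal System": "orthorhombic"},
    {"Materials Id": "mp-762703", "Formula": "LiFeSiO4", "Crystal System": "orthorhombic"},
]

id_cols = identifier_columns(rows)
print(id_cols)
```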

[70]
data.drop(columns=['Materials Id'],inplace=True)
data.columns
Index(['Formula', 'Spacegroup', 'Formation Energy (eV)', 'E Above Hull (eV)',
       'Band Gap (eV)', 'Nsites', 'Density (gm/cc)', 'Volume',
       'Has Bandstructure', 'Crystal System'],
      dtype='object')

'Has Bandstructure' column

[71]
data['Has Bandstructure'].value_counts().plot(kind='bar')

Two unique values, True and False: a classic case for one-hot encoding, which for a binary column reduces to a single 0/1 column.

[72]
# One-hot encode 'Has Bandstructure'
data['Has Bandstructure'] = data['Has Bandstructure'].map({True:1, False:0})

'Spacegroup' column

[73]
data['Spacegroup'].value_counts().plot(kind='bar')
print(data['Spacegroup'].nunique())

44 unique values, with most of them occurring multiple times.

Option 1: One-hot encoding will result in 44 new feature columns; inefficient and memory-intensive.

Option 2: Encode to ordinal numbers. Will possibly work, but does leave a residual structure that may affect model performance

Option 3: Use Feature Hashing to create a vector representation of each unique Spacegroup. Note that a vector size of 44 behaves much like one-hot encoding (apart from possible hash collisions), while a vector size of 1 collapses every Spacegroup into a single column, much as ordinal encoding does. Use vector size = 3 here.

[74]
# Option 1: One-hot encoding (not used)
pd.get_dummies(data['Spacegroup'])
C2 C2/c C2/m C222 C2221 C2cm Cc Ccme Cmce Cmcm ... Pcmn Pm21n Pmc21 Pmn21 Pmnb Pn21a Pna21 Pnc2 Pnca Pnma
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
334 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
335 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
336 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
337 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
338 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

339 rows × 44 columns

[75]
# Option 2: Ordinal number encoding
data['Spacegroup (ordinal)'] = pd.factorize(data['Spacegroup'])[0]

# Alternative (result not stored): dense rank of the Spacegroup strings,
# which orders them alphabetically rather than by frequency
data['Spacegroup'].rank(method="dense").astype(int)
0      32
1      22
2       7
3       2
4       2
       ..
334    17
335    17
336    17
337    17
338    17
Name: Spacegroup, Length: 339, dtype: int64
[76]
# Option 3: Feature Hashing
n = 3
fh = FeatureHasher(n_features=n, input_type='string')
#hashed_tag = fh.fit_transform(data['Spacegroup']).toarray()
hashed_tag = fh.fit_transform([[x] for x in data['Spacegroup']]).toarray()
ht_df = pd.DataFrame(hashed_tag)
#ht_df.columns = ['Spacegroup'+str(i) for i in range(n)]
ht_df.columns = ['Spacegroup'+str(i)+'_ht' for i in range(n)]

data = data.join(ht_df)
[77]
print(data)
             Formula Spacegroup  Formation Energy (eV)  E Above Hull (eV)  \
0          Li2MnSiO4         Pc                 -2.699              0.006   
1          Li2MnSiO4      P21/c                 -2.696              0.008   
2         Li4MnSi2O7         Cc                 -2.775              0.012   
3       Li4Mn2Si3O10       C2/c                 -2.783              0.013   
4       Li2Mn3Si3O10       C2/c                 -2.747              0.016   
..               ...        ...                    ...                ...   
334     Li6Co(SiO4)2         P1                 -2.545              0.071   
335     LiCo3(SiO4)2         P1                 -2.250              0.076   
336  Li5Co4(Si3O10)2         P1                 -2.529              0.082   
337         LiCoSiO4         P1                 -2.348              0.087   
338    Li3Co2(SiO4)2         P1                 -2.406              0.090   

     Band Gap (eV)  Nsites  Density (gm/cc)   Volume  Has Bandstructure  \
0            3.462      16            2.993  178.513                  1   
1            2.879      32            2.926  365.272                  1   
2            3.653      28            2.761  301.775                  1   
3            3.015      38            2.908  436.183                  1   
4            2.578      36            3.334  421.286                  1   
..             ...     ...              ...      ...                ...   
334          2.685      17            2.753  171.772                  1   
335          0.005      42            3.318  552.402                  1   
336          0.176      35            2.940  428.648                  1   
337          1.333      14            2.451  214.044                  1   
338          0.323      15            3.043  176.207                  0   

    Crystal System  Spacegroup (ordinal)  Spacegroup0_ht  Spacegroup1_ht  \
0       monoclinic                     0             0.0             0.0   
1       monoclinic                     1             1.0             0.0   
2       monoclinic                     2             1.0             0.0   
3       monoclinic                     3             1.0             0.0   
4       monoclinic                     3             1.0             0.0   
..             ...                   ...             ...             ...   
334      triclinic                    43             0.0             1.0   
335      triclinic                    43             0.0             1.0   
336      triclinic                    43             0.0             1.0   
337      triclinic                    43             0.0             1.0   
338      triclinic                    43             0.0             1.0   

     Spacegroup2_ht  
0               1.0  
1               0.0  
2               0.0  
3               0.0  
4               0.0  
..              ...  
334             0.0  
335             0.0  
336             0.0  
337             0.0  
338             0.0  

[339 rows x 14 columns]

For now, keep both sets of new features, and we'll see which one performs better


'Formula' column

[78]
data['Formula'].value_counts()
LiFeSiO4           42
LiCoSiO4           29
Li2FeSiO4          15
Li2CoSiO4          14
Li2MnSiO4          12
                   ..
Li3Co2Si3O10        1
Li10Co(SiO5)2       1
Li4Co2Si3O10        1
Li2FeSi4O11         1
Li5Co4(Si3O10)2     1
Name: Formula, Length: 114, dtype: int64

114 unique values, most occurring only once. One-hot encoding is out of the question.

Options 1, 2, and 3: one-hot encoding, ordinal number encoding, and feature hashing all become inefficient with such variety.

Option 4: Use domain knowledge to create additional features. For example, we can look at the LiFeSiO4 formula, and turn it into 4 new columns, each one indicating how many of each atom are in the formula (for example, {Li: 1, Fe: 1, Si: 1, O: 4})

[ ]
# Option 4: use chemparse package to create the new features of atom counts
chem_data = data['Formula'].apply(chemparse.parse_formula)

# Convert the dictionary into a dataframe and fill NaN's with zero's
chem_data = pd.json_normalize(chem_data)
chem_data = chem_data.fillna(0)
#chem_data = chem_data.add_suffix('_chem') # specify suffix for column names
data = data.join(chem_data)
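
To make the atom-count features concrete without installing chemparse, here is a small standard-library sketch. The function `parse_formula` below is a hypothetical stand-in, not the chemparse implementation: it handles only integer subscripts and parentheses, and it returns integer counts.

```python
import re
from collections import Counter

# An element symbol with optional count, an opening parenthesis,
# or a closing parenthesis with optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def parse_formula(formula):
    """Count atoms in a formula such as 'Li2Fe(Si2O5)3' (integer subscripts only)."""
    stack = [Counter()]
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError(f"Cannot parse {formula!r} at position {pos}")
        element, count, open_paren, close_paren, group_count = m.groups()
        if element:
            stack[-1][element] += int(count) if count else 1
        elif open_paren:
            stack.append(Counter())
        else:  # closing parenthesis: scale the group and merge it into its parent
            if len(stack) == 1:
                raise ValueError(f"Unbalanced ')' in {formula!r}")
            group = stack.pop()
            factor = int(group_count) if group_count else 1
            for el, n_atoms in group.items():
                stack[-1][el] += n_atoms * factor
        pos = m.end()
    if len(stack) != 1:
        raise ValueError(f"Unbalanced '(' in {formula!r}")
    return dict(stack[0])


formulas = ["LiFeSiO4", "Li2Fe(Si2O5)3", "Li2MnSiO4"]
parsed = [parse_formula(f) for f in formulas]

# One column per element; elements absent from a formula get a count of 0.
elements = sorted({el for p in parsed for el in p})
rows = [[p.get(el, 0) for el in elements] for p in parsed]
print(elements)
for formula, row in zip(formulas, rows):
    print(formula, row)
```

Filling absent elements with 0 plays the same role as the `fillna(0)` step in the notebook cell.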

'Crystal System' column

This is the target column, and there are 3 different types of crystal structures we're trying to classify. To properly transform this to numerical data, we have to understand if we are working on a multi-class problem or a multi-label problem.

  • A multi-class problem is one in which there is only one distinct type of classification for each row. For example, a fruit is either an apple or an orange, but cannot be both. For a multi-class problem, the target value should be a single value, such as a 0 for apple and 1 for orange. In other words, it would be encoded to ordinal numbers.
  • A multi-label problem is one in which there are possibly multiple labels for each row. For example, classifying pictures of apples and oranges can include a picture of an apple alone, an orange alone, or both an apple and an orange. For a multi-label problem, the target value should be a vector representation, such as [1,0] for apple, [0,1] for orange, and [1,1] for both apple and orange. In other words, we would have to one-hot encode the target feature.

Since each material has exactly one crystal system, this is a multi-class problem. The target output should be encoded to a 0, 1, or 2. If it were a multi-label problem, the target output would have to be encoded to a vector of length 3.
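
The two target encodings can be contrasted in a short sketch. The helpers `encode_multiclass` and `encode_multilabel` are hypothetical; the alphabetical class order is chosen to match how scikit-learn's LabelEncoder sorts its classes.

```python
# Classes in sorted order, so the integers match a sorted label encoding.
classes = ["monoclinic", "orthorhombic", "triclinic"]
class_to_int = {name: i for i, name in enumerate(classes)}


def encode_multiclass(label):
    """Multi-class target: exactly one class per row, encoded as one integer."""
    return class_to_int[label]


def encode_multilabel(labels):
    """Multi-label target: any subset of classes per row, encoded as a 0/1 vector."""
    present = set(labels)
    return [1 if name in present else 0 for name in classes]


print(encode_multiclass("triclinic"))
print(encode_multilabel(["monoclinic"]))
# A row carrying two labels at once is only expressible in the multi-label form.
print(encode_multilabel(["monoclinic", "triclinic"]))
```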

[80]
# Encode Crystal System to ordinal values for multi-class problem
labelencoder = LabelEncoder() #initializing an object of class LabelEncoder
data['Crystal System (#)'] = labelencoder.fit_transform(data['Crystal System'])

# For a multi-label problem, use one-hot encoding
data[['monoclinic','orthorhombic','triclinic']] = pd.get_dummies(data['Crystal System'])
[81]
# Check for balance
data['Crystal System'].value_counts().plot(kind='bar')
[82]
# Save new features in dataframe
data.to_csv('lithium_ion_data.csv',index=False)

Test performance

[83]
# All new numerical features (Crystal System excluded, since it's int32)
features = list(data.select_dtypes(include=['int64','float64']).columns.values)

ord_feat = ['Formation Energy (eV)','E Above Hull (eV)','Band Gap (eV)',
'Nsites','Density (gm/cc)','Volume','Has Bandstructure',
'Spacegroup (ordinal)','Li','Mn','Si','O','Fe','Co'
]

# Hashed column names must match those created earlier ('Spacegroup0_ht', ...)
hash_feat = ['Formation Energy (eV)','E Above Hull (eV)','Band Gap (eV)',
             'Nsites','Density (gm/cc)','Volume','Has Bandstructure',
             'Spacegroup0_ht','Spacegroup1_ht','Spacegroup2_ht',
             'Li','Mn','Si','O','Fe','Co'
            ]

labels = ['Crystal System (#)']
[84]
plt.figure(figsize=(16,5))

titles = ['Original Numerical Features Only',
'With Encoded Features\n(Ordinal Spacegroup)',
'With Encoded Features\n(Hashed Spacegroup)'
]

for i, feat in enumerate([num_feat,ord_feat,hash_feat]):
    X = data[feat]
    y = data[labels]

    # 80% training data and 20% testing
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2)

    dtree = DecisionTreeClassifier()
    dtree.fit(Xtrain,ytrain)
    yp = dtree.predict(Xtest)

    # Plot confusion matrix
    plt.subplot(1,3,i+1)
    cm = confusion_matrix(ytest,yp)
    sns.heatmap(cm,annot=True)
    plt.title(titles[i])

plt.savefig('li-ion.png')
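
To read the heatmaps, it helps to see how a confusion matrix is built and how accuracy follows from it. The sketch below uses made-up labels purely for illustration (they are not results from this dataset) and the same convention as scikit-learn: rows are true classes, columns are predicted classes.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Illustrative labels only: 0 = monoclinic, 1 = orthorhombic, 2 = triclinic
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, 3)
for row in cm:
    print(row)
print(accuracy(cm))
```

Because the notebook uses a single random 80/20 split without a fixed seed, the matrices change from run to run, so comparisons between feature sets are best repeated over several splits.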