assignment_2024 (copy)
python
Question Answering
Machine Learning
李心慧-化院-2200011701
Updated 2024-07-03
Recommended image: Basic Image:bohrium-notebook:2023-04-07
Recommended machine type: c2_m4_cpu
ml-intro(v2)

AI Chemistry Hands-on Practice (Introductory)

Introduction to Scientific Programming and Machine Learning Application in Chemistry

Assignment


Part 1

Plot the speed distribution for ideal gas particles as required by the instructions below. Answer the following questions.

The speed (v) probability distribution P(v) for gas particles follows the Maxwell-Boltzmann distribution:

$$P(v) = 4\pi \left(\frac{M}{2\pi RT}\right)^{3/2} v^2 \exp\left(-\frac{Mv^2}{2RT}\right)$$

where v is the speed in m/s, M is the gas particle's molar mass in kg/mol, R is the ideal gas constant (R = 8.314 J/(mol·K)), and T is the temperature in Kelvin.

  • (1) Plot P(v) for Ne gas from 0~2000 m/s at room temperature. Set the line color to be "purple". Set the x-axis label to be "speed v (m/s)", the y-axis label to be "P(v)", and the graph title to be "Speed Probability Distribution". Set the line legend to be "Ne, T=XXX K". Fill the area under the plot with the same color as the line but with alpha=0.3 transparency. (Make sure the number of points you generate affords a smooth line.)
    Hint: use np.pi and scipy.constants.R to get the scientific constant values.

  • (2) Plot P(v) for CO2 gas on the same figure. Set the line color to be a different color from Ne. Set the line legend to be "CO2, T=XXX K". Fill the area under the plot with the same color as the line but with alpha=0.3 transparency.

  • (3) Calculate the area under each line (the integration for each P(v)) and print them out in the following format:


    area under _ gas name _ P(v) at T=XXX K = (_ integral value _)

    Are the values for the two gases equal or not? What are the values? Rationalize the physical meaning of the integration values.

  • (4) What is the most probable speed for Ne at room temperature based on the speed distribution?
    Hint: consider the useful function np.argmax(). Note: np.argmax() returns the index of the maximum, not the speed itself, so you need to use v[np.argmax(P)] to get the speed corresponding to the maximum function value.

  • (5) Calculate the root mean square speed of Ne at room temperature using the formula:
    $$v_{rms} = \sqrt{\frac{3RT}{M}}$$
    and print out:
    rms speed = value with 1 decimal place
    Hint: useful function: np.sqrt()
    Without any calculation, predict whether CO2's root mean square speed is greater than, smaller than, or equal to that of Ne. Why or why not?

  • (6) Challenge Question: The root mean square speed formula in (5) is derived as follows:
    $$v_{rms} = \sqrt{\langle v^2 \rangle}, \qquad \langle v^2 \rangle = \int_0^\infty v^2 P(v)\,dv$$
    Compute $v_{rms}$ of Ne at room temperature by carrying out the above integral computation. Compare the value you obtained in this question with that in question (5). What are the similarities and differences?
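
For reference, the closed form quoted in (5) can be recovered from the mean-square-speed integral with a standard Gaussian integral. The following is a sketch of that step only; the shorthand $a = M/(2RT)$ is introduced here for brevity and is not part of the original assignment text.

```latex
\begin{aligned}
\langle v^2 \rangle
  &= \int_0^\infty v^2 P(v)\,dv
   = 4\pi \left(\frac{a}{\pi}\right)^{3/2} \int_0^\infty v^4 e^{-a v^2}\,dv,
   \qquad a = \frac{M}{2RT} \\
  &= 4\pi \left(\frac{a}{\pi}\right)^{3/2} \cdot \frac{3\sqrt{\pi}}{8\,a^{5/2}}
   = \frac{3}{2a}
   = \frac{3RT}{M}, \\
v_{rms} &= \sqrt{\langle v^2 \rangle} = \sqrt{\frac{3RT}{M}}.
\end{aligned}
```

Because the analytic and numerical routes evaluate the same integral, any difference between them should be limited to numerical integration error.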

[21]
#Enter your code for Part 1 Assignment below.
import numpy as np
import matplotlib.pyplot as plt
import scipy.constants as sc
import scipy.integrate as integrate

# ideal gas constant in J/(mol K)
print(sc.R)

# define the Maxwell-Boltzmann speed distribution function P(v)
def boltzmann_distribution_P(v, T, M):
    return 4*np.pi*(v**2)*(M/(2*np.pi*sc.R*T))**1.5*np.exp(-M*(v**2)/(2*sc.R*T))

# M is molar mass of gas particle in kg/mol
M_Ne = 0.020 #Neon molar mass kg/mol
M_CO2 = 0.044 #CO2 molar mass kg/mol

# set temperature (K)
T = 298

# generate 1000 points for v between 0~2000 m/s
v = np.arange(0,2000,2)

P_Ne = boltzmann_distribution_P(v, T, M_Ne)
P_CO2 = boltzmann_distribution_P(v, T, M_CO2)

# (1) Plot P(v) for Ne (purple, as required)
plt.plot(v, P_Ne, color='purple', label=f"Ne, T={T} K")
plt.fill_between(v, P_Ne, alpha=0.3, color='purple')
# (2) Plot P(v) for CO2 in a different color
plt.plot(v, P_CO2, color='blue', label=f"CO2, T={T} K")
plt.fill_between(v, P_CO2, alpha=0.3, color='blue')
plt.xlim(0, 2000)
plt.ylim(0.0000, 0.0030)
plt.xlabel('speed v (m/s)')
plt.ylabel('P(v)')
plt.title('Speed Probability Distribution', fontsize=12)

# (3) Print the area under each line; quad returns (value, error estimate)
area_Ne = integrate.quad(lambda x: boltzmann_distribution_P(x, T, M_Ne), 0, 2000)[0]
print(f"area under Ne P(v) at T={T} K = {area_Ne:.2f}")
area_CO2 = integrate.quad(lambda x: boltzmann_distribution_P(x, T, M_CO2), 0, 2000)[0]
print(f"area under CO2 P(v) at T={T} K = {area_CO2:.2f}")

# (4) calculate the v_mp (most probable speed) for Ne at 298 K
v_mp = v[np.argmax(P_Ne)]
print("the vmp of Ne is", v_mp)

# (5) calculate the rms speed of Ne from the formula
v_rms_Ne_1 = np.sqrt(3*sc.R*T/M_Ne)
print(f"rms speed = {v_rms_Ne_1:.1f}")

# (6) calculate the rms speed of Ne from integration;
# take only the integral value [0], not the error estimate, before the square root
v2_mean = integrate.quad(lambda x: x*x*boltzmann_distribution_P(x, T, M_Ne), 0, np.inf)[0]
v_rms_Ne_2 = np.sqrt(v2_mean)
print(f"rms speed from integration = {v_rms_Ne_2:.1f}")

plt.legend()
plt.show()
8.314462618
area under Ne P(v) at T=298 K = 1.00
area under CO2 P(v) at T=298 K = 1.00
the vmp of Ne is 498
rms speed = 609.6
rms speed from integration = 609.6
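
As a cross-check on questions (4) and (5), the characteristic speeds of the Maxwell-Boltzmann distribution have closed forms. The sketch below uses the standard textbook expressions for the most probable, mean, and rms speeds; the helper name `characteristic_speeds` and the `speeds` dictionary are hypothetical names introduced here, and the molar masses and temperature are the same rounded values used in the cell above.

```python
import math

R = 8.314462618  # ideal gas constant, J/(mol K)
T = 298.0        # temperature in K, same value as in the cell above


def characteristic_speeds(M, T=T):
    """Return (v_mp, v_mean, v_rms) in m/s for molar mass M in kg/mol."""
    v_mp = math.sqrt(2 * R * T / M)                 # most probable speed
    v_mean = math.sqrt(8 * R * T / (math.pi * M))   # mean speed
    v_rms = math.sqrt(3 * R * T / M)                # root mean square speed
    return v_mp, v_mean, v_rms


# Molar masses in kg/mol (rounded values, as in the assignment code)
speeds = {name: characteristic_speeds(M) for name, M in (("Ne", 0.020), ("CO2", 0.044))}

for name, (v_mp, v_mean, v_rms) in speeds.items():
    print(f"{name}: v_mp = {v_mp:.1f}, v_mean = {v_mean:.1f}, v_rms = {v_rms:.1f} m/s")
```

The analytic most probable speed for Ne comes out at about 497.8 m/s, consistent with the grid-based value of 498 found with np.argmax() on a 2 m/s grid. Since every speed scales as $1/\sqrt{M}$ at fixed temperature, the heavier CO2 has the smaller rms speed.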

Part 2

Import the data file ROH_data.csv containing data on simple alcohols and train a random forest algorithm to predict whether or not an alcohol is aliphatic. Remember to split the data set using train_test_split() and evaluate the quality of the predictions.

[23]
# Imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Load the data
data = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/ROH_data.csv')
target = data['aliphatic']
features = data.drop('aliphatic', axis=1)

# Hold out 25% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=18)

# Train the random forest and report the accuracy on the test set
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(rf.score(X_test, y_test))

# Confusion matrix: rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(y_test, y_pred)
conf_matrix

sns.heatmap(conf_matrix, annot=True, cmap='Blues')
plt.xlabel('Predicted Value')
plt.ylabel('True Value');
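
Beyond the overall accuracy, the confusion matrix can be turned into precision and recall. The sketch below assumes a binary target in which class 1 is the positive class and uses scikit-learn's layout (rows are true labels, columns are predicted labels). The counts in `cm` are hypothetical and are not results from ROH_data.csv.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix (NOT taken from the assignment data):
# rows = true class (0, 1), columns = predicted class (0, 1)
cm = np.array([[40, 3],
               [5, 52]])

# For a binary problem, ravel() yields the four cells in this order
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # fraction of all predictions that are correct
precision = tp / (tp + fp)        # of the predicted positives, how many are right
recall = tp / (tp + fn)           # of the true positives, how many are found

print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
```

Reporting precision and recall alongside accuracy is useful when the two classes are unbalanced, because accuracy alone can look high even if one class is rarely predicted correctly.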

Part 3

  1. At the end of 2.2.1. Dimensional Reduction, you reduced the wine data to two dimensions.
  • Plot a scatter plot of the wine data in the two principal components, but do not label the data points by type (wine_y).
  • Use the DBSCAN algorithm to study the 2-dimensional wine data and label the types. Show the DBSCAN types with a color map. (Hint: adjust eps around 0.55.)
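
As background for the dimensional reduction step, the projection onto two principal components can be written out with a singular value decomposition. This is a sketch of the idea on hypothetical random data (the array `X` below is synthetic, not the wine data), and it is not a description of how scikit-learn implements PCA internally.

```python
import numpy as np

# Hypothetical data: 50 samples, 4 features, with two features made correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

# Standardize each feature to zero mean and unit variance
# (population standard deviation, the same convention as StandardScaler)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing singular value
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)

# Project the standardized data onto the first two principal directions
scores = Xs @ Vt[:2].T

# Fraction of the total variance carried by each component
explained = S**2 / np.sum(S**2)

print(scores.shape)
print(explained)
```

The two columns of `scores` play the same role as the two columns of the PCA-transformed wine data: each sample is described by its coordinates along the two directions of largest variance.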
[26]
#write your code here
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

# Load the wine data set as pandas objects
wine = load_wine(as_frame=True)

# Extract the feature data only and save as a dataframe
wine_X = wine['data']

# Extract the target data only (not used for coloring the first plot)
wine_y = wine['target']

# Extract the feature and target data together and save as a dataframe
wine_df = wine['frame']

# Standardize the features before PCA
SS = StandardScaler()
Wine_X_ss = SS.fit_transform(wine_X)

# Reduce the standardized data to two principal components
pca = PCA(n_components=2)
trans_data = pca.fit_transform(Wine_X_ss)
trans_data.shape #now the number of column is reduced to 2

# Scatter plot without labels (no 'c' is given, so no cmap is needed)
plt.figure()
plt.scatter(trans_data[:,0], trans_data[:,1])
plt.xlabel('principal component 1')
plt.ylabel('principal component 2')

# Cluster the 2-dimensional data with DBSCAN
DB_wine = DBSCAN(eps=0.55, min_samples=5)
DB_wine.fit(trans_data)
DB_wine.labels_.shape

# Scatter plot colored by the DBSCAN cluster labels
plt.figure()
plt.scatter(trans_data[:,0], trans_data[:,1], c=DB_wine.labels_)
plt.xlabel('principal component 1')
plt.ylabel('principal component 2');
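
When reading a DBSCAN result, it helps to know that scikit-learn assigns the label -1 to points it treats as noise, so the number of clusters is not simply the number of distinct labels. The sketch below summarizes a label array; the `labels` values are hypothetical stand-ins for `DB_wine.labels_`, not the actual clustering of the wine data.

```python
import numpy as np

# Hypothetical DBSCAN labels (stand-in for DB_wine.labels_); -1 marks noise points
labels = np.array([0, 0, 1, 1, -1, 2, 0, -1, 1, 2])

# Cluster ids are all distinct labels except the noise label -1
cluster_ids = sorted(set(labels.tolist()) - {-1})
n_clusters = len(cluster_ids)

# Count noise points and the size of each cluster
n_noise = int(np.sum(labels == -1))
sizes = {cid: int(np.sum(labels == cid)) for cid in cluster_ids}

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("cluster sizes:", sizes)
```

Checking these counts while varying eps is a quick way to see whether a given eps yields the expected number of groups or labels too many points as noise.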
    • Identify the molecule corresponding to each component in 2.2.3 Blind Signal Separation. The mixtures are composed of acetone, cyclohexane, methanol and toluene in random ratios.
    • Does each extracted single-component spectrum match the corresponding pure chemical spectrum exactly? Explain.

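Component 1: methanol; component 2: toluene; component 3: cyclohexane; component 4: acetone. The separated acetone spectrum differs slightly from the standard spectrum, because the separation assumes that the components do not interact, whereas in practice hydrogen bonding may be present.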

  1. Open the file titled NMR_mixed_problem.csv which contains three H NMR spectra. Each spectrum (columns) is a mixture of three chemical compounds in different ratios (artificially generated). Use fastICA (set random_state=15) to separate out three pure H NMR spectra of each component. Compare your separated spectra to the pure NMR spectra in NMR_pure_problem.csv.
[27]
#read NMR spectra data from csv file and save in a dataframe
mix_nmr_df = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/NMR_mixed.csv')
mix_nmr_df

        sample1   sample2   sample3
0       0.001858  0.000819  0.001190
1       0.001827  0.000809  0.001177
2       0.001863  0.000820  0.001188
3       0.001842  0.000814  0.001178
4       0.001874  0.000823  0.001196
...     ...       ...       ...
13102   0.001835  0.000844  0.001210
13103   0.001832  0.000831  0.001202
13104   0.001867  0.000837  0.001213
13105   0.001859  0.000832  0.001208
13106   0.001873  0.000833  0.001213

13107 rows × 3 columns

[28]
#convert dataframe to numpy array
mix_NMR = mix_nmr_df.to_numpy()
# numpy array needs to be used for plotting lines
mix_NMR.shape

(13107, 3)
[36]
shift_df = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/shift.csv')
shift = shift_df.to_numpy()
shift.shape
(13107, 1)
[37]
#plot NMR spectra
fig2 = plt.figure(figsize=(12,6))
#plot NMR spectra for NMR sample 1-3
plt.plot(shift, mix_NMR[:,0],label='sample1')
plt.plot(shift, mix_NMR[:,1],label='sample2')
plt.plot(shift, mix_NMR[:,2],label='sample3')
plt.title('Three Mixed NMR Spectra')
plt.xlabel('chemical shift (ppm)')
plt.ylabel('intensity')

plt.gca().invert_xaxis()

plt.legend()
plt.show()
[38]
from sklearn.decomposition import FastICA
ica = FastICA(n_components=3, random_state=15)
nmr_fit = ica.fit_transform(mix_NMR)
nmr_fit
array([[ 0.00066023, -0.00086398,  0.00061835],
       [ 0.0006635 , -0.00086419,  0.00062081],
       [ 0.00065969, -0.00086678,  0.00061703],
       ...,
       [ 0.00066079, -0.00085538,  0.0006137 ],
       [ 0.00066138, -0.0008554 ,  0.00061573],
       [ 0.00065954, -0.00085494,  0.00061637]])
[42]
fig3, ax3 = plt.subplots(3, 1, figsize=(9, 8))

# Define a function to add common properties to subplots
def plot_component(ax, x, y, title):
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel('chemical shift (ppm)')  # Label the x-axis with the chemical shift
    ax.invert_xaxis()  # Invert the x-axis, as is conventional for NMR spectra

# Plot NMR for each component separated by the ICA algorithm
plot_component(ax3[0], shift, nmr_fit[:, 0], 'Component 1')
plot_component(ax3[1], shift, nmr_fit[:, 1], 'Component 2')
plot_component(ax3[2], shift, nmr_fit[:, 2], 'Component 3')


# Adjust layout
plt.tight_layout()

# Display the plot
plt.show()
[44]
#read NMR spectra data from csv file and save in a dataframe
pure_NMR_df = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/NMR_pure_problem.csv')
#convert dataframe to numpy array
pure_NMR = pure_NMR_df.to_numpy()
# numpy array needs to be used for plotting lines
pure_NMR_df
        pure1      pure2      pure3
0       -0.000897  -0.000817  -0.00215
1       -0.000883  -0.000815  -0.00211
2       -0.000904  -0.000809  -0.00216
3       -0.000900  -0.000803  -0.00213
4       -0.000899  -0.000818  -0.00218
...     ...        ...        ...
13102   -0.000956  -0.000842  -0.00209
13103   -0.000922  -0.000841  -0.00210
13104   -0.000923  -0.000844  -0.00215
13105   -0.000911  -0.000844  -0.00214
13106   -0.000908  -0.000846  -0.00217

13107 rows × 3 columns

[47]
fig4, ax4 = plt.subplots(3, 1, figsize=(9, 8))

# Plot pure NMR 1-3 by calling plot_component()
plot_component(ax4[0], shift, pure_NMR[:, 0], 'pure 1')
plot_component(ax4[1], shift, pure_NMR[:, 1], 'pure 2')
plot_component(ax4[2], shift, pure_NMR[:, 2], 'pure 3')
# Adjust layout
plt.tight_layout()

# Display the plot
plt.show()

# save figure to local folder
#plt.savefig('./pure_NMR.tif',dpi=1000)

Compare the NMR spectra of the components extracted from the mixed NMR spectra with the NMR spectra of the pure compounds. How well does the ICA algorithm separate the NMR signals?


pure 3 corresponds to component 1, pure 1 to component 3, and pure 2 to component 2. The separated spectrum for pure 2 shows slight differences; the others match well.
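
The visual matching above can also be done numerically. ICA recovers sources only up to ordering, scale, and sign, so one common approach is to pair each pure spectrum with the component that has the largest absolute Pearson correlation. The sketch below uses synthetic Gaussian peaks; `pure` and `components` are hypothetical stand-ins for `pure_NMR` and `nmr_fit`, not the assignment data.

```python
import numpy as np

# Synthetic "spectra": well-separated Gaussian peaks on a common axis
x = np.linspace(0.0, 10.0, 500)


def peak(center, width=0.1):
    return np.exp(-((x - center) / width) ** 2)


# Hypothetical pure spectra, one per column
pure = np.column_stack([peak(2.0), peak(5.0) + 0.5 * peak(7.0), peak(8.5)])

# Hypothetical ICA output: the same signals reordered, rescaled, and sign-flipped
components = np.column_stack([-0.3 * pure[:, 2], 2.0 * pure[:, 1], -1.5 * pure[:, 0]])

# Cross-correlation block: corr[i, j] relates pure spectrum i to component j
n = pure.shape[1]
corr = np.corrcoef(pure.T, components.T)[:n, n:]

# Best-matching component for each pure spectrum, and the sign of that match
best = np.argmax(np.abs(corr), axis=1)
signs = np.sign(corr[np.arange(n), best])

for i in range(n):
    print(f"pure {i + 1} <-> component {best[i] + 1} (sign {signs[i]:+.0f})")
```

A negative sign means the component is an inverted copy of the pure spectrum, which is a property of ICA itself and does not indicate a failed separation.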
