assignment_2024 (copy)

python

Question Answering

Machine Learning

李心慧-化院-2200011701

Updated 2024-07-03

Recommended image: Basic Image:bohrium-notebook:2023-04-07

Recommended machine type: c2_m4_cpu

Dataset

ml-intro(v2)

AI Chemistry Hands-on Practice (Introductory)

Introduction to Scientific Programming and Machine Learning Application in Chemistry

Assignment

Student Name:

Student ID:

Part 1

Plot the speed distribution for ideal gas particles as required by the instructions below. Answer the following questions.

The speed(v) probability distribution P(v) for gas particles follows Maxwell-Boltzmann distribution.

$P(v) = 4 \pi v^{2} \left(\frac{M}{2 \pi RT}\right)^{\frac{3}{2}} e^{-\frac{M v^{2}}{2 RT}}$

where v is the speed in m/s, M is the gas particle's molar mass in kg/mol, R is the ideal gas constant (R = 8.314 J/(mol·K)), and T is the temperature in kelvin.

(1) Plot P(v) for Ne gas from 0 to 2000 m/s at room temperature. Set the line color to "purple". Set the x-axis label to "speed v (m/s)", the y-axis label to "P(v)", and the graph title to "Speed Probability Distribution". Set the line legend to "Ne, T=XXX K". Fill the area under the curve with the same color as the line but with alpha=0.3 transparency. (Make sure you generate enough points to give a smooth line.)
Hint: use np.pi and scipy.constants.R to get the values of the constants.
(2) Plot P(v) for CO2 gas on the same figure. Set the line color to a different color from Ne. Set the line legend to "CO2, T=XXX K". Fill the area under the curve with the same color as the line but with alpha=0.3 transparency.
(3) Calculate the area under each line (the integral of each P(v)) and print the results in the following format:

area under _ gas name _ P(v) at T=XXX K = (_ integral value _)

Are the values for the two gases equal or not? What are the values? Rationalize the physical meaning of the integral values.
(4) What is the most probable speed for Ne at room temperature based on the speed distribution?
Hint: consider the function np.argmax(). Note: np.argmax() returns the index of the maximum of the array it is given, so you need to use v[np.argmax()] to get the value of v corresponding to the maximum function value.
(5) Calculate the root mean square speed of Ne at room temperature using the formula
$\overline{v^{2}} = \frac{3 RT}{M}$ (the rms speed is the square root of this quantity) and print out:
rms speed = value with 1 decimal place
Hint: useful function: np.sqrt()
Without any calculation, predict whether CO2's root mean square speed is greater than, smaller than, or equal to that of Ne. Why?
(6) Challenge Question: The mean square speed formula in (5) is derived as follows:
$\overline{v^{2}} = \int_{0}^{\infty} v^{2} P(v) \, dv$ Compute $\overline{v^{2}}$ of Ne at room temperature by carrying out the above integral numerically. Compare the value you obtain in this question with that in question (5). What are the similarities and differences?
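For reference, the integral in (6) can also be evaluated in closed form, which shows why a numerical result should agree with the formula in (5). The following is a sketch of that derivation; the shorthand $a = M/(2RT)$ is introduced here for convenience and is not part of the assignment text. It uses the standard Gaussian integral $\int_{0}^{\infty} v^{4} e^{-a v^{2}} dv = \frac{3\sqrt{\pi}}{8 a^{5/2}}$.

```latex
\begin{aligned}
P(v) &= 4\pi v^{2} \left(\frac{a}{\pi}\right)^{3/2} e^{-a v^{2}},
\qquad a = \frac{M}{2RT} \\
\overline{v^{2}} &= \int_{0}^{\infty} v^{2} P(v)\, dv
 = 4\pi \left(\frac{a}{\pi}\right)^{3/2} \int_{0}^{\infty} v^{4} e^{-a v^{2}}\, dv \\
&= 4\pi \cdot \frac{a^{3/2}}{\pi^{3/2}} \cdot \frac{3\sqrt{\pi}}{8\, a^{5/2}}
 = \frac{3}{2a} = \frac{3RT}{M} \\
\frac{dP}{dv} = 0 \;&\Rightarrow\; 2v - 2a v^{3} = 0
 \;\Rightarrow\; v_{mp} = \frac{1}{\sqrt{a}} = \sqrt{\frac{2RT}{M}}
\end{aligned}
```

The last line gives the most probable speed asked for in (4), so a value read off a grid with np.argmax() should lie within one grid step of $\sqrt{2RT/M}$.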

[21]

#Enter your code for Part 1 Assignment below.
import numpy as np
import matplotlib.pyplot as plt
import scipy.constants as sc
import scipy.integrate as integrate

print(sc.R)

# define the Maxwell-Boltzmann speed distribution function P(v)
def boltzmann_distribution_P(v, T, M):
    return 4*np.pi*(v**2)*(M/(2*np.pi*sc.R*T))**1.5*np.exp(-M*(v**2)/(2*sc.R*T))

# M is molar mass of gas particle in kg/mol
M_Ne = 0.020 #Neon molar mass kg/mol
M_CO2 = 0.044 #CO2 molar mass kg/mol

# set temperature (K)
T = 298

# generate 1000 points for v between 0~2000 m/s
v = np.arange(0,2000,2)
xnew_Ne = boltzmann_distribution_P(v, T, M_Ne)
xnew_CO2 = boltzmann_distribution_P(v, T, M_CO2)

# (1) Plot P(v) for Ne (purple, as the instructions require)
plt.plot(v, xnew_Ne, color='purple', label="Ne, T=298 K")
plt.fill_between(v, xnew_Ne, alpha=0.3, color='purple')

# (2) Plot P(v) for CO2
plt.plot(v, xnew_CO2, color='blue', label="CO2, T=298 K")
plt.fill_between(v, xnew_CO2, alpha=0.3, color='blue')
plt.xlim(0, 2000)
plt.ylim(0.0000, 0.0030)
plt.xlabel('speed v (m/s)')
plt.ylabel('P(v)')
plt.title('Speed Probability Distribution', fontsize=12)

# (3) Print the area under each line; quad returns (value, error estimate)
integral = integrate.quad(lambda v: boltzmann_distribution_P(v, T, M_Ne), 0, 2000)
print('the integral for Ne is', round(integral[0], 2))
integral = integrate.quad(lambda v: boltzmann_distribution_P(v, T, M_CO2), 0, 2000)
print('the integral for CO2 is', round(integral[0], 2))

# (4) calculate the v_mp (most probable) for Ne at 298 K
v_mp = v[np.argmax(xnew_Ne)]
print("the vmp of Ne is", v_mp)

# (5) calculate the rms speed of Ne from the formula
v_rms_Ne_1 = np.sqrt(3*sc.R*T/M_Ne)
print("vrms for Ne_1 is ", f"{v_rms_Ne_1:.1f}")

# (6) calculate the rms speed of Ne from integration
# take only the integral value (index 0), not the error estimate, before the square root
v2_mean_Ne = integrate.quad(lambda v: v*v*boltzmann_distribution_P(v, T, M_Ne), 0, np.inf)[0]
v_rms_Ne_2 = np.sqrt(v2_mean_Ne)
print("vrms for Ne_2 is ", f"{v_rms_Ne_2:.1f}")

plt.legend()
plt.show()

8.314462618
the integral for Ne is 1.0
the integral for CO2 is 1.0
the vmp of Ne is 498
vrms for Ne_1 is  609.6
vrms for Ne_2 is  609.6
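As a cross-check on the numbers above, here is a minimal sketch that uses only NumPy and a hand-written trapezoid rule instead of scipy.integrate.quad. The names `speed_pdf`, `trapezoid` and `results` are illustrative and not part of the assignment. The value of R is the one printed by `sc.R` above, and the closed-form most probable speed $\sqrt{2RT/M}$ is the standard Maxwell-Boltzmann result.

```python
import numpy as np

R = 8.314462618   # J/(mol K), the value printed by scipy.constants.R above
T = 298.0         # K
MOLAR_MASS = {"Ne": 0.020, "CO2": 0.044}   # kg/mol, as in the cell above


def speed_pdf(v, T, M):
    """Maxwell-Boltzmann speed distribution P(v) for molar mass M (kg/mol)."""
    a = M / (2.0 * R * T)
    return 4.0 * np.pi * v**2 * (a / np.pi) ** 1.5 * np.exp(-a * v**2)


def trapezoid(y, x):
    """Composite trapezoid rule written out by hand (illustrative helper)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


v = np.linspace(0.0, 2000.0, 2001)   # 1 m/s grid spacing
results = {}
for gas, M in MOLAR_MASS.items():
    p = speed_pdf(v, T, M)
    results[gas] = {
        "area": trapezoid(p, v),                                 # should be close to 1
        "v_mp_grid": float(v[np.argmax(p)]),                     # read off the grid
        "v_mp_formula": float(np.sqrt(2.0 * R * T / M)),         # closed form
        "v_rms_formula": float(np.sqrt(3.0 * R * T / M)),        # question (5)
        "v_rms_numeric": float(np.sqrt(trapezoid(v**2 * p, v))), # question (6)
    }
    print(gas, {k: round(val, 3) for k, val in results[gas].items()})
```

Both areas come out close to 1 because P(v) is a normalized probability density, and the heavier CO2 has the smaller rms speed because $\sqrt{3RT/M}$ decreases as M increases. For Ne the most probable speed is about 498 m/s and the rms speed about 610 m/s, in line with the output above.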

Part 2

Import the data file ROH_data.csv containing data on simple alcohols and train a random forest algorithm to predict whether or not an alcohol is aliphatic. Remember to split the data set using train_test_split() and evaluate the quality of the predictions.

[23]

# Imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Load the data
data = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/ROH_data.csv')
target = data['aliphatic']
features = data.drop('aliphatic', axis=1)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=18)

# Train the random forest and score it on the held-out test set
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(rf.score(X_test, y_test))

# Confusion matrix: rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, cmap='Blues')
plt.xlabel('Predicted Value')
plt.ylabel('True Value');
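When reading a heatmap drawn from scikit-learn's confusion_matrix, it helps to remember the layout: entry (i, j) counts samples whose true label is class i and whose predicted label is class j, so rows are true labels and columns (the horizontal axis of the heatmap) are predicted labels. The sketch below rebuilds that layout in plain Python on made-up labels. `confusion_counts`, `y_true_demo` and `y_pred_demo` are hypothetical names and data, not taken from ROH_data.csv.

```python
def confusion_counts(y_true, y_pred, labels):
    """Return C where C[i][j] counts samples with true label labels[i]
    and predicted label labels[j] (rows = true, columns = predicted)."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels for illustration only (1 = aliphatic, 0 = not aliphatic)
y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_counts(y_true_demo, y_pred_demo, labels=[0, 1])
accuracy = sum(cm[k][k] for k in range(len(cm))) / len(y_true_demo)

print(cm)        # [[2, 1], [1, 4]]
print(accuracy)  # 0.75
```

The diagonal holds the correct predictions, so the accuracy reported by `rf.score` is the diagonal sum divided by the number of test samples.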

Part 3

At the end of 2.2.1 Dimensional Reduction, you reduced the wine data to two dimensions.

- Plot a scatter plot of the wine data in the two principal components, but do not label the data points by type (wine_y).
- Use the DBSCAN algorithm to cluster the 2-dimensional wine data and label the types. Show the DBSCAN types with a color map. (Hint: adjust eps around 0.55.)
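The two-dimensional reduction referred to here can be sketched with plain NumPy: standardize each feature, take a singular value decomposition, and project onto the first two right singular vectors. This is a minimal illustration on random stand-in data with the same shape as the wine feature table (178 x 13), not the course's own code. `pca_2d` and `X_demo` are hypothetical names, and the signs of the components may differ from what scikit-learn's PCA returns.

```python
import numpy as np


def pca_2d(X):
    """Standardize the columns of X and project onto the first two principal axes."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # zero mean, unit variance
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ Vt[:2].T                           # coordinates on PC1 and PC2
    explained = (S**2 / np.sum(S**2))[:2]            # explained variance ratios
    return scores, explained


# Random stand-in data (an assumption for illustration, not the wine data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(178, 13)) @ rng.normal(size=(13, 13))

scores, explained = pca_2d(X_demo)
print(scores.shape)   # (178, 2)
print(explained)      # fraction of total variance carried by PC1 and PC2
```

Standardizing first matters because DBSCAN's eps is a distance threshold in the projected space, so its useful range depends on how the features were scaled.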

[26]

#write your code here
import numpy as np
import matplotlib.pyplot as plt
# Pandas is a powerful, easy-to-use open source Python library for data analysis and manipulation
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

wine = load_wine(as_frame=True)

# Extract the feature data only and save as a dataframe
wine_X = wine['data']
# Extract the target data only (a pandas Series)
wine_y = wine['target']
# Extract the feature and target data together and save as a dataframe
wine_df = wine['frame']

# Standardize the features before PCA
SS = StandardScaler()
Wine_X_ss = SS.fit_transform(wine_X)

# Reduce to two principal components
pca = PCA(n_components=2)
trans_data = pca.fit_transform(Wine_X_ss)
print(trans_data.shape) #now the number of columns is reduced to 2

# Scatter plot without labels (no 'c' argument, so no colormap is passed)
plt.scatter(trans_data[:,0], trans_data[:,1])
plt.xlabel('principal component 1')
plt.ylabel('principal component 2')
plt.show()

# Cluster the 2-dimensional data with DBSCAN and color the points by cluster label
DB_wine = DBSCAN(eps=0.55, min_samples=5)
wine_db = DB_wine.fit(trans_data)
plt.scatter(trans_data[:,0], trans_data[:,1], c=DB_wine.labels_, cmap="Accent")
plt.xlabel('principal component 1')
plt.ylabel('principal component 2')
plt.show()
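To judge a DBSCAN result beyond the colored scatter plot, it can help to count the clusters and the noise points. scikit-learn's DBSCAN marks noise with the label -1, so that value should not be counted as a cluster. The sketch below works on a made-up label array. `summarize_labels` and `example_labels` are hypothetical, and in the notebook one would pass `DB_wine.labels_` instead.

```python
import numpy as np


def summarize_labels(labels):
    """Count clusters and noise points in a DBSCAN-style label array (-1 = noise)."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    summary = {int(v): int(c) for v, c in zip(values, counts)}
    n_noise = summary.get(-1, 0)
    n_clusters = len(summary) - (1 if -1 in summary else 0)
    return n_clusters, n_noise, summary


# Made-up labels for illustration only
example_labels = [0, 0, 1, -1, 1, 2, 2, 2, -1, 0]
n_clusters, n_noise, summary = summarize_labels(example_labels)

print(n_clusters)  # 3
print(n_noise)     # 2
print(summary)     # {-1: 2, 0: 3, 1: 2, 2: 3}
```

Repeating this summary for a few eps values around the hinted 0.55 shows how the number of clusters and the share of noise points respond to the threshold.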

- Identify which molecule each component in 2.2.3 Blind Signal Separation is. The mixtures are composed of acetone, cyclohexane, methanol and toluene in random ratios.
- Does each extracted single-component spectrum match the corresponding pure chemical spectrum exactly? Explain.

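component 1: methanol; component 2: toluene; component 3: cyclohexane; component 4: acetone. The separated acetone spectrum differs slightly from the standard spectrum, because the separation assumes that the components do not interact, whereas in reality hydrogen bonding may occur.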

Your answer:

Open the file titled NMR_mixed_problem.csv, which contains three $^{1}$H NMR spectra. Each spectrum (column) is a mixture of three chemical compounds in different ratios (artificially generated). Use FastICA (set random_state=15) to separate out the three pure $^{1}$H NMR spectra of the components. Compare your separated spectra to the pure NMR spectra in NMR_pure_problem.csv.

[27]

#read NMR spectra data from csv file and save in a dataframe

mix_nmr_df = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/NMR_mixed.csv')

mix_nmr_df

	sample1	sample2	sample3
0	0.001858	0.000819	0.001190
1	0.001827	0.000809	0.001177
2	0.001863	0.000820	0.001188
3	0.001842	0.000814	0.001178
4	0.001874	0.000823	0.001196
...	...	...	...
13102	0.001835	0.000844	0.001210
13103	0.001832	0.000831	0.001202
13104	0.001867	0.000837	0.001213
13105	0.001859	0.000832	0.001208
13106	0.001873	0.000833	0.001213

13107 rows × 3 columns

[28]

#convert dataframe to numpy array

mix_NMR = mix_nmr_df.to_numpy()

# numpy array needs to be used for plotting lines

mix_NMR.shape

(13107, 3)

[36]

shift_df = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/shift.csv')

shift = shift_df.to_numpy()

shift.shape

(13107, 1)

[37]

#plot NMR spectra
fig2 = plt.figure(figsize=(12,6))

#plot NMR spectra for NMR sample 1-3
plt.plot(shift, mix_NMR[:,0], label='sample1')
plt.plot(shift, mix_NMR[:,1], label='sample2')
plt.plot(shift, mix_NMR[:,2], label='sample3')

plt.title('Three Mixed NMR Spectra')
plt.xlabel('chemical shift (ppm)')
plt.ylabel('intensity')
# NMR spectra are conventionally drawn with the chemical shift decreasing to the right
plt.gca().invert_xaxis()
plt.legend()
plt.show()

[38]

from sklearn.decomposition import FastICA

ica = FastICA(n_components=3, random_state=15)

nmr_fit = ica.fit_transform(mix_NMR)

nmr_fit

array([[ 0.00066023, -0.00086398,  0.00061835],
       [ 0.0006635 , -0.00086419,  0.00062081],
       [ 0.00065969, -0.00086678,  0.00061703],
       ...,
       [ 0.00066079, -0.00085538,  0.0006137 ],
       [ 0.00066138, -0.0008554 ,  0.00061573],
       [ 0.00065954, -0.00085494,  0.00061637]])

[42]

fig3, ax3 = plt.subplots(3, 1, figsize=(9, 8))

# Define a function to add common properties to subplots
def plot_component(ax, x, y, title):
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel('chemical shift (ppm)') # Set x-axis label to the chemical shift
    ax.invert_xaxis() # Invert the x-axis

# Plot NMR for each component separated by the ICA algorithm
plot_component(ax3[0], shift, nmr_fit[:, 0], 'Component 1')
plot_component(ax3[1], shift, nmr_fit[:, 1], 'Component 2')
plot_component(ax3[2], shift, nmr_fit[:, 2], 'Component 3')

# Adjust layout
plt.tight_layout()

# Display the plot
plt.show()

[44]

#read NMR spectra data from csv file and save in a dataframe

pure_NMR_df = pd.read_csv('/bohr/2024-ml-intro-0hof/v2/data/NMR_pure_problem.csv')

#convert dataframe to numpy array

pure_NMR = pure_NMR_df.to_numpy()

# numpy array needs to be used for plotting lines

pure_NMR_df

	pure1	pure2	pure3
0	-0.000897	-0.000817	-0.00215
1	-0.000883	-0.000815	-0.00211
2	-0.000904	-0.000809	-0.00216
3	-0.000900	-0.000803	-0.00213
4	-0.000899	-0.000818	-0.00218
...	...	...	...
13102	-0.000956	-0.000842	-0.00209
13103	-0.000922	-0.000841	-0.00210
13104	-0.000923	-0.000844	-0.00215
13105	-0.000911	-0.000844	-0.00214
13106	-0.000908	-0.000846	-0.00217

13107 rows × 3 columns

[47]

fig4, ax4 = plt.subplots(3, 1, figsize=(9, 8))

# Plot pure NMR 1-3 by calling plot_component()
plot_component(ax4[0], shift, pure_NMR[:, 0], 'pure 1')
plot_component(ax4[1], shift, pure_NMR[:, 1], 'pure 2')
plot_component(ax4[2], shift, pure_NMR[:, 2], 'pure 3')

# Adjust layout
plt.tight_layout()

# Display the plot
plt.show()

# save figure to local folder
#plt.savefig('./pure_NMR.tif',dpi=1000)

Compare the NMR spectra of the components extracted from the mixed NMR spectra with the NMR spectra of the pure compounds. How well does the ICA algorithm separate the NMR signals?

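pure 3 corresponds to component 1, pure 1 to component 3, and pure 2 to component 2. The separated spectrum for pure 2 shows slight differences; the others are recovered well.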
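An assignment like the one above can also be checked numerically. ICA recovers components only up to order, scale and sign, so one reasonable way to pair them with reference spectra is by the absolute value of the Pearson correlation. The sketch below demonstrates this on synthetic Lorentzian peaks; `match_components`, `peak`, `reference` and `extracted` are hypothetical stand-ins. In the notebook the arrays `nmr_fit` and `pure_NMR` would take their place.

```python
import numpy as np


def match_components(extracted, reference):
    """For each extracted column, find the reference column with the largest
    absolute correlation. Returns (best_index_per_component, correlation_matrix)."""
    n = extracted.shape[1]
    corr = np.corrcoef(extracted.T, reference.T)[:n, n:]
    best = np.argmax(np.abs(corr), axis=1)
    return best, corr


x = np.linspace(0.0, 10.0, 2000)   # synthetic chemical-shift axis


def peak(center, width=0.05):
    """Lorentzian line shape of unit height (synthetic data only)."""
    return width**2 / ((x - center) ** 2 + width**2)


# Three synthetic "pure" spectra
reference = np.column_stack([
    peak(1.2) + peak(3.5),
    peak(2.1),
    peak(7.3) + 0.5 * peak(7.0),
])

# Stand-in for ICA output: columns permuted, rescaled, one sign flipped, plus noise
rng = np.random.default_rng(1)
extracted = np.column_stack([
    -0.002 * reference[:, 2],
    0.004 * reference[:, 0],
    0.001 * reference[:, 1],
]) + rng.normal(scale=1e-5, size=reference.shape)

best, corr = match_components(extracted, reference)
for i, j in enumerate(best):
    print(f"component {i + 1} matches pure {j + 1} (r = {corr[i, j]:+.3f})")
```

A negative correlation with large magnitude means the component came out inverted, which is an expected ICA ambiguity rather than a separation error. A magnitude noticeably below 1 points to an imperfect separation.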
