

AI Chemistry Hands-On Lab (Introductory)
Introduction to Scientific Programming and Machine Learning Application in Chemistry
Assignment
Part 1
Plot the speed distribution for ideal gas particles as required by the instructions below. Answer the following questions.
The speed (v) probability distribution P(v) for gas particles follows the Maxwell-Boltzmann distribution:
$$P(v) = 4\pi \left(\frac{M}{2\pi RT}\right)^{3/2} v^{2} \exp\left(-\frac{Mv^{2}}{2RT}\right)$$
where v is the speed in m/s, M is the gas particle's molar mass in kg/mol, R is the ideal gas constant (R = 8.314 J/(mol $\cdot$ K)), and T is the temperature in kelvin.
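As a minimal sketch, the distribution can be wrapped in a small function, assuming the standard molar form P(v) = 4π (M/(2πRT))^(3/2) v² exp(−Mv²/(2RT)). The function name `maxwell_boltzmann`, the Ne molar mass (0.02018 kg/mol) and the room temperature (298.15 K) are illustrative assumptions, not values fixed by the assignment.

```python
import numpy as np

# Ideal gas constant in J/(mol K); scipy.constants.R provides the same value.
R = 8.314462618


def maxwell_boltzmann(v, M, T):
    """Speed probability density P(v) for molar mass M (kg/mol) at temperature T (K)."""
    v = np.asarray(v, dtype=float)
    prefactor = 4.0 * np.pi * (M / (2.0 * np.pi * R * T)) ** 1.5
    return prefactor * v**2 * np.exp(-M * v**2 / (2.0 * R * T))


# Assumed inputs: Ne (M = 0.02018 kg/mol) at T = 298.15 K, sampled every 1 m/s.
v = np.linspace(0.0, 2000.0, 2001)
P_ne = maxwell_boltzmann(v, M=0.02018, T=298.15)
```

The arrays `v` and `P_ne` can then be passed to `plt.plot` and `plt.fill_between` for the plots requested in (1) and (2).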
(1) Plot P(v) for Ne gas from 0 to 2000 m/s at room temperature. Set the line color to "purple". Set the x-axis label to "speed v (m/s)", the y-axis label to "P(v)", and the graph title to "Speed Probability Distribution". Set the line legend to "Ne, T=XXX K". Fill the area under the curve with the same color as the line but with alpha=0.3 transparency. (Make sure you generate enough points to give a smooth line.)
Hint: use np.pi and scipy.constants.R to get the scientific constant values.
(2) Plot P(v) for CO2 gas on the same figure. Set the line color to a different color from Ne. Set the line legend to "CO2, T=XXX K". Fill the area under the curve with the same color as the line but with alpha=0.3 transparency.
(3) Calculate the area under each line (the integration for each P(v)) and print them out in the following format:
area under _ gas name _ P(v) at T=XXX K = (_ integral value _)
Are the values for the two gases equal or not? What are the values? Explain the physical meaning of the integration values.
(4) What is the most probable speed for Ne at room temperature based on the speed distribution?
Hint: consider the useful function np.argmax(). Note: np.argmax() returns the index of the maximum, not the argument itself (v in this case), so you need to use v[np.argmax()] to get the argument value corresponding to the maximum function value.
(5) Calculate the root mean square speed of Ne at room temperature using the formula:
$$v_{rms} = \sqrt{\frac{3RT}{M}}$$
and print out:
rms speed = (value with 1 decimal place)
Hint: useful function: np.sqrt()
Without any calculation, predict whether CO2's root mean square speed is greater than, smaller than, or equal to that of Ne. Why?
(6) Challenge Question: The root mean square speed formula in (5) is derived as follows:
$$v_{rms} = \sqrt{\langle v^{2} \rangle} = \sqrt{\int_{0}^{\infty} v^{2} P(v)\, dv} = \sqrt{\frac{3RT}{M}}$$
Compute $v_{rms}$ of Ne at room temperature by carrying out the above integral computation. Compare the value you obtained in this question with that in question (5). What are the similarities and differences?
8.314462618
the integral for Ne is 1.0
the integral for CO2 is 1.0
the vmp of Ne is 498
vrms for Ne_1 is 609.6363498222527
vrms for Ne_2 is [6.09636350e+02 4.66515075e-02]
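One possible way to carry out steps (3) to (6) numerically is sketched below. It assumes Ne with M = 0.02018 kg/mol at T = 298.15 K, so the numbers will not match the output above exactly if different inputs were used there. The helper `trapezoid` is a hypothetical hand-written trapezoidal rule, and the 5000 m/s upper limit is an assumed stand-in for infinity.

```python
import numpy as np

R = 8.314462618   # J/(mol K), same value as scipy.constants.R
T = 298.15        # assumed room temperature in K
M_NE = 0.02018    # assumed molar mass of Ne in kg/mol


def maxwell_boltzmann(v, M, T):
    """Maxwell-Boltzmann speed probability density."""
    prefactor = 4.0 * np.pi * (M / (2.0 * np.pi * R * T)) ** 1.5
    return prefactor * v**2 * np.exp(-M * v**2 / (2.0 * R * T))


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


# The upper limit is chosen large enough that P(v) is negligible beyond it.
v = np.linspace(0.0, 5000.0, 50001)
P = maxwell_boltzmann(v, M_NE, T)

area = trapezoid(P, v)                             # (3) normalization integral
v_mp = v[np.argmax(P)]                             # (4) most probable speed
v_rms_formula = np.sqrt(3.0 * R * T / M_NE)        # (5) closed-form rms speed
v_rms_integral = np.sqrt(trapezoid(v**2 * P, v))   # (6) rms speed from the integral

print(f"area under Ne P(v) at T={T} K = {area:.4f}")
print(f"most probable speed = {v_mp:.1f}")
print(f"rms speed = {v_rms_formula:.1f}")
print(f"rms speed from integral = {v_rms_integral:.1f}")
```

The two-element array printed for `Ne_2` above is consistent with taking the square root of the `(value, error)` pair that `scipy.integrate.quad` returns. That reading is an inference from the output, not something the notebook states. If so, only the first element is the rms speed.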
Part 2
Import the data file ROH_data.csv, which contains data on simple alcohols, and train a random forest algorithm to predict whether or not an alcohol is aliphatic. Remember to split the data set using train_test_split() and evaluate the quality of the predictions.
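The workflow can be sketched as follows. Because the column names of ROH_data.csv are not given here, the example uses a synthetic stand-in data set: the four features and the binary label are invented purely for illustration. With the real file, one would load it with `pd.read_csv("ROH_data.csv")` and pick the feature and label columns from it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the alcohol data (hypothetical features and labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 plays the role of "aliphatic"

# Hold out 25% of the samples for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on samples the model has not seen during training.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"test accuracy = {accuracy:.2f}")
print(cm)
```

Accuracy alone can mislead when one class dominates, so the confusion matrix is printed alongside it.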
Part 3
- At the end of 2.2.1 Dimensional Reduction, you reduced the wine data to two dimensions.
- Make a scatter plot of the wine data in the two principal components, but do not label the data points by type (wine_y).
- Use the DBSCAN algorithm to study the 2-dimensional wine data and label the types. Show the DBSCAN types with a color map. (Hint: adjust eps around 0.55.)
/tmp/ipykernel_99/228043796.py:34: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  plt.scatter(trans_data[:,0], trans_data[:,1], cmap="Accent")
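A minimal sketch of the clustering step is given below. It assumes the two-dimensional data come from standardizing scikit-learn's built-in wine data set and applying PCA. Section 2.2.1 may have prepared `trans_data` differently, so the number of clusters found here is not guaranteed to match. `min_samples=5` is the library default, written out for clarity.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed preprocessing: standardize the features, then keep two principal components.
wine_X, wine_y = load_wine(return_X_y=True)
scaled = StandardScaler().fit_transform(wine_X)
trans_data = PCA(n_components=2).fit_transform(scaled)

# DBSCAN assigns a cluster index to each point; -1 marks noise points.
labels = DBSCAN(eps=0.55, min_samples=5).fit_predict(trans_data)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```

To color the points by cluster, pass the labels through `c`, for example `plt.scatter(trans_data[:, 0], trans_data[:, 1], c=labels, cmap="Accent")`. Without `c`, Matplotlib has nothing to map to colors and ignores `cmap`, which is what the UserWarning above reports.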
- Identify which molecule each component in 2.2.3 Blind Signal Separation is. The mixtures are composed of acetone, cyclohexane, methanol and toluene in random ratios.
- Does the extracted single-component spectrum match the corresponding pure chemical spectrum exactly? Explain.
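component1: methanol; component2: toluene; component3: cyclohexane; component4: acetone. The extracted acetone spectrum differs slightly from the reference spectrum, because the separation assumes no interactions between components, whereas in practice hydrogen bonding may occur.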
- Open the file titled NMR_mixed_problem.csv, which contains three 1H NMR spectra. Each spectrum (column) is a mixture of three chemical compounds in different ratios (artificially generated). Use FastICA (set random_state=15) to separate out the pure 1H NMR spectrum of each of the three components. Compare your separated spectra to the pure NMR spectra in NMR_pure_problem.csv.
| | sample1 | sample2 | sample3 |
---|---|---|---|
0 | 0.001858 | 0.000819 | 0.001190 |
1 | 0.001827 | 0.000809 | 0.001177 |
2 | 0.001863 | 0.000820 | 0.001188 |
3 | 0.001842 | 0.000814 | 0.001178 |
4 | 0.001874 | 0.000823 | 0.001196 |
... | ... | ... | ... |
13102 | 0.001835 | 0.000844 | 0.001210 |
13103 | 0.001832 | 0.000831 | 0.001202 |
13104 | 0.001867 | 0.000837 | 0.001213 |
13105 | 0.001859 | 0.000832 | 0.001208 |
13106 | 0.001873 | 0.000833 | 0.001213 |
13107 rows × 3 columns
(13107, 3)
(13107, 1)
array([[ 0.00066023, -0.00086398,  0.00061835],
       [ 0.0006635 , -0.00086419,  0.00062081],
       [ 0.00065969, -0.00086678,  0.00061703],
       ...,
       [ 0.00066079, -0.00085538,  0.0006137 ],
       [ 0.00066138, -0.0008554 ,  0.00061573],
       [ 0.00065954, -0.00085494,  0.00061637]])
| | pure1 | pure2 | pure3 |
---|---|---|---|
0 | -0.000897 | -0.000817 | -0.00215 |
1 | -0.000883 | -0.000815 | -0.00211 |
2 | -0.000904 | -0.000809 | -0.00216 |
3 | -0.000900 | -0.000803 | -0.00213 |
4 | -0.000899 | -0.000818 | -0.00218 |
... | ... | ... | ... |
13102 | -0.000956 | -0.000842 | -0.00209 |
13103 | -0.000922 | -0.000841 | -0.00210 |
13104 | -0.000923 | -0.000844 | -0.00215 |
13105 | -0.000911 | -0.000844 | -0.00214 |
13106 | -0.000908 | -0.000846 | -0.00217 |
13107 rows × 3 columns
Compare the NMR spectra of the components extracted from the mixed NMR spectra with the NMR spectra of the pure compounds. How well does the ICA algorithm separate the NMR signals?
pure3 -- component1; pure1 -- component3; pure2 -- component2. The separated spectrum for pure2 shows slight differences; the others match quite well.



