An analysis of published synthetic routes, route targets and reaction types (2000 – 2020)

Gareth Howell, Samuel Genheden
DOI: https://doi.org/10.26434/chemrxiv-2024-tj3p3
2024-09-03
Abstract: Using a large dataset (640k synthetic routes and 2.4m reactions) compiled from six popular journals between 2000 – 2020, trends are identified and discussed for topics including journal publishing rates, availability of machine-readable data, characteristics of synthetic route targets and starting materials (molecular weight, complexity, elemental composition, chirality and ring-systems) and the reaction classes utilised in these synthetic routes. We provide evidence of an ongoing shift away from large natural product or “total” syntheses amongst the academic data and a gradual increase in the size and complexity of industrial/medicinal target molecules. The reaction class analyses show key differences between the academic and industrial sectors and how a small number of reaction types have proliferated in the latter, giving rise to a possible lack of target diversity. Overall, there is evidence to support an ongoing increase in synthetic efficiency whereby, as a community, we are synthesizing larger, more-complex molecules from smaller, simpler starting materials, in fewer steps and with diminished reliance on non-productive reaction types such as protecting group manipulations, redox reactions and functional group interconversions.
Chemistry
What problem does this paper attempt to address?
This paper addresses the following issues:

1. **Trend Analysis**: Using large-scale analysis of synthetic routes and reaction types published between 2000 and 2020, it identifies and discusses trends in synthetic chemistry across academia, the pharmaceutical industry, and the industrial sector.
2. **Data Availability**: It examines the availability and extraction rate of machine-readable data and how these differ between journals.
3. **Molecular Characteristics**: It analyzes changes in the molecular weight, complexity, elemental composition (nitrogen, oxygen, sulfur, halogens), chirality, and ring systems of starting materials and target molecules.
4. **Synthetic Efficiency**: It tracks how the length of synthetic routes has changed and finds that overall synthetic efficiency has improved: larger, more complex molecules are being synthesized in fewer steps.
5. **Reaction Types**: Using an automated classification system, it categorizes a large number of reactions, reveals differences in reaction types between academia and industry, and discusses the significant increase of certain reaction types in the industrial sector.

Overall, the paper aims to reveal development trends in organic synthesis through large-scale data analysis, particularly in synthetic efficiency and in the reaction types used.
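As a rough illustration of the kind of trend analysis described above, the sketch below groups routes by publication year and sector and computes the mean number of steps per group. It is a minimal sketch, not the authors' method: the record layout, the `ROUTES` values, and the function name `mean_steps_by_year_and_sector` are all hypothetical, and the numbers are toy placeholders rather than data from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (publication year, sector, number of steps in the route).
# These values are illustrative placeholders, NOT data from the paper.
ROUTES = [
    (2000, "academic", 12),
    (2000, "academic", 8),
    (2000, "industrial", 5),
    (2010, "academic", 9),
    (2010, "industrial", 4),
    (2010, "industrial", 6),
    (2020, "academic", 7),
    (2020, "industrial", 4),
]


def mean_steps_by_year_and_sector(routes):
    """Return {(year, sector): mean route length}, with keys in sorted order."""
    grouped = defaultdict(list)
    for year, sector, steps in routes:
        grouped[(year, sector)].append(steps)
    return {key: mean(values) for key, values in sorted(grouped.items())}


if __name__ == "__main__":
    for (year, sector), avg in mean_steps_by_year_and_sector(ROUTES).items():
        print(f"{year} {sector:<10} mean steps = {avg}")
```

The same grouping pattern would apply to other per-route properties mentioned in the summary, such as target molecular weight or the share of a given reaction class, by swapping the third field of each record.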