MCCCS Towhee: Configurational-bias Monte Carlo

Overview

This section gives a basic overview of the Configurational-bias Monte Carlo (CBMC) algorithm as implemented in Towhee; it was last updated for Towhee version 6.1.1. CBMC algorithm development remains a major research activity, and a particular favorite of the lead Towhee developer. The goal of this essay is to explain the algorithmic details that often get overlooked in the many publications describing CBMC algorithm development over the last two decades. It is my hope that by presenting a detailed explanation of the many aspects of the CBMC algorithm here, the next generation of algorithm developers will be well positioned to increase the power of this method even further.

The general CBMC algorithm

Equation 1

_max

Equation 1

Equation 2

Siepmann 1990

Siepmann and Frenkel 1992

Frenkel et al. 1992

Laso et al. 1992

Siepmann and McDonald 1992

Vlugt et al. 1999

Martin and Siepmann 1999

Martin and Thompson 2004

Vlugt et al. 1998

Wick and Siepmann 2000

Martin and Thompson 2004

Martin and Frischknecht 2006

Equation 3

Martin and Frischknecht 2006

Equation 4

Equation 4

_bias

_exist

Equation 5

Equation 5

Equation 6

Equation 6

_step

Equation 7

CBMC Trial Generation Form

max_bond_length

Martin and Siepmann 1999

Martin and Siepmann JPCB 1999: uses the original coupled-decoupled formulation presented in the appendix of Martin and Siepmann 1999, with the extension for flexible bond lengths briefly mentioned in Martin and Thompson 2004. The steps consist of decoupled bond length selection, decoupled bending A selection, decoupled bending B selection, and dihedral angle selection coupled to nonbonded selection. The algorithm is decoupled to the maximum extent possible (it is not possible to decouple dihedral selection from nonbonded selection). This was necessary because generating reasonable bond lengths, angles, and dihedral angles required a relatively large number of trials at that time, as the use of arbitrary trial distributions for those terms had not yet come into fashion. There is some number of growth steps (n_step) and in each step some number of atoms are generated (n_g), where all of these generated atoms are bonded to the same atom (f). The generated atoms are labelled g_a, where the index a ranges from 1 to n_g. The first step is to select all of the g_a-f bond lengths in a decoupled manner.

Equation MS.1

Equation MS.2

Where l_{g_a,f} is the bond length between the generated atom (g_a) and the "growing from" atom (f). For each generated atom, nch_vib trial bond lengths are considered and then one bond length is selected according to the probability in Equation MS.1. The trial distributions are discussed further in the Arbitrary Trial Distribution section. Once bond lengths are selected for all of the generated atoms (g_a) the next step is the Part A portion of the decoupled bending angle selection.
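The bond length selection described above can be sketched in a few lines of Python. This is only an illustration, not the Towhee source: the function name `select_bond_length` and the callbacks `sample_trial` and `bond_energy` are hypothetical, and the weight is taken to be the plain Boltzmann factor, which assumes the trials are drawn from the ideal distribution. With any other trial distribution the weight would also carry a density ratio.

```python
import math
import random

def select_bond_length(sample_trial, bond_energy, nch_vib, beta, rng=random):
    """Generate nch_vib trial bond lengths and pick one by Boltzmann weight.

    sample_trial(rng) and bond_energy(r) are hypothetical callbacks.
    Returns (chosen_length, rosenbluth_weight).
    """
    trials = [sample_trial(rng) for _ in range(nch_vib)]
    # Assumed weight: exp(-beta * u_bond) for each trial length.
    weights = [math.exp(-beta * bond_energy(r)) for r in trials]
    total = sum(weights)
    if total == 0.0:
        # Every trial is forbidden, so the growth cannot continue.
        return None, 0.0
    chosen = rng.choices(trials, weights=weights, k=1)[0]
    return chosen, total
```

The returned sum of weights is what accumulates into the overall Rosenbluth weight of the move; the old configuration is handled the same way, except that the existing bond length takes the place of one of the trials.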

Equation MS.3

Equation MS.4

The angle θ_{g_a,f,p} is the angle formed by the atom being grown (g_a), the atom being grown from (f), and an atom that is previous to the atom being grown from (p). There are two cases to consider when determining the "previous" atom. Case 1 occurs when there already exists an atom bonded to the from atom (f) that was either grown in an earlier step of this CBMC growth process, or was not scheduled for regrowth during this CBMC move. In this case the "p" atom is selected randomly from all possible atoms that exist bonded to the "f" atom. Case 2 occurs when there are no existing atoms bonded to the "f" atom. In this case one of the atoms grown during this step is selected at random to be treated as the "p" atom. Once all of the θ_{g_a,f,p} angles are determined, then the pseudo-dihedral angles between the atoms being grown are sampled.
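The two cases for choosing the "previous" atom can be written as a short Python sketch. The function name `choose_previous_atom` and its list arguments are hypothetical and are used here only to make the rule concrete.

```python
import random

def choose_previous_atom(existing_neighbors, grown_atoms, rng=random):
    """Return the "p" atom used to define the bending angles for this step.

    existing_neighbors: atoms already in place and bonded to f (may be empty).
    grown_atoms: atoms being generated in this step (g_1 .. g_ng).
    """
    if existing_neighbors:
        # Case 1: pick uniformly among the atoms that already exist on f.
        return rng.choice(list(existing_neighbors))
    # Case 2: nothing exists on f yet, so one of the new atoms acts as p.
    return rng.choice(list(grown_atoms))
```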

Equation MS.5

Equation MS.6

Where φ_{g_b,f,p,g_a} is the pseudo-dihedral angle between the planes formed by the g_a-f-p and g_b-f-p atom triplets. The sampling is performed based upon that pseudo-dihedral angle, and then some elementary geometry is used to convert the φ_{g_b,f,p,g_a} values into the θ_{g_a,f,g_b} angles required to compute the bending angle energies. One might assume the next step would be selection of the dihedral angle, but once that is known it also implies the positions for the nonbonded selection step. A single trial of the nonbonded selection is computationally quite expensive compared to a single trial of the dihedral selection, so a selection step for dihedral angles is coupled to the nonbonded selection.
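The "elementary geometry" mentioned above is the spherical law of cosines. Placing f at the origin with p on the polar axis, the two bending A angles act as polar angles and the pseudo-dihedral as the azimuthal difference, so cos θ_{g_a,f,g_b} = cos θ_{g_a,f,p} cos θ_{g_b,f,p} + sin θ_{g_a,f,p} sin θ_{g_b,f,p} cos φ_{g_b,f,p,g_a}. A minimal Python sketch follows; the function name `implied_bend_angle` is hypothetical.

```python
import math

def implied_bend_angle(theta_a, theta_b, phi):
    """Angle g_a-f-g_b implied by two bend angles to p and a pseudo-dihedral.

    theta_a, theta_b: angles g_a-f-p and g_b-f-p in radians.
    phi: pseudo-dihedral between the g_a-f-p and g_b-f-p planes in radians.
    """
    cos_ab = (math.cos(theta_a) * math.cos(theta_b)
              + math.sin(theta_a) * math.sin(theta_b) * math.cos(phi))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_ab)))
```

As a check, two tetrahedral angles to p separated by a pseudo-dihedral of 2π/3 give back the tetrahedral angle between the two grown atoms.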

Equation MS.7

Equation MS.8

Equation MS.9

Equation MS.10

The above equations are realized by performing a full dihedral selection for each trial of the nonbonded selection. The dihedral angle selection obviously includes the dihedral energy for the angle being sampled, but also often includes bond, angle, improper and additional dihedral terms in the case where we are connecting back to some other atoms in the molecule. This occurs when closing a ring, or when performing interior conformation sampling moves in a molecule. The selection is performed on a single dihedral angle formed by one of the atoms grown this step (g₁), the from atom (f), the previous atom (p), and a randomly selected atom bonded to p that is not atom f (q). That dihedral angle implies a large set of dihedral angles that involve the atom being grown and any existing atoms (c, d, or e) that form a dihedral angle that is now completely described by the addition of the atoms grown this step. It also could imply bond terms that do not involve the "from" atom, angle terms not centered on the "from" atom, and improper torsions not centered on the "from" atom. The nonbonded selection involves the Rosenbluth weight of the torsion selection and the sum of the nonbonded terms according to the partial nonbonded terms (u_nb^part). The partial nonbonded terms are often set to be shorter ranged than the total nonbonded terms as a time saving move first described as dual-cutoff in Vlugt et al. 1998. This also often occurs for charged systems as the long-ranged portion of the Ewald sum is only computed once the entire molecule has been grown, as it is both expensive and a bit ill-defined to compute during the growth process. The final acceptance probability for the move is a more detailed version of the general Equation 6. The correction for using a partial nonbonded potential is equivalent to removing the bias implied during the nonbonded selection. That implicit bias is related to the Boltzmann weighted difference between the true nonbonded potential and the partial nonbonded potential.
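The coupling between the dihedral and nonbonded selections can be illustrated with a short Python sketch. It assumes that each nonbonded trial is weighted by the product of the Rosenbluth weight from its own dihedral selection and the Boltzmann factor of its partial nonbonded energy, which is the form used in Martin and Siepmann 1999. The function name `coupled_nonbonded_select` and its arguments are hypothetical.

```python
import math
import random

def coupled_nonbonded_select(inner_weights, partial_energies, beta, rng=random):
    """Pick one nonbonded trial; each trial carries its inner Rosenbluth weight.

    inner_weights[i]: Rosenbluth weight of the dihedral selection for trial i.
    partial_energies[i]: partial nonbonded energy (u_nb^part) for trial i.
    Returns (index, total_weight), or (None, 0.0) if every trial is forbidden.
    """
    weights = [w_in * math.exp(-beta * u)
               for w_in, u in zip(inner_weights, partial_energies)]
    total = sum(weights)
    if total <= 0.0:
        return None, 0.0
    threshold = rng.random() * total
    running = 0.0
    for index, w in enumerate(weights):
        running += w
        if threshold < running:
            return index, total
    return len(weights) - 1, total  # guard against round-off in the sum
```

Because one full dihedral selection is run per nonbonded trial, the cost of a growth step scales with the product of the two trial counts, which is why the cheaper selections are decoupled wherever possible.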

Equation MS.11

Equation MS.12
Coupled to pre-nonbond: uses the formulation first described in Martin and Frischknecht 2006, which was enabled by advances in arbitrary trial distribution functions. The bond length selection, bending angle A selection, bending angle B selection, and dihedral angle selection are all decoupled from each other, but coupled in turn to a new selection step that occurs immediately prior to the nonbonded selection. The idea is to better sample the joint distribution of bond lengths, bending angles, dihedral angles, and improper torsions to improve the acceptance rate in cases where these energies are not independent, and often in competition with each other. There is some number of growth steps (n_step) and in each step some number of atoms are generated (n_g), where all of these generated atoms are bonded to the same atom (f). The generated atoms are labelled g_a, where the index a ranges from 1 to n_g. For each index of the pre-nonbond selection, the bond length, angles, and dihedrals are sequentially generated in a decoupled manner.

Equation CPN.1

Equation CPN.2

Where l_{g_a,f} is the bond length between the generated atom (g_a) and the "growing from" atom (f). For each generated atom, nch_vib trial bond lengths are considered and then one bond length is selected according to the probability in Equation CPN.1. The trial distributions are discussed further in the Arbitrary Trial Distribution section. Once bond lengths are selected for all of the generated atoms (g_a) the next step is the Part A portion of the decoupled bending angle selection.

Equation CPN.3

Equation CPN.4

The angle θ_{g_a,f,p} is the angle formed by the atom being grown (g_a), the atom being grown from (f), and an atom that is previous to the atom being grown from (p). There are two cases to consider when determining the "previous" atom. Case 1 occurs when there already exists an atom bonded to the from atom (f) that was either grown in an earlier step of this CBMC growth process, or was not scheduled for regrowth during this CBMC move. In this case the "p" atom is selected randomly from all possible atoms that exist bonded to the "f" atom. Case 2 occurs when there are no existing atoms bonded to the "f" atom. In this case one of the atoms grown during this step is selected at random to be treated as the "p" atom. Once all of the θ_{g_a,f,p} angles are determined, then the pseudo-dihedral angles between the atoms being grown are sampled.

Equation CPN.5

Equation CPN.6

Where φ_{g_b,f,p,g_a} is the pseudo-dihedral angle between the planes formed by the g_a-f-p and g_b-f-p atom triplets. The sampling is performed based upon that pseudo-dihedral angle, and then some elementary geometry is used to convert the φ_{g_b,f,p,g_a} values into the θ_{g_a,f,g_b} angles required to compute the bending angle energies. The next step is a decoupled selection for dihedral angles.

Equation CPN.7

Equation CPN.8

Equation CPN.9

The above equations are realized by performing a full dihedral selection for each trial of the nonbonded selection. The dihedral angle selection obviously includes the dihedral energy for the angle being sampled, but also often includes bond, angle, improper and additional dihedral terms in the case where we are connecting back to some other atoms in the molecule. This occurs when closing a ring, or when performing interior conformation sampling moves in a molecule. The selection is performed on a single dihedral angle formed by one of the atoms grown this step (g₁), the from atom (f), the previous atom (p), and a randomly selected atom bonded to p that is not atom f (q). That dihedral angle implies a large set of dihedral angles that involve the atom being grown and any existing atoms (c, d, or e) that form a dihedral angle that is now completely described by the addition of the atoms grown this step. It also could imply bond terms that do not involve the "from" atom, angle terms not centered on the "from" atom, and improper torsions not centered on the "from" atom. The bond, bending angle, and dihedral selections are all coupled to the pre-nonbond selection. The pre-nonbond selection is in turn coupled to the nonbonded selection.

Equation CPN.10

Equation CPN.11

Equation CPN.12

Equation CPN.13

The nonbonded selection involves the Rosenbluth weight of the pre-nonbond selection and the sum of the nonbonded terms according to the partial nonbonded terms (u_nb^part). The partial nonbonded terms are often set to be shorter ranged than the total nonbonded terms as a time saving move first described as dual-cutoff in Vlugt et al. 1998. This also often occurs for charged systems as the long-ranged portion of the Ewald sum is only computed once the entire molecule has been grown, as it is both expensive and a bit ill-defined to compute during the growth process. The final acceptance probability for the move is a more detailed version of the general Equation 6. The correction for using a partial nonbonded potential is equivalent to removing the bias implied during the nonbonded selection. That implicit bias is related to the Boltzmann weighted difference between the true nonbonded potential and the partial nonbonded potential.

Equation CPN.14

Equation CPN.15
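For both trial generation forms, the correction for the partial nonbonded potential enters the acceptance rule in the same way. The sketch below shows the standard dual-cutoff form, in which the ratio of Rosenbluth weights is multiplied by the Boltzmann factor of the change in (u_full - u_part). It is a simplified illustration that assumes all other biases are already folded into the Rosenbluth weights; the function and argument names are hypothetical.

```python
import math

def cbmc_acceptance(w_new, w_old, delta_u_new, delta_u_old, beta):
    """Acceptance probability with a partial-potential correction.

    w_new, w_old: total Rosenbluth weights of the new and old growths.
    delta_u_new, delta_u_old: (u_full - u_part) for the new and old
    configurations, i.e. the energy left out during the growth.
    """
    if w_new <= 0.0:
        return 0.0  # the new growth hit a dead end
    correction = math.exp(-beta * (delta_u_new - delta_u_old))
    return min(1.0, (w_new / w_old) * correction)
```

When the partial and full potentials are identical, both delta terms vanish and the rule reduces to the plain ratio of Rosenbluth weights.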

Arbitrary Trial Distributions

Snurr, Bell, and Theodorou 1993

Martin and Biddy 2005

Martin and Frischknecht 2006

_sample

Equation 1

_sample

Insertion of the first atom

Equation NB_ONE.1

DIST_UNIFORM: generates trials using the uniform probability distribution. This is used when the nch_nb_one_generation option is set to 'uniform', or when computing an insertion into the ideal reservoir box (in the Grand Canonical ensemble).

Equation NB_ONE.2
DIST_ENERGY_BIAS: generates trials using a version of the Snurr et al. 1993 energy biasing to preferentially place the first atom in the cavities of a fixed geometric structure, such as a zeolite. This is used when the nch_nb_one_generation option is set to 'energy bias'.

Equation NB_ONE.3

Where the box volume is divided into subcells of volume V_cell. Trial positions are selected by first choosing a cell based upon the weighting factor X_cell, normalized by the total sum of all of the cell weights. Once a cell is selected, then a position is chosen uniformly within that cell.
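The two-stage energy-bias selection can be sketched as follows. The sketch assumes cubic subcells of equal edge length, and it infers the density of the chosen position as the normalized cell weight divided by the cell volume. The function name `pick_biased_position` and its arguments are hypothetical.

```python
import random

def pick_biased_position(cell_weights, cell_origins, cell_edge, rng=random):
    """Choose a subcell by weight, then a uniform position inside it.

    cell_weights[i]: weighting factor X_cell for subcell i.
    cell_origins[i]: (x, y, z) of the low corner of subcell i.
    cell_edge: edge length of the (assumed cubic) subcells.
    Returns (position, probability_density_of_that_position).
    """
    total = sum(cell_weights)
    index = rng.choices(range(len(cell_weights)), weights=cell_weights, k=1)[0]
    position = tuple(corner + rng.random() * cell_edge
                     for corner in cell_origins[index])
    # Probability of the cell, spread uniformly over the cell volume V_cell.
    density = cell_weights[index] / (total * cell_edge ** 3)
    return position, density
```

The density is returned alongside the position because any non-uniform placement has to be divided back out when the Rosenbluth weight is assembled.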

Generation of bond lengths

max_bond_length

_max

max_bond_length

Equation BOND.1

max_bond_length

Equation BOND.2

_max

max_bond_length

Equation BOND.3

DIST_DELTA: generates trials whenever the bond potential has a finite number of allowed bond lengths (n_finite). The most common case is a single rigid bond length, but this would also work (in theory) for a potential that allowed multiple rigid bond lengths.

Equation BOND.4
DIST_R_SQ_IDEAL: generates trials using the ideal probability density, bounded only by the max_bond_length (r_max). This is used for non-rigid bond potentials when the cbmc_bond_generation option is set to 'ideal'. Bond lengths are generated on the interval (0, r_max).

Equation BOND.5
DIST_R_SQ_WITH_BOUNDS: generates trials using a bounded version of the continuous ideal probability density. This is used for non-rigid bond potentials when the cbmc_bond_generation option is set to 'r^2 with bounds'. Bond lengths are generated on the interval (r_low, r_high), where r_low = vibrang(1)*bond_equil, r_high = vibrang(2)*bond_equil, and bond_equil is the equilibrium bond length. This is also used automatically in combination with the Infinite Square Well bond potential where the upper and lower bounds are set to the parameters in that potential.

Equation BOND.6
DIST_GAUSSIAN: generates trials using a bounded version of the Gaussian probability density. This is used for non-rigid bond potentials when the cbmc_bond_generation option is set to 'global gaussian' or 'autofit gaussian'. Bond lengths are generated on the interval (0, r_max) using a truncated Gaussian distribution. The standard deviation (σ) and mean (μ) are determined in different ways for the 'global gaussian' and 'autofit gaussian' options.

Equation BOND.7
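The continuous bond length distributions above are straightforward to sample. The sketch below assumes the ideal density is proportional to r² (so it can be inverted in closed form) and uses simple rejection for the truncated Gaussian. Both choices are illustrative and not necessarily how Towhee generates these trials, and the function names are hypothetical.

```python
import random

def sample_r_sq(r_low, r_high, rng=random):
    """Draw r with density proportional to r**2 on (r_low, r_high).

    With r_low = 0 and r_high = r_max this plays the role of DIST_R_SQ_IDEAL;
    with finite bounds it plays the role of DIST_R_SQ_WITH_BOUNDS.
    """
    low_cubed, high_cubed = r_low ** 3, r_high ** 3
    # Invert the cumulative distribution, which is linear in r**3.
    return (low_cubed + rng.random() * (high_cubed - low_cubed)) ** (1.0 / 3.0)

def sample_truncated_gaussian(mu, sigma, r_max, rng=random):
    """Draw from a Gaussian restricted to (0, r_max) by rejection."""
    while True:
        r = rng.gauss(mu, sigma)
        if 0.0 < r < r_max:
            return r
```

Rejection is efficient here as long as most of the Gaussian lies inside (0, r_max), which is the usual situation for a bond length distribution centered near the equilibrium length.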

Generation of bending angles

pseudo

Generation of Bending A angles

Equation BEND_A.1

_finite

Equation BEND_A.2

DIST_DELTA: generates trials when there are a finite number of angles (n_finite) that have a nonzero Boltzmann weight. The most common occurrence is a single, rigid bending angle, but this is also functional for multiple allowable rigid bending angles.

Equation BEND_A.3
DIST_SINE: generates trials using the ideal Sine distribution. This is used for non-rigid bending angle potentials when the cbmc_bend_generation option is set to 'ideal'. Angles are generated on the interval (0, π).

Equation BEND_A.4
DIST_GAUSSIAN: generates trials using the Gaussian distribution. This is used for non-rigid bending angle potentials when the cbmc_bend_generation option is set to 'global gaussian' or 'autofit gaussian'. When cbmc_bend_generation is set to 'global gaussian', the mean (μ) is set to the equilibrium bending angle and the standard deviation (σ) is set to sdevbena. When cbmc_bend_generation is set to 'autofit gaussian', the mean (μ) and standard deviation (σ) are fit to Sin(θ)exp(-β u(θ)) for every angle in each type of molecule in the simulation, and then the standard deviations are scaled by the bend_a_sdev_multiplier. Angles are generated on the interval (0, π).

Equation BEND_A.5
DIST_SINE_GAUSSIAN: generates trials using a linear combination of the Sine distribution and the Gaussian distribution. This is used for non-rigid bending angle potentials when the cbmc_bend_generation option is set to 'ideal + autofit gaussian'. The mean (μ) and standard deviation (σ) are fit to Sin(θ)exp(-β u(θ)) for every angle in each type of molecule in the simulation, and then the standard deviations are scaled by the bend_a_sdev_multiplier. The fraction of ideal moves (f_ideal) is set to bend_a_ideal_fraction. Angles are generated on the interval (0, π).

Equation BEND_A.6
DIST_BOUNDED_SINE: generates trials using a Sine distribution bounded above (by θ_high) and below (by θ_low). This is used automatically in combination with the Infinite Square Well Angle potential and the bounding angles are determined for each generation based upon the 1-3 distance range allowed by the potential, and the current bond lengths. Angles are generated on the interval (θ_low, θ_high).

Equation BEND_A.7
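The sine-based bending angle distributions can be sampled by drawing the cosine of the angle uniformly, and the mixed distribution by first choosing which component to draw from. The sketch below is illustrative only: the function names are hypothetical, and rejection is an assumed way of keeping the Gaussian component inside (0, π).

```python
import math
import random

def sample_sine(theta_low=0.0, theta_high=math.pi, rng=random):
    """Draw theta with density proportional to sin(theta) on the interval.

    The defaults mimic DIST_SINE; narrower bounds mimic DIST_BOUNDED_SINE.
    """
    cos_low, cos_high = math.cos(theta_low), math.cos(theta_high)
    # A sine density in theta is a uniform density in cos(theta).
    cos_theta = cos_low + rng.random() * (cos_high - cos_low)
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def sample_sine_gaussian(mu, sigma, f_ideal, rng=random):
    """Mixture: sine with probability f_ideal, otherwise a Gaussian on (0, pi)."""
    if rng.random() < f_ideal:
        return sample_sine(rng=rng)
    while True:
        theta = rng.gauss(mu, sigma)
        if 0.0 < theta < math.pi:
            return theta
```

Keeping a nonzero ideal fraction means every angle in (0, π) retains a finite chance of being proposed even when the fitted Gaussian is very narrow.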

Generation of Bending B angles

pseudo

Equation BEND_B.1

_finite

Equation BEND_B.2

DIST_DELTA: generates trials when the pseudo-dihedral angle implies any angles that have a potential with a finite number of angles (n_finite) that have a nonzero Boltzmann weight (i.e. a rigid angle). There are usually 2 viable values of the pseudo-dihedral angle for each viable implied regular angle (a positive and negative rotation).

Equation BEND_B.3
DIST_UNIFORM: generates trials for non-rigid angles when there are no energy terms to consider for this step, when the cbmc_bend_generation is set to 'ideal', or when the cbmc_bend_generation is set to 'global gaussian' and a hybridization match is not found for the "from" atom (f).

Equation BEND_B.4
DIST_GAUSSIAN: generates trials for non-rigid angles when the cbmc_bend_generation is set to 'global gaussian' and a hybridization match is found for the "from" atom (f), or when the cbmc_bend_generation is set to 'autofit gaussian'. The (-π, π) interval is broken into some number of subregions (n_sub) and each of these subregions is described by a Gaussian distribution. Each subregion (i) is equally likely (uniform on the number of subregions) and is described by an upper limit (hi(φ)), a lower limit (lo(φ)), a mean (μ(φ)) and a standard deviation (σ(φ)).

Equation BEND_B.5
DIST_UNIFORM_GAUSSIAN: generates trials using a linear combination of the uniform and Gaussian distributions for non-rigid angles when the cbmc_bend_generation is set to 'ideal + autofit gaussian'. The uniform (ideal) distribution is used with a probability equal to the bend_b_ideal_fraction (f_ideal). Otherwise the (-π, π) interval is broken into some number of subregions (n_sub) and each of these subregions is described by a Gaussian distribution. In that case, each subregion (i) is equally likely (uniform on the number of subregions) and is described by an upper limit (hi(φ)), a lower limit (lo(φ)), a mean (μ(φ)) and a standard deviation (σ(φ)).

Equation BEND_B.6
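The subregion scheme used for the pseudo-dihedral angle can be sketched as a two-stage draw: pick a subregion uniformly, then draw from that subregion's Gaussian restricted to its limits. The `Subregion` container, the function name, and the use of rejection to enforce the limits are assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Subregion:
    lo: float     # lower limit of the subregion
    hi: float     # upper limit of the subregion
    mu: float     # mean of the Gaussian for this subregion
    sigma: float  # standard deviation of the Gaussian

def sample_subregion_gaussian(subregions, rng=random):
    """Pick a subregion uniformly, then draw a Gaussian value inside its limits."""
    region = rng.choice(subregions)
    while True:
        phi = rng.gauss(region.mu, region.sigma)
        if region.lo <= phi <= region.hi:
            return phi

# Example: three equal subregions covering (-pi, pi), as might describe
# the three substituent positions around an sp3 "from" atom.
EXAMPLE_SUBREGIONS = [
    Subregion(-math.pi, -math.pi / 3.0, -2.0 * math.pi / 3.0, 0.2),
    Subregion(-math.pi / 3.0, math.pi / 3.0, 0.0, 0.2),
    Subregion(math.pi / 3.0, math.pi, 2.0 * math.pi / 3.0, 0.2),
]
```

The example subregions are invented values; in practice the limits, means, and standard deviations would come from the hybridization match or the autofit procedure described above.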

Generation of Dihedral Angles

Equation DIHED.1

DIST_DELTA: generates trials when the primary dihedral angle is rigid (or multi-rigid), or when it implies a term that is rigid. In either case, there are a finite number of dihedral angles (n_finite) that have a nonzero Boltzmann weight for the sum of the energies involved.

Equation DIHED.2
DIST_UNIFORM: generates trials for non-rigid angles when there are no dihedral energy terms, when the cbmc_dihedral_generation is set to 'ideal', or when the cbmc_dihedral_generation is set to 'global gaussian' and a hybridization match is not found for the "from" and "prev" atom pair (f-p).

Equation DIHED.3
DIST_GAUSSIAN: generates trials for non-rigid angles when the cbmc_dihedral_generation is set to 'global gaussian' and a hybridization match is found for the "from" and "prev" atom pair (f-p), or when the cbmc_dihedral_generation is set to 'autofit gaussian'. The (-π, π) interval is broken into some number of subregions (n_sub) and each of these subregions is described by a Gaussian distribution. Each subregion (i) is equally likely (uniform on the number of subregions) and is described by an upper limit (hi(φ)), a lower limit (lo(φ)), a mean (μ(φ)) and a standard deviation (σ(φ)).

Equation DIHED.4
DIST_UNIFORM_GAUSSIAN: generates trials using a linear combination of the uniform and Gaussian distributions for non-rigid dihedral angles when the cbmc_dihedral_generation is set to 'ideal + autofit gaussian'. The uniform (ideal) distribution is used with a probability equal to the dihedral_ideal_fraction (f_ideal). Otherwise the (-π, π) interval is broken into some number of subregions (n_sub) and each of these subregions is described by a Gaussian distribution. In that case, each subregion (i) is equally likely (uniform on the number of subregions) and is described by an upper limit (hi(φ)), a lower limit (lo(φ)), a mean (μ(φ)) and a standard deviation (σ(φ)).

Equation DIHED.5
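Whenever a trial is drawn from a non-uniform distribution, the probability density of that trial must be known so that the bias can be removed. The sketch below evaluates the density of the uniform plus Gaussian mixture, assuming each subregion Gaussian is truncated and renormalized on its own limits. That normalization is an assumption and the equations above may define it differently; the function names are hypothetical.

```python
import math

def truncated_gaussian_pdf(x, lo, hi, mu, sigma):
    """Density of a Gaussian restricted to [lo, hi] and renormalized."""
    if not lo <= x <= hi:
        return 0.0
    root_two = math.sqrt(2.0)
    # Fraction of the full Gaussian that lies inside [lo, hi].
    inside = 0.5 * (math.erf((hi - mu) / (sigma * root_two))
                    - math.erf((lo - mu) / (sigma * root_two)))
    core = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return core / inside

def uniform_gaussian_density(phi, subregions, f_ideal):
    """Mixture density on (-pi, pi).

    subregions: list of (lo, hi, mu, sigma) tuples, each equally likely.
    f_ideal: probability of using the uniform (ideal) distribution.
    """
    uniform_part = 1.0 / (2.0 * math.pi)
    gauss_part = sum(truncated_gaussian_pdf(phi, *region)
                     for region in subregions) / len(subregions)
    return f_ideal * uniform_part + (1.0 - f_ideal) * gauss_part
```

Setting f_ideal to 1 recovers the DIST_UNIFORM density of 1/(2π), and setting it to 0 recovers the pure subregion Gaussian density.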