3.2. Four steps of the mcce program
There are 4 major steps in a mcce calculation. These 4 steps are connected by a few files. The program is designed to run through without stopping although you can stop the program at each step and edit the files to instruct the next step. Here is the summary of the fuction of these 4 steps.
3.2.1. Program start, initialization
- run.prm: mcce control file
- param (specified in "run.prm"): paramter directory
- extra.tpl (optional, specified in "run.prm"): extra parameters
- new.tpl (optional, created by step 1): temporary parameter file for unrecognized cofactor
Output files: None
The initialization reads in control file "run.prm" and loads parameters. It is done everytime mcce is invoked regardless of the steps the program will do.
The control file "run.prm" must be in the working directory where mcce program is called. The lines of this file are interpreted by the mcce program as key-value pairs. The string in parentheses is the key and the first string of the same line is the value. For example the line
can be viewed as $(MCCE_HOME) = "/usr/local/mcce". Other lines are interpreted in the same way. The resulting key-value pairs define the mcce environment variables, input and output file names, and flags etc.
The mcce program will load parameter files from the parameter directory defined in "run.prm". The way of defining this directory is composed by two variables: $(MCCE_HOME) + "/param" + $(EPSILON_PROT). Thus for these lines in the run.prm file:
8.0 Protein dielectric constant for DelPhi (EPSILON_PROT)
the path name is "/usr/local/mcce/param08", in which the dielectric constant is converted to a 2-digit integer. In this directory, mcce program will read in files with extension ".tpl". Files whose names do not end up with ".tpl" will be ignored. This makes it possible to store backup files in the same directory.
The file "extra.tpl" is defined by the variable $(EXTRA) in "run.prm". This file is optional. If the file exists, mcce reads it in with the same way as the files in the parameter directory. This file has the same format as a regular parameter file. The function of this file is
- to provide the means to overwrite specific entries in the parameter files of the global parameter directory;
- to add extra parameters specific to individual proteins such as scaling factors of energy terms;
- to serve as test parameter file for a new cofactor;
Unlike the parameter directory, this file is intended be customized for individual proteins.
The file "new.tpl" is a parameter file similar to those in the parameter directory except no charges are assigned to any atom. This file is for unrecognized residues or cofactors. In step 1, mcce program treats these cofactors as non-charged atom assemblies and writes out this file for two purposes:
- This file can be a template of writing the formal parameter file for the unrecognized cofactors;
- The mcce program will use this temporary parameter file to load unrecognized cofactors when mcce resumes from step 2, 3 and 4.
There is no output file from the program initialization, but the initilization creates a parameter database to hold information from the parameter files and dynamically generated parameters by the program. A temporary image file of the database is found in the working directory as "~temp.dbm". While mcce is running, you can not delete this file or start another mcce job in the same directory.
3.2.2. Step 1, Formatting pdb file
- pdb file (specified by "run.prm"): input structure file in pdb format
- name.txt (optional, specified by "run.prm") : residue and atom renaming rule
- list_rot.gold (optional): hot residue spot definition
- acc.res: solvent accessibility of residues
- acc.atm: solvent accessibility of atoms
- new.tpl (not always created): parameter file template of unrecognized cofactors
- head1.lst (optionally used by step 2): summary of rotamer making policy of residues
- step1_out.pdb (used by step 2): step 1 output file is in mcce extended pdb format (note: this file has lots of extra characters but can be read as is by many visualization programs)
Step 1 prepares an extended pdb file suitable to be read into step 2. The input pdb file is in standard pdb format. It can have alternative side chain positions, but mcce can not process alternative backbone positions. Alternative side chains are treated as side chain conformers. When side chain atoms are missing, mcce will complete the side chain atoms at the torsion minimum. In this step several things will happen:
With the instructions in the renaming rule file "name.txt", residues and atoms will be renamed so that a cofactor can be split into several independent ionizable groups (for example, heme can be divided into heme and two propionates) and several groups can be combined as one (for example, heme can be grouped with the axial ligands).
Step 1 will identify unrecognized cofactors and interpret them as non-charged atom assemblies.
This step completes the missing atoms in each known residue (note: If your file has individual residues with missing atoms the tpl file for that residue will be used to add missing atoms).
This step calculates the solvent accessible surface (SAS) area and strips off exposed water and salt (HOH, NO3 and SO4). The SAS threshold of stripping off water and salt is defined by $(H2O_SASCUTOFF).
From the hot residue spot file "list_rot.gold" and the rotamer making rules defined in "run.prm", the residue specific rotamer making policy is composed and written to file "head1.lst", which can be used by the step 2.
This step identifies geometry clashes between atoms which are not supposed to be bonded.
This step extracts N terminus and C terminus of a chain.
The renaming rule file "name.txt" instructs the mcce program to rename atom name, residue name, sequce number, and chain ID. Here is several sample lines in this file:
# Symbol "*" in the first string is a wildcard that matchs any character.
# It means "do not replace" in the second string.
# The replace is accumulative in the order of appearing in this file.#
*CAA*HEM****** *****PAA****** extract PAA from heme
# ATOM 70 CB ASP A 12
The line started with "#" and the line shorter than 30 characters are comment lines. For other lines, the first 30 characters should be two 14-character strings separated by exactly two spaces, and the rest of the line is comment field. A valid line instructs mcce program to replace string 1 by string 2. The mcce program will match this string with position 13 to 26 of a coordinate line in the input pdb file. The symbol "*" is the wild card that matches any character in strings. The replace action is accumulative and order sensitive. For example, The line
HETATM 1683 CAA HEC 1 1.317 -3.987 -1.685 1.00 0.00 C
will be renamed to
HETATM 1683 CAA HEM 1 1.317 -3.987 -1.685 1.00 0.00 C
HETATM 1683 CAA PAA 1 1.317 -3.987 -1.685 1.00 0.00 C
Another file step 1 may use is "list_rot.gold". This file defines "hot spots" in a protein where rotamers will be made with a step of 30 degrees in step 2. If a coordinate line of any atom of a residue is present in this file, this residue and residues within 4 angstroms will be flagged to be "hot spots" in file "head1.lst", which will be used by step 2.
The output files of step 1 "acc.res" and "acc.atm" contain the solvent accessible surfaces of residues and atoms. In "acc.res", both absolute value and percentage of the solvent accessibility are listed.
When unrecognized cofactor is encountered in step 1, a parameter file will be created with name as "new.tpl" and a warning message will be issued. The atom connectivity is guessed and all atoms are assumed to have charge 0. This file can be the starting point of making a parameter for a new cofactor. If the program is resumed from step 2 or 3, this file will be read in by step 0 as a parameter file so step 2 and 3 can proceed without step 1.
The file "head1.lst" lists the rotamer making policy of the residues. When $(ROT_SPECIF) in "run.prm" is set to "t", this file will be used as instruction of rotamer making of step 2, otherwise, file "head1.lst" will be ignored. This file can be modified and will be effective if the program resumes from step 2.
The file "step1_out.pdb" is a formatted pdb file, which will be read in by step 2. The mcce extended pdb format contains 3 more fields than standard pdb file. Right after atom coordinates, these three fields are charge, size and rotamer making history.
3.2.3. Step 2, Making rotamers
- step1_out.pdb: input structure file of step 2 in mcce extended pdb format
- head1.lst (optional) : rotamer making policy of residues
- progress.log: progress report file, dynamically updated
- rot_stat: rotamer making statistics, dynamically updated
- hvrot.pdb: heavy atom (without hydrogen atoms) rotamer pdb file
- head2.lst (optionally used by step 3): summary of rotamers made in step 2
- step2_out.pdb (used by step 3): step 2 output file with multiple rotamers in extended pdb format
Step 2 makes and optimizes rotamers from the structure in "step1_out.pdb". In this step, the rotatable bonds (defined in parameter files) of each residue are rotated by the steps defined in "run.prm". Then the self van der Waals (VDW) potential (interaction among atoms of the same side chain excluding 1-2 and 1-3 interactions, and interaction between the side chain and backbone atoms) is calculated. Side chain rotamers with high self VDW potentials are deleted. Then the side chains are optimized with possible hydrogen bond partners. A number of repackings starting from randomized initial structures (one conformer from one residue) are performed to reduce side chain rotamers to those with low energy local packings. Ionization states are then created and protons are placed on side chains. At the end of side chain rotamer optimization, MD simulations are carried out locally to relax the structure.
The input file step1_out.pdb is the only essential file step 2 will use. You can modify this file to add, delete or edit residues without causing problems as long as the file observes mcce extended pdb format. Sometimes a little editing of this file is necessary, for example, the terminal residues are not always correctly identified by the mcce program due to "broken chains" caused by the missing residues in the pdb file.
The file "head1.lst" provides residue specific rotamer making rule. It will be used only when $(ROT_SPECIF) is set to be "t". The line of this file such as:
NTR A0003_ R t 06 S f 0.0 H t 06 M 000
is interpreted as "Rotate is true and 6 steps per bond, then swing is false (angle is 0 if any), then Hydroxyl relaxation is true and the maximum number of starting conformers per hydroxyl is 6, and the maximum total number of the conformers is not limited". If you want to investigate a specific site in details, change the step 6 to 12. But making 12 steps for many sites (>30) are not recommended because it is expensive in terms of memory and CPU time.
The file "progress.log" is a dynamically updated progress report file. It lists mainly the repacking progress.
The file "rot_stat" is an important file to review the rotamer making history. This file lists the number of rotamers of each residue. It is easy to tell what residues get rotamers and if the total number of rotamers is manageable (a 2000 conformer structure may need one day to run the next two steps, step 3 and 4).
The file hvrot.pdb is a file at the same format as step1_out.pdb and step2_out.pdb but lacks hydrogen atoms. The main use of this file is to rename it to "step1_out.pdb" and run step 2 again with "swing" rotamers instead of "rotate" rotamers. This provides a way to relax the structure by "swinging" the rotatable bond a little and reevaluate the rotamers. This feature is most for advanced users to calibrate hydrogen bond directed rotamer making algorithm and MD relaxation subroutine.
The file "head2.lst" is a summary of the rotamers made in step 2. This file is not used by step 3.
The file "step2_out.pdb" is in the mcce extended pdb format. It is the single file connecting step 2 and step 3.
3.2.4. Step 3, Calculating the energy lookup table
- step2_out.pdb: input structure file of step 3 in mcce extended pdb format
- progress.log: progress report file, dynamically updated
- energies (directory of energy lookup table used by step 4): opp files, each opp file is pairwise interaction to a conformer
- head3.lst (used by step 4): list of conformers and self energy of conformers. It is used by step 4.
- step3_out.pdb: step 3 output file with multiple rotamers in extended pdb format
Step 3 calls PB equation solver, DelPhi, to calculate reaction field energy and electrostatic pairwise interaction. The result is stored as together with van der Waals interactions as one file per conformer. These files have extension "opp" and are located under directory energies. The self-energy terms (not dependent on side chains of other residues) of conformers are listed in file "head3.lst" The progress is dynamically updated is file "progress.log".
The file "progress.log" reports the progress of DelPhi calculation, which can be used to estimate the total time of this step. There will be two parts of DelPhi calculations: the first is pairwise calculation and the second part is the reaction field energy calculation. It takes less time on reaction field energy calculation than on pairwise calculation.
The directory "energies" holds the resulted pairwise interaction lookup table. The first column is the electrostatic pairwise interaction and the second column is van der Waals interaction. The unit of the energy is Kcal/mol.
The file "head3.lst" contains self energy of each conformer and control flags of step 4. The flag is: "t" for fixed occupancy 0 or 1, or "f" for free to sample. The energy unit is Kcal/mol.
The file "step3_out.pdb" is an extended pdb file with multiple conformers. The conformer number is sorted to be continuous and consistent with the conformer numbers in file "head3.lst" and step 4 output file fort.38. This file is identical to "step2_out.pdb" if "step2_out.pdb" is an unmodified file created by step 2.
3.2.5 Step 4, Simulating pH or Eh titration with Monte Carlo sampling
- energies (directory): energy lookup table for pairwise interaction between conformers
- head3.lst: self-energy of conformers and Monte Carlo sampling flags of conformers
- mc_out: progress of Monte Carlo sampling and energy tracing
- fort.38: conformer occupancies
- sum_crg.out: residue net charges
- pK.out: pKa or Em values obtained by titration curve fitting
Step 4 is a titration simulation by Monte Carlo sampling. The Monte Carlo sampling is performed at specified set of pH/Eh. At each titration point, there will be several (predefined in "run.prm", the default is 6) independent samplings. Each sampling goes through annealing, reducing, and equilibration stages. Statistics of conformer occupancy is only done at equilibration statge. Yifan's Monte Carlo subroutine will check early convergence and quit sampling early to save time. The result is reported as conformer occupancy in file "fort.38", residue net charge in file "sum_crg.out" and fitted pKa/Em values in file "pK.out".
The file "mc_out" is the progress report of Monte Carlo sampling. It contains running energy tracing which can be used to calculate the average E or enthopy of the system, or verify if the system is trapped at local energy minima. By "grep Sg mc_out", you can find the standard deviation of independent samplings.
<Prev | Contents | Next >