Modeller
Installation
Download the software from the Modeller site.
Follow the installation instructions on the Modeller site.
Instructions (simple homology modeling)
Basic modelling
Search the target sequence
Put the target sequence in PIR format.
>P1;TvLDH
sequence:TvLDH:::::::0.00: 0.00
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA
AFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN
FSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKI
GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV
EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*
- The first line contains the sequence code, in the format ">P1;code"
- The second line has ten fields separated by colons and generally
contains information about the structure file, if applicable. Only two
of these fields are used for sequences: "sequence" (indicating that the
file contains a sequence without known structure) and "TvLDH" (the model
file name).
- The rest of the file contains the sequence of TvLDH, with "*"
marking its end.
- The standard one-letter amino acid codes are used. (Note that they
must be upper case; some lower case letters are used for
non-standard residues. See the file modlib/restyp.lib in the Modeller
distribution for more information.)
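The layout described above can be sketched in plain Python. This is only an illustration of the format, not part of Modeller: the function name format_pir_sequence and the 75-character line width are assumptions, and the two trailing fields simply copy the values shown in the TvLDH example.

```python
def format_pir_sequence(code, residues, width=75):
    """Return a PIR entry for a sequence without known structure."""
    residues = residues.strip()
    # Standard residues must be upper-case one-letter codes.
    if not (residues.isalpha() and residues.isupper()):
        raise ValueError("sequence must contain only upper-case letters")
    header = ">P1;" + code
    # Ten colon-separated fields; only the first two are used for sequences.
    fields = ["sequence", code] + [""] * 6 + ["0.00", " 0.00"]
    # The sequence ends with '*' and is split into fixed-width lines.
    body = residues + "*"
    chunks = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([header, ":".join(fields)] + chunks) + "\n"


if __name__ == "__main__":
    # Short fragment of the TvLDH sequence, used only to show the layout.
    print(format_pir_sequence("TvLDH", "MSEAAHVLITGAAGQIGYIL"), end="")
```

The returned text can be saved as TvLDH.ali, the file name read by the search script in the next step.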
Run Modeller
It is suggested to run Modeller scripts using a Python installation.
Otherwise, the scripts can be launched using mod9.24, where 9.24 is the
version installed on the computer. The version can change, so it is
mandatory to check which version is installed.
To run Modeller, the command window must be launched from the Modeller App
so that the environment is set.
In the search box, type Modeller and then click on Modeller App.
In the command window that opens, change the working directory using:
cd c:\WORKING DIRECTORY
The working directory can be found with the File Explorer; its path can be
read below the command section.
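Because the command name depends on the installed version, a small helper can list the mod<version> commands visible from the current command window. This is a sketch under assumptions: the name find_modeller_commands is hypothetical, and the pattern "mod" followed by major.minor (with an optional extension such as .exe) is generalized from the mod9.24 example; it is not taken from the Modeller documentation.

```python
import os
import re

# Assumed naming pattern, generalized from "mod9.24"; an optional file
# extension (for example .exe) is tolerated.
MOD_PATTERN = re.compile(r"^mod(\d+)\.(\d+)(\.\w+)?$", re.IGNORECASE)


def find_modeller_commands(search_path=None):
    """Return a sorted list of ((major, minor), full_path) tuples."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    found = []
    for directory in search_path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue  # skip empty or stale PATH entries
        for name in os.listdir(directory):
            match = MOD_PATTERN.match(name)
            if match:
                version = (int(match.group(1)), int(match.group(2)))
                found.append((version, os.path.join(directory, name)))
    return sorted(found)


if __name__ == "__main__":
    for (major, minor), path in find_modeller_commands():
        print("mod%d.%d -> %s" % (major, minor, path))
```

If the list is empty, the command window was probably not opened from the Modeller App, so the environment is not set.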
Search for potential related sequences of known structure
The search can be performed by the profile.build() command of MODELLER
from modeller import *
log.verbose()
env = environ()
#-- Prepare the input files
#-- Read in the sequence database
sdb = sequence_db(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL',
         minmax_db_seq_len=(30, 4000), clean_sequences=True)
#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')
#-- Now, read in the binary database
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')
#-- Read in the target sequence/alignment
aln = alignment(env)
aln.append(file='TvLDH.ali', alignment_format='PIR', align_codes='ALL')
#-- Convert the input sequence/alignment into
#   profile format
prf = aln.to_profile()
#-- Scan sequence database to pick up homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)
#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')
#-- Convert the profile back to alignment format
aln = prf.to_alignment()
#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')
The profile.build() command has many options
- In this example rr_file is set to use the BLOSUM62 similarity matrix
(file "blosum62.sim.mat" provided in the MODELLER
distribution)
- Accordingly, the parameters matrix_offset and gap_penalties_1d are set
to the appropriate values for the BLOSUM62 matrix
- For this example, we will run only one search iteration by setting the
parameter n_prof_iterations equal to 1
- Thus, there is no need for checking the profile for deviation
(check_profile set to False)
- Finally, the parameter max_aln_evalue is set to 0.01, indicating that
only sequences with e-values smaller than or equal to 0.01 will be
included in the final profile
- The database sequences are read from the file "pdb_95.pir", which must
be present in the directory where the search is launched
Run Modeller:
mod9.24 01_build_profile.py
Warning: replace 9.24 in mod9.24 with the version installed on the
computer. The version can change, so it is mandatory to check it before
launching the script.
Output:
- log file, with all information about the run
- prf file (build_profile.prf), with the results of the search in text format
Most important columns in the table of the results:
- The second column reports the code of the PDB sequence
- The eleventh column reports the percentage sequence identity
- In general, a sequence identity value above approximately 25%
indicates a potential template unless the alignment is short (i.e.,
less than 100 residues)
- The twelfth column reports the e-value of the alignment
- An e-value equal to 0 indicates a very significant similarity
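The column rules above can be applied automatically to shortlist templates. The sketch below assumes that the result rows of the prf file are whitespace-separated and that header lines start with "#"; the function names and the SAMPLE rows are made up for illustration and are not real search output. The check on alignment length (at least 100 residues) is left out and still has to be done by hand.

```python
def parse_profile_hits(lines):
    """Yield (code, identity_percent, e_value) for each result row."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # assumed: header lines start with '#'
        columns = line.split()
        if len(columns) < 12:
            continue  # not a result row
        try:
            # Column 2: PDB code, column 11: % identity, column 12: e-value.
            yield columns[1], float(columns[10]), float(columns[11])
        except ValueError:
            continue  # columns 11/12 are not numeric, skip the row


def select_templates(lines, min_identity=25.0, max_evalue=0.01):
    """Keep hits with identity above min_identity and a small e-value."""
    return [hit for hit in parse_profile_hits(lines)
            if hit[1] > min_identity and hit[2] <= max_evalue]


# Made-up rows, for illustration only (not real search output).
SAMPLE = [
    "# illustrative header line",
    "1 query S 0 335 1 335 0 0 0 0. 0.0",
    "2 hitA X 1 300 10 320 5 310 305 45. 0.0",
    "3 hitB X 1 120 50 110 20 80 60 30. 0.5",
    "4 hitC X 1 310 1 330 3 309 300 18. 0.1E-05",
]

if __name__ == "__main__":
    for code, identity, evalue in select_templates(SAMPLE):
        print(code, identity, evalue)
```

With a real run, the lines would come from open('build_profile.prf'); the target sequence itself appears as the first row and is filtered out here because its reported identity is 0.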
Alignment of template structures
To select the most appropriate template for our query sequence among the
similar structures, the script compare.py (launched below as 02_compare.py)
is used to assess the structural and sequence similarity between the
possible templates.
The following script is used when more than one PDB structure was found.
from modeller import *
env = environ()
aln = alignment(env)
# Read chain 'chain' of each PDB file and add it to the alignment
for (pdb, chain) in (('1TXU', 'A'), ('4q9u', 'A'), ('4n3z', 'A'),
                     ('2ot3', 'A')):
    m = model(env, file=pdb, model_segment=('FIRST:'+chain,
                                            'LAST:'+chain))
    aln.append_model(m, atom_files=pdb,
                     align_codes=pdb+chain)
aln.malign()
aln.malign3d()
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)
The script has to be modified according to the system under study.
- Change the PDB codes according to your own search, here 1TXU in
('1TXU', 'A')
- Check the correct chain in the PDB file, here A in ('1TXU', 'A')
- Check the parentheses in the script, according to Python syntax
- Put the PDB files in the working directory of the script
When there is only one PDB structure, the script has to be modified
according to Python syntax.
from modeller import *
env = environ()
aln = alignment(env)
# One-element tuple: the trailing comma after ('1txu', 'A') is required
for (pdb, chain) in (('1txu', 'A'), ):
    m = model(env, file=pdb, model_segment=('FIRST:'+chain,
                                            'LAST:'+chain))
    aln.append_model(m, atom_files=pdb,
                     align_codes=pdb+chain)
aln.malign()
aln.malign3d()
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)
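Warning: pay attention to the trailing comma in the for statement. It is
mandatory, because without it Python does not treat (('1txu', 'A'), ) as
a tuple containing one (pdb, chain) pair.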
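The effect of the comma can be checked in plain Python, without Modeller. The variable names below are only illustrative.

```python
single = (('1txu', 'A'), )   # one-element tuple holding one (pdb, chain) pair
no_comma = (('1txu', 'A'))   # extra parentheses only: this is just ('1txu', 'A')

# With the comma, the loop sees exactly one (pdb, chain) pair.
pairs_ok = [(pdb, chain) for (pdb, chain) in single]

# Without the comma, the loop iterates over the strings '1txu' and 'A';
# unpacking '1txu' (four characters) into two names raises ValueError.
try:
    pairs_bad = [(pdb, chain) for (pdb, chain) in no_comma]
except ValueError as error:
    pairs_bad = None
    message = str(error)

if __name__ == "__main__":
    print(len(single), len(no_comma))
    print(pairs_ok)
    print(pairs_bad, message)
```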
Finally, the script can be launched.
mod9.24 02_compare.py
- At the end of the loop, all of the structures are in the alignment,
but they are not ideally aligned to each other (append_model creates a
simple 1:1 alignment with no gaps)
- Therefore, we improve this alignment by using malign to
calculate a multiple sequence alignment
- The malign3d command then performs an iterative least-squares
superposition of the 3D structures, using the multiple sequence
alignment as its starting point
- The compare_structures command compares the structures according to
the alignment constructed by malign3d
- It does not make an alignment, but it calculates the RMS and DRMS
deviations between atomic positions and distances, differences between
the mainchain and sidechain dihedral angles, percentage sequence
identities, and several other measures
- Finally, the id_table command writes a file with pairwise sequence
distances that can be used directly as the input to the dendrogram
command (or the clustering programs in the PHYLIP package)
- The dendrogram command calculates a clustering tree from the input
matrix of pairwise distances, which helps visualize differences among the
template candidates
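To make the RMS and DRMS measures mentioned above more concrete, the sketch below computes both for two small sets of equivalent atoms. It uses the textbook definitions and assumes the coordinates are already superposed and matched one to one; it is not Modeller's implementation, which may select atoms and apply cutoffs differently. The function names are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math
from itertools import combinations


def rms_deviation(coords_a, coords_b):
    """RMS deviation between equivalent atomic positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def distance_rms_deviation(coords_a, coords_b):
    """RMS deviation between equivalent intramolecular distances (DRMS)."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate lists of equal length (>= 2)")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = sum((math.dist(coords_a[i], coords_a[j])
                 - math.dist(coords_b[i], coords_b[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    # Same shape as a, shifted by 3 units along x.
    b = [(x + 3.0, y, z) for (x, y, z) in a]
    print(rms_deviation(a, b), distance_rms_deviation(a, b))
```

In this example the second structure is a rigid shift of the first, so the positional RMS equals the shift while the DRMS is zero: DRMS depends only on internal distances and not on how the structures are superposed.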